I built an n-grams text analyzer app that shows the unique and common n-grams between two bodies of text.
It’s great for analyzing n-gram usage between two pages, which can provide useful SEO insights.
Simply copy/paste two texts and the target words you want to analyze, and the app will do the rest.
It outputs a table showing each n-gram and its total occurrences in text 1 and text 2.
It also provides a prompt that you can copy/paste into ChatGPT to get a nice Venn diagram. You’ll need ChatGPT Plus with advanced data analysis for this feature.
Try it here: N-Grams Text Analyzer App
What are N-grams?
An n-gram is a contiguous sequence of ‘n’ items from a given sample of text or speech. The ‘items’ can be phonemes, syllables, letters, words, or base pairs according to the application. For example, in the context of text, a 1-gram (or unigram) is a single word, a 2-gram (or bigram) is a pair of words, and so on. This concept is crucial for analyzing and understanding language structure, context, and meaning.
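To make the definition concrete, here is a minimal Python sketch that pulls word-level n-grams out of a sentence. The function name `word_ngrams` and the sample sentence are made up for illustration, and splitting on whitespace is a simplifying assumption.

```python
def word_ngrams(text, n):
    """Return every contiguous run of n words in text, in order."""
    words = text.split()  # naive whitespace split, assumed for simplicity
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]

sentence = "best running shoes for beginners"
unigrams = word_ngrams(sentence, 1)  # single words
bigrams = word_ngrams(sentence, 2)   # pairs of neighboring words
trigrams = word_ngrams(sentence, 3)  # runs of three words

print(bigrams)
# ['best running', 'running shoes', 'shoes for', 'for beginners']
```

A text with fewer than n words simply yields an empty list.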
Importance of N-grams in SEO
N-grams help search engines understand the context of content on web pages. Instead of focusing on single keywords, n-grams allow search engines to consider the context in which words are used. This leads to more accurate search results and a better user experience.
Google Patents and N-grams
At the end of the day, Google’s search algorithms are just code that processes website code. Google has patented many methods involving n-grams and phrases. These patents suggest that Google may use n-gram models to analyze the content of web pages, understand the relationships between words, and determine the relevance of pages to specific search queries. By analyzing sequences of words, Google can better understand the meaning of a text and how relevant a page is to a given topic or query.
Shift from Keywords to Phrases and Context
Google has evolved way beyond keyword matching. Traditionally, SEO heavily focused on targeting specific keywords. However, with advancements in natural language processing (NLP), including n-grams, there’s a shift towards understanding the context and semantic meaning of phrases. This aligns with Google’s continuous updates to its algorithms, aiming to understand searcher intent and content relevance beyond mere keyword matching.
Long-Tail Keywords and User Queries
N-grams are super useful for understanding and optimizing for long-tail keywords. These are longer queries that are less competitive but highly targeted. By analyzing n-grams, SEO strategies can be tailored to match the specific phrases and queries used by searchers, leading to more effective targeting.
Content Quality and Relevance
Google’s algorithms increasingly prioritize content quality and relevance. Understanding n-grams helps in creating content that naturally covers topics in-depth, using language that reflects how people actually talk about a given subject. This approach aligns with Google’s aim to surface high-quality, relevant content.
SEO use cases for the n-grams analyzer
1. Ranking analysis
Trying to figure out why one page ranks better than another? Run an n-grams analysis to see if the ranking page has significantly better or different n-grams usage. This can provide insights that popular keyword-counting tools cannot.
2. N-grams optimization from higher-ranking pages
This app makes it easy to optimize your text for better n-grams usage. Take a chunk of text on one of your pages and compare it with a chunk of text on a page that outranks yours in search results. You can use the outputs of this app to show you exactly how well-optimized your text is in comparison to a higher-ranking page in terms of n-gram usage.
Because of the word filter feature, you can optimize your text’s n-grams for specific words and phrases that are found on an outranking page. You can compare entire pages, a single paragraph, or even a single sentence.
3. Featured snippet optimization
You can copy/paste a featured snippet text and compare it to a similar text on your page that you want to rank for that featured snippet. It will show you if the current featured snippet text has better n-gram usage than yours.
4. Keyword cannibalization
Do you have two pages that consistently rank for the same terms? Having too many common n-grams on both pages can confuse search engines. Increase the distance between the two pages by reducing common n-grams or increasing unique n-grams. Check Google Search Console queries for each page to determine which n-grams should be used on each page. Pay extra attention to link anchor text, bold text, and capitalization usage.
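One possible way to put a number on that "distance" is the share of bigrams the two pages have in common. The sketch below uses Jaccard overlap, which is my own choice of metric for illustration and not something the app reports. The function names and sample texts are hypothetical.

```python
def bigram_set(text):
    """Return the set of distinct word bigrams in text (lowercased)."""
    words = text.lower().split()
    return set(zip(words, words[1:]))

def jaccard_overlap(text_a, text_b):
    """Shared bigrams divided by all distinct bigrams across both texts."""
    a, b = bigram_set(text_a), bigram_set(text_b)
    if not a and not b:
        return 0.0  # neither text has a bigram, so there is nothing to compare
    return len(a & b) / len(a | b)

page_1 = "led lighting for kitchens and led lighting for bathrooms"
page_2 = "led lighting for offices"
overlap = jaccard_overlap(page_1, page_2)
print(round(overlap, 2))
```

A score near 1 means the two pages use almost the same phrases, and a score near 0 means they barely overlap. Reducing common n-grams or adding unique ones pushes the score down.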
How the n-grams analyzer app works
1. Input two texts
First, you provide two pieces of text. These could be any two paragraphs or sentences that you want to compare. Let’s call them “Text A” and “Text B.”
2. Provide filter words
You also need to provide a list of words that you want to focus on. This list is called “Filter Words.” The script will pay more attention to these words when comparing the texts. For SEO purposes, you can take a long-tail keyword and enter its words as a comma-separated list. The app will include both capitalized and uncapitalized mentions of each comma-separated word. It will also include most plural forms of a word and semantically similar words. For example, if you put “light” as one of the filter words, it will grab “lighting” too.
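Here is a rough sketch of how comma-separated filter words could be parsed and matched regardless of capitalization. The function names are hypothetical, and the prefix match is only a crude stand-in (my assumption) for the plural and similar-word handling described above.

```python
def parse_filter_words(raw):
    """Split a comma-separated string into clean, lowercase filter words."""
    return [w.strip().lower() for w in raw.split(",") if w.strip()]

def matches_filter(token, filter_words):
    """True if the token starts with any filter word, ignoring case.

    Prefix matching is an assumption used here to catch forms such as
    "lights" or "lighting"; it is not the app's actual matching logic.
    """
    token = token.lower()
    return any(token.startswith(fw) for fw in filter_words)

filter_words = parse_filter_words("LED, light , fixtures")
print(filter_words)                              # ['led', 'light', 'fixtures']
print(matches_filter("Lighting", filter_words))  # True
print(matches_filter("lamp", filter_words))      # False
```

Prefix matching also produces false positives, for example "ledger" would match "led", which is one reason a similarity-based approach is more useful in practice.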
3. Text Processing
The script then does some behind-the-scenes work. It breaks down the text into smaller pieces, like words. It also removes common stop words like “and,” “the,” and “is” because they don’t tell us much on their own. This is currently handled by a manually maintained list of common English stop words.
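A minimal sketch of this step might look like the following. The stop word list here is a tiny made-up sample and not the list the app uses, and the regular expression is just one simple way to split text into words.

```python
import re

# Tiny illustrative sample; a real list would be much longer.
STOP_WORDS = {"a", "an", "and", "the", "is", "of", "to", "in", "for"}

def tokenize(text):
    """Lowercase the text and return its words, dropping punctuation."""
    return re.findall(r"[a-z0-9']+", text.lower())

def remove_stop_words(tokens):
    """Keep only the tokens that are not in the stop word list."""
    return [t for t in tokens if t not in STOP_WORDS]

raw = "The best lighting is LED lighting, and the price is fair."
tokens = remove_stop_words(tokenize(raw))
print(tokens)
# ['best', 'lighting', 'led', 'lighting', 'price', 'fair']
```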
4. Creating N-grams
Next, it looks at combinations of words in the text. It groups words into n-grams. For example, it might group three words and call it a 3-gram or trigram. The app currently only looks for bigrams, trigrams, and quadgrams.
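As a sketch, generating the three n-gram sizes from a list of already-cleaned tokens could look like this. The names `ngrams` and `all_ngrams` are hypothetical, and the app's own Python function may be structured differently.

```python
def ngrams(tokens, n):
    """Return all n-grams as tuples by zipping n shifted copies of the list."""
    return list(zip(*(tokens[i:] for i in range(n))))

def all_ngrams(tokens, sizes=(2, 3, 4)):
    """Map each size (bigram, trigram, quadgram) to its list of n-grams."""
    return {n: ngrams(tokens, n) for n in sizes}

tokens = ["best", "led", "lighting", "price"]
grams = all_ngrams(tokens)
print(grams[2])  # [('best', 'led'), ('led', 'lighting'), ('lighting', 'price')]
print(grams[4])  # [('best', 'led', 'lighting', 'price')]
```

Tuples are used instead of joined strings so each n-gram can be counted and looked up in a dictionary later.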
5. Finding Similar Words
The script uses a special tool to understand the meaning of words in the text. It’s like teaching the script to know which words are similar in meaning. It uses word embeddings, vector space, cosine similarity, thresholds, weighted averages, and comparisons to find these similar words.
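To show the idea behind cosine similarity and thresholds, here is a toy sketch. The three-dimensional vectors are invented for illustration (real word embeddings have hundreds of dimensions and come from a trained model), the 0.9 threshold is an arbitrary assumption, and the weighted-average part is left out.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector has no direction to compare
    return dot / (norm_a * norm_b)

# Made-up vectors, NOT real embeddings.
TOY_EMBEDDINGS = {
    "light": [0.9, 0.1, 0.0],
    "lighting": [0.8, 0.2, 0.1],
    "lamp": [0.7, 0.3, 0.2],
    "price": [0.0, 0.1, 0.9],
}

def similar_words(word, embeddings, threshold=0.9):
    """Return the other words whose similarity to word meets the threshold."""
    target = embeddings[word]
    return sorted(
        other for other, vec in embeddings.items()
        if other != word and cosine_similarity(target, vec) >= threshold
    )

matches = similar_words("light", TOY_EMBEDDINGS)
print(matches)  # ['lamp', 'lighting']
```

Raising the threshold makes the match stricter, and lowering it pulls in more loosely related words.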
6. Comparison between the two texts
Now, it starts comparing Text A and Text B. It checks which n-grams are similar between the two texts, especially focusing on the words you mentioned in the “Filter Words.”
7. Counting occurrences of n-grams
It counts how many times these similar word groups appear in each text. It’s like counting how many times certain phrases show up.
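The comparison and counting steps can be sketched with Python’s `collections.Counter`. This example only handles bigrams and exact filter word matches to keep it short, and the function names and token lists are hypothetical.

```python
from collections import Counter

def bigram_counts(tokens):
    """Count how often each bigram occurs in a token list."""
    return Counter(zip(tokens, tokens[1:]))

def compare(tokens_a, tokens_b, filter_words):
    """Map each bigram containing a filter word to (count in A, count in B)."""
    counts_a, counts_b = bigram_counts(tokens_a), bigram_counts(tokens_b)
    rows = {}
    for gram in counts_a.keys() | counts_b.keys():
        if any(word in filter_words for word in gram):
            # A Counter returns 0 for n-grams it has never seen.
            rows[gram] = (counts_a[gram], counts_b[gram])
    return rows

tokens_a = ["led", "lighting", "saves", "energy", "led", "lighting"]
tokens_b = ["led", "lighting", "costs", "less"]
rows = compare(tokens_a, tokens_b, {"lighting"})

common = {g for g, (a, b) in rows.items() if a and b}
only_a = {g for g, (a, b) in rows.items() if a and not b}
print(rows[("led", "lighting")])  # (2, 1)
```

N-grams with a count above zero on both sides are the common ones, and those with a zero on one side are unique to the other text.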
8. Creating a table for n-grams analysis
Finally, it organizes this information into a table. The table shows you the similar phrases, how many words are in each phrase, and how many times they appear in each text. This table helps you see the similarities and differences between the texts more clearly.
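As a rough sketch, turning the counts into table rows could look like this. The column names (`ngram`, `words`, `text_a`, `text_b`) and the sample counts are placeholders I picked for illustration, not the app’s actual headers or data.

```python
def build_table(counts):
    """Turn {ngram tuple: (count in Text A, count in Text B)} into sorted rows."""
    table = [
        {"ngram": " ".join(gram), "words": len(gram), "text_a": a, "text_b": b}
        for gram, (a, b) in counts.items()
    ]
    # Shorter phrases first, then alphabetical within each size.
    return sorted(table, key=lambda row: (row["words"], row["ngram"]))

counts = {
    ("led", "lighting"): (2, 1),
    ("led", "lighting", "costs"): (0, 1),
    ("energy", "led"): (1, 0),
}
table = build_table(counts)
for row in table:
    print(f'{row["ngram"]:<22}{row["words"]:>3}{row["text_a"]:>4}{row["text_b"]:>4}')
```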
9. Outputting the table
The script then gives you this table as a result. It’s like handing you a chart that shows which phrases are common and how often they appear in your two texts. You can simply press the little download icon to save it as a CSV and open it up in Google Sheets. I then add a diff column and sort the n-grams by the greatest difference between the two texts. This makes optimizing your text for better n-grams usage easier.
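If you would rather script the diff column than add it in Google Sheets, a sketch like the one below would do it. The column names and sample rows are made up, and using the absolute difference is my assumption about what "greatest difference" means here.

```python
import csv
import io

# Hypothetical rows standing in for the downloaded table.
rows = [
    {"ngram": "led lighting", "text_1": 2, "text_2": 5},
    {"ngram": "lighting price", "text_1": 1, "text_2": 1},
    {"ngram": "energy saving", "text_1": 4, "text_2": 0},
]

def add_diff_and_sort(rows):
    """Add an absolute-difference column and sort with the largest gap first."""
    with_diff = [dict(r, diff=abs(r["text_1"] - r["text_2"])) for r in rows]
    return sorted(with_diff, key=lambda r: r["diff"], reverse=True)

def to_csv(rows):
    """Serialize the rows to CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["ngram", "text_1", "text_2", "diff"])
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()

sorted_rows = add_diff_and_sort(rows)
csv_text = to_csv(sorted_rows)
print(csv_text)
```

The n-grams at the top of the sorted output are the ones where the two texts differ the most, so they are the first place to look when optimizing.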
10. Venn diagram prompt
The app provides a bonus prompt used to create a Venn diagram using ChatGPT Plus advanced data analysis.
Example output of the n-grams analysis app
Venn Diagram n-gram comparison
Table output with occurrences
Build your own semantic SEO apps
This app is built on the Moonlit platform. It uses a custom Python function to process the n-grams.
If you’re interested in building similar apps, join my AI SEO academy. I’ll show you how to build this SEO app and all the other apps and tools I’ve built to date.