N-gram analysis is an analytical technique used in natural language processing and computational linguistics. Given a large body of text (often referred to as a text corpus), it helps identify commonly occurring patterns of words or phrases. We use it in Insight Magnet to identify top-level trends in captured feedback and to let an analyst drill up or drill down.

An n-gram is a group of "n" words connected and related by the proximity of their appearance in the original text. An n-gram of size 1 is referred to as a "unigram"; size 2 is a "bigram" (or, less commonly, a "digram", not to be confused with diagram); size 3 is a "trigram", and so on. Insight Magnet discovers the n-grams and usually reports them ranked in descending order of frequency.
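The definition above can be sketched in a few lines of Python. This is only an illustration of the general technique, not Insight Magnet's actual implementation: the function names `ngrams` and `rank_ngrams`, the simple regex tokenizer, and the sample responses are all assumptions made up for this example.

```python
from collections import Counter
import re


def ngrams(text, n):
    """Return every run of n adjacent words in text, lowercased."""
    # Assumption: a word is a run of letters or apostrophes.
    words = re.findall(r"[a-z']+", text.lower())
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]


def rank_ngrams(texts, n):
    """Count n-grams across all texts, most frequent first."""
    counts = Counter()
    for text in texts:
        # n-grams are built per response, so they never span two responses.
        counts.update(ngrams(text, n))
    return counts.most_common()


# Invented sample responses, for illustration only.
responses = [
    "She gives good feedback",
    "He gives good feedback often",
    "Always gives clear direction",
]

for gram, count in rank_ngrams(responses, 2)[:3]:
    print(" ".join(gram), count)
```

Changing the second argument of `rank_ngrams` from 2 to 1 or 3 would produce unigrams or trigrams from the same text.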


How do analysts use n-grams?

Here is a question from a manager assessment survey. Users were asked to share what they appreciate about their managers. A simple cloud view with stop words filtered looks like this. The word "good" stands out. N-grams can shed light on how the highlighted keyword is associated with other words, which may not be obvious from the cloud view.


Looking at the same data as trigrams (n-grams of three words), we can see the most commonly occurring patterns in users' responses. The theme word "good" appears in the top three occurrences, but the associations discovered using n-grams would not be obvious if you relied on the cloud view alone.
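One way to surface the associations around a theme word is to keep only the n-grams that contain it. The sketch below is a hypothetical illustration, not how Insight Magnet works internally: `keyword_ngrams` is an invented name, the responses are made-up sample data, and the optional `stop_words` argument is an assumption, since it is not stated here whether stop words are filtered before n-grams are built.

```python
from collections import Counter
import re


def keyword_ngrams(texts, keyword, n=3, stop_words=frozenset()):
    """Rank the n-grams that contain keyword, most frequent first."""
    counts = Counter()
    for text in texts:
        # Assumption: simple regex tokenizer; stop words are dropped only if supplied.
        words = [w for w in re.findall(r"[a-z']+", text.lower())
                 if w not in stop_words]
        for i in range(len(words) - n + 1):
            gram = tuple(words[i:i + n])
            if keyword in gram:
                counts[gram] += 1
    return counts.most_common()


# Invented sample responses, for illustration only.
responses = [
    "Gives good honest feedback",
    "Always gives good honest feedback",
    "Very good listener",
    "Sets clear goals",
]

for gram, count in keyword_ngrams(responses, "good"):
    print(" ".join(gram), count)
```

Each trigram printed shows the words that appear right next to "good", which is the kind of context a word cloud cannot show.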


Analysts can define the width of the n-gram by choosing the number of words. This makes n-grams a great tool for exploring a large text corpus and discovering patterns in it in a very short time.