Keyword clouds

Keyword clouds are a trend made popular by Web 2.0 development several years ago. From a text corpus, the frequency of occurrence of each keyword is calculated. The keywords are then styled based on that frequency and displayed graphically, so that the more frequently occurring terms and keywords stand out.
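The counting and styling steps described above can be sketched as follows. This is a minimal illustration, not Insight Magnet's implementation: the function names, the tokenizing pattern, and the linear mapping from frequency to font size are all assumptions made for the example.

```python
import re
from collections import Counter

def keyword_frequencies(texts):
    """Count how often each lowercase word appears across all texts."""
    counts = Counter()
    for text in texts:
        # Assumed tokenizer: runs of letters, with an optional inner apostrophe.
        counts.update(re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower()))
    return counts

def font_sizes(counts, min_size=10, max_size=40):
    """Map each keyword's count linearly onto a font-size range (hypothetical styling rule)."""
    if not counts:
        return {}
    low, high = min(counts.values()), max(counts.values())
    span = high - low or 1  # avoid division by zero when all counts are equal
    return {
        word: round(min_size + (count - low) * (max_size - min_size) / span)
        for word, count in counts.items()
    }

responses = ["Fast shipping and great price", "Shipping was slow", "Great price"]
counts = keyword_frequencies(responses)
sizes = font_sizes(counts)
```

Here the most frequent keywords receive the largest size and the rarest the smallest; a real cloud would also handle layout and color.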

Insight Magnet can generate clouds in a variety of cases. You may choose to Inflec

An n-gram is a group of "n" words connected and related by the proximity of their appearance in the original text. An n-gram of size 1 is referred to as a "unigram"; size 2 is a "bigram" (or, less commonly, a "digram", not to be confused with a diagram); size 3 is a "trigram"; and so on. Insight Magnet discovers the n-grams and most commonly reports them ranked in descending order by frequency.
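As a rough sketch of n-gram discovery and ranking, the example below slides a window of n words over each response and counts the results. The function names and the tokenizer are assumptions for illustration; how Insight Magnet actually tokenizes text is not described here.

```python
import re
from collections import Counter

def ngrams(tokens, n):
    """Return every run of n consecutive tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def ranked_ngrams(texts, n):
    """Count n-grams within each text (never across texts), most frequent first."""
    counts = Counter()
    for text in texts:
        tokens = re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())
        counts.update(ngrams(tokens, n))
    return counts.most_common()

responses = ["free shipping is great", "I love free shipping", "shipping was free"]
top_bigrams = ranked_ngrams(responses, 2)
top_unigrams = ranked_ngrams(responses, 1)
```

Counting per response keeps the last word of one response from being paired with the first word of the next.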


Here you are looking at responses to an open-text question in cloud view. Instead of having to sift through all the verbatims and come up with your own summary, subjective or objective, of the top themes, this view deterministically provides a summary of the themes being discussed.

Keyword clouds can be shown for any language. Most commonly, however, responses in non-English languages are translated to English and viewed as English keywords.


Typical responses contain many common words that are not relevant to theme detection and trend analysis. In natural language processing, these are known as stop words. The previous view of the clouds had stop words filtered out so that we can focus on the top-level themes and trends. You can also see the clouds without stop word filtering and, as you would expect, the dominant words are then stop words.
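The effect of stop word filtering can be illustrated with a short sketch. The stop word list below is a tiny hypothetical sample (real lists are considerably longer), and the function name is an assumption for the example.

```python
import re
from collections import Counter

# A tiny illustrative stop word list; not the list Insight Magnet uses.
STOP_WORDS = frozenset({"a", "an", "and", "the", "is", "was", "to", "of", "it", "i", "in"})

def count_keywords(texts, stop_words=frozenset()):
    """Count lowercase words, skipping any that appear in stop_words."""
    counts = Counter()
    for text in texts:
        for word in re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower()):
            if word not in stop_words:
                counts[word] += 1
    return counts

responses = [
    "The delivery was late and the box was damaged",
    "The delivery was quick",
]
unfiltered = count_keywords(responses)
filtered = count_keywords(responses, STOP_WORDS)
```

Without filtering, "the" and "was" dominate the counts; with filtering, "delivery" becomes the top keyword.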


When analyzing keywords, the initial processing only converts the text to lowercase. The algorithm does not recognize singulars and plurals, verb tenses, or other inflections as forms of the same word. Sometimes it is useful to combine singulars and plurals and the various tenses of the same verb, so that the themes condense into a smaller set of categories.

This can be accomplished using stemming. In linguistic morphology, stemming is the process of reducing a word to its stem or root form. The root forms are easy to read, but it does take some practice. In the stemmed word cloud, "order", "ordered" and "ordering" are all reduced to "order". There are complex linguistic algorithms at work: they reduce "shipping", "shipped" and "ship" to "ship", but do not combine "shipment" with "ship".
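To make the idea concrete, here is a deliberately crude suffix stripper that reproduces the examples above. It is a toy written for this illustration, not the stemming algorithm Insight Magnet uses; production stemmers apply many more rules. The name toy_stem and its rules are assumptions.

```python
from collections import Counter

def toy_stem(word):
    """Strip a few common English suffixes. A crude illustration, not a real stemmer."""
    word = word.lower()
    for suffix in ("ing", "ed", "s"):
        if suffix == "s" and word.endswith("ss"):
            continue  # leave words like "class" alone
        # Only strip when at least three letters would remain.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            stem = word[: -len(suffix)]
            # Undo consonant doubling, e.g. "shipp" -> "ship" (but keep ll, ss, zz).
            if stem[-1] == stem[-2] and stem[-1] not in "aeioulsz":
                stem = stem[:-1]
            return stem
    return word

words = ["order", "ordered", "ordering", "shipping", "shipped", "ship", "shipment"]
stem_counts = Counter(toy_stem(w) for w in words)
```

With these rules the three forms of "order" and the three forms of "ship" each collapse into one entry, while "shipment" stays separate because "-ment" is not among the suffixes stripped.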

Clouds are also possible based on categorization and sentiment extraction. Categorization is a sophisticated process that goes beyond grammar, using natural language processing and machine learning algorithms to sort feedback into user-defined categories that usually map to business attributes.
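The sketch below shows only the shape of a category cloud's input: feedback mapped to user-defined categories and then counted. Simple keyword matching stands in for the natural language processing and machine learning described above, so this is not how the actual categorization works. The category names, keyword sets, and function names are all hypothetical.

```python
import re
from collections import Counter

# Hypothetical user-defined categories mapped to trigger keywords.
CATEGORY_KEYWORDS = {
    "Delivery": {"shipping", "delivery", "late", "arrived"},
    "Pricing": {"price", "expensive", "cheap", "cost"},
}

def categorize(text, category_keywords):
    """Return the set of categories with at least one keyword in the text."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    return {name for name, keywords in category_keywords.items() if words & keywords}

def category_counts(texts, category_keywords):
    """Count how many responses fall into each category."""
    counts = Counter()
    for text in texts:
        counts.update(categorize(text, category_keywords))
    return counts

responses = ["Delivery was late", "Shipping was slow", "Great price"]
cloud_input = category_counts(responses, CATEGORY_KEYWORDS)
```

The resulting counts can be styled like keyword frequencies, with each category sized by the number of responses assigned to it.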