Word cloud methodology

Word cloud with the word “lapsed” (“children”)

What words are used to describe the contents of the Estonian newsreels, and how do they change over time? Which keyword tags were used to describe footage of children, sports, or political events? What kinds of words describe the newsreel contents in the 1930s, and what kinds in the 1970s? Where are the most radical changes in the keywords?

Changing newsreel themes can be visualised with a word cloud. These visualisations are based on the keyword tags attached to the newsreel descriptions in the metadata. The larger a word appears in the word cloud, the more often it has been used to describe the newsreel contents.

For example, someone interested in the different ways children have been portrayed in Estonian newsreels can see from a word cloud that footage of children has sometimes been tagged with words related to education, such as school uniform, classroom, or pioneer camp, but also with tags related to social issues, such as orphanage. Issues related to children were discussed both in the 1930s and in the 1960s.

Word clouds are a simple but powerful way to visualise texts and summarise their essential contents: the more often a word appears in a text, the larger it is drawn in the word cloud. To create these word clouds, we used the tags attached to the metadata descriptions of the newsreels.

To build a visualisation, we take every keyword of each film in the search results, calculate the frequency of each keyword, and sort the list so that the most frequent keywords come first. We then draw the 250 most frequent words. Each frequency is scaled between the largest and the smallest frequency and mapped to a corresponding font size between 8 and 80 px, so that the font sizes keep the proportions of the frequencies in the list. The angle of each word is chosen randomly between -90 and 90 degrees.

We use the tags as they are, without lemmatizing them into the base form of the word. This means that you may see different forms of the same word in the same word cloud. For example, a query with the word “pidu” (“party” or “festivities”) produces a word cloud that includes both the tag “mikrofon” (“microphone”) and the tag “mikrofonid” (“microphones”).
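The steps above (count, sort, keep the top 250, scale to a font size, pick a random angle) can be sketched in Python as follows. This is a minimal illustration, not the archive's actual implementation. The input format (one list of tags per film in the search results), the function name `word_cloud_layout`, and the choice of a linear min-max interpolation between the smallest and largest frequency among the drawn words are assumptions made for the example.

```python
import random
from collections import Counter

MAX_WORDS = 250   # number of most frequent tags that are drawn
MIN_FONT_PX = 8   # font size for the least frequent drawn tag
MAX_FONT_PX = 80  # font size for the most frequent tag


def word_cloud_layout(films, max_words=MAX_WORDS, rng=None):
    """Hypothetical helper returning (tag, frequency, font_px, angle_deg) tuples.

    Assumption: `films` is an iterable of tag lists, one list per film in the
    search results. Tags are counted exactly as written (no lemmatization),
    so "mikrofon" and "mikrofonid" are counted as two different words.
    """
    rng = rng or random.Random()
    # Count every keyword of every film in the search results.
    counts = Counter(tag for tags in films for tag in tags)
    # Sort by descending frequency and keep only the top `max_words` tags.
    top = counts.most_common(max_words)
    if not top:
        return []
    f_max = top[0][1]   # largest frequency among the drawn tags
    f_min = top[-1][1]  # smallest frequency among the drawn tags
    span = f_max - f_min
    layout = []
    for tag, freq in top:
        # Assumed linear min-max scaling of the frequency onto 8..80 px;
        # if all frequencies are equal, every tag gets the maximum size.
        ratio = (freq - f_min) / span if span else 1.0
        font_px = round(MIN_FONT_PX + ratio * (MAX_FONT_PX - MIN_FONT_PX))
        # Random rotation between -90 and 90 degrees.
        angle = rng.uniform(-90, 90)
        layout.append((tag, freq, font_px, angle))
    return layout


if __name__ == "__main__":
    # Hypothetical tags for three films returned by a query for "pidu".
    films = [["pidu", "mikrofon"], ["pidu", "mikrofonid"], ["pidu", "lapsed"]]
    for tag, freq, font_px, angle in word_cloud_layout(films, rng=random.Random(0)):
        print(tag, freq, font_px, round(angle, 1))
```

With this toy input, “pidu” (three occurrences) is drawn at 80 px, while “mikrofon”, “mikrofonid”, and “lapsed” (one occurrence each) are drawn at 8 px, and the two microphone tags stay separate because no lemmatization is applied.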