Data and code

The data visualised in this web interface is based on digitised newsreel videos and their metadata from the Film Archive of the National Archives of Estonia (RA) and the Estonian Film Database (EFDB). The two datasets overlap a great deal, but they also differ in some respects. The RA digitised the newsreels and turned the archival descriptions into newsreel metadata. The EFDB then accessed the newsreels from the RA's collection and enriched them with more detailed descriptions of contents, film production, and technical issues, and with keywords related to picture and text.

Another difference between these datasets is that the RA lists whole newsreel issues, each approximately ten minutes long, as entities, while the EFDB has divided them into separate news stories with a mean length of 2,23 minutes. The dataset contains a total of 9,902 individual news stories. This means that we have approximately 24,000 minutes of newsreel data.

For the visualisations in this web interface we have used a selection of the EFDB metadata, focusing on production year, company, film-makers, journal titles, issue numbers, and textual descriptions of the newsreel contents, together with frames extracted from the newsreels. We have not used all the metadata provided by the EFDB, but have chosen to focus on the data that allows us to study how newsreels depicted the surrounding world.

To allow different visualisations of the Estonian newsreels, we have enriched the underlying metadata by applying Named Entity Recognition (NER) to the textual descriptions of the newsreel contents, which were manually annotated by employees of the National Archives of Estonia and the Estonian Film Database. With the help of NER we were able to extract the names of persons, organisations and locations as separate entities. This in turn allowed us to add geolocations for the places mentioned and to place them on a map. We have further enriched the original data with the results of automated gender detection and with manually added places of birth of the directors, cinematographers and other professionals. This makes it possible to analyse the gender balance and origin of the professionals who produced Estonian newsreels. We have also processed the original newsreel videos with automatic shot detection and with off-the-shelf detection of objects in the newsreel frames.
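The step from NER output to map coordinates can be illustrated with a minimal sketch. It is not the project's actual pipeline: the label set (PER, ORG, LOC), the sample entities, the tiny gazetteer and the function names `group_by_label` and `geolocate` are all assumptions made for illustration only.

```python
from collections import defaultdict

# Hypothetical NER output for one news story description: (entity text, label).
# The PER / ORG / LOC tag set is an assumption, not the project's actual one.
entities = [
    ("Tallinn", "LOC"),
    ("Tartu", "LOC"),
    ("Tallinnfilm", "ORG"),
    ("Tallinn", "LOC"),
]

# Tiny illustrative gazetteer: place name -> (latitude, longitude).
# A real enrichment step would query a much larger gazetteer or geocoder.
GAZETTEER = {
    "Tallinn": (59.437, 24.754),
    "Tartu": (58.378, 26.729),
}


def group_by_label(ents):
    """Group entity texts by their NER label, keeping repeated mentions."""
    grouped = defaultdict(list)
    for text, label in ents:
        grouped[label].append(text)
    return dict(grouped)


def geolocate(places, gazetteer):
    """Split place names into those with known coordinates and those without.

    Returns a dict of unique located places and a list of unique names
    that are missing from the gazetteer and would need manual checking.
    """
    located, missing = {}, []
    for place in places:
        if place in gazetteer:
            located[place] = gazetteer[place]
        elif place not in missing:
            missing.append(place)
    return located, missing


grouped = group_by_label(entities)
located, missing = geolocate(grouped.get("LOC", []), GAZETTEER)
print(located)
print(missing)
```

Keeping the unmatched place names in a separate list reflects the fact that automatic lookups rarely cover every mention, so some locations are likely to need manual correction.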

The metadata contains information on over 560 film-makers. Like any human-curated data, this dataset also contains inconsistencies and errors. However, even with these errors and gaps, the dataset tells an interesting story of Estonian newsreels over time.

Data Description

Introduction to the code

You can find the code used for the enrichment process here:

GitHub

Introduction to the methods

This web interface uses a variety of data visualisation methods to illustrate how Estonian newsreels changed over time. We use graphs and bar charts, social network analysis, map visualisations, and interactive graphs that use machine learning to let the user group and filter frames based on visual similarity. We have divided the analysis into four categories: Film-Maker Networks, Themes, Places, and Images. You can find more detailed descriptions of the methods used, and references to the underlying research, under each analysis section.

Introduction to DataDOI

We will share the enriched data through the DataDOI repository at the end of the project. DataDOI is an open access repository that preserves and disseminates Estonian research data. It follows the FAIR principles, making the data Findable, Accessible, Interoperable and Reusable. DataDOI assigns a Digital Object Identifier (DOI) to all the data stored there.

DataDOI