The project consisted of four case studies that integrated traditional interpretive analysis (close readings of texts) with two computational methods: dynamic temporal segmentation (topic modeling and segmentation) and tone analysis. We developed a dynamic temporal segmentation algorithm that automatically partitions the total time period defined by the documents in the collection such that segment boundaries indicate important periods of temporal evolution and re-organization. The goal of tone analysis was to use computational analysis to interpret some forms of textual meaning, thereby giving humanities and social science researchers the ability to automate the analysis of textual sentiment in large datasets.
Weekly Newspapers Case Study. The application of combined methods to review weekly newspapers illustrates both the richness of information about influenza contained in weekly newspapers and the complexity of analyzing textual materials using automated analysis. Our review of weekly newspapers indicates that they served a far more complex function than some scholars suggest and illustrates the value of newspaper reporting beyond the mortality data extracted by epidemiological studies. This case study demonstrates the complex information networks that functioned through newspapers and suggests the value of our methods for contemporary contexts, for example, studying the use of social media to track self-reporting on disease.
Daily Newspapers Case Study. The most important findings of this study involve the difference that scale makes in analyzing data mining outputs of news reporting. The outputs for all the papers together significantly differ from the outputs for the individual papers. It is clear that some papers influence the output of the aggregate more than others, either due to poor data quality or larger text chunks that are analyzed algorithmically. Some reporting that is evident at the local level rises to significance at the national level; however, other reporting does not appear in the aggregate, even though it may dominate reporting in certain localities.
Vaccination-Visualization Case Study. Visualizations, as ways of rendering data mining outputs, are rhetorical and require tacit knowledge. The process of research and interpretations of data outputs are affected by various forms of visualization as well as by the data input conventions underlying the algorithms. ThemeDelta is a visualization of topic modeling and segmentation that allows the researcher to follow a specific word or word cluster across temporal segments. While less oriented toward narrative interpretation than the typical rendering of topic modeling outputs in tag clouds, it focuses attention on the temporal elements of the segmentation algorithm, which is a novel contribution of this study.
Public Health Officials Case Study. By identifying proximity of medical terms, public health responses, and broad statements of impact, topic modeling and segmentation quickly identify word associations that are suggestive of broader interpretations. These methods confirm Royal S. Copeland’s prominent role in shaping the public health response to the 1918 influenza epidemic. They also reveal that his name was closely associated with public health measures, both within New York City and nationally. Tone classification calls into question historical claims that Copeland’s language was too optimistic for the circumstances.