Mining New Discoveries with ‘Digital Humanities’

A&S applies the science of data mining to uncover surprising truths from digital archives of literature and letters.

Jennifer Ferriss-Hill, assistant professor of classics in the College of Arts & Sciences, is a philologist, meaning she pores over ancient Latin texts to trace the evolution of language and literature in Western civilization. One of her favorite research subjects is the Roman poet Horace, a close friend and student of Virgil. Ferriss-Hill’s theory is that many of the “low-frequency” words found in Horace’s poetry may have been “borrowed” from his literary influences, including Virgil. 

The only way to prove that theory is to identify each low-frequency word from Horace’s 100 Odes and search for matches in Virgil’s expansive oeuvre. It’s the research project of a lifetime. Literally.

Good thing Ferriss-Hill has a powerful research partner. It’s a custom-coded computer program that scans digital copies of Horace’s poems, identifies parts of speech and assembles a candidate list of low-frequency words. The software, designed by Professor and Associate Dean for Library Innovation Mitsunori Ogihara, allows Ferriss-Hill to sift through these texts not in years or months, but in minutes.
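To give a rough sense of what such a search involves, here is a minimal sketch in Python. It is not Ogihara’s program: the function names, the `max_count` threshold and the sample strings are all assumptions made for illustration, and the sketch skips the part-of-speech tagging the real software performs. It simply counts words in one text, keeps the rare ones, and checks which of them also turn up in a second text.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def low_frequency_words(text, max_count=1):
    """Return the set of words that occur at most max_count times in text.

    max_count is a hypothetical threshold, not one taken from the article.
    """
    counts = Counter(tokenize(text))
    return {word for word, n in counts.items() if n <= max_count}


def shared_rare_words(source_text, influence_text, max_count=1):
    """Rare words in source_text that also appear anywhere in influence_text."""
    rare = low_frequency_words(source_text, max_count)
    return sorted(rare & set(tokenize(influence_text)))


# Short illustrative strings standing in for the two full corpora.
horace_sample = "carpe diem quam minimum credula postero carpe"
virgil_sample = "arma virumque cano diem troiae credula"

print(shared_rare_words(horace_sample, virgil_sample))  # prints ['credula', 'diem']
```

A real Latin pipeline would also need to reduce inflected forms to a common dictionary form before counting, which this sketch leaves out.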

“The traditional method of humanities research is to collect books, papers, reports, photos, and physical objects that represent historical events or human ideas,” says Ogihara. “If you’re analyzing physical books, maybe you can compare ten at a time. But with digitized text, you can compare hundreds, even thousands of books. With digital source material, the scale of humanities research grows exponentially.”


A well-known scholar in computation theory and in data mining, Ogihara has collaborated on projects to build computers from DNA molecules; mine biological data for patterns of gene expression; monitor network traffic volumes for anomalies; and create a music recommendation engine. Now he has turned his imaginative, problem-solving skills on the humanities. “It’s a joy,” he says, “to tackle a new research area beyond the realm of computer science.”    

Ogihara, who was named to his new post in 2012 to more closely align the digital initiatives of A&S and the University Libraries, says that both “Dean of Arts and Sciences Leonidas Bachas and William Walker, Dean Emeritus and former University Librarian, share a vision that the collaboration between the humanists and the libraries will be strengthened by the use of digital technologies.” Projects like Ferriss-Hill’s word search prove, he says, “that once enough source material has been digitized, researchers are able to do large-scale analysis to test their hypotheses or find something completely new.”

Ogihara spends the bulk of his time collaborating with A&S scholars to prove the thrilling potential of digital humanities research and to advocate for digitization efforts worldwide. Two of his ongoing projects are supported in part by grants from the Andrew W. Mellon Foundation.

For one project, Ogihara is studying more than 6,000 letters written by the 19th-century Scottish historian and philosopher Thomas Carlyle to determine the “physics of communication” in that era.

Again using word-search software, Ogihara is measuring the “emotional temperature” of letters between Carlyle and his wife and comparing them with letters between Carlyle and his close friend (and suspected mistress) Harriet, Lady Ashburton. If the data skews “hotter” for Lady Ashburton, it would lend objective support to the mistress theory and deepen scholarly understanding of how written letters and emotions intertwine.
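One simple way to picture an “emotional temperature” score is a word-list approach, sketched below in Python. This is an assumption about how such a measure could work, not a description of Ogihara’s method: the lexicon, its weights, the function names and the sample letters are all invented for illustration.

```python
import re

# Hypothetical mini-lexicon: the words and their weights are assumptions,
# not values drawn from the Carlyle project.
WARMTH_LEXICON = {
    "dearest": 2.0,
    "love": 2.0,
    "darling": 2.0,
    "affection": 1.5,
    "dear": 1.0,
    "kind": 0.5,
}


def emotional_temperature(letter):
    """Average warmth weight per word in one letter (0.0 for an empty letter)."""
    tokens = re.findall(r"[a-z']+", letter.lower())
    if not tokens:
        return 0.0
    return sum(WARMTH_LEXICON.get(token, 0.0) for token in tokens) / len(tokens)


def mean_temperature(letters):
    """Mean emotional temperature across a collection of letters."""
    if not letters:
        return 0.0
    return sum(emotional_temperature(letter) for letter in letters) / len(letters)


# Invented sample letters for two correspondents (not real quotations).
letters_to_first = ["Dear madam, the weather is kind.", "I write in haste."]
letters_to_second = ["My dearest, with love and affection."]

print(mean_temperature(letters_to_first) < mean_temperature(letters_to_second))
```

Dividing by letter length keeps a long, formal letter from scoring “hotter” than a short, affectionate one simply because it contains more words.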

In addition to the written word, Professor Ogihara is working on applying the same concept to the spoken word. With the help of a computational linguist and some students, he is also developing a software tool that can accurately transcribe audio from digitized video clips. The clips are part of the Cuban Heritage Collection’s Luis J. Botifoll Oral History Project, which captures the life stories of Cubans in the diaspora through video-recorded interviews.

Right now, a researcher would have to watch hundreds of hours of videos to find relevant material. With a computer-generated transcription — fact-checked by a Spanish-language specialist — the same researcher could search the text for the best matches within a matter of minutes.

Ogihara, the UM Libraries, and the College of Arts & Sciences are pushing the development of the Digital Humanities in exciting directions, opening new doors of discovery for the next generation of UM scholars and students.

Browse the UM Libraries Digital Collections, including 16th-century maps of Florida and interactive explorations of slavery and emancipation in the Caribbean.

August 15, 2013