Unraveling the threads of Thrace

A text-mining expedition in Pliny's Natural History

Authors

Abstract

This project develops an information extraction algorithm for Pliny’s “Natural History.” It is built on the Python NLP library spaCy and the Latin language models of LatinCy. The algorithm accepts a single lemma or a list of lemmas as input and produces a CSV dataset containing citations, context, and lemma variants. This facilitates efficient linguistic analysis of Pliny’s work, with an initial focus on Moesia and Thrace. We curated datasets on ethnonyms, places, mountains, and waterways. A Streamlit interface and Matplotlib visualizations support user interaction and help researchers explore ancient Thrace in Pliny’s writings.
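The input and output shape described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the CSV column names, and the toy lemmatizer are assumptions made for illustration, and the sample sentences are invented placeholders rather than quotations from Pliny. In practice the `analyze` callable would presumably wrap a LatinCy pipeline and return each token's text and lemma.

```python
import csv
import io

# Hypothetical column names; the abstract only says the CSV holds
# citations, context, and lemma variants.
FIELDNAMES = ["citation", "lemma", "variant", "context"]

# Invented placeholder passages (NOT quotations from Pliny).
SAMPLE_PASSAGES = [
    ("sample-1", "fines Thraciae et Moesiae"),
    ("sample-2", "in Thraciam venit"),
]


def toy_analyze(text):
    """Stand-in lemmatizer (assumption): returns (form, lemma) pairs
    from a tiny hand-made lookup table instead of a real language model."""
    table = {"thraciae": "Thracia", "thraciam": "Thracia", "moesiae": "Moesia"}
    tokens = []
    for word in text.split():
        form = word.strip(".,;:")
        tokens.append((form, table.get(form.lower(), form)))
    return tokens


def extract_lemmas(passages, lemmas, analyze):
    """Collect one row per token whose lemma matches the requested lemma(s).

    `lemmas` may be a single string or an iterable of strings.
    Matching is case-insensitive; the lemma is stored as the analyzer gave it.
    """
    if isinstance(lemmas, str):
        lemmas = [lemmas]
    targets = {lemma.lower() for lemma in lemmas}
    rows = []
    for citation, text in passages:
        for form, lemma in analyze(text):
            if lemma.lower() in targets:
                rows.append(
                    {
                        "citation": citation,
                        "lemma": lemma,
                        "variant": form,
                        "context": text,
                    }
                )
    return rows


def rows_to_csv(rows):
    """Serialize the rows to CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDNAMES, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    found = extract_lemmas(SAMPLE_PASSAGES, ["Thracia", "Moesia"], toy_analyze)
    print(rows_to_csv(found))
```

Passing the lemmatizer in as a callable keeps the matching and CSV logic independent of any particular model, so the toy lookup table can be swapped for a real pipeline without changing the rest of the sketch.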

 

References:

Doody, A. (2010). Pliny’s Encyclopedia: The Reception of the Natural History. Cambridge: Cambridge University Press.

Hunter, J. D. (2007). Matplotlib: A 2D Graphics Environment. – Computing in Science & Engineering, vol. 9/3, 90-95.

Kamboj, P., Aggarwal, M., Singla, S., Puri, S. (2011). Effect of Aqueous Extract of Tribulus Terrestris on Oxalate-Induced Oxidative Stress in Rats. – Indian Journal of Nephrology, No. 21/3, 154–159.

Pliny the Elder. Naturalis Historia. Karl Friedrich Theodor Mayhoff (ed.). Lipsiae: Teubner, 1906.

Online resources:

Burns, P. J. (2019). Tesserae Project, Classical Language Toolkit. https://github.com/cltk/latin_text_tesserae (accessed 07.06.2024).

Burns, P. J., Bernhardt, N., Geelhaar, T., Koch, V. spaCy. la_core_web_lg, version 3.7.2. https://huggingface.co/latincy/la_core_web_lg (accessed 06.06.2024).

Plotly Technologies Inc. (2015). Collaborative data science. Montréal, QC. https://plotly.com/python/plotly-express/ (accessed 08.06.2024).

Streamlit. The Fastest Way to Build Custom ML Tools. https://streamlit.io/ (accessed 07.06.2024).

 

Author Biography

  • Kristiyan S. Simeonov, Sofia University "St. Kliment Ohridski"

    Kristiyan S. Simeonov is an appointed Researcher R1 at the Department of Classics, Sofia University. He holds a bachelor’s degree in Classics and a double MA degree in Digital Humanities and Cybersecurity Management, and he collaborates on the CLaDA-BG, SUMMIT, and DiGi Thrace projects.

Published

2025-09-18