Computational Tools for Research on De Bello Gallico

This project provides computational tools developed to support linguistic and philological research on the classical Latin text De Bello Gallico by Julius Caesar. The tools automate four kinds of textual analysis: thematic glossary building, n-gram counting, lemma frequency analysis, and dictionary-based lexicography.

1. Corpus Analysis (Thematic Glossaries)

This tool searches the text of De Bello Gallico for specific terms and generates thematic glossaries with definitions drawn from Lewis & Short's A Latin Dictionary. It is currently configured to locate military terms, but it can be adapted to other thematic sets by changing the term list. The output is a JSON file.
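
The search-and-export step can be illustrated with a minimal sketch. It is not the repository's code: the term list, the glosses, and the function name build_glossary are hypothetical, and matching is on exact word forms only, so inflected forms such as "legionem" would not be found without lemmatization.

```python
import json
import re

# Hypothetical thematic term list: term -> short gloss.
# Both the terms and the glosses are illustrative assumptions,
# not entries copied from Lewis & Short or from the repository.
MILITARY_TERMS = {
    "legio": "legion",
    "castra": "camp",
    "equitatus": "cavalry",
}


def build_glossary(text, terms):
    """Return {term: {"gloss": ..., "count": ...}} for terms found in text."""
    tokens = re.findall(r"[a-z]+", text.lower())
    glossary = {}
    for term, gloss in terms.items():
        count = tokens.count(term)  # exact-form match only
        if count:
            glossary[term] = {"gloss": gloss, "count": count}
    return glossary


sample = "Caesar castra movit; equitatus ad castra rediit."
result = build_glossary(sample, MILITARY_TERMS)

# Serialize the glossary as JSON, the output format named above.
print(json.dumps(result, ensure_ascii=False, indent=2))
```

Swapping MILITARY_TERMS for another dictionary is all that is needed to target a different theme in this sketch.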

Access: github.com/LeoVichi/caesar_lexikon

2. N-gram Analysis

Using the Stanza library from Stanford University, this script extracts bigrams and trigrams (sequences of two and three consecutive tokens) and assigns each word a part-of-speech (PoS) tag. The outputs are CSV files, each listing the items found together with their frequency counts in the text.
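
The n-gram counting itself can be sketched with the Python standard library alone. The example below is an illustration under assumptions, not the project's script: the tokens are hard-coded rather than produced by Stanza, the helper name ngrams and the CSV column names are hypothetical, and the CSV is written to an in-memory buffer instead of a file.

```python
import csv
import io
from collections import Counter


def ngrams(tokens, n):
    """Return all n-grams (tuples of n consecutive tokens) in order."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


# Assumption: tokens are given directly; the real tool obtains them from Stanza.
tokens = "gallia est omnis divisa in partes tres".split()

bigram_counts = Counter(ngrams(tokens, 2))
trigram_counts = Counter(ngrams(tokens, 3))

# Write bigram frequencies as CSV, most frequent first.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["ngram", "frequency"])  # hypothetical column names
for gram, freq in bigram_counts.most_common():
    writer.writerow([" ".join(gram), freq])
csv_text = buffer.getvalue()
print(csv_text)
```

Replacing the buffer with an opened file (using newline="") would produce a CSV file on disk.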

Access: github.com/LeoVichi/caesar_corpus

3. Lemmatizer with Frequency Analysis and Visualization

This script automatically lemmatizes the Latin text, transforming words into their dictionary forms (e.g., “partem” into “pars”, “legiones” into “legio”). It then generates a quantitative analysis of the most frequent terms in the text. The code is modular and allows the creation of visualizations such as word clouds and bar charts tailored to lexical categories such as verbs, nouns, adjectives, and adverbs.
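
As a rough sketch of the frequency step, the example below counts lemmas and filters them by lexical category. It makes several assumptions: lemmatization is replaced by a small hand-written lookup table (LEMMA_TABLE, hypothetical), whereas the actual script lemmatizes automatically, and the bar chart is rendered as plain text rather than with a plotting library.

```python
from collections import Counter

# Hypothetical lookup table: word form -> (lemma, part of speech).
# It stands in for a real lemmatizer purely for illustration.
LEMMA_TABLE = {
    "partem": ("pars", "NOUN"),
    "partes": ("pars", "NOUN"),
    "legiones": ("legio", "NOUN"),
    "legionem": ("legio", "NOUN"),
    "misit": ("mitto", "VERB"),
}


def lemma_frequencies(tokens, pos=None):
    """Count lemmas, optionally keeping only one part of speech."""
    counts = Counter()
    for token in tokens:
        entry = LEMMA_TABLE.get(token.lower())
        if entry is None:
            continue  # forms missing from the table are skipped
        lemma, tag = entry
        if pos is None or tag == pos:
            counts[lemma] += 1
    return counts


tokens = "legiones misit partem legionem partes legiones".split()
noun_counts = lemma_frequencies(tokens, pos="NOUN")

# Plain-text bar chart of noun lemmas, most frequent first.
for lemma, freq in noun_counts.most_common():
    print(f"{lemma:<8}{'#' * freq} {freq}")
```

Passing pos="VERB" instead restricts the count to verbs, mirroring the category-specific charts described above.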

Access: github.com/LeoVichi/caesar_freq

4. Automatic Lexicographer

This tool compiles lexicographic entries for high-frequency terms, where the minimum number of occurrences is user-defined and currently set to five. Each term is lemmatized and tagged with its part of speech, and its definition is extracted from the Lewis & Short dictionary. For nouns and adjectives, the declension is also indicated.

Some limitations are still being addressed. In particular, some terms cannot be resolved automatically because of the complexity of the entries in the original dictionary. As a partial workaround, terms without a definition are set aside for later manual analysis. A version integrated with an AI API is under development to improve and simplify the extracted definitions.
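
The frequency threshold and the handling of undefined terms can be sketched as follows. This is a simplified illustration rather than the tool's implementation: the function name build_entries, the sample counts, and the DEFINITIONS excerpt are hypothetical, and the definitions are paraphrases written for the example, not quotations from Lewis & Short.

```python
from collections import Counter

MIN_OCCURRENCES = 5  # threshold stated in the text; user-adjustable

# Hypothetical dictionary excerpt: lemma -> definition (illustrative paraphrases).
DEFINITIONS = {
    "legio": "a legion, a body of soldiers",
    "castra": "a military camp",
}


def build_entries(lemma_counts, definitions, min_occurrences=MIN_OCCURRENCES):
    """Split frequent lemmas into defined entries and unresolved terms."""
    entries = {}
    unresolved = []
    for lemma, freq in lemma_counts.items():
        if freq < min_occurrences:
            continue  # below the frequency threshold
        if lemma in definitions:
            entries[lemma] = {"frequency": freq, "definition": definitions[lemma]}
        else:
            unresolved.append(lemma)  # set aside for manual analysis
    return entries, sorted(unresolved)


# Illustrative counts, not measured from the corpus.
counts = Counter({"legio": 12, "castra": 7, "vallum": 5, "pons": 2})
entries, unresolved = build_entries(counts, DEFINITIONS)

print(entries)
print("For manual review:", unresolved)
```

Keeping the unresolved lemmas in a separate list, rather than dropping them, preserves the manual-review step described above.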

Access: github.com/LeoVichi/caesar_dicionario

These tools are freely available and aim to support and promote academic research in classical studies, historical linguistics, and Latin philology.