Digitization of Croatian Latin writers: a research project proposal

Neven Jovanović
(neven.jovanovic_AT_ffzg.hr)

Mar 21, 2006



1  

Digitization of Croatian Latin writers here means transforming their texts into a machine-readable format accessible over the internet: creating a database, corpus, and collection of research material and tools named Croatiae auctores Latini (CAuLa). The CAuLa research space and digital collection aims to provide one place to find and use much of the otherwise scattered textual and other primary and secondary material by and about Croatian writers who used Latin as a literary medium. The CAuLa corpus is intended to simplify and enhance access to materials which are often rare, precious, and fragile (manuscripts, early printed books). At the same time, by providing digital versions of the documents, the corpus will help preserve these materials for future use and research. The globally accessible workspace and forum eventually created by CAuLa will provide tools and a medium for further insights into this segment of Croatian, and European, literature, culture, and intellectual history. The experience gained from building CAuLa (amply documented and shared) will remain available to other digitized heritage collections - both Croatian and Latin, textual and otherwise.

2  The state of research

  1. Croatian Latin heritage. Research on the writings of Marko Marulić shows what remarkable discoveries can still be made today concerning all aspects of the life and work of a Croatian author. The Lexicon of Croatian Writers (Leksikon hrvatskih pisaca, Školska knjiga, 2000) demonstrates both the share and the role of Croatian writers in Latin in the national cultural heritage and literary production. Still, the texts of these writers - even when reliably edited and adequately published - remain too dispersed, since publishing is often left to (much appreciated) private and local initiatives (cf. again the example of Marko Marulić, whose Opera omnia, now in its 15th volume, is being published by the Splitski književni krug - legally, a citizens' organization based in Split, Croatia).
  2. Digitizing textual heritage. Digital collections broaden the scope of research; a computer-searchable and manipulable text may open new ways of reading and study. Supported by fast networks, such collections become available globally - everywhere and all of the time - not only to research institutions, but also to schools, public libraries, and private homes. A global information infrastructure may enhance our own cultural identity - and help us learn about completely different identities. But literary texts from the past require approaches different from those now common on the internet: we have yet to explore how a stimulating and useful workspace for prolonged and intensive textual study should look (and feel).
  3. Digitizing cultural heritage in Latin. The leader here is the Perseus Project (http://www.perseus.tufts.edu/), "an evolving digital library, engineering interactions through time, space, and language". It is freely available and comprises, among other material, classical Greek and Latin texts, both literary and non-literary (papyri and ostraka), as well as images (Greek vases) and secondary literature. The Perseus Project has solved important problems in representing the ancient Greek writing system on the internet, in connecting texts with secondary literature, and in presenting statistical data; it offers a stimulating workspace for research and teaching.
  4. Digitizing neo-Latin heritage. Let us quote a minimal and a maximal example. The Italian corpus "Poeti d'Italia in lingua latina" (http://157.138.65.54:8080/poetiditalia/) offers textual searches on some 200 Italian authors (and some Croatian ones, too) from the Middle Ages and the Early Modern period, with minimal bio-bibliographical information, with no possibility to add one's own materials, comments, or translations, and with a somewhat unfriendly way of defining subgroups of the corpus for searching purposes. On the other hand, the German MATEO / CAMENA corpus (http://www.uni-mannheim.de/mateo/camenahtdocs/camena.html) seems to be more a repository than a database, but it is dynamic: it grows steadily, improving the markup and adding metadata and images, even though its search interface is also somewhat unwieldy.

3  Aims of the CAuLa collection

  1. Establish a framework for collecting research and material on Croatian neo-Latin writers; at the moment both research and editions are dispersed too widely.
  2. Publish on the internet a corpus of some 2 million Latin words, by 130 Croatian authors writing in Latin from the Middle Ages onwards.
  3. Prepare the corpus by digitizing some 15,000 pages of already published text (published in old, rare, or not easily accessible editions) and 1,000 pages from manuscripts.
  4. Enhance this material by additional proofreading, metadata, and markup (following the non-proprietary TEI XML standard), and by other types of media.
  5. Support and encourage new scholarship, both in-depth research (made possible by a computer-searchable corpus) and wide syntheses and interpretations (relying, however, on more precise, more explicit, and more controllable data, and therefore much more falsifiable).
  6. Remind people of the material dimension of these texts.
  7. Open the collection to other kinds of research and to a community of researchers, both international and Croatian, from various disciplines (history, art history, anthropology, etc.).
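
To make the link between TEI XML markup and a computer-searchable corpus more concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the sample document is an invented placeholder (not a text from the corpus), it is only loosely modeled on TEI and has not been validated against any TEI schema, and the function name find_word is hypothetical. The proposal itself does not prescribe any particular software.

```python
import re
import xml.etree.ElementTree as ET

# Namespace used by TEI P5 documents (assumed here for the sketch).
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Invented placeholder document, loosely TEI-shaped; not a real corpus text.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <titleStmt>
        <title>Exemplum</title>
        <author>Auctor Ignotus</author>
      </titleStmt>
      <publicationStmt><p>Placeholder</p></publicationStmt>
      <sourceDesc><p>Placeholder</p></sourceDesc>
    </fileDesc>
  </teiHeader>
  <text>
    <body>
      <l n="1">Prima linea exempli gratia scripta</l>
      <l n="2">Secunda linea gratia plena</l>
    </body>
  </text>
</TEI>"""


def find_word(tei_xml, word):
    """Return (line number, line text) for every verse line containing word."""
    root = ET.fromstring(tei_xml)
    target = word.lower()
    hits = []
    # Markup lets us search only the verse lines, ignoring the header metadata.
    for line in root.iterfind(".//tei:l", TEI_NS):
        text = "".join(line.itertext())
        tokens = re.findall(r"\w+", text.lower())
        if target in tokens:
            hits.append((line.get("n"), text))
    return hits


if __name__ == "__main__":
    for number, text in find_word(SAMPLE, "gratia"):
        print(number, text)
```

Because the markup separates metadata from text, the same file can also answer questions such as "which author wrote the line containing this word", which plain, unmarked text cannot answer reliably.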


