====== Perseus Annis Adventures ======

The [[http://annis.perseus.tufts.edu|Perseus Annis]] environment enables syntactic searches in annotated Greek and Latin texts. However, the query syntax is neither simple nor self-evident. For the patient (or impatient?), the Annis query language syntax, applied to another set of corpora, is documented here: [[http://www.sfb632.uni-potsdam.de/annis/aql.html|ANNIS2 --- Search and Visualization in Multilevel Linguistic Corpora]]. Others can read on to try out my recipes for finding things in the Greek and Latin corpora of Perseus Annis.

Syntactic annotation documentation for Latin is here: [[http://nlp.perseus.tufts.edu/syntax/treebank/ldt/1.5/docs/guidelines.pdf|Guidelines for the Syntactic Annotation of Latin Treebanks]].

===== Nouns in nominative =====

Here is how I searched for **nouns in nominative**.

  case="nominative" & POS="noun" & #1 _=_ #2

There turn out to be 212 annotated nominatives in the Cicero corpus. It seems that the clause ''& #1 _=_ #2'' is obligatory (I tried first without it, to no avail), and it seems to mean that the first and the second condition both apply to the same word.

There are 105 participles in accusative:

  case="accusative" & POS="participle" & #1 _=_ #2

  - How would you search for verbs?
  - How would you search for participles in genitive?

===== Adverbs modifying verbs =====

Find all verbs modified by adverbs.

  POS="verb" & POS="adverb" & #1 ->parent #2

In a syntactic tree, the verb (element #1) is the "parent" of the adverb (element #2).

A nice variation:

  POS="adjective" & POS="adverb" & #1 ->parent #2

Is there a noun governing an adverb?

===== Find subject and predicate (verb) =====

Find any word (''form'') which, in the sentence's tree, is the subject of a verb:

  form & POS="verb" & #2 ->parent[relation="SBJ"] #1

//Discussion//: the first two conditions find any form and any verb; ''&'' connects them. Then select element number 2 if it is the parent of (i. e. if it is connected with) element number 1, on the condition that their relationship is "subject", ''SBJ''. The expression finds the elements regardless of how many other words stand between them (as long as they are in the same sentence). Caesar has 99 such cases, Plato 252. On corpora larger than 10,000 tokens I get a timeout.

Find any nominative which is the subject of a verb (not a rocket-science syntactic experiment, I know):

  case="nominative" & POS="verb" & #2 ->parent[relation="SBJ"] #1

Plato has 178 results (on 6097 tokens in the corpus).

Find any participle which is the subject of a verb (you get the idea):

  POS="participle" & POS="verb" & #2 ->parent[relation="SBJ"] #1

Well, the Plato corpus contains 13 such cases. And quite thorny ones at that: I still have to figure out how to deal with predicative expressions.

==== Variations ====

1. Find subjects in nominative

  POS="noun" & case="nominative" & POS="verb" & #1 _=_ #2 & #3 ->parent[relation="SBJ"] #2

1.a Find subjects in nominative, predicates in indicative

  POS="noun" & case="nominative" & POS="verb" & mood="indicative" & #1 _=_ #2 & #3 _=_ #4 & #3 ->parent[relation="SBJ"] #2

2. Find SPO structures with direct object in accusative, predicate in indicative, subject in nominative

  POS="noun" & case="nominative" & POS="verb" & mood="indicative" & POS="noun" & case="accusative" & #1 _=_ #2 & #3 _=_ #4 & #5 _=_ #6 & #3 ->parent[relation="SBJ"] #2 & #3 ->parent[relation="OBJ"] #5

**Tip of the day**: check out the "Arch Dependency" tab beneath each result; the diagrams are great and useful.

===== Find predicate nominals (subject complements) =====

The following Annis query:

  form & LEMMA="sum" & #2 ->parent[relation="PNOM"] #1

finds sentences of the type "Sapientes //beati sunt//".

===== Relative clauses as subjects =====

The query:

  form & form & form & #1 ->parent[relation="SBJ"] #2 & #2 ->parent[relation="SBJ"] #3

finds sentences such as //fuere qui crederent//.
We can make the confusing point (a verb as SBJ) even more prominent:

  POS="verb" & POS="verb" & form & #1 ->parent[relation="SBJ"] #2 & #2 ->parent[relation="SBJ"] #3

===== Searching for Greek words =====

Greek has to be entered in Unicode, with accents. This query for the ''form'' δικάζου won't produce any results on the Plato corpus:

  form="δικαζου"

Betacode doesn't work either:

  form="dika/zou"

This search, however, finds one occurrence:

  form="δικάζου"

Search for all forms of δικάζω (two in the Plato corpus):

  LEMMA="δικάζω"

Find only the participles of δικάζω (there is exactly one; I proudly use what I have already learned):

  LEMMA="δικάζω" & POS="participle" & #1 _=_ #2

The following version (with the operator ''='' instead of ''_=_'') produces the same result in this context:

  LEMMA="δικάζω" & POS="participle" & #1 = #2

I should [[http://www.sfb632.uni-potsdam.de/annis/aql.html|read up on Annis operators]].

===== Finding specific formation =====

We want to find phrases of the type φίλοι γάρ εἰσιν.

  case="nominative" & LEMMA="γάρ" & LEMMA="εἰμί" & #1 . #2 & #2 . #3

This search finds 8 results in the Aeschylus corpus. It seems that an Annis query must be written in pairs (''#1 . #2 & #2 . #3''); the version ''#1 . #2 . #3'' is not valid.

Find phrases like the one above, but with the nominative as the subject:

  case="nominative" & LEMMA="γάρ" & LEMMA="εἰμί" & #1 . #2 & #3 ->parent[relation="SBJ"] #1

In the Plato corpus this gives 1 result; in the others I get a timeout.

Find attributes and governing nouns (or whatever):

  form & form & #2 ->parent[relation="ATR"] #1

786 results in the Plato corpus, including phrases such as ἐν Λυκείῳ.

The other way around:

  form & form & #1 ->parent[relation="ATR"] #2

This produces 786 results as well, but in a different order (ἐμὸς πατήρ comes first now).

Find all attributive phrases with πατήρ:

  form & LEMMA="πατήρ" & #2 ->parent[relation="ATR"] #1

20 results in Plato. ἐμὸς πατήρ is one, ὁ (ἐμὸς) πατήρ another. (Should be able to get multiple attributes?)

===== arity! =====

While most of the categories offered by Perseus Annis seem familiar from the classroom, the strangely named "arity" operator is something else. It is a "meta-operator" which, given the "arity number", selects only search terms that govern exactly that many other words and sentence elements. E. g. to find all verbs governing four other elements:

  POS="verb" & #1:arity=4

In the Cicero corpus there are ninety such situations. By studying the Arch Dependency diagrams, you'll discover that a comma can also be governed by the given element.

Now, one of the verbs governing four elements is "interficio". If we want to concentrate on forms of interficio governing four elements, we do it like this:

  LEMMA="interficio" & #1:arity=4

Can you decode what is found by this search? (If not, try pasting it into the Annis search interface!)

  POS="noun" & #1:arity=4

===== Magnis in periculis =====

Our colleague [[http://www.hrstud.unizg.hr/latinissime/rei.php|Šime Demo]] (Croatian Studies, University of Zagreb) thought of a beautiful search:

  POS="adjective" & POS="preposition" & POS="noun" & #3 ->parent #1 & #2 ->parent #3 & #1 .* #2 & #2 .* #3

This searches for prepositional phrases of the type "magnis in periculis", i. e. with the preposition interposed (adjective, preposition, noun). On the currently available Latin corpus, the phrase sharply distinguishes prose (Caesar, Cicero, Sallust, Petronius) from poetry (Propertius, Vergil): there is lots of it in poetry, but it is rare in prose. Šime, thanks!

===== Study coordination =====

A seemingly simple and self-explanatory syntactic relationship is coordination. However, for treebank notation it has to be learned a little differently. Take a simple Latin sentence made up of three clauses, connected asyndetically (just with commas):
Ego scribo, tu legis, ille pingit.
In treebank notation, the root of this sentence is the (first) comma; the three predicates and the other comma depend on it (the full stop is on the same level as the root). Here is an Annis QL query that finds all kinds of coordination, finding a root and its child connected through a "COORD" relationship, or arc:

  form & form & #1 ->parent[relation="COORD"] #2

We can modify the query to find ("filter") just commas as roots:

  form="," & form & #1 ->parent[relation="COORD"] #2

A similar but more complex case from the Caesar corpus annotated in Perseus is part of [[http://data.perseus.org/citations/urn:cts:latinLit:phi0448.phi001:2.33|Caes. Gal. 2.33]]:
ad Venetos, Venellos, Osismos, Coriosolitas, Esuvios, Aulercos, Redones, quae sunt maritimae civitates Oceanumque attingunt
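Raw hit counts for such queries are hard to compare across authors, because the corpora differ in size. The comparison below therefore works with hits per 100 tokens. A minimal Python sketch of that normalization, using the hit and token counts quoted in that paragraph (the names ''counts'' and ''percent'' are only illustrative, they are not part of Annis):

```python
# (coordinating-comma hits, corpus tokens) per author, as quoted in the text
counts = {
    "Cicero": (12, 6229),
    "Caesar": (5, 1488),
    "Sallust": (27, 12311),
    "Jerome": (0, 8382),
}


def percent(hits, tokens):
    """Return hits as a percentage of the corpus size."""
    return 100.0 * hits / tokens


# Relative frequency for every author
rates = {author: percent(hits, tokens) for author, (hits, tokens) in counts.items()}

# Print authors from the highest to the lowest relative frequency
for author, rate in sorted(rates.items(), key=lambda item: item[1], reverse=True):
    print(f"{author:8s} {rate:.2f}%")
```

With corpora this small a handful of hits moves the percentage noticeably, so the ranking should be read as a rough indication only.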
In absolute terms, Cicero seems to have more cases of the coordinating comma than Caesar. But since, in the Perseus annotated corpus, Cicero's 6229 tokens yield 12 cases while Caesar's 1488 yield 5, the relative frequency is actually about 0.19 percent for Cicero and 0.34 percent for Caesar. Sallust, who has 27 cases on 12311 tokens, is between Cicero and Caesar, with about 0.22 percent of his corpus. Jerome, with 8382 tokens, seems to have zero coordinating commas, which is slightly strange.

===== Using Annis corpus to test annotations =====

Problem: a difficult sentence has to be syntactically annotated.
Quicquid oritur, causam habeat a natura necesse est (C. div. 2, 60)
A proposed annotation is [[http://repos1.alpheios.net/exist/rest/db/app/treebank-editsentence.xhtml?doc=sentences-aldt-la&s=643|here]]. But is it correct? To test it, we write an Annis query and see if there is anything similar in the annotated corpora:

  POS="verb" & POS="verb" & #1 ->parent[relation="SBJ"] #2

("Find two verbs of which one governs the other, and whose relationship is labeled as SBJ.") This search finds, among other results, the well-known passage from Cicero's In Catilinam:
quod eam [sicam] necesse putas esse in consulis corpore defigere (Cic. Catil. 1, 16)
I guess this confirms my annotation.

====== Simple sentences for annotating exercises ======

The tool: [[http://repos1.alpheios.net/exist/rest/db/app/treebank-entertext.xhtml|Alpheios treebank editor]].

===== 1. Sentences which have already been annotated =====

//Annotations can be found in Perseus Annis.//

  - rumores adferebantur
  - crebri ad eum rumores adferebantur
  - Remi primos civitatis miserunt
  - ... si suas copias Haedui in fines Bellovacorum introduxerint...
  - hi omnes nuntiaverunt...
  - qui moleste ferebant
  - qui finitimi Belgis erant
  - Germanos in Gallia versari noluerant
  - Caesar ad se adduci iussit
  - qui novis imperiis studebant
  - Q. Pedium legatum misit ipse
  - cum primum pabuli copia esse inciperet...
  - coniurandi has esse causas...
  - ... uti ea quae apud eos gerantur cognoscant
  - ... reliquos omnes Belgas in armis esse
  - ... omnem senatum ad se convenire
  - ... exercitum in unum locum conduci

===== 2. Sentences without previous annotation =====

//Some exercises done by NJ.//

  - mane surgo (cf. [[http://repos1.alpheios.net/exist/rest/db/app/treebank-editsentence.xhtml?doc=sentences-aldt-la&s=552|annotation]]). Annis query:

  POS="verb" & POS="adverb" & #1 ->parent[relation="ADV"] #2

Or, even more precisely:

  POS="verb" & POS="adverb" & #1 ->parent[relation="ADV"] #2 & #2 . #1

  - sol ortus est (cf. [[http://repos1.alpheios.net/exist/rest/db/app/treebank-editsentence.xhtml?doc=sentences-aldt-la&s=551|annotation]])
  - surrexit de lecto (cf. [[http://repos1.alpheios.net/exist/rest/db/app/treebank-editsentence.xhtml?doc=sentences-aldt-la&s=553|annotation]])
  - vigilavit heri diu (cf. [[http://repos1.alpheios.net/exist/rest/db/app/treebank-editsentence.xhtml?doc=sentences-aldt-la&s=554|annotation]])
  - vesti me (cf. [[http://repos1.alpheios.net/exist/rest/db/app/treebank-editsentence.xhtml?doc=sentences-aldt-la&s=555|annotation]])
  - da mihi calciamenta et udones et bracas (cf. [[http://repos1.alpheios.net/exist/rest/db/app/treebank-editsentence.xhtml?doc=sentences-aldt-la&s=556|annotation]])
  - iam calciatus sum
  - adfer aquam manibus
  - manus sordidae sunt
  - iam lavi meas manus et faciem
  - adhuc non tersi
  - procedo foris de cubiculo
  - vado in scholam

//Additional exercise: using Annis notation, try to find similar sentences in the Perseus annotated corpus.//

----

  - satis declarauit Dionysius
  - Despotus VI. mille equites gratia praesidii Smederovo reliquerat
  - rex qui maximas copias duxit ad Troiam
  - magister cum omnibus classiariis ad oppidum tendit
  - Nec uos quidem, iudices timueritis
  - animus grauioribus curis sedulo coquitur
  - Stephanus Malipetrus et Victor Soprantius ad imperatorem se mature conferunt
  - Copias uero quas adduxi tuum imperium sequentur (!)
  - Milites ac turba omnis uenationi incumbit
  - adolescens quidam, Dalmata natione et lingua, urso mirae magnitudinis occurrit
  - Cadit itaque ubique magnus numerus ferarum
  - imperator in Clazomeniorum agro copias exposuit
  - Myra fuit ciuitas Lyciae

Sentences from the "[[http://www.menge.net/ueloeframe.html|new Menge]]" (taken from classical authors):

  - Argumenta plus quam testes valent .
  - Effluit voluptas corporis et prima quaeque avolat .
  - Consul ego nuper defendi C. Pisonem ; qui , quia consul fortis fuerat , incolumis est rei publicae conservatus .
  - Furti damnatus est .
  - Huic legioni maxime confidebat .
  - Illi pictores non sunt usi plus quam quattuor coloribus .
  - Si Fabio laudi datum esset , quod pingeret , etiam apud Romanos multi Polycleti et Parrhasii fuissent .
  - Illi dimicare non ausi turpiter se in castra receperunt .
  - Cum quaepiam cohors impetum fecerat , hostes velocissime refugiebant .
  - Alacris exsultat improbitas in victoria .
  - Ordiamur ab eo , quod primum posui .
  - Invidetur commodis hominum .
  - Cn. Pompeius est omnium gentium , omnium saeculorum facile princeps .
  - Est apud Platonem Socrates , cum esset in custodia publica , dicens Critoni sibi post tertium diem esse moriendum.
  - De insidiis celare te noluit .
  - Omnes immemorem beneficii oderunt .
  - Multi alacres exspectant .
  - Milites amplius horis quattuor fortissime pugnabant .

====== Sentences from Pinkster ======

Source: Pinkster, Harm (1942-) [1990], [[http://perseus.uchicago.edu/cgi-bin/philologic/getobject.pl?c.19:2:0.NewPerseusMonographs|Latin Syntax and Semantics]], xii, 320 p.

  - pater filium laudat .
  - ovum ovo simile est .
  - Alexander erat rex Macedonum .
  - pater ambulat .
  - pater hostibus timorem iniecit .
  - interea ea legione quam secum habeat murum fossamque perducit .
  - num stulte anteposuit exilii libertatem domesticae servituti ?
  - Narbonensis provincia amplitudine opum nulli provinciarum postferenda breviter que Italia verius quam provincia .

====== Sentences from Caesar ======

Found with Annis QL: the subject is a noun in nominative, the predicate is a verb in indicative, and it has a direct object in accusative. The examples are shortened here (most of the sentence is omitted).

  - Equites proelium commiserunt .
  - Sectionem eius oppidi universa Caesar vendidit .
  - Caesar VI legiones ducebat .
  - Eorum fines Nervii attingebant .
  - Copias Haedui introduxerint .

And with any word in nominative:

  - qui facultates habebant .
  - neutri initium faciunt .
  - locum nostri castris delegerant .
  - Illi eruptionem fecerunt .
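
The queries behind these two lists follow the pattern used throughout this page: a few conditions per word, tied to the same token with ''_=_'', plus labeled ''->parent'' arcs between the words. Writing the ''#n'' references by hand gets error-prone as queries grow, so here is a small Python sketch that assembles such a query string. The helper ''build_aql'' is hypothetical (it is not part of Annis), and it only produces text to paste into the search interface. It has not been tested against every AQL operator.

```python
def build_aql(tokens, edges):
    """Assemble an AQL query string (hypothetical helper, not an Annis API).

    tokens: list of non-empty dicts; each dict describes ONE word as
            annotation name -> value, e.g. {"POS": "noun", "case": "nominative"}.
    edges:  list of (parent_index, relation, child_index) tuples, with
            0-based indexes into `tokens`.
    """
    terms = []    # search terms such as POS="noun"
    links = []    # operator clauses such as #1 _=_ #2
    anchors = []  # AQL number (#n) of the first term of each word

    for token in tokens:
        first = None
        for name, value in token.items():
            terms.append(f'{name}="{value}"')
            number = len(terms)  # AQL numbers terms from 1, in order
            if first is None:
                first = number
            else:
                # further conditions must apply to the same word
                links.append(f"#{first} _=_ #{number}")
        anchors.append(first)

    for parent, relation, child in edges:
        links.append(
            f'#{anchors[parent]} ->parent[relation="{relation}"] #{anchors[child]}'
        )

    return " & ".join(terms + links)


# Subject in nominative, predicate in indicative, direct object in accusative
spo_query = build_aql(
    [
        {"POS": "noun", "case": "nominative"},
        {"POS": "verb", "mood": "indicative"},
        {"POS": "noun", "case": "accusative"},
    ],
    [(1, "SBJ", 0), (1, "OBJ", 2)],
)
print(spo_query)
```

The generated string points the SBJ arc at ''#1'' where the hand-written query under Variations uses ''#2''; since those two terms are tied together by ''_=_'', the two versions should select the same word.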