====== Levenshtein distance as a metaphor in evaluating translations ======

//Hypothesis//: as in computing [[http://en.wikipedia.org/wiki/Levenshtein_distance|Levenshtein distance]], we count the number of operations necessary to transform a sentence in one language into a sentence in another. Our units are not characters, but words.

===== Categories =====

Levenshtein's algorithm counts additions, deletions, and substitutions; the Damerau–Levenshtein variant also counts transpositions.

In analysing a translation, we want to evaluate operations on the following levels:

  - lexical: **words** added, deleted, substituted, transposed
  - morphosyntactic: **parts of speech** substituted (usually), or transposed (I cannot imagine adding a part of speech without adding a word too)
  - semantic: **meanings** added, deleted, substituted (here my brain starts to boil imagining meanings //transposed//)

===== Example =====

==== A literal translation ====

The text is Hom. Il. 2, 305--306:

τλῆτε φίλοι, καὶ μείνατ' ἐπὶ χρόνον ὄφρα δαῶμεν
ἢ ἐτεὸν Κάλχας μαντεύεται ἦε καὶ οὐκί.

Here is a literal translation into Latin, given by Rajmund Kunić (Raymundus Cunichius), c. 1776:

Tolerate, amici, et manete aliquandiu; ut sciamus
An verum Calchas vaticinetur, an et non.

We count as 0 every word whose morphology and meaning are both carried over. Additionally, the meaning must be "literal", i. e. the first equivalent that comes to mind; e. g. καὶ is "et", not "-que". The operations are: addition, deletion, substitution, and transposition. Under these conditions, the score for Kunić's literal translation is:

0 0 0 0 1 2 0 0
0 0 0 0 0 0 0

The 1 and 2 in the fifth and sixth places of the first line result from rendering ἐπὶ χρόνον as //aliquandiu//: 1 (deletion of ἐπὶ) + 1 (morphological substitution of an adverb for a noun) + 1 (semantic addition of "a CERTAIN time" for "a time"):

ἐπὶ → ... (lexical deletion)
χρόνον → aliquandiu (lexical substitution)
noun → adverb (morphological substitution)
FOR A TIME → for SOME time (semantic addition)

An alignment of the passages quoted (done with Alpheios) can be seen and explored here: [[http://www.ffzg.unizg.hr/klafil/croala/xpr/homerus-latine-alignment.xhtml|X]].

==== Cicero's translation ====

In his preface on translation, Kunić quotes and analyzes Cicero's translation of the passage from Homer. We'll compare Cicero (div. 2, 30, 63) and the original. The Alpheios alignment is here: [[http://www.ffzg.unizg.hr/klafil/croala/xpr/homerus-cicero-alignment.xhtml|X]].

Strictly, we should take into consideration the metrical organization of the verses (Cicero has one verse more than Homer), but for the time being we'll disregard this and count only transpositions.

τλῆτε φίλοι, καὶ μείνατ' ἐπὶ χρόνον ὄφρα δαῶμεν
ἢ ἐτεὸν Κάλχας μαντεύεται ἦε καὶ οὐκί.

Ferte, viri, et duros animo tolerate labores,
auguris ut nostri Calchantis fata queamus
scire ratosne habeant an vanos pectoris orsus.

The first three words are transformed in a relatively simple way. τλῆτε is morphologically and positionally equivalent to //Ferte//, but there is a semantic move away from the basic meaning, so we count this as 1. The same goes for φίλοι ~ //viri//. καὶ is rendered literally as //et//. So the score is:

1 1 0

Now, what is to be seen as the equivalent of μείνατ'? Morphologically, it is //tolerate//, which gets 1 for transposition (word order) and 1 for semantic change (or should this be 2, as "to endure" is only an implied synonym for "to wait"?). ἐπὶ χρόνον is then replaced by //duros animo... labores//: 1 1 for deletions, 1 2 for morphological substitutions, 1 2 for syntactic substitutions, 1 2 for semantic changes (this is still only a sketch; the semantic moves should be analyzed better). Now we have:

1 1 0 2 4 7

ὄφρα is translated by //ut//, but it is transposed, so 1.
δαῶμεν is translated by //queamus / scire//: 2 transpositions, 1 addition, 1 morphological addition, 1 semantic addition (to BE ABLE to learn). In all:

1 1 0 2 4 7 1 5

In the next line, everything gets 1 for transposition.

ἢ ἐτεὸν Κάλχας μαντεύεται ἦε καὶ οὐκί.

auguris ut nostri Calchantis fata queamus
scire ratosne habeant an vanos pectoris orsus.

ἢ is additionally substituted by //-ne// (1), an enclitic, but with no semantic change. ἐτεὸν is //ratos... pectoris orsus//: 1 morphological change (plural), 2 additions, at least 2 semantic changes. Κάλχας is substituted morphologically by the genitive case (1) and has 2 additions, both morphological and semantic (//auguris... nostri//), so we'll count them as 4. So far:

2 6 6

μαντεύεται corresponds to //fata... habeant//: 2 transpositions, 2 morphological substitutions (a noun, a verb in 3 pl), 2 semantic substitutions (from HE AUGURS to HIS PROPHECIES HAVE). ἦε has 1 for transposition. καὶ has 1 for omission. οὐκί gets 1 for transposition, 1 for morphological substitution (negation into adjective), at least 2 for semantic substitution (NOT into EMPTY, i. e. FALSE).

The score for both lines together:

1 1 0 2 4 7 1 5
2 6 6 6 1 1 4

===== Greek and English =====

Let's try to compare Homer with Robert Fitzgerald's translation.

τλῆτε φίλοι, καὶ μείνατ' ἐπὶ χρόνον ὄφρα δαῶμεν
ἢ ἐτεὸν Κάλχας μαντεύεται ἦε καὶ οὐκί.

Hold on hard, dear friends! Come, sweat it out, until at least we learn if Kalkhas made true prophecy or not.

Analysis:

τλῆτε → Hold on hard (1 morphological + 1 semantic addition)
φίλοι → dear friends (1 morphological + 1 semantic addition)
καὶ → ... (1 deletion)
μείνατ' ἐπὶ χρόνον → sweat it out (1 semantic substitution, 1 omission + 1 addition + 1 semantic substitution (χρόνον - it), 1 omission + 1 addition)
ὄφρα → until at least (2 morphological, 1 semantic addition)
δαῶμεν → we learn: 1 semantic addition
ἢ → if: 0
ἐτεὸν → true: 1 transposition
Κάλχας → Kalkhas: 1 transposition
μαντεύεται → made... prophecy: 1 transposition, 1 substitution, 1 addition (morphological)
ἦε → or: 0
καὶ → ...: 1 deletion
οὐκί → not: 0

Richmond Lattimore:

τλῆτε φίλοι, καὶ μείνατ' ἐπὶ χρόνον ὄφρα δαῶμεν
ἢ ἐτεὸν Κάλχας μαντεύεται ἦε καὶ οὐκί.

No, but be patient, friends, and stay yet a little longer until we know whether Kalchas' prophecy is true or is not true.

Analysis:

τλῆτε → No, but be patient; lexical: 2 added (we don't count necessary changes); morphosyntactic: 2 added; semantic: 2 added ("No, but")
φίλοι → friends: 0
καὶ → and: 0
μείνατ' → stay: 0
ἐπὶ χρόνον → yet a little longer; lexical: 2 added; morphosyntactic: 3 added; semantic: 2 added (YET, LITTLE)
ὄφρα → until: 0
δαῶμεν → we know: 1 semantic addition
ἢ → whether: 0
ἐτεὸν → true: 1 transposition
Κάλχας → Kalchas'; lexical: 1 transposition; morphosyntactic: 1 substitution (genitive for nominative)
μαντεύεται → prophecy is; lexical: 1 addition; morphosyntactic: 2 substitutions (noun for verb, copula for the mediopassive); semantic: 1 substitution (PROPHECIES: PROPHECY IS)
ἦε → or: 0
καὶ → is; morphosyntactic: 1 substitution; semantic: 1 substitution
οὐκί → not true; lexical: 1 addition; morphosyntactic: 1 addition (implied in the original)

And Robert Fagles:

τλῆτε φίλοι, καὶ μείνατ' ἐπὶ χρόνον ὄφρα δαῶμεν
ἢ ἐτεὸν Κάλχας μαντεύεται ἦε καὶ οὐκί.

Courage, my friends, hold out a little longer. Till we see if Calchas divined the truth or not.

===== Going a bit slower =====

Actually, a gradual approach is clearer.

==== Phase 1: word count ====

We start by simply counting words, but marking //which// words were added or omitted.

Example: //Homer vs. Fagles//

τλῆτε φίλοι, καὶ μείνατ' ἐπὶ χρόνον ὄφρα δαῶμεν
Courage, my friends, hold out a little longer. / Till we see
0 10 (my friends, "my" added) 1 (καὶ deleted) 01 (hold out, adverb added) 010 (a little longer, adverb added) 0 10 (we see, pronoun added; never mind that this is obligatory in English) = 5

ἢ ἐτεὸν Κάλχας μαντεύεται ἦε καὶ οὐκί.
if Calchas divined the truth or not.
0 1 (ἐτεὸν transposed) 0 0 10 (the truth, article added) 0 1 (καὶ deleted) 0 = 3

Note that this word count is //not// the simple word count, which would result in 8:11 for the first Iliad verse, and in 7:7 for the second. We count //operations//, representing a word-for-word relationship with 0 and any addition, deletion, or transposition with 1, which produces a score of 5 for the first line and 3 for the second.

Let's try //Homer vs. Lattimore//:

τλῆτε φίλοι, καὶ μείνατ' ἐπὶ χρόνον ὄφρα δαῶμεν
No, but be patient, friends, and stay yet a little longer / until we know
1 1 10 (be patient) 0 0 0 1 (yet) 010 0 10 = 5

ἢ ἐτεὸν Κάλχας μαντεύεται ἦε καὶ οὐκί.
whether Kalchas' prophecy is true or is not true.
0 1 0 01 (prophecy is) 0 1 (καὶ) 1 (is) 01 = 5

Lattimore scores 5 for the first line and 5 for the second.

//Homer vs. Fitzgerald//:

τλῆτε φίλοι, καὶ μείνατ' ἐπὶ χρόνον ὄφρα δαῶμεν
Hold on hard, dear friends! / Come, sweat it out, until at least we learn
011 10 / 0 (καὶ ~ come) 011 (it out) 1 (until, transposed) 0 0 0 10 = 7

ἢ ἐτεὸν Κάλχας μαντεύεται ἦε καὶ οὐκί.
if Kalkhas made true prophecy or not.
0 1 (true tp.) 0 110 (made prophecy - tp. + add.) 0 1 0 = 4

Fagles vs. Lattimore vs. Fitzgerald: 8:10:11 (on this level).

==== Phase 2: grammatical transformations ====

Now we count operations that change parts of speech etc.

//Homer vs. Fagles//:

τλῆτε φίλοι, καὶ μείνατ' ἐπὶ χρόνον ὄφρα δαῶμεν
Courage, my friends, hold out a little longer. / Till we see
1 10 1 0 1 (little) / 0 0 = 4

ἢ ἐτεὸν Κάλχας μαντεύεται ἦε καὶ οὐκί.
if Calchas divined the truth or not.
0 1 0 1 (present > past) 0 1 0 = 3

//Homer vs. Lattimore//:

τλῆτε φίλοι, καὶ μείνατ' ἐπὶ χρόνον ὄφρα δαῶμεν
No, but be patient, friends, and stay yet a little longer / until we know
1 1 0 0 0 0 1 (yet) / 0 0 = 3

ἢ ἐτεὸν Κάλχας μαντεύεται ἦε καὶ οὐκί.
whether Kalchas' prophecy is true or is not true.
0 1 0 11 (prophecy is) 0 1 (καὶ om.) 1 (is true) 0 = 5

//Homer vs. Fitzgerald//:

τλῆτε φίλοι, καὶ μείνατ' ἐπὶ χρόνον ὄφρα δαῶμεν
Hold on hard, dear friends! / Come, sweat it out, until at least we learn
011 10 / 1 (καὶ - come) 011 (it out) 1 (until, transposed) 0 1 (least) 1 = 9

ἢ ἐτεὸν Κάλχας μαντεύεται ἦε καὶ οὐκί.
if Kalkhas made true prophecy or not.
0 1 (true tp.) 0 111 (made prophecy - tp. + noun + add. perfect) 0 1 (καὶ) 0 = 5

The grammar score for Fagles, Lattimore, and Fitzgerald: 7:8:14.

==== Phase 3: semantic transformations ====

Here we count every change of meaning that is not necessary, i. e. every meaning which is not the "first appropriate" one in the dictionary. This also includes additions and deletions of implied information. At first this level seems similar to the grammatical one, but I think it should be kept separate (otherwise counting changes gets confusing).

//Homer vs. Fagles//:

τλῆτε φίλοι, καὶ μείνατ' ἐπὶ χρόνον ὄφρα δαῶμεν
Courage, my friends, hold out a little longer. / Till we see
1 1 1 1 (hold out) 1 (little) 0 1 (see) = 5

ἢ ἐτεὸν Κάλχας μαντεύεται ἦε καὶ οὐκί.
if Calchas divined the truth or not.
0 1 0 1 (present > past) 0 1 0 = 3

//Homer vs. Lattimore//:

τλῆτε φίλοι, καὶ μείνατ' ἐπὶ χρόνον ὄφρα δαῶμεν
No, but be patient, friends, and stay yet a little longer / until we know
1 1 1 (be patient) 0 0 0 1 (yet) 1 (little) / 0 0 = 4

ἢ ἐτεὸν Κάλχας μαντεύεται ἦε καὶ οὐκί.
whether Kalchas' prophecy is true or is not true.
0 0 1 (is true) 1 (prophecy) 0 1 (καὶ) 1 (is... true) 0 = 4

//Homer vs. Fitzgerald//:

τλῆτε φίλοι, καὶ μείνατ' ἐπὶ χρόνον ὄφρα δαῶμεν
Hold on hard, dear friends! / Come, sweat it out, until at least we learn
111 10 / 1 (καὶ - come) 111 (it out) 1 (until, transposed) 0 1 (least) 1 (learn) = 11

ἢ ἐτεὸν Κάλχας μαντεύεται ἦε καὶ οὐκί.
if Kalkhas made true prophecy or not.
0 1 (true tp.) 0 111 (made prophecy - tp. + noun + add. perfect) 0 1 (καὶ) 0 = 5

On this level, Fagles and Lattimore come out even, and Fitzgerald leads: 8:8:16.
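The score rows used in Phases 1 to 3 can be tallied mechanically. The following is a minimal sketch, assuming that every ''1'' outside a parenthetical note stands for one operation and that a trailing ''= N'' is the hand-computed total. The function name ''count_operations'' is hypothetical, and the last note in the first row is abridged from the Phase 1 example.

```python
import re


def count_operations(score_row: str) -> int:
    """Count the 1-marks in a score row such as '0 10 (note) 1 = 2'."""
    # Drop a trailing "= N" total so that its digits are not counted.
    marks = score_row.split("=")[0]
    # Drop parenthetical notes such as "(καὶ deleted)".
    marks = re.sub(r"\([^)]*\)", " ", marks)
    # Each remaining "1" is one operation; "0" marks a word-for-word match.
    return marks.count("1")


# Phase 1 rows for Homer vs. Fagles (assumed input format: one string per verse).
fagles_line_1 = (
    '0 10 (my friends, "my" added) 1 (καὶ deleted) 01 (hold out, adverb added) '
    "010 (a little longer, adverb added) 0 10 (we see, pronoun added) = 5"
)
fagles_line_2 = (
    "0 1 (ἐτεὸν transposed) 0 0 10 (the truth, article added) 0 1 (καὶ deleted) 0 = 3"
)

phase_1_fagles = count_operations(fagles_line_1) + count_operations(fagles_line_2)
print(phase_1_fagles)  # 8
```

The sketch handles only flat parentheses; a note with nested parentheses would need a small parser instead of the regular expression.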
==== Phase 4: Total ====

The total "translation distance" would be the sum of lexical, grammatical, and semantic changes.

^ ^ Fagles ^ Lattimore ^ Fitzgerald ^
| Word count | 8| 10| 11|
| Grammar | 7| 8| 14|
| Semantics | 8| 8| 16|
| **Total** | **23**| **26**| **41**|

This suggests that Fagles achieves his translation (of the verses cited) with fewer transformations than Lattimore, and that Fitzgerald's translation should read significantly //differently// from the others. We would need an English speaker (who reads Greek as well) to say whether this feels true.

Also, we need more material: more verses, more translations. We don't know yet whether a difference of 3 points is significant at all.
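As a check on the arithmetic, the totals can be recomputed from the three per-level scores. This is a minimal sketch; the names ''scores'' and ''translation_distance'' are hypothetical, and the unweighted sum simply follows the definition of the total given in Phase 4.

```python
# Per-level scores from Phases 1 to 3.
scores = {
    "Fagles": {"word count": 8, "grammar": 7, "semantics": 8},
    "Lattimore": {"word count": 10, "grammar": 8, "semantics": 8},
    "Fitzgerald": {"word count": 11, "grammar": 14, "semantics": 16},
}


def translation_distance(levels: dict[str, int]) -> int:
    """Sum the lexical, grammatical, and semantic operation counts."""
    return sum(levels.values())


totals = {name: translation_distance(levels) for name, levels in scores.items()}

# Print the translations from the smallest distance to the largest.
for name, total in sorted(totals.items(), key=lambda item: item[1]):
    print(f"{name}: {total}")
```

Keeping the levels in a dictionary per translator makes it easy to add more verses or translations later, or to experiment with weighting the levels differently.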