Levenshtein distance as a metaphor in evaluating translations

Hypothesis: as in computing Levenshtein distance, we count the number of operations necessary to transform a sentence in one language into a sentence in another. Our units are not characters, but words.

Categories

Levenshtein's algorithm counts additions, deletions, and substitutions; the Damerau–Levenshtein variant also counts transpositions.
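To make the four operations concrete, here is a minimal sketch of the distance computed over words instead of characters. It implements the restricted ("optimal string alignment") form of Damerau–Levenshtein, in which only adjacent words can be transposed. The function name `word_edit_distance` and the whitespace tokenization are assumptions made for illustration. Since words are compared by simple equality, the sketch only works within one language; across languages, deciding what counts as "the same word" is the manual judgement this note is about.

```python
def word_edit_distance(source, target):
    """Restricted Damerau-Levenshtein distance over whitespace-separated words.

    Each addition, deletion, substitution, or transposition of two
    adjacent words costs 1.
    """
    a, b = source.split(), target.split()
    rows, cols = len(a) + 1, len(b) + 1
    # d[i][j] = distance between the first i words of a and the first j of b
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all i words
    for j in range(cols):
        d[0][j] = j  # add all j words
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # addition
                d[i - 1][j - 1] + cost,  # substitution (or match)
            )
            # transposition of two adjacent words
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]
```

For instance, `word_edit_distance("or not true", "or true not")` gives 1 (one transposition), whereas plain Levenshtein would count two substitutions.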

Analysing a translation, we would want to evaluate operations on the following levels:

  1. lexical: words added, deleted, substituted, transposed
  2. morphosyntactic: parts of speech substituted (usually), or transposed (cannot imagine adding a part of speech without adding a word too)
  3. semantic: meanings added, deleted, substituted (here my brain starts to boil imagining meanings transposed)

Example

A literal translation

The text is Hom. Il. 2, 305–306:

τλῆτε φίλοι, καὶ μείνατ' ἐπὶ χρόνον ὄφρα δαῶμεν 
ἢ ἐτεὸν Κάλχας μαντεύεται ἦε καὶ οὐκί.

Here is a literal translation into Latin, given by Rajmund Kunić (Raymundus Cunichius), c. 1776:

Tolerate, amici, et manete aliquandiu; ut sciamus
An verum Calchas vaticinetur, an et non.

We count all words where both morphology and meaning are carried over as 0. Additionally, the meaning must be “literal”, i. e. the first equivalent that comes to mind; e. g. καὶ “et”, not “-que”.

Operations are: addition, deletion, substitution, and transposition.

Under these conditions, the score for Kunić's literal translation is:

0 0 0 0 1 2 0 0
0 0 0 0 0 0 0

The 1 and 2 in the fifth and sixth places of the first line result from rendering ἐπὶ χρόνον as aliquandiu: 1 (deletion of ἐπὶ) + 1 (morphological substitution of an adverb for a noun) + 1 (semantic addition of “a CERTAIN time” for “a time”):

ἐπὶ → ... (lexical deletion)
χρόνον → aliquandiu (lexical substitution)
   noun → adverb (morphological substitution)
     FOR A TIME → for SOME time (semantic addition)

An alignment of the passages quoted (done with Alpheios) can be seen and explored here: X.

Cicero's translation

In his preface on translation, Kunić quotes and analyzes Cicero's translation of the passage from Homer. We'll compare Cicero (div. 2, 30, 63) and the original.

This Alpheios alignment is here: X.

Strictly speaking, we should also take into consideration the metrical organization of the verses (Cicero has one verse more than Homer), but, for the time being, we'll disregard this and count only transpositions.

τλῆτε φίλοι, καὶ μείνατ' ἐπὶ χρόνον ὄφρα δαῶμεν 
ἢ ἐτεὸν Κάλχας μαντεύεται ἦε καὶ οὐκί.

Ferte, viri, et duros animo tolerate labores,
auguris ut nostri Calchantis fata queamus
scire ratosne habeant an vanos pectoris orsus.

The first three words are transformed in a relatively simple way.

τλῆτε is morphologically and positionally equivalent to Ferte, but there is a semantic move away from the basic meaning, so we count this as 1. The same goes for φίλοι ~ viri. καὶ is literally transferred as et. So the score is:

1 1 0

Now, what is to be seen as equivalent for μείνατ'? Morphologically, it is “tolerate”, which gets 1 for transposition (word-order), 1 for semantic change (or should this be two, as “to endure” is only an implied synonym for “to wait”?).

ἐπὶ χρόνον is then replaced by duros animo… labores. In each pair below, the first number is for ἐπὶ and the second for χρόνον: 1 1 for deletions, 1 2 for morphological substitutions, 1 2 for syntactic substitutions, 1 2 for semantic changes (this is still only a sketch; the semantic moves should be better analyzed). Now we have:

1 1 0 2 4 7

ὄφρα is translated by ut, but it is transposed, so 1. δαῶμεν is translated by queamus / scire: 2 transpositions, 1 addition, 1 morphological addition, 1 semantic addition (to BE ABLE to learn).

In all:

1 1 0 2 4 7 1 5

In the next line, everything gets 1 for transposition.

ἢ ἐτεὸν Κάλχας μαντεύεται ἦε καὶ οὐκί.

auguris ut nostri Calchantis fata queamus
scire ratosne habeant an vanos pectoris orsus.

ἢ is additionally substituted by -ne (1), an enclitic, but with no semantic change. ἐτεὸν is ratos… pectoris orsus: 1 morphological change (plural), 2 additions, at least 2 semantic changes. Κάλχας is substituted morphologically by the genitive case (1) and has 2 additions, both morphological and semantic (auguris… nostri), so we'll count them as 4.

So far:

2 6 6

μαντεύεται corresponds to fata… habeant: 2 transpositions, 2 morphological substitutions (a noun, a verb in 3 pl), 2 semantic substitutions (from HE AUGURS to HIS PROPHECIES HAVE).

ἦε has 1 for transposition.

καὶ has 1 for omission.

οὐκί gets 1 for transposition, 1 for morphological substitution (negation into adjective), at least 2 for semantic substitution (NOT into EMPTY i. e. FALSE).

The score for both lines together:

1 1 0 2 4 7 1 5
2 6 6 6 1 1 4
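As a quick arithmetic check, the per-word scores can be summed per line and per translation. The sketch below simply copies the two score grids given above (Kunić's literal version and Cicero's); the names `kunic`, `cicero`, `line_totals` and `total` are hypothetical and used only for illustration.

```python
# Per-word scores copied from the text, one inner list per Homeric line.
kunic = [
    [0, 0, 0, 0, 1, 2, 0, 0],
    [0, 0, 0, 0, 0, 0, 0],
]
cicero = [
    [1, 1, 0, 2, 4, 7, 1, 5],
    [2, 6, 6, 6, 1, 1, 4],
]


def line_totals(scores):
    """Sum the operation counts of each line separately."""
    return [sum(line) for line in scores]


def total(scores):
    """Sum the operation counts over all lines."""
    return sum(line_totals(scores))


print(line_totals(kunic), total(kunic))    # [3, 0] 3
print(line_totals(cicero), total(cicero))  # [21, 26] 47
```

On these numbers the literal version needs 3 operations and Cicero's 47, which is at least consistent with the intuition that Cicero's rendering is much freer.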

Greek and English

Let's try and compare Homer with Robert Fitzgerald's translation.

τλῆτε φίλοι, καὶ μείνατ' ἐπὶ χρόνον ὄφρα δαῶμεν 
ἢ ἐτεὸν Κάλχας μαντεύεται ἦε καὶ οὐκί.

Hold on hard, dear friends!
Come, sweat it out, until at least we learn
if Kalkhas made true prophecy or not.

Analysis:

τλῆτε → Hold on hard (1 morphological + 1 semantic addition)
φίλοι → dear friends (1 morphological + 1 semantic addition)
καὶ → ... (1 deletion)
μείνατ' ἐπὶ χρόνον → sweat it out (1 semantic substitution,
     1 omission + 1 addition + 1 semantic substitution (χρόνον - it),
     1 omission + 1 addition)
ὄφρα → until at least (2 morphological, 1 semantic addition)
δαῶμεν → we learn 1 semantic addition
ἢ → if 0
ἐτεὸν →  true 1 transposition
Κάλχας → Kalkhas 1 transposition
μαντεύεται → made... prophecy 1 transposition 1 substitution 1 addition (morphological)
ἦε → or 0
καὶ → ... 1 deletion
οὐκί → not 0

Richmond Lattimore:

τλῆτε φίλοι, καὶ μείνατ' ἐπὶ χρόνον ὄφρα δαῶμεν 
ἢ ἐτεὸν Κάλχας μαντεύεται ἦε καὶ οὐκί.

No, but be patient, friends, and stay yet a little longer
until we know whether Kalchas' prophecy is true or is not true.

Analysis:

τλῆτε → No, but be patient
   lexical: 2 added (we don't count necessary changes)
     morphosyntactic: 2 added
        semantic: 2 added ("No, but")
φίλοι → friends: 0
καὶ → and: 0
μείνατ'  → stay: 0
ἐπὶ χρόνον → yet a little longer
   lexical: 2 added
     morphosyntactic: 3 added
       semantic: 2 added (YET, LITTLE)
ὄφρα → until: 0
δαῶμεν → we know - 1 semantic addition
ἢ → whether: 0
ἐτεὸν →  true 1 transposition
Κάλχας → Kalchas'
  lexical: 1 transposition
    morphosyntactic: 1 substitution (genitive for nominative)
μαντεύεται → prophecy is
  lexical: 1 addition
    morphosyntactic: 2 substitutions (noun for verb, copula for the mediopassive)
      semantic: 1 substitution (PROPHECIES: PROPHECY IS)
ἦε → or 0
καὶ → is
   morphosyntactic: 1 substitution
     semantic: 1 substitution
οὐκί → not true
   lexical: 1 addition
     morphosyntactic: 1 addition (implied in the original)

And Robert Fagles:

τλῆτε φίλοι, καὶ μείνατ' ἐπὶ χρόνον ὄφρα δαῶμεν 
ἢ ἐτεὸν Κάλχας μαντεύεται ἦε καὶ οὐκί.

Courage, my friends, hold out a little longer.
Till we see if Calchas divined the truth or not.

Going a bit slower

Actually, a gradual approach is clearer.

Phase 1: word count

We start by simply counting words, but marking which words were added or omitted.

Example — Homer vs. Fagles

τλῆτε φίλοι, καὶ μείνατ' ἐπὶ χρόνον ὄφρα δαῶμεν 
Courage, my friends, hold out a little longer. / Till we see 
0 10 (my friends, "my" added) 1 (καὶ deleted) 01 (hold out, adverb added) 010 (a little longer, adverb added) 0 10 (we see, pronoun added -- never mind that this is obligatory in English) = 5

ἢ ἐτεὸν Κάλχας μαντεύεται ἦε καὶ οὐκί.
if Calchas divined the truth or not.
0 1 (ἐτεὸν transposed) 0 0 10 (the truth, article added) 0 1 (καὶ deleted) 0 = 3

Note that this is not a simple word count, which would give 8:11 for the first Iliad verse and 7:7 for the second. We count operations: a word-for-word relationship is represented by 0, and any addition, deletion, or transposition by 1. This produces a score of 5 for the first line and 3 for the second.
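The score lines used here can also be tallied mechanically. The following is a minimal sketch, assuming the notation of the Fagles example above: remarks stand in parentheses, the stated total follows "=", and every remaining "1" is one operation. The function name `count_operations` is hypothetical.

```python
import re


def count_operations(score_line):
    """Count the 1s in a score line, ignoring remarks and the stated total."""
    # Drop parenthetical remarks such as "(καὶ deleted)".
    without_remarks = re.sub(r"\([^)]*\)", " ", score_line)
    # Drop the hand-computed total after "=".
    marks = without_remarks.split("=")[0]
    return marks.count("1")


fagles_line_1 = (
    '0 10 (my friends, "my" added) 1 (καὶ deleted) 01 (hold out, adverb added) '
    "010 (a little longer, adverb added) 0 10 (we see, pronoun added) = 5"
)
fagles_line_2 = (
    "0 1 (ἐτεὸν transposed) 0 0 10 (the truth, article added) 0 1 (καὶ deleted) 0 = 3"
)

print(count_operations(fagles_line_1))  # 5
print(count_operations(fagles_line_2))  # 3
```

A tally like this is also a cheap way to check that the total written after "=" agrees with the marks on the line.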

Let's try Homer vs. Lattimore:

τλῆτε φίλοι, καὶ μείνατ' ἐπὶ χρόνον ὄφρα δαῶμεν 
No, but be patient, friends, and stay yet a little longer / until we know
1 1 10 (be patient) 0 0 0 1 (yet) 010 0 10 = 5

ἢ ἐτεὸν Κάλχας μαντεύεται ἦε καὶ οὐκί.
whether Kalchas' prophecy is true or is not true.
0 1 0 01 (prophecy is) 0 1 (καὶ) 1 (is) 01 = 5

See how Lattimore scores 5 for the first line and 5 for the second.

Homer vs. Fitzgerald:

τλῆτε φίλοι, καὶ μείνατ' ἐπὶ χρόνον ὄφρα δαῶμεν 
Hold on hard, dear friends! / Come, sweat it out, until at least we learn
011 10 / 0 (καὶ ~ come) 011 (it out) 1 (until, transposed) 0 0 0 10 = 7

ἢ ἐτεὸν Κάλχας μαντεύεται ἦε καὶ οὐκί.
if Kalkhas made true prophecy or not.
0 1 (true tp.) 0 110 (made prophecy - tp. + add.) 0 1 0 = 4

Fagles vs. Lattimore vs. Fitzgerald: 8:10:11 (on this level).

Phase 2: grammatical transformations

Now we count operations that change parts of speech etc.

Homer vs. Fagles:

τλῆτε φίλοι, καὶ μείνατ' ἐπὶ χρόνον ὄφρα δαῶμεν 
Courage, my friends, hold out a little longer. / Till we see
1 10 1 0 1 (little) / 0 0 = 4

ἢ ἐτεὸν Κάλχας μαντεύεται ἦε καὶ οὐκί.
if Calchas divined the truth or not.
0 1 0 1 (present > past) 0 1 0 = 3

Homer vs. Lattimore:

τλῆτε φίλοι, καὶ μείνατ' ἐπὶ χρόνον ὄφρα δαῶμεν 
No, but be patient, friends, and stay yet a little longer / until we know
1 1 0 0 0 0 1 (yet) / 0 0 = 3

ἢ ἐτεὸν Κάλχας μαντεύεται ἦε καὶ οὐκί.
whether Kalchas' prophecy is true or is not true.
0 1 0 11 (prophecy is) 0 1 (καὶ om.) 1 (is true) 0 = 5

Homer vs. Fitzgerald:

τλῆτε φίλοι, καὶ μείνατ' ἐπὶ χρόνον ὄφρα δαῶμεν 
Hold on hard, dear friends! / Come, sweat it out, until at least we learn
011 10 / 1 (καὶ - come) 011 (it out) 1 (until, transposed) 0 1 (least) 1 = 9

ἢ ἐτεὸν Κάλχας μαντεύεται ἦε καὶ οὐκί.
if Kalkhas made true prophecy or not.
0 1 (true tp.) 0 111 (made prophecy - tp. + noun + add. perfect) 0 1 (καὶ) 0 = 5

The score Fagles — Lattimore — Fitzgerald in grammar: 7:8:14.

Phase 3: semantic transformations

Here we count every change of meaning that is not necessary, i. e. every meaning which is not the “first appropriate” one in the dictionary. This also includes additions and deletions of implied information. At first, this level seems similar to the grammatical one, but I think it should be kept separate (otherwise counting changes gets confusing).

Homer vs. Fagles:

τλῆτε φίλοι, καὶ μείνατ' ἐπὶ χρόνον ὄφρα δαῶμεν 
Courage, my friends, hold out a little longer. / Till we see
1 1 1 1 (hold out) 1 (little) 0 1 (see) = 5

ἢ ἐτεὸν Κάλχας μαντεύεται ἦε καὶ οὐκί.
if Calchas divined the truth or not.
0 1 0 1 (present > past) 0 1 0 = 3

Homer vs. Lattimore:

τλῆτε φίλοι, καὶ μείνατ' ἐπὶ χρόνον ὄφρα δαῶμεν 
No, but be patient, friends, and stay yet a little longer / until we know
1 1 1 (be patient) 0 0 0 1 (yet) 1 (little) / 0 0 = 4

ἢ ἐτεὸν Κάλχας μαντεύεται ἦε καὶ οὐκί.
whether Kalchas' prophecy is true or is not true.
0 0 1 (is true) 1 (prophecy) 0 1 (καὶ) 1 (is... true) 0 = 4

Homer vs. Fitzgerald:

τλῆτε φίλοι, καὶ μείνατ' ἐπὶ χρόνον ὄφρα δαῶμεν 
Hold on hard, dear friends! / Come, sweat it out, until at least we learn
111 10 / 1 (καὶ - come) 111 (it out) 1 (until, transposed) 0 1 (least) 1 (learn) = 11

ἢ ἐτεὸν Κάλχας μαντεύεται ἦε καὶ οὐκί.
if Kalkhas made true prophecy or not.
0 1 (true tp.) 0 111 (made prophecy - tp. + noun + add. perfect) 0 1 (καὶ) 0 = 5

On this level, Fagles and Lattimore come out even, while Fitzgerald has by far the highest score: 8:8:16.

Phase 4: Total

The total “translation distance” would be the sum of lexical, grammatical, semantic changes.

            Fagles  Lattimore  Fitzgerald
Word count       8         10          11
Grammar          7          8          14
Semantics        8          8          16
Total           23         26          41
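The totals above can be recomputed from the three per-level scores. This is a small sketch; the dictionary layout and the names `scores`, `totals` and `ranking` are assumptions made for illustration, and the numbers are those given in the text.

```python
# Per-level scores for the two Homeric lines, as given in the text.
scores = {
    "Fagles": {"word count": 8, "grammar": 7, "semantics": 8},
    "Lattimore": {"word count": 10, "grammar": 8, "semantics": 8},
    "Fitzgerald": {"word count": 11, "grammar": 14, "semantics": 16},
}

# Total "translation distance" = sum over the three levels.
totals = {name: sum(levels.values()) for name, levels in scores.items()}

# Translators ordered from the smallest to the largest distance.
ranking = sorted(totals, key=totals.get)

print(totals)   # {'Fagles': 23, 'Lattimore': 26, 'Fitzgerald': 41}
print(ranking)  # ['Fagles', 'Lattimore', 'Fitzgerald']
```

Keeping the levels separate in the data makes it easy to try other combinations later, for example weighting semantic changes more heavily than lexical ones.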

This should suggest that Fagles achieves his translation (of the verses cited) with fewer transformations than Lattimore, and that Fitzgerald's translation should read significantly differently from the others. We could ask an English speaker (who reads Greek as well) to say whether this feels true.

Also, we need more material: more verses, more translations.

We don't know yet whether a difference of 3 points is significant at all.

 
z/levenshtein-translation.txt · Last modified: 07. 11. 2012. 17:21 by njovanov
 