Leipzig Workshop - Croatian documentation

A workshop of the Digital Humanities Leipzig research programme.

Getting CTS / MOM workflow to work (English version by Anamarija Žugić)

5. 8. 2013. Software to install:

  • Fuseki
  • Gradle-1.6
  • Oxygen 14.2
  • Mercurial 2.6.3

Installed:

  • Oxygen 14.2
  • Mercurial 2.6.3

Cloned the dse repository using Mercurial.

6. 8. 2013.

Problems installing Gradle-1.6 on Windows XP (as reported in Command Prompt):

  • incorrect PATH entry (solved by opening Properties in My Computer, clicking Advanced System Settings, and entering the correct path under Environment Variables)
  • Java not installed (solved by installing Java Development Kit 1.7.0_25)
  • Java installation path not known to the system (solved by opening Environment Variables, creating the JAVA_HOME variable with the Java installation path as its Variable Value, then restarting the computer and Command Prompt)
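The three problems above come down to whether the tools can be found through PATH and whether JAVA_HOME points at a real directory. A minimal Python sketch of such a check follows; the function name check_prerequisites and the default tool list are illustrative assumptions, not part of the workshop setup.

```python
import os
import shutil


def check_prerequisites(tools=("java", "gradle", "hg"), env=None):
    """Report where each tool resolves on PATH and whether JAVA_HOME is usable.

    Hypothetical helper for illustration; pass `env` to inspect a custom environment.
    """
    env = os.environ if env is None else env
    report = {}
    for tool in tools:
        # shutil.which returns the full path of the executable, or None if it is not found
        report[tool] = shutil.which(tool, path=env.get("PATH"))
    java_home = env.get("JAVA_HOME")
    # JAVA_HOME only helps if it names a directory that actually exists
    report["JAVA_HOME"] = java_home if java_home and os.path.isdir(java_home) else None
    return report


if __name__ == "__main__":
    for name, value in check_prerequisites().items():
        print(f"{name}: {value or 'NOT FOUND'}")
```

Any entry reported as NOT FOUND corresponds to one of the fixes listed above: correct the PATH entry, install the tool, or set JAVA_HOME.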

Cloned another repository using Mercurial (dsetemplate)

Opened the TextInventory file using Oxygen (added the path to the TextInventory schema under Customize: Schema URL)

Created test files using Oxygen:

  • Publilius Syrus.xml (the TextInventory)
  • Izdanje5.xml (the TEI XML file that will contain the text)

Edited the conf.gradle file:

  • Entered the path for ctsinventory (path to the TextInventory .xml file, including the file name)
  • Entered the path for ctsarchive (path to the folder with the .xml files, without a file name)

Started Gradle in Command Prompt:

  • ran gradle tasks (in the folder "Primjer", where dse was cloned)
  • gradle checkti test - PASS
  • gradle ctsttl failed because the document containing the text of the edition did not exist yet; the TextInventory created in Oxygen referred to a non-existent document: <online docname="izdanje5.xml">
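The failure above can be caught before running Gradle by checking that every docname named in the TextInventory exists on disk. The following Python sketch does that under two assumptions that are not confirmed by the workshop notes: that docname values are resolved relative to the ctsarchive folder, and that matching on the local element name online is sufficient. The function name missing_online_documents is hypothetical.

```python
import os
import xml.etree.ElementTree as ET


def missing_online_documents(inventory_path, archive_dir):
    """Return docname values of <online> elements whose file is absent from archive_dir.

    Assumption: docname is a path relative to the ctsarchive folder.
    """
    missing = []
    for element in ET.parse(inventory_path).iter():
        # Strip any "{namespace}" prefix so the check works with or without a default namespace
        local_name = element.tag.rsplit("}", 1)[-1]
        docname = element.get("docname")
        if local_name == "online" and docname:
            if not os.path.isfile(os.path.join(archive_dir, docname)):
                missing.append(docname)
    return missing
```

Called with the two paths entered in conf.gradle, an empty result means every referenced edition file is present; a non-empty result lists the files still to be created.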

Creating a test document (with the text of the edition): created a new file using Oxygen (document type: TEI P5 – all)

Started Gradle again:

  • ran gradle ctsttl: all tests passed once the error was fixed

Two files were created as a result:

  • cts.ttl
  • izdanje5-00001.txt

Installing and using the CTS / MOM system (originally in Croatian, by Jan Šipoš)

System: Mac OS X 10.6.8

Day one (5. 8. 2013.)

- installation:

  • python-dev (Terminal: easy_install python-dev)
  • Oxygen XML Editor 14.2
  • mercurial-2.7

Installation problem: the system had no gcc (C compiler), and it could not be installed.

Solution: a binary build of Mercurial was found at http://mercurial.berkwood.com/

Cloned the dse repository (Terminal, via Mercurial: hg clone https://bitbucket.org/neelsmith/dse)

Day two (6. 8. 2013.)

- files downloaded (via Mercurial):

dsetemplate (Terminal: hg clone https://bitbucket.org/neelsmith/dsetemplate)

- files created (Oxygen):

minorpoets.xml (TextInventory)

New > Customize > Schema URL: dsetemplate/texts/editions/TextInventory.rng

mlp.xml (TEI), created later

- gradle (installation)

  • problem on start-up: Terminal reported gradle: command not found
  • solution: the gradle bin folder was added to PATH

- gradle (configuration):

  • conf.gradle: the paths for ctsinventory and ctsarchive were entered in the file

Terminal (in the dse folder): gradle tasks

- gradle (validation):

Terminal: gradle checkti

Problem: the file failed validation because no ctsnamespace was specified.

Solution: added <ctsnamespace abbr="latinLit" ns="http://stoa.org/ctsns" /> as the first child node of the root node <TextInventory>.
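The same fix can be sketched in Python with the standard library's ElementTree. This is only an illustration of inserting an element as the first child of the root; the helper name add_ctsnamespace is hypothetical, and the sketch assumes the new element should share the root's XML namespace, if the root has one.

```python
import xml.etree.ElementTree as ET


def add_ctsnamespace(root, abbr, ns):
    """Insert <ctsnamespace abbr=... ns=...> as the first child of root, unless one exists."""
    # Reuse the root's "{namespace}" prefix, if any, so the new element lives in the same namespace
    prefix = root.tag[: root.tag.index("}") + 1] if root.tag.startswith("{") else ""
    tag = prefix + "ctsnamespace"
    if root.find(tag) is None:
        # Index 0 makes the new element the first child node
        root.insert(0, ET.Element(tag, {"abbr": abbr, "ns": ns}))
    return root


# Usage with the values from the notes above, on a placeholder inventory
inventory = ET.fromstring("<TextInventory><textgroup/></TextInventory>")
add_ctsnamespace(inventory, "latinLit", "http://stoa.org/ctsns")
```

Writing the tree back to disk is left out on purpose: ElementTree may rewrite namespace prefixes on output, so in practice the one-line edit in Oxygen described above may be the simpler route.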

- gradle (RDF generation)

Terminal: gradle ttl

Problem: the file mlp.xml, which minorpoets.xml (the TextInventory) refers to, had not been created yet: <online docname="mlp.xml">

Solution: the file was created from the TEI P5 All template.

Files produced:

  • minorpoets-00001.txt
  • mlp-00001.txt
  • cts.ttl

Thursday, 7. August 2013: Workshop with Christopher Blackwell and Neel Smith

Report prepared by Željka Salopek.

Connecting an image with the text inventory

  1. Signed in to Bitbucket: http://bitbucket.org/
  2. Cloned the repository using Mercurial (hg clone https://bitbucket.org/nevenjovanovic/croaladse)
  3. Updated the repository with hg pull and hg update to get the latest version.
  4. Added "Caption statement data" and "Rights data" about the image, committed (hg ci -m "Edited data collection"), and finished with hg push.
  5. Added the text inventory created in the previous workshop (Publilius Syrus), committed (hg ci -m "Added text inventory"), and finished with hg push.
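Steps 3 to 5 follow one repeating cycle: pull and update before editing, commit and push afterwards. The Python sketch below only assembles the standard Mercurial commands for that cycle and prints them unless dry_run is turned off; the function names are hypothetical, and running the commands for real assumes hg is on PATH and repo_dir is a clone of the repository.

```python
import subprocess


def commands_before_editing():
    """Commands that bring the local clone up to date before any edits."""
    return [["hg", "pull"], ["hg", "update"]]


def commands_after_editing(message):
    """Commands that record the edits locally and publish them."""
    return [["hg", "commit", "-m", message], ["hg", "push"]]


def run_commands(commands, repo_dir, dry_run=True):
    """Print each command, or execute it inside repo_dir when dry_run is False."""
    for command in commands:
        if dry_run:
            print(" ".join(command))
        else:
            # check=True stops the cycle at the first failing command
            subprocess.run(command, cwd=repo_dir, check=True)


if __name__ == "__main__":
    run_commands(commands_before_editing(), repo_dir=".")
    run_commands(commands_after_editing("Edited data collection"), repo_dir=".")
```

Keeping the two halves separate mirrors the steps above, where the actual editing happens between hg update and the commit.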

Monday, 12 August 2013: Prepare a CTS repository for serving RDF

1. The CTS working repository is croaladse.

2. The CTS RDF-serving repository (where the ttl files will be) is croaladsepub, cloned from dsetemplate.

3. conf.gradle in the CTS RDF-serving repository must point to the relevant places in the CTS working repository.

4. Run gradle checkti.

5. Run gradle ttl.

Osmanides, XML edition

Anamarija Žugić, Jan Šipoš, Jura Ozmec

  • Downloaded the PDF scan of the Osmanides edition (Google Books); downloaded LibreOffice and a Latin spelling checker
  • Struggled with Mercurial (it is impractical for everyone to work on the same file), so we split the edition into one file per book; agreed on what to do and what to mark up, and on what to do once someone finishes a book

13. 8. Creating the edition of "Osmanides", in the translation by Vlaho Getaldić, using Oxygen

  • Basic data about the edition entered in the TextInventory; text inserted
  • Repository cloned from its Bitbucket address using Command Prompt

Installing programs and tools for working on the text:

  • LibreOffice
  • spelling and grammar checking tool (Latin spelling and hyphenation dictionaries); extension downloaded from: extensions.openoffice.org/en/project/dict-la

In LibreOffice: Tools > Extension Manager > Add (select the desired extension)

Working on the text:

  • one canto was assigned to each student within the .xml file containing the complete text of the Osmanides
  • correcting the OCR required removing superfluous line breaks, which led to conflicts when merging into a single file (hg merge)
  • because of the conflicts, the body of the text was altered; solution: the text was split by canto into separate .xml files
  • assigned corrections (for each canto):
    • wrap the canto title in <head>
    • wrap the canto summary in <argument>
    • wrap the summary title in <head>
    • wrap each verse line separately in <l> (can be done automatically with Find/Replace (Ctrl + F): enter ^.+$ in Text to find and <l>$0</l> in Replace with, select Regular expression under Options, and select Only selected lines under Scope)
    • XML elements: sic, persName, corr, pb, ref
    • orthographically archaic words, words not found in the dictionary, and obvious errors are wrapped in sic
    • personal names printed in small capitals are wrapped in persName
    • errors in the printed edition that we corrected are wrapped in corr
    • the start of a new page is marked with pb (<pb n="12" />)
    • the footnote marker is wrapped in ref
    • the footnote text is wrapped in note
  • Inserting verse numbers: on the oXygen toolbar, find the wrench icon next to the red triangle in a circle. Click it, choose "brojevi", and then "Transform now".
  • Check that the newly added verse numbers agree with the existing ones. Save the result of the transformation under the number of the book you are working on.
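The Find/Replace step that wraps each verse in <l> can also be reproduced outside Oxygen. A minimal Python sketch of the same regular expression follows; the function name wrap_verse_lines is illustrative, and the sample text is a placeholder, not text from the edition. Python writes the whole match as \g<0> where Oxygen uses $0.

```python
import re


def wrap_verse_lines(text):
    """Wrap every non-empty line in <l>...</l>, leaving empty lines untouched."""
    # With re.MULTILINE, ^ and $ anchor at each line; .+ needs at least one character,
    # so empty lines are skipped. \g<0> inserts the whole matched line.
    return re.sub(r"^.+$", r"<l>\g<0></l>", text, flags=re.MULTILINE)


if __name__ == "__main__":
    sample = "first verse\nsecond verse\n\nthird verse"
    print(wrap_verse_lines(sample))
```

As in Oxygen, this should be applied only to the verse text (the equivalent of Only selected lines), since headings and summaries would otherwise be wrapped as well. Files with Windows line endings may need normalising first, because a trailing carriage return would end up inside the element.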
 
z/leipzig-workshop-2013.txt · Last modified: 16. 08. 2013. 00:36 by njovanov
 