====== BaseX Adventures ======

A lab diary of getting acquainted with the [[http://docs.basex.org/wiki/Main_Page|BaseX XML database]]. The version in use is the [[http://basex.org/products/download/|Official Release: 7.6 (2013/02/05)]]. Queries are run in the [[http://docs.basex.org/wiki/GUI|BaseX GUI]].

The documents are encoded in [[http://www.tei-c.org/index.xml|TEI XML]]. They contain mostly bibliographical and prosopographical data prepared for the //Mercurius Croaticus// database. The documents will be accessible in a "basex" subdirectory on the [[http://sourceforge.net/projects/croala/|CroALa Sourceforge pages]].

Further experiments use a TEI encoding of a book of decisions by three administrative bodies of medieval Dubrovnik, for the years 1390-1392. The TEI XML file is [[http://sourceforge.net/p/croala/code/146/tree/radno/dbk1390-r6.xml|available on Sourceforge]] (current version: dbk1390-r6.xml), as part of the CroALa Project.

===== Find all filenames in collection =====

In BaseX, a database is addressed in XQuery as a "collection" (through the ''collection()'' function). It has a name and can contain many files. This XQuery returns the names of all files in the //croala// database. The ''substring()'' call is there only for pretty-printing: ''document-uri()'' returns paths such as ''croala/mikac-obsidio.xml'', and ''substring(..., 8)'' drops the first seven characters, i.e. the ''croala/'' prefix, from each of them.

<code>
let $collection := collection("croala")
for $file in $collection
return substring(document-uri($file), 8)
</code>
Result:

<code>
mikac-obsidio.xml
stephanus-confirmatio-crisogoni.xml
babulak-o-ode-matzek.xml
zamanj-b-navis.xml
nn-ianci-epitaph.xml
banic-j-epist-1513-02-16.xml
andreis-f-thurc-her.xml
lipavic-eleg.xml
berislavic-p-epist-1518-04-10.xml
kordic-m-funere-lazzari.xml
frankapan-b-epist-1525-02-15.xml
frankapan-f-epist-1533-07-05.xml
vicic-k-thien.xml
marul-mar-trop.xml
crijev-i-sorgo-1509.xml
utjesenovic-j-epist-1545-02-28.xml
vitezov-ritter-p-epist-marsil.xml
krsava-j-epigram.xml
ian-pan-oratio.xml
nn-vekenega.xml
stay-b-philos.xml
andreis-f-epist-1570-02-02.xml
mazuranic-a-epist.xml
kunic-r-epigr.xml
niger-t-epist-adr-1522.xml
nn-thoma-archid.xml
banic-j-epist-1513-06-04.xml
vicic-k-jess.xml
milasin-f-met.xml
banic-j-epist.xml
brodaric-s-epist-1525-11-30.xml
brodaric-s-epist-1525-09-30.xml
crijev-i-carm-1678.xml
vrancic-a-epist-1553-08-12.xml
gradic-s-palmottae-vita.xml
skerle-n-ep-verh-1798-04.xml
vrancic-a-c-vd.xml
sisgor-g-eleg.xml
dubrovnik-epist-1531-07-22.xml
andronic-trag-elegia.xml
banic-j-epist-1513-05-01.xml
skerle-n-ep-verh-1798-07.xml
baric-a-stat.xml
crnkovic-p-epist.xml
brodaric-s-epist-1526-02-22.xml
marul-mar-epist-1477.xml
skerle-n-ep-verh-1798-10-16.xml
</code>

Etc., etc.

===== Search several files in a collection =====

**Problem**: when we search a collection, the usual XQuery / XPath expression, such as

<code>
//date
</code>

produces zero results.

**Solution**: the expression has to take the namespace into account. The TEI elements are in the TEI namespace, so a name without a prefix does not match them. The ''*:'' wildcard matches the local name in any namespace:

<code>
//*:date
</code>

However, the expression

<code>
//tei:date
</code>

won't work as it stands, because the ''tei'' prefix is not bound to any namespace: "Error: [XPST0081] No namespace declared for "tei:date"."

UPDATE: in XQuery, a namespace is declared like this:

<code>
declare namespace tei = "http://www.tei-c.org/ns/1.0";
</code>

... in the prolog, i.e. at the start of the XQuery program or expression, before the FLWOR expression. After that, the ''tei:date'' form works.

===== Find all mentions of "Dubrovnik" as a place name in the collection =====

XPath, or "search" in BaseX:

<code>
//*:placeName[. = "Dubrovnik"]
</code>

We get **161** results.
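The same search can be written with the declared ''tei'' prefix instead of the ''*:'' wildcard, and wrapped in ''count()'' to get just the number of hits instead of the hits themselves. A minimal sketch that runs on its own: the small inline TEI fragment in ''$sample'' is invented for illustration; against the real data, replace ''$sample'' with ''collection("croala")''.

```xquery
declare namespace tei = "http://www.tei-c.org/ns/1.0";

(: Invented sample data; with the real database use collection("croala") instead of $sample :)
let $sample :=
  <listPerson xmlns="http://www.tei-c.org/ns/1.0">
    <person><birth><placeName>Dubrovnik</placeName></birth></person>
    <person><birth><placeName>Split</placeName></birth></person>
    <person><death><placeName>Dubrovnik</placeName></death></person>
  </listPerson>
return (
  count($sample//tei:placeName[. = "Dubrovnik"]), (: prefix bound in the prolog :)
  count($sample//*:placeName[. = "Dubrovnik"])    (: wildcard, no declaration needed :)
)
```

Both counts agree (2 for this sample): once the prefix is declared, ''tei:'' and ''*:'' select the same TEI elements. The prefixed form is stricter, since ''*:'' would also match a ''placeName'' in some other namespace.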
===== Find in the collection all persons born in Dubrovnik =====

<code>
//*:person[*:birth/*:placeName = "Dubrovnik"]
</code>

The ''*:'' notation is necessary because the TEI elements in the collection are in a namespace (see above). Now we get **76** results. The collection contains 276 persons.

===== Find in the collection all persons who wrote elegies =====

<code>
//*:person[*:occupation[contains(., "elegij")]]
</code>

**49** results, out of 276 persons.

===== Find only the persons connected with Dubrovnik who wrote elegies =====

We use an XQuery FLWOR expression. The ''where'' clause keeps only those elegiac authors that have a ''placeName'' "Dubrovnik" anywhere inside their ''person'' element:

<code>
for $eleg in //*:person[*:occupation[contains(., "elegij")]]
where $eleg//*:placeName[. = "Dubrovnik"]
return $eleg
</code>

**24** results, out of 49 elegiac authors.

===== Return just x results from the found set =====

**Problem**: we need to show just 10 results from a bigger set found in the prosopographical database. These do not have to be the //first// 10 results; we may want to show, e.g., only results 200-209.

**Solution**: we use the ''subsequence()'' function:

<code>
for $person in subsequence(collection("prosop1")//*:person, 1, 10)
return $person
</code>

The second argument, "1", is the starting position (positions are counted from 1); the third, "10", is the number of items to return ("show just ten results").

**Source**: the [[http://en.wikibooks.org/wiki/XQuery/Limiting_Result_Sets#Using_Subsequence_to_Limit_Results|XQuery Wikibook]] (= Wikibooks contributors, "XQuery," Wikibooks, The Free Textbook Project, [[http://en.wikibooks.org/w/index.php?title=XQuery&oldid=2361911]] (accessed June 18, 2012)).

<code>
for $person in subsequence(collection("prosop1")//*:person, 200, 10)
return $person
</code>

Show the 10 results starting from result 200 (i.e. results 200-209).

<code>
for $person in subsequence(collection("prosop1")//*:person, 200, 10)
order by $person/*:death/*:date/@when
return $person
</code>

Show only the 10 results from 200 onwards, sorted by the value of the ''when'' attribute of the ''death/date'' element. Only these 10 persons are sorted, not the whole set.

<code>
for $person in subsequence(collection("prosop1")//*:person, 200, 10)
order by $person/*:death/*:date/@when descending
return $person
</code>

The same set, sorted in descending order.
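In the last two queries only the ten selected persons are sorted. If the goal is instead "results 200-209 of the whole set in order of death date", the sorting has to happen before ''subsequence()'' is applied. A sketch of that variant: the four sample persons, their ''n'' labels and dates are invented so that the query runs on its own; with the real data, iterate over ''collection("prosop1")//*:person'' and use ''subsequence($sorted, 200, 10)''.

```xquery
(: Invented sample data; with the real database iterate over collection("prosop1")//*:person :)
let $people :=
  <listPerson>
    <person n="a"><death><date when="1650"/></death></person>
    <person n="b"><death><date when="1520"/></death></person>
    <person n="c"><death><date when="1799"/></death></person>
    <person n="d"><death><date when="1601"/></death></person>
  </listPerson>
(: 1. sort the whole set by death date :)
let $sorted := (
  for $person in $people/*:person
  order by $person/*:death/*:date/@when
  return $person
)
(: 2. only then cut out a window: here 2 items starting at position 2 :)
for $p in subsequence($sorted, 2, 2)
return string($p/@n)
```

For this sample the sorted order is b, d, a, c, so the window returns "d" and "a". The untyped ''@when'' values are compared as strings, which matches chronological order as long as all dates use the same ISO format (e.g. all ''YYYY'' or all ''YYYY-MM-DD'').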
===== Order by attribute value =====

Select all ''persName'' elements which have a ''ref'' attribute; sort them by the value of the attribute; return the ''persName'' elements.

<code>
for $pers in //*:body//*:persName[@ref]
order by string($pers/@ref)
return $pers
</code>

Variation: order divs by month (the month is expressed as the ''@when'' attribute of the ''date'' element inside each div's ''head''):

<code>
for $mon in //*:div[@ana[. = 'mensis']]
order by string($mon/*:head/*:date/@when)
return $mon/@xml:id
</code>

Even more complex: the collection contains an index of persons and a list of records. Match the names in the records (identified by their ''@ref'' attribute) with the persons (identified by ''@xml:id''). Order the set by the date of the record: the record is a ''div'' element with ''@ana="dies"'', and its date is the ''@when'' attribute of the ''date'' element in the div's ''head''. (All elements belong to the TEI XML set.)

<code>
for $pers in //*:person, $persname in //*:persName
let $date := $persname/ancestor::*:div[@ana[. = 'dies']]/*:head/*:date/@when
where $pers/@xml:id = $persname/@ref
order by $date
return (data($pers/@xml:id), data($date))
</code>

===== Count number of matches, group by that number =====

We have two types of divs, one for months (''@ana="mensis"'') and one for days (''@ana="dies"''). We need to count the days inside each month and to group the resulting counts. Furthermore, the records are of several types, marked by a part of the month div's ''@xml:id'' value; we select just one type (''minor''). The query sorts the months by their number of days, so that equal counts end up next to each other; each count is returned wrapped inside a ''p'' element.

<code>
for $dies in //*:div[@ana[. = 'mensis']][@xml:id[contains(., 'minor')]]
order by count($dies/*:div[@ana[. = 'dies']])
return <p>{count($dies/*:div[@ana[. = 'dies']])}</p>
</code>
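The query above only sorts the months, so identical counts sit next to each other but are not merged. BaseX, which implements much of XQuery 3.0, also offers the ''group by'' clause, which groups the months explicitly and can report how many months have each number of days. A sketch: the three-month TEI fragment and its ''xml:id'' values are invented; against the database, iterate over ''//tei:div'' (or ''//*:div'') as in the query above.

```xquery
declare namespace tei = "http://www.tei-c.org/ns/1.0";

(: Invented sample: three "minor" months with 2, 2 and 3 days :)
let $sample :=
  <body xmlns="http://www.tei-c.org/ns/1.0">
    <div ana="mensis" xml:id="minor-1390-01"><div ana="dies"/><div ana="dies"/></div>
    <div ana="mensis" xml:id="minor-1390-02"><div ana="dies"/><div ana="dies"/></div>
    <div ana="mensis" xml:id="minor-1390-03"><div ana="dies"/><div ana="dies"/><div ana="dies"/></div>
  </body>
for $mensis in $sample/tei:div[@ana = 'mensis'][contains(@xml:id, 'minor')]
let $days := count($mensis/tei:div[@ana = 'dies'])
group by $days
order by $days
(: after grouping, $mensis holds all months that share this number of days :)
return <p days="{$days}" months="{count($mensis)}"/>
```

For the sample this yields one ''p'' per distinct count: ''<p days="2" months="2"/>'' and ''<p days="3" months="1"/>''.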
===== Table rows: meetings, dates, number of votes =====

Meeting records are in divs (with ''@ana="dies"''). Their ''head'' contains the date (''date/@when'') and the number of persons present (''num/@value''; the ''num'' can be wrapped in a ''sic'' element when something is strange in the MS). We want to see how many people were present on which day, and what type of meeting it was (the type is encoded as part of the ''div/@n'' value). The results are returned as HTML table rows (''tr'') and cells (''td'').

<code>
for $dies in //*:div[@ana[. eq 'dies']]
let $balote := $dies/*:head
(: any num in the head, also one wrapped in sic, but not a num inside date; take the last one :)
let $broj := ($balote//*:num[not(parent::*:date)])[last()]/@value
let $dan := $balote/*:date/@when
order by number($broj) descending, $dan
return <tr><td>{data($dies/@n)}</td><td>{data($dan)}</td><td>{data($broj)}</td></tr>
</code>

===== Update files =====

The working collection (database) consists of several TEI XML files. They are edited elsewhere, and we want to update the BaseX database with the new versions. The easiest way seems to be a series of [[http://docs.basex.org/wiki/Commands#REPLACE|REPLACE commands]], one per file:

<code>
replace dbk1390-92idx.xml /home/neven/rad/croala-r/radno/dbk1390-92idx.xml
...
</code>

The first argument is the path of the document in the database, the second the full path of the file on disk; here the document in the database has the same name as the file on disk. REPLACE works on the currently opened database.
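The same updates can also be done from XQuery with ''db:replace()'' from the BaseX Database Module, which is handy when several files have to be refreshed at once. A sketch, not tested on this setup: the database name ''dbk1390'' is hypothetical, and the list of file names has to be extended with the other files of the collection; the directory is the one used in the command above. Each file is parsed with ''doc()'' and passed as a document node; check the Database Module documentation of your BaseX version for the exact signature.

```xquery
(: Hypothetical database name; adjust to the name of the working collection :)
let $db := "dbk1390"
(: Directory on disk that holds the edited TEI files :)
let $dir := "/home/neven/rad/croala-r/radno/"
(: File names; in this setup the name in the database equals the name on disk :)
for $name in ("dbk1390-92idx.xml")
(: Replace the database document $name with the freshly parsed file from disk :)
return db:replace($db, $name, doc(concat($dir, $name)))
```

Unlike the REPLACE command, this does not require opening the database first, because the database name is passed as the first argument.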