Upload and process
Posted by pandorfer
on Nov. 21, 2016, 10:36 a.m. (last update Nov. 21, 2017, 9:07 a.m.)
Upload and process adlibXML
Go to 'Workbench -> Import Texts' and hit the import button.
This will trigger a script called
totei:triggerBatchTrans. This script will
- read in the identifiers of those documents from this adlib-endpoint which have been marked as ready,
- check if this document is already stored in Glaser-TEI at
data/importedand if not, it will request the adlibXML, transform it with the stylesheet
resources/xslt/adlibXMLtoTEI.xslinto a basic XML/TEI document,
- and finally stores it in
resources/xslt/adlibXMLtoTEI.xsl does not much more then mapping (some) fields of adlibXML to the TEI (epidoc) data model but doesn't do anything with the actual transcribed texts. To parse the transcribed and partially annotated texts of the glaser squeezes which can be located at
- Go to 'Workbench -> Check and Process'.
- There you find an overview of all imported texts and a check, if the annotated transcriptions are valid.
- If a text is not valid, report this to the annotators and delete this document (by clicking on delete).
- if a text is valid (this means you can see the transcribed text), then click on process.
This will trigger the script
totei:processDoc which will
- store a copy of the selected document and store it in
- transform the 'plain text markup' into XML/TEI valid markup.
Be aware that documents which have already been processed can't be deleted nor processed any more. This is made on purpose to avoid duplicates as well as to keep the data processing workflow replicable.
The status/progress of the annotation process can be checked at 'Progress -> Check Progress'. This overview lists all imported documents and shows, if those documents have been already processed or marked as done.
By browsing to 'Progress -> Valid?', you can quickly check all documents at any stage of the workflow if they validate against either the TEI or as well as the Epidoc schema. Simply select the schema and collection you would like to check and hit Check!. Ideally everything is green.
In case a document is not valid, the first few (in case there are more) errors will be listed.
After a document was successfully imported and process further, human annotators can start working with the processed documents adding e.g. semantical markup (tagging person, places...). This can be accomplished by
- using eXist-db's inbuilt XML editor eXide,
- or using the standalone (and commercial) XML editor oXygen which connects to the glaser-tei app/database using eXist-db's WebDAV API.
Comment/Edit this post on GitHub.
export blog text