
TEI-XML Software

While working on STEP (Scholarly Text-Editing Platform), we have developed a few software utilities that are useful in other settings, especially within the TEI community. We have therefore decided to make them available to everyone: TEI practitioners first of all, but also other developers working with XML.

The left panel guides you to three distinct utilities: TEI-XML Components, Explore TEI-XML Files, and XML:Lang Utility. They are distinct because they fulfill distinct missions. In the course of developing them, however, we concluded that it was desirable to package them together in one application. TEI-XML Components is by far the most ambitious of the three, but the other two are most serviceable companions. Explore TEI-XML Files and XML:Lang Utility have therefore been made accessible inside TEI-XML Components via a menu called Explore in that app’s menu bar.

TEI-XML Components is intended to be used by anyone with an interest in TEI. Explore TEI-XML Files will be useful to anyone wishing to explore any XML file. The XML:Lang Utility will delight anyone with an interest in encoding languages while learning about them at the same time.



A New Tool to Learn, Navigate, and Teach the TEI-XML Guidelines

TEI-XML Components: A Cross-Platform Software for anyone interested in TEI

Developed by André De Tienne
Director and General Editor
Peirce Edition Project, IU School of Liberal Arts

TEI-XML Components has been developed to help users learn, understand, and navigate among all the TEI components described in the TEI Guidelines swiftly and easily. The pedagogical goal is to help break the barrier that discourages many editors from embracing the TEI concept of encoding texts for the long-term benefit of humanities research. The large XML universe is daunting for most because of its many arcane rules and perceived demands for high-level technical expertise. A particular challenge is getting a good grasp of the many mutual or hierarchical interdependencies within TEI’s XML structure. TEI-XML Components aims to help users, whether beginners or even advanced, to understand the logic of those interdependencies through a navigational system that reveals, more readily than in the online guidelines, connections between elements, attributes, values, attribute classes, datatypes, models, macros, and modules.


Conceived initially and primarily as a companion tool to the Peirce Project’s NEH-funded STEP (Scholarly Text-Editing Platform), this app has been turned into a cross-platform standalone application for Mac, Windows, and Linux. Designed as a professional tool for serious scholarship, it is neither intended nor designed for use on iOS or Android devices. It is best viewed on screens taller than 7": text encoding is work that requires larger screens.

The initial purpose of this software was to develop a tool capable of importing the myriad TEI data and metadata directly from the repositories into STEP’s companion tools in order to make them not only TEI-compliant but more robustly TEI-conformant in all particulars. Along the way, the app grew into something even more useful thanks to the way in which it incorporated and consolidated the large variety of TEI components along with their metadata in a single place.

The TEI website provides users full access to all of those components via its table of contents and especially the appendices in that table. Its interface does a very nice job displaying the details regarding those components once users have clicked their way to them. The difficulty is that information about those components is distributed across hundreds of webpages, which makes it hard to develop a clear picture of the logic of interrelations or cross-dependencies among those components and their classes. Ordinary TEI encoders may know how to find information about a particular XML element (or tag) and look up what attributes it accepts, but they often lack a clear sense of the rules governing its use: in what context, within what attribute class or element model, and under what imperatives or recommendations formulated to accommodate particular scholarly needs or standards in a given discipline of the humanities, as manifested in one of the TEI’s 21 modules. They are also unlikely to find out easily what kind of values certain attributes or attribute classes accept, or how to format those values according to the standard syntax that lets an XML processor attuned to TEI XML schemas interpret and process them when laying out an encoded text or answering XQuery queries based on its encoding. The TEI website provides answers to such questions, but navigating it and finding one’s way across hundreds of scrollable documents takes hours of clicking, often in the wrong place for lack of intuitive logic.

TEI-XML Components makes all of this much easier. All eight classes of components are brought within a single front-end interface headed by clearly labeled tabs. Clicking these tabs (the first eight in the top row) provides instant access to all TEI components that correspond to that label. Clicking any of these components brings into view everything that the TEI Guidelines allow users to know about it. Relevant information is distributed among multiple fields that are clearly labeled. All other components that are related in one way or another to the component under review (as parent, child, sibling, class, datatype, or module) are displayed in the form of links. Users need only click such links to be instantaneously transported to the exact place that discusses it within the app.

Another service provided by TEI-XML Components is a special interface (under the + tab) that lets users create and/or check attribute values in full conformity with the standard syntax those values are expected to comply with. Such syntax is defined through datatypes (“teidata”), some of which come with regular-expression formulas that are used to validate the well-formedness of such values. The app checks automatically whether submitted values pass those algorithmic tests; if they don’t, the app explains what is wrong. It includes a sub-interface specialized in forming accurate machine-readable values for durations, dates, and times—those are especially tricky, but the app makes them a breeze.
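To give a concrete sense of what a regular-expression check on an attribute value looks like, here is a minimal Python sketch that tests whether a string has the shape of an ISO 8601 duration. The pattern and the function name check_duration are assumptions made for illustration; they are not the app’s own code, nor the pattern attached to any TEI datatype.

```python
import re

# Illustrative pattern for ISO 8601 durations such as "P1Y2M3DT4H5M6S" or "PT30M".
# Assumption for demonstration only: this is not the TEI datatype pattern.
ISO_DURATION = re.compile(
    r"P(?=\d|T\d)"                                        # "P" plus at least one component
    r"(?:\d+Y)?(?:\d+M)?(?:\d+W)?(?:\d+D)?"               # date part, in fixed order
    r"(?:T(?=\d)(?:\d+H)?(?:\d+M)?(?:\d+(?:\.\d+)?S)?)?"  # optional time part after "T"
)

def check_duration(value: str) -> str:
    """Return "ok", or a short explanation of why the value is rejected."""
    if not value.startswith("P"):
        return 'a duration must start with "P"'
    if ISO_DURATION.fullmatch(value) is None:
        return "components are missing, misordered, or malformed"
    return "ok"

for candidate in ["P1Y2M3DT4H5M6S", "PT30M", "1Y", "P1M2Y"]:
    print(candidate, "->", check_duration(candidate))
```

The last two candidates are rejected: the first lacks the leading "P", and the second places the month before the year.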

TEI-XML Components is conceived as a companion tool to the TEI website. The latter is actually embedded within the app, both through certain links and, directly and conveniently, through an internal web browser. A copyright statement (under the last tab, labeled ©) clarifies the extent of TEI’s intellectual property within the application. Most important, the app is fully updatable: the TEI Guidelines are updated every six months, and a set of simple commands in the app’s menu bar allows users to update the app in a few minutes.

The app comes with a comprehensive, abundantly illustrated User Guide in the form of a PDF, which is viewable within the app via the Help menu. Every object in the app (fields, buttons, widgets), when visited by the mouse pointer, displays a helpful tooltip that briefly describes its purpose. The User Guide can be downloaded by clicking this URL.




Explore TEI-XML Files

Explore TEI-XML Files is a second utility, accessible from within TEI-XML Components via its Explore menu. This second utility facilitates the navigation of XML files and the extraction of encoded data from them. “Facilitation” means that users do not need to master programming languages such as XQuery to query the XML. Once an XML file has been downloaded, imported, or pasted into Explore TEI-XML Files, users need only use pull-down menus to examine complete alphabetized lists of tag elements, attributes, and values found in the XML file. Users then select any of those XML components in any order to display their related contents or related encodings (as the case may be) in the interface’s bottom field. Clicking any line in that bottom field selects and displays the related encoding in the XML file in the top field.
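For readers curious about what such an inventory involves, the following Python sketch builds alphabetized lists of the element names, attribute names, and attribute values found in a small XML sample, using only the standard library. It is a rough approximation under stated assumptions, not the utility’s actual algorithm: the sample document and the helper names local and inventory are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, invented for this illustration.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text><body>
    <p n="1">A <name type="person">Peirce</name> letter.</p>
    <p n="2">To <name type="place">Milford</name>.</p>
  </body></text>
</TEI>"""

def local(name: str) -> str:
    """Strip a "{namespace}" prefix from an ElementTree tag or attribute name."""
    return name.rsplit("}", 1)[-1]

def inventory(xml_text: str):
    """Return alphabetized lists of element names, attribute names, and values."""
    root = ET.fromstring(xml_text)
    elements, attributes, values = set(), set(), set()
    for el in root.iter():                  # visits every element in document order
        elements.add(local(el.tag))
        for attr, value in el.attrib.items():
            attributes.add(local(attr))
            values.add(value)
    alphabetize = lambda names: sorted(names, key=str.casefold)
    return alphabetize(elements), alphabetize(attributes), alphabetize(values)

elements, attributes, values = inventory(SAMPLE)
print(elements)
print(attributes)
print(values)
```

Sorting with str.casefold keeps the lists alphabetical regardless of case, so that TEI sorts among the lowercase element names rather than before them.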

While Explore TEI-XML Files accommodates any XML file, some of its algorithms have been optimized to handle TEI-XML files especially well, for the main impulse behind this utility comes from a desire to facilitate TEI-XML transactions, so to speak. Shortcut buttons are provided to move from a selected element, attribute, or value to their full descriptive display in TEI-XML Components.

The last eight pages of the TEI-XML Components User Guide explain and illustrate how to use Explore TEI-XML Files.




Explore the XML:Lang Utility App

The XML:Lang Utility is a third application, accessible from within TEI-XML Components via its Explore menu. This third utility may look small but provides many services. It was born from the simple desire to help XML practitioners fill in correct values for the ubiquitous xml:lang attribute—the attribute in charge of identifying the language in which any portion of a text has been written or spoken. Behind that attribute is a whole universe of global scholarship with a long history driven by the research of linguists and ethnologists. The need to identify languages correctly is paramount for research.

Identifying a language is no easy task, especially because each language tends to evolve and vary greatly across space and time. The need for standardization is global, especially when considering the duty to create sharable encodings that remain valid and trustworthy over the long haul.

The IANA (Internet Assigned Numbers Authority) has established the “Language Subtag Registry”, a large database that provides a unique identifier for thousands of languages, dialects, idioms, scripts, and orthographies. That Registry is built upon ISO 639-1, ISO 639-2, and ISO 639-3. The W3C’s internationalization effort recommends the use of the IANA Registry for selecting codes for languages.

The values of the @xml:lang attribute need to comply with those codes. Such codes have a particular structure that takes into consideration genealogical dependencies among languages, their country or region of practice, their written rendition, and their variations. The Registry provides that type of information, frequently with additional comments and cross-references, for all registered languages and scripts.
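The structure of such codes can be sketched in a few lines of Python. The pattern below is a deliberately simplified reading of the tag syntax (language, optional extended language, script, region, and variants); it ignores extensions, private-use subtags, and grandfathered tags, and it checks only the shape of a tag, not whether its subtags are actually registered. The function name split_tag is hypothetical.

```python
import re

# Simplified shape: language[-extlang][-Script][-REGION][-variant...]
# Assumption: extensions, private-use subtags, and grandfathered tags are left out.
TAG = re.compile(
    r"(?P<language>[A-Za-z]{2,3})"
    r"(?:-(?P<extlang>[A-Za-z]{3}))?"
    r"(?:-(?P<script>[A-Za-z]{4}))?"
    r"(?:-(?P<region>[A-Za-z]{2}|[0-9]{3}))?"
    r"(?P<variants>(?:-(?:[A-Za-z0-9]{5,8}|[0-9][A-Za-z0-9]{3}))*)"
)

def split_tag(tag: str) -> dict:
    """Split a language tag into its named subtags; raise ValueError on a bad shape."""
    m = TAG.fullmatch(tag)
    if m is None:
        raise ValueError(f"not a recognizable language tag: {tag!r}")
    parts = {name: text for name, text in m.groupdict().items() if text}
    if "variants" in parts:
        parts["variants"] = parts["variants"].strip("-").split("-")
    return parts

for tag in ["grc", "sgn-gss", "zh-Hant-TW", "de-CH-1901"]:
    print(tag, "->", split_tag(tag))
```

A real checker would also look each subtag up in the Registry, since a string can have a valid shape without naming any registered language.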

The Registry is a work in progress and depends on agreements among linguists and ethnologists. Classifying a language, whether dead, endangered, or alive, is a complex phylogenetic matter. The whole endeavor is utterly fascinating, and it is that fascination that led this third application to do a lot more than merely provide the correct and well-structured code for any registered language. The app also helps users discover languages and dialects, and uses TEI-XML Components’ internal web browser to display more information than is available in the Registry itself.

In the XML universe, use of the attribute @xml:lang can help disambiguate words across languages. The word “pain”, for instance, if encoded <w xml:lang="fr">pain</w>, is a French word that means “bread” in English but, if encoded <w xml:lang="en">pain</w>, is an English word that means “douleur” in French. The IANA registry allows users to indicate the language of a text most precisely: an Ancient Greek word (up to 1453) will call for the "grc" code, while a modern Greek word will need the "el" code, not to mention Cappadocian Greek ("cpg"), Mycenaean Greek ("gmy"), and Romano-Greek ("rge"). A performance done in Greek Sign Language would be encoded "gss", and even more precisely, though not indispensably, "sgn-gss". The app makes all such distinctions very plain, and one pedagogical advantage is to excite curiosity within the minds of students and other learners, while increasing their historical and linguistic sensitivity.
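The “pain” example can be tried out with Python’s standard XML library, as in the short sketch below. It reads each word’s language from xml:lang and, where a word carries no such attribute, falls back on the value of the nearest enclosing element, which is how xml:lang is meant to apply. The sample sentence, the <s> wrapper, and the function name words_with_language are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:lang under the reserved XML namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hypothetical sample: the second <w> has no xml:lang of its own.
SAMPLE = '<s xml:lang="en">No <w xml:lang="fr">pain</w>, no <w>pain</w>.</s>'

def words_with_language(xml_text: str):
    """Yield (word, language) pairs; a <w> without xml:lang inherits its parent's value."""
    root = ET.fromstring(xml_text)

    def walk(el, inherited):
        lang = el.get(XML_LANG, inherited)   # own value, else the inherited one
        if el.tag == "w":
            yield el.text, lang
        for child in el:
            yield from walk(child, lang)

    yield from walk(root, None)

pairs = list(words_with_language(SAMPLE))
print(pairs)
```

The two identical strings come out tagged "fr" and "en" respectively, which is exactly the information a query or a glossing tool needs to tell bread from douleur.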

The app comes with its own downloadable User Guide, which explains how to use the XML:Lang Utility with plenty of illustrations. That user guide is accessible directly within TEI-XML Components via a command in its Help menu. The command displays the user guide in TEI-XML Components’ own internal PDF viewer, which comes with a menu that helps navigate each section of the guide. Following is a sample of illustrations.




Download TEI-XML Components

There are five versions of TEI-XML Components: one for macOS (64 bit), two for Windows (32 and 64 bit), and two for Linux (32 and 64 bit). The Mac and Windows versions provide identical functionalities and work in the same way. The Linux versions work the same except for the app’s internal browser, which is not compatible; the app in this case launches the Linux computer’s default browser instead. All versions include the two companion apps, Explore TEI-XML Files and XML:Lang Utility. The software was developed on a Mac, which gives the interface an aesthetic quality that unfortunately cannot be matched in Windows or Linux.

The software is provided free of charge under a Creative Commons BY-NC-ND license defined in a document that accompanies the software.

Click one of the buttons below to download the software. Version 1.3.3 was released on November 3, 2021.

On a Mac, double-click the zip file. Install the application preferably inside the Documents folder, not within the Applications or Program folder.

Once the software is installed, open the folder “TEI-XML Components 1.3.3”; within it there are a Read_Me_First file, a license file, and the application. RIGHT-CLICK the application’s icon (a modified version of the TEI icon) and select Open in the dropdown menu. You will likely need to give permission to the software to run on your Mac. A splash screen will come and go quickly, and then the application will come into view.

In Windows, double-click the zip file. Install the Windows32 or Windows64 folder preferably inside the Documents folder.

Once the software is installed, open the folder; within it there are a Read_Me_First file, a license file, and the folder “TEI-XML Components 1.3.3” that contains the application. Follow the installation instructions in the Read_Me_First file.

In Windows, if the zip file or folder appears in green letters in the directory, that is because it remains encrypted. Right-click it, choose “Properties” at the bottom of the pop-up dialog, click the “Advanced...” button under the “General” tab, uncheck the checkbox “Encrypt contents to secure data,” then click OK. This will decrypt the file, and the filename will turn black. You will need to give the software permission to run on your computer.

In Linux, double-click the zip file. Install the Linux32 or Linux64 folder preferably inside the Documents folder.

Once the software is installed, open the folder; within it there are a Read_Me_First file, a license file, and the folder “TEI-XML Components 1.3.3” that contains the application. Follow the installation instructions in the Read_Me_First file.

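When launching the Linux version of the software, permission may be required. Make the application file executable with the command chmod +x "TEI-XML Components" (the quotation marks are needed because the file name contains a space), then launch it.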

For all other details, please read the User Guide, accessible through the Help menu in TEI-XML Components, or viewable at this URL.

