The TEI Technical Council
TEI TCW 32:
Customizing TEI to Check Pointers
Text Encoding Initiative Consortium
2022

Introduction

As with any other part of document production, when using URIs (web pointers) it is advantageous to check that they ‘work’ early in the production process. The aspects of a pointer we would want to test, in increasing order of difficulty, are that the URI:

  1. points to something (i.e., has proper syntax)
  2. points to something that can be retrieved (i.e., refers to an object that exists and to which the user has access)
  3. points to the right kind of thing (e.g., the ref of <persName> should point to a <person>, not a <fileDesc> or an SVG chart of COVID-19 cases)
  4. points to the right thing (e.g., the "#MLK" of <persName ref="#MLK">Martin Luther King, Jr.</persName> should point to an entry for Martin Luther King, Jr., not an entry for Lois Lane).

Most of the main closed schema languages, including PureODD and RELAX NG (but excluding DTDs), have the capability to test that an information item (in our case, and thus hereafter, an attribute value) meets the syntactic constraints of a URI. That is, we can test (1) above just by using our schemas. But none of those languages (including PureODD without use of <constraintSpec>) have the capability to test (2)–(4).

The open schema language used by TEI (ISO Schematron) does allow for (2) and (3). However, a) it is difficult to associate the needed tests with all teidata.pointer attributes, and b) it requires special-case coding for (3), depending on what is considered ‘right’ in the current circumstances. In any case, (4) is much more difficult, and usually requires a human, if not a human with domain-specific knowledge.

Thus the TEI Technical Council does not plan to include any general tests for (2)–(4) directly in TEI P5, at least not in the near future. However, recognizing that at least (2) and (3) represent an important subset of the validation tests that projects want to perform, we exemplify herein some mechanisms for performing them. The intent is that projects can copy and paste ODD fragments from this document into their own customizations, and modify them as needed to suit local requirements.

Note, however, that this document only discusses testing of URIs that are intended to return an XML document or fragment thereof. It is possible to use URIs to point to and retrieve other sorts of information objects including JSON, images, HTML (other than XHTML), audio files, word processor files, etc., but we do not consider these cases herein.

Also note that the examples in this document often use an attribute as the context node of an ISO Schematron rule. Some Schematron processors fail to process this correctly. (The oXygen XML Editor handles this perfectly well, as does David Maus’s SchXslt.)

There are over 100 attributes in TEI defined as teidata.pointer.1 Roughly 60 of those are restricted to having only 1 URI as a value, but the other nearly 50 permit multiple URIs in the value. Processing multiple pointers is much more difficult than handling a single URI, and thus we consider the singleton cases separately from the cases in which multiple URI values of a single pointer attribute need to be tested.

We first discuss cases where items are pointed to directly (first in the singleton case, then with multiple pointers), and then cases where an item is referred to indirectly, either via an intermediate <link> or <alt> element, or via a prefix defined in a <prefixDef>. Finally, we demonstrate how to ensure that validation is performed after XInclude processing.

1. Direct reference by single value

Projects that intend always to use exactly one URI as the value of an attribute that by default may take multiple URIs will probably find it best to constrain that attribute to a single URI. For example, suppose the (fictional) project ‘The Papers of Dr. Virgil Swann’ provides correspondence links between Dr. Swann’s transcription of an intercepted Kryptonian message and his English translation of it: each <s> element with an xml:lang of "en" would bear a corresp attribute pointing to the corresponding <s> element with an xml:lang of "x-kr". The TEI corresp attribute allows one or more URIs, but for this project the corresp of <s> should be limited to exactly one. This can be accomplished as shown in Figure 1, PureODD to limit attribute to one URI.

1.1. shorthand pointer fragment identifier

The fragment identifier portion of a URI is that which follows the first # (read left-to-right). The portion of the URI before the first # locates the document of interest; the portion after the first # locates the element of interest within said document. The portion after the first # can take many forms, only one of which we consider in this document: the shorthand pointer fragment identifier. It is probably by far the most common form of fragment identifier; it refers to an element in the document in question by referring to its ID. (In TEI the ID is always indicated by the xml:id attribute.) This typically looks like either

https://www.w3.org/TR/xptr-framework/#shorthand

or

#DeRoseetal1990

If there is no document mentioned to the left of the first #, then the document being referred to is the base document, which is typically the current document in which the URI appears (but which can be modified by the xml:base attribute).
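As a rough illustration outside of TEI tooling, the split described above can be sketched in Python with the standard library's urllib.parse.urldefrag, which separates a URI at its first # into the document part and the fragment identifier. The function name split_pointer is hypothetical; an empty document part signals a same-document reference.

```python
from urllib.parse import urldefrag


def split_pointer(uri):
    """Split a pointer into (document part, fragment identifier).

    urldefrag() splits at the first '#'. An empty document part means the
    pointer refers into the base document (usually the current document).
    """
    document, fragment = urldefrag(uri)
    return document, fragment


print(split_pointer("https://www.w3.org/TR/xptr-framework/#shorthand"))
print(split_pointer("#DeRoseetal1990"))
```

Note that this sketch does not take xml:base into account; it only separates the two parts of the string.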

First, just to demonstrate how this works in general, we show in Figure 2, Ensure ref of g is a shorthand pointer how to test that the ref attribute of <g> has the correct syntax to point to something in the same file. However, we also note that this check could be expressed in PureODD without resorting to Schematron. This PureODD-only mechanism, exemplified in Figure 3, Ensure ref of g is a shorthand pointer, PureODD, has the sizable advantage that the testing is performed by the closed schema language. Note, however, that although this use of the restriction attribute is supported by the TEI’s current ODD processor, other uses may not be; in particular, restriction is currently only usable with the name attribute, not the key attribute of <dataRef>.
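One subtlety of syntax checks is whether the pattern must match the whole value or merely some part of it. An XSD pattern facet (which is what the restriction attribute produces) must match the entire value, while XPath's matches() succeeds if any substring matches. The Python sketch below illustrates the difference with fullmatch() and search(). It is only an approximation: Python's re module has no \i or \c escapes, so the hypothetical SHORTHAND pattern is an ASCII-only stand-in for #\i\c*.

```python
import re

# ASCII-only approximation of '#\i\c*'; real XML names may also contain
# many non-ASCII letters, which this pattern does not accept.
SHORTHAND = re.compile(r"#[A-Za-z_:][A-Za-z0-9_:.\-]*")


def whole_value_is_shorthand(value):
    """Like an XSD pattern facet: the entire value must match."""
    return SHORTHAND.fullmatch(value.strip()) is not None


def value_contains_shorthand(value):
    """Like XPath matches() without ^ and $: any substring may match."""
    return SHORTHAND.search(value) is not None


for value in ["#duck", "duck", "glyphs.xml#duck"]:
    print(value, whole_value_is_shorthand(value), value_contains_shorthand(value))
```

The last value passes the substring test but fails the whole-value test, which is why a matches() test meant to constrain an entire attribute value needs the ^ and $ anchors.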

To check that a shorthand pointer fragment identifier URI points to something that can be retrieved (i.e., to test a pointer that looks like #duck for (2), above) we can take advantage of the XPath id() function. This technique is used in Figure 4, Check that ref of g points to something to ensure that the ref of a <g> actually points to something in the current document. A call to id('tho') returns the element from the same document, if there is one, that bears an xml:id of "tho". There is supposed to be at most one such element; if there were two or more, only the first is returned. Thus id('tho') may be thought of as (//*[@xml:id eq 'tho'])[1]; the parentheses matter, because without them the [1] would select the first matching child of each parent rather than the first match in the whole document.2
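The lookup that id() performs can be approximated outside of Schematron. The Python sketch below, using only the standard library's xml.etree.ElementTree, returns the first element in document order whose xml:id matches. The function name lookup_id and the toy document are hypothetical, and unlike id() the sketch ignores any other attributes that a DTD or schema might declare to be IDs.

```python
import xml.etree.ElementTree as ET

XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def lookup_id(root, ident):
    """Return the first element, in document order, whose xml:id is ident,
    or None; roughly what XPath id(ident) returns for a single ID."""
    for element in root.iter():  # iter() walks the tree in document order
        if element.get(XML_ID) == ident:
            return element
    return None


# A toy document (hypothetical content) with one <char> and one <g>
doc = ET.fromstring(
    '<TEI xmlns="http://www.tei-c.org/ns/1.0">'
    '<charDecl><char xml:id="tho"/></charDecl>'
    '<p><g ref="#tho"/></p></TEI>'
)
g = doc.find(".//{http://www.tei-c.org/ns/1.0}g")
ref = g.get("ref").strip()[1:]  # drop the leading '#', like substring(., 2)
print(lookup_id(doc, ref) is not None)
```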

To check that a shorthand pointer fragment identifier URI specifically points to a particular element type (in this case <char> or <glyph>), we simply append a node test, as seen in Figure 5, Check that ref of g points to a char or a glyph. Note that references to TEI elements in the XPath expressions in the Schematron inside <constraint> need to be explicitly bound to the TEI namespace. Although an ODD author may define any namespace prefix for the purpose (using the Schematron <ns> element), TEI ODD software will automatically insert a definition that binds the prefix tei: to the TEI namespace.
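The element-type check can be sketched the same way. In ElementTree a namespaced element's tag is written {namespace}local-name, so comparing against the TEI-namespaced tags plays the role of self::tei:char | self::tei:glyph; an element named char in no namespace would not match, just as an unprefixed self::char would fail to match a TEI <char>. The function describe_target is hypothetical; like the sch:value-of in Figure 5, it reports the local name of a wrong target.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"
# ElementTree writes namespaced names as {namespace}local, so comparing
# against these plays the role of self::tei:char | self::tei:glyph.
ALLOWED = {f"{{{TEI_NS}}}char", f"{{{TEI_NS}}}glyph"}


def describe_target(root, ref):
    """Return 'ok' if ref ('#id') points to a TEI <char> or <glyph>;
    otherwise the local name of what it points to, or 'nothing'."""
    ident = ref.strip()[1:]
    for element in root.iter():
        if element.get(XML_ID) == ident:
            if element.tag in ALLOWED:
                return "ok"
            return element.tag.rpartition("}")[2]  # like local-name()
    return "nothing"


doc = ET.fromstring(
    f'<TEI xmlns="{TEI_NS}"><charDecl><glyph xml:id="tho"/></charDecl>'
    '<p xml:id="p1"><g ref="#tho"/><g ref="#p1"/><g ref="#nope"/></p></TEI>'
)
for g in doc.iter(f"{{{TEI_NS}}}g"):
    print(g.get("ref"), describe_target(doc, g.get("ref")))
```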

1.2. local filesystem

A URI may refer to a file on the local filesystem using either an absolute-path reference or a relative-path reference.3 An absolute-path reference starts with a slash; a relative-path reference does not. A relative-path reference may start with a dot segment (./), unless the first segment of its path contains a colon, in which case it must start with a dot segment (otherwise the text before the colon would be read as a URI scheme).
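The colon rule can be seen in action with Python's urllib.parse.urlsplit, used here as a stand-in for a full RFC 3986 parser (an approximation, not a validator). The hypothetical filename notes:2022.xml is read as having the scheme "notes" unless it is written with a leading ./ dot segment. The function classify_reference is likewise hypothetical.

```python
from urllib.parse import urlsplit


def classify_reference(ref):
    """Roughly classify a reference along the lines of RFC 3986."""
    parts = urlsplit(ref)
    if parts.scheme:
        return "URI with scheme"  # e.g. file:, https:, or a misread path
    if ref.startswith("//"):
        return "network-path reference"
    if parts.path.startswith("/"):
        return "absolute-path reference"
    return "relative-path reference"


for ref in ["/home/swann/persons.xml", "persons.xml#MLK",
            "notes:2022.xml", "./notes:2022.xml"]:
    print(ref, "->", classify_reference(ref))
```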

The URI that refers to a file on the local filesystem may have a # followed by a fragment identifier. Here we first consider testing that a URI points to a local file, then that it points to a local file with a particular file extension, then that it points to a local file with a particular root element, and last that it points to a particular element type in a local file.

1.2.1. entire local file

A <moduleRef> element is typically used to refer to one of the TEI modules (for example, "core", "gaiji", or "namesdates") from a customization ODD file using its key attribute. But <moduleRef> can also refer to a non-TEI module using its url attribute. The TEI schema ensures that the value of the url attribute is, in fact, a URI (that is, it performs test (1)). It would be quite reasonable for a project to want to check that the value of url was a URI that referred to an existing, readable, local XML file.

It is possible to write generic constraints for this purpose: for example, constraints that allow any valid URI that refers to a local file (whether an absolute-path reference, a relative-path reference, or a file: URL; see Figure 20, Test for any local URI), or that require a value to use a private URI scheme prefix defined by a <prefixDef> (see Figure 21, ref uses defined prefix). However, in most cases projects would probably want to constrain the value to a particular method of referring to an existing, readable, local XML file (if not to a particular file, for which see Figure 6, Require the url of moduleRef to refer to a particular file). For example, Figure 7, Check that url of moduleRef refers to an RNG file in the same directory demonstrates a method for requiring that the url attribute of <moduleRef> refer to a local file that is in the same directory as the instance ODD and whose filename ends in ".rng" (and thus is probably a RELAX NG file in the XML syntax).
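Outside of the schema, the same-directory rule can be sketched in Python. This is a minimal sketch under stated assumptions: the hypothetical pattern SAME_DIR_RNG requires ./ followed by a single path segment (no / or #) ending in .rng, which keeps the file in the ODD's own directory, and the hypothetical function check_module_url then resolves the value against the ODD's directory, not the current working directory, and checks that a file is there.

```python
import re
import tempfile
from pathlib import Path

# Hypothetical rule: "./", then a single path segment (no '/' or '#'),
# ending in ".rng"; fullmatch() mirrors the whole-value XSD pattern.
SAME_DIR_RNG = re.compile(r"\./[^/#]+\.rng")


def check_module_url(url, odd_file):
    """Return 'ok', 'bad syntax', or 'missing' for a moduleRef url value."""
    if not SAME_DIR_RNG.fullmatch(url):
        return "bad syntax"
    # Resolve against the ODD's directory, not the current working directory
    target = Path(odd_file).parent / url
    return "ok" if target.is_file() else "missing"


with tempfile.TemporaryDirectory() as folder:
    odd = Path(folder, "swann.odd")
    Path(folder, "kryptonian.rng").write_text("<grammar/>")
    demo = {url: check_module_url(url, odd)
            for url in ["./kryptonian.rng", "./martian.rng",
                        "./sub/kryptonian.rng", "kryptonian.rng"]}
print(demo)
```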

We may wish to ensure that the file referred to (whether local or remote) is readable, well-formed XML. Luckily, XPath provides the doc-available() function for this very purpose. The constraints demonstrated in Figure 8, Ensure file is readable XML first ensure that the information item referred to by the url attribute is a file, not an element, and then require that the file be readable, well-formed XML. The example at Figure 9, Ensure file is readable RELAX NG grammar duplicates these constraints, and also tests that the outermost element of the retrieved file is an <rng:grammar> element.
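For readers who want to experiment outside Schematron, the two checks just described can be approximated in Python: "readable, well-formed XML" becomes "xml.etree.ElementTree can open and parse it", and the RELAX NG check compares the outermost element against {http://relaxng.org/ns/structure/1.0}grammar. This is a sketch, not a reimplementation of doc-available(); in particular it handles only local paths and file-like objects, not remote URLs. The function names are hypothetical.

```python
import io
import xml.etree.ElementTree as ET

RNG_NS = "http://relaxng.org/ns/structure/1.0"


def load_xml(source):
    """Return the parsed tree, or None if the source cannot be read or is
    not well-formed XML (roughly the cases where doc-available() is false)."""
    try:
        return ET.parse(source)
    except (OSError, ET.ParseError):
        return None


def doc_available(source):
    return load_xml(source) is not None


def is_relaxng_grammar(source):
    """Readable, well-formed XML whose outermost element is rng:grammar."""
    tree = load_xml(source)
    return tree is not None and tree.getroot().tag == f"{{{RNG_NS}}}grammar"


print(doc_available("no/such/module.rng"))
print(doc_available(io.StringIO("<grammar>")))
print(is_relaxng_grammar(io.StringIO(f'<grammar xmlns="{RNG_NS}"/>')))
```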

1.2.2. element of a local file

The ref attribute of <persName> should typically point to a <person> element, which will often be in a separate ‘personography’ file. Presuming that file is in a known location in the local filesystem, Figure 10, Require persName to refer to a person in the local personography can be used to test that the ref attribute refers to a <person> in that file.
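The personography check combines the pieces above: split the pointer at #, resolve the file part relative to the instance document (as resolve-uri() with base-uri(/) does in Schematron), parse that file, find the element with the given xml:id, and test that it is a TEI <person>. The Python sketch below is illustrative only; the function check_persname_ref and the toy persons.xml content are hypothetical.

```python
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path

TEI = "{http://www.tei-c.org/ns/1.0}"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def check_persname_ref(ref, instance):
    """Check that ref ('file#id') points to a TEI <person> in that file,
    resolving the file part against the instance document's directory."""
    file_part, _, ident = ref.partition("#")
    try:
        root = ET.parse(Path(instance).parent / file_part).getroot()
    except (OSError, ET.ParseError):
        return "unreadable file"
    for element in root.iter():
        if element.get(XML_ID) == ident:
            if element.tag == TEI + "person":
                return "ok"
            return "a " + element.tag.rpartition("}")[2]
    return "nothing"


with tempfile.TemporaryDirectory() as folder:
    Path(folder, "persons.xml").write_text(
        '<TEI xmlns="http://www.tei-c.org/ns/1.0">'
        '<listPerson><person xml:id="MLK"/></listPerson>'
        '<listPlace><place xml:id="Metropolis"/></listPlace></TEI>')
    letter = Path(folder, "letter.xml")  # the (hypothetical) instance
    results = {ref: check_persname_ref(ref, letter)
               for ref in ["persons.xml#MLK", "./persons.xml#Metropolis",
                           "persons.xml#LoisLane", "people.xml#MLK"]}
print(results)
```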

1.3. remote pointers

Checking remote pointers is in principle very similar to checking local pointers, but with different tests on the syntax of the URI. (Remote pointers are also, of course, somewhat harder to test, because you must have a working internet connection or use an XML Catalog instead, and must contend with firewalls, proxies, same-origin restrictions, cached files, etc.) For example, Figure 11, Ensure uri of equiv is present and refers to an item on a particular page ensures that every <equiv> element has a uri attribute, and further that said attribute refers to the (fictional) ‘markup_taxonomy’ page of the WWP website, allowing a reference to that page on either the production or the test version of the website, on either the northeastern.edu or the neu.edu domain, and with or without a secure (HTTPS) connection. Note that this example only checks the syntax of the uri attribute; it does not ensure that such a page, or such an element on that page, actually exists.
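To see exactly which pages a syntax-only pattern of this kind admits, it can help to enumerate them. The Python sketch below uses a pattern along the lines of the one in Figure 11, with the dot before xhtml escaped so that it matches only a literal dot, and with an ASCII-only stand-in for \i\c* (Python's re module has no \i or \c escapes). The variable names are hypothetical; the sketch confirms that 2 schemes × 2 host names × 2 domains yield eight page URLs.

```python
import itertools
import re

# A pattern along the lines of Figure 11's restriction, with an ASCII-only
# stand-in for \i\c* (Python's re module has no \i or \c escapes).
EQUIV_URI = re.compile(
    r"https?://www\.wwp(-test)?\.(northeastern|neu)\.edu"
    r"/markup_taxonomy\.xhtml#[A-Za-z_:][A-Za-z0-9_:.\-]*"
)

# Every combination of scheme, host name, and domain the pattern allows
pages = [
    f"{scheme}://www.{host}.{domain}.edu/markup_taxonomy.xhtml"
    for scheme, host, domain in itertools.product(
        ["http", "https"], ["wwp", "wwp-test"], ["northeastern", "neu"])
]
print(len(pages))
for page in pages:
    print(page, bool(EQUIV_URI.fullmatch(page + "#emph")))
```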

The example in Figure 12, Require a filter on equiv that points to an XSLT program, on the other hand, ensures that the filter attribute of <equiv> points to an XSLT program. It does this by testing either that the outermost element of the retrieved file is in the XSLT namespace (because the outermost element of a ‘normal’ XSLT program could be either <xsl:stylesheet> or <xsl:transform>), or that the outermost element has an xsl:version attribute (because a simplified stylesheet must have such an attribute). Thus, in order for a filter attribute to pass this test, it not only needs the right syntax, but must also point to an XSLT program that exists and is accessible via the web.
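The same two-way test can be sketched in Python: parse the file, then accept it if the outermost element is in the XSLT namespace or carries an xsl:version attribute. This sketch (with the hypothetical function is_xslt) reads only local files or file-like objects; retrieving a remote filter over HTTP is deliberately left out.

```python
import io
import xml.etree.ElementTree as ET

XSLT_NS = "http://www.w3.org/1999/XSL/Transform"


def is_xslt(source):
    """True if source parses as XML and its outermost element is either in
    the XSLT namespace or carries an xsl:version attribute."""
    try:
        root = ET.parse(source).getroot()
    except (OSError, ET.ParseError):
        return False
    in_xslt_namespace = root.tag.startswith(f"{{{XSLT_NS}}}")
    simplified_stylesheet = f"{{{XSLT_NS}}}version" in root.attrib
    return in_xslt_namespace or simplified_stylesheet


print(is_xslt(io.StringIO(
    f'<xsl:transform xmlns:xsl="{XSLT_NS}" version="3.0"/>')))
print(is_xslt(io.StringIO(
    f'<html xmlns:xsl="{XSLT_NS}" xsl:version="1.0"/>')))
print(is_xslt(io.StringIO('<html/>')))
```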

2. Direct access by multiple values

If a pointer attribute may have multiple values, testing is mildly more difficult because the attribute needs to be parsed first. In the general case, delivering a precise error message is quite difficult, as the entire process needs to be handled in XPath because Schematron does not have an iteration construct. However, specific cases may be reasonably easy to handle. Figure 13, rendition points to 1 or 2 renditions tests that a rendition attribute refers to at most 2 <rendition> elements in the same file. This is not particularly difficult because of the restriction that there are at most 2 pointers.
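The difficulty in Schematron is the lack of iteration; in a general-purpose language the same check is an ordinary loop. The Python sketch below, with hypothetical function names and a toy document, tokenizes a rendition value, enforces the limit of 1 or 2 pointers, and checks that each pointer reaches a local TEI <rendition>. Like the comment in Figure 13, it looks the ID up whether or not the # is present, so a missing # yields one message rather than two.

```python
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def first_with_id(root, ident):
    return next((el for el in root.iter() if el.get(XML_ID) == ident), None)


def rendition_problems(root, value):
    """List problems with a rendition value limited to 1 or 2 pointers."""
    tokens = value.split()
    problems = []
    if not 1 <= len(tokens) <= 2:
        problems.append(f"expected 1 or 2 pointers, found {len(tokens)}")
    for position, token in enumerate(tokens, start=1):
        if not token.startswith("#"):
            problems.append(f"pointer {position} ({token}) lacks '#'")
        # Look the ID up with or without the '#', so a missing '#'
        # produces one message rather than two.
        target = first_with_id(root, token[1:] if token.startswith("#") else token)
        if target is None or target.tag != TEI + "rendition":
            problems.append(f"pointer {position} ({token}) is not a local rendition")
    return problems


doc = ET.fromstring(
    '<TEI xmlns="http://www.tei-c.org/ns/1.0">'
    '<rendition xml:id="style01"/><rendition xml:id="style03"/>'
    '<p xml:id="p1"/></TEI>'
)
for value in ["#style01 #style03", "style03", "#style01 #p1"]:
    print(repr(value), rendition_problems(doc, value))
```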

But in many, if not most, cases there is no such restriction. For example, there may be dozens of witnesses to a particular manuscript manifestation. Figure 14, Check that each pointer in wit points to witness tests that each pointer in the value of a wit refers to a <witness> element in the same file. In addition to tests similar to the previous example, this example reports to the user what each pointer in a failed value is, in fact, pointing to.
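With no fixed limit on the number of pointers, the loop simply runs over every token. The Python sketch below (the function check_wit and the toy document are hypothetical) returns an overall verdict together with a per-pointer report of what each pointer actually reaches, which is the kind of feedback Figure 14 aims to give the user.

```python
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def check_wit(root, value):
    """Return (ok, report); report maps each pointer in a wit value to the
    local name of the element it points to, or to 'nothing'."""
    ok, report = True, {}
    for token in value.split():
        target = None
        if token.startswith("#"):
            target = next((el for el in root.iter()
                           if el.get(XML_ID) == token[1:]), None)
        report[token] = "nothing" if target is None else target.tag.rpartition("}")[2]
        if target is None or target.tag != TEI + "witness":
            ok = False
    return ok, report


doc = ET.fromstring(
    '<TEI xmlns="http://www.tei-c.org/ns/1.0"><listWit>'
    '<witness xml:id="A"/><witness xml:id="B"/></listWit>'
    '<p xml:id="p1"/></TEI>'
)
print(check_wit(doc, "#A #B"))
print(check_wit(doc, "#A #p1 Z"))
```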

3. Possibly indirect reference

We have already demonstrated a methodology for ensuring that the ref of <persName> points to a <person>. One recommended method for encoding an ambiguous reference to a person is to use the <alt> element: the ambiguous reference itself points to an <alt> element which, in turn, points to each of the possible <person>s. (See https://wwp.northeastern.edu/outreach/seminars/_current/presentations/contextual_encoding/advanced_context_09.xhtml for a sample encoding.) Similarly, a reference to multiple individuals could be encoded as multiple pointers on a single ref, or as a single pointer on the ref that points to a <link> element which, in turn, points to each <person> referred to. One advantage of this latter method is that you can restrict the ref of <persName> to exactly one pointer, and use similar constraints for both ambiguous and multiple references. Figure 15, Required ref of persName eventually refers to person demonstrates such constraints.
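The logic of following a pointer through an intermediate <alt> or <link> can be sketched in Python. This sketch assumes exactly one level of indirection (an <alt> or <link> must lead directly to <person>s, never to another <alt> or <link>), which also rules out reference loops; that assumption is ours and is not necessarily how Figure 15 behaves. The function names and the toy document are hypothetical.

```python
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def first_with_id(root, ident):
    return next((el for el in root.iter() if el.get(XML_ID) == ident), None)


def refers_to_persons(root, pointer, indirect_allowed=True):
    """True if a '#id' pointer reaches a TEI <person>, either directly or via
    one <alt> or <link> whose target pointers all reach <person>s."""
    if not pointer.startswith("#"):
        return False
    target = first_with_id(root, pointer[1:])
    if target is None:
        return False
    if target.tag == TEI + "person":
        return True
    if indirect_allowed and target.tag in (TEI + "alt", TEI + "link"):
        pointers = target.get("target", "").split()
        # Only one level of indirection: an alt/link must not lead to another
        return bool(pointers) and all(
            refers_to_persons(root, p, indirect_allowed=False) for p in pointers)
    return False


doc = ET.fromstring(
    '<TEI xmlns="http://www.tei-c.org/ns/1.0">'
    '<person xml:id="clark"/><person xml:id="lois"/><place xml:id="smallville"/>'
    '<alt xml:id="amb1" mode="excl" target="#clark #lois"/>'
    '<link xml:id="both" target="#clark #lois"/>'
    '<link xml:id="bad" target="#clark #smallville"/></TEI>'
)
for ref in ["#clark", "#amb1", "#both", "#bad", "#smallville"]:
    print(ref, refers_to_persons(doc, ref))
```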

A similar set of constraints can be expressed in a simpler, perhaps easier to follow, way by splitting them up as constraints on the ref of <persName> and the target of <link> and <alt> separately, as demonstrated in Figure 16, Required ref of persName eventually refers to person using abstract patterns. Because the code for <alt> and <link> is somewhat complicated, this technique expresses that code only once as an abstract Schematron pattern, which is instantiated separately, once for <alt> and once for <link>.

4. Reference via <prefixDef>

The TEI provides a method of indirection that both shortens URLs and provides a single place in which to change a set of URIs. This mechanism uses a private URI scheme prefix, together with a definition, expressed in a <prefixDef> element, of how that prefix is mapped to a full URI. Creating pointers using this method is simpler, shorter, easier to read (and thus proofread), and reduces the chance of errors in the first place. Checking pointers that use this method, however, is significantly more difficult. As with several other types of pointer checking, it is the general case that is most difficult; particular cases may be reasonably easy.

Here we limit ourselves to the simple (and, to our knowledge, by far the most common) case in which the matchPattern follows the shorthand pointer syntax or a subset thereof, and the replacementPattern follows the syntax of a URL except that it ends with #$1; that is, the case in which the matched portion is the shorthand pointer, as in an example from the Guidelines. While the example in Figure 17, Resolve prefixDef for references to people limits itself to <prefixDef>s of this sort, it does not limit itself to any particular prefix: the document is searched for possible prefix values.
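The expansion step itself can be sketched in Python: collect the <prefixDef>s, find one whose ident matches the part before the colon, match its matchPattern against the rest, and substitute $1, $2, ... in the replacementPattern with the captured groups. This sketch assumes the matchPattern must match the whole remainder and that the replacementPattern refers only to groups the pattern defines; it also glosses over differences between XPath and Python regular-expression syntax. The function names and sample <prefixDef> are hypothetical. The expanded result could then be checked like any other pointer.

```python
import re
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"


def prefix_defs(root):
    """Collect (ident, matchPattern, replacementPattern) from <prefixDef>s."""
    return [(pd.get("ident"), pd.get("matchPattern"), pd.get("replacementPattern"))
            for pd in root.iter(TEI + "prefixDef")]


def expand(value, defs):
    """Expand 'prefix:rest' using the first matching definition, or None."""
    prefix, colon, rest = value.partition(":")
    if not colon:
        return None
    for ident, match_pattern, replacement in defs:
        if ident != prefix:
            continue
        match = re.fullmatch(match_pattern, rest)
        if match:
            # Substitute $1, $2, ... with the corresponding captured groups
            return re.sub(r"\$(\d)",
                          lambda m: match.group(int(m.group(1))) or "",
                          replacement)
    return None


header = ET.fromstring(
    '<listPrefixDef xmlns="http://www.tei-c.org/ns/1.0">'
    '<prefixDef ident="psn" matchPattern="([A-Za-z0-9_]+)"'
    ' replacementPattern="persons.xml#$1"/></listPrefixDef>'
)
defs = prefix_defs(header)
print(expand("psn:MLK", defs))
```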

5. XInclude, yet?

It is often the case that we know certain tests will fail before XInclude processing but expect them to succeed after it.4 For example, imagine that at our ‘The Papers of Dr. Virgil Swann’ project the extant documents to be encoded (which include letters, scientific notebooks, satellite schematics, a dictionary, and translations of intercepted radio transmissions) have been categorized using a project-specific taxonomy. This taxonomy is encoded as a <taxonomy> element. Since each TEI document refers to this taxonomy (from its /TEI/teiHeader/profileDesc/textClass/catRef/@target), the project has chosen to have a copy of the entire project taxonomy in each TEI document’s header (in its /TEI/teiHeader/encodingDesc/classDecl). To avoid maintaining multiple copies of the same information, the project stores the <taxonomy> in a separate document and uses XInclude to insert it into each TEI header.5

Given this situation, imagine now that the project wishes to check that the target attributes actually point to one or more <category> elements. This is problematic, because the <category> elements are not actually in the file as it sits unprocessed. There are a variety of ways this could be handled, probably the easiest of which is simply to skip the test (and warn the user it is being skipped) if XInclude processing has not yet taken place.

There are two straightforward methods of asking the question ‘has XInclude processing taken place yet?’. The first relies on the fact that after XInclude processing, no elements from the XInclude namespace should remain: each <xi:include> should have been replaced by the included content or, in case of error, by the contents of its <xi:fallback>. This method is exemplified in Figure 18, XIncluded yet? — one size fits all.

The second relies on the fact that before XInclude processing the file does not have a <taxonomy>, and after XInclude processing it does. This method is exemplified in Figure 19, XIncluded yet? — per-test reporting.
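Both questions can be tried out in Python, using the standard library's xml.etree.ElementInclude as a (limited) XInclude processor; it does not, for example, support the xpointer attribute. In the sketch below the loader toy_loader is a hypothetical stand-in for reading the project's taxonomy file, and the function names are likewise hypothetical. Before inclusion an xi:include element remains and no <taxonomy> is present; afterwards the reverse holds.

```python
import xml.etree.ElementTree as ET
from xml.etree import ElementInclude

TEI_NS = "http://www.tei-c.org/ns/1.0"
XI_NS = "http://www.w3.org/2001/XInclude"


def xinclude_pending(root):
    """First method: any element left in the XInclude namespace means
    XInclude processing has not (fully) taken place."""
    return any(el.tag.startswith(f"{{{XI_NS}}}") for el in root.iter())


def taxonomy_present(root):
    """Second method: check for the specific content the test relies on."""
    return root.find(f".//{{{TEI_NS}}}taxonomy") is not None


def toy_loader(href, parse, encoding=None):
    # Hypothetical stand-in for reading the project's taxonomy file
    return ET.fromstring(
        f'<taxonomy xmlns="{TEI_NS}"><category xml:id="letter"/></taxonomy>')


doc = ET.fromstring(
    f'<TEI xmlns="{TEI_NS}" xmlns:xi="{XI_NS}"><teiHeader><encodingDesc>'
    '<classDecl><xi:include href="taxonomy.xml"/></classDecl>'
    '</encodingDesc></teiHeader></TEI>'
)
before = (xinclude_pending(doc), taxonomy_present(doc))
ElementInclude.include(doc, loader=toy_loader)
after = (xinclude_pending(doc), taxonomy_present(doc))
print(before, after)
```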

Appendix A Examples

Figure 1. PureODD to limit attribute to one URI
  <elementSpec module="analysis" ident="s" mode="change">
    <attList>
      <attDef ident="corresp" mode="change">
        <datatype minOccurs="1" maxOccurs="1">
          <dataRef key="teidata.pointer"/>
        </datatype>
      </attDef>
    </attList>
  </elementSpec>

In this example the corresp of <s> is altered so that its value must be one and only one URI.

Figure 2. Ensure ref of <g> is a shorthand pointer
  <elementSpec ident="g" module="gaiji" mode="change">
    <attList>
      <attDef ident="ref" mode="change">
        <constraintSpec ident="g_has_shorthand_ref" scheme="schematron">
          <desc>Somewhat useless example that just tests
          that the <emph>syntax</emph> of the <att>ref</att>
          attribute of <gi>g</gi> is nothing but a shorthand
          pointer, e.g. <val>#duck</val>. This example takes
          advantage of the fact that a
          <gi>constraintSpec</gi> specified inside an
          <gi>attDef</gi> has that attribute as its
          context.</desc>
          <constraint>
            <sch:let name="ref" value="normalize-space(.)"/>
            <!-- ^ and $ anchor the pattern: matches() otherwise succeeds if any substring matches -->
            <sch:assert test="matches( $ref, '^#\i\c*$' )">
              The value of the ref= attribute of 'g' ("<sch:value-of select="."/>") is supposed to be a shorthand pointer; i.e., should look like '#duck'.
            </sch:assert>
          </constraint>
        </constraintSpec>
      </attDef>
    </attList>
  </elementSpec>

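Snippet of ODD code that uses Schematron to enforce that the value of ref of <g> is a shorthand pointer. The regular expression \i\c* matches an XML Name, and thus #\i\c* matches a shorthand pointer to an XML element with an ID attribute. (Strictly, an xml:id value must be an NCName, i.e., a Name without a colon, so this pattern is slightly more permissive than necessary.)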

Figure 3. Ensure ref of <g> is a shorthand pointer, PureODD
  <elementSpec module="gaiji" ident="g" mode="change">
    <attList>
      <attDef mode="replace" ident="ref" usage="req">
        <datatype minOccurs="1" maxOccurs="1">
          <dataRef key="xmpdata.shorthandPointer"/>
        </datatype>
        <remarks>
          <p>This attribute must be a single shorthand
          pointer to an element in the same XML file, i.e.
          must look like <code>#duck</code>.</p>
        </remarks>
      </attDef>
    </attList>
  </elementSpec>
  <!-- ... -->
  <dataSpec ident="xmpdata.shorthandPointer">
    <desc>defines the range of attribute values used to
    provide a single shorthand pointer URI</desc>
    <content>
      <dataRef name="anyURI" restriction="#\i\c*"/>
    </content>
    <remarks>
      <p>Formerly referred to as a
      <soCalled>barename</soCalled>
      identifier.</p></remarks>
  </dataSpec>

Two snippets of ODD code which together use a facet to enforce that the value of ref of <g> is a shorthand pointer.

While somewhat more complicated, this method has the advantage that it is both faster and generally easier on the encoder, in that the constraint will be expressed in RELAX NG (or W3C XML Schema), and modern XML editors (like oXygen) can enforce such constraints immediately, as you type.

Figure 4. Check that ref of <g> points to something
  <constraintSpec ident="g_points_locally" scheme="schematron">
    <desc>Example of how to test that the <att>ref</att> of <gi>g</gi>
    points to an element in the same document.</desc>
    <constraint>
      <sch:let name="ref" value="substring( normalize-space(.), 2 )"/>
      <sch:assert test="id( $ref )">
        The ref= of this 'g' element ("<sch:value-of select="."/>") does not point to an element in this document.
      </sch:assert>
    </constraint>
  </constraintSpec>

In this example we presume that the ref of <g> is a shorthand pointer. Thus this constraint should be used in conjunction with something like Figure 2, Ensure ref of g is a shorthand pointer or Figure 3, Ensure ref of g is a shorthand pointer, PureODD.

Figure 5. Check that ref of <g> points to a <char> or a <glyph>
  <constraintSpec ident="g_points_to_local_char_or_glyph" scheme="schematron">
    <desc>Example of how to test that the <att>ref</att> of
    <gi>g</gi> points to a local <gi>char</gi> or
    <gi>glyph</gi>.</desc>
    <constraint>
      <sch:let name="ref" value="substring( normalize-space(.), 2 )"/>
      <sch:assert test="id( $ref )/self::tei:char | id( $ref )/self::tei:glyph">
        The ref= of 'g' is supposed to point to a 'char' or 'glyph'; this one ("<sch:value-of select="."/>") points to a '<sch:value-of select="local-name( id( $ref ) )"/>'.
      </sch:assert>
    </constraint>
  </constraintSpec>

In this example we presume that the ref of <g> is a shorthand pointer that actually points to something in the current document. Thus this constraint should be used in conjunction with a constraint like Figure 4, Check that ref of g points to something (which itself should be used in conjunction with a constraint like Figure 2, Ensure ref of g is a shorthand pointer or Figure 3, Ensure ref of g is a shorthand pointer, PureODD).

There are, of course, several other ways to express the XPath in the test of the <sch:assert>. For example, many would consider id( $ref )[ self::tei:char | self::tei:glyph ] better because it is more compact. In either of these XPaths the union operator (|, which may also be written union) could be replaced with or, yielding an OrExpr.

Note: use self::tei:char, not self::char — the use of the namespace prefix in the XPaths in test attributes is currently required.

Figure 6. Require the url of <moduleRef> to refer to a particular file
  <elementSpec ident="moduleRef" module="tagdocs" mode="change">
    <attList>
      <attDef ident="url" mode="change">
        <datatype minOccurs="1" maxOccurs="1">
          <dataRef key="teidata.enumerated"/>
        </datatype>
        <valList type="closed" mode="add">
          <valItem ident="./kryptonian.rng">
            <desc>Our special module for Kryptonian constructs</desc>
          </valItem>
        </valList>
      </attDef>
    </attList>
  </elementSpec>

ODD fragment which requires that the value of url of <moduleRef> be "./kryptonian.rng".

Figure 7. Check that url of <moduleRef> refers to an RNG file in the same directory
  <elementSpec ident="moduleRef" module="tagdocs" mode="change">
    <attList>
      <attDef ident="url" mode="change">
        <datatype minOccurs="1" maxOccurs="1">
          <dataRef name="anyURI" restriction="\./[^/#]+\.rng"/>
        </datatype>
        <remarks>
          <p>The <att>url</att> attribute must refer to a file in the
          same directory as the ODD customization file, and must have the
          file extension <val>.rng</val>.</p>
        </remarks>
      </attDef>
    </attList>
  </elementSpec>

Figure 8. Ensure file is readable XML
  <elementSpec ident="moduleRef" module="tagdocs" mode="change">
    <attList>
      <attDef ident="url" mode="change">
        <datatype minOccurs="1" maxOccurs="1">
          <dataRef name="anyURI" restriction="[^ #]+"/>
        </datatype>
        <constraintSpec scheme="schematron" ident="moduleRef_url_readable">
          <constraint>
            <sch:assert test="doc-available( resolve-uri( ., base-uri(/) ) )">Module <sch:value-of select="."/> is not readable, well-formed XML.</sch:assert>
          </constraint>
        </constraintSpec>
        <remarks>
          <p>The <att>url</att> attribute must refer to a file, not
          an element.</p>
        </remarks>
      </attDef>
    </attList>
  </elementSpec>

The constraints placed on the url attribute of <moduleRef> by the restriction attribute ensure both that the value is a single reference (by disallowing whitespace) and that it is not a reference to a particular element within a file (by disallowing the number sign, ‘#’, U+0023).

The <constraintSpec> then tests that said file is readable, well-formed XML. The use of resolve-uri() ensures that if a relative URI is used, the document we attempt to retrieve is the one at that URI relative to the input document, not relative to the Schematron schema.

Figure 9. Ensure file is readable RELAX NG grammar
  <elementSpec ident="moduleRef" module="tagdocs" mode="change">
    <attList>
      <attDef ident="url" mode="change">
        <datatype minOccurs="1" maxOccurs="1">
          <dataRef name="anyURI" restriction="[^ #]+"/>
        </datatype>
        <constraintSpec scheme="schematron" ident="moduleRef_url_relaxng">
          <constraint>
            <sch:assert test="doc-available( resolve-uri( ., base-uri(/) ) )
                              and
                              doc( resolve-uri( ., base-uri(/) ) )/rng:grammar">
              Module <sch:value-of select="."/> is not a readable, well-formed RELAX NG grammar in the XML syntax.
            </sch:assert>
          </constraint>
        </constraintSpec>
        <remarks>
          <p>The <att>url</att> attribute must refer to a RELAX NG grammar in the XML syntax.</p>
        </remarks>
      </attDef>
    </attList>
  </elementSpec>

See comments for Figure 8, Ensure file is readable XML.

The prefix rng: is automatically bound to the RELAX NG namespace by the TEI ODD processor.

Note that the test for doc-available() is not strictly necessary; the same pointers will succeed or fail with or without that test. But with the doc-available() test the user gets the custom error message when the file is not readable well-formed XML; without it, the user gets a generic error message, e.g. ‘No such file or directory’.

Figure 10. Require <persName> to refer to a <person> in the local personography
  <elementSpec ident="persName" module="namesdates" mode="change">
    <attList>
      <attDef ident="ref" mode="change">
        <datatype minOccurs="1" maxOccurs="1">
          <dataRef name="anyURI" restriction="(\./)?persons\.xml#\i\c*"/>
        </datatype>
        <constraintSpec scheme="schematron" ident="ref_points_to_person">
          <constraint>
            <sch:let name="file" value="substring-before( ., '#')"/>
            <sch:let name="ID" value="substring-after( ., '#')"/>
            <sch:let name="element_found" value="id( $ID, doc( resolve-uri( $file, base-uri(/) ) ) )"/>
            <sch:assert test="$element_found[ self::tei:person ]">
              persName should refer to a person; this one
              refers to <sch:value-of select="
                if ( $element_found )
                then concat( 'a ', local-name( $element_found ) )
                else 'nothing'"/>.
            </sch:assert>
          </constraint>
        </constraintSpec>
        <remarks>
          <p>The <att>ref</att> attribute must refer to a <gi>person</gi> element in the persons.xml file.</p>
        </remarks>
      </attDef>
    </attList>
  </elementSpec>

This constraint takes advantage of Schematron’s facility for storing values in ‘variables’.

Figure 11. Ensure uri of <equiv> is present and refers to an item on a particular page
  <elementSpec ident="equiv" module="tagdocs" mode="change">
    <attList>
      <attDef ident="uri" mode="change" usage="req">
        <datatype minOccurs="1" maxOccurs="1">
          <dataRef name="anyURI" restriction="https?://www\.wwp(-test)?\.(northeastern|neu)\.edu/markup_taxonomy\.xhtml#\i\c*"/>
        </datatype>
        <remarks>
          <p>The <att>uri</att> attribute is required, and
          must refer to an ID on our markup_taxonomy
          page.</p>
        </remarks>
      </attDef>
    </attList>
  </elementSpec>

Note that this constraint ensures that the uri attribute has the correct syntax, but does not actually try to retrieve the document it refers to.

The (fictional) markup_taxonomy web page is accessible at eight different URLs (http or https, production or test server, northeastern.edu or neu.edu); the restriction allows only pointers into the page at these eight URLs.

Figure 12. Require a filter on <equiv> that points to an XSLT program
  <elementSpec ident="equiv" module="tagdocs" mode="change">
    <attList>
      <attDef ident="filter" mode="change" usage="req">
        <constraintSpec scheme="schematron" ident="filter_is_XSLT">
          <constraint>
            <sch:ns prefix="xsl" uri="http://www.w3.org/1999/XSL/Transform"/>
            <sch:let name="xsltNS" value="'http://www.w3.org/1999/XSL/Transform'"/>
            <sch:let name="relURI" value="resolve-uri( normalize-space(.), base-uri(/) )"/>
            <sch:assert test="doc($relURI)/*[namespace-uri() eq $xsltNS  or  @xsl:version]">
              The filter must be an XSLT program!
            </sch:assert>
          </constraint>
        </constraintSpec>
      </attDef>
    </attList>
  </elementSpec>

Note that this constraint actually tries to retrieve the document filter refers to. This results in more thorough testing than Figure 11, Ensure uri of equiv is present and refers to an item on a particular page, but requires a working internet connection or re-direction via an XML Catalog.

The use of resolve-uri() ensures that if a relative URI is used, the document we attempt to retrieve is the one at that URI relative to the input document, not relative to the Schematron schema.

Note that the TEI processors do not automatically bind the xsl: prefix to the XSLT namespace. It is common to put all of your <sch:ns> elements in a <constraintSpec> that is a direct child of the <schemaSpec> of your customization ODD.

Figure 13. rendition points to 1 or 2 <rendition>s
  <classSpec ident="att.global.rendition" mode="change" type="atts">
    <attList>
      <attDef ident="rendition" mode="change">
        <datatype minOccurs="1" maxOccurs="2">
          <dataRef key="teidata.pointer"/>
        </datatype>
        <constraintSpec scheme="schematron" ident="rendition_points_to_rendition">
          <desc>Ensure that each pointer in a
          <att>rendition</att> points to a
          <gi>rendition</gi>.</desc>
          <constraint>

            <!-- Get sequence of 1 or 2 pointers -->
            <sch:let name="RENDITIONs" value="tokenize( normalize-space(.),' ')"/>
                      
            <!-- Test that each starts with '#' -->
            <sch:assert test="starts-with( $RENDITIONs[1], '#')">
                        First rendition reference does not start with a number sign.
            </sch:assert>
            <sch:assert test="empty( $RENDITIONs[2] )  or  starts-with( $RENDITIONs[2], '#')">
                        Second rendition reference does not start with a number sign.
            </sch:assert>
                      
            <!-- Test that they point to <rendition>s -->
            <!--
                Note: since leaving off the '#' is BY FAR
                the most common error, we actually test the
                two tokens that should start with '#'
                whether or not they do (adding the '#' if
                needed), thus avoiding two error messages
                for "style03" when "#style03" would have
                worked.
            -->
            <sch:let name="RND_ONE" value="replace( $RENDITIONs[1], '^#','')"/>
            <sch:let name="RND_TWO" value="replace( $RENDITIONs[2], '^#','')"/>
            <sch:assert test="id( $RND_ONE )/self::tei:rendition">
              The first rendition reference (<sch:value-of select="$RENDITIONs[1]"/>) does not point to a
                        local 'rendition' element.
            </sch:assert>
            <sch:assert test="$RND_TWO eq ''  or  id( $RND_TWO )/self::tei:rendition">
              The second rendition reference (<sch:value-of select="$RENDITIONs[2]"/>) does not point to a
                        local 'rendition' element.
            </sch:assert>
          </constraint>
        </constraintSpec>
      </attDef>
    </attList>
    <remarks>
      <p>For the project <title>The Papers of Dr. Virgil
      Swann</title>, the <att>rendition</att> attribute is limited
      to 1 or 2 values. If a more complex description of rendition
      is required, the individual CSS declarations (property:value
      pairs) should be combined into a new <gi>rendition</gi>, and
      the <att>rendition</att> should point at that.</p>
      <p>When generating schemas, the TEI ODD software will issue
      an informational message: <quote>constraint for @rendition of
      the att.global.rendition class does not have a context=.
      Resulting rule is applied to *all* occurrences of
      @rendition.</quote> Since we do not declare any other
      <att>rendition</att> attributes, this is not a concern.</p>
    </remarks>
  </classSpec>

Note that empty() and a comparison with '' are not interchangeable in this example. When only one pointer is given, $RENDITIONs[2] is the empty sequence, so it is tested with empty( $RENDITIONs[2] ). But replace() never returns the empty sequence (given an empty input it returns the zero-length string), so $RND_TWO must be tested with $RND_TWO eq ''; testing it with empty() would make every single-valued rendition fail the second assertion.
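The checks in Figure 13 can be sketched in Python as follows. The sketch assumes the document's IDs have already been collected into a dictionary from each xml:id to the local name of its element; the function name and message wording are hypothetical. Because the loop visits only the tokens that are actually present, the single-pointer case needs no separate emptiness test.

```python
def check_rendition(value: str, ids: dict[str, str]) -> list[str]:
    """Return error messages for one @rendition value (empty list = valid).

    `ids` maps each xml:id in the document to the local name of its element.
    """
    tokens = value.split()  # stand-in for tokenize( normalize-space(.), ' ')
    errors = []
    if not 1 <= len(tokens) <= 2:
        # in the ODD this limit comes from <datatype minOccurs="1" maxOccurs="2">
        errors.append(f"expected 1 or 2 pointers, found {len(tokens)}")
    for ordinal, token in zip(("First", "Second"), tokens):
        if not token.startswith("#"):
            errors.append(f"{ordinal} rendition reference does not start with a number sign.")
        # like replace( $tok, '^#', '' ): test the ID whether or not '#' was present
        if ids.get(token.removeprefix("#")) != "rendition":
            errors.append(f"{ordinal} rendition reference ({token}) does not point "
                          "to a local 'rendition' element.")
    return errors
```

As in the Schematron, "style03" produces only the missing-'#' message when "#style03" would have worked.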

Figure 14. Check that each pointer in wit points to <witness>
  <elementSpec ident="rdg" module="textcrit" mode="change">
    <attList>
      <attDef ident="wit" mode="change">
        <constraintSpec scheme="schematron" ident="wit_points_to_witnesses">
          <constraint>

            <!-- value is a string (supposedly of shorthand pointers); change it to a sequence -->
            <sch:let name="WITs" value="tokenize( normalize-space(.), ' ')"/>

            <!-- check each starts with number sign -->
            <sch:assert test="every $WIT in $WITs satisfies starts-with( $WIT, '#')">
                        One (or more) of the pointers in this @wit attribute does (do) not start with a '#'.
            </sch:assert>

            <!--
                develop sequence of ID values (i.e., IDREFs), rather than shorthand pointers,
                by stripping off '#', if present, from each
            -->
            <sch:let name="WITIDs" value="for $WIT in $WITs return replace( $WIT, '^#','')"/>
            <!--
                Note that we actually test the pointer whether it had
                a '#' or not, thus avoiding two error messages for
                "wit015" when "#wit015" would have worked.
            -->
                      
            <!-- Test that each points to a <witness>, storing result as boolean -->
            <sch:let name="points_to_witness" value="for $ID in $WITIDs return exists( id( $ID )[self::tei:witness] )"/>

            <!--
                Now we have a sequence of boolean values. If
                even one is false(), we have a problem that
                needs to be reported to the user.
            -->
            <sch:report test="$points_to_witness = false()">
                        One (or more) of the pointers of this @wit does (do) not point
                        to a 'witness' element. The pointers of this @wit point to the
                        following items, in the order specified:
              <sch:value-of select="for $ID in $WITIDs return
                                      if ( exists( id( $ID ) ) )
                                        then local-name( id( $ID ) )
                                        else '*nothing*'"/>.
            </sch:report>
          </constraint>
        </constraintSpec>
        <remarks>
          <p>Note that the <att>wit</att> attribute is constrained here, not in
          the definition of <name>att.witnessed</name>, because the only other
          element that is a member of <name>att.witnessed</name> is
          <gi>lem</gi>, and in our project <gi>lem</gi> never uses the
          <att>wit</att> attribute.</p>
        </remarks>
      </attDef>
    </attList>
  </elementSpec>
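To make the role of id() concrete, here is a rough Python equivalent of Figure 14 using the standard library's ElementTree. It indexes every element by xml:id and, for one @wit value, returns the local names of the targets in order, with '*nothing*' for pointers that resolve to no element, just as the report message does. It is only a sketch: it assumes the witnesses are in the same document, the helper names are hypothetical, and it omits the separate check that each pointer starts with '#'.

```python
import xml.etree.ElementTree as ET

XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def local_name(tag: str) -> str:
    """Strip the '{namespace}' part of an ElementTree tag, like local-name()."""
    return tag.rsplit("}", 1)[-1]


def wit_targets(root: ET.Element, wit_value: str) -> list[str]:
    """Local names of the elements each @wit pointer resolves to, in order,
    with '*nothing*' for pointers that resolve to no element (like id())."""
    by_id = {el.get(XML_ID): el for el in root.iter() if el.get(XML_ID) is not None}
    names = []
    for pointer in wit_value.split():
        target = by_id.get(pointer.removeprefix("#"))  # tolerate a missing '#'
        names.append(local_name(target.tag) if target is not None else "*nothing*")
    return names


def wit_ok(root: ET.Element, wit_value: str) -> bool:
    """True when every pointer resolves to a <witness> element."""
    return all(name == "witness" for name in wit_targets(root, wit_value))
```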
Figure 15. Required ref of <persName> eventually refers to <person>
  <elementSpec ident="persName" module="namesdates" mode="change">
    <constraintSpec scheme="schematron" ident="people_have_refs">
      <constraint>
        <sch:rule context="tei:persName[ not( ancestor::tei:teiHeader ) ]">
          <sch:assert test="@ref">A transcribed personal name (as opposed to a personal name in the metadata) must have a @ref</sch:assert>
        </sch:rule>
      </constraint>
    </constraintSpec>
    <constraintSpec scheme="schematron" ident="persName_refers_eventually_to_person">
      <constraint>
        <sch:rule context="tei:persName/@ref">
          <sch:let name="direct_target" value="id( substring-after( normalize-space(.), '#') )"/>
          <sch:let name="IT" value="parent::*"/>
          <sch:let name="indirect_targets" value="
              if ( $direct_target/self::tei:person )
              then $direct_target
              else
                if ( $direct_target[self::tei:link|self::tei:alt][@type eq 'person'] )
                then for $it in tokenize( normalize-space( $direct_target/@target ) ) return id( substring-after( $it, '#') )
                else parent::* "/>
          <sch:let name="targs_pt_to_person" value="for $targ in $indirect_targets return exists( $targ/self::tei:person )"/>
          <sch:report test="$targs_pt_to_person = false()">
            At least one of the pointers in "<sch:value-of select="normalize-space(.)"/>" does not end up pointing to a 'person' element, even via a 'link' or 'alt' with type="person".
          </sch:report>
        </sch:rule>
      </constraint>
    </constraintSpec>
  </elementSpec>

This set of constraints requires that <persName> have a ref unless it is part of the document metadata. It also requires that the pointers on the ref attribute point to <person> elements, either directly or through an intermediate <alt> or <link> with type="person".
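The two-step resolution that Figure 15 performs can be sketched in Python. A pointer passes if it reaches a <person> directly, or if it reaches a <link> or <alt> with type="person" whose @target pointers all reach <person> elements. The sketch assumes a pre-built index from xml:id to a small record of element name, @type, and @target; that record type and the function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Target:
    """Hypothetical summary of an element that an xml:id points to."""
    name: str         # local name, e.g. "person", "link", "alt"
    type: str = ""    # value of @type, if any
    target: str = ""  # value of @target, if any


def idref(pointer: str) -> str:
    """Like substring-after( $p, '#' ): '' when there is no '#'."""
    return pointer.partition("#")[2]


def ref_reaches_persons(ref: str, index: dict[str, Target]) -> bool:
    """Mirror of Figure 15: @ref must reach <person> directly, or via a
    <link> or <alt> with type="person" whose pointers all reach <person>."""
    direct = index.get(idref(ref.strip()))
    if direct is None:
        return False  # pointer resolves to nothing
    if direct.name == "person":
        return True
    if direct.name in ("link", "alt") and direct.type == "person":
        return all(
            getattr(index.get(idref(p)), "name", None) == "person"
            for p in direct.target.split()
        )
    return False
```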

Figure 16. Required ref of <persName> eventually refers to <person> using abstract patterns
  <constraintSpec ident="abstract_indirect_person" scheme="schematron">
    <constraint>
      <sch:pattern abstract="true" id="abstract_indirect_person">
        <sch:rule context="$alt_or_link_of_type_person">
          <sch:let name="TARGETs" value="tokenize( normalize-space( @target ), ' ')"/>
          <sch:let name="IDREFs" value="for $target in $TARGETs return replace( $target, '^#','')"/>
          <sch:let name="TARGET_ELEMENTs" value="for $idref in $IDREFs return
                            (: if it points to an element in this document … :)
                            if ( id( $idref ) )
                              (: … then return the element it points to … :)
                              then id( $idref )
                              (: … otherwise return current link or alt element, which is not a
                                 person element, so will be caught as an error, below. :)
                              else ."/>
          <!-- myself as target is an error either way :-) -->
          <sch:let name="TEs_ARE_PERSON" value="for $element in $TARGET_ELEMENTs return
                            exists( $element/self::tei:person )"/>
          <sch:report test="$TEs_ARE_PERSON = false()">One or more of the items pointed to by @target of this <sch:value-of select="local-name(.)"/> (of type "person") does not point to a 'person' element.</sch:report>
        </sch:rule>
      </sch:pattern>
    </constraint>
  </constraintSpec>
  <classSpec ident="att.canonical" type="atts" mode="change">
    <attList>
      <attDef ident="key" mode="delete"/>
      <attDef ident="ref" mode="change">
        <desc>provides a definition for the entity being named or referred to via a single URI pointer</desc>
        <datatype minOccurs="1" maxOccurs="1">
          <dataRef key="teidata.pointer"/>
        </datatype>
        <remarks>
          <p>If the word or phrase being encoded directly refers to multiple entities, this attribute should point to a <gi>link</gi> which in turn should point to the definitions.</p>
          <p>If the word or phrase being encoded is ambiguous, this attribute should point to an <gi>alt</gi> which in turn should point to the various possible definitions.</p>
        </remarks>
      </attDef>
    </attList>
  </classSpec>
  <elementSpec ident="alt" mode="change">
    <constraintSpec scheme="schematron" ident="indirect_person_alt">
      <constraint>
        <sch:pattern id="concrete_indirect_person_alt" is-a="abstract_indirect_person">
          <sch:param name="alt_or_link_of_type_person" value="tei:alt[ ( @type, ../@type ) = 'person' ]"/>
        </sch:pattern>
      </constraint>
    </constraintSpec>
  </elementSpec>
  <elementSpec ident="link" mode="change">
    <constraintSpec scheme="schematron" ident="indirect_person_link">
      <constraint>
        <sch:pattern id="concrete_indirect_person_link" is-a="abstract_indirect_person">
          <sch:param name="alt_or_link_of_type_person" value="tei:link[ ( @type, ../@type ) = 'person' ]"/>
        </sch:pattern>
      </constraint>
    </constraintSpec>
  </elementSpec>
  <elementSpec ident="persName" mode="change">
    <constraintSpec scheme="schematron" ident="people_have_refs">
      <constraint>
        <sch:rule context="tei:persName[ not( ancestor::tei:teiHeader ) ]">
          <sch:assert test="@ref">A transcribed personal name (as opposed to a personal name in the metadata) must have a @ref</sch:assert>
        </sch:rule>
      </constraint>
    </constraintSpec>
    <attList>
      <attDef mode="change" ident="ref">
        <constraintSpec scheme="schematron" ident="persName_refers_to_person_alt_or_link">
          <constraint>
            <sch:let name="target" value="id( substring-after( normalize-space(.), '#') )"/>
            <sch:assert test=" $target/self::tei:person
                              | $target/self::tei:alt[ ( @type, ../@type) = 'person']
                              | $target/self::tei:link[ ( @type, ../@type ) = 'person']
                             ">
              The @ref of a personal name should refer to a 'person' element, or either a 'link' or 'alt' element with type="person"; but this one points to the '<sch:value-of select="name($target)"/>' with ID "<sch:value-of select="$target/@xml:id"/>".</sch:assert>
          </constraint>
        </constraintSpec>
      </attDef>
    </attList>
  </elementSpec>

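This set of constraints makes use of the abstract pattern facility of ISO Schematron: the single abstract pattern abstract_indirect_person is instantiated twice, once for <alt> and once for <link>, each time with a different value for its alt_or_link_of_type_person parameter.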
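For readers more at home in a general-purpose language, an abstract pattern behaves roughly like a function that returns a rule once it is given its parameters. The Python below is only an analogy, not a description of how Schematron processors implement abstract patterns. One rule body is written once and instantiated twice, with a context predicate standing in for the $alt_or_link_of_type_person parameter. The dictionary-based "node" and all names are hypothetical.

```python
from typing import Callable

# Hypothetical stand-in for an element: a dict with "name", "target",
# and optionally "type" and "parent_type" (the @type of its parent).
Node = dict
Rule = Callable[[Node, dict], list]


def abstract_indirect_person(context: Callable[[Node], bool]) -> Rule:
    """Like the abstract pattern: one rule body, parameterized by its context."""
    def rule(node: Node, id_to_name: dict) -> list:
        if not context(node):
            return []  # the rule does not fire on this node
        return [
            f"{node['name']} of type person points to non-person {t}"
            for t in node["target"].split()
            if id_to_name.get(t.removeprefix("#")) != "person"
        ]
    return rule


def typed_person(name: str) -> Callable[[Node], bool]:
    """Like tei:alt[ ( @type, ../@type ) = 'person' ] (or the tei:link version)."""
    return lambda n: n["name"] == name and "person" in (n.get("type"), n.get("parent_type"))


# Two "concrete patterns" built from the same abstract one
alt_rule = abstract_indirect_person(typed_person("alt"))
link_rule = abstract_indirect_person(typed_person("link"))
```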

Figure 17. Resolve <prefixDef> for references to people
  <sch:rule context="tei:persName/@ref | tei:rs[@type eq 'person']/@ref | tei:author/@ref">

    <!-- remember the element type and attribute value we matched on -->
    <sch:let name="gi" value="local-name(..)"/>
    <sch:let name="val" value="normalize-space(.)"/>
              
    <!-- get a sequence of the references in the attr value -->
    <sch:let name="REFs" value="tokenize( $val )"/>

    <!-- get a sequence of the prefixes thereof[1] -->
    <sch:let name="PREFs" value="for $r in $REFs return substring-before( $r,':')"/>

    <!-- get a sequence of the URLs associated with each prefix[1] -->
    <sch:let name="URLs" value="for $prefix in $PREFs return
                                  substring-before(
                                    /*/tei:teiHeader
                                    /tei:encodingDesc/tei:listPrefixDef
                                    //tei:prefixDef[ @ident eq $prefix ]
                                    /@replacementPattern,
                                    '#'
                                  )"/>

    <!-- get a sequence of the keys associated with each prefix[1] -->
    <sch:let name="KEYs" value="for $ref in $REFs return substring-after( $ref,':')"/>

    <!-- get a sequence of the elements pointed to by each key in the corresponding URL. -->
    <!--
        If, for some reason, the URL & KEY combination does not point to a
        TEI element, record the current node as the target. (Thus we can test
        later to see if the pointer failed by testing for the current node.)
    -->
    <sch:let name="PERSONs" value="for $key in $KEYs return
                    if ( count( index-of( $KEYs, $key ) ) eq 1 )
                    then
                      if ( doc-available( $URLs[ index-of( $KEYs, $key )] ) )
                      then
                        if ( document( $URLs[ index-of( $KEYs, $key )] )//tei:*[ @xml:id eq $key ] )
                        then
                          document( $URLs[ index-of( $KEYs, $key )] )//tei:*[ @xml:id eq $key ]
                        else .
                      else .
                    else ."/>

    <!-- If there are no references in @ref, that's an error; report it -->
    <sch:report test="count( $REFs ) eq 0">
      Empty @ref of element '<sch:value-of select="$gi"/>'
    </sch:report>

    <!-- If there are any references that point to something other than <tei:person>, report it (them). -->
    <sch:report test="count( $PERSONs[ not( self::tei:person ) ] ) gt 0">
      Each pointer in the @ref attribute of a '<sch:value-of select="$gi"/>' element should point to a
      'person' element, but the pointers in this one ("<sch:value-of select="$val"/>") point to the
      following in the order specified: <sch:value-of select="if ( not( $PERSONs ) )
                  then 'nowhere'
                  else for $p in $PERSONs return
                    if ( $p is . )
                    then ' *nothing*'
                    else concat(' &lt;', name($p), '>')"/>.
    </sch:report>

    <!--
        [1] The PREFs, URLs, KEYs, and PERSONs sequences have the same number
        of items as the REFs sequence, but some of the items may be nil.
    -->
  </sch:rule>

The constraint above checks only ref attributes that occur on <persName>, <author>, and <rs type="person">.

Note that the constraint above only checks the <prefixDef>s that are in the <teiHeader> child of the outermost element; <prefixDef>s inside nested <TEI> or <teiCorpus> elements are ignored. This can easily be changed either by removing just the asterisk (‘*’, U+002A) from the XPath, or by replacing the entire precise XPath with //tei:prefixDef. These paths may be slower, however.
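The way Figure 17 turns a prefixed pointer into a document URL and an ID can be sketched in Python. Like the Schematron, the sketch does not apply @matchPattern. It simply takes everything before '#' in the matching <prefixDef>'s @replacementPattern as the URL, and everything after the first ':' in the pointer as the key. The prefix table and function name are hypothetical; in real use the table would be read from <listPrefixDef>.

```python
def split_prefixed_pointer(ref: str, prefix_defs: dict[str, str]) -> tuple[str, str]:
    """Return (url, key) for a pointer such as 'psn:MLK', as Figure 17 does.

    prefix_defs maps each <prefixDef> @ident to its @replacementPattern.
    """
    if ":" in ref:
        prefix, key = ref.split(":", 1)
    else:
        prefix, key = "", ""  # substring-before/-after() give '' with no ':'
    pattern = prefix_defs.get(prefix, "")  # undefined prefix: no pattern at all
    # substring-before( @replacementPattern, '#' ) is '' when there is no '#'
    url = pattern.split("#", 1)[0] if "#" in pattern else ""
    return url, key
```

An empty URL in the result corresponds to the case where the Schematron falls back to the current node and reports '*nothing*'.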

Heretofore we have avoided use of the less-than sign (‘<’, U+003C) and greater-than sign (‘>’, U+003E) inside messages to indicate ‘this is an element’. The only reason for this is that some Schematron processors produce ugly output when these characters occur in a message.

Figure 18. XIncluded yet? — one size fits all
  <sch:ns prefix="xi" uri="http://www.w3.org/2001/XInclude"/>
  <sch:rule context="/">
    <sch:report test="//xi:*">
                Error! XInclude processing has not been performed yet.
                Many rule-based tests will fail because of this.
    </sch:report>
  </sch:rule>

This is a very generic test which is easy to write and very fast, but does not tell the user which particular test will fail.

Note that the TEI processors do not automatically bind the xi: prefix to the XInclude namespace. It is common to put all of your <sch:ns> elements in a <constraintSpec> that is a direct child of the <schemaSpec> of your customization ODD.
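The same one-size-fits-all check is easy to reproduce outside Schematron, for instance as a pre-flight step in a build script. Here is a minimal Python sketch, assuming the file has been parsed with ElementTree, which does not perform XInclude processing unless explicitly asked to. The function name is hypothetical.

```python
import xml.etree.ElementTree as ET

XI_NS = "{http://www.w3.org/2001/XInclude}"


def has_unprocessed_xincludes(root: ET.Element) -> bool:
    """Like <sch:report test="//xi:*">: true if any XInclude element remains."""
    return any(isinstance(el.tag, str) and el.tag.startswith(XI_NS) for el in root.iter())
```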

Figure 19. XIncluded yet? — per-test reporting
  <sch:rule context="tei:textClass[ not( preceding::tei:textClass ) ]">
    <!-- The predicate above simply keeps this error to a
         single occurrence, instead of once for each
         <textClass>. The TEI permits multiple <textClass>
         elements in any given <profileDesc> (and multiple
         <profileDesc> elements in any given <teiHeader>);
         but most projects only use one, so you may well not
         need this predicate. -->
    <sch:assert test="/tei:TEI/tei:teiHeader//tei:taxonomy">
                There is no &lt;taxonomy&gt; in the header, so no &lt;catRef&gt; pointer can point into the taxonomy.
                (Have you performed XInclude processing yet?)
    </sch:assert>
  </sch:rule>
  <sch:rule context="tei:catRef[ /tei:TEI/tei:teiHeader//tei:taxonomy ]">
    <sch:let name="target" value="substring-after( normalize-space( @target ),'#')"/>
    <sch:assert test="id($target)/self::tei:category">
                This &lt;catRef&gt; does not point to a &lt;category&gt;!
    </sch:assert>
  </sch:rule>

In this example we presume that the target of <catRef> is a single shorthand pointer that actually points to something in the current document. Thus this constraint should be used in conjunction with constraints like those in Figure 1, PureODD to limit attribute to one URI and Figure 4, Check that ref of g points to something (which itself should be used in conjunction with a constraint like Figure 2, Ensure ref of g is a shorthand pointer or Figure 3, Ensure ref of g is a shorthand pointer, PureODD).

Figure 20. Test for any local URI

This example is different from the others in that it is not intended to be copied, pasted, modified, and used. (It has therefore not been tested as thoroughly as the others.) It is here only to demonstrate that a) testing for the syntax of a general URI can be done, and b) it is very complex. We suspect that the vast majority of TEI projects use only a very specific subset of local URI formats (e.g., only relative filepaths with or without a fragment identifier, such as "KRdict.xml#e05" or just "KRdict.xml", or even only relative filepaths that start with an explicit dot-segment, such as "./KRdict.xml"), and thus would either test only for the syntax of that particular format, or simply test to see if the desired target is retrievable, which is demonstrated in Figure 7, Check that url of moduleRef refers to an RNG file in the same directory.

  <sch:rule context="tei:ptr[@type='localOnly']/@target">

    <!-- ******** First, define the component pieces for RFC 3986 “relative-ref” ******** -->

    <sch:let name="HEXDIG" value="'[0-9A-Fa-f]'"/>
    <sch:let name="pct-encoded" value="concat('%', $HEXDIG, $HEXDIG )"/>
    <!--
        Since “unreserved” and “sub-delims” are never used independently
        we simply use a combined form here:
        unressubdel = unreserved and sub-delims, with the hyphen escaped
        so that the string can be used inside a character class
    -->
    <sch:let name="unressubdel" value='"A-Za-z0-9\-._~!$&amp;&apos;()*+,;="'/>
    <!--
        “unencoded” is a combination of “unreserved”, “sub-delims”, COMMERCIAL AT, and COLON;
        “unencodednc” is the same without COLON.
    -->
    <sch:let name="unencoded" value="concat('[', $unressubdel, '@', ':', ']')"/>
    <sch:let name="unencodednc" value="concat('[', $unressubdel, '@',      ']')"/>
              
    <sch:let name="pchar" value="concat( $unencoded,   '|', $pct-encoded )"/>
    <sch:let name="ncpchar" value="concat( $unencodednc, '|', $pct-encoded )"/>
              
    <sch:let name="segment" value="concat('(', $pchar,   ')*')"/>
    <sch:let name="segment-nz" value="concat('(', $pchar,   ')+')"/>
    <sch:let name="segment-nz-nc" value="concat('(', $ncpchar, ')+')"/>
              
    <sch:let name="path-abempty" value="concat('(/', $segment, ')*')"/>
    <sch:let name="path-absolute" value="concat('/(', $segment-nz, '(/', $segment, ')*)?')"/>
    <sch:let name="path-noscheme" value="concat( $segment-nz-nc, '(/', $segment, ')*' )"/>
    <sch:let name="path-empty" value="'[empty]{0}'"/>
              
    <sch:let name="query" value="concat('(', $pchar, '|/|\?)*')"/>
    <sch:let name="fragment" value="concat('(', $pchar, '|/|\?)*')"/>

    <!-- The following are only used in the “host” pattern -->
    <sch:let name="dec-octet" value="'(25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])'"/>
    <sch:let name="IPv4address" value="concat( $dec-octet, '\.', $dec-octet, '\.', $dec-octet, '\.', $dec-octet )"/>
    <sch:let name="h16" value="'[0-9A-Fa-f]{1,4}'"/>
    <sch:let name="h16c" value="concat('(', $h16, ':',')')"/>
    <sch:let name="ls32" value="concat('((', $h16, ':', $h16, ')|(', $IPv4address, '))')"/>
    <sch:let name="IPv6addr_a" value="concat(                              $h16c, '{6}', $ls32 )"/>
    <sch:let name="IPv6addr_b" value="concat(                        '::', $h16c, '{5}', $ls32 )"/>
    <sch:let name="IPv6addr_c" value="concat('(', $h16c, '{0,0}', $h16, ')?::', $h16c, '{4}', $ls32 )"/>
    <sch:let name="IPv6addr_d" value="concat('(', $h16c, '{0,1}', $h16, ')?::', $h16c, '{3}', $ls32 )"/>
    <sch:let name="IPv6addr_e" value="concat('(', $h16c, '{0,2}', $h16, ')?::', $h16c, '{2}', $ls32 )"/>
    <sch:let name="IPv6addr_f" value="concat('(', $h16c, '{0,3}', $h16, ')?::', $h16c, '{1}', $ls32 )"/>
    <sch:let name="IPv6addr_g" value="concat('(', $h16c, '{0,4}', $h16, ')?::',                     $ls32 )"/>
    <sch:let name="IPv6addr_h" value="concat('(', $h16c, '{0,5}', $h16, ')?::',                     $h16 )"/>
    <sch:let name="IPv6addr_i" value="concat('(', $h16c, '{0,6}', $h16, ')?::' )"/>
    <sch:let name="IPv6address" value="concat('(',
                                $IPv6addr_a,
                           '|', $IPv6addr_b,
                           '|', $IPv6addr_c,
                           '|', $IPv6addr_d,
                           '|', $IPv6addr_e,
                           '|', $IPv6addr_f,
                           '|', $IPv6addr_g,
                           '|', $IPv6addr_h,
                           '|', $IPv6addr_i,
                           ')')"/>
    <sch:let name="IPvFuture" value="concat('v', $HEXDIG, '+\.[', $unressubdel, ':', ']+')"/>
    <sch:let name="IP-literal" value="concat('\[(', $IPv6address, '|', $IPvFuture, ')\]')"/>
    <sch:let name="reg-name" value="concat('([', $unressubdel, ']|', $pct-encoded, ')*')"/>
    <!-- end “host”-pattern only portion -->

    <sch:let name="userinfo" value="concat('([', $unressubdel, ':]|', $pct-encoded, ')*')"/>
    <sch:let name="host" value="concat('(', $IP-literal, '|', $IPv4address, '|', $reg-name, ')')"/>
    <sch:let name="port" value="'[0-9]*'"/>
    <sch:let name="authority" value="concat('(', $userinfo, '@)?', $host, '(:', $port, ')?')"/>

    <!-- ******** relative reference itself ******** -->
    <sch:let name="relative-ref" value="concat('(', '//', $authority, $path-abempty,
                           '|', $path-absolute,
                           '|', $path-noscheme,
                           '|', $path-empty,
                           ')(\?', $query, ')?(#', $fragment, ')?'
                           )"/>

    <!-- ******** now components of an RFC 8089 “file-URI” ******** -->
    <!-- “host” and “path-absolute” have already been defined, above. -->
    <sch:let name="file-auth" value="concat('(', $host, '|(localhost))')"/>
    <sch:let name="auth-path" value="concat( $file-auth, '?', $path-absolute )"/>
    <sch:let name="file-hier-part" value="concat( '(//', $auth-path, '|', $path-absolute, ')')"/>

    <!-- ******** the file-URI itself ******** -->
    <sch:let name="file-URI" value="concat('file:', $file-hier-part )"/>

    <!-- ******** now perform the actual test ******** -->
    <sch:assert test="matches(
                        normalize-space(.),
                        concat('^(', $relative-ref, '|', $file-URI, ')$')
                      )">
      Value of @target should be a local URI (a relative URI reference or a 'file:' scheme URI).
    </sch:assert>
  </sch:rule>

Note that this constraint only tests that the value of target is either a relative URI reference (e.g., "kryptonian_charDecl.xml") or uses the file: URI scheme. It does not test that the file referred to with the file: scheme, if any, is actually local; that would require testing that the host specified is, in fact, the local machine. If the host is not specified or is localhost, we know it refers to the local machine. For any other value of host, however, we would need to resolve the address, ask the local system for its own address, and compare the two. Similarly, a relative reference that begins with // (a network-path reference) names a host, so it is not necessarily local either.
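The part of that reasoning that needs no network access can be sketched in Python with the standard library's urllib.parse.urlsplit. The sketch is hypothetical and deliberately stops short of name resolution. It answers True only when the host is absent or localhost, and returns None for any other host rather than guessing.

```python
from typing import Optional
from urllib.parse import urlsplit


def file_uri_host_is_local(uri: str) -> Optional[bool]:
    """For a file: URI, return True if it certainly names the local machine
    (no host, or 'localhost'), or None if deciding would require resolving
    the host name and comparing addresses. Raises ValueError otherwise."""
    parts = urlsplit(uri.strip())
    if parts.scheme.lower() != "file":
        raise ValueError(f"not a file: URI: {uri!r}")
    host = parts.hostname or ""  # urlsplit lower-cases the host name
    if host in ("", "localhost"):
        return True
    return None  # e.g. file://server.example.org/... needs name resolution
```

Both file:/path and file:///path count as local here, matching the general file: syntax described in the notes.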

Note that the test in this example does not permit zone identifiers, which RFC 6874 added to the URI syntax for IPv6 address literals.
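As the introduction to this figure suggests, a project that permits only a narrow local format will usually be better served by a much smaller test. The Python sketch below is hypothetical. It assumes a project rule that allows only relative file paths built from letters, digits, '.', '_' and '-', optionally prefixed by an explicit './', and optionally followed by '#' and an ID such as e05. Anything with a scheme, a leading '/', or a leading '//' is rejected.

```python
import re

# Hypothetical project rule: an optional "./", zero or more directory
# segments, a file name, and an optional "#" plus an ID-like fragment.
SEGMENT = r"[A-Za-z0-9._-]+"
LOCAL_PATH = re.compile(
    r"(\./)?"                        # optional explicit dot-segment
    rf"({SEGMENT}/)*"                # directory segments
    rf"{SEGMENT}"                    # file name
    r"(#[A-Za-z_][A-Za-z0-9._-]*)?"  # optional fragment identifier
)


def is_simple_local_path(value: str, require_dot_segment: bool = False) -> bool:
    """True if value fits the project's narrow local-pointer format."""
    value = value.strip()
    if require_dot_segment and not value.startswith("./"):
        return False
    return LOCAL_PATH.fullmatch(value) is not None
```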

Figure 21. ref uses defined prefix

This example tests that each separate pointer in a ref attribute on a <geogName>, <name>, <persName>, <placeName>, <pubPlace>, <rs>, or <title> element begins with a prefix that is defined in the <listPrefixDef>.

  <sch:rule context=" tei:geogName[@ref]
                     |tei:name[@ref]
                     |tei:persName[@ref]
                     |tei:placeName[@ref]
                     |tei:pubPlace[@ref]
                     |tei:rs[@ref]
                     |tei:title[@ref] ">

    <!-- get a sequence of the references in @ref -->
    <sch:let name="REFs" value="tokenize( @ref )"/>

    <!-- get a sequence of the prefixes thereof -->
    <sch:let name="PREFs" value="for $r in $REFs return substring-before( $r,':')"/>

    <!-- get the list of defined prefixes -->
    <sch:let name="DEFINEDs" value="for $prefixDef in /tei:TEI/tei:teiHeader/tei:encodingDesc/tei:listPrefixDef//tei:prefixDef
                    return $prefixDef/@ident"/>

    <!-- create a sequence of booleans that indicate if prefix is defined -->
    <sch:let name="BOOLEANs" value="for $pfd in distinct-values( $PREFs ) return $pfd = $DEFINEDs"/>

    <!-- if any 1 of PREFs is not in list of DEFINEDs, warn user -->
    <sch:report test="$BOOLEANs = false()">
      One or more of the references in the @ref of this <sch:name/> does not start with a defined prefix.
      The complete set of pointers on this @ref is "<sch:value-of select="normalize-space( @ref )"/>".
      The complete set of defined prefixes is <sch:value-of select="string-join( $DEFINEDs, ', ')"/>.
    </sch:report>
  </sch:rule>

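Note that the above code requires an XSLT3 queryBinding, as it uses the one-argument form of tokenize(), which was introduced in XPath 3.1. To convert this to a constraint that will work with an XSLT2 queryBinding, change tokenize( @ref ) to tokenize( normalize-space( @ref ), ' '). The one-argument tokenize() calls in Figure 15 and Figure 17 need the same treatment.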
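The equivalence behind that conversion (the one-argument tokenize( $s ) behaves like tokenize( normalize-space( $s ), ' ')) can be mimicked in Python. The function names are hypothetical. The sketch deliberately uses XML's four whitespace characters rather than Python's str.split(), which also splits on characters such as the no-break space.

```python
import re

XML_WHITESPACE = "[ \t\r\n]+"  # space, tab, carriage return, line feed


def normalize_space(s: str) -> str:
    """XPath normalize-space(): collapse runs of XML whitespace and trim."""
    return re.sub(XML_WHITESPACE, " ", s).strip(" ")


def tokenize_1(s: str) -> list[str]:
    """XPath 3.1 one-argument tokenize($s), i.e. tokenize(normalize-space($s), ' ')."""
    normalized = normalize_space(s)
    return normalized.split(" ") if normalized else []
```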

This constraint could be quite a bit simpler if there were never a need to test more than one value in a given ref; in that case, one would only need to test that substring-before( normalize-space( @ref ),':') = /tei:TEI/tei:teiHeader/tei:encodingDesc/tei:listPrefixDef//tei:prefixDef/@ident.
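The general multi-pointer test can be sketched in Python, assuming the defined prefixes have already been read from <listPrefixDef> into a set; the function name is hypothetical. It returns the prefixes that are used but not defined, so an empty list means the ref passes. For a single pointer it reduces to the simpler test just described.

```python
def undefined_prefixes(ref_value: str, defined: set[str]) -> list[str]:
    """Prefixes used in a whitespace-separated @ref that are not defined.

    Like substring-before( $r, ':' ), a pointer without a colon contributes ''.
    """
    used = []
    for pointer in ref_value.split():  # stand-in for tokenize( @ref )
        prefix = pointer.split(":", 1)[0] if ":" in pointer else ""
        if prefix not in used:  # like distinct-values(), keeping first-seen order
            used.append(prefix)
    return [p for p in used if p not in defined]
```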

Notes
1
And another 4 or so defined as teidata.namespace, whose values have the same syntax as a URI, but are not necessarily intended to point at anything.
2
Except there are differences in whitespace normalization.
3
A URI may also refer to a file on the local filesystem using the file: URI scheme. This document does not consider these URIs, except to demonstrate that testing whether a URI uses the file: scheme is possible (in Figure 20, Test for any local URI), and to comment here on the general syntax of a file: scheme URI. That syntax is file://host/path, where the //host portion is optional (defaulting to ‘localhost’), or may be expressed as just // (again defaulting to ‘localhost’). Thus file:/path and file:///path are both perfectly acceptable ways to refer to the file found at path on the local filesystem.
4
In the OXygen XML editor, you can toggle whether XInclude processing is performed before validation using the ‘Enable XInclude Processing’ checkbox in the Options > Preferences > XML > XML Parser pane.
5
Why they would choose to do this rather than use a URI that points to an external document is an interesting, if irrelevant, question. They may well have been influenced by the fact that every example of <catRef> in the Guidelines uses shorthand pointers.
The TEI Technical Council. Date: 1.0.0