3 Elements Available in All TEI Documents
Table of contents
- 3.1 Paragraphs
- 3.2 Treatment of Punctuation
- 3.3 Highlighting and Quotation
- 3.4 Terms and Glosses, Ruby Annotations, and Equivalents and Descriptions
- 3.5 Simple Editorial Changes
- 3.6 Names, Numbers, Dates, Abbreviations, and Addresses
- 3.7 Simple Links and Cross-References
- 3.8 Lists
- 3.9 Notes, Annotation, and Indexing
- 3.10 Graphics and Other Non-textual Components
- 3.11 Reference Systems
- 3.12 Bibliographic Citations and References
- 3.13 Passages of Verse or Drama
- 3.14 Overview of the Core Module
This chapter describes elements which may appear in any kind of text and the tags used to mark them in all TEI documents. Most of these elements are freely floating phrases, which can appear at any point within the textual structure, although they should generally be contained by a higher-level element of some kind (such as a paragraph). A few of the elements described in this chapter (for example, bibliographic citations and lists) have a comparatively well-defined internal structure, but most of them have no consistent inner structure of their own. In the general case, they contain only a few words, and are often identifiable in a conventionally printed text by the use of typographic conventions such as shifts of font, use of quotation or other punctuation marks, or other changes in layout.
This chapter begins by describing the p tag used to mark paragraphs, the prototypical formal unit for running text in many TEI modules. This is followed, in section 3.2 Treatment of Punctuation, by a discussion of some specific problems associated with the interpretation of conventional punctuation, and the methods proposed by these Guidelines for resolving ambiguities therein.
The next section (section 3.3 Highlighting and Quotation) describes a number of phrase-level elements commonly marked by typographic features (and thus well-represented in conventional markup languages). These include features commonly marked by font shifts (section 3.3.2 Emphasis, Foreign Words, and Unusual Language) and features commonly marked by quotation marks (section 3.3.3 Quotation) as well as such features as terms, cited words, and glosses (section 3.4 Terms and Glosses, Ruby Annotations, and Equivalents and Descriptions).
Section 3.5 Simple Editorial Changes introduces some phrase-level elements which may be used to record simple editorial interventions, such as emendation or correction of the encoded text. The elements described here constitute a simple subset of the full mechanisms for encoding such information (described in full in chapter 12 Representation of Primary Sources), which should be adequate to most commonly encountered situations.
The next section (section 3.6 Names, Numbers, Dates, Abbreviations, and Addresses) describes several phrase-level and inter-level elements which, although often of interest for analysis or processing, are rarely explicitly identified in conventional printing. These include names (section 3.6.1 Referring Strings), numbers and measures (section 3.6.3 Numbers and Measures), dates and times (section 3.6.4 Dates and Times), abbreviations (section 3.6.5 Abbreviations and Their Expansions), and addresses (section 3.6.2 Addresses).
In the same way, the following section (section 3.7 Simple Links and Cross-References) presents only a subset of the facilities available for the encoding of cross-references or text-linkage. The full story may be found in chapter 17 Linking, Segmentation, and Alignment; the tags presented here are intended to be usable for a wide variety of simple applications.
Sections 3.8 Lists, and 3.9 Notes, Annotation, and Indexing, describe two kinds of quasi-structural elements: lists and notes. These may appear either within chunk-level elements such as paragraphs, or between them. Several kinds of lists are catered for, of arbitrary complexity. The section on notes discusses both notes found in the source and simple mechanisms for adding annotations of an interpretive nature during the encoding; again, only a subset of the facilities described in full elsewhere (specifically, in chapter 18 Simple Analytic Mechanisms) is discussed.
Section 3.10 Graphics and Other Non-textual Components introduces some simple ways of representing graphic or other non-textual content found in a text. A fuller discussion of the multimedia facilities supported by these Guidelines may be found in chapters 15 Tables, Formulæ, Graphics, and Notated Music and 17 Linking, Segmentation, and Alignment.
Next, section 3.11 Reference Systems, describes methods of encoding within a text the conventional system or systems used when making references to the text. Some reference systems have attained canonical authority and should be recorded to make the text useable in normal work; in other cases, a convenient reference system should be created by the creator or analyst of an electronic text.
Like lists and notes, the bibliographic citations discussed in section 3.12 Bibliographic Citations and References, may be regarded as structural elements in their own right. A range of possibilities is presented for the encoding of bibliographic citations or references, which may be treated as simple phrases within a running text, or as highly-structured components suitable for inclusion in a bibliographic database.
Additional elements for the encoding of passages of verse or drama (whether prose or verse) are discussed in section 3.13 Passages of Verse or Drama.
The chapter concludes with a technical overview of the structure and organization of the module described here. This should be read in conjunction with chapter 1 The TEI Infrastructure, describing the structure of the TEI document type definition.
3.1 Paragraphs
Paragraphs in modern printed or online text are typically visually offset with whitespace or an indented first line. But paragraphs are not simply blocks of text. The paragraph may be thought of as a mid-level unit of sense, a coherent grouping of sentences. Paragraphs may, in turn, be grouped into larger divisions, such as chapters. Because it is a unit of sense rather than simply a block of text, the p element in TEI may contain other structures displayed as blocks, such as lists or quotes. This distinguishes it from the p element in HTML, which is primarily a block of text, and from the ab (anonymous block) element described in 17.3 Blocks, Segments, and Anchors, which may be used as an alternative to the paragraph in cases that require a chunk-level container which is not necessarily a sense unit and which may have different structural properties.
The paragraph can be understood in the context of the distinct forms of textual division discussed in 1.3.2.1 Informal Element Classifications: chunk, phrase, and inter-level. Chunk-level elements are paragraphs and other elements which have similar structural properties. Phrase-level elements must be entirely contained within a paragraph or other chunk-level structure. This type includes emphasized or quoted phrases, names, dates, etc. Inter-level elements can appear either within a paragraph or between paragraphs, and include bibliographic citations, notes, and lists. The ab (anonymous block) element is an alternative to p which is useful in cases where paragraphs are not present, but chunk-level organization is still needed. ab may contain other abs, may use the type attribute, and does not necessarily represent a coherent set of statements.
Paragraphs can contain many of the other elements described within this chapter, as well as other elements which are specific to individual text types. Because paragraphs may appear in different customizations, their possible contents may vary in different kinds of documents. In particular, additional elements not listed in this chapter may appear in paragraphs. However, the elements described in this chapter are available in all kinds of text unless they are excluded by a customization.
The paragraph is marked using the p element:
- p (paragraph) marks paragraphs in prose.
If a consistent internal subdivision of paragraphs is desired, the s or seg (‘segment’) elements may be used, as discussed in chapters 18 Simple Analytic Mechanisms and 17 Linking, Segmentation, and Alignment respectively. More usually, however, paragraphs have no firm internal structure, but contain prose encoded as a mix of characters, entity references, phrases marked as described in the rest of this chapter, and embedded elements like lists, figures, or tables.
<body>
<p>It is in vain to say human beings ought to be satisfied with tranquillity:
they must have action; and they will make it if they cannot find it. Millions are
condemned to a stiller doom than mine, and millions are in silent revolt against
their lot. Nobody knows how many rebellions besides political rebellions ferment
in the masses of life which people earth. Women are supposed to be very calm
generally: but women feel just as men feel; they need exercise for their faculties,
and a field for their efforts, as much as their brothers do; they suffer from
too rigid a restraint, too absolute a stagnation, precisely as men would suffer; and
it is narrow-minded in their more privileged fellow-creatures to say that they ought to
confine themselves to making puddings and knitting stockings, to playing on the piano
and embroidering bags. It is thoughtless to condemn them, or laugh at them, if they
seek to do more or learn more than custom has pronounced necessary for their sex.</p>
</body>
<p>Serbs seized more territory in this struggling new country today as
the United States Air Force ended a two-day airlift of humanitarian
aid into the capital, Sarajevo.</p>
<p>International relief workers called on European Community nations
to step up their humanitarian aid to the former Yugoslav republic,
in conjunction with new American aid flights if necessary.</p>
<p>A special envoy from the European Community, Colin Doyle, harshly
condemned the decision by Serbs to shell Sarajevo on Saturday night
during a visit to the Bosnian capital by a senior American official,
Deputy Assistant Secretary of State Ralph R. Johnson.</p>
<p>...</p>
<p>There came to the castle the Crawling Louse. <q>Who,
who's in the castle? Who, who's in your house?</q>
said the Crawling Louse. <q>I, I, the Languishing Fly.
And who art thou?</q>
<q>I'm the Crawling Louse.</q>
</p>
<p>Then came to the castle the Leaping Flea. <q>Who,
who's in the castle?</q> said the Leaping Flea. <q>I,
I, the Languishing Fly, and I, the Crawling Louse. And
who art thou?</q>
<q>I'm the Leaping Flea.</q>
</p>
<p>Then came to the castle the Mischievous Mosquito.
<q>Who, who's in the castle?</q> said the Mischievous
Mosquito. <q>I, I, the Languishing Fly, and I, the
Crawling Louse, and I, the Leaping Flea. And who art
thou?</q>
<q>I'm the Mischievous Mosquito.</q>
</p>
3.2 Treatment of Punctuation
Punctuation marks cause two distinct classes of problem for text markup: the marks may not be available in the character set used, and they may be significantly ambiguous. To some extent, the availability of the Unicode character set addresses the first of these problems, since it provides specific code points for most punctuation marks, and also the second, to the extent that it assigns separate code points to similar-looking marks (such as stops, commas, and hyphens) which are used with different functions. Where punctuation itself is the subject of study, the element pc (punctuation character) may be used to mark it explicitly, as further discussed in 18.1.2 Below the Word Level. Where the character used for a punctuation mark is not available in Unicode, the g element and other facilities described in chapter 5 Characters, Glyphs, and Writing Modes may also be used to mark its presence.
3.2.1 Functions of Punctuation
Punctuation is itself a form of markup, historically introduced to provide the reader with an indication about how the text should be read. As such, it is unsurprising that encoders will often wish to encode directly the purpose for which punctuation was provided, as well as, or even instead of, the punctuation itself. We discuss some typical cases below.
The full stop (period) may mark (orthographic) sentence boundaries, abbreviations, decimal points, or serve as a visual aid in printing numbers. These usages can be distinguished by tagging S-units, abbreviations, and numbers, as described in sections 17.3 Blocks, Segments, and Anchors, 3.6.5 Abbreviations and Their Expansions, and 3.6.3 Numbers and Measures respectively. However, there are independent reasons for tagging these, whether or not they are marked by full stops, and the polysemy of the full stop itself is perhaps no different from that of any other character in the writing system.
The question mark and exclamation mark usually mark the end of orthographic sentences, but may also be used as a mid-sentence comment by the author (! to express surprise or some other strong feeling, ? to query a word or expression or mark a sentence as dubious in linguistic discussion). Such usages may be distinguished by marking S-units, in which case the mid-sentence uses of these punctuation marks may be left unmarked, or tagged using the pc element discussed in 18.1 Linguistic Segment Categories.
Dashes are used for a variety of purposes: as a mark of omission, insertion, or interruption; to show where a new speaker takes over (in dialogue); or to introduce a list item. In the latter two cases particularly, it is clearly desirable to mark the function as well as the rendition, using the q or item element respectively (see section 3.3.3 Quotation and section 3.8 Lists).
Quotation marks may be removed from text contained by q or quote elements on editorial grounds, or they may be marked in a variety of ways; see the discussion of quotation and related features in section 3.3.3 Quotation.
Apostrophes should be distinguished from single quote marks. As with hyphens, this disambiguation is best performed by selecting the appropriate Unicode character, though it may also be represented by using appropriate XML markup for quotations as suggested above. However, apostrophes have a variety of uses. In English they mark contractions, genitive forms, and (occasionally) plural forms. Full disambiguation of these uses belongs to the level of linguistic analysis and interpretation.
Parentheses and other marks of suspension such as dashes or ellipses are often used to signal information about the syntactic structure of a text fragment. Full disambiguation of their uses also belongs to the level of linguistic analysis and interpretation, and will therefore need to use the mechanisms discussed in chapter 18 Simple Analytic Mechanisms.
Where punctuation marks are disambiguated by tagging their assumed function in the text (for example, quotation), it may be debated whether they should be excluded or left as part of the text. In the case of quotation marks, it may be more convenient to distinguish opening from closing marks simply by using the appropriate Unicode character than to use the q element, with or without an indication of rendition.
Where segmentation of a text is performed automatically, the accuracy of the result may be considerably enhanced by a first pass in which the function of different punctuation characters is explicitly marked. This need not be done for all cases, but only where the structural function of the punctuation markup (for example as a word or phrase delimiter) is ambiguous. Thus, dots indicating abbreviation might be distinguished from dots indicating sentence end, and exclamation or question marks internal to a sentence distinguished from those which terminate one. Furthermore, when encoding historical materials, it may be considered essential to retain the original punctuation, whether by using an appropriate character code, if this is available (or using the g element where it is not) or by an explicit encoding using pc. The particular method adopted will vary depending upon the feature concerned and upon the purpose of the project.
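As a rough illustration of such a first pass, the Python sketch below labels each full stop in a plain string as a decimal point, an abbreviation, or a sentence end. The abbreviation list, the digit-on-both-sides rule, and the function name classify_full_stops are assumptions made for this example, not recommendations of these Guidelines; a real project would derive its own rules and then record the results with elements such as abbr, num, or pc.

```python
import re

# Hypothetical, project-specific abbreviation list (an assumption for this sketch).
ABBREVIATIONS = {"Mr", "Mrs", "Dr", "St"}


def classify_full_stops(text):
    """Return (offset, label) pairs for every '.' in text.

    Labels are 'decimal', 'abbreviation', or 'sentence-end'. This is a crude
    first-pass heuristic, meant only to show the idea of disambiguating
    punctuation before automatic segmentation.
    """
    labels = []
    for match in re.finditer(r"\.", text):
        i = match.start()
        prev_char = text[i - 1] if i > 0 else ""
        next_char = text[i + 1] if i + 1 < len(text) else ""
        # The run of letters immediately preceding the stop, e.g. "Mr".
        word = re.search(r"[A-Za-z]*$", text[:i]).group()
        if prev_char.isdigit() and next_char.isdigit():
            labels.append((i, "decimal"))
        elif word in ABBREVIATIONS:
            labels.append((i, "abbreviation"))
        else:
            labels.append((i, "sentence-end"))
    return labels


for offset, label in classify_full_stops("Mr. Shandy paid 3.5 guineas. He left."):
    print(offset, label)
```

Such a heuristic will misfire (an abbreviation at the end of a sentence, for instance), which is exactly why its output is best treated as a first pass to be reviewed rather than as final markup.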
3.2.2 Hyphenation
Hyphenation as a phenomenon is generally of most concern when producing formatted text for display in print or on screen: different languages and systems have developed quite sophisticated sets of rules about where hyphens may be introduced and for what reason. These generally do not concern the text encoder, since they belong to the domain of formatting and will generally be handled by the rendition software in use. In this section, we discuss issues arising from the appearance of hyphens in pre-existing formatted texts which are being re-encoded for analysis or other processing. Unicode distinguishes several characters visually similar to the hyphen, among them the undifferentiated hyphen-minus (U+002D), which is retained for compatibility reasons. The hard hyphen (U+2010) is distinguished from the minus sign (U+2212), which is for use in mathematical expressions, and also from the soft hyphen (U+00AD), which may appear in ‘born digital’ documents to indicate places where it is acceptable to insert a hyphen when the document is formatted.
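When re-encoding an existing digital text, it can help to check which of the four code points named above it actually contains. The Python sketch below does this with the standard unicodedata module; the role descriptions in HYPHEN_LIKE merely paraphrase the paragraph above, and the function name audit_hyphens is illustrative.

```python
import unicodedata

# The four hyphen-like characters named above, with their usual roles.
HYPHEN_LIKE = {
    "\u002D": "hyphen-minus: undifferentiated, kept for compatibility",
    "\u2010": "hard hyphen",
    "\u2212": "minus sign: for mathematical expressions",
    "\u00AD": "soft hyphen: marks an acceptable break point",
}


def audit_hyphens(text):
    """Return (offset, Unicode name, role) for each hyphen-like character in text."""
    return [
        (i, unicodedata.name(ch), HYPHEN_LIKE[ch])
        for i, ch in enumerate(text)
        if ch in HYPHEN_LIKE
    ]


sample = "tea\u2010pot, 3\u22121, tea-pot, tea\u00ADpot"
for offset, name, role in audit_hyphens(sample):
    print(offset, name, role)
```

Only these four characters are checked; other hyphen-like code points (such as the non-breaking hyphen) would need to be added to the table if a project cares about them.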
Historically, the hard hyphen has been used in printed or manuscript documents for two distinct purposes. In many languages, it is used between words to show that they function as a single syntactic or lexical unit. For example, in French, est-ce que; in English body-snatcher, tea-party etc. It may also have an important role in disambiguation (for example, by distinguishing say a man-eating fish from a man eating fish). Such usages, although possibly problematic when a linguistic analysis is undertaken, are not generally of concern to text encoders: the hyphen character is usually retained in the text, because it may be regarded as part of the way a compound or other lexical item is spelled. Deciding whether a compound is to be decomposed into its constituent parts, and if so how, is a different question, involving consideration of many other phenomena in addition to the simple presence of a hyphen.
When it appears at the end of a printed or written line however, the hard hyphen generally indicates that—contrary to what might be expected—a word is not yet complete, but continues on the next line (or over the next page or column or other boundary). The hyphen character is not, in this case, part of the word, but just a signal that the word continues over the break. Unfortunately, few languages distinguish these two cases visually, which necessarily poses a problem for text encoders. Suppose, for example, that we wish to investigate a diachronic English corpus for occurrences of tea-pot and teapot, to find evidence for the point at which this compound becomes lexicalized. Any case where the word is hyphenated across a linebreak, like this:
tea-
pot
is ambiguous: there is no simple way of deciding which of the two spellings was intended.
As elsewhere, therefore, encoders have a range of choices:
- They may decide simply to remove any end-of-line hyphenation from the encoded text, on the grounds that its presence is purely a secondary matter of formatting. This will obviously apply also if line endings are themselves regarded as unimportant.
- Alternatively, they may decide to record the presence of the hyphen, perhaps on the grounds that it provides useful morphological information; perhaps in order to retain information about the visual appearance of the original source. In either case, they need to decide whether to record it explicitly, by including an appropriate punctuation character in the text data, or implicitly by supplying an appropriate symbolic value for one or more of the attributes on the lb or other milestone element used to record the fact of the line division. If the hyphen is included in the character data of the TEI document, it might be marked up using the pc (punctuation character) tag, which allows the encoder to express information about its function as a separator, through the force attribute. For example, the example above could be encoded with a force value of "inter" to indicate that the punctuation mark may or may not be a word separator (See also 18.1.2 Below the Word Level).
A similar range of possibilities applies equally to the representation of other common punctuation marks, notably quotation marks, as discussed in 3.3.3 Quotation.
The ‘text data’ of which XML documents are composed is decomposable into smaller units, here called orthographic tokens, even if those units are not explicitly indicated by the XML markup. The ambiguity of the end-of-line hyphen also causes problems in the way a processor identifies such tokens in the absence of explicit markup. If token boundaries are not explicitly marked (for example using the seg or w elements), for most languages a processor will rely on character class information to determine where they are to be found: some punctuation characters are considered to be word-breaking, while others are not. In XML, the newline character in text data is a kind of whitespace, and is therefore word breaking. However, it is generally unsafe to assume that whitespace adjacent to markup tags will always be preserved, and it is decidedly unsafe to assume that markup tags themselves are equivalent to whitespace.
The lb, pb, and cb elements are notable exceptions to this general rule, since their function is precisely to represent (or replace) line, page, or column breaks, which, as noted above, are generally considered to be equivalent to whitespace. These elements provide a more reliable way of preserving the lineation, pagination, etc. of a source document, since the encoder should not assume that (untagged) line breaks etc. in an XML source file will necessarily be preserved.
To control the intended tokenization, the encoder may use the break attribute on such elements to indicate whether or not the element is to be regarded as equivalent to whitespace. This attribute can take the values yes or no to indicate whether or not the element corresponds with a token boundary. The value maybe is also available, for cases where the encoder does not wish (or is unable) to determine whether the orthographic token concerned is broken by the line ending.
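The Python sketch below shows one way a processor might honour break when tokenizing a paragraph. It makes several assumptions that these Guidelines do not prescribe: an lb, pb, or cb with no break attribute is treated as a token boundary; whitespace adjacent to an element with break="no" is discarded; any end-of-line hyphen is assumed to have been removed from the text data already; and the caller decides whether maybe counts as a boundary. The function name tokenize is illustrative.

```python
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"
BREAK_ELEMENTS = {TEI + "lb", TEI + "pb", TEI + "cb"}


def tokenize(p, maybe_is_break=True):
    """Split the text of element p into tokens, honouring @break on lb/pb/cb.

    Assumptions (not TEI rules): a missing @break counts as "yes", any
    end-of-line hyphen has already been removed from the text data, and
    only direct children of p are examined (breaks nested inside other
    elements are not handled).
    """
    buf = p.text or ""
    for child in p:
        if child.tag in BREAK_ELEMENTS:
            value = child.get("break", "yes")
            joins = value == "no" or (value == "maybe" and not maybe_is_break)
            if joins:
                # Glue the two halves together, dropping layout whitespace.
                buf = buf.rstrip() + (child.tail or "").lstrip()
                continue
            buf += " "  # the break is equivalent to whitespace
        else:
            # Other inline elements do not break tokens.
            buf += "".join(child.itertext())
        buf += child.tail or ""
    return buf.split()


para = ET.fromstring(
    '<p xmlns="http://www.tei-c.org/ns/1.0">a tea<lb break="no"/>pot and a tea'
    '<lb break="maybe"/>pot<lb/>here</p>'
)
print(tokenize(para))
print(tokenize(para, maybe_is_break=False))
```

Making the treatment of maybe a parameter reflects the point of that value: the encoder has deliberately left the decision open, so it falls to the application to choose.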
As a final complication, it should be noted that in some languages, particularly German and Dutch, the spelling of a word may be altered in the presence of end-of-line hyphenation. For example, in Dutch, the word opaatje (granddad), occurring at the end of a line, may be hyphenated as opa-tje, with a single letter a. An encoder wishing to preserve the original form of this orthographic token in a printed text while at the same time facilitating its recognition as the word opaatje will therefore need to rely on a more sophisticated process than simply removing the hyphen. This is however essentially the same as any other form of normalization accompanying the recognition of variations in spelling or morphology: as such it may be encoded using the choice element discussed in 3.5 Simple Editorial Changes, or the more sophisticated mechanisms for linguistic analysis discussed in chapter 18 Simple Analytic Mechanisms.
3.3 Highlighting and Quotation
This section deals with a variety of textual features, all of which have in common that they are frequently realized in conventional printing practice by the use of such features as underlining, italic fonts, or quotation marks, collectively referred to here as highlighting. After an initial discussion of this phenomenon and alternate approaches to encoding it, this section describes ways of encoding the following textual features, all of which are conventionally rendered using some kind of highlighting:
- emphasis, foreign words and other linguistically distinct uses of highlighting
- representation of speech and thought, quotation, etc.
- technical terms, glosses, etc.
3.3.1 What Is Highlighting?
By highlighting we mean the use of any combination of typographic features (font, size, hue, etc.) in a printed or written text in order to distinguish some passage of a text from its surroundings. The purpose of highlighting is generally to draw the reader's attention to some feature or characteristic of the passage highlighted; this section describes the elements recommended by these Guidelines for the encoding of such textual features.
In conventionally printed modern texts, highlighting is often employed to identify words or phrases which are regarded as being one or more of the following:
- distinct in some way—as foreign, dialectal, archaic, technical, etc.
- emphatic, and which would for example be stressed when spoken
- not part of the body of the text, for example cross-references, titles, headings, labels, etc.
- identified with a distinct narrative stream, for example an internal monologue or commentary.
- attributed by the narrator to some other agency, either within the text or outside it: for example, direct speech or quotation.
- set apart from the text in some other way: for example, proverbial phrases, words mentioned but not used, names of persons and places in older texts, editorial corrections or additions, etc.
The textual functions indicated by highlighting may not be rendered consistently in different parts of a text or in different texts. (For example, a foreign word may appear in italics if the surrounding text is in roman, but in roman if the surrounding text is in italics.) For this reason, these Guidelines distinguish between the encoding of rendering itself and the encoding of the underlying feature expressed by it.
Highlighting as such may be encoded by using one of the global attributes rend, rendition, or style (see further 1.3.1.1 Global Attributes). This allows the encoder both to specify the function of a highlighted phrase or word, by selecting the appropriate element described here or elsewhere in these Guidelines, and to further describe the way in which it is highlighted, by means of an attribute. If the encoder wishes to offer no interpretation of the feature underlying the use of highlighting in the source text, then the hi element may be used, which indicates only that the text so tagged was highlighted in some way.
- hi (highlighted) marks a word or phrase as graphically distinct from the surrounding text, for reasons concerning which no claim is made.
The hi element is provided by the model.hiLike class.
The possible values carried by the rend attribute are not formally defined in this version of the Guidelines. It may be used to document any peculiarity of the way a given segment of text was rendered in the original source text, and may thus express a very large range of typographic or other features, by no means restricted to typeface, type size, etc. The style attribute, by contrast, defines the way the source text was rendered using a formally defined style language, such as the W3C standard Cascading Style Sheets (CSS) language (Lie and Bos (eds.) (1999)). The complementary rendition attribute is used to point to one or more fragments expressed using such a language which have been predefined in the TEI header using the rendition element discussed in section 2.3.4 The Tagging Declaration.
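To make the relationship between rendition and the header concrete, the Python sketch below collects the CSS declarations that apply to an element: it resolves each rendition pointer against the rendition elements in the document and then appends any style value. Only same-document pointers of the form "#id" are handled, rend is ignored because its values are not formally defined, and the simple concatenation order is an assumption of the sketch rather than a TEI rule. The sample header is abbreviated and would not be valid TEI as it stands.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def css_for(root, element):
    """Collect CSS declarations for element from @rendition pointers and @style."""
    # Index every <rendition> in the document by its xml:id.
    renditions = {
        r.get(XML_ID): (r.text or "").strip()
        for r in root.iter(f"{{{TEI_NS}}}rendition")
    }
    declarations = []
    for pointer in element.get("rendition", "").split():
        # Only same-document pointers ("#id") are resolved in this sketch.
        if pointer.startswith("#") and pointer[1:] in renditions:
            declarations.append(renditions[pointer[1:]])
    if element.get("style"):
        declarations.append(element.get("style"))
    return "; ".join(declarations)


doc = ET.fromstring(
    f'<TEI xmlns="{TEI_NS}"><teiHeader><encodingDesc><tagsDecl>'
    '<rendition xml:id="italic" scheme="css">font-style: italic</rendition>'
    '</tagsDecl></encodingDesc></teiHeader>'
    '<text><body><l>and sometimes <emph rendition="#italic">Tea</emph>.</l>'
    '</body></text></TEI>'
)
emph = doc.find(f".//{{{TEI_NS}}}emph")
print(css_for(doc, emph))
```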
Where it is both appropriate and feasible, these Guidelines recommend that the textual feature marked by the highlighting should be encoded, rather than just the simple fact of the highlighting. This is for the following reasons:
- the same kind of highlighting may be used for different purposes in different contexts
- the same textual function may be highlighted in different ways in different contexts
- for analytic purposes, it is in general more useful to know the intended function of a highlighted phrase than simply that it is distinct.
In many, if not most, cases the underlying function of a highlighted phrase will be obvious and non-controversial, since the distinctions indicated by a change of highlighting correspond with distinctions discussed elsewhere in these Guidelines. The elements available to record such distinctions are, for the most part, members of the model.emphLike class. This and the model.hiLike class mentioned above constitute the model.highlighted class, which is a phrase-level class. Members of this class may appear anywhere within paragraph-level elements.
The distinction between the two classes is simple, and typified by the two elements hi and emph: the former marks simply that a passage is typographically distinct in some way, while the latter asserts that a passage is linguistically emphasized for some purpose. These two properties, though often combined, are not identical. It should be recognized, however, that cases do exist in which it is not economically feasible to mark the underlying function (e.g. in the preparation of large text corpora), as well as cases in which it is not intellectually appropriate (as in the transcription of some older materials, or in the preparation of material for the study of typographic practice). In such cases, the hi element or some other element from the model.hiLike class should be used.
Elements which are sometimes realized by typographic distinction but which are not discussed in this section include title (discussed in section 3.12 Bibliographic Citations and References) and name (discussed in section 3.6.1 Referring Strings).
3.3.2 Emphasis, Foreign Words, and Unusual Language
This subsection discusses the following elements:
- foreign (foreign) identifies a word or phrase as belonging to some language other than that of the surrounding text.
- emph (emphasized) marks words or phrases which are stressed or emphasized for linguistic or rhetorical effect.
- distinct identifies any word or phrase which is regarded as linguistically distinct, for example as archaic, technical, dialectal, non-preferred, etc., or as forming part of a sublanguage.
These elements are all members of the model.emphLike class.
3.3.2.1 Foreign Words or Expressions
hoc</foreign>?</q> said the Bee Master.
<q>Wax-moth only succeed when
weak bees let them in.</q>
pronounce with your mouth full.
piece of light, buttery, pastry that is usually eaten for
breakfast, especially in France.
Elements which do not explicitly state the language of their content by means of an xml:lang attribute are understood to inherit a value for it from their parent element. In the general case, therefore, it is recommended practice to supply a default value for xml:lang on the root TEI or text element, as further discussed in section 1.3.1.1.2 Language Indicators.
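The inheritance rule just described amounts to a simple tree walk: each element takes its own xml:lang if it has one, and otherwise the value in force for its parent. The Python sketch below computes this for every element under a given root; the function name effective_languages and the sample sentence are illustrative only.

```python
import xml.etree.ElementTree as ET

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def effective_languages(root, default=None):
    """Map each element under root to the xml:lang it declares or inherits."""
    languages = {}

    def walk(element, inherited):
        # An explicit xml:lang overrides whatever the parent supplied.
        lang = element.get(XML_LANG, inherited)
        languages[element] = lang
        for child in element:
            walk(child, lang)

    walk(root, default)
    return languages


text = ET.fromstring(
    '<text xmlns="http://www.tei-c.org/ns/1.0" xml:lang="en"><body><p>'
    'What is <foreign xml:lang="la">quid pro quo</foreign>?</p></body></text>'
)
langs = effective_languages(text)
for element, lang in langs.items():
    print(element.tag.split("}")[1], lang)
```

The sketch also shows why a default on the root element is recommended: without one, every element lacking its own xml:lang would map to None.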
3.3.2.2 Emphatic Words and Phrases
<q>
<emph rend="italic">What does Christopher Robin do in the morning
nowadays?</emph>
</q>
<q>
<emph style="font-style: italic">What does Christopher Robin do in
the morning nowadays?</emph>
</q>
whom three Realms obey,</l>
<l>Doth sometimes Counsel take —
and sometimes <emph rendition="#italic">Tea</emph>.</l>
<!-- in the header ... -->
<rendition xml:id="italic" scheme="css">font-style: italic</rendition>
The hi element is used to mark words or phrases which are highlighted in some way, but for which identification of the intended distinction is difficult, controversial, or impossible. It enables an encoder simply to record the fact of highlighting, possibly describing it by the use of a rend, style, or rendition attribute, as discussed above, without however taking a position as to the function of the highlighting. This may also be useful if the text is to be processed in two stages: representing simply typographic distinctions during a first pass, and then replacing the hi elements with more specific elements in a second pass.
that the said <hi rend="italic">Walter Shandy</hi>, merchant,
in consideration of the said intended marriage ...
sleet, could boast of the advantage over him in only one respect. They
often <hi rend="quoted">came down</hi> handsomely, and Scrooge never
did.
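The two-stage approach described above can be sketched as a small second pass over the first-pass output. The rule applied here is purely an assumption for illustration: a hi element that already records a language change through xml:lang is retagged as foreign (as in the romans d'antiquité example later in this chapter), keeping its rend value, while every other hi is left for manual review. The TEI namespace is omitted, and the name `refine_hi` is hypothetical.

```python
import xml.etree.ElementTree as ET

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

def refine_hi(root):
    """Second pass: retag <hi> elements whose function can now be decided.

    Hypothetical rule: a <hi> carrying xml:lang becomes <foreign>.
    Returns the number of elements retagged."""
    changed = 0
    # Materialize the list first so retagging cannot disturb the traversal.
    for el in list(root.iter("hi")):
        if el.get(XML_LANG) is not None:
            el.tag = "foreign"   # attributes such as rend are kept unchanged
            changed += 1
    return changed

p = ET.fromstring(
    "<p>the <hi xml:lang='fr' rend='italic'>romans d'antiquité</hi> and "
    "<hi rend='italic'>Walter Shandy</hi></p>"
)
print(refine_hi(p))  # 1
```

The second hi survives the pass untouched, which mirrors the point above: hi records the fact of highlighting when its function has not (yet) been decided.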
3.3.2.3 Other Linguistically Distinct Material
For some kinds of analysis, it may be desirable to encode the linguistic distinctiveness of words and phrases with more delicacy than is allowed by the foreign element. The distinct element is provided for this purpose. Its attributes allow additional information characterizing the nature of the linguistic distinction to be supplied in two distinct ways: the type attribute simply assigns a user-defined code to the word or phrase, placing it in some register, sub-language, etc. No recommendations as to the set of values for this attribute are provided at this time, as little consensus exists in the field.
Alternatively, the remaining three attributes may be used in combination to place a word or phrase on a three-dimensional scale sometimes used in descriptive linguistics, as for example in Mattheier et al., 1988. The time attribute places a word or phrase diachronically, for example as archaic, old-fashioned, contemporary, futuristic, etc.; the space attribute places a word or phrase diatopically, that is, with respect to a geographical classification, for example as national, regional, international, etc.; the social attribute places a word or phrase diastratically, that is, with respect to a social classification, for example as technical, polite, impolite, restricted, etc. Again, no recommendations are made for the values of these attributes at this time; the encoder should provide a description of the scheme used in the appropriate section of the header (see section 2.3 The Encoding Description).
bosom friend, a <distinct type="psSlang">fag</distinct> of
Macrea's, that there was trouble in their midst which
King <distinct type="archaic">would fain</distinct> keep
secret.
bosom friend, a
<distinct time="1900" space="GB"
social="publicschool">fag</distinct>
of Macrea's, that there was trouble in their midst which
King <distinct time="archaic">would fain</distinct> keep
secret.
3.3.3 Quotation
One form of presentational variation found particularly frequently in written and printed texts is the use of quotation marks. As with the typographic variations discussed in the preceding section, it is generally helpful to separate the encoding of the underlying textual feature (for example, a quotation or a piece of direct speech) from the encoding of its rendering (for example, the use of a particular style of quotation marks).
This section discusses the following elements, all of which are often rendered by the use of quotation marks:
- q (quoted) contains material which is distinguished from the surrounding text using quotation marks or a similar method, for any one of a variety of reasons including, but not limited to: direct speech or thought, technical terms or jargon, authorial distance, quotations from elsewhere, and passages that are mentioned but not used.
- said (speech or thought) indicates passages thought or spoken aloud, whether explicitly indicated in the source or not, whether directly or indirectly reported, whether by real people or fictional characters.
direct may be used to indicate whether the quoted matter is regarded as direct or indirect speech.
aloud may be used to indicate whether the quoted matter is regarded as having been vocalized or signed.
- quote (quotation) contains a phrase or passage attributed by the narrator or author to some agency external to the text.
- att.global.source provides attributes used by elements to point to an external source.
source specifies the source from which some aspect of this element is drawn.
- cit (cited quotation) contains a quotation from some other document, together with a bibliographic reference to its source. In a dictionary it may contain an example text with at least one occurrence of the word form, used in the sense being described, or a translation of the headword, or an example.
- mentioned marks words or phrases mentioned, not used.
- soCalled (so called) contains a word or phrase for which the author or narrator indicates a disclaiming of responsibility, for example by the use of scare quotes or italics.
The elements mentioned and soCalled are members of the class model.emphLike, while q is a member of model.hiLike; the element said is a member of the class model.attributable in its own right, while cit and quote are members of model.quoteLike, a subclass of model.attributable. Since model.attributable is in turn a subclass of model.inter, these elements are permitted both within and between paragraph-level elements.
The most common and important use of quotation marks is, of course, to mark quotation, by which we mean simply any part of the text which the author or narrator wishes to attribute to some agency other than the narrative voice. The q element may be used if no further distinction beyond this is judged necessary. If it is felt necessary to distinguish such passages further, for example to indicate whether they are regarded as speech, writing, or thought, either the type attribute or one of the more specialized elements discussed in this section may be used. For example, the element quote may be used for written passages cited from other works, or the element said for words or phrases represented as being spoken or thought by people or characters within the current work. The soCalled element is used for cases where the author or narrator distances himself or herself from the words in question without however attributing them to any other voice in particular. The mentioned element is appropriate for a case where a word or phrase is being discussed in the body of a text rather than forming part of the text directly.
As noted above, if the distinction among these various reasons why a passage is offset from surrounding text cannot be made reliably, or is not of interest, then any representation of speech, thought, or writing may simply be marked using the q element.
Quotation may be indicated in a printed source by changes in type face, by special punctuation marks (single or double or angled quotes, dashes, etc.) and by layout (indented paragraphs, etc.), or it may not be explicitly represented at all. If these characteristics are of interest, either of the global rend or rendition attributes discussed in section 1.3.1.1 Global Attributes may be used to record them.
<said>— Alors, Albert, quoi de neuf?</said>
<said>— Pas grand-chose.</said>
<said>— Il fait beau,</said> dit Robert.
<said rendition="#dashBefore">Alors,
Albert, quoi de neuf ?</said>
<said rendition="#dashBefore">Pas grand-chose.</said>
<said rendition="#dashBefore">Il fait beau,</said>
dit Robert.
<!-- ... within the header -->
<rendition xml:id="dashBefore"
scope="before">content: '— '</rendition>
<!-- ... -->
<quotation marks="none"/>
Whatever policy is adopted, the encoder should document the decision in some way, for example by using the quotation element provided in the TEI header (see 2.3.3 The Editorial Practices Declaration) to indicate that quotation marks have not been retained in the encoding; their presence in the source is implied by the rendition attribute values supplied.
you?</said> — he at last said —
<said rend="pre(‘) post(’)">you no speak-e,
damme, I kill-e.</said> And so saying,
the lighted tomahawk began flourishing
about me in the dark.
you?</said> — he at last said —
<said>you no speak-e,
damme, I kill-e.</said> And so saying,
the lighted tomahawk began flourishing
about me in the dark.
<!-- in the header: -->
<tagsDecl partial="true">
<rendition scheme="css" selector="said"
scope="before">content:"‘";</rendition>
<rendition scheme="css" selector="said"
scope="after">content:"’";</rendition>
</tagsDecl>
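A stylesheet that honours such declarations has to read each rendition with a selector and a before or after scope, and generate the quotation marks when it outputs the matching element. The Python sketch below does this for plain-text output. It assumes that selectors are bare element names and that the CSS value is a single quoted content: string; real CSS selector matching is considerably more involved. The TEI namespace is omitted, and `render_with_quotes` is a hypothetical name.

```python
import re
import xml.etree.ElementTree as ET

# Matches a CSS content declaration such as content:"‘"; or content: '— '
CONTENT_RE = re.compile(r"""content:\s*(["'])(.*?)\1""")

def render_with_quotes(text_root, tags_decl):
    """Plain text of text_root, with the strings declared by
    <rendition selector="NAME" scope="before|after"> inserted around elements."""
    marks = {}  # (element name, scope) -> string to generate
    for r in tags_decl.iter("rendition"):
        m = CONTENT_RE.search(r.text or "")
        if m and r.get("selector") and r.get("scope") in ("before", "after"):
            marks[(r.get("selector"), r.get("scope"))] = m.group(2)

    def walk(el):
        parts = [marks.get((el.tag, "before"), ""), el.text or ""]
        for child in el:
            parts.append(walk(child))
            parts.append(child.tail or "")  # text following the child
        parts.append(marks.get((el.tag, "after"), ""))
        return "".join(parts)

    return walk(text_root)

decl = ET.fromstring(
    '<tagsDecl partial="true">'
    '<rendition scheme="css" selector="said" scope="before">content:"‘";</rendition>'
    '<rendition scheme="css" selector="said" scope="after">content:"’";</rendition>'
    '</tagsDecl>'
)
p = ET.fromstring("<p><said>you no speak-e, damme, I kill-e.</said> And so saying</p>")
print(render_with_quotes(p, decl))  # ‘you no speak-e, damme, I kill-e.’ And so saying
```

Because the marks are generated from the header, the transcription itself stays free of quotation characters, which is the point of the second encoding above.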
- att.ascribed provides attributes for elements representing speech or action that can be ascribed to a specific individual.
who indicates the person, or group of people, to whom the element content is ascribed.
- att.ascribed.directed provides attributes for elements representing speech or action that can be directed at a group or individual.
toWhom indicates the person, or group of people, to whom a speech act or action is directed.
<said who="#Ado" toWhom="#Alb">— Alors, Albert, quoi de neuf?</said>
<said who="#Alb" toWhom="#Ado">— Pas grand-chose.</said>
<said who="#Rob">— Il fait beau,</said> dit Robert.
<!-- ... elsewhere in the document -->
<standOff>
<listPerson type="speakers">
<person xml:id="Ado">
<persName>Adolphe</persName>
</person>
<person xml:id="Alb">
<persName>Albert</persName>
</person>
<person xml:id="Rob">
<persName>Robert</persName>
</person>
</listPerson>
</standOff>
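Since who and toWhom are pointers, software can recover the speaker and addressee of each utterance by looking up the xml:id values in the listPerson. The Python sketch below treats each attribute value as a whitespace-separated list of pointers and resolves only same-document "#id" references, leaving any other value unchanged. The TEI namespace is omitted, and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET

XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

def speaker_index(root):
    """Map each person's xml:id to the text of its first persName."""
    return {p.get(XML_ID): p.findtext("persName") for p in root.iter("person")}

def resolve(pointers, index):
    """Resolve a who/toWhom value such as '#Ado #Rob' to a list of names."""
    names = []
    for ptr in (pointers or "").split():
        # Local pointers are looked up; unknown or external ones are kept as given.
        names.append(index.get(ptr[1:], ptr) if ptr.startswith("#") else ptr)
    return names

doc = ET.fromstring("""<TEI><text><body><p>
<said who="#Ado" toWhom="#Alb">Alors, Albert, quoi de neuf?</said>
<said who="#Rob">Il fait beau,</said> dit Robert.</p></body></text>
<standOff><listPerson type="speakers">
<person xml:id="Ado"><persName>Adolphe</persName></person>
<person xml:id="Alb"><persName>Albert</persName></person>
<person xml:id="Rob"><persName>Robert</persName></person>
</listPerson></standOff></TEI>""")

index = speaker_index(doc)
for s in doc.iter("said"):
    print(resolve(s.get("who"), index), "->", resolve(s.get("toWhom"), index))
# ['Adolphe'] -> ['Albert']
# ['Robert'] -> []
```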
<said aloud="false">I mean
Gordon Macrae, for example…</said>
<said aloud="false">Jungian
Analyst with Winebox! That's what you called him, you callous bastard,
didn't you? Eh? Eh?</said>
eight weeks with this very paper in his hand, and he says:—
<said who="#WilsonSpaulding">I wish to the Lord, Mr. Wilson, that I was a
red-headed man.</said>
</said>
<!-- ... -->
<list type="speakers">
<item xml:id="Wilson">Wilson</item>
<item xml:id="WilsonSpaulding">Spaulding reported by Wilson</item>
<!-- ...-->
</list>
<said>The Lord! The Lord! It is Sakya Muni himself,</said> the lama half
sobbed; and under his breath began the wonderful Buddhist
invocation:-<said>
<quote>
<l>To Him the Way — the Law — Apart —</l>
<l>Whom Maya held beneath her heart</l>
<l>Ananda's Lord — the Bodhisat</l>
</quote>
And He is here! The Most Excellent Law is here also. My
pilgrimage is well begun. And what work! What work!</said>
</p>
<head>Chapter 1</head>
<epigraph>
<cit>
<quote>
<l>Since I can do no good because a woman</l>
<l>Reach constantly at something that is near it.</l>
</quote>
<bibl>
<title>The Maid's Tragedy</title>
<author>Beaumont and Fletcher</author>
</bibl>
</cit>
</epigraph>
<p>Miss Brooke had that kind of beauty which seems to be thrown into
relief by poor dress...</p>
</div>
work of followers of J.R. Firth, probably best summarized
in his slogan, <cit>
<quote>You shall know a word by the company it keeps.</quote>
<ref>(Firth, 1957)</ref>
</cit>
<quote source="#tlk_36">
<title>Beowulf</title> is in fact so interesting as
poetry, in places poetry so powerful, that this quite
overshadows the historical content
</quote>.
Unlike most of the other elements discussed in this chapter, direct speech and quotations may frequently contain other high-level elements such as paragraphs or verse lines, as well as being themselves contained by such elements. Three possible solutions exist for this well-known structural problem:
- the quotation is broken into segments, each of which is entirely contained within a paragraph
- the quotation is marked up using stand-off markup
- the quotation boundaries are represented by empty segment boundary delimiter elements
For further discussion and several examples, see chapter 21 Non-hierarchical Structures.
sentences are finite objects was never justified by arguments from
the attested properties of NLs, it did have a certain
<soCalled>social</soCalled> justification. It was commonly assumed in
works on logic until fairly recently that the notion
<mentioned>language</mentioned> is necessarily restricted to finite
strings.
3.4 Terms and Glosses, Ruby Annotations, and Equivalents and Descriptions
This section describes a set of textual elements which are used to provide a gloss, alternate identification, or description of something.
3.4.1 Terms and Glosses
Technical terms are often italicized or emboldened upon first mention in printed texts; an explanation or gloss is sometimes given in quotation marks. Linguistic analyses conventionally cite words in languages under discussion in italics, providing a gloss immediately following marked with single quotation marks. Other texts in which individual words or phrases are mentioned (for example, as examples) rather than used may mark them either with italics or with quotation marks, and will gloss them less regularly.
- term (term) contains a single-word, multi-word, or symbolic designation which is regarded as a technical term.
- gloss (gloss) identifies a phrase or word used to provide a gloss or definition for some other word or phrase.
These elements are also members of the class model.emphLike.
as
<gloss target="#TDPv">the relationship, expressed through discourse
structure, between the implied author or some other addresser,
and the fiction.</gloss>
structure from grammatical strings of words</gloss> is known as a
<term xml:id="PRSR">parser</term>, and much of the history of NLP over the
last 20 years has been occupied with the design of parsers.
form like <mentioned xml:id="cw234" xml:lang="grc">eluthemen</mentioned>
<gloss target="#cw234">we were released,</gloss> accented on the
second syllable of the word, and its participial derivative
<mentioned xml:id="cw235" xml:lang="grc">lutheis</mentioned>
<gloss target="#cw235">released,</gloss> accented on the last.
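Because each gloss in the last example points to the word it glosses, a simple word list can be harvested from the encoding. The Python sketch below pairs every gloss that has a target with the text of the element whose xml:id it names; glosses without a target (whose association is left to adjacency, as in the parser example) are skipped. It assumes a single "#id" pointer per target and omits the TEI namespace; `glossary` is a hypothetical name.

```python
import xml.etree.ElementTree as ET

XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

def glossary(root):
    """Return (glossed text, gloss text) pairs for every <gloss target="#id">."""
    by_id = {el.get(XML_ID): el for el in root.iter() if el.get(XML_ID)}
    pairs = []
    for g in root.iter("gloss"):
        glossed = by_id.get((g.get("target") or "").lstrip("#"))
        if glossed is not None:
            word = "".join(glossed.itertext()).strip()
            gloss = " ".join("".join(g.itertext()).split())  # normalize whitespace
            pairs.append((word, gloss))
    return pairs

p = ET.fromstring(
    '<p>a form like <mentioned xml:id="cw234" xml:lang="grc">eluthemen</mentioned> '
    '<gloss target="#cw234">we were released,</gloss> and its participial derivative '
    '<mentioned xml:id="cw235" xml:lang="grc">lutheis</mentioned> '
    '<gloss target="#cw235">released,</gloss> accented on the last.</p>'
)
print(glossary(p))  # [('eluthemen', 'we were released,'), ('lutheis', 'released,')]
```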
For technical terminology in particular, and generally in terminological studies, it may be useful to associate an instance of a term within a text with a canonical definition for it, which is stored either elsewhere in the same text (for example in a glossary of terms) or externally, for example in a database, authority file, or published standard. The attributes key and ref discussed in section 3.6.1 Referring Strings below are available on the term element for this purpose.
3.4.1.1 Some Further Examples
associated with the new rise of romance of twelfth-century France,
the <hi xml:lang="fr" rend="italic">romans d'antiquité</hi>,
the romances of Chrétien de Troyes, ...
is associated with the new rise of romance of twelfth-century France,
the <foreign rend="italic">romans d'antiquité</foreign>, the
romances of Chrétien de Troyes, ...
In this example, the decision as to which textual features are distinguished by the highlighting is relatively uncontroversial. As a less straightforward example, consider the use of italic font in the following passage:
Clearly, the word vehement is not italicized for the same reason as the phrase not so young as she has been; the former is emphasized, while the latter is proverbial. It also provides an ironic gloss for the words too wise, in the same way as too pert glosses too witty. The glossed phrases are not, however, technical terms or cited words, but quoted phrases, as if the writer were putting words into her own and her mother's mouths. Finally, the words mother and daughter are apparently italicized simply to oppose them in the sentence; certainly they do not fit into any of the categories so far proposed as reasons for italicizing. Note also that the word Anglicé is not italicized although it is not generally considered an English word.
debatings. She says I am <q rend="italic">too witty</q>;
<foreign xml:lang="la" rend="roman">Anglicé</foreign>,
<gloss rend="italic">too pert</gloss>; I, that she is
<q rend="italic"> too wise</q>; that is to say, being likewise
put into English, <gloss rend="italic">not so young as she has
been</gloss>: in short, she is grown so much into a
<hi rend="italic">mother</hi>, that she had forgotten she ever
was a <hi rend="italic">daughter</hi>.
3.4.2 Ruby Annotations
The word ruby (or rubi) refers to a particular method of glossing runs of text which is common in East Asian scripts. In horizontally oriented text, ruby annotations typically appear above the text being glossed, while in vertical runs of text they may appear to the left or right, or both, also oriented vertically. An English example of a ruby annotation might look like this:
In Japanese, furigana (振り仮名) ruby annotations are often used to provide pronunciation guidance for readers; characters from the largely phonetic hiragana or katakana syllabaries accompany Chinese characters, like this:
Pinyin ruby annotations are also used in Chinese to provide pronunciation guidance, and Zhuyin (注音) phonetic symbols (commonly known as bopomofo) are used in Taiwan for the same purpose.
The TEI schema provides many different ways of encoding glosses and annotations, from the simple and flexible note element to a native implementation of the Web Annotation Data Model (17.11 Annotations). However, ruby is a particular, distinct, and widely used form of annotation that appears in script, print, calligraphy, and web pages, and the TEI therefore provides specific elements for it:
- ruby (ruby container) contains a passage of base text along with its associated ruby gloss(es).
- rb (ruby base) contains the base text annotated by a ruby gloss.
- rt (ruby text) contains a ruby text, an annotation closely associated with a passage of the main text.
The rt element is a member of att.placement, and thus the place attribute may be used to indicate where the ruby gloss is with respect to the base text:
- att.placement provides attributes for describing where on the source page or object a textual element appears.
place specifies where this item is placed. Suggested values include: 1] top; 2] bottom; 3] margin; 4] opposite; 5] overleaf; 6] above; 7] right; 8] below; 9] left; 10] end; 11] inline; 12] inspace
The most relevant suggested values of place for ruby text are above, below, left, and right.
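Where ruby cannot be displayed (for example in plain-text output or a search index), one common workaround is to write the gloss in parentheses after its base text. The Python sketch below does this for simple cases and uses the place attribute only as a filter for which glosses to keep. It assumes a single rb with no nested ruby and no from/to anchors; the parenthesized convention, the TEI namespace omission, and the name `ruby_fallback` are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

def ruby_fallback(ruby, places=None):
    """Render a simple <ruby> as 'base(gloss)' for plain-text output.

    If `places` is given, only <rt> glosses whose place value is in it are kept;
    several glosses are joined with '/'."""
    base = "".join(ruby.find("rb").itertext()).strip()
    glosses = ["".join(rt.itertext()).strip() for rt in ruby.findall("rt")
               if places is None or rt.get("place") in places]
    return f"{base}({'/'.join(glosses)})" if glosses else base

p = ET.fromstring('<p><ruby><rb>大</rb><rt place="above">だい</rt></ruby>'
                  '<ruby><rb>学</rb><rt place="above">がく</rt></ruby></p>')
print("".join(ruby_fallback(r) for r in p.iter("ruby")))  # 大(だい)学(がく)

bopomofo = ET.fromstring('<ruby><rb>ㄅ</rb><rt place="right">B</rt>'
                         '<rt place="right">博</rt></ruby>')
print(ruby_fallback(bopomofo))                   # ㄅ(B/博)
print(ruby_fallback(bopomofo, places={"left"}))  # ㄅ
```

The nested and anchored encodings shown below would need a recursive or pointer-aware variant of this sketch.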
<!--...-->
<ruby>
<rb>大</rb>
<rt place="above">だい</rt>
</ruby>
<ruby>
<rb>学</rb>
<rt place="above">がく</rt>
</ruby>
<!--...-->
</p>
<!--...-->
<ruby>
<rb>瓶</rb>
<rt place="right">ㄆㄧㄥˊ</rt>
</ruby>
<ruby>
<rb>子</rb>
<rt place="right">˙ㄗ</rt>
</ruby>
<!--...-->
</p>
xml:lang="ja">
<!--...-->
<ruby>
<rb>
<ruby>
<rb>打</rb>
<rt place="right">ダ</rt>
</ruby>
<ruby>
<rb>球</rb>
<rt place="right">キウ</rt>
</ruby>
場
</rb>
<rt place="left">ビリヤード</rt>
</ruby>
<!--...-->
</p>
xml:lang="ja">
<!--...-->
<ruby>
<rb>
<anchor xml:id="da"/>打
<anchor xml:id="kyuu"/>球
<anchor xml:id="ba"/>場
<anchor xml:id="owari"/>
</rb>
<rt place="left" from="#da" to="#owari">ビリヤード</rt>
<rt place="right" from="#da" to="#kyuu">ダ</rt>
<rt place="right" from="#kyuu" to="#ba">キウ</rt>
</ruby>
<!--...-->
</p>
xml:lang="ja">
<!--...-->
<ruby>
<rb xml:id="dakyuuba">
<c xml:id="chr1">打</c>
<c xml:id="chr2">球</c>
<c>場</c>
</rb>
<rt place="left" target="#dakyuuba">ビリヤード</rt>
<rt place="right" target="#chr1">ダ</rt>
<rt place="right" target="#chr2">キウ</rt>
</ruby>
<!--...-->
</p>
xml:lang="ja">
<!--...-->
<ruby>
<rb>ㄅ</rb>
<rt place="right">B</rt>
<rt place="right">博</rt>
</ruby>
<!--...-->
</p>
xml:lang="ja">
<!--...-->
<ruby>
<rb>
<ruby>
<rb>ㄅ</rb>
<rt place="right">B</rt>
</ruby>
</rb>
<rt place="right">博</rt>
</ruby>
<!--...-->
</p>
- att.written provides attributes to indicate the hand in which the content of an element was written in the source being transcribed.
hand points to a handNote element describing the hand considered responsible for the content of the element concerned.
xml:lang="ja">
<!--...-->
<ruby>
<rb>蘝蔓</rb>
<rt hand="#MoriOhgai" style="color: red;">ヤブカラシ</rt>
</ruby>
于野
<!--...-->
</p>
The current support for ruby is rudimentary, and in future releases of the Guidelines we expect to see more development of these features and recommendations. While ruby is included for use with East Asian languages, encoders may find other contexts in which these elements are useful.
3.4.3 Equivalents and Descriptions
Another group of elements is used to supply different kinds of names for objects described by the TEI, for example in the documentation of elements, attributes, and classes (and also attribute values where appropriate), and in the description of glyphs.
- altIdent (alternate identifier) supplies the recommended XML name for an element, class, attribute, etc. in some language.
- desc (description) contains a short description of the purpose, function, or use of its parent element, or when the parent is a documentation element, describes or defines the object being documented.
- equiv (equivalent) specifies a component which is considered equivalent to the parent element, either by co-reference, or by external link.
uri (uniform resource identifier) references the underlying concept of which the parent is a representation by means of some external identifier.
filter references an external script which contains a method to transform instances of this element to canonical TEI.
name a single word which follows the rules defining a legal XML name (see https://www.w3.org/TR/REC-xml/#dt-name), naming the underlying concept of which the parent is a representation.
predicate [att.predicate] the condition under which the element bearing this attribute applies, given as an XPath predicate expression.
Along with the gloss element mentioned above, these elements constitute the model.identSynonyms class. They are described in more detail in 23.4.1 Description of Components.
3.5 Simple Editorial Changes
As in editing a printed text, so in encoding a text in electronic form, it may be necessary to accommodate editorial comment on the text and to give an account of any changes made to the text in preparing it. The tags described in this section may be used to record such editorial interventions, whether made by the encoder, by the editor of a printed edition used as a copy text, by earlier editors, or by the copyists of manuscripts.
The tags described here handle most common types of editorial intervention and stereotyped comment; where less structured commentary of other types is to be included, it may be marked using the note element described in section 3.9 Notes, Annotation, and Indexing. Systematic interpretive annotation is also possible using the various methods described in chapter 17 Linking, Segmentation, and Alignment. The examples given here illustrate only simple cases of editorial intervention; in particular, they permit economical encoding of a simple set of alternative readings of a short span of text. To encode multiple views of large or heterogeneous spans of text, the mechanisms described in chapter 17 Linking, Segmentation, and Alignment should be used. To encode multiple witnesses of a particular text, a similar mechanism designed specifically for critical editions is described in chapter 13 Critical Apparatus.
For most of the elements discussed here, some encoders may wish to indicate both a responsibility, that is, a code indicating the person or agency responsible for making the editorial intervention in question, and also an indication of the degree of certainty which the encoder wishes to associate with the intervention. These requirements are served by the att.global.responsibility class, along with att.global.source, att.editLike, and att.dimensions. Any of the elements discussed here thus may potentially carry any of the following optional attributes:
- att.global.responsibility provides attributes indicating the agent responsible for some aspect of the text, the markup or something asserted by the markup, and the degree of certainty associated with it.
cert (certainty) signifies the degree of certainty associated with the intervention or interpretation.
resp (responsible party) indicates the agency responsible for the intervention or interpretation, for example an editor or transcriber.
- att.global.source provides attributes used by elements to point to an external source.
source specifies the source from which some aspect of this element is drawn.
- att.editLike provides attributes describing the nature of an encoded scholarly intervention or interpretation of any kind.
evidence indicates the nature of the evidence supporting the reliability or accuracy of the intervention or interpretation. Suggested values include: 1] internal; 2] external; 3] conjecture
- att.dimensions provides attributes for describing the size of physical objects.
unit names the unit used for the measurement. Suggested values include: 1] cm (centimetres); 2] mm (millimetres); 3] in (inches); 4] line; 5] char (characters)
quantity specifies the length in the units specified.
extent indicates the size of the object concerned using a project-specific vocabulary combining quantity and units in a single string of words.
precision characterizes the precision of the values specified by the other attributes.
scope where the measurement summarizes more than one observation, specifies the applicability of this measurement. Sample values include: 1] all; 2] most; 3] range
Many of the elements discussed here can be used in two ways. Their primary purpose is to indicate that the text encoded as the element's content represents an editorial intervention (or non-intervention) of a specific kind, indicated by the element itself. However, pairs or other meaningful groupings of such elements can also be supplied, wrapped within a special purpose choice element:
- choice (choice) groups a number of alternative encodings for the same point in a text.
This element enables the encoder to represent for example a text in its ‘original’ uncorrected and unaltered form, alongside the same text in one or more ‘edited’ forms. This usage permits software to switch automatically between one ‘view’ of a text and another, so that (for example) a stylesheet may be set to display either the text in its original form or after the application of editorial interventions of particular kinds.
Elements which can be combined in this way constitute the model.choicePart class. The default members of this class are sic, corr, reg, orig, unclear, supplied, abbr, expan, ex, am and seg; some of their functions and usage are described further below.
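The view switching described above can be sketched as a plain-text renderer that, at every choice, keeps only the child preferred by the requested view. The grouping of sic/orig/abbr into an "original" view and corr/reg/expan into an "edited" view is an assumption for illustration, not a TEI-defined mapping; the TEI namespace is omitted, and `VIEWS` and `view_text` are hypothetical names.

```python
import xml.etree.ElementTree as ET

# Which choice children each view prefers (an illustrative assumption).
VIEWS = {
    "original": ("sic", "orig", "abbr"),
    "edited": ("corr", "reg", "expan"),
}

def view_text(el, view):
    """Return the plain text of el as it reads in the given view."""
    if el.tag == "choice":
        # Keep the first child preferred by this view; ignore the alternatives.
        keep = next((c for c in el if c.tag in VIEWS[view]), None)
        return view_text(keep, view) if keep is not None else ""
    parts = [el.text or ""]
    for child in el:
        parts.append(view_text(child, view))
        parts.append(child.tail or "")  # text after the child belongs to el
    return "".join(parts)

p = ET.fromstring("<p><choice><corr>dates</corr><sic>date's</sic></choice>"
                  " mentioned in the main body of the text are incorrect.</p>")
print(view_text(p, "original"))  # date's mentioned in the main body of the text are incorrect.
print(view_text(p, "edited"))    # dates mentioned in the main body of the text are incorrect.
```

Because both readings stay in the file, the choice of view is deferred to processing time rather than fixed at encoding time.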
Three categories of editorial intervention are discussed in this section:
- indication or correction of apparent errors
- indication or regularization of variant, irregular, non-standard, or eccentric forms
- editorial additions, suppressions, and omissions
A more extended treatment of the use of these tags in transcriptional and editorial work is given in chapter 12 Representation of Primary Sources.
3.5.1 Apparent Errors
When the copy text is manifestly faulty, an encoder or transcriber may elect simply to correct it without comment, although for scholarly purposes it will often be more generally useful to record both the correction and the original state of the text. The elements described here enable all three approaches, and allow the last to be done in such a way as to make it easy for software to present either the original or the correction.
- sic (Latin for thus or so) contains text reproduced although apparently incorrect or inaccurate.
- corr (correction) contains the correct form of a passage apparently erroneous in the copy text.
The following examples show alternative treatments of the same material. The copy text reads:
mentioned in the main body of the text are incorrect.
mentioned in the main body of the text are incorrect.
<choice>
<corr>dates</corr>
<sic>date's</sic>
</choice> mentioned in the main body of the text are
incorrect.
<choice>
<corr resp="#msm">dates</corr>
<sic>date's</sic>
</choice> mentioned in the main body of the text are
incorrect.
<!-- within the header for this document ... -->
<respStmt xml:id="msm">
<resp>editor</resp>
<name>C.M. Sperberg-McQueen</name>
</respStmt>
<corr cert="high">Autumn</corr>
<sic>Antony</sic>
</choice> it was,
That grew the more by reaping
Where, as here, the correction takes the form of adding text not otherwise present in the text being encoded, the encoder should use the corr element. Where the correction is present in the text being encoded, and consists of some combination of visible additions and deletions, the elements add or del should be used: see further section 3.5.3 Additions, Deletions, and Omissions below. Where the correction takes the form of addition of material not present in the original because of physical damage or illegibility, the supplied element may be used. Where the ‘correction’ is simply a matter of expanding an abbreviation, the ex element may be used. These and other elements to support the detailed encoding of authorial or scribal interventions of this kind are all provided by the module described in chapter 12 Representation of Primary Sources.
3.5.2 Regularization and Normalization
When the source text makes extensive use of variant forms or non-standard spellings, it may be desirable for a number of reasons to regularize it: that is, to provide ‘standard’ or ‘regularized’ forms equivalent to the non-standard forms.
As with other such changes to the copy text, the changes may be made silently (in which case the TEI header should specify the types of silent changes made) or may be explicitly marked using the following elements:
- reg (regularization) contains a reading which has been regularized or normalized in some sense.
- orig (original form) contains a reading which is marked as following the original, rather than being normalized or corrected.
- choice (choice) groups a number of alternative encodings for the same point in a text.
Typical applications for these elements include the production of editions intended for student or lay readers, linguistic research in which spelling or usage variation is not the main question at issue, production of spelling dictionaries, etc.
Consider this 16th-century text:
<orig>overthrowe</orig> so wicked a race the
world may judge: for my part I <orig>thinke</orig>
there <orig>canot</orig> be a greater
<orig>sacryfice</orig> to God.</p>
<reg>deed</reg> it is to <reg>overthrow</reg> so wicked a race the
world may judge: for my part I <reg>think</reg>
there <reg>cannot</reg> be a greater
<reg>sacrifice</reg> to God.</p>
<orig>dede</orig>
<reg>deed</reg>
</choice> it is to
<choice>
<orig>overthrowe</orig>
<reg>overthrow</reg>
</choice> so wicked a race the
world may judge: for my part I <choice>
<orig>thinke</orig>
<reg>think</reg>
</choice>
there <choice>
<orig>canot</orig>
<reg>cannot</reg>
</choice> be a greater
<choice>
<orig>sacryfice</orig>
<reg>sacrifice</reg>
</choice> to God.</p>
As elsewhere, the resp attribute may be used to specify the agency responsible for the regularization.
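Since the combined encoding keeps each original spelling next to its regularized form, it can feed the kind of spelling dictionary mentioned above. The Python sketch below counts (original, regularized) pairs across all choice elements that contain both an orig and a reg. The TEI namespace is omitted, and `spelling_pairs` is a hypothetical name.

```python
import xml.etree.ElementTree as ET
from collections import Counter

def spelling_pairs(root):
    """Count (original, regularized) spellings recorded in <choice><orig/><reg/></choice>."""
    pairs = Counter()
    for ch in root.iter("choice"):
        orig, reg = ch.find("orig"), ch.find("reg")
        if orig is not None and reg is not None:
            pairs[("".join(orig.itertext()), "".join(reg.itertext()))] += 1
    return pairs

p = ET.fromstring(
    "<p><choice><orig>dede</orig><reg>deed</reg></choice> it is to "
    "<choice><orig>overthrowe</orig><reg>overthrow</reg></choice> so wicked a race "
    "there <choice><orig>canot</orig><reg>cannot</reg></choice> be</p>"
)
for (old, new), n in sorted(spelling_pairs(p).items()):
    print(old, "->", new, n)
# canot -> cannot 1
# dede -> deed 1
# overthrowe -> overthrow 1
```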
3.5.3 Additions, Deletions, and Omissions
The following elements are used to indicate when words or phrases have been omitted from, added to, or marked for deletion from, a text. Like the other editorial elements, they allow for a wide range of editorial practices:
- gap (gap) indicates a point where material has been omitted in a transcription, whether for editorial reasons described in the TEI header, as part of sampling practice, or because the material is illegible, invisible, or inaudible.
reason (reason) gives the reason for omission. Suggested values include: 1] cancelled (cancelled); 2] deleted (deleted); 3] editorial (editorial); 4] illegible (illegible); 5] inaudible (inaudible); 6] irrelevant (irrelevant); 7] sampling (sampling)
- ellipsis (deliberately marked omission) indicates a purposeful marking in the source document signalling that content has been omitted, and may also supply or describe the omitted content.
- unclear (unclear) contains a word, phrase, or passage which cannot be transcribed with certainty because it is illegible or inaudible in the source.
reason indicates why the material is hard to transcribe. Suggested values include: 1] illegible (illegible); 2] inaudible (inaudible); 3] faded (faded); 4] background_noise (background noise); 5] eccentric_ductus (eccentric ductus)
- add (addition) contains letters, words, or phrases inserted in the source text by an author, scribe, or a previous annotator or corrector.
- del (deletion) contains a letter, word, or passage deleted, marked as deleted, or otherwise indicated as superfluous or spurious in the copy text by an author, scribe, or a previous annotator or corrector.
quantity="2" resp="#editor04"/>
extent="several characters"/>
unit="lines">
<desc>irrelevant commentary</desc>
</gap>
<gap reason="sampling" extent="restOfPage">
<desc>astrological figure</desc>
</gap>
That is, there were two stars on the easterly side and one to the
west; …
<head>Tent supplies</head>
<row>
<cell>
<name>nylon tent</name>
</cell>
<cell>
<num>1</num>
</cell>
</row>
<row>
<cell>
<name>steel stakes</name>
</cell>
<cell>
<num>6</num>
</cell>
</row>
<row>
<cell>
<name>fiberglass poles</name>
</cell>
<cell>
<ellipsis>
<metamark function="ellipsis">"</metamark>
<supplied>
<num>6</num>
</supplied>
</ellipsis>
</cell>
</row>
</table>
<metamark>——</metamark>
</ellipsis> house. It is situated in the county of <ellipsis>
<metamark>——</metamark>
</ellipsis>, on the north-west coast of <placeName>Connaught</placeName>, which I am told is the classic ground of <placeName>Ireland</placeName>.</p>
<l>Amıgas sey eu be<ex>n</ex> dunha molher</l>
<l>Que se trabalha de uosco buscar</l>
<l>Mal a uossamigo polo matar</l>
<l>Mays todaquestamiga ela q<ex>ue</ex>r</l>
<metamark function="signalChorus">꜒</metamark>
<l>Por q<ex>ue</ex> nu<ex>n</ex>ca co<ex>n</ex> el pode poer</l>
<l>Queo podesse por amigauer</l>
</lg>
<lg>
<l>E buscalhi co<ex>n</ex> uosco q<ex>ua</ex>nto mal</l>
<l>Ela mays pode aq<ex>ue</ex>sto sei eu</l>
<l>E todaq<ex>ue</ex>stela faz polo seu</l>
<l>E p<ex>or</ex>este p<unclear>ẏ</unclear>te non por al</l>
<lb/>
<metamark function="signalChorus">꜒</metamark>
<l rend="nobreak" part="I">Por q<ex>ue</ex> nu<ex>n</ex>ca</l>
<ellipsis>
<metamark rend="nobreak"
function="ellipsis">·:—</metamark>
<supplied>
<l rend="nobreak" part="F">co<ex>n</ex> el pode poer</l>
<l>Queo podesse por amigauer</l>
</supplied>
</ellipsis>
</lg>
The add and del elements may be used to record where words or phrases have been added or deleted in the copy text. They are not appropriate for longer additions or deletions that span several elements; for these, the elements addSpan and delSpan described in section 12.3.1.4 Additions and Deletions should be used.
and as to the consequences <add place="above">of
these facts</add> from which this tale takes its title.
The add element should not be used to mark editorial changes, such as supplying a word omitted by mistake from the source text or a passage present in another version. In these cases, either the corr or supplied tags should be used, as discussed above in section 3.5.1 Apparent Errors, and in section 12.3.1.3 Correction and Conjecture, respectively.
The unclear element is used to mark passages in the original which cannot be read with confidence, or about which the transcriber is uncertain for other reasons, as for example when transcribing a partially inaudible or illegible source. Its reason and resp attributes are used, as with the gap element, to indicate the cause of uncertainty and the person responsible for the conjectured reading.
<l>
<unclear reason="ink blot">The</unclear> sea between
yet hence his pray'r prevail'd
</l>
Where the material affected is entirely illegible or inaudible, the gap element discussed above should be used in preference.
<l>I live in the middle of England</l>
<l>But!</l>
<l>Norway! My soul resides in your watery
<del rend="overstrike">fiords fyords fiiords</del>
</l>
<l>Inlets.</l>
<del rend="overtyped">Mein</del> Frisch <del rend="overstrike">schwebt</del> weht der Wind
</l>
<del rend="overstrike">Inviolable</del>
<add place="below">Inexplicable</add>
splendour of Corinthian white and gold
</l>
The del element should not be used on its own where the deleted material cannot be read with confidence, or cannot be read at all, or where material has been omitted by the transcriber or editor for some other reason. Where the deleted material cannot be read with confidence, the unclear element should be used, with its reason attribute indicating that the difficulty of transcription is due to deletion. Where material has been omitted by the transcriber or editor, this may be indicated by the gap element. A deletion in which some parts can be read but others cannot may thus be represented by one or more gap elements intermingled with text, all contained by a del element. Text supplied or marked as unnecessary by an editor should be marked with the supplied and surplus elements (discussed in section 12.3.1.7 Text Omitted from or Supplied in the Transcription) rather than with add and del. These two sets of elements allow the encoder to distinguish editorial changes from those visible in the source text.
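Keeping authorial and scribal changes (add, del) separate from editorial omissions (gap) lets a processor decide how to present each. The minimal Python sketch below builds a revised-state reading text. It assumes the TEI namespace, and its choices are illustrative rather than TEI conventions: the function name `revised_text` is hypothetical, and so is the `[...]` placeholder for a gap. The sketch drops del content, keeps add and unclear content, and replaces each gap with the placeholder.

```python
import xml.etree.ElementTree as ET

T = "{http://www.tei-c.org/ns/1.0}"

def revised_text(elem, gap_marker="[...]"):
    """Text of elem in its revised state: <del> content is dropped,
    <add> and <unclear> content is kept, and each <gap> becomes gap_marker."""
    parts = [elem.text or ""]
    for child in elem:
        if child.tag == T + "del":
            pass  # deleted material is omitted, including any <gap> nested inside it
        elif child.tag == T + "gap":
            parts.append(gap_marker)  # hypothetical rendering of omitted material
        else:
            parts.append(revised_text(child, gap_marker))  # add, unclear, etc.
        parts.append(child.tail or "")
    return "".join(parts)


# Based on the "Inexplicable splendour" example above, with a gap added for illustration.
line = ET.fromstring(
    '<l xmlns="http://www.tei-c.org/ns/1.0"><del rend="overstrike">Inviolable</del> '
    '<add place="below">Inexplicable</add> splendour of Corinthian '
    '<gap reason="illegible"/> and gold</l>')
print(revised_text(line).strip())  # Inexplicable splendour of Corinthian [...] and gold
```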
3.6 Names, Numbers, Dates, Abbreviations, and Addresses
This section describes a number of textual features which it is often convenient to distinguish from their surrounding text. Names, dates, and numbers are likely to be of particular importance to the scholar treating a text as a source for a database; distinguishing such items from the surrounding text is, however, equally important to the scholar primarily interested in lexis.
The treatment of these textual features proposed here is not intended to be exhaustive: fuller treatments for names, numbers, measures, and dates are provided in the names and dates module (see chapter 14 Names, Dates, People, and Places); more detailed treatment of abbreviations is provided by the transcription module (see section 12.3.1.2 Abbreviation and Expansion).
3.6.1 Referring Strings
A referring string is a phrase which refers to some person, place, object, etc. Two elements are provided to mark such strings:
- rs (referencing string) contains a general purpose name or referring string.
- name (name, proper noun) contains a proper noun or noun phrase.
Both the name and rs elements are members of the att.typed class, from which they inherit the following attributes:
- att.typed provides attributes that can be used to classify or subclassify elements in any way.
  type characterizes the element in some sense, using any convenient classification scheme or typology.
  subtype (subtype) provides a sub-categorization of the element, if needed.
These attributes may be used to further categorize the kind of object referred to.
<q>My dear
<rs type="person">Mr. Bennet</rs>
</q>, said his lady to
him one day, <q>have you heard that <rs type="place">Netherfield Park</rs> is let at last?</q>
</p>
<rs type="org">Watering Committee</rs>.
They were paid a commission not exceeding four per
cent, and gave bond.</p>
<rs type="org">Circumlocution Office</rs> never, on any
account whatsoever, to give a straightforward answer,
<rs type="person">Mr Barnacle</rs> said, <q>Possibly.</q>
</p>
<q>My dear <rs type="person">Mr. Bennet</rs>
</q>, said
<rs type="person">his lady</rs> to him one day ...
</p>
<q>My dear <name type="person">Mr. Bennet</name>,</q> said <rs type="person">his lady</rs> to him one day,
<q>have you heard that <name type="place">Netherfield Park</name> is let at last?</q>
</p>
Simply tagging something as a name is generally not enough to enable automatic processing of personal names into the canonical forms usually required for reference purposes. The name as it appears in the text may be inconsistently spelled, partial, or vague. Moreover, name prefixes such as van or de la may or may not be included as part of the reference form of a name, depending on the language and country of origin of the bearer.
Two issues arise in this context: firstly, there may be a need to encode a regularized form of a name, distinct from the actual form in the source to hand; secondly, there may be a need to identify the particular person, place, etc. referred to by the name, irrespective of whether the name itself is normalized or not. The element reg, introduced in 3.5.2 Regularization and Normalization, is provided for the former purpose; the attributes key or ref are provided for the latter.
The key and ref attributes are common to all members of the att.canonical class and are defined as follows:
- att.canonical provides attributes that can be used to associate a representation such as a name or title with canonical information about the object being named or referenced.
  key provides an externally-defined means of identifying the entity (or entities) being named, using a coded value of some kind.
  ref (reference) provides an explicit means of locating a full definition or identity for the entity being named by means of one or more URIs.
<rs key="BENM1" type="person">Mr. Bennet</rs>,</q> said
<rs key="BENM2" type="person">his lady</rs> to him one day,
<q>have you heard that
<rs key="NETP1" type="place">Netherfield Park</rs> is let at
last?</q>
<name key="VOM1" type="person">Mme. de Volanges</name>
marie <rs key="VOM2">sa fille</rs>:
c'est encore un secret;
mais elle m'en a fait part hier.
</p>
The standard reference source should be documented, for example using a taxonomy element in the TEI header.
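Because co-referring strings share a key (or ref) value, a processor can gather every mention of the same entity however it is worded. The following is a minimal Python sketch that assumes the TEI namespace. The helper `mentions_by_key` is hypothetical. For simplicity it treats each key or ref value as a single opaque identifier, although ref may in fact hold several URIs. The second mention of BENM1 in the sample is invented for illustration.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

T = "{http://www.tei-c.org/ns/1.0}"
REFERRING = {T + "name", T + "rs"}

def mentions_by_key(root):
    """Map each key (falling back to ref) to its referring strings, in document order."""
    index = defaultdict(list)
    for el in root.iter():
        if el.tag not in REFERRING:
            continue
        ident = el.get("key") or el.get("ref")
        if ident:  # strings without an identifier are not indexed
            index[ident].append(" ".join("".join(el.itertext()).split()))
    return dict(index)


doc = ('<p xmlns="http://www.tei-c.org/ns/1.0"><q>My dear '
       '<rs key="BENM1" type="person">Mr. Bennet</rs>,</q> said '
       '<rs key="BENM2" type="person">his lady</rs> to '
       '<rs key="BENM1" type="person">her husband</rs></p>')
print(mentions_by_key(ET.fromstring(doc)))
```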
<name ref="http://en.wikipedia.org/wiki/Heathrow_airport"
type="airport">Heathrow</name>
</p>
the administration of <rs key="POJA1" type="person">Col. Polk
(<reg>Polk, James K.</reg>)</rs> has but poorly compensated me for the
suspended enjoyments and pursuits of private and professional
spheres</p>
<name ref="tag:projectname.org,2012:VOM1"
type="person">Mme. de Volanges</name> marie <rs ref="tag:theworksoflaclos.org,2012:VOM2">sa fille</rs>: c'est encore un secret;
mais elle m'en a fait part hier.
</p>
<name ref="tag:projectname.org,2012:WADLM1"
type="person">
<choice>
<orig>Walter de la Mare</orig>
<reg>de la Mare, Walter</reg>
</choice>
</name>
was born at <name ref="tag:projectname.org,2012:Ch1"
type="place">Charlton</name>, in
<name ref="tag:projectname.org,2012:KT1"
type="county">Kent</name>, in 1873.
</p>
<name type="place">Montaillou</name> is not a large parish.
At the time of the events which led to
<name type="person">Fournier<index>
<term>Benedict XII, Pope of Avignon (Jacques Fournier)</term>
</index>
</name>'s
investigations, the local population consisted of between 200 and 250 inhabitants.
</p>
3.6.2 Addresses
These Guidelines propose the following elements to distinguish postal and electronic addresses:
- address (address) contains a postal address, for example of a publisher, an organization, or an individual.
- email (electronic mail address) contains an email address identifying a location to which email messages can be delivered.
These two elements constitute the class of model.addressLike elements; for other kinds of address this class may be extended by adding new elements if necessary.
- addrLine (address line) contains one line of a postal address.
<addrLine>110 Southmoor Road,</addrLine>
<addrLine>Oxford OX2 6RB,</addrLine>
<addrLine>UK</addrLine>
</address>
Alternatively, an address may be encoded as a structure of more semantically rich elements. The model.addrPart element class identifies a number of such possible components:
- street contains a full street address including any name or number identifying a building as well as the name of the street or route on which it is located.
- name (name, proper noun) contains a proper noun or noun phrase.
- postCode (postal code) contains a numerical or alphanumeric code used as part of a postal address to simplify sorting or delivery of mail.
- postBox (postal box or post office box) contains a number or other identifier for some postal delivery point other than a street address.
- model.nameLike groups elements which name or refer to a person, place, or organization.
  - model.nameLike.agent groups elements which contain names of individuals or corporate bodies.
  - model.offsetLike groups elements which can appear only as part of a place name.
  - model.persNamePart groups elements which form part of a personal name.
  - model.placeStateLike groups elements which describe changing states of a place.
  - eventName (name of an event) contains a proper noun or noun phrase used to refer to an event.
  - idno (identifier) supplies any form of identifier used to identify some object, such as a bibliographic item, a person, a title, an organization, etc. in a standardized way.
  - lang (language name) contains the name of a language mentioned in etymological or other linguistic discussion.
  - objectName (name of an object) contains a proper noun or noun phrase used to refer to an object.
  - rs (referencing string) contains a general purpose name or referring string.
- model.persNamePart groups elements which form part of a personal name.
  - addName (additional name) contains an additional name component, such as a nickname, epithet, or alias, or any other descriptive phrase used within a personal name.
  - forename (forename) contains a forename, given or baptismal name.
  - genName (generational name component) contains a name component used to distinguish otherwise similar names on the basis of the relative ages or generations of the persons named.
  - nameLink (name link) contains a connecting phrase or link used within a name but not regarded as part of it, such as van der or of.
  - persPronouns (personal pronouns) indicates the personal pronouns used, or assumed to be used, by the individual being described.
  - roleName (role name) contains a name component which indicates that the referent has a particular role or position in society, such as an official title or rank.
  - surname (surname) contains a family (inherited) name, as opposed to a given, baptismal, or nick name.
- model.placeNamePart groups elements which form part of a place name.
  - bloc (bloc) contains the name of a geo-political unit consisting of two or more nation states or countries.
  - country (country) contains the name of a geo-political unit, such as a nation, country, colony, or commonwealth, larger than or administratively superior to a region and smaller than a bloc.
  - district (district) contains the name of any kind of subdivision of a settlement, such as a parish, ward, or other administrative or geographic unit.
  - geogName (geographical name) identifies a name associated with some geographical feature such as Windrush Valley or Mount Sinai.
  - placeName (place name) contains an absolute or relative place name.
  - region (region) contains the name of an administrative unit such as a state, province, or county, larger than a settlement, but smaller than a country.
  - settlement (settlement) contains the name of a settlement such as a city, town, or village identified as a single geo-political or administrative unit.
Any number of elements from the model.addrPart class may appear within an address and in any order. None of them is required.
Where code letters are commonly used in addresses (for example, to identify regions or countries) a useful practice is to supply the full name of the region or country as the content of the element, but to supply the abbreviatory code as the value of the global n attribute, so that (for example) an application preparing formatted labels can readily find the required information. Other components of addresses may be represented using the general-purpose name element or (when the additional module for names and dates is included) the more specialized elements provided for that purpose.
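The following minimal Python sketch shows how a label-printing application might use the n attribute in the way just described. It is written under stated assumptions. The function name `address_label` is hypothetical, and so is its policy of printing one line per component in document order, preferring n when it is present. The code GB in the sample is supplied for illustration.

```python
import xml.etree.ElementTree as ET

def address_label(address, use_codes=True):
    """Format a TEI <address> as label lines, one per component, in document order.
    When use_codes is true, a component's n attribute (an abbreviatory code)
    replaces its full content."""
    lines = []
    for part in address:
        if use_codes and part.get("n"):
            lines.append(part.get("n"))
        else:
            # Collapse internal whitespace in the component's full text content.
            lines.append(" ".join("".join(part.itertext()).split()))
    return "\n".join(lines)


addr = ET.fromstring(
    '<address xmlns="http://www.tei-c.org/ns/1.0">'
    '<street>110 Southmoor Road</street><settlement>Oxford</settlement>'
    '<postCode>OX2 6RB</postCode><country n="GB">United Kingdom</country>'
    '</address>')
print(address_label(addr))
```

Because every model.addrPart component is optional and may appear in any order, the sketch simply follows document order and does not impose a fixed layout.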
<street>110 Southmoor Road</street>
<name type="city">Oxford</name>
<postCode>OX2 6RB</postCode>
<name type="country">United Kingdom</name>
</address>
<name type="org">Università di Bologna</name>
<name type="country">Italy</name>
<postCode>40126</postCode>
<name type="city">Bologna</name>
<street>via Marsala 24</street>
</address>
tel
URI scheme:
For further discussion of ways of regularizing the names of places, see section 3.6 Names, Numbers, Dates, Abbreviations, and Addresses. A full postal address may also include the name of the addressee, tagged as above using the general purpose name element.
<street>110 Southmoor Road</street>
<settlement>Oxford</settlement>
<postCode>OX2 6RB</postCode>
<country>United Kingdom</country>
</address>
3.6.3 Numbers and Measures
This section describes elements provided for the simple encoding of numbers and measurements and gives some indication of circumstances in which this may usefully be done. The following phrase level elements are provided for this purpose:
- num (number) contains a number, written in any form.
  type indicates the type of numeric value. Suggested values include: cardinal, ordinal, fraction, percentage.
  value supplies the value of the number in standard form.
- measure (measure) contains a word or phrase referring to some quantity of an object or commodity, usually comprising a number, a unit, and a commodity name.
  type specifies the type of measurement in any convenient typology.
- measureGrp (measure group) contains a group of dimensional specifications which relate to the same object, for example the height and width of a manuscript page.
Like names or abbreviations, numbers can occur virtually anywhere in a text. Numbers are special in that they can be written with either letters or digits (twenty-one, xxi, and 21) and their presentation is language-dependent (e.g. English 5th becomes Greek 5.; English 123,456.78 equals French 123.456,78).
For many kinds of application, e.g. natural-language processing or machine translation, numbers are not regarded as ‘lexical’ in the same way as other parts of a text. For these and other applications, the num element provides a convenient method of distinguishing numbers from the surrounding text. For other kinds of application, numbers are only useful if normalized: here the num element is useful precisely because it provides a standardized way of representing a numerical value.
<num type="cardinal" value="21">twenty-one</num>
<num type="percentage" value="10">ten percent</num>
<num type="percentage" value="10">10%</num>
<num type="ordinal" value="5">5th</num>
<num type="fraction" value="0.5">1/2</num>
Sometimes it may be desired to mark something as numerical which cannot be accurately normalized, for example an expression such as dozens; less frequently the number may be recognisable linguistically as such but may use a notation with which the encoder is unfamiliar. To help in these situations, the num element may also bear either or both of the following attributes from the att.ranging class:
- att.ranging provides attributes for describing numerical ranges.
  atLeast gives a minimum estimated value for the approximate measurement.
  atMost gives a maximum estimated value for the approximate measurement.
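A processor that needs numerical values can read them from value, atLeast, and atMost rather than from the wording. The minimal Python sketch below returns a (low, high) pair for a num element. The helper `num_bounds` is hypothetical. The sketch assumes attribute values are integers, decimals, exponent forms, or simple fractions such as 1/2, all of which Python's `fractions.Fraction` accepts. The atLeast value given for "dozens" in the sample is an illustrative choice.

```python
import xml.etree.ElementTree as ET
from fractions import Fraction

def _number(text):
    # Fraction parses integers, decimals, exponent forms, and "n/d" fractions.
    return float(Fraction(text))

def num_bounds(num):
    """Return (low, high) for a <num>: an exact value gives low == high;
    atLeast/atMost give an estimated range, with None for a missing bound."""
    if num.get("value") is not None:
        v = _number(num.get("value"))
        return (v, v)
    low, high = num.get("atLeast"), num.get("atMost")
    return (_number(low) if low is not None else None,
            _number(high) if high is not None else None)


print(num_bounds(ET.fromstring('<num type="cardinal" value="21">twenty-one</num>')))  # (21.0, 21.0)
print(num_bounds(ET.fromstring('<num atLeast="24">dozens</num>')))  # (24.0, None)
```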
In its fullest form, a measure consists of a number, a phrase expressing units of measure and a phrase expressing the commodity being measured, though not all of these components need be present in every case. It may be helpful to distinguish measures from surrounding text for two reasons. Firstly, a measure may be expressed using a particular notation or system of abbreviations which the encoder does not wish to regard as lexical. Secondly, a quantitative application may wish to distinguish and normalize the internal components of a measure, in order to perform calculations on them.
<list type="gloss">
<label>Age</label>
<item>Unimportant</item>
<label>Head</label>
<item>Small and round</item>
<label>Eyes</label>
<item>Green</item>
<label>Complexion</label>
<item>White</item>
<label>Hair</label>
<item>yellow</item>
<label>Features</label>
<item>Mobile</item>
<label>Neck</label>
<item>
<measure>13¾"</measure>
</item>
<label>Upper arm</label>
<item>
<measure>11"</measure>
</item>
<!--...-->
</list>
<!-- ... -->
</div>
- att.measurement provides attributes to represent a regularized or normalized measurement.
  quantity (quantity) specifies the number of the specified units that comprise the measurement.
  unit (unit) indicates the units used for the measurement, usually using the standard symbol for the desired units. Suggested values include: m (metre), kg (kilogram), s (second), Hz (hertz), Pa (pascal), Ω (ohm), L (litre), t (tonne), ha (hectare), Å (ångström), mL (millilitre), cm (centimetre), dB (decibel), kbit (kilobit), Kibit (kibibit), kB (kilobyte), KiB (kibibyte), MB (megabyte), MiB (mebibyte).
  commodity (commodity) indicates the substance that is being measured.
<item>
<measure type="volume" quantity="2"
unit="bag" commodity="hops">ii bags hops</measure>
</item>
<item>
<measure type="volume" quantity="6"
unit="truss" commodity="cloth">six trusses Woolen and linen goods</measure>
</item>
<item>
<measure type="weight" quantity="5"
unit="ton" commodity="coal">5 tonnes coale</measure>
</item>
</list>
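Normalized quantity, unit, and commodity values make simple calculations possible, such as totalling the entries of an inventory like the one above. The minimal Python sketch below assumes the TEI namespace. The helper `totals` is hypothetical. It sums quantities only when they share both a commodity and a unit, and it does no unit conversion. The third entry in the sample is invented for illustration.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

T = "{http://www.tei-c.org/ns/1.0}"

def totals(root):
    """Sum measure/@quantity for each (commodity, unit) pair; no unit conversion."""
    sums = defaultdict(float)
    for m in root.iter(T + "measure"):
        quantity = m.get("quantity")
        if quantity is None:
            continue  # measures without a normalized quantity cannot be summed
        sums[(m.get("commodity"), m.get("unit"))] += float(quantity)
    return dict(sums)


inventory = ET.fromstring(
    '<list xmlns="http://www.tei-c.org/ns/1.0">'
    '<item><measure type="volume" quantity="2" unit="bag" commodity="hops">ii bags hops</measure></item>'
    '<item><measure type="weight" quantity="5" unit="ton" commodity="coal">5 tonnes coale</measure></item>'
    '<item><measure type="volume" quantity="3" unit="bag" commodity="hops">iii bags hops</measure></item>'
    '</list>')
print(totals(inventory))
```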
- att.ranging provides attributes for describing numerical ranges.
  min where the measurement summarizes more than one observation or a range, supplies the minimum value observed.
  max where the measurement summarizes more than one observation or a range, supplies the maximum value observed.
  atLeast gives a minimum estimated value for the approximate measurement.
  atMost gives a maximum estimated value for the approximate measurement.
<lb/>holding <measure unit="mL" commodity="blood"
min="85.2">more than three ounces</measure>.
[...] Then
<lb/>we may suppose in man that a single heart beat
<lb/>would force out either
<measure unit="mL" commodity="blood"
max="13.2" min="3.7">a half ounce, three drams,
<lb/>or even one dram of blood</measure>, which because of the
<lb/>valvular block could not flow back that way into
<lb/>the heart.</p>
<measure type="height" quantity="14">xiv</measure>
<measure type="width" quantity="5">v</measure>
<measure type="depth" quantity="10">x</measure>
</measureGrp>
- unit contains a symbol, a word or a phrase referring to a unit of measurement in any kind of formal or informal system.
- att.measurement provides attributes to represent a regularized or normalized measurement.
  unit (unit) indicates the units used for the measurement, usually using the standard symbol for the desired units. Suggested values include: m (metre), kg (kilogram), s (second), Hz (hertz), Pa (pascal), Ω (ohm), L (litre), t (tonne), ha (hectare), Å (ångström), mL (millilitre), cm (centimetre), dB (decibel), kbit (kilobit), Kibit (kibibit), kB (kilobyte), KiB (kibibyte), MB (megabyte), MiB (mebibyte).
- att.typed provides attributes that can be used to classify or subclassify elements in any way.
- att.global provides attributes common to all elements in the TEI encoding scheme.
<num>1</num>, <num>2</num>, <num>5</num>, <num>7</num>
<unit type="length" unit="mm">millimètres</unit>
</measure>
<unit type="rate" unit="cm/s">
<unit type="space">cm</unit> per <unit type="time">second</unit>
</unit>.</p>
3.6.4 Dates and Times
Dates and times, like numbers, can appear in widely varying culture- and language-dependent forms, and can pose similar problems in automatic language processing. Such elements constitute the model.dateLike class, of which the default members are:
- date (date) contains a date in any format.
- time (time) contains a phrase defining a time of day in any format.
These elements have some additional attributes by virtue of being members of the att.datable and att.duration classes which, in turn, are members of the att.datable.w3c and att.duration.w3c classes. In particular, the when and calendar attributes will be discussed here:
- att.datable.w3c provides attributes for normalization of elements that contain datable events conforming to the W3C XML Schema Part 2: Datatypes Second Edition.
  when supplies the value of the date or time in a standard form, e.g. yyyy-mm-dd.
- att.calendarSystem provides attributes for indicating calendar systems to which a date belongs.
  calendar indicates one or more systems or calendars to which the date represented by the content of this element belongs.
Dates can occur virtually anywhere in a text, but in some contexts (e.g. bibliographic citations) their encoding is recommended or required rather than optional. Times can also appear anywhere but encoding these is more generally optional.
Partial dates or times (e.g. 1990, September 1990, twelvish) can be expressed in the when attribute by simply omitting a part of the value supplied. Imprecise dates or times (for example early August, some time after ten and before twelve) may be expressed as date or time ranges.
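Values of when follow the lexical forms of the W3C XML Schema date and time datatypes. For example, 2001-09 is a gYearMonth, --09-11 is a gMonthDay, and ---11 is a gDay. A processor can therefore tell which parts of a partial date were supplied. The Python sketch below is a minimal illustration under stated assumptions. It recognizes only the forms listed, ignores time-zone suffixes and years outside 0000 to 9999, and its helper name `classify_when` is hypothetical.

```python
import re

# Simplified lexical patterns for the XML Schema types commonly used in @when.
# Time zones, negative years, and years with more than four digits are not handled.
PATTERNS = [
    ("dateTime",   r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(\.\d+)?"),
    ("date",       r"\d{4}-\d{2}-\d{2}"),
    ("gYearMonth", r"\d{4}-\d{2}"),
    ("gYear",      r"\d{4}"),
    ("gMonthDay",  r"--\d{2}-\d{2}"),
    ("gMonth",     r"--\d{2}"),
    ("gDay",       r"---\d{2}"),
    ("time",       r"\d{2}:\d{2}:\d{2}(\.\d+)?"),
]

def classify_when(value):
    """Return the name of the datatype whose pattern matches the whole value, or None."""
    for name, pattern in PATTERNS:
        if re.fullmatch(pattern, value):
            return name
    return None


for v in ["2001-09", "2001-09-11", "--09-11", "--09", "---11", "08:48:00"]:
    print(v, classify_when(v))
```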
These mechanisms are useful primarily for fully specified dates or times known with certainty. If component parts of dates or times are to be marked up, or if a more complex analysis of the meaning of a temporal expression is required, the techniques described in chapter 14 Names, Dates, People, and Places should be used in preference to the simple method outlined here.
Where the certainty (i.e. reliability) of the date or time is in question, the encoder should record this fact using the mechanisms discussed in chapter 22 Certainty, Precision, and Responsibility. The same chapter also discusses various methods of recording the precision of numerical or temporal assertions.
in the Year of Our Lord One Thousand Nine Hundred and
Seventy-seven of the Republic the Two Hundredth and first
and of the University the Eighty-Sixth.</date>
<date when="2001-09">September 2001</date>
<date when="2001-09-11">11 Sep 01</date>
<date when="--09-11">9/11</date>
<date when="--09">September</date>
<date when="---11">Eleventh of the month</date>
<time when="08:48:00">8:48</time>
<date when="2001-09-11T12:48:00">Sept 11th, 12 minutes before 9 am</date>
<date from="1918" to="1923">1918 to 1923</date>
— had been, he suspected,
somehow very important.</p>
manuscript (Codex Regius 2365) from
<date notBefore="1250" notAfter="1300">the second half of the
thirteenth century</date>, and <title>Hervarar
saga</title> dates from <date when="1300">around 1300</date>.</p>
The calendar attribute may be used to specify a date in any calendar system; if the when attribute is also supplied, it should specify the equivalent date in the Gregorian calendar.
3.6.5 Abbreviations and Their Expansions
It is sometimes desirable to mark abbreviations in the copy text, whether to trigger special processing for them, to provide the full form of the word or phrase abbreviated, or to allow for different possible expansions of the abbreviation. Abbreviations may be transcribed as they stand, or expanded; they may be left unmarked, or marked using these tags:
- abbr (abbreviation) contains an abbreviation of any sort.
- expan (expansion) contains the expansion of an abbreviation.
the identity of a <abbr>CC</abbr> is defined by that calibration of values which
motivates the elements of its <abbr>GSP</abbr>; ...
languages is currently nailing on <abbr>OOP</abbr> extensions.
<abbr type="initial">M.</abbr> Deegan is
the Director of the <abbr type="acronym">CTI</abbr> Centre for Textual Studies.
the Director of the <abbr>CTI</abbr> Centre for Textual Studies.
<abbr>RELAX NG</abbr>
<expan>regular
language for <choice>
<abbr>XML</abbr>
<expan>extensible markup
language</expan>
</choice>, next
generation</expan>
</choice>
Abbreviation is a particularly important feature of manuscript and other source materials, the transcription of which needs more detailed treatment than is possible using these simple elements. A more detailed set of recommendations is discussed in 12.3.1 Altered, Corrected, and Erroneous Texts, which includes additional elements made available for the purpose by the transcr module.
3.7 Simple Links and Cross-References
Cross-references or links between one location in a document and one or more other locations, either in the same or different XML documents, may be encoded using the elements ptr and ref, as discussed in this section. These elements both ‘point’ from one location in a document, the place that the element itself appears, to another (or to several), specified by means of a target attribute, supplied by the att.pointing class:
- att.pointing provides a set of attributes used by all elements which point to other elements by means of one or more URI references.
  target specifies the destination of the reference by supplying one or more URI References.
Several other kinds of linkage are also provided for in these Guidelines; see chapter 17 Linking, Segmentation, and Alignment.
The complete XPointer specification is managed by the W3C<note place="foot">
<ptr target="https://www.w3.org/TR/xptr-framework/"/>,
<ptr target="https://www.w3.org/TR/xptr-element/"/>,
<ptr target="https://www.w3.org/TR/xptr-xmlns/"/>, and
<ptr target="https://www.w3.org/TR/xptr-xpointer/"/>
</note>;
for a discussion of TEI schemes for XPointer, see
<ptr target="#eSATS"/>.</p>
<!--... -->
<div xml:id="eSATS">
<!--... -->
</div>
For an introduction to the use of links in general, see 17 Linking, Segmentation, and Alignment. The complete XPointer specification is managed by the W3C; for a discussion of TEI schemes for XPointer, see 17.2.4 TEI XPointer Schemes.
- ptr (pointer) defines a pointer to another location.
- ref (reference) defines a reference to another location, possibly modified by additional text or comment.
The elements ptr and ref are the default members of the phrase-level model class model.ptrLike. As members of the classes att.pointing, att.typed, att.cReferencing, and att.internetMedia, they also carry the following attributes:
- att.pointing provides a set of attributes used by all elements which point to other elements by means of one or more URI references.
  target specifies the destination of the reference by supplying one or more URI References.
  evaluate (evaluate) specifies the intended meaning when the target of a pointer is itself a pointer.
- att.cReferencing provides attributes that may be used to supply a canonical reference as a means of identifying the target of a pointer.
  cRef (canonical reference) specifies the destination of the pointer by supplying a canonical reference expressed using the scheme defined in a refsDecl element in the TEI header.
- att.typed provides attributes that can be used to classify or subclassify elements in any way.
  type characterizes the element in some sense, using any convenient classification scheme or typology.
  subtype (subtype) provides a sub-categorization of the element, if needed.
- att.internetMedia provides attributes for specifying the type of a computer resource using a standard taxonomy.
  mimeType (MIME media type) specifies the applicable multimedia internet mail extension (MIME) media type.
The cRef attribute may be used to express the target of a cross reference using some canonical referencing scheme, such as those typically used for ancient texts. In this case, the referencing scheme must be defined using the cRefPattern or citeStructure elements discussed below (3.11.4 Declaring Reference Systems); the definition these provide may be used to translate the value of the cRef attribute into a conventional pointer value, such as one that might be supplied by the target attribute. It is an error to supply both cRef and target values.
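Two of the rules stated above can be checked mechanically: target may hold several whitespace-separated URI references, and cRef and target must not both be supplied. The minimal Python sketch below assumes the TEI namespace and checks only same-document references of the form #id against xml:id values. The helper `check_pointers` and its message wording are hypothetical. It does not attempt to translate cRef values through cRefPattern or citeStructure.

```python
import xml.etree.ElementTree as ET

T = "{http://www.tei-c.org/ns/1.0}"
# ElementTree exposes xml:id under the predefined XML namespace.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

def check_pointers(root):
    """Return a list of problems found in ptr/ref elements, in document order."""
    ids = {el.get(XML_ID) for el in root.iter() if el.get(XML_ID)}
    problems = []
    for el in root.iter():
        if el.tag not in (T + "ptr", T + "ref"):
            continue
        target, cref = el.get("target"), el.get("cRef")
        if target is not None and cref is not None:
            problems.append("both cRef and target on one element")
        for uri in (target or "").split():  # target may hold several URI references
            if uri.startswith("#") and uri[1:] not in ids:
                problems.append("unresolved local target " + uri)
    return problems


doc = ('<text xmlns="http://www.tei-c.org/ns/1.0"><body>'
       '<p><ptr target="#p143 #p144"/><ref cRef="1.2" target="#p143">x</ref></p>'
       '<pb xml:id="p143"/></body></text>')
print(check_pointers(ET.fromstring(doc)))
```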
<item>Saints aid rejected in mel. <ptr target="#p299"/>
</item>
<item>Sallets censured <ptr target="#p143 #p144"/>
</item>
<item>Sanguine mel. signs <ptr target="#p263"/>
</item>
<item>Scilla or sea onyon, a purger of mel. <ptr target="#p442"/>
</item>
</list>
...
<pb xml:id="p144"/>
...
<pb xml:id="p263"/>
...
<pb xml:id="p299"/>
...
<pb xml:id="p442"/>
...
<!-- ... -->
<note xml:id="a51" type="footnote">text of annotation</note>
<term rend="ldquo rdquo">rewriting systems</term>, have a long history
among mathematicians, but the specific form of <ptr target="#fig22"/>
was first studied extensively by Chomsky <ptr type="bibliog" target="#chom59"/>.
<!-- ... -->
<figure xml:id="fig22">
<graphic url="fig22.jpg"/>
</figure>
<!-- elsewhere, in the bibliography -->
<bibl xml:id="chom59">
<!-- citation for the book referenced above -->
</bibl>
The ptr and ref elements have many applications in addition to the simple cross-referencing facilities illustrated in this section. In conjunction with the analytic tools discussed in chapters 17 Linking, Segmentation, and Alignment, 18 Simple Analytic Mechanisms, and 19 Feature Structures, they may be used to link analyses of a text to their object, to combine corresponding segments of a text, or to align segments of a text with a temporal or other axis or with each other.
is available in the TEI GitHub Repository; <ref target="https://github.com/TEIC/TEI/blob/dev/P5/Source/guidelines-en.xml"
mimeType="application/tei+xml">guidelines-en.xml</ref>
is the root document used to create the English version
of these Guidelines.</p>
3.8 Lists
The following elements are provided for the encoding of lists, their constituent items, and the labels or headings associated with them:
- list (list) contains any sequence of items organized as a list.
- item (item) contains one component of a list.
- label (label) contains any label or heading used to identify part of a text, typically but not exclusively in a list or glossary.
- head (heading) contains any type of heading, for example the title of a section, or the heading of a list, glossary, manuscript description, etc.
- headLabel (heading for list labels) contains the heading for the label or term column in a glossary list or similar structured list.
- headItem (heading for list items) contains the heading for the item or gloss column in a glossary list or similar structured list.
The list element may be used to mark any kind of list: numbered, lettered, bulleted, or unmarked. Lists formatted as such in the copy text should in general be encoded using this element, with an appropriate value for the rend attribute. Suggested values for rend include:
- bulleted (items preceded by bullets or similar markings)
- inline (items rendered within continuous prose, with no linebreaks)
- numbered (items preceded by numbers or letters)
- simple (items rendered as blocks, but with no bullet or number)
Some of these values may of course be combined: a list may be inline but also rendered with numbers, as in the example below. For a more sophisticated and detailed description of list rendering, consider using the style attribute with Cascading Style Sheets (CSS) properties and values, as described in the W3C's CSS Lists and Counters Module Level 3.
the composition of six, or even five quartos.
<list rend="inline numbered">
<label>(1)</label>
<item>My first rough manuscript, without any
intermediate copy, has been sent to the press.</item>
<label>(2)</label>
<item>Not a sheet has been seen by any human
eyes, excepting those of the author and the printer:
the faults and the merits are exclusively my own.</item>
</list>
the composition of six, or even five quartos.
<list rend="inline numbered">
<item n="1">My first rough manuscript, without any
intermediate copy, has been sent to the press.</item>
<item n="2">Not a sheet has been seen by any human
eyes, excepting those of the author and the printer:
the faults and the merits are exclusively my own.</item>
</list>
divided into <list rend="inline">
<item n="a">those that belong to the Emperor, </item>
<item n="b">embalmed ones, </item>
<item n="c">those that are trained, </item>
<item n="d">suckling pigs, </item>
<item n="e">mermaids, </item>
<item n="f">fabulous ones, </item>
<item n="g">stray dogs, </item>
<item n="h">those that are included in this classification, </item>
<item n="i">those that tremble as if they were mad, </item>
<item n="j">innumerable ones, </item>
<item n="k">those drawn with a very fine camel's-hair brush, </item>
<item n="l">others, </item>
<item n="m">those that have just broken a flower vase, </item>
<item n="n">those that resemble flies from a distance. </item>
</list>
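The rend values listed earlier are only suggestions, and how they are rendered is left to the processing software. The following Python sketch shows one possible reading of them, with rend treated as a set of whitespace-separated tokens. `numbered` prefixes each item with its n value (or its position) in parentheses, `bulleted` adds a bullet, and `inline` runs the items together with spaces. The function `render_list` and these output conventions are our own assumptions. The sketch ignores label elements and flattens any nested lists into the item text, so it suits lists numbered with n attributes.

```python
import xml.etree.ElementTree as ET

def local(tag):
    """Strip any namespace, e.g. '{http://www.tei-c.org/ns/1.0}item' -> 'item'."""
    return tag.rsplit("}", 1)[-1]

def render_list(list_elem):
    """Render a list as plain text according to its rend tokens."""
    tokens = set((list_elem.get("rend") or "").split())
    items = [c for c in list_elem if local(c.tag) == "item"]
    parts = []
    for number, item in enumerate(items, start=1):
        # Collapse the item's text (including nested markup) to single spaces.
        text = " ".join("".join(item.itertext()).split())
        if "numbered" in tokens:
            # Prefer the copy-text numbering recorded in n, if any.
            marker = f"({item.get('n', number)}) "
        elif "bulleted" in tokens:
            marker = "\u2022 "
        else:
            marker = ""
        parts.append(marker + text)
    # Inline lists run on within the prose; others get one item per line.
    return (" " if "inline" in tokens else "\n").join(parts)

gibbon = ET.fromstring(
    '<list rend="inline numbered">'
    '<item n="1">My first rough manuscript, without any '
    'intermediate copy, has been sent to the press.</item>'
    '<item n="2">Not a sheet has been seen by any human eyes, '
    'excepting those of the author and the printer: '
    'the faults and the merits are exclusively my own.</item>'
    '</list>'
)
print(render_list(gibbon))
```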
<head>Report of the conduct and progress of Ernest Pontifex.
Upper Vth form — half term ending Midsummer 1851</head>
<label>Classics</label>
<item>Idle listless and unimproving</item>
<label>Mathematics</label>
<item>ditto</item>
<label>Divinity</label>
<item>ditto</item>
<label>Conduct in house</label>
<item>Orderly</item>
<label>General conduct</label>
<item>Not satisfactory, on account of his great
unpunctuality and inattention to duties</item>
</list>
type="gloss"
not to have labels. For example:
<head>Unit Three — Vocabulary</head>
<label xml:lang="la">acerbus, -a, -um </label>
<item>bitter, harsh</item>
<label xml:lang="la">ager, agrī, M. </label>
<item>field</item>
<label xml:lang="la">audiō, -īre,
-īvī, -ītus </label>
<item>hear, listen (to)</item>
<label xml:lang="la">bellum, -ī, N. </label>
<item>war</item>
<label xml:lang="la">bonus, -a, -um </label>
<item>good</item>
</list>
<head>Unit Three — Vocabulary</head>
<label>
<term xml:lang="la">acerbus, -a, -um</term>
</label>
<item>
<gloss>bitter, harsh</gloss>
</item>
<label>
<term xml:lang="la">ager, agrī, M. </term>
</label>
<item>
<gloss>field</gloss>
</item>
<label>
<term xml:lang="la">audiō, -īre, -īvī, -ītus</term>
</label>
<item>
<gloss>hear, listen (to)</gloss>
</item>
<label>
<term xml:lang="la">bellum, -ī, N. </term>
</label>
<item>
<gloss>war</gloss>
</item>
<label>
<term xml:lang="la">bonus, -a, -um</term>
</label>
<item>
<gloss>good</gloss>
</item>
</list>
preferable to the use of a worn-out expression.
<list type="gloss">
<headLabel>TRITE</headLabel>
<headItem>SIMPLE, STRAIGHTFORWARD</headItem>
<label>bury the hatchet </label>
<item>stop fighting, make peace</item>
<label>at loose ends </label>
<item>disorganized</item>
<label>on speaking terms </label>
<item>friendly</item>
<label>fair and square </label>
<item>completely honest</item>
<label>at death's door </label>
<item>near death</item>
</list>
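Software consuming a glossary list typically needs the label/item pairs. The Python sketch below recovers them under one simplifying assumption: each label immediately precedes the item it labels. The function names are illustrative. The sketch skips head, headLabel, and headItem, and it normalizes whitespace so that the trailing spaces in labels such as "bury the hatchet " disappear.

```python
import xml.etree.ElementTree as ET

def local(tag):
    """Local element name without any namespace."""
    return tag.rsplit("}", 1)[-1]

def text_of(elem):
    """All text inside elem, whitespace-normalized."""
    return " ".join("".join(elem.itertext()).split())

def gloss_pairs(list_elem):
    """Pair each label with the item that follows it.

    head, headLabel, and headItem children are skipped; an item with no
    preceding label is paired with None.
    """
    pairs = []
    pending_label = None
    for child in list_elem:
        name = local(child.tag)
        if name == "label":
            pending_label = text_of(child)
        elif name == "item":
            pairs.append((pending_label, text_of(child)))
            pending_label = None
    return pairs

trite = ET.fromstring(
    '<list type="gloss">'
    '<headLabel>TRITE</headLabel><headItem>SIMPLE, STRAIGHTFORWARD</headItem>'
    '<label>bury the hatchet </label><item>stop fighting, make peace</item>'
    '<label>at loose ends </label><item>disorganized</item>'
    '</list>'
)
print(gloss_pairs(trite))
# [('bury the hatchet', 'stop fighting, make peace'), ('at loose ends', 'disorganized')]
```

Because `itertext` descends into child elements, the same function also handles the variant above in which labels wrap term and items wrap gloss.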
<label>EVIL</label>
<item>
<list rend="bulleted">
<item>I am cast upon a horrible desolate island, void
of all hope of recovery.</item>
<item>I am singled out and separated as it were from
all the world to be miserable.</item>
<item>I am divided from mankind — a solitaire; one
banished from human society.</item>
</list>
</item>
<label>GOOD</label>
<item>
<list rend="bulleted">
<item>But I am alive; and not drowned, as all my
ship's company were.</item>
<item>But I am singled out, too, from all the ship's
crew, to be spared from death...</item>
<item>But I am not starved, and perishing on a barren place,
affording no sustenances....</item>
</list>
</item>
</list>
Lists of different types may be nested to arbitrary depths in this way.
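Nesting of this kind can be walked recursively. The following Python sketch flattens a nested list into indented outline lines. It is illustrative only: the names `outline` and `own_text` are hypothetical, a label is assumed to precede its item, and each level of nesting is indented by two spaces.

```python
import xml.etree.ElementTree as ET

def local(tag):
    """Local element name without any namespace."""
    return tag.rsplit("}", 1)[-1]

def own_text(item):
    """Text of an item, excluding any list nested inside it."""
    parts = [item.text or ""]
    for child in item:
        if local(child.tag) != "list":
            parts.append("".join(child.itertext()))
        parts.append(child.tail or "")
    return " ".join("".join(parts).split())

def outline(list_elem, depth=0):
    """Flatten a (possibly nested) list into indented lines."""
    lines = []
    label = None
    for child in list_elem:
        name = local(child.tag)
        if name == "label":
            label = " ".join("".join(child.itertext()).split())
        elif name == "item":
            text = own_text(child)
            line = f"{label}: {text}".strip() if label else text
            if line:
                lines.append("  " * depth + line)
            # Recurse into any lists nested inside this item.
            for sub in child:
                if local(sub.tag) == "list":
                    lines.extend(outline(sub, depth + 1))
            label = None
    return lines

crusoe = ET.fromstring(
    '<list><label>EVIL</label><item><list rend="bulleted">'
    '<item>I am cast upon a horrible desolate island.</item>'
    '<item>I am singled out.</item></list></item>'
    '<label>GOOD</label><item><list rend="bulleted">'
    '<item>But I am alive.</item></list></item></list>'
)
print("\n".join(outline(crusoe)))
```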
3.9 Notes, Annotation, and Indexing
3.9.1 Notes and Simple Annotation
The following element is provided for the encoding of discursive notes, whether already present in the copy text or supplied by the encoder:
- note (note) contains a note or annotation.
A note is any additional comment found in a text, marked in some way as being out of the main textual stream. All notes should be marked using the same tag, note, whether they appear as block notes in the main text area, at the foot of the page, at the end of the chapter or volume, in the margin, or in some other place.
Notes may be in a different hand or typeface, may be authorial or editorial, and may have been added later. Attributes may be used to specify these and other characteristics of notes, as detailed below.
A note is usually attached to a specific point or span within a text, which we term here its point of attachment. In conventional printed text, the point of attachment is represented by some siglum such as a star or cross, or a superscript digit.
When encoding such a text, it is conventional to replace this siglum by the content of the annotation, duly marked up with a note element. This may not always be possible, for example with marginal notes, which may not be anchored to an exact location. For ease of processing, it may be adequate to position marginal notes before the relevant paragraph or other element. In printed texts, it is sometimes conventional to group notes together at the foot of the page on which their points of attachment appear. This practice is not generally recommended for TEI-encoded texts, since the pagination of a particular printed text is unlikely to be of structural significance. In some cases, however, it may be desirable to transcribe notes not at their point of attachment to the text but at their point of appearance, typically at the end of the volume or the end of the chapter. In such cases, the target attribute of the note may be used to indicate the point of attachment. It is also possible to encode the point of attachment itself, using the ptr or ref element to point from that location to the body of the note placed elsewhere.
In cases where the note is applied not to a point but to a span of text, not itself represented as a TEI element, the target attribute may use an appropriate pointer expression, for example using the range() function to specify the span of attachment.
For further discussion of pointing to points and spans in the text, see section 3.7 Simple Links and Cross-References.
<l>And from my neck so free</l>
<l>The albatross fell off, and sank</l>
<l>Like lead into the sea.
<note type="gloss" place="margin">The spell begins to break</note>
</l>
<l>The self-same moment I could pray</l>
<l>And from my neck so free</l>
<l>The albatross fell off, and sank</l>
<l>Like lead into the sea.</l>
<note type="gloss" place="margin">The spell begins to break</note>
</lg>
<l>The self-same moment I could pray</l>
<l>And from my neck so free</l>
<l>The albatross fell off, and sank</l>
<l>Like lead into the sea.</l>
<label place="margin">The spell begins to break</label>
</lg>
distinct entities or objects of any sort.<note n="1" place="bottom">We
explain below why we use the uncommon term
<mentioned>collection</mentioned> instead of the expected
<mentioned>set</mentioned>. Our usage corresponds to the
<mentioned>aggregate</mentioned> of many mathematical writings and to
the sense of <mentioned>class</mentioned> found in older logical
writings.</note> The elements ...
In addition to transcribing notes already present in the copy text, researchers may wish to add their own notes or comments to it. The note element may be used for either purpose, but it will usually be advisable to distinguish the two categories. One way might be to use the type attribute shown above, categorizing notes as authorial, editorial, etc. Where notes derive from many sources, or where a more precise attribution is required, the resp attribute may be used to point to a definition of the person or other agency responsible for the content of the note.
<!-- ... -->
<l>The self-same moment I could pray;
<note place="margin" resp="#STC"
type="gloss">The spell begins to break</note>
<note place="bottom" resp="#JLL">The turning point of the poem...</note>
</l>
</lg>
The resp values #JLL and #STC must point to further information identifying the agency concerned. The syntax used is identical to that used for other cross-references, as discussed in 3.7 Simple Links and Cross-References; thus in this case, the TEI header for this text might contain a title statement like the following:
<title>The Rime of the Ancient Mariner: an annotated edition</title>
<author xml:id="STC">Samuel Taylor Coleridge</author>
<editor xml:id="JLL">John Livingston Lowes</editor>
</titleStmt>
When annotating the electronic text by means of analytic notes in some structured vocabulary, e.g. to specify the topics or themes of a text, the span and interp elements may be more effective than the free-form note element; these elements are available when the module for simple analysis is selected (see section 18.3 Spans and Interpretations).
3.9.1.1 Encoding Grouped Notes
The following element is provided for the grouping of notes:
- noteGrp (note group) contains a group of notes.
A text may have multiple alternative versions of the same note, such as the same annotation expressed in multiple languages, or both an extensive note and a short form for different audiences. In such cases multiple note elements may be grouped within a noteGrp element.
Typically, the note elements within a noteGrp would be differentiated by attributes such as xml:lang or type, while sharing the same point of attachment. That shared point of attachment may be indicated either implicitly, by the position of inline notes, or explicitly via a target attribute, which may be specified on the noteGrp itself.
in duplicibus Quatuortemporibus
<noteGrp>
<note type="short">Quatuor Tempora, so called dry fast days.</note>
<note type="full">Quatuor Tempora, so called dry fast days (Wednesday, Friday, and Saturday)
falling on each of the quarters of the year. In the first quarter they were called Cinerum
(following Ash Wednesday), second Spiritus (following Pentecost), third Crucis
(after the Exaltation of the Holy Cross, September 14th), and Luciae
in the fourth (after the feast of St. Lucia, December 13th).
</note>
</noteGrp>
totaliter expediui.
</p>
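A processor that displays only one version of a grouped note has to choose among the alternatives. The Python sketch below shows one such choice policy, which is our own assumption rather than anything the Guidelines prescribe: it returns the text of the first note whose type and xml:lang match the requested values. The function name `pick_note` is hypothetical.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:lang under the reserved XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

def local(tag):
    """Local element name without any namespace."""
    return tag.rsplit("}", 1)[-1]

def pick_note(note_grp, note_type=None, lang=None):
    """Return the text of the first note in note_grp matching type/lang.

    A criterion left as None is not checked. Returns None if no note matches.
    """
    for note in note_grp:
        if local(note.tag) != "note":
            continue
        if note_type is not None and note.get("type") != note_type:
            continue
        if lang is not None and note.get(XML_LANG) != lang:
            continue
        return " ".join("".join(note.itertext()).split())
    return None

grp = ET.fromstring(
    '<noteGrp>'
    '<note type="short">Quatuor Tempora, so called dry fast days.</note>'
    '<note type="full">Quatuor Tempora, so called dry fast days '
    '(Wednesday, Friday, and Saturday) falling on each of the '
    'quarters of the year.</note>'
    '</noteGrp>'
)
print(pick_note(grp, "short"))  # Quatuor Tempora, so called dry fast days.
```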
3.9.2 Index Entries
The indexing of scholarly texts is a skilled activity, involving substantial amounts of human judgment and analysis. It should not therefore be assumed that simple searching and information retrieval software will be able to meet all the needs addressed by a well-crafted manual index, although such software may complement an index, for example by providing free-text search. The role of an index is to provide access via keywords and phrases which are not necessarily present in the text itself, but must be added by the skill of the indexer.
3.9.2.1 Pre-existing Indexes
<!--...-->
<list type="index">
<item>Women, how cause of mel. <ref>193</ref>; their vanity in
apparell taxed, <ref>527</ref>; their counterfeit tears
<ref>547</ref>; their vices <ref>601</ref>, commended,
<ref>624</ref>.</item>
<item>Wormwood, good against mel. <ref>443</ref>
</item>
<item>World taxed, <ref>181</ref>
</item>
<item>Writers of the cure of mel. 295</item>
<!--...-->
</list>
</div>
<list>
<item>how cause of mel. <ref>193</ref>;</item>
<item>their vanity in apparell taxed, <ref>527</ref>;</item>
<item>their counterfeit tears <ref>547</ref>;</item>
<item>their vices
<list>
<item>
<ref>601</ref>,</item>
<item> commended, <ref>624</ref>.</item>
</list>
</item>
</list>
</item>
<!-- in the text --><pb xml:id="P624"/>
<!-- start of page 624 -->
<!-- in the index -->
<ref target="#P624">624</ref>
3.9.2.2 Auto-generated Indexes
It can also be useful, however, to generate a new index from a machine-readable text, whether the text is being written for the first time with the tags defined here or the tags are being added to a text transcribed from some other source. Depending on the complexity of the text and its subject matter, such an automatically generated index may not in itself satisfy all the needs of scholarly users. However, it can assist a professional indexer to construct a fully adequate index, which might then be post-edited into the digital text and marked up along the lines already suggested for preserving pre-existing index material.
Indexes generally contain both references to specific pages or sections and references to page ranges or sequences. The same element is used in either case:
- index (index entry) marks a location to be indexed for whatever purpose.
Like the interp element described in 18.3 Spans and Interpretations, this element may be used simply to provide a descriptive or interpretive label of some kind for any location within a text, to be processed in any way by analytic software, but its main purpose is to facilitate the generation of an index for a printed version of the text. An index element may be placed anywhere within a text, between or within other elements. The headwords to be used when making up this index are given by the term elements within the index element. The location of the generated index might be specified by means of a processing instruction within the text, such as the following (the exact form of the PI is of course dependent on the application software in use):
<?tei indexplacement ?>
Alternatively, the special purpose divGen element might be used.
<index>
<term>Lemmatization, Arabic</term>
</index>and are beginning to build parsers.</p>
The effect of this is to document an index entry for the term ‘Lemmatization, Arabic’, which when processed could reference the location of the original index element.
topic of Arabic lemmatisation
<index spanTo="#ALAMEND">
<term>Lemmatization, Arabic</term>
</index> concerning which it is important to note [...]
<!-- much learned material omitted here -->
and now we can build our parser.<anchor xml:id="ALAMEND"/>
</p>
This would generate the same index entries as the previous example, but the reference would be to the whole span of text between the location of the index element and the location of the element whose xml:id is ALAMEND, rather than to a single point, and thus might (for example) include a sequence of page numbers.
Although the position of the index element in the text provides the target location that will be specified in the generated index entry, no part of the text itself is used to construct that entry. Index terms appearing in the entry come solely from the content of term elements, which consequently may have to repeat words or phrases from the text proper. This need not be done verbatim, thus giving scope for normalization of spelling (as in the example above) or other modifications which may assist generation of an index in a desired form or sequence.
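The generation step can be sketched in Python with the standard library's `xml.etree.ElementTree`. This is an illustration resting on assumptions not stated in this section. Page numbers are taken from the n attribute of pb elements, and the helper names `index_entries` and `local` are hypothetical. Nested index elements are not expanded, so only each outermost entry's term children are reported. Each headword is paired with the page on which its index element occurs and, when spanTo is present, with the page of the element that spanTo names. In keeping with the paragraph above, headwords come only from term content, never from the surrounding text.

```python
import xml.etree.ElementTree as ET

XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

def local(tag):
    """Local element name without any namespace."""
    return tag.rsplit("}", 1)[-1]

def index_entries(root):
    """Return (headword, first_page, last_page) triples in document order.

    Assumes page numbers are recorded in the n attribute of pb elements.
    """
    page = None
    id_page = {}      # xml:id -> page on which that element occurs
    starts = []       # (outermost index element, page on which it occurs)
    nested = set()    # index elements that sit inside another index
    for el in root.iter():  # document order
        name = local(el.tag)
        if name == "pb":
            page = el.get("n")
        if el.get(XML_ID) is not None:
            id_page[el.get(XML_ID)] = page
        if name == "index" and el not in nested:
            starts.append((el, page))
            nested.update(d for d in el.iter()
                          if d is not el and local(d.tag) == "index")

    entries = []
    for el, first in starts:
        last = first
        span = el.get("spanTo", "")
        if span.startswith("#"):
            # A span runs to the page holding the element named by spanTo.
            last = id_page.get(span[1:], first)
        for term in el:
            if local(term.tag) == "term":
                headword = " ".join("".join(term.itertext()).split())
                entries.append((headword, first, last))
    return entries

doc = ET.fromstring(
    '<text><body><pb n="12"/>'
    '<p>Arabic lemmatisation <index spanTo="#ALAMEND">'
    '<term>Lemmatization, Arabic</term></index> is hard.</p>'
    '<pb n="13"/>'
    '<p>and now we can build our parser.<anchor xml:id="ALAMEND"/></p>'
    '<p>Parsers <index><term>Parsing</term></index>.</p>'
    '</body></text>'
)
print(index_entries(doc))
```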
<!-- definition of the glyph here -->
</char>
<p>The Artist formerly known as Prince <index>
<term sortKey="Prince">
<g ref="#PrinceGlyph"/>
</term>
</index>...</p>
<index indexName="INDEX-PERSONS">
<term>Ashford, John</term>
</index> was,
coincidentally, born in
<index indexName="INDEX-PLACES">
<term>Ashford
(Kent)</term>
</index>Ashford...</p>
<index>
<term>lemmatization</term>
<index>
<term>arabic</term>
</index>
</index>
...</p>
<term>Women</term>
<index>
<term>their vices</term>
<index>
<term>commended</term>
</index>
</index>
</index>
When processing such index elements, the duplication required to make the structure explicit will normally be removed, so as to produce entries like those quoted above. However, this is not required by the encoding recommended here.
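The removal of duplicated headwords mentioned above can be sketched in Python. Nested index elements are first expanded into tuples of headwords. The tuples are then sorted, and each line omits the headwords it shares with the line before. The function names are hypothetical, and the sort uses plain Python string ordering rather than a proper collation.

```python
import xml.etree.ElementTree as ET

def local(tag):
    """Local element name without any namespace."""
    return tag.rsplit("}", 1)[-1]

def index_paths(index_elem, prefix=()):
    """Expand nested index elements into tuples of headwords."""
    subs = [c for c in index_elem if local(c.tag) == "index"]
    paths = []
    for term in (c for c in index_elem if local(c.tag) == "term"):
        path = prefix + (" ".join("".join(term.itertext()).split()),)
        if not subs:
            paths.append(path)
        for sub in subs:
            paths.extend(index_paths(sub, path))
    return paths

def render_index(paths):
    """Sorted, indented listing that omits headwords repeated from above."""
    lines, previous = [], ()
    for path in sorted(paths):
        shared = 0
        while shared < min(len(path), len(previous)) and path[shared] == previous[shared]:
            shared += 1
        for depth in range(shared, len(path)):
            lines.append("  " * depth + path[depth])
        previous = path
    return lines

vices = ET.fromstring(
    '<index><term>Women</term><index><term>their vices</term>'
    '<index><term>commended</term></index></index></index>'
)
vanity = ET.fromstring(
    '<index><term>Women</term><index>'
    '<term>their vanity in apparell taxed</term></index></index>'
)
print("\n".join(render_index(index_paths(vices) + index_paths(vanity))))
```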
<div type="appendix">
<head>Bibliography</head>
<listBibl>
<bibl> ... </bibl>
</listBibl>
</div>
<divGen n="Index Nominum"
type="INDEX-NAMES"/>
<divGen n="Index Loci" type="INDEX-PLACES"/>
</back>
<divGen n="A1" type="INDEX-NAMES">
<head>An Index of Names</head>
</divGen>
</back>
If a processing instruction is used, then these parameters for the generated index may be supplied in some other way.
One final feature frequently found in manually created indexes to printed works cannot readily be encoded by the means provided here, namely cross-references internal to the index term listing. For example, if all references to the TEI in a text have been indexed using the index term Text Encoding Initiative, it may also be helpful to include an entry under the term TEI containing some text such as ‘see Text Encoding Initiative’. Such internal cross-references must be added during post-editing of an auto-generated index.
3.10 Graphics and Other Non-textual Components
Graphics, such as illustrations or diagrams, appear in many different kinds of text, and often with different purposes. Audio or video clips may also appear. In some cases, such media form an integral part of a text (indeed, some texts, comic books for example, may be almost entirely graphic); in others the graphic or video may be a kind of optional extra. In some cases, the text may be incomprehensible unless the media are included; in others, the presence of the media adds little to the sense of the work. It will therefore be a matter of encoding policy whether and how media found in a source text are transferred to a new encoded version of it. In documents which are ‘born digital’, media such as graphics and other non-textual components may be particularly salient, but their inclusion in an archival form of the document concerned remains an editorial decision.
Considered as structural components, media may be anchored to a particular point in the text, or they may float either completely freely, or within some defined scope, such as a chapter or section. Time-based media such as audio or video may need to be synchronized with particular parts of a written text. Media of all kinds often contain associated text such as a heading or label. These Guidelines provide the following different elements to indicate their appearance within a text:
- figure (figure) groups elements representing or containing graphic information such as an illustration, formula, or figure.
- media indicates the location of any form of external media such as an audio or video clip etc.
- graphic (graphic) indicates the location of a graphic or illustration, either forming part of a text, or providing an image of it.
- binaryObject provides encoded binary data representing an inline graphic, audio, video or other object.
Media files may be encoded in a number of different ways:
- in some non-XML or binary format such as PNG, JPEG, MP3, MP4 etc.
- in an XML format such as SVG
- in a TEI XML format such as the notation for graphs and trees described in 20 Graphs, Networks, and Trees
In the last two cases, the presence of the graphic will be indicated by an appropriate XML element (drawn from the SVG namespace in the second case), whose content fully defines the graphic to be produced. In the first case, however, one of the elements graphic or media is used only to mark the presence of the graphic, and the visual content itself is stored outside the XML document at a location referenced by means of a url attribute. This attribute is provided by the membership of these elements in the att.resourced class. Alternatively, if the media file is small, its data may be embedded directly within the document using a suitable text encoding for binary data such as Base64; in this case the binaryObject element may be used to contain it.
The elements graphic, media, and binaryObject are made available as members of the class model.graphicLike when this module is included in a schema. These elements are also members of the class att.media, from which they inherit the following attributes:
- att.internetMedia provides attributes for specifying the type of a computer resource using a standard taxonomy.
mimeType (MIME media type) specifies the applicable multimedia internet mail extension (MIME) media type.
- att.media provides attributes for specifying display and related properties of external media.
width Where the media are displayed, indicates the display width.
height Where the media are displayed, indicates the display height.
scale Where the media are displayed, indicates a scale factor to be applied when generating the desired display size.
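The Base64 embedding described above can be sketched with Python's standard `base64` and `xml.etree.ElementTree` modules. The sketch builds a binaryObject in the TEI namespace and sets only the mimeType attribute discussed above. The function name `make_binary_object` is hypothetical, and the sample bytes are just the 8-byte PNG signature, a placeholder rather than a complete image. Deciding what counts as "small" enough to embed remains a project decision.

```python
import base64
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"

def make_binary_object(data: bytes, mime_type: str) -> ET.Element:
    """Wrap raw bytes as Base64 text inside a TEI binaryObject element."""
    el = ET.Element(f"{{{TEI_NS}}}binaryObject", {"mimeType": mime_type})
    # b64encode returns bytes; XML text content must be str.
    el.text = base64.b64encode(data).decode("ascii")
    return el

# Placeholder data: only the PNG file signature, not a real image.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
obj = make_binary_object(PNG_SIGNATURE, "image/png")
print(ET.tostring(obj, encoding="unicode"))
```

A consumer reverses the process with `base64.b64decode(obj.text)`.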
through my first, second, third, and
fourth volumes. — In the fifth volume
I have been very good, — the precise
line I have described in it being this:
<graphic url="zigzag2.png"
mimeType="image/png"/>
By which it appears, that except at the
curve, marked A. where I took a trip
to Navarre, — and the indented curve B.
which is the short airing when I was
there with the Lady Baussiere and her
page, — I have not taken the least frisk
...</p>
<graphic url="http://www.iath.virginia.edu/gants/Ornaments/Heads/hp-ral02.gif"/>
</head>
The figure element discussed in 15.4 Specific Elements for Graphic Images provides additional capabilities, for example the ability to combine a number of images into a hierarchically organized structure or a block of images. The figure element carries a type attribute, which can be used to distinguish different kinds of graphic component within a single work, for example, maps as opposed to illustrations. It also provides the ability to associate an image with additional information such as a heading or a description.
3.11 Reference Systems
By reference system we mean the system by which names or references are associated with particular passages of a text (e.g. Ps. 23:3 for the third verse of Psalm 23 or Amores 2.10.7 for Ovid's Amores, book 2, poem 10, line 7). Such names make it possible to mark a place within a text and enable other readers to find it again. A reference system may be based on structural units (chapters, paragraphs, sentences; stanza and verse), typographic units (page and line numbers), or divisions created specifically for reference purposes (chapter and verse in Biblical texts). Where one exists, the traditional reference system for a text should be preserved in an electronic transcript of it, if only to make it easier to compare electronic and non-electronic versions of the text.
Reference systems may be recorded in TEI-encoded texts in any of the following ways:
- where a reference system exists, and is based on the same logical structure as that of the text's markup, the reference for a passage may be recorded as the value of the global xml:id or n attribute on an appropriate tag, or may be constructed by combining attribute values from several levels of tags, as described below in section 3.11.1 Using the xml:id and n Attributes.
- where there is no pre-existing reference system (e.g. for collections and corpora created in electronic form), the global xml:id or n attributes may be used to construct one, as described below in section 3.11.2 Creating New Reference Systems.
- where a reference system exists which is not based on the same logical structure as that of the text's markup (for example, one based on the page and line numbers of particular editions of the text rather than on the structural divisions of it), any of a variety of methods for encoding the logical structure representing the reference system may be employed, as described in chapter 21 Non-hierarchical Structures.
- where a reference system exists which does not correspond to any particular logical structure, or where the logical structure concerned is of no interest to the encoder except as a means of supporting the referencing system, then references may be encoded by means of milestone elements, which simply mark points in the text at which values in the reference system change, as described below in section 3.11.3 Milestone Elements.
The specific method used to record traditional or new reference systems for a text should be declared in the TEI header, as further described in section 3.11.4 Declaring Reference Systems and in section 17.2.5 Canonical References.
When a text has no pre-existing associated reference system of any kind, these Guidelines recommend as a minimum that at least the page boundaries of the source text be marked using one of the methods outlined in this section. Retaining page breaks in the markup is also recommended for texts which have a detailed reference system of their own. Line breaks in prose texts may be, but need not be, tagged.18
3.11.1 Using the xml:id and n Attributes
When traditional reference schemes represent a hierarchical structuring of the text which mirrors that of the marked-up document, the n attribute defined for all elements may be used to indicate the traditional identifier of the relevant structural units. The n attribute may also be used to record the numbering of sections or list items in the copy text if the copy-text numbering is important for some reason, for example because the numbers are out of sequence.
<div2 n="1" type="book">
<!-- ... -->
</div2>
<div2 n="2" type="book">
<div3 n="1" type="poem">
<!-- ... -->
</div3>
<div3 n="2" type="poem">
<!-- ... -->
</div3>
<!-- ... -->
<div3 n="10" type="poem">
<l n="1"> ... </l>
<l n="2"> ... </l>
<!-- ... -->
<l n="7"> ... </l>
</div3>
<!-- ... -->
</div2>
<!-- ... -->
</div1>
<div2 n="Amores 1" type="book">
<!-- ... -->
</div2>
<div2 n="Amores 2" type="book">
<div3 n="Amores 2.1" type="poem">
<!-- ... -->
</div3>
<!-- ... -->
<div3 n="Amores 2.10" type="poem">
<!-- ... -->
<l n="Amores 2.10.7"> ... </l>
<!-- ... -->
</div3>
<!-- ... -->
</div2>
<!-- ... -->
</div1>
<div2 xml:id="am.1" type="book">
<!-- ... -->
</div2>
<div2 xml:id="am.2" type="book">
<div3 xml:id="am.2.1" type="poem">
<!-- ... -->
</div3>
<!-- ... -->
<div3 xml:id="am.2.10" type="poem">
<!-- ... -->
<l xml:id="am.2.10.7"> ... </l>
<!-- ... -->
</div3>
<!-- ... -->
</div2>
<!-- ... -->
</div1>
To document the usage and to allow automatic processing of these standard references, it is recommended that the TEI header be used to declare whether standard references are recorded in the n or xml:id attributes and which elements may carry standard references or portions of them. For examples of declarations for the reference systems just shown, see section 3.11.4 Declaring Reference Systems.
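As an example of such automatic processing, the Python sketch below combines n values from several levels into a full reference. It joins the n values of an element and of every ancestor carrying n with full stops, optionally adding a fixed prefix. It assumes every contributing level carries n, and the function name `canonical_refs` is hypothetical. Applied to the first encoding above with the prefix "Amores ", it yields the values recorded in full in the second encoding.

```python
import xml.etree.ElementTree as ET

def local(tag):
    """Local element name without any namespace."""
    return tag.rsplit("}", 1)[-1]

def canonical_refs(root, prefix="", sep="."):
    """List (element name, reference) for every element carrying n.

    Each reference joins the n values of the element and of all its
    ancestors that carry n, from the outermost inwards.
    """
    refs = []

    def walk(elem, trail):
        n = elem.get("n")
        if n is not None:
            trail = trail + [n]
            refs.append((local(elem.tag), prefix + sep.join(trail)))
        for child in elem:
            walk(child, trail)

    walk(root, [])
    return refs

book = ET.fromstring(
    '<body><div2 n="2" type="book"><div3 n="10" type="poem">'
    '<l n="1"> ... </l><l n="7"> ... </l>'
    '</div3></div2></body>'
)
for name, ref in canonical_refs(book, prefix="Amores "):
    print(name, ref)
```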
Using the n attribute one can specify only a single standard referencing system, a limitation not without problems, since some editions may define structural units differently and thus create alternative reference systems. For example, another edition of the Amores considers poem 10 a continuation of poem 9, and therefore would specify the same line as Amores 2.9.31. In order to record both of these reference systems one could employ any of a variety of methods discussed in chapter 21 Non-hierarchical Structures.
3.11.2 Creating New Reference Systems
If a text has no canonical reference system of its own, a new custom reference system may be used.
The global attributes n and xml:id may be used to assign reference identifiers to segments of the text. Identifiers specified by either attribute apply to the entire element for which they are given. xml:id values must be unique within a single document and must begin with a letter (or an underscore). No such restrictions apply to the values of n attributes.
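A general-purpose XML parser does not necessarily enforce these xml:id constraints, so a project may want to check them itself. The Python sketch below reports duplicated xml:id values and values that fail a simplified, ASCII-only approximation of the XML name rules. The regular expression and the function name `check_ids` are our own assumptions, not a complete implementation of the XML specification. As noted above, n values are left unchecked.

```python
import re
import xml.etree.ElementTree as ET
from collections import Counter

XML_ID = "{http://www.w3.org/XML/1998/namespace}id"
# Simplified, ASCII-only approximation of the XML NCName rules.
SIMPLE_NCNAME = re.compile(r"[A-Za-z_][A-Za-z0-9._-]*")

def check_ids(root):
    """Return (duplicated xml:id values, values failing the simple name test)."""
    ids = [el.get(XML_ID) for el in root.iter() if el.get(XML_ID) is not None]
    duplicates = sorted(v for v, c in Counter(ids).items() if c > 1)
    malformed = sorted(v for v in set(ids) if not SIMPLE_NCNAME.fullmatch(v))
    return duplicates, malformed

doc = ET.fromstring(
    '<div><l xml:id="am.2.10.7"/><l xml:id="am.2.10.7"/>'
    '<l xml:id="2.10.8"/><l n="2.10.9"/></div>'
)
print(check_ids(doc))  # (['am.2.10.7'], ['2.10.8'])
```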
Determining a referencing system for a TEI encoding depends on many factors that may either be derived from textual structure, or influenced by extra-textual contingencies such as project and file management concerns. It is important, therefore, that the attribute used, the elements which can bear standard reference identifiers, and the method for constructing standard reference identifiers, should all be declared in the header as described in section 2.3.6 The Reference System Declaration.
The Guidelines do not recommend one specific method for creating new referencing systems; however, the rest of this section lists some possibly useful strategies.
3.11.2.1 Referencing system derived from markup
A new referencing system may be derived from the structure of the electronic text, specifically from the markup of the text. As with any reference system intended for long-term use, it is important to see the reference as an established, unchanging point in the text. Should the text be revised or rearranged, the reference-system identifiers associated with any section of text must stay with that section of text, even if it means the reference numbers fall out of sequence. (A new reference system may always be created beside the old one if out-of-sequence numbers must be avoided.)
A convenient method of mechanically generating unique values for xml:id or n attributes based on the structure of the document is to construct, for each element, a domain-style address comprising a series of components separated by full stops, with one component for each level of the document hierarchy. Two methods may be used. In the typed path form of identifier, each component in the identifier takes the form of an element name, a hyphen, and a number, for example p-2. The element name specifies what type of element is to be sought, and the number specifies which occurrence of that element type is to be selected. (The hyphen and number may be omitted if there is only one element of the given type.) In the untyped path form of identifier, each component consists of a number indicating which element, in the sequence of child elements at each level, is to be selected. To make the resulting identifier a valid XML identifier, it may need to be prefixed with an unchanging alphabetic letter.
Identifiers generated with these methods should use the text element as their starting point, rather than the TEI or body elements. The TEI element may be taken as a starting point only if identifiers need to be generated for the teiHeader, which is not usually the case; using the body element as a root would prevent assignment of identifiers for the front and back matter. The component corresponding to the root element can be omitted from identifiers, if no confusion will result. In collections and corpora, the component corresponding to the root may be replaced by the unique identifier assigned to the text or sample.
<text>
<front xml:id="Front" n="AB.1">
<div xml:id="Front.div-1" n="AB.1.1">
<p> ... </p>
</div>
<titlePage xml:id="Front.titlePage"
n="AB.1.2">
<titlePart> ... </titlePart>
</titlePage>
<div xml:id="Front.div-2" n="AB.1.3">
<p> ... </p>
</div>
</front>
<body xml:id="Body" n="AB.2">
<p xml:id="Body.p-1" n="AB.2.1"> ... </p>
<p xml:id="Body.p-2" n="AB.2.2"> ... </p>
<div xml:id="Body.div-1" n="AB.2.3">
<head xml:id="Body.div-1.head"
n="AB.2.3.1"> ... </head>
<p xml:id="Body.div-1.p-1" n="AB.2.3.2"> ... </p>
<p xml:id="Body.div-1.p-2" n="AB.2.3.3"> ... </p>
</div>
<div xml:id="Body.div-2" n="AB.2.4">
<head xml:id="Body.div-2.head"
n="AB.2.4.1"> ... </head>
<p xml:id="Body.div-2.p-1" n="AB.2.4.2"> ... </p>
<p xml:id="Body.div-2.p-2" n="AB.2.4.3"> ... </p>
</div>
</body>
</text>
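Both path forms can be generated mechanically. The following Python sketch shows one way to do it. It assumes ElementTree as the parser and uses a skeleton of the example above with its attributes stripped. The function name assign_paths is hypothetical. For xml:id it computes the typed path, omitting the root text component and capitalizing the first component as the example does. For n it computes the untyped path, with AB standing in for the root.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Skeleton of the example document above (attributes omitted).
SAMPLE = """<text>
  <front>
    <div><p/></div>
    <titlePage><titlePart/></titlePage>
    <div><p/></div>
  </front>
  <body>
    <p/><p/>
    <div><head/><p/><p/></div>
    <div><head/><p/><p/></div>
  </body>
</text>"""


def assign_paths(root, prefix):
    """Return (tag, typed_id, untyped_n) for every descendant of root.

    typed_id uses the typed path form (e.g. Body.div-1.head), with the
    root component omitted and the first component capitalized, as in
    the example; untyped_n uses the untyped path form (e.g. AB.2.3.1),
    with `prefix` standing in for the root component.
    """
    rows = []

    def walk(parent, typed_parts, untyped_parts):
        children = list(parent)
        totals = Counter(child.tag for child in children)
        seen = Counter()
        for position, child in enumerate(children, start=1):
            seen[child.tag] += 1
            # Omit the hyphen and number when the element type occurs only once.
            if totals[child.tag] == 1:
                component = child.tag
            else:
                component = f"{child.tag}-{seen[child.tag]}"
            if not typed_parts:
                component = component[0].upper() + component[1:]
            typed = typed_parts + [component]
            untyped = untyped_parts + [str(position)]
            rows.append((child.tag, ".".join(typed), ".".join(untyped)))
            walk(child, typed, untyped)

    walk(root, [], [prefix])
    return rows


for tag, xml_id, n in assign_paths(ET.fromstring(SAMPLE), "AB"):
    print(f'<{tag} xml:id="{xml_id}" n="{n}">')
```

The start-tags it prints carry the same xml:id and n values as the example above. An n holding only the subpath from the parent would simply be the last component of each untyped path.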
If the xml:id attribute is used to record the reference identifiers generated, each value should record the entire path. If the n attribute is used, each value may record either the entire path or only the subpath from the parent element. The attribute used, the elements which can bear standard reference identifiers, and the method for constructing standard reference identifiers, should all be declared in the header as described in section 2.3.6 The Reference System Declaration.
3.11.2.2 Referencing systems based on project conventions
A reference system may be based on an agreed project-specific convention for xml:id attributes. Every convention has its strengths and weaknesses, and it is left to encoders to choose one that enables them to locate information in their TEI documents.
Here are some examples of referencing systems that have been used in TEI projects:
- Title-based identifiers: identifiers constructed from some characters of the main document title, followed by an incremental number, e.g. HOL001, HOL002, etc., using a fixed number of digits, or HOL1, HOL2, etc., without a fixed number of digits.
- Based on markup, with prefix: identifiers constructed from the markup itself, as described in the previous section. To help ensure uniqueness in a corpus, each identifier may be prefixed with the identifier of the root TEI element, e.g. RootID-Body-p-1.
- Opaque identifiers: identifiers computed using either a randomized algorithm or a universally unique identifier (UUID) algorithm. Note that the XSLT function generate-id() guarantees only that the identifiers it generates are unique within the document being processed.
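The three conventions listed above can be sketched as small Python functions. The function names, the default width of three digits, and the single-letter prefix for UUIDs are all illustrative assumptions, not part of any TEI specification. The prefix is needed because a UUID may begin with a digit, which an xml:id value may not.

```python
import re
import uuid

# Rough NCName check (ASCII subset only); an assumption for illustration.
NCNAME = re.compile(r"[A-Za-z_][A-Za-z0-9._-]*")


def title_based(prefix, number, width=3):
    """HOL001-style identifier; width=None gives HOL1-style identifiers."""
    return f"{prefix}{number:0{width}d}" if width else f"{prefix}{number}"


def markup_based(root_id, path):
    """RootID-Body-p-1 style: corpus-level prefix plus markup path components."""
    return "-".join([root_id, *path])


def opaque(prefix="u"):
    """UUID-based identifier; the prefix keeps it from beginning with a digit."""
    return prefix + uuid.uuid4().hex


print(title_based("HOL", 1), title_based("HOL", 1, width=None))
print(markup_based("RootID", ["Body", "p", "1"]))
print(opaque())
```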
XML itself requires only that xml:id values be unique within a single document. However, when working with referencing systems across a corpus of TEI files, it is helpful (or even necessary in some circumstances) to have identifiers that are unique across the whole corpus.
Values of xml:id may be assigned either computationally or manually. In the latter case, it is advisable to put measures in place to guard against human error: custom data types and Schematron rules may be defined in a customization ODD, and a check digit may be added so that mistyped or accidentally altered identifiers can be detected.19
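These Guidelines do not prescribe a check-digit algorithm. As one possibility, the Python sketch below borrows ISO 7064 MOD 11-2, the scheme ORCID uses for its final character. It applies that scheme to a hypothetical HOL-style identifier with three digits. The names make_id and is_valid_id are illustrative only.

```python
def iso7064_mod11_2(digits):
    """Check character for a digit string (ISO 7064 MOD 11-2, as used by ORCID)."""
    total = 0
    for d in digits:
        total = (total + int(d)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def make_id(prefix, number, width=3):
    """Hypothetical convention: prefix + zero-padded number + check character."""
    digits = f"{number:0{width}d}"
    return prefix + digits + iso7064_mod11_2(digits)


def is_valid_id(xml_id, prefix, width=3):
    """True if xml_id has the expected shape and a correct check character."""
    if not xml_id.startswith(prefix) or len(xml_id) != len(prefix) + width + 1:
        return False
    digits, check = xml_id[len(prefix):-1], xml_id[-1]
    return digits.isdigit() and iso7064_mod11_2(digits) == check


print(make_id("HOL", 1), make_id("HOL", 2))
print(is_valid_id("HOL0018", "HOL"))  # mistyped digit: check character no longer fits
```

A validator like is_valid_id could also be mirrored as a Schematron rule in the project's ODD. The Python form is shown here only because it is easy to run.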
3.11.3 Milestone Elements
Where the desired reference system does not correspond to any particular structural hierarchy, or the document combines multiple structural hierarchies (as further discussed in 21 Non-hierarchical Structures), simpler though less expressive methods may be necessary. In such cases the simplest solution may be just to mark up changes in the reference system where they occur, by using one or more of the following milestone elements:
- milestone (milestone) marks a boundary point separating any kind of section of a text, typically but not necessarily indicating a point at which some part of a standard reference system changes, where the change is not represented by a structural element.
- gb (gathering beginning) marks the beginning of a new gathering or quire in a transcribed codex.
- pb (page beginning) marks the beginning of a new page in a paginated document.
- lb (line beginning) marks the beginning of a new (typographic) line in some edition or version of a text.
- cb (column beginning) marks the beginning of a new column of a text on a multi-column page.
These elements simply mark the points in a text at which some category in a reference system changes. They have no content but subdivide the text into regions, rather in the same way as milestones mark points along a road, thus implicitly dividing it into segments. The elements gb, pb, cb, and lb are specialized types of milestone, marking gathering, page, column, and line boundaries respectively. The global n attribute is used in each case to provide a value for the particular unit associated with this milestone (for example, the page or line number). Since a reference system based on milestones is not structural, it cannot readily be validated by an XML parser, so it is the responsibility of the encoder or the application software to ensure that the milestones are given in the correct order.
<text>
<body>
<milestone unit="part" n="1"/>
<div1 n="1" type="chapter">
<p>
<!-- ... -->
</p>
</div1>
<div1 n="2" type="chapter">
<p>
<!-- ... -->
</p>
</div1>
<div1 n="3" type="chapter">
<p>
<!-- ... -->
</p>
<milestone unit="part" n="2"/>
<p>
<!-- ... -->
</p>
</div1>
</body>
</text>
<text>
<body>
<div1 n="1" type="part">
<milestone unit="chapter" n="1"/>
<p>
<!-- ... -->
</p>
<milestone unit="chapter" n="2"/>
<p>
<!-- ... -->
</p>
<milestone unit="chapter" n="3"/>
<p>
<!-- ... -->
</p>
</div1>
<div1 n="2" type="part">
<p>
<!-- ... -->
</p>
<milestone unit="chapter" n="4"/>
<p>
<!-- ... -->
</p>
</div1>
</body>
</text>
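An XML parser cannot check milestone order, but a small script can. This Python sketch uses the chapter milestones from the second example above and assumes integer n values. It reports every point where n fails to increase. Gaps are tolerated, since some units may be numbered only periodically. The function name is hypothetical.

```python
import xml.etree.ElementTree as ET

# The second example above, reduced to its divisions and milestones.
SAMPLE = """<body>
  <div1 n="1" type="part">
    <milestone unit="chapter" n="1"/><p/>
    <milestone unit="chapter" n="2"/><p/>
    <milestone unit="chapter" n="3"/><p/>
  </div1>
  <div1 n="2" type="part">
    <p/>
    <milestone unit="chapter" n="4"/><p/>
  </div1>
</body>"""


def out_of_order(root, unit):
    """Return (previous, current) pairs where n fails to increase.

    Assumes integer n values; gaps are allowed.
    """
    problems = []
    previous = None
    for milestone in root.iter("milestone"):  # document order
        if milestone.get("unit") != unit:
            continue
        current = int(milestone.get("n"))
        if previous is not None and current <= previous:
            problems.append((previous, current))
        previous = current
    return problems


print(out_of_order(ET.fromstring(SAMPLE), "chapter"))
```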
<lg>
<milestone unit="speaker" n="Man"/>
<l>Oh what is this I cannot see</l>
<l>With icy hands gets a hold on me</l>
<milestone unit="speaker" n="Death"/>
<l>Oh I am Death, none can excel</l>
<l>I open the doors of heaven and hell</l>
</lg>
Milestone tags also make it possible to record the reference systems used in a number of different editions of the same work. The reference system of any one edition can be recreated from a text in which all of them are marked, simply by ignoring all milestone elements that do not specify that edition on their ed attribute.
<milestone ed="E2" unit="work"/>
<milestone ed="E1" unit="book"/>
<milestone ed="E1" unit="poem"/>
<milestone ed="E2" unit="poem"/>
<milestone ed="E2" unit="book"/>
<milestone ed="E1" unit="poem"/>
<milestone ed="E2" unit="poem"/>
In this case no n value is specified, since the numbers rise predictably and the application can keep a count from the start of the document, if desired.
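Recreating one edition's reference system amounts to filtering milestones on ed and, where n is absent, keeping a count. Here is a minimal Python sketch using the milestones from the example above, wrapped in a hypothetical div. It assumes that each unit is counted from the start of the document rather than reset at each higher-level unit. A project could equally choose the latter convention.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# The milestones from the example above, wrapped in a hypothetical <div>.
SAMPLE = """<div>
  <milestone ed="E2" unit="work"/>
  <milestone ed="E1" unit="book"/>
  <milestone ed="E1" unit="poem"/>
  <milestone ed="E2" unit="poem"/>
  <milestone ed="E2" unit="book"/>
  <milestone ed="E1" unit="poem"/>
  <milestone ed="E2" unit="poem"/>
</div>"""


def edition_units(root, edition):
    """List (unit, number) for one edition's milestones, in document order."""
    counts = Counter()
    units = []
    for milestone in root.iter("milestone"):
        if milestone.get("ed") != edition:
            continue  # milestones of other editions are simply ignored
        unit = milestone.get("unit")
        counts[unit] += 1
        # Use an explicit n if present, otherwise the running count.
        units.append((unit, milestone.get("n", str(counts[unit]))))
    return units


print(edition_units(ET.fromstring(SAMPLE), "E1"))
```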
<milestone ed="E1" unit="book" n="1"/>
<milestone ed="E1" unit="poem" n="1"/>
<milestone ed="E1" unit="poem" n="2"/>
<milestone ed="E1" unit="book" n="2"/>
<milestone ed="E1" unit="book" n="1"/>
<milestone ed="E1" unit="poem" n="1.1"/>
<milestone ed="E1" unit="poem" n="1.2"/>
<milestone ed="E1" unit="book" n="2"/>
When using milestone tags, line numbers may be supplied for every line or only periodically (every fifth, every tenth line). The latter may be simpler; the former is more reliable.
The style of numbering used in the values of n is unrestricted: for the example above, I.i, I.ii, and I.iii could have been used equally well if preferred. The special value unnumbered should be reserved for marking sections of text which fall outside the normal numbering system (e.g. chapter heads, poem numbers, titles, or speaker attributions in a verse drama).
By default, there are no constraints on the values supplied for the ed attribute. If it is felt appropriate to enforce such a restriction, the techniques described in 24.3 Customization may be used, for example to require that the attribute take one of a predefined set of values.
See below, section 3.11.4 Declaring Reference Systems, for examples of declarations for the reference systems just shown.
Milestone elements may be used to mark any kind of shift in the properties associated with a piece of text, whether or not it would normally be considered a reference system. For example, they may be used to mark changes in narrative voice in a prose text, or changes of speaker in a dramatic text, where these are not marked using structural elements such as sp, perhaps in order to avoid a clash of hierarchies.
3.11.4 Declaring Reference Systems
Whatever kind of reference system is used in an electronic text, it is recommended that the TEI header contain a description of its construction in the refsDecl element described in section 2.3.6 The Reference System Declaration. As described there, the declaration may consist either of a formal declaration using the cRefPattern or citeStructure elements, or of an informal description in prose. A formal declaration is recommended because, unlike prose, it can be processed by software.
<encodingDesc>
<refsDecl>
<cRefPattern matchPattern="([^ ]+) ([0-9]+)\.([0-9]+)\.([0-9]+)"
replacementPattern="#xpath(//div1[@n='$1']/div2[@n='$2']/div3[@n='$3']/l[@n='$4'])">
<p>A canonical reference is assembled with
<list>
<item>the name of the <label>work</label>: the
<att>n</att> of a <gi>div1</gi>,</item>
<item>a space,</item>
<item>the number of the <label>book</label>: the
<att>n</att> of a child <gi>div2</gi>,</item>
<item>a full stop,</item>
<item>the number of the <label>poem</label>: the
<att>n</att> of a child <gi>div3</gi>,</item>
<item>a full stop,</item>
<item>the line number: the <att>n</att> value of a
child <gi>l</gi>
</item>
</list>
</p>
</cRefPattern>
<cRefPattern matchPattern="([^ ]+) ([0-9]+)\.([0-9]+)"
replacementPattern="#xpath(//div1[@n='$1']/div2[@n='$2']/div3[@n='$3'])">
<p>Same as above, but without the last component (full
stop followed by the <gi>l</gi>'s <att>n</att>).</p>
</cRefPattern>
<cRefPattern matchPattern="([^ ]+) ([0-9]+)"
replacementPattern="#xpath(//div1[@n='$1']/div2[@n='$2'])">
<p>Same as above, but without the poem component (full
stop followed by the <gi>div3</gi>'s <att>n</att>).</p>
</cRefPattern>
</refsDecl>
</encodingDesc>
<refsDecl>
<cRefPattern matchPattern="([^ ]+ [0-9]+\.[0-9]+\.[0-9]+)"
replacementPattern="#xpath(//l[@n='$1'])"/>
</refsDecl>
<refsDecl>
<cRefPattern matchPattern="([^ ]+ [0-9]+\.[0-9]+\.[0-9]+)"
replacementPattern="#xpath(//l[@n='$1'])"/>
<cRefPattern matchPattern="([^ ]+ [0-9]+\.[0-9]+)"
replacementPattern="#xpath(//div2[@n='$1'])"/>
</refsDecl>
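Because cRefPattern is formal, software can resolve a canonical reference with it. The Python sketch below uses the three patterns from the first refsDecl example above, with their XPath brackets balanced. It makes three assumptions: patterns are tried in document order, a matchPattern must match the whole reference, and only $1 to $9 occur in a replacement. The work name Amores is hypothetical.

```python
import re

# (matchPattern, replacementPattern) pairs from the first refsDecl example above.
C_REF_PATTERNS = [
    (r"([^ ]+) ([0-9]+)\.([0-9]+)\.([0-9]+)",
     "#xpath(//div1[@n='$1']/div2[@n='$2']/div3[@n='$3']/l[@n='$4'])"),
    (r"([^ ]+) ([0-9]+)\.([0-9]+)",
     "#xpath(//div1[@n='$1']/div2[@n='$2']/div3[@n='$3'])"),
    (r"([^ ]+) ([0-9]+)",
     "#xpath(//div1[@n='$1']/div2[@n='$2'])"),
]


def resolve(cref, patterns=C_REF_PATTERNS):
    """Turn a canonical reference into a pointer, or None if nothing matches.

    Patterns are tried in order and must match the whole reference;
    $1-$9 in the replacement are replaced by the matching groups.
    """
    for match_pattern, replacement in patterns:
        match = re.fullmatch(match_pattern, cref)
        if match:
            return re.sub(r"\$([1-9])",
                          lambda ref: match.group(int(ref.group(1))),
                          replacement)
    return None


print(resolve("Amores 1.2.3"))
```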
<refsDecl>
<citeStructure unit="work" match="//div1"
use="@n">
<citeStructure unit="book" match="div2"
use="@n" delim=" ">
<citeStructure unit="poem" match="div3"
use="@n" delim=".">
<citeStructure unit="line" match="l"
use="@n" delim="."/>
</citeStructure>
</citeStructure>
</citeStructure>
</refsDecl>
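The citeStructure declaration above can also be used the other way round, to generate the canonical reference for every citable unit. This Python sketch simplifies match and use to an element name and an attribute name rather than full XPath expressions. The sample text and the class and function names are hypothetical.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field


@dataclass
class CiteStructure:
    unit: str
    match: str      # simplified: an element name rather than an XPath
    use: str        # simplified: an attribute name, standing in for "@n"
    delim: str = ""
    children: list = field(default_factory=list)


# The nested citeStructure declaration above, as Python objects.
STRUCTURE = CiteStructure("work", "div1", "n", "", [
    CiteStructure("book", "div2", "n", " ", [
        CiteStructure("poem", "div3", "n", ".", [
            CiteStructure("line", "l", "n", "."),
        ]),
    ]),
])


def citable_units(root, structure):
    """Yield (unit, reference) for every unit, parents before children."""
    def walk(context, struct, prefix, top_level):
        # "//div1" searches the whole tree; nested matches look at children only.
        found = context.iter(struct.match) if top_level else context.findall(struct.match)
        for element in found:
            ref = prefix + struct.delim + element.get(struct.use, "")
            yield struct.unit, ref
            for child in struct.children:
                yield from walk(element, child, ref, False)

    yield from walk(root, structure, "", True)


SAMPLE = """<body>
  <div1 n="Amores">
    <div2 n="1">
      <div3 n="2"><l n="1">...</l><l n="2">...</l></div3>
    </div2>
  </div1>
</body>"""

for unit, ref in citable_units(ET.fromstring(SAMPLE), STRUCTURE):
    print(unit, ref)
```

The references produced have the same shape (work, space, book, full stop, poem, full stop, line) as those accepted by the first cRefPattern example above.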
<refsDecl>
<citeStructure unit="work" match="//div1"
use="@n">
<citeData property="http://purl.org/dc/terms/title"
use="head"/>
<citeStructure unit="book" match="div2"
use="@n" delim=" ">
<citeData property="http://purl.org/dc/terms/title"
use="head"/>
<citeStructure unit="poem" match="div3"
use="@n" delim=".">
<citeData property="http://purl.org/dc/terms/title"
use="head"/>
<citeStructure unit="line" match="l"
use="@n" delim="."/>
</citeStructure>
</citeStructure>
</citeStructure>
</refsDecl>
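In this example, each citeData element declares that the head of a work, book, or poem supplies its title, expressed with the Dublin Core property http://purl.org/dc/terms/title. For convenience, property URIs may be abbreviated using prefixDef.
<refsDecl>
<p>Standard references to work, book, poem, and line may be
constructed from the milestone tags in the text.</p>
</refsDecl>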
<refsDecl>
<refState ed="E1" unit="work" delim=" "/>
<refState ed="E1" unit="book" delim="."/>
<refState ed="E1" unit="poem" delim=":"/>
<refState ed="E1" unit="line"/>
</refsDecl>
3.12 Bibliographic Citations and References
Bibliographic references (that is, full descriptions of bibliographic items such as books, articles, films, broadcasts, songs, etc.) or pointers to them may appear at various places in a TEI text. They are required at several points within the TEI header's source description, as discussed in section 2.2.7 The Source Description; they may also appear within the body of a text, either singly (for example within a footnote), or collected together in a list as a distinct part of a text; detailed bibliographic descriptions of manuscript or other source materials may also be required. These Guidelines propose a number of specialized elements to encode such descriptions, which together constitute the model.biblLike class.
- model.biblLike groups elements containing a bibliographic description.
  - bibl (bibliographic citation) contains a loosely-structured bibliographic citation of which the sub-components may or may not be explicitly tagged.
  - biblFull (fully-structured bibliographic citation) contains a fully-structured bibliographic citation, in which all components of the TEI file description are present.
  - biblStruct (structured bibliographic citation) contains a structured bibliographic citation, in which only bibliographic sub-elements appear and in a specified order.
  - listBibl (citation list) contains a list of bibliographic citations of any kind.
  - msDesc (manuscript description) contains a description of a single identifiable manuscript or other text-bearing object such as an early printed book.
Lists of such elements may also be encoded using the following element:
- listBibl (citation list) contains a list of bibliographic citations of any kind.
In printed texts, the individual constituents of a bibliographic reference are conventionally marked off from each other and from the flow of text by such features as bracketing, italics, special punctuation conventions, underlining, etc. In electronic texts, such distinctions are also important, whether in order to produce acceptably formatted output or to facilitate intelligent retrieval processing,20 quite apart from the need to distinguish the reference itself as a textual object with particular linguistic properties.
It should be emphasized that for references as for other textual features, the primary or sole consideration is not how the text should be formatted when it is printed or displayed. The distinctions permitted by the scheme outlined here may not necessarily be all that particular formatters or bibliographic styles require, although they should prove adequate to the needs of many such commonly used software systems.21 The features distinguished and described below (in section 3.12.2 Components of Bibliographic References) constitute a set which has been useful for a wide range of bibliographic purposes and in many applications, and which moreover corresponds to a great extent with existing bibliographic and library cataloguing practice. For a fuller account of that practice as applied to electronic texts see section 2.2.7 The Source Description; for a brief mention of related library standards see section 2.8 Note for Library Cataloguers.
The most commonly used elements in the model.biblLike class are biblStruct and bibl. biblStruct will usually be easier to process mechanically than bibl because its structure is more constrained and predictable. It is suited to situations in which the objective is to represent bibliographic information for machine processing directly by other systems or after conversion to some other bibliographic markup format such as BibTeXML or MODS. Punctuation delimiting the components of a print citation is not permitted directly within a biblStruct element; instead, the presence and order of child elements must be used to reconstruct the punctuation required by a particular style.
By contrast, bibl allows for considerable flexibility in that it can include both delimiting punctuation and unmarked-up text; and its constituents can also be ordered in any way. This makes it suitable for marking up bibliographies in existing documents, where it is considered important to preserve the form of references in the original document, while also distinguishing important pieces of information such as authors, dates, publishers, and so on. bibl may also be useful when encoding ‘born digital’ documents which require use of a specific style guide when rendering the content; its flexibility makes it easier to provide all the information for a reference in the exact sequence required by the target rendering, including any necessary punctuation and linking words, rather than using an XSLT stylesheet or similar to reorder and punctuate the data.
The third element in the model.biblLike class, biblFull, has a content model based on the fileDesc element of the TEI header. Both are based on the International Standard for Bibliographic Description (ISBD), which forms the basis of several national standards for bibliographic citations. The order of child elements in both biblFull and fileDesc corresponds to the order of bibliographic description ‘areas’ in ISBD, with two minor exceptions. First, the extent element, corresponding to the physical description area in ISBD, precedes publicationStmt in TEI, whereas in ISBD the physical description area follows the publication, production, distribution, etc. area. Second, biblFull and fileDesc use the child element publicationStmt to cover not only the publication, production, distribution, etc. area but also the resource identifier and terms of availability area associated with that publication. Despite these inconsistencies, users encoding citations and attempting to format them according to a standard that closely adheres to ISBD may find that biblFull, used with its child elements and without delimiting punctuation, provides an appropriate granularity of encoding with elements that can easily be rendered for the reader. However, it is important to note that some ISBD-derived citation formats (such as ANSI/NISO Z39.29 and ГОСТ 7.1) are not entirely conformant to ISBD either, since they may begin with a statement of authorship that does not map to the ISBD statement of responsibility.
3.12.1 Methods of Encoding Bibliographic References and Lists of References
The members of the model.biblLike class all share a number of possible component sub-elements. For the bibl and biblStruct elements, exactly the same sub-elements are concerned, and they are described together in section 3.12.2 Components of Bibliographic References; for the biblFull element, the sub-elements concerned are fully described in section 2.2 The File Description.
was <bibl>Tufte's <title>Envisioning
Information</title>
</bibl>, although he may
never have actually read it.</p>
was Tufte's <title>Envisioning Information</title>,
although he may never have actually read it.</p>
<biblStruct>
<monogr>
<author>
<persName>
<forename>Edward</forename>
<forename full="init">R.</forename>
<surname>Tufte</surname>
</persName>
<idno type="scopus">6506403994</idno>
<idno type="lcaf">https://id.loc.gov/authorities/names/n50012763.html</idno>
</author>
<title level="m">Envisioning Information</title>
<imprint>
<pubPlace>Cheshire, Conn.</pubPlace>
<publisher>Graphics Press</publisher>
<date when="1990"/>
</imprint>
</monogr>
</biblStruct>
<biblFull>
<titleStmt>
<title>Envisioning Information</title>
<author>Tufte, Edward R[olf]</author>
</titleStmt>
<extent>126 pp.</extent>
<publicationStmt>
<publisher>Graphics Press</publisher>
<pubPlace>Cheshire, Conn. USA</pubPlace>
<date>1990</date>
</publicationStmt>
</biblFull>
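Since biblStruct carries no delimiting punctuation, a rendering step has to supply it from the order and identity of the child elements. Here is a minimal Python sketch for the Tufte biblStruct above, with its idno elements left out. It uses a made-up author, title, imprint style, and the function name is hypothetical. A real style would need to handle many more cases.

```python
import xml.etree.ElementTree as ET

# The biblStruct example above, without the idno elements.
SAMPLE = """<biblStruct>
  <monogr>
    <author>
      <persName>
        <forename>Edward</forename>
        <forename full="init">R.</forename>
        <surname>Tufte</surname>
      </persName>
    </author>
    <title level="m">Envisioning Information</title>
    <imprint>
      <pubPlace>Cheshire, Conn.</pubPlace>
      <publisher>Graphics Press</publisher>
      <date when="1990"/>
    </imprint>
  </monogr>
</biblStruct>"""


def render_monograph(bibl_struct):
    """Format a single-author monograph in one hypothetical citation style."""
    monogr = bibl_struct.find("monogr")
    person = monogr.find("author/persName")
    forenames = " ".join(f.text for f in person.findall("forename"))
    # Avoid a doubled full stop after an initial such as "R."
    author = f"{person.findtext('surname')}, {forenames}".rstrip(".")
    imprint = monogr.find("imprint")
    return (f"{author}. {monogr.findtext('title')}. "
            f"{imprint.findtext('pubPlace')}: {imprint.findtext('publisher')}, "
            f"{imprint.find('date').get('when')}.")


print(render_monograph(ET.fromstring(SAMPLE)))
```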
to="1023">1013–23</biblScope>
</monogr>
<note>Apparently a draft of section 4 of <title level="m">Literary Machines</title>.</note>
</biblStruct>
<bibl xml:id="NELSON88"><author><persName><forename>Ted</forename> <surname>Nelson</surname></persName></author>: <title level="u">Literary Machines</title> (privately published, <date when="1987">1987</date>).</bibl>
<bibl xml:id="BAXTER88"><author><persName><surname>Baxter</surname>, <forename>Glen</forename></persName></author>: <title level="m">Glen Baxter His Life: the years of struggle</title> <pubPlace>London</pubPlace>: <publisher>Thames and Hudson</publisher>, <date when="1988">1988</date>.</bibl>
</listBibl>
<list>
<head>Bibliography</head>
<item>
<bibl xml:id="NEL80">
<author>Nelson, T. H.</author>
<title level="a">Replacing the printed word:
a complete literary system</title>.
<title level="m">Information Processing '80:
Proceedings of the IFIPS Congress, October 1980</title>.
<editor>Simon H. Lavington</editor>
<publisher>North-Holland</publisher>:
<pubPlace>Amsterdam</pubPlace>,
<date>1980</date>.
<biblScope>pp 1013–23
</biblScope>
<note>Apparently a draft of section 4 of
<title>Literary Machines</title>.</note>
</bibl>
</item>
<item>
<bibl xml:id="NEL88">Ted Nelson: <title>Literary Machines</title>
(privately published, 1987)</bibl>
</item>
<item>
<bibl xml:id="BAX88">
<author>Baxter, Glen</author>
<title>Glen Baxter His Life: the years of struggle</title>
London: Thames and Hudson, 1988.
</bibl>
</item>
</list>
3.12.2 Components of Bibliographic References
This section discusses commonly occurring components of bibliographic references and elements used for encoding them. They fall into four groups:
- elements for grouping components of the analytic, monographic, and series levels in a structured bibliographic reference
- titles of various kinds, and statements of intellectual responsibility (authorship, etc.)
- information relating to the publication, pagination, etc. of an item (most of these constitute the default members of the model.biblPart class)
- annotation, commentary, and further detail
The following sections describe the elements which may be used to represent such information within a bibl or biblStruct element. Within the former, elements from the model.biblPart class, other phrase-level elements, and plain text may be combined without other constraint; within the latter, such of these elements as exist for a given reference must be distinguished, and must also be presented in a specific order, discussed further below (section 3.12.2.9 Order of Components within References).
3.12.2.1 Analytic, Monographic, and Series Levels
In common library practice a clear distinction is made between an individual item within a larger collection and a free-standing book, journal, or collection. Similarly a book in a series is distinguished sharply from the series within which it appears. An article forming part of a collection which itself appears in a series thus has a bibliographic description with three quite distinct levels of information:
- the analytic level, giving the title, author, etc., of the article;
- the monographic level, giving the title, editor, etc., of the collection;
- the series level, giving the title of the series, possibly the names of its editors, etc., and the number of the volume within that series.
In the same way, an article in a journal requires at least two levels of information: the analytic level describing the article itself, and the monographic level describing the journal.
A different identifying number may be supplied for any of these three items, that is, for the analytic item, the monographic item, or the series.
subtype="magazine_article" xml:id="beaupaire_1911">
<author>
<name>
<surname>Beaupaire</surname>
(<forename>Edmond</forename>)</name>
</author>,
<title level="a">A propos de la rue de la Femme-sans-Tête</title>,
<bibl type="monogr">
<title level="j">La Cité</title>,
<date when="1911-01">janvier 1911</date>, pp. <biblScope unit="page" from="5" to="17">5-17</biblScope>.
</bibl>
</bibl>
Within biblStruct, the levels are distinguished by the use of the following distinct elements:
- analytic (analytic level) contains bibliographic elements describing an item (e.g. an article or poem) published within a monograph or journal and not as an independent publication.
- monogr (monographic level) contains bibliographic elements describing an item (e.g. a book or journal) published as an independent item (i.e. as a separate physical object).
- series (series information) contains information about the series in which a book or other bibliographic item has appeared.
For purposes of TEI encoding, journals and anthologies are both treated as monographs; a journal title should thus be tagged as a <title level="j"> element within a monogr element. Individual articles in the journal or collected texts should be treated at the ‘analytic’ level. When an article has been printed in more than one journal or collection, the bibliographic reference may have more than one monogr element, each possibly followed by one or more series elements. A series element always relates to the most recently preceding monogr element. (Whether reprints of an article are treated in the same bibliographic reference or a separate one varies among different styles. Library lists typically use a different entry for each publication, while academic footnoting practice typically treats all publications of the same article in a single entry.)
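The biblScope element is used to supply further information about the location of some part of a bibliographic reference. It specifies where the item described at the preceding level is to be found within the component in which the biblScope appears. In the following example, for instance, the page range 64-70 given in the first monogr locates the article within Herakles to Poseidon, and the volume number given in series locates Cults of Boiotia within that series.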
<biblStruct>
<analytic>
<author>Albert Schachter</author>
<title level="a">Iolaos</title>
</analytic>
<monogr>
<title level="m">Herakles to Poseidon</title>
<imprint>
<date>1986</date>
</imprint>
<biblScope unit="page">64-70</biblScope>
</monogr>
<monogr>
<title level="m">Cults of Boiotia</title>
<imprint>
<pubPlace>London</pubPlace>
</imprint>
<extent>4 vols.</extent>
<biblScope unit="part">2</biblScope>
</monogr>
<series>
<title level="s">Bulletin of the Institute of Classical Studies
Supplements</title>
<biblScope unit="volume">38</biblScope>
</series>
</biblStruct>
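The rule that a series relates to the most recently preceding monogr is easy to apply mechanically. This Python sketch walks the children of a simplified copy of the Schachter example above. For each level it reports the title and the biblScope values, and it attaches each series to the last monogr seen. The function name is hypothetical.

```python
import xml.etree.ElementTree as ET

# The Schachter biblStruct above, in outline.
SAMPLE = """<biblStruct>
  <analytic><title level="a">Iolaos</title></analytic>
  <monogr>
    <title level="m">Herakles to Poseidon</title>
    <biblScope unit="page">64-70</biblScope>
  </monogr>
  <monogr>
    <title level="m">Cults of Boiotia</title>
    <biblScope unit="part">2</biblScope>
  </monogr>
  <series>
    <title level="s">Bulletin of the Institute of Classical Studies
      Supplements</title>
    <biblScope unit="volume">38</biblScope>
  </series>
</biblStruct>"""


def describe_levels(bibl_struct):
    """Return (level, title, biblScopes, related monogr title) per child."""
    rows = []
    last_monogr = None
    for level in bibl_struct:
        # Normalize whitespace inside titles that span several lines.
        title = " ".join(level.findtext("title", "").split())
        scopes = [(s.get("unit"), s.text) for s in level.findall("biblScope")]
        related = last_monogr if level.tag == "series" else None
        rows.append((level.tag, title, scopes, related))
        if level.tag == "monogr":
            last_monogr = title
    return rows


for row in describe_levels(ET.fromstring(SAMPLE)):
    print(row)
```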
<biblStruct>
<analytic>
<author>
<persName>
<surname>Thaller</surname>
<forename>Manfred</forename>
</persName>
</author>
<title level="a">A Draft Proposal for a Standard for the
Coding of Machine Readable Sources</title>
</analytic>
<monogr>
<title level="j">Historical Social Research</title>
<imprint>
<date when="1986-10">October 1986</date>
</imprint>
<biblScope unit="volume">40</biblScope>
<biblScope unit="page" from="3" to="46">3-46</biblScope>
</monogr>
<monogr>
<title level="m">Modelling Historical Data:
Towards a Standard for Encoding and
Exchanging Machine-Readable Texts</title>
<editor>
<persName>
<forename>Daniel</forename>
<forename full="init">I.</forename>
<surname>Greenstein</surname>
</persName>
</editor>
<imprint xml:lang="de">
<pubPlace>St. Katharinen</pubPlace>
<publisher>Max-Planck-Institut für Geschichte
In Kommission bei
Scripta Mercaturae Verlag</publisher>
<date when="1991"/>
</imprint>
</monogr>
<series xml:lang="de">
<title level="s">Halbgraue Reihe
zur Historischen Fachinformatik</title>
<respStmt>
<resp>Herausgegeben von</resp>
<name type="person">Manfred Thaller</name>
<name type="org">Max-Planck-Institut für Geschichte</name>
</respStmt>
<title level="s">Serie A: Historische Quellenkunden</title>
<biblScope unit="volume">11</biblScope>
</series>
</biblStruct>
The practice of analytic vs. monographic citation, as described here, should be distinguished from the practice of including within one citation a reference to another work which the encoder considers to be related to it in some way: see further 3.12.2.7 Related Items below.
If an identifier is available for the analytic item, it should be represented by means of an idno element placed within the analytic element, as in the following example, where a DOI (Digital Object Identifier) is supplied for the article in question.
<biblStruct>
<analytic>
<author>
<forename>James</forename>
<forename>H.</forename>
<surname>Coombs</surname>
</author>
<author>
<forename>Allen</forename>
<surname>Renear</surname>
</author>
<author>
<forename>Steven</forename>
<forename>J.</forename>
<surname>DeRose</surname>
</author>
<title level="a">Markup Systems and The Future of Scholarly Text
Processing</title>
<idno type="DOI">10.1145/32206.32209</idno>
<ref target="http://xml.coverpages.org/coombs.html">http://xml.coverpages.org/coombs.html</ref>
</analytic>
<monogr>
<title level="j">Communications of the ACM</title>
<imprint>
<date>1987</date>
</imprint>
<biblScope unit="volume">30</biblScope>
<biblScope unit="issue">11</biblScope>
<biblScope unit="page">933–947</biblScope>
</monogr>
</biblStruct>
Punctuation must not appear between the elements within a structured bibliographic entry encoded with biblStruct or biblFull, unless it is contained within the elements it delimits. When (as in most of the examples in this chapter) entries are encoded without any inter-element punctuation, they can usually be processed more easily by rendering systems able to output bibliographic references in any of several styles.
<bibl>
<author>
<persName>
<surname>Nelson</surname>,
<forename>T.</forename>
<forename>H.</forename>
</persName>
</author>
<date when="1980">1980</date>.
<title level="a">Replacing the printed word: a complete literary
system</title>. In <title level="m">Information Processing '80: Proceedings of the
IFIPS Congress, October 1980</title>,
ed.
<editor>
<persName>
<forename>Simon</forename>
<forename>H.</forename>
<surname>Lavington</surname>
</persName>
</editor>,
<biblScope unit="page">1013-23</biblScope>.
<pubPlace>Amsterdam</pubPlace>: <publisher>North-Holland</publisher>. (<note>Apparently a draft of section 4 of
<ref target="#NELSON_88">
<title level="m">Literary
Machines</title>
</ref>.</note>)
</bibl>
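The rule that biblStruct entries carry no punctuation between elements can be checked with a short Python script. The sketch below assumes that biblStruct, analytic, monogr, series, and imprint are the containers that should hold only elements. That list is an approximation of the TEI content models, not a full schema check. The script reports any non-blank text found directly inside those containers. Punctuation inside an element, such as the comma in a pubPlace, is left alone.

```python
import xml.etree.ElementTree as ET

# Assumption: these containers hold only elements, so any non-blank text
# directly inside them is inter-element punctuation.
ELEMENT_ONLY = {"biblStruct", "analytic", "monogr", "series", "imprint"}


def stray_punctuation(bibl_struct):
    """Return (container, text) pairs for text found between elements."""
    problems = []
    for element in bibl_struct.iter():
        if element.tag not in ELEMENT_ONLY:
            continue
        if element.text and element.text.strip():
            problems.append((element.tag, element.text.strip()))
        for child in element:
            if child.tail and child.tail.strip():
                problems.append((element.tag, child.tail.strip()))
    return problems


GOOD = """<biblStruct><monogr><title level="m">Envisioning Information</title>
<imprint><pubPlace>Cheshire, Conn.</pubPlace><publisher>Graphics Press</publisher>
</imprint></monogr></biblStruct>"""
BAD = """<biblStruct><monogr><title level="m">Envisioning Information</title>.
<imprint><pubPlace>Cheshire, Conn.</pubPlace>: <publisher>Graphics Press</publisher>
</imprint></monogr></biblStruct>"""

print(stray_punctuation(ET.fromstring(BAD)))
```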
3.12.2.2 Titles, Authors, and Editors
Bibliographic references typically include the title of the work being cited and the names of those intellectually responsible for it. For articles in journals or collections, such statements should appear both for the analytic and for the monographic level. The following elements are provided for tagging such information:
- title (title) contains a title for any kind of work.
- author (author) in a bibliographic reference, contains the name(s) of an author, personal or corporate, of a work; for example in the same form as that provided by a recognized bibliographic name authority.
- editor contains a secondary statement of responsibility for a bibliographic item, for example the name of an individual, institution or organization, (or of several such) acting as editor, compiler, translator, etc.
- respStmt (statement of responsibility) supplies a statement of responsibility for the intellectual content of a text, edition, recording, or series, where the specialized elements for authors, editors, etc. do not suffice or do not apply. May also be used to encode information about individuals or organizations which have played a role in the production or distribution of a bibliographic work.
- resp (responsibility) contains a phrase describing the nature of a person's intellectual responsibility, or an organization's role in the production or distribution of a work.
- name (name, proper noun) contains a proper noun or noun phrase.
- meeting contains the formalized descriptive title for a meeting or conference, for use in a bibliographic description for an item derived from such a meeting, or as a heading or preamble to publications emanating from it.
- sponsor (sponsor) specifies the name of a sponsoring organization or institution.
- funder (funding body) specifies the name of an individual, institution, or organization responsible for the funding of a project or text.
- distributor (distributor) supplies the name of a person or other agency responsible for the distribution of a text.
- principal (principal researcher) supplies the name of the principal researcher responsible for the creation of an electronic text.
The elements author, editor, respStmt, meeting, sponsor, funder, and principal are the default members of the model.respLike class, a subclass of the model.biblPart class to which the constituents of the bibl element belong.
In bibliographic references, all titles should be tagged as such, whether analytic, monographic, or series titles. The single element title is used for all these cases. When it appears directly within an analytic, monogr, or series element, title is interpreted as belonging to the appropriate level. However, it is recommended that the level attribute be used to signal this explicitly.
<biblStruct>
<analytic>
<author ref="http://id.loc.gov/authorities/names/no2001067434">
<persName>
<forename>Lucy</forename>
<forename>Allen</forename>
<surname>Paton</surname>
</persName>
</author>
<title>Notes on Manuscripts of the
<title level="m" xml:lang="fr">Prophécies de Merlin</title>
</title>
</analytic>
<monogr>
<title level="j">PMLA</title>
<imprint>
<date>1913</date>
</imprint>
<biblScope unit="volume">8</biblScope>
<biblScope unit="page">122</biblScope>
</monogr>
</biblStruct>
In some bibliographic applications, it may prove useful to distinguish main titles from subordinate titles, parallel titles, etc. The type attribute is provided to allow this distinction to be recorded.
<bibl>
<title level="a" type="main">Studies on the physiology of
the hibernating hedgehog, 15</title>
<title level="a" type="sub">Effects of seasonal
and temperature changes on the in vitro glycerol release from
brown adipose tissue</title>
<title level="j">Ann. Acad. Sci. Fenn., Ser. A4</title>
<date>1972</date>
<biblScope unit="volume">187</biblScope>
<biblScope unit="page" to="4">1-4</biblScope>
</bibl>
<bibl>
<title level="m" type="main">The swan lake ballet</title>
= <title level="m" type="parallel"
xml:lang="fr">Le lac des cygnes</title>
: <title level="m" type="sub" xml:lang="fr">grand ballet en 4 actes</title>
: <title level="m" type="sub">op. 20</title>
[Score].
New York: Broude Brothers; [1951] (B.B. 59). vi, 685 p.</bibl>
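The sketch below shows how the type values can be combined when a single display string is needed. It uses Python's standard xml.etree.ElementTree and assumes the fragments are parsed without the TEI namespace, as printed here. The helper display_title is hypothetical and is not part of TEI, and so is its convention of joining subtitles with a colon.

```python
import xml.etree.ElementTree as ET


def display_title(bibl_xml: str, level: str) -> str:
    """Join the main and subordinate titles of one level into one string.

    Hypothetical helper: titles typed "sub" follow the main title(s) after a
    colon; untyped titles count as main titles; other types (for example
    "parallel") are skipped.
    """
    bibl = ET.fromstring(bibl_xml)
    main, subs = [], []
    for title in bibl.iter("title"):
        if title.get("level") != level:
            continue
        # Collapse line breaks and indentation inside the element content.
        text = " ".join("".join(title.itertext()).split())
        kind = title.get("type", "main")
        if kind == "main":
            main.append(text)
        elif kind == "sub":
            subs.append(text)
    return ": ".join(main + subs)
```

For the swan lake record, level "m" would give the main title followed by both subtitles, and the parallel French title would be skipped.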
The elements author and editor have fairly obvious significance for printed books and articles; for other kinds of bibliographic items their proper usage may be less obvious. The author element should be used for the person or agency with primary responsibility for a work's intellectual content, and the element editor for other people or agencies with some responsibility for that content, whether or not they are called ‘editor’. An organization such as a radio or television station is usually accounted ‘author’ of a broadcast, for example, while the author of a government report will usually be the agency which produced it. A translator, illustrator, or compiler may, however, be marked by means of the editor element, optionally using the role attribute to specify the nature of their responsibility more exactly.
Many bibliographic and Linked Data applications require disambiguation of author names using unique identifiers. Both the author and editor elements may contain one or more idno elements, to supply such identifiers. Alternatively, if only a single identifier is to be recorded, the key or ref attribute may be used, as further discussed in 3.6.1 Referring Strings.
<bibl>
<author ref="http://viaf.org/viaf/95301405">John Warrack</author>. „Es waren seine letzten Töne!“
In <editor ref="http://viaf.org/viaf/263865979">Joachim Veit</editor>
and <editor ref="http://viaf.org/viaf/268371810">Frank Ziegler</editor> eds. Weber-Studien Bd. 3, Mainz (1996), pp.300–317
</bibl>
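A consumer of such references might collect the identifiers as sketched below, in Python with only the standard library. The function name is hypothetical. It reads the ref attribute first and falls back to an idno child, mirroring the two options described above, and it assumes un-namespaced XML.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET


def responsible_parties(bibl_xml: str) -> list[tuple[str, str, str | None]]:
    """Return (element, name, identifier) for every author and editor.

    The identifier comes from the ref attribute if present, otherwise from
    the text of the first idno child; None if neither is given.
    """
    parties = []
    for elem in ET.fromstring(bibl_xml).iter():
        if elem.tag not in ("author", "editor"):
            continue
        ident = elem.get("ref")
        idno = elem.find("idno")
        if ident is None and idno is not None:
            ident = (idno.text or "").strip()
        # Rebuild the name from the content, leaving out any idno children.
        pieces = [elem.text or ""]
        for child in elem:
            if child.tag != "idno":
                pieces.append(" ".join(child.itertext()))
            pieces.append(child.tail or "")
        name = " ".join("".join(pieces).split())
        parties.append((elem.tag, name, ident))
    return parties
```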
For anyone else with responsibility for the work, the respStmt element should be used. The nature of the responsibility is indicated by means of a resp element, and the person, organization, etc. responsible by a name, persName, or orgName element. Strings such as ‘unknown’ may be encoded using the rs element. A respStmt should comprise either at least one of the four naming elements (name, persName, orgName, or rs) followed by one or more resp elements, or at least one resp element followed by one or more of the four naming elements.
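The ordering rule for respStmt can be expressed as a pattern over its children. The hypothetical Python check below reduces each child to N (one of the four naming elements) or R (resp) and tests the sequence against N+R+ or R+N+. It checks only the rule stated in this paragraph, not the full TEI schema, and it assumes un-namespaced XML.

```python
import re
import xml.etree.ElementTree as ET

NAMING = {"name", "persName", "orgName", "rs"}


def respstmt_is_well_ordered(xml_text: str) -> bool:
    """Check the naming/resp ordering rule for respStmt children.

    Each child is coded 'N' (naming element) or 'R' (resp); any other
    child makes the check fail.
    """
    codes = []
    for child in ET.fromstring(xml_text):
        if child.tag in NAMING:
            codes.append("N")
        elif child.tag == "resp":
            codes.append("R")
        else:
            return False
    # Names then resps, or resps then names, with at least one of each.
    return re.fullmatch(r"N+R+|R+N+", "".join(codes)) is not None
```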
Examples of secondary responsibility of this kind include the roles of illustrator, translator, encoder, and annotator. The respStmt element may also be used for editors, if it is desired to record the specific terms in which their role is described.
Examples of author and editor may be found in sections 3.12.1 Methods of Encoding Bibliographic References and Lists of References, and 3.12.2.1 Analytic, Monographic, and Series Levels; wherever author and editor may occur, the respStmt element may also occur. When one of these elements precedes or immediately follows a title, it applies to that title; when it follows an edition element or occurs within an edition statement, it applies to the edition in question.
<bibl>
<author>Lominandze, DG</author>.
<title level="m">Cyclotron waves in plasma</title>.
<respStmt>
<resp>Translated by</resp>
<name>AN. Dellis</name>
</respStmt>;
<respStmt>
<resp>edited by</resp>
<name>SM. Hamberger</name>
</respStmt>.
<edition>1st ed.</edition>
<pubPlace>Oxford</pubPlace>:
<publisher>Pergamon Press</publisher>,
<date>1981</date>.
<extent>206 p.</extent>
<title level="s">International series in natural philosophy</title>.
<note place="inline">Translation of:
<title xml:lang="ru-Latn" level="m">Ciklotronnye volny v
plazme</title>.
<idno type="ISBN">0-08-021680-3</idno>.
</note>
</bibl>
This example retains the original punctuation and editorial conventions of the source (ISO 690:1987) and is therefore encoded using the bibl element.
<biblStruct>
<monogr xml:lang="de">
<title level="m">Des Minnesangs Frühling</title>
<note place="inline">Mit 1 Faksimile</note>
<edition>36., neugestaltete und erweiterte Auflage</edition>
<respStmt>
<resp>Unter Benutzung der Ausgaben von <name>Karl
Lachmann</name> und <name>Moriz Haupt</name>, <name>Friedrich
Vogt</name> und <name>Carl von Kraus</name> bearbeitet von</resp>
<name>Hugo Moser</name>
<name>Helmut Tervooren</name>
</respStmt>
<imprint>
<pubPlace>Stuttgart</pubPlace>
<publisher>S. Hirzel Verlag</publisher>
<date>1977</date>
</imprint>
<biblScope unit="volume">I Texte</biblScope>
</monogr>
</biblStruct>
<respStmt>
<resp>proofreading</resp>
<persName from="1994-02" to="1994-05">Ashley Cross</persName>
<persName from="1994-06" to="1994-10">Loren Noveck</persName>
</respStmt>
<respStmt>
<persName>Erica Dillon</persName>
<resp when="2000-08">annotated uncredited citations</resp>
<resp when="2001-03">encoded named entities</resp>
</respStmt>
<biblStruct>
<monogr>
<title level="m">Proceedings of a workshop on corpus resources</title>
<respStmt>
<resp>Programme Organizer</resp>
<name>Geoffrey Leech</name>
</respStmt>
<meeting>DTI Speech and Language Technology Club meeting, 3-4
January 1990, Wadham College, Oxford</meeting>
<imprint>
<pubPlace>Oxford</pubPlace>
</imprint>
</monogr>
</biblStruct>
3.12.2.3 Document Identifiers
<biblStruct>
<monogr>
<author>
<forename>John</forename>
<surname>Downame</surname>
</author>
<title type="short">Foure treatises tending to disswade all Christians from foure no lesse hainous then common sinnes</title>
<idno type="stc2ndEd">7141</idno>
<imprint>
<pubPlace>At London</pubPlace>
<publisher>Imprinted by Felix Kyngston, for William Welby, and are to be sold at his shop in Pauls Church-yard at the signe of the Greyhound</publisher>
<date when="1609">1609</date>
</imprint>
</monogr>
</biblStruct>
However, some bibliographic references actually require identifiers of various types because they do not include a statement of the title and the names of those intellectually responsible for it. The following elements may be used for such purposes:
- orgName (organization name) contains an organizational name.
- idno (identifier) supplies any form of identifier used to identify some object, such as a bibliographic item, a person, a title, an organization, etc. in a standardized way.
- classCode (classification code) contains the classification code used for this text in some standard classification system.
- date (date) contains a date in any format.
For example, a citation to a patent typically includes a country or organization code (a two-character code identifying a patent authority) and a serial number for the patent (whose structure varies by patent authority). The citation might also contain a kind code (which characterizes a particular publication for the patent and which corresponds to a specific stage in the patent procedure) and the date when the patent was filed with or published by the issuing authority. For bibliographic references to patents, the above elements may be used as follows:
- orgName, within authority, may be used to contain the code of the patent authority. The type attribute may be used to specify the type of patent authority (such as a national patent office or a supra-national patent organization).
- idno may be used to contain the serial number assigned by the corresponding patent authority.
- classCode may be used to contain the kind code of the patent document.
- date may be used to contain the date of the patent document. The type attribute may be used to specify whether this corresponds to the filing date of a patent application or the publication date of a patent publication.
<biblStruct status="publication">
<monogr>
<authority>
<orgName type="national">US</orgName>
</authority>
<idno type="docNumber">6885550</idno>
<imprint>
<classCode scheme="http://www.uspto.gov/">B1</classCode>
<date type="publicationDate"
when="2005-04-26">April 26, 2005</date>
</imprint>
</monogr>
</biblStruct>
3.12.2.4 Imprint, Size of a Document, and Reprint Information
By imprint is meant all the information relating to the publication of a work: the person or organization by whose authority and in whose name a bibliographic entity such as a book is made public or distributed (whether a commercial publisher or some other organization), and the place and date of publication. It may also include a full address for the publisher or organization. A full bibliographic reference will usually also specify the number of pages in a print publication (or equivalent information for non-print materials), and possibly also the specific location of the material being cited within its containing publication. The following elements are provided to hold this information:
- imprint groups information relating to the publication or distribution of a bibliographic item.
- address (address) contains a postal address, for example of a publisher, an organization, or an individual.
- pubPlace (publication place) contains the name of the place where a bibliographic item was published.
- publisher (publisher) provides the name of the organization responsible for the publication or distribution of a bibliographic item.
- date (date) contains a date in any format.
- extent (extent) describes the approximate size of a text stored on some carrier medium or of some other object, digital or non-digital, specified in any convenient units.
- idno (identifier) supplies any form of identifier used to identify some object, such as a bibliographic item, a person, a title, an organization, etc. in a standardized way.
Members of the model classes model.imprintPart and model.dateLike may appear inside an imprint element in a specific location within a biblStruct, or alternatively, they may appear alongside any other bibliographic component inside a bibl.
- model.imprintPart groups the bibliographic elements which occur inside imprints.
  - biblScope (scope of bibliographic reference) defines the scope of a bibliographic reference, for example as a list of page numbers, or a named subdivision of a larger work.
  - distributor (distributor) supplies the name of a person or other agency responsible for the distribution of a text.
  - publisher (publisher) provides the name of the organization responsible for the publication or distribution of a bibliographic item.
  - pubPlace (publication place) contains the name of the place where a bibliographic item was published.
- model.dateLike groups elements containing temporal expressions.
  - date (date) contains a date in any format.
  - time (time) contains a phrase defining a time of day in any format.
For bibliographic purposes, usually only the place (or places) of publication are required, possibly including the name of the country, rather than a full address; the element pubPlace is provided for this purpose. Where however the full postal address is likely to be of importance in identifying or locating the bibliographic item concerned, it may be supplied and tagged using the address element described in section 3.6.2 Addresses. Alternatively, if desired, the rs or name elements described in section 3.6.1 Referring Strings may be used; this involves no claim that the information given is either a full address or the name of a city.
<biblStruct>
<monogr>
<author>Nicholas, Charles K.</author>
<author>Welsch, Lawrence A.</author>
<title level="m">On the interchangeability of SGML and ODA</title>
<idno type="NIST">NISTIR 4681</idno>
<imprint>
<pubPlace>Gaithersburg, MD</pubPlace>
<publisher>National Institute of Standards and Technology
</publisher>
<date when="1992-01">January 1992</date>
</imprint>
<extent>19 pp.</extent>
</monogr>
</biblStruct>
<biblStruct>
<monogr>
<author>Hansen, W.</author>
<title level="u">Creation of hierarchic text
with a computer display</title>
<idno type="ANL">ANL-7818</idno>
<note place="inline">Ph.D. dissertation</note>
<imprint>
<publisher>Dept. of Computer Science, Stanford Univ.</publisher>
<pubPlace>Stanford, CA</pubPlace>
<date when="1971-06">June 1971</date>
</imprint>
</monogr>
</biblStruct>
In this second example, the idno element is used to provide the identifier allocated to the thesis by the Argonne National Laboratory. Since it applies to the monographic element, the idno should be provided as a direct child of the monogr element, rather than elsewhere in the biblStruct element.
The specialist elements publisher and distributor are provided to cover the most common roles related to the production and distribution of a bibliographical item, but other roles such as printer and bookseller may also need to be encoded, and respStmt is available inside imprint for this purpose.
<biblStruct>
<monogr>
<author>Shirley, James</author>
<title type="main">The gentlemen of Venice</title>
<title type="sub">a tragi-comedie presented at the private
house in Salisbury Court by Her Majesties servants</title>
<note place="inline">[Microform]</note>
<imprint>
<pubPlace>London</pubPlace>
<publisher>H. Moseley</publisher>
<date>1655</date>
</imprint>
<extent>78 p.</extent>
</monogr>
<monogr>
<imprint>
<pubPlace>New York</pubPlace>
<publisher>Readex Microprint</publisher>
<date>1953</date>
</imprint>
<extent>1 microprint card, 23 x 15 cm.</extent>
</monogr>
<series>
<title level="s">Three centuries of drama: English, 1642–1700</title>
</series>
</biblStruct>
<biblStruct status="publication">
<monogr>
<authority>
<orgName type="national">EP</orgName>
</authority>
<idno type="docNumber">1558513</idno>
<imprint>
<classCode scheme="http://www.epo.org/">A1</classCode>
<date type="publicationDate"
when="2005-08-03"/>
</imprint>
</monogr>
<monogr>
<imprint>
<classCode scheme="http://www.epo.org/">B1</classCode>
<date type="publicationDate"
when="2009-09-09"/>
</imprint>
</monogr>
</biblStruct>
The above bibliographic reference records two different publications of the patent EP1558513 during the patenting procedure. The first, published on 3 August 2005, has the kind code ‘A1’, indicating a published patent application together with the European search report issued after the search at the European Patent Office. The second, published on 9 September 2009, has the kind code ‘B1’, indicating that it was published after the patent had been granted.
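Reading such a record back out is straightforward. The sketch below uses Python's xml.etree.ElementTree, and its helper name is hypothetical. It lists one entry per monogr. As in the EP example, it assumes that only the first monogr carries the authority and serial number, so later publications inherit them. Un-namespaced XML is assumed.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET


def patent_publications(bibl_xml: str) -> list[tuple[str, str, str | None]]:
    """List (authority + serial number, kind code, date) for each monogr."""
    authority = number = ""
    results = []
    for monogr in ET.fromstring(bibl_xml).findall("monogr"):
        # Authority and idno are carried forward from earlier monogr elements.
        org = monogr.find("authority/orgName")
        if org is not None:
            authority = org.text.strip()
        idno = monogr.find("idno")
        if idno is not None:
            number = idno.text.strip()
        kind = monogr.find("imprint/classCode").text.strip()
        when = monogr.find("imprint/date").get("when")
        results.append((authority + number, kind, when))
    return results
```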
An alternative way of handling the above situations would be to use the relatedItem element described in section 3.12.2.7 Related Items below.
3.12.2.5 Scopes and Ranges in Bibliographic Citations
Many bibliographic citations contain data limiting the citation to one or more volumes, issues, or pages, or to the name or number of a subdivision of the host work. These come in two varieties:
- the scope of a bibliographic reference (encoded using biblScope)
- the range of a work cited (encoded using citedRange)
Where it is desired to distinguish different classes of such information (volume number, page number, chapter number, etc.), the unit attribute may be used with any convenient typology (see the element definitions for biblScope and citedRange for some suggested values).
<biblStruct>
<analytic>
<author>
<persName>
<surname>Wrigley</surname>
<forename full="init">E.</forename>
<forename full="init">A.</forename>
</persName>
</author>
<title level="a">Parish registers and the historian</title>
</analytic>
<monogr>
<author>
<persName>
<surname>Steel</surname>
<forename full="init">D.</forename>
<forename full="init">J.</forename>
</persName>
</author>
<author>
<persName>
<surname>Steel</surname>
<forename full="init">A.</forename>
<forename full="init">E.</forename>
<forename full="init">F.</forename>
</persName>
</author>
<title level="m">General sources of births, marriages and deaths before 1837</title>
<imprint>
<pubPlace>London</pubPlace>
<publisher>Society of Genealogists</publisher>
<date when="1968"/>
</imprint>
<biblScope unit="page" from="155" to="167">155–167</biblScope>
</monogr>
<series>
<title level="s">National index of parish registers</title>
<biblScope unit="volume">1</biblScope>
</series>
</biblStruct>
<biblStruct>
<analytic>
<author>Boguraev, Branimir</author>
<author>Neff, Mary</author>
<title level="a">Text Representation, Dictionary Structure,
and Lexical Knowledge</title>
</analytic>
<monogr>
<title level="j">Literary &amp; Linguistic Computing</title>
<imprint>
<date>1992</date>
</imprint>
<biblScope unit="volume">7</biblScope>
<biblScope unit="issue">2</biblScope>
<biblScope unit="page">110-112</biblScope>
</monogr>
</biblStruct>
<biblStruct>
<analytic>
<author>Chesnutt, David</author>
<title level="a">Historical Editions in the States</title>
</analytic>
<monogr>
<title level="j">Computers and the Humanities</title>
<imprint>
<date when="1991-12">(December, 1991):</date>
</imprint>
<biblScope>25.6</biblScope>
<biblScope from="377" to="380">377–380</biblScope>
</monogr>
</biblStruct>
A citation of page 378 of the article ‘Historical Editions in the States’ that also includes a full bibliographic reference would be encoded using biblStruct as follows:
<biblStruct>
<analytic>
<author>Chesnutt, David</author>
<title level="a">Historical Editions in the States</title>
</analytic>
<monogr>
<title level="j">Computers and the Humanities</title>
<imprint>
<date when="1991-12">(December, 1991):</date>
</imprint>
<biblScope>25.6</biblScope>
<biblScope unit="page" from="377" to="380">377–380</biblScope>
</monogr>
<citedRange>378</citedRange>
</biblStruct>
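Because the from and to attributes are optional, a processor may need a fallback when only the element content is given. The hypothetical helper below, written with the Python standard library, prefers the attributes. Otherwise it splits the content at the first hyphen or en dash, which is a heuristic and not something these Guidelines prescribe. Un-namespaced XML is assumed.

```python
from __future__ import annotations

import re
import xml.etree.ElementTree as ET


def scope_range(xml_text: str) -> tuple[str | None, str, str]:
    """Return (unit, start, end) for a biblScope or citedRange element.

    The from and to attributes take precedence; otherwise the content is
    split at the first hyphen or en dash. A single value such as a volume
    number gives start == end.
    """
    elem = ET.fromstring(xml_text)
    parts = re.split(r"\s*[-\u2013]\s*", (elem.text or "").strip(), maxsplit=1)
    start = elem.get("from", parts[0])
    end = elem.get("to", parts[-1])
    return elem.get("unit"), start, end
```

For the hedgehog example earlier in this section (to="4" with content "1-4"), the helper would take the start from the content and the end from the attribute.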
3.12.2.6 Series Information
Series information may (in bibl elements) or must (in biblStruct elements) be enclosed in a series element or (in a biblFull element) a seriesStmt element. The title of the series may be tagged <title level="s">, the volume number <biblScope unit="volume">, and responsibility statements for the series (e.g. the name and affiliation of the editor, as in the example in section 3.12.2.1 Analytic, Monographic, and Series Levels) may be tagged editor or respStmt. Any identifier associated with the series itself should be marked using the idno element.
3.12.2.7 Related Items
In bibliographic parlance, a related item is any bibliographic item which, though related to that being defined, is distinct from it. The distinction between analytic and monographic items made above may be thought of as a special case of this kind of ‘related’ item. More usually, however, the term is applied to such items as translations, continuations, different versions, parts, etc.
The element relatedItem is provided as a means of documenting such associated items:
- relatedItem contains or references some other bibliographic item which is related to the present one in some specified manner, for example as a constituent or alternative version of it.
<biblStruct>
<monogr>
<author>Swinburne, Algernon Charles</author>
<title level="m">Swinburne's <title level="m">Atalanta in Calydon</title>: A Facsimile of the
First Edition</title>
<editor>Georges Lafourcade</editor>
<imprint>
<pubPlace>London</pubPlace>
<publisher>Oxford UP</publisher>
<date>1930</date>
</imprint>
</monogr>
<relatedItem type="otherEdition">
<ref target="#bibl04"/>
</relatedItem>
</biblStruct>
<biblStruct xml:id="bibl04">
<monogr>
<author> Swinburne, Algernon Charles</author>
<title level="m">Atalanta in Calydon</title>
<imprint>
<pubPlace>London</pubPlace>
<publisher>Edward Moxon</publisher>
<date>1865</date>
</imprint>
</monogr>
</biblStruct>
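The pointer form above relies on the target of ref matching an xml:id elsewhere in the same document. A minimal Python resolver might look like the sketch below. It assumes both entries sit in one parsed, un-namespaced document (here wrapped in a listBibl), and its helper names are hypothetical. Inline bibl children and external URLs are skipped.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET

# ElementTree reports xml:id under the predefined XML namespace.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def title_of(entry: ET.Element) -> str:
    """First monographic title of a biblStruct, with whitespace normalized."""
    title = entry.find("monogr/title")
    return " ".join("".join(title.itertext()).split())


def related_items(list_xml: str) -> list[tuple[str | None, str, str]]:
    """Return (type, title of entry, title of related entry) for #id pointers."""
    root = ET.fromstring(list_xml)
    by_id = {e.get(XML_ID): e for e in root.iter() if e.get(XML_ID)}
    found = []
    for entry in root.iter("biblStruct"):
        for rel in entry.findall("relatedItem"):
            ref = rel.find("ref")
            target = ref.get("target") if ref is not None else rel.get("target")
            # Only same-document pointers of the form "#id" are resolved.
            if target and target.startswith("#") and target[1:] in by_id:
                found.append((rel.get("type"), title_of(entry), title_of(by_id[target[1:]])))
    return found
```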
<biblStruct>
<monogr>
<author>Shirley, James</author>
<title type="main">The gentlemen of Venice</title>
<imprint>
<pubPlace>New York</pubPlace>
<publisher>Readex Microprint</publisher>
<date>1953</date>
</imprint>
<extent>1 microprint card, 23 x 15 cm.</extent>
</monogr>
<series>
<title level="s">Three centuries of drama: English, 1642–1700</title>
</series>
<relatedItem type="otherEdition">
<biblStruct>
<monogr>
<author>Shirley, James</author>
<title type="main" level="m">The gentlemen of Venice</title>
<title type="sub" level="m">a tragi-comedie presented at the private
house in Salisbury Court by Her Majesties servants</title>
<imprint>
<pubPlace>London</pubPlace>
<publisher>H. Moseley</publisher>
<date when="1655">1655</date>
</imprint>
<extent>78 p.</extent>
</monogr>
</biblStruct>
</relatedItem>
</biblStruct>
<biblStruct>
<monogr>
<author>Tolkien, J.R.R.</author>
<title level="m">Den hobbit</title>
<title type="sub">aus dem Engleschen iwwersat</title>
<editor role="translator">Henry Wickens</editor>
<imprint>
<pubPlace>Esch-sur-Sûre</pubPlace>
<publisher>Op der Lay S. àr. L</publisher>
<date>2002</date>
</imprint>
</monogr>
<relatedItem type="translatedFrom">
<bibl>
<author>Tolkien, J.R.R.</author>
<title level="m">The Hobbit</title>.
<publisher>Collins</publisher>
<date>1997</date>
</bibl>
</relatedItem>
</biblStruct>
<relatedItem type="translatedFrom"
target="http://www.example.com/bibliography.xml#TOLK97"/>
3.12.2.8 Notes and Statement of Language
Explanatory notes about the publication of unusual items, the form of an item (e.g. [Score] or [Microform]), or its provenance (e.g. translation of ...) may be tagged using the note element. The same element may be used for any descriptive annotation of a bibliographic entry in a database.
- note (note) contains a note or annotation.
<bibl>
<author>Coombs, James H., Allen H. Renear,
and Steven J. DeRose.</author>
<title level="a">Markup Systems and the Future of Scholarly
Text Processing.</title>
<title level="j">Communications of the ACM</title>
<biblScope>30.11 (November 1987): 933–947.</biblScope>
<note>Classic polemic supporting descriptive over procedural
markup in scholarly work.</note>
</bibl>
- textLang (text language) describes the languages and writing systems identified within the bibliographic work being described, rather than its description.
  - mainLang (main language) supplies a code which identifies the chief language used in the bibliographic work.
  - otherLangs (other languages) one or more codes identifying any other languages used in the bibliographic work.
The mainLang and otherLangs attributes should both provide language identifiers in the same form as used for xml:lang as described at vi.1. Language Identification. Where additional detail is needed correctly to describe a language, or to discuss its deployment in a given text, this should be done using the langUsage element in the TEI header, within which individual language elements document the languages used: see 2.4.2 Language Usage.
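Since otherLangs holds a whitespace-separated list of identifiers, splitting it is enough to recover the individual codes. A minimal Python sketch with a hypothetical helper name follows. It does not validate the codes themselves, and it assumes un-namespaced XML.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET


def text_languages(xml_text: str) -> tuple[str | None, list[str]]:
    """Return the main language and the other languages of a textLang."""
    elem = ET.fromstring(xml_text)
    # otherLangs may hold zero or more identifiers separated by whitespace.
    return elem.get("mainLang"), elem.get("otherLangs", "").split()
```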
3.12.2.9 Order of Components within References
The order of elements in bibl elements is not constrained.
In biblStruct elements, the analytic element, if it occurs, must come first, followed by one or more monogr and series elements, which may appear intermingled (as long as a monogr element comes first), and then zero or more of the following in any order: note, witDetail, idno, ptr, ref, relatedItem, and citedRange. Within analytic, the title(s), author(s), editor(s), and other statements of responsibility may appear in any order; it is recommended that all forms of the title be given together. Within monogr, the author, editor, and statements of responsibility may either come first or else follow the monographic title(s). Following these, the elements listed below, if present, must appear in the following order:
- notes on the publication (and meeting elements describing the conference, in the case of a proceedings volume)
- edition elements, each followed by any related editor or respStmt elements
- imprint
- biblScope
Within imprint, the elements allowed may appear in any order.
Finally, within the series information in a biblStruct, the sequence of elements is not constrained.
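The top-level rule for biblStruct amounts to a regular pattern over its children. The hypothetical Python check below codes analytic, monogr, and series as a, m, and s. It codes the trailing elements (note, witDetail, idno, ptr, ref, relatedItem, citedRange) as t, then matches the sequence against a?m[ms]*t*. The ordering inside monogr and imprint is not checked, and un-namespaced XML is assumed.

```python
import re
import xml.etree.ElementTree as ET

# One-letter codes for the structural children of biblStruct.
CODES = {"analytic": "a", "monogr": "m", "series": "s"}
TRAILING = {"note", "witDetail", "idno", "ptr", "ref", "relatedItem", "citedRange"}


def biblstruct_order_ok(xml_text: str) -> bool:
    """Check the top-level ordering rule for biblStruct children."""
    codes = []
    for child in ET.fromstring(xml_text):
        if child.tag in CODES:
            codes.append(CODES[child.tag])
        elif child.tag in TRAILING:
            codes.append("t")
        else:
            return False
    # Optional analytic, a monogr first, monogr/series intermingled,
    # then any number of the trailing elements in any order.
    return re.fullmatch(r"a?m[ms]*t*", "".join(codes)) is not None
```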
If more detailed structuring of a bibliographic description is required, the biblFull element should be used. This is not further described here, as its contents are essentially equivalent to those of the fileDesc element in the teiHeader, which is fully described in section 2.2 The File Description.
3.12.3 Bibliographic Pointers
<biblStruct>
<analytic>
<author>
<forename>Suzana</forename>
<surname>Sukovic</surname>
</author>
<title level="a">Beyond the Scriptorium: The Role of the Library in Text
Encoding</title>
<ref target="https://www.dlib.org/dlib/january02/sukovic/01sukovic.html">https://www.dlib.org/dlib/january02/sukovic/01sukovic.html</ref>
</analytic>
<monogr>
<title level="j">D-Lib</title>
<imprint>
<biblScope unit="volume">8</biblScope>
<biblScope unit="issue">1</biblScope>
<date>2002</date>
</imprint>
</monogr>
</biblStruct>
<biblStruct>
<monogr>
<author>
<persName>
<forename>Germain</forename>
<surname>Brice</surname>
</persName>
</author>
<title level="m">Description de la ville de Paris et de tout ce qu’elle contient de plus remarquable, par Germain Brice ; enrichie d’un nouveau plan et de figures dessinées et gravées correctement. 7e édition, revue et augmentée par l’auteur</title>
<imprint>
<date when="1717">1717</date>
<pubPlace>Paris</pubPlace>
<publisher>F. Fournier</publisher>
</imprint>
<extent>In-12</extent>
</monogr>
<ptr type="catBnf"
target="http://catalogue.bnf.fr/ark:/12148/cb30160624f/"/>
</biblStruct>
3.12.4 Relationship to Other Bibliographic Schemes
The bibliographic tagging defined here can capture the distinctions required by most bibliographic encoding systems; for the benefit of users of some commonly used systems, the following lists of equivalences are offered, showing the relationship of the markup defined here to the fields defined for bibliographic records in the Scribe, BibTeX, and ProCite systems.
Listed below are the equivalences between the various bibliographic fields defined for use in the Scribe and BibTeX systems of bibliographic databases and the elements defined in this module (see note 23). Elements and structures available in the module defined here which have no analogues in Scribe and BibTeX are not noted.
- address: tag as placeName or address
- annote: tag as note
- author: tag as author
- booktitle: tag as <title level="m"> or title within monogr
- chapter: tag as <biblScope unit="chapter">
- date: used only to record the date the entry was made in the bibliographic database; not supported
- edition: tag as edition
- editor: tag as editor or respStmt
- editors: tag as multiple editor or respStmt elements
- fullauthor: use the reg element, possibly inside a choice element, inside either an author or name
- fullorganization: use the reg element, possibly inside a choice element, inside a <name type="org">
- howpublished: tag as note, possibly using the form <note place="inline">
- institution: used only for the issuer of technical reports; tag as publisher
- journal: tag as <title level="j"> or title within monogr
- key: used to specify an alternative sort key for the bibliographic item, for use instead of the author's or editor's name; not supported
- meeting: tag as meeting or as note
- month: use date; if the date is not in a trivially parseable form, use the when attribute to provide a normalized equivalent in one of the formats from XML Schema Part 2: Datatypes Second Edition
- note: tag as note
- number: tag as <biblScope unit="issue"> or <biblScope unit="number">; for technical report numbers, use <idno type="docno">
- organization: used only for the sponsor of a conference; use <name type="org"> within respStmt within the meeting element
- pages: tag as <biblScope unit="page">
- publisher: tag as publisher
- school: used only for institutions at which thesis work is done; tag as publisher
- series: tag as <title level="s"> or title within series
- title: tag as title in the appropriate context or with the appropriate level value
- volume: tag as <biblScope unit="volume">
- year: tag as date; if the date is not in a trivially parseable form, use the when attribute to provide an ISO-format equivalent
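A converter could apply part of this table mechanically. The sketch below turns an already-parsed BibTeX entry into a flat bibl, using the Python standard library. The function bibtex_to_bibl and its input dictionary are hypothetical. A flat bibl is enough because the order of elements in bibl is not constrained. Treating title as analytic when journal or booktitle is present is an assumption, and parsing the .bib file itself is out of scope.

```python
from __future__ import annotations

import re
import xml.etree.ElementTree as ET

# A subset of the equivalences above: BibTeX field -> (TEI element, attributes).
SIMPLE_FIELDS = {
    "journal": ("title", {"level": "j"}),
    "booktitle": ("title", {"level": "m"}),
    "series": ("title", {"level": "s"}),
    "edition": ("edition", {}),
    "publisher": ("publisher", {}),
    "volume": ("biblScope", {"unit": "volume"}),
    "number": ("biblScope", {"unit": "issue"}),
    "note": ("note", {}),
}


def bibtex_to_bibl(fields: dict[str, str]) -> ET.Element:
    """Build a flat TEI bibl from an already-parsed BibTeX entry."""
    bibl = ET.Element("bibl")
    for role in ("author", "editor"):
        # BibTeX separates multiple names with the word "and".
        for name in re.split(r"\s+and\s+", fields.get(role, "")):
            if name.strip():
                ET.SubElement(bibl, role).text = name.strip()
    if "title" in fields:
        # Assumption: a title with a host journal or book is analytic.
        hosted = "journal" in fields or "booktitle" in fields
        ET.SubElement(bibl, "title", {"level": "a" if hosted else "m"}).text = fields["title"]
    for field, (tag, attrs) in SIMPLE_FIELDS.items():
        if field in fields:
            ET.SubElement(bibl, tag, attrs).text = fields[field]
    if "pages" in fields:
        # BibTeX writes ranges as "110--112"; collapse runs of hyphens first.
        start, _, end = re.sub(r"-+", "-", fields["pages"]).partition("-")
        start, end = start.strip(), end.strip()
        scope = ET.SubElement(bibl, "biblScope", {"unit": "page", "from": start, "to": end or start})
        scope.text = f"{start}\u2013{end}" if end else start
    if "year" in fields:
        date = ET.SubElement(bibl, "date")
        date.text = fields["year"]
        if re.fullmatch(r"\d{4}", fields["year"]):
            date.set("when", fields["year"])  # @when only for plain four-digit years
    return bibl
```

Serializing the result with ET.tostring escapes characters such as the ampersand in a journal title automatically.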
3.13 Passages of Verse or Drama
The following elements are included in the core module for the convenience of those encoding texts which include mixtures of prose, verse and drama.
- l (verse line) contains a single, possibly incomplete, line of verse.
- lg (line group) contains one or more verse lines functioning as a formal unit, e.g. a stanza, refrain, verse paragraph, etc.
- sp (speech) contains an individual speech in a performance text, or a passage presented as such in a prose or verse text.
- speaker contains a specialized form of heading or label, giving the name of one or more speakers in a dramatic text or fragment.
- stage (stage direction) contains any kind of stage direction within a dramatic text or fragment.
Full details of other, more specialized, elements for the encoding of texts which are predominantly verse or drama are described in the appropriate chapter of part three (for verse, see the verse base described in chapter 6 Verse; for performance texts, see the drama base described in chapter 7 Performance Texts). In this section, we describe only the elements listed above, all of which can appear in any text, whichever of the three modes prose, verse, or drama may predominate in it.
3.13.1 Core Tags for Verse
Like other written texts, verse texts or poems may be hierarchically subdivided, for example into books or cantos. These structural subdivisions should be encoded using the general purpose div or div1 (etc.) elements described below in chapters 4 Default Text Structure and 6 Verse. The fundamental unit of a verse text is the verse line rather than the paragraph, however.
<l>Of that Forbidden Tree, whose<lb/> mortal tast</l>
<l>Brought Death into the World,<lb/> and all our woe,</l>
<l>With loss of Eden, till one greater Man</l>
<l>Restore us, and regain the blissful Seat...</l>
The l element should not be used to represent typographic lines in non-verse materials: if the line-breaking points in a prose text are considered important for analysis, they should be marked with the lb element. Alternatively, a neutral segmentation element such as seg or ab may be used; see further discussion of these elements in chapter 17 Linking, Segmentation, and Alignment. The l element is a member of the model.lLike class, which is a subclass of the model.divPart class, along with elements from the model.pLike (paragraph-like) class.
- att.typed provides attributes that can be used to classify or subclassify elements in any way.
  - type characterizes the element in some sense, using any convenient classification scheme or typology.
  - subtype (subtype) provides a sub-categorization of the element, if needed.
<lg>
<l>Come fill up the Glass,</l>
<l rend="indent">Round, round let it pass,</l>
<l>'Till our Reason be lost in our Wine:</l>
<l rend="indent">Leave Conscience's Rules</l>
<l rend="indent">To Women and Fools,</l>
<l>This only can make us divine.</l>
</lg>
<lg n="Chorus" type="refrain">
<l>Then a Mohock, a Mohock I'll be,</l>
<l>No Laws shall restrain</l>
<l>Our Libertine Reign,</l>
<l>We'll riot, drink on, and be free.</l>
</lg>
<lg>
<lg type="octet">
<l>Thus speaks the Muse, and bends her brow severe:—</l>
<l>“Did I, <name>Lætitia</name>, lend my choicest lays,</l>
<l>And crown thy youthful head with freshest bays,</l>
<l>That all the' expectance of thy full-grown year</l>
<l>Should lie inert and fruitless? O revere</l>
<l>Those sacred gifts whose meed is deathless praise,</l>
<l>Whose potent charms the' enraptured soul can raise</l>
<l>Far from the vapours of this earthly sphere!</l>
</lg>
<lg type="sestet">
<l>Seize, seize the lyre! resume the lofty strain!</l>
<l>'T is time, 't is time! hark how the nations round</l>
<l>With jocund notes of liberty resound,—</l>
<l>And thy own <name>Corsica</name> has burst her chain!</l>
<l>O let the song to <name>Britain's</name> shores rebound,</l>
<l rend="indent(-1)">Where Freedom's once-loved voice is heard,
alas! in vain.”</l>
</lg>
</lg>
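Because lg elements may nest, a consumer can recover the stanza structure by simple recursion. In this hypothetical sketch, which uses Python's xml.etree.ElementTree on an un-namespaced sample, empty l placeholders replace the verse text of the sonnet. The outer group is assumed to carry type="sonnet", since its opening tag is not shown above. outline reports each group's type and how many verse lines it contains, including lines in nested groups:

```python
import xml.etree.ElementTree as ET

# Skeleton of the sonnet above: verse text elided, only the lg/type nesting
# kept. The outer type value "sonnet" is an assumption.
SONNET = (
    '<lg type="sonnet">'
    + '<lg type="octet">' + "<l/>" * 8 + "</lg>"
    + '<lg type="sestet">' + "<l/>" * 6 + "</lg>"
    + "</lg>"
)


def outline(lg, depth=0):
    """Yield (depth, type, verse-line count) for lg and every nested lg.
    The count includes lines inside nested groups."""
    yield depth, lg.get("type"), sum(1 for _ in lg.iter("l"))
    for child in lg.findall("lg"):
        yield from outline(child, depth + 1)


for depth, kind, count in outline(ET.fromstring(SONNET)):
    print("  " * depth + f"{kind}: {count} lines")
```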
<l>More tight at this, then thou: Dispatch. O Loue,</l>
<l>That thou couldst see my Warres to day, and knew'st</l>
<l>The Royall Occupation, thou should'st see</l>
<l part="I">A Workeman in't.</l>
<stage>Enter an Armed Soldier.</stage>
<l part="F">Good morrow to thee, welcome.</l>
<!-- ... -->
<l>Unprofitably travelling toward the grave,</l>
<l>Like a false steward who hath much received</l>
<l part="I">And renders nothing back.</l>
</lg>
<lg type="para" n="7">
<l part="F">Was it for this</l>
<l>That one, the fairest of all rivers, loved</l>
<l>To blend his murmurs with my nurse's song,</l>
<!-- ... -->
</lg>
<speaker>First Voice</speaker>
<lg type="stanza" part="I">
<l>But why drives on that ship so fast</l>
<l>Withouten wave or wind?</l>
</lg>
</sp>
<sp>
<speaker>Second Voice</speaker>
<lg type="stanza" part="F">
<l>The air is cut away before,</l>
<l>And closes from behind.</l>
</lg>
</sp>
For alternative methods of aligning groups of lines which do not form simple hierarchic groups, or which are discontinuous, see the more detailed discussion in chapter 17 Linking, Segmentation, and Alignment. For discussion of other elements and attributes specific to the encoding of verse, see chapter 6 Verse.
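The part attribute in the examples above can mark a verse line as complete (N), as an initial (I), medial (M), or final (F) fragment, or as incomplete in some unspecified way (Y). The following rough sketch shows how such fragments might be reassembled. It assumes Python's xml.etree.ElementTree and un-namespaced samples. The hypothetical function metrical_lines walks l elements in document order and joins each run from an I fragment to its F fragment into one line. It does not handle part on lg, as in the Coleridge stanzas, nor the alignment mechanisms of chapter 17:

```python
import xml.etree.ElementTree as ET

# The Antony and Cleopatra extract above, trimmed: one metrical line is split
# into an initial (I) and a final (F) fragment around a stage direction.
ANTONY = """<div>
<l>The Royall Occupation, thou should'st see</l>
<l part="I">A Workeman in't.</l>
<stage>Enter an Armed Soldier.</stage>
<l part="F">Good morrow to thee, welcome.</l>
</div>"""


def metrical_lines(root):
    """Rejoin part="I"/"M"/"F" fragments of <l> into whole metrical lines.

    Only <l> is examined (stage directions are skipped), in document order,
    so fragments may sit in different <lg> or <sp> elements. A missing part
    attribute or part="N" is treated as a complete line; part="Y" and any
    orphaned M/F fragment are kept as they stand.
    """
    result, pending = [], []
    for l in root.iter("l"):
        text = " ".join("".join(l.itertext()).split())
        part = l.get("part", "N")
        if part == "I":
            if pending:  # an earlier initial fragment never got its F
                result.append(" ".join(pending))
            pending = [text]
        elif part in ("M", "F") and pending:
            pending.append(text)
            if part == "F":
                result.append(" ".join(pending))
                pending = []
        else:
            if pending:
                result.append(" ".join(pending))
                pending = []
            result.append(text)
    if pending:
        result.append(" ".join(pending))
    return result


for line in metrical_lines(ET.fromstring(ANTONY)):
    print(line)
```

Because the walk ignores element boundaries, the same function also joins fragments spread across verse paragraphs or across speeches by different speakers.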
3.13.2 Core Tags for Drama
Like other written texts, dramatic and other performance texts such as cinema or TV scripts are often hierarchically organized, for example into acts and scenes. These structural subdivisions should be encoded using the general purpose div or div1 (etc.) elements described below in chapters 4 Default Text Structure and 7 Performance Texts. Within these divisions, the body of a performance text typically consists of speeches, often prefixed by a phrase indicating who is speaking, and occasionally interspersed with stage directions of various kinds.
<head>Scene 2.</head>
<stage type="setting">Peachum, Filch.</stage>
<sp>
<speaker>FILCH.</speaker>
<p>Sir, Black Moll hath sent word her Trial comes on in
the Afternoon, and she hopes you will order Matters
so as to bring her off.</p>
</sp>
<sp>
<speaker>PEACHUM.</speaker>
<p>Why, she may plead her Belly at worst; to my
Knowledge she hath taken care of that Security.
But, as the Wench is very active and industrious,
you may satisfy her that I'll soften the Evidence.</p>
</sp>
<sp>
<speaker>FILCH.</speaker>
<p>Tom Gagg, sir, is found guilty.</p>
</sp>
</div2>
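The sp and speaker structure makes speeches straightforward to extract. The following minimal sketch assumes Python's xml.etree.ElementTree, an un-namespaced sample, and a plain div2 wrapper, since the wrapper's attributes are not shown above. It lists each speech with its printed speaker label. The setting given by the stage element outside any sp does not become part of a speech:

```python
import xml.etree.ElementTree as ET

# Two of the speeches from the scene above.
SCENE = """<div2>
<head>Scene 2.</head>
<stage type="setting">Peachum, Filch.</stage>
<sp>
<speaker>FILCH.</speaker>
<p>Sir, Black Moll hath sent word her Trial comes on in
the Afternoon, and she hopes you will order Matters
so as to bring her off.</p>
</sp>
<sp>
<speaker>FILCH.</speaker>
<p>Tom Gagg, sir, is found guilty.</p>
</sp>
</div2>"""


def speeches(scene):
    """Return (speaker label, speech text) for each <sp>, in document order.

    The label is taken verbatim from <speaker>; the text is the whitespace-
    normalized content of the <p> and <l> children of the <sp>.
    """
    result = []
    for sp in scene.iter("sp"):
        label = (sp.findtext("speaker") or "").strip()
        body = [" ".join("".join(c.itertext()).split())
                for c in sp if c.tag in ("p", "l")]
        result.append((label, " ".join(body)))
    return result


for label, text in speeches(ET.fromstring(SCENE)):
    print(label, text)
```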
<head>ACT I</head>
<div2 n="1" type="Scene">
<head>SCENE I</head>
<stage rend="italic">Enter Barnardo and Francisco,
two Sentinels, at several doors</stage>
<sp>
<speaker>Barn</speaker>
<l part="Y">Who's there?</l>
</sp>
<sp>
<speaker>Fran</speaker>
<l>Nay, answer me. Stand and unfold yourself.</l>
</sp>
<sp>
<speaker>Barn</speaker>
<l part="I">Long live the King!</l>
</sp>
<sp>
<speaker>Fran</speaker>
<l part="M">Barnardo?</l>
</sp>
<sp>
<speaker>Barn</speaker>
<l part="F">He.</l>
</sp>
<sp>
<speaker>Fran</speaker>
<l>You come most carefully upon your hour.</l>
</sp>
<sp>
<speaker>Barn</speaker>
<l>'Tis now struck twelve. Get thee to bed, Francisco.</l>
</sp>
<sp>
<speaker>Fran</speaker>
<l>For this relief much thanks. 'Tis bitter cold,</l>
<l part="I">And I am sick at heart.</l>
</sp>
</div2>
</div1>
<add place="margin">Now call'd <name xml:id="barnardo">Bernardo</name> &amp;
<name xml:id="francisco">Francesco</name>.</add>
</stage>
<sp who="#francisco">
<speaker>1.</speaker>
<l part="Y">Stand: who is that?</l>
</sp>
<sp who="#barnardo">
<speaker>2.</speaker>
<l part="Y">Tis I.</l>
</sp>
<sp who="#francisco">
<speaker>1.</speaker>
<l>O you come most carefully vpon your watch,</l>
</sp>
<sp who="#barnardo">
<speaker>2.</speaker>
<l>And if you meete Marcellus and Horatio,</l>
<l>The partners of my watch, bid them make haste.</l>
</sp>
<sp who="#francisco">
<speaker>1.</speaker>
<l part="Y">I will: See who goes there.</l>
</sp>
<stage>Enter Horatio and Marcellus.</stage>
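Labels such as 1. and 2. are transcribed as printed, so the who attribute is what ties each speech to a particular character. The following hypothetical resolver is sketched with Python's xml.etree.ElementTree on a trimmed, un-namespaced copy of this Hamlet example. It maps each local #id pointer to the text of the element bearing that xml:id, which here is one of the names recorded in the marginal addition. ElementTree exposes xml:id under the key {http://www.w3.org/XML/1998/namespace}id:

```python
import xml.etree.ElementTree as ET

XML_ID = "{http://www.w3.org/XML/1998/namespace}id"  # how ElementTree names xml:id

# The opening of the Hamlet example above, trimmed to two speeches.
Q1 = """<div>
<stage>
<add place="margin">Now call'd <name xml:id="barnardo">Bernardo</name> &amp;
<name xml:id="francisco">Francesco</name>.</add>
</stage>
<sp who="#francisco">
<speaker>1.</speaker>
<l part="Y">Stand: who is that?</l>
</sp>
<sp who="#barnardo">
<speaker>2.</speaker>
<l part="Y">Tis I.</l>
</sp>
</div>"""


def attributed_speeches(root):
    """Return (printed speaker label, resolved names, speech text) per <sp>.

    who may hold several space-separated pointers; this sketch resolves only
    local "#id" pointers and leaves any unresolved pointer unchanged.
    """
    ids = {e.get(XML_ID): " ".join("".join(e.itertext()).split())
           for e in root.iter() if e.get(XML_ID)}
    result = []
    for sp in root.iter("sp"):
        names = [ids.get(ref[1:], ref) if ref.startswith("#") else ref
                 for ref in sp.get("who", "").split()]
        text = " ".join(" ".join("".join(l.itertext()).split())
                        for l in sp.iter("l"))
        result.append((sp.findtext("speaker"), names, text))
    return result


for label, names, text in attributed_speeches(ET.fromstring(Q1)):
    print(label, names, text)
```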
<div2 n="1" type="scene">
<head rend="italic">Actus primus, Scena prima.</head>
<stage rend="italic" type="setting">A tempestuous
noise of Thunder and Lightning heard: Enter
a Ship-master, and a Boteswaine.</stage>
<sp>
<speaker>Master.</speaker>
<p>Bote-swaine.</p>
</sp>
<sp>
<speaker>Botes.</speaker>
<p>Heere Master: What cheere?</p>
</sp>
<sp>
<speaker>Mast.</speaker>
<p>Good: Speake to th' Mariners: fall
too't, yarely, or we run our selues a ground,
bestirre, bestirre. <stage type="move">Exit.</stage>
</p>
</sp>
<stage type="move">Enter Mariners.</stage>
<sp>
<speaker>Botes.</speaker>
<p>Heigh my hearts, cheerely, cheerely my harts: yare,
yare: Take in the toppe-sale: Tend to th' Masters whistle:
Blow till thou burst thy winde, if roome e-nough.</p>
</sp>
</div2>
</div1>
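Stage directions may appear between speeches or inside them, as with the Master's Exit. above. The following minimal sketch assumes Python's xml.etree.ElementTree and an un-namespaced sample. It separates the spoken text of one sp from its stage directions and keeps each direction's type value. The helper names are hypothetical:

```python
import xml.etree.ElementTree as ET

# The Master's speech from the example above, with its embedded exit.
MASTER = """<sp>
<speaker>Mast.</speaker>
<p>Good: Speake to th' Mariners: fall
too't, yarely, or we run our selues a ground,
bestirre, bestirre. <stage type="move">Exit.</stage>
</p>
</sp>"""


def _text_without_stage(elem):
    """Text content of elem, omitting any <stage> subtree but keeping its tail."""
    chunks = [elem.text or ""]
    for child in elem:
        if child.tag != "stage":
            chunks.append(_text_without_stage(child))
        chunks.append(child.tail or "")
    return "".join(chunks)


def split_speech(sp):
    """Return (spoken text, [(stage type, direction text), ...]) for one <sp>."""
    spoken = " ".join(_text_without_stage(c) for c in sp
                      if c.tag not in ("speaker", "stage"))
    directions = [(s.get("type"), " ".join("".join(s.itertext()).split()))
                  for s in sp.iter("stage")]
    return " ".join(spoken.split()), directions


spoken, directions = split_speech(ET.fromstring(MASTER))
print(spoken)
print(directions)
```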
<speaker>The reverend Doctor Opimian</speaker>
<p>I do not think I have named a single unpresentable fish.</p>
</sp>
<sp>
<speaker>Mr Gryll</speaker>
<p>Bream, Doctor: there is not much to be said for bream.</p>
</sp>
<sp>
<speaker>The Reverend Doctor Opimian</speaker>
<p>On the contrary, sir, I think there is much to be said for him.
In the first place ...</p>
<p>Fish, Miss Gryll — I could discourse to you on fish by the
hour: but for the present I will forbear ...</p>
</sp>
<speaker>Lord Curryfin</speaker>
<stage>(after a pause).</stage>
<p>
<q>Mass</q> as the second grave-digger says
in <title>Hamlet</title>, <q>I cannot tell.</q>
</p>
</sp>
<p>A chorus of laughter dissolved the sitting.</p>
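As this example shows, speaker labels are recorded as they appear in the source, so the same character may be labelled inconsistently (reverend, Reverend). The following hypothetical sketch uses Python's xml.etree.ElementTree and collections.Counter on an un-namespaced excerpt wrapped in a div. It counts speeches per label, with optional case-folding. Pointing each sp at a single identifier with who, as in the Hamlet example using who="#francisco", is arguably the sturdier approach:

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Three speeches from the example above, labels transcribed as printed.
DIALOGUE = """<div>
<sp>
<speaker>The reverend Doctor Opimian</speaker>
<p>I do not think I have named a single unpresentable fish.</p>
</sp>
<sp>
<speaker>Mr Gryll</speaker>
<p>Bream, Doctor: there is not much to be said for bream.</p>
</sp>
<sp>
<speaker>The Reverend Doctor Opimian</speaker>
<p>On the contrary, sir, I think there is much to be said for him.
In the first place ...</p>
</sp>
</div>"""


def speeches_per_speaker(root, normalize=True):
    """Count <sp> elements per <speaker> label.

    With normalize=True, labels are whitespace-collapsed and case-folded so
    that printed variants such as "reverend"/"Reverend" are counted together.
    """
    counts = Counter()
    for sp in root.iter("sp"):
        label = " ".join((sp.findtext("speaker") or "").split())
        counts[label.casefold() if normalize else label] += 1
    return counts


print(speeches_per_speaker(ET.fromstring(DIALOGUE)))
```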
3.14 Overview of the Core Module
All the elements described in this chapter are provided by the core module.
- Module core: Elements common to all TEI documents
- Elements defined: abbr add addrLine address analytic author bibl biblScope biblStruct binaryObject cb choice cit citedRange corr date del desc distinct divGen editor ellipsis email emph expan foreign gap gb gloss graphic head headItem headLabel hi imprint index item l label lb lg list listBibl measure measureGrp media meeting mentioned milestone monogr name note noteGrp num orig p pb postBox postCode ptr pubPlace publisher q quote rb ref reg relatedItem resp respStmt rs rt ruby said series sic soCalled sp speaker stage street teiCorpus term textLang time title unclear unit
- Classes defined: att.milestoneUnit
The selection and combination of modules to form a TEI schema is described in 1.2 Defining a TEI Schema.