12 Representation of Primary Sources
Table of contents
- 12.1 Digital Facsimiles
- 12.2 Combining Transcription with Facsimile
- 12.3 Scope of Transcriptions
- 12.4 Aspects of Layout
- 12.5 Transcription and Ruby
- 12.6 Headers, Footers, and Similar Matter
- 12.7 Identifying Changes and Revisions
- 12.8 Other Primary Source Features not Covered in these Guidelines
- 12.9 Module for Transcription of Primary Sources
This chapter describes elements that may be used to represent primary source materials, such as manuscripts, printed books, ephemera, or other textual documents. Some of these specialized elements, particularly at phrase-level, add to the other elements available within text to deal with textual phenomena more specific to primary source transcription. Other structural and block-level elements described here can be used to represent primary source materials by prioritizing the encoding of their spatial features over their logical textual structure (that is, the elements described in chapter 4 Default Text Structure). These elements, facsimile, sourceDoc, and their children, may be used in parallel and in combination with an encoding of logical text structures with text, or as standalone representations. The element sourceDoc in particular provides a way of combining facsimile and transcriptions by embedding transcribed text. This approach focuses on physical and textual features that can be primarily described spatially, such as the sequence of pages in a manuscript, or the layout of a printed page. This is not meant to be the only way of transcribing primary sources in TEI, or even a preferred way; which approach is more appropriate will depend on the specific needs of your project.
Although this chapter discusses manuscript materials more frequently than other forms of written text, most of the recommendations presented are equally applicable to facsimiles of a wide variety of media, including printed matter, monumental inscriptions, and art. Each medium has its own vocabulary of agents. In the following examples, terms such as ‘scribe’, ‘author’, ‘editor’, ‘annotator’ or ‘corrector’ may be re-interpreted in terms more appropriate to the medium being transcribed. In printed material, for example, the ‘compositor’ plays a role analogous to the ‘scribe’, while in an authorial manuscript, the ‘author’ and the ‘scribe’ are the same person.
This module may be used in conjunction with other modules. These recommendations are not intended to meet every transcriptional circumstance likely to be faced by any scholar. They are intended rather as a base to enable encoding of the most common phenomena found in the course of scholarly transcription of primary source materials. These guidelines do not address the encoding of physical description of textual witnesses: the materials of the carrier, the medium of the inscribing implement, the organisation of the carrier materials themselves (such as quiring, collation), authorial instructions or scribal markup, etc., except insofar as these are involved in the broader question of manuscript description, as addressed by the msdescription module described in chapter 11 Manuscript Description.
This chapter begins by describing elements for handling digitally-encoded images of primary source materials for the purpose of creating digital facsimiles using the facsimile element (12.1 Digital Facsimiles).
The next section (12.2 Combining Transcription with Facsimile) describes two ways of combining a facsimile images with a transcription; either by referencing a parallel transcription in text, or by providing an ‘embedded’ transcription that prioritizes the encoding of a resource’s spatial features via the sourceDoc element and a number of transcriptional elements.
Section 12.3 Scope of Transcriptions documents elements that support scholars in recording information about specific features of the text written on its physical carrier, such as 12.3.1 Altered, Corrected, and Erroneous Texts and 12.3.2 Hands and Responsibility
Section 12.4 Aspects of Layout describes how complex page layouts may be represented.
Section 12.6 Headers, Footers, and Similar Matter introduces the element fw (forme work) for encoding material repeated from page to page that falls outside the stream of the text.
Section 12.7 Identifying Changes and Revisions describes how to document changes made during the production or revision of a primary source.
The chapter concludes with a technical overview of the structure and organization of the module described here. Some elements from other chapters are recontexualized for situations involving the transcription of primary source materials, whether within text or sourceDoc. Therefore, this overview should be read in conjunction with chapters 3 Elements Available in All TEI Documents and 5 Characters, Glyphs, and Writing Modes.
TEI: Digital Facsimiles⚓︎12.1 Digital Facsimiles
A common approach in the TEI to representing pre-existing sources involves transcribing or otherwise converting sources into character form before marking them up. However, it is also a common practice to make a different form of ‘digital text’ that is instead composed of digital images of the original source, typically one per page, or other written surface. We call such a resource a digital facsimile. A digital facsimile may, in the simplest case, just consist of a collection of images, with some metadata to identify them and the source materials portrayed. It may sometimes contain a variety of images of the same source pages, perhaps of different resolutions, or of different kinds. Such a collection may form part of any kind of document, for example a commentary of a codicological or paleographic nature, where there is a need to align explanatory text with image data. It may also be complemented by a transcribed or encoded version of the original source, which may be linked to the page images or ‘embedded’ as discussed in 12.2 Combining Transcription with Facsimile. In this section we present elements designed to support these various possibilities and discuss the associated mechanisms provided by these Guidelines.
When this module is included in a schema, the class att.global is extended to include two new pointer attributes, facs and change:
- att.global.facs provides attributes used to express correspondence between an element and all or
part of a facsimile image or surface.
facs (facsimile) points to one or more images, portions of an image, or surfaces which correspond to the current element. - att.global.change provides attributes allowing its member elements to specify one or more states or
revision campaigns with which they are associated.
change points to one or more change elements documenting a state or revision campaign to which the element bearing this attribute and its children have been assigned by the encoder.
The change attribute is discussed further below in section 12.7 Identifying Changes and Revisions. The facs attribute is used to associate any element in a transcription with an image of the corresponding part of the source, by means of the usual URI pointing mechanism.
<teiHeader>
<!--...-->
</teiHeader>
<text>
<pb facs="page1.png"/>
<!-- text contained on page 1 is encoded here -->
<pb facs="page2.png"/>
<!-- text contained on page 2 is encoded here -->
</text>
</TEI>
The recommended approach to encoding facsimiles is instead to use the facs attribute in conjunction with the elements facsimile or sourceDoc, and the elements surface, surfaceGrp, and zone, which are also provided by this module. These elements make it possible to accommodate multiple images of each page, as well as to record the position and relative size of elements identified on any kind of written surface and to link such elements with digital facsimile images of them. Typical applications include the provision of full text search in ‘digital facsimile editions’, and ways of annotating graphics, for example so as to identify individuals appearing in group portraits and link them to data about the people represented.
The following elements are available to represent components of a digital facsimile:
- facsimile contains a representation of some written source in the form of a set of images rather than as transcribed or encoded text.
- sourceDoc contains a transcription or other representation of a single source document potentially forming part of a dossier génétique or collection of sources.
- surface defines a written surface as a two-dimensional coordinate space, optionally grouping one or more graphic representations of that space, zones of interest within that space, and, when using an embedded transcription approach, transcriptions of the writing within them.
- surfaceGrp (surface group) defines any kind of useful grouping of written surfaces, for example the recto and verso of a single leaf, which the encoder wishes to treat as a single unit.
- zone defines any two-dimensional area within a surface element.
points [att.coordinated] identifies a two dimensional area by means of a series of pairs of numbers, each of which gives the x,y coordinates of a point on a line enclosing the area. - path (path) defines any line passing through two or more points within a surface element.
points identifies a line within the container or bounding box specified by the parent element by means of a series of two or more pairs of numbers, each of which gives the x,y coordinates of a point on the line.
Either of the facsimile or sourceDoc elements may be used to represent a digital facsimile. Either may appear within a TEI document along with, or instead of, the text element introduced in section 4 Default Text Structure. The sourceDoc element is used when a digital facsimile contains a transcription that prioritizes the encoding of the spatial features and layout of a text document over the text’s logical textual structure; the text element should be used when it contains a textual transcription focused on its logical features. When the digital facsimile contains only images, however, only facsimile elements should be used. In this section, we first discuss the simpler case, returning to the use of the sourceDoc element in section 12.2 Combining Transcription with Facsimile below. When this module is selected therefore, a legal TEI document may thus comprise any of the following:
- a TEI header and a text element
- a TEI header and a facsimile element
- a TEI header and a sourceDoc element
- a TEI header, a facsimile element, and a text element
- a TEI header, one or more sourceDoc or facsimile elements, and a text element
Like the text element, a facsimile element may also contain an optional front or back element, used in the same way as described in sections 4.5 Front Matter and 4.7 Back Matter.
<graphic url="page1.png"/>
<graphic url="page2.png"/>
<graphic url="page3.png"/>
<graphic url="page4.png"/>
</facsimile>
<graphic url="page1.png"/>
<surface>
<graphic url="page2-highRes.png"/>
<graphic url="page2-lowRes.png"/>
</surface>
<graphic url="page3.png"/>
<graphic url="page4.png"/>
</facsimile>
The surface element provides a way of indicating that the two images of page2 represent the same surface within the source material. A surface might be one side of a piece of paper or parchment, an opening in a codex treated as a single surface by the writer, a face of a monument, a billboard, a membrane of a scroll, or indeed any two-dimensional surface, of any size.
<surfaceGrp n="leaf1">
<surface>
<graphic url="page1.png"/>
</surface>
<surface>
<graphic url="page2-highRes.png"/>
<graphic url="page2-lowRes.png"/>
</surface>
</surfaceGrp>
</facsimile>
Simply grouping related graphics is not however the main purpose of the surface element: rather it is to help identify the location and size of the various two-dimensional spaces constituting the digital facsimile. Note that the actual dimensions of the object represented are not provided by the surface element ; rather, the surface element defines an abstract coordinate space which may be used to address parts of the image. Four attributes supplied by the att.coordinated class are used to define this space.
- att.coordinated provides attributes that can be used to position their parent element within a two
dimensional coordinate system.
ulx gives the x coordinate value for the upper left corner of a rectangular space. uly gives the y coordinate value for the upper left corner of a rectangular space. lrx gives the x coordinate value for the lower right corner of a rectangular space. lry gives the y coordinate value for the lower right corner of a rectangular space.
By default, the same coordinate space is used for a surface and for all of its child elements.43 It may be most convenient to derive a coordinate space from a digital image of the surface in question such that each pixel in the image corresponds with a whole number of units (typically 1) in the coordinate space. In other cases it may be more convenient to use units such as millimetres. Neither practice implies any specific mapping between the coordinate system used and the actual dimensions of the physical object represented.
A surface element can contain one or more zone elements, each of which represents a region or bounding box defined in terms of the same coordinate space as that of its parent surface element. A zone may be rectangular or non-rectangular: a rectangular zone is defined by a sequence of four coordinates in the same way as a surface; a non-rectangular zone is defined using the attribute points, which provides a sequence of coordinates, each of which specifies a point on the perimeter of the zone.44
A zone may be used to define any region of interest, such as a detail or illustration, or some part of the surface which is to be aligned with a particular text element, or otherwise distinguished from the rest of the surface. A surface establishes a coordinate system which may be used to address parts or the whole of some digital representation of a written surface. A zone, by contrast, defines any arbitrary area of interest relative to that surface, using the same coordinate system. It might be bigger or smaller than its parent surface, or might overlap its boundaries. The only constraint is that it must be defined using the same coordinate system.
When an image of some kind is supplied within either a zone or a surface, the implication is that the image represents the whole of the zone or surface concerned. In the simple case therefore, we might imagine a surface defining a page, within which there is a graphic representing the whole of that page, and a number of zones defining parts of the page, each with its own graphic, each representing a part of the page. If however one of those graphics actually represents an area larger than the page (for example to include a binding or the surface of a desk on which the page rests), then it will be enclosed by a zone with coordinates larger than those of the parent surface.
For example, consider the following figure:
This is an image of a two page spread from a manuscript in the Badische Landesbibliothek, Karlsruhe. We have no information as to the dimensions of the original object, but the low resolution image displayed here contains 500 pixels horizontally and 321 pixels vertically. For convenience, we might map each pixel to one cell of the coordinate space.45
<surface ulx="0" uly="0" lrx="500" lry="321">
<graphic url="http://upload.wikimedia.org/wikipedia/commons/5/50/Handschrift.karlsruhe.blb.jpg"/>
</surface>
</facsimile>
If desired, the binaryObject element described in 3.10 Graphics and Other Non-textual Components (or any other element from the model.graphicLike class) may be used instead of a graphic element.
<surface ulx="50" uly="20" lrx="400"
lry="280">
<zone ulx="0" uly="0" lrx="500" lry="321">
<graphic url="http://upload.wikimedia.org/wikipedia/commons/5/50/Handschrift.karlsruhe.blb.jpg"/>
</zone>
<zone ulx="50" uly="20" lrx="210" lry="280">
<!-- left hand page -->
</zone>
<zone ulx="240" uly="25" lrx="400"
lry="280">
<!-- right hand page -->
</zone>
<zone ulx="90" uly="40" lrx="200" lry="225">
<!--- written part of left hand page -->
</zone>
</surface>
</facsimile>
As this example shows, in addition to acting as a container for graphic elements, zone elements may be used to identify parts of a surface for analytical purposes.
The relationship between zone and surface can be quite complex: for example, it may be appropriate to treat the whole of a two page spread as a single written surface, perhaps because particular written zones span both pages. A zone may contain a nested surface, if for example a page has an additional scrap of paper attached to it. A zone may be of any shape, not simply rectangular. Discussion of these and other cases are provided in section 12.2.2.1 Advanced Uses of surface and zone below.
<surface ulx="0" uly="0" lrx="200" lry="300">
<graphic url="Bovelles-49r.png"/>
</surface>
</facsimile>
<surface ulx="0" uly="0" lrx="200" lry="300">
<graphic url="Bovelles-49r.png"/>
<zone ulx="25" uly="25" lrx="180" lry="60">
<!-- contains the title -->
</zone>
<zone ulx="28" uly="75" lrx="175" lry="178"/>
<!-- contains the paragraph in italics -->
<zone ulx="105" uly="76" lrx="175"
lry="160"/>
<!-- contains the figure -->
<zone ulx="45" uly="125" lrx="60" lry="130"/>
<!-- contains the word "pendans" -->
</surface>
</facsimile>
<surface ulx="0" uly="0" lrx="200" lry="300">
<graphic url="Bovelles-49r.png"/>
<zone points="4.8,31.0 5.4,30.7 5.5,32.2 5.8,32.8 6.1,33.4 5.5,33.7 5.1,33.3 4.6,32.2"/>
</surface>
</facsimile>
<graphic url="Bovelles49r-detail.png"/>
</zone>
<graphic url="facs-fig3.jpg"/>
<path points="74,73 171,244"
xml:id="balan"/>
<path points="71,203 173,116"
xml:id="dindan"/>
</surface>
<graphic url="sterne.png"/>
<path points="65,511 88,510 92,517 100,521 107,520 112,516 116,512 117,511 152,512 156,508 162,505 169,505 174,506 178,509 179,512 236,514 136,479 208,493 270,512 328,513 336,525 339,528 339,536 331,539 328,535 329,530 334,526 342,521 345,519 350,512 397,514 402,498 414,515 425,515 435,531 440,513 475,513 475,518 477,520 479,521 481,522 483,521 484,518 486,516 486,514 491,512 494,514 496,520 496,529 493,535 494,539 497,543 501,543 504,543 507,540 508,537 507,526 505,518 502,510 501,508 501,503 503,501 506,500 510,500 512,503 513,507 511,513 543,516 552,513 552,501 550,496 549,490 552,486 562,487 564,468 559,465 557,462 556,457 558,453 562,450 570,451 573,446 579,433"/>
</surface>
TEI: Combining Transcription with Facsimile⚓︎12.2 Combining Transcription with Facsimile
A digitized source document may contain nothing more than page images and a small amount of metadata. It may also contain an encoded transcription of the pages represented, which may either be ‘embedded’ within a sourceDoc element, or supplied in parallel with a facsimile as defined above.
If the transcription is regarded as a text in its own right, organized and structured independently of its physical realization in the document or documents represented by the facsimile, then the recommended practice is to use the text element to contain such a structured representation, and to present it in parallel. The text element is a sibling of the facsimile and sourceDoc elements. This approach is illustrated in section 12.2.1 Parallel Transcription below. Alternatively, if the transcription is intended not to prioritize representation of the final text so much as the process by which the document came to take its present form, or the physical disposition of its component parts, it can be presented as an embedded transcription, as further described in section 12.2.2 Embedded Transcription below.
TEI: Parallel Transcription⚓︎12.2.1 Parallel Transcription
<surface ulx="0" uly="0" lrx="200" lry="300">
<zone xml:id="B49r" ulx="0" uly="0"
lrx="200" lry="300">
<graphic url="Bovelles-49r.png"/>
</zone>
<zone ulx="105" uly="76" lrx="175"
lry="160">
<graphic url="Bovelles49r-detail.png"/>
</zone>
<zone xml:id="B49rHead" ulx="25" uly="25"
lrx="180" lry="60"/>
<!-- contains the title -->
<zone xml:id="B49rPara2" ulx="28" uly="75"
lrx="175" lry="178"/>
<!-- contains the first paragraph in italics -->
<zone xml:id="B49rFig1" ulx="105" uly="76"
lrx="175" lry="160"/>
<!-- contains the figure -->
<zone xml:id="B49rW457" ulx="45" uly="125"
lrx="60" lry="130"/>
<!-- contains the word "pendans" -->
</surface>
</facsimile>
<fw>De Geometrie 49</fw>
<head facs="#B49rHead"> DU SON ET ACCORD DES CLOCHES ET <lb/> des alleures des
chevaulx, chariotz & charges, des fontaines:& <lb/> encyclie du monde,
& de la dimension du corps humain.<lb/> Chapitre septiesme</head>
<div n="1">
<p>Le son & accord des cloches pendans en ung mesme <lb/> axe, est faict en
contraires parties.</p>
<p rend="it" facs="#B49rPara2">LEs cloches ont quasi fi<lb/>gures de rondes
pyra<lb/>mides imperfaictes & <lb/> irregulieres: & leur accord se
<lb/> fait par reigle geometrique. Com<lb/>me si les deux cloches C & D
<lb/> sont <w facs="#B49rW457">pendans</w> à ung mesme axe <lb/> ou essieu A B:
je dis que leur ac<lb/>cord se fera en co<ex>n</ex>traires parties<lb/>
co<ex>m</ex>me voyez icy figuré. Car qua<ex>n</ex>d <lb/> lune sera en
hault, laultre declinera embas. Aultrement si elles decli<lb/>nent toutes deux
ensembles en une mesme partie, elles seront discord, <lb/> & sera leur
sonnerie mal plaisante à oyr.<figure facs="#B49rFig1">
<graphic url="Bovelles49r-detail.png"/>
</figure>
</p>
</div>
<surface start="#PB49R">
<graphic url="Bovelles-49r.png"/>
</surface>
</facsimile>
<text>
<body>
<div>
<!-- ... -->
<pb xml:id="PB49R"/>
<fw>De Geometrie 49</fw>
<!-- ... -->
</div>
</body>
</text>
TEI: Embedded Transcription⚓︎12.2.2 Embedded Transcription
An embedded transcription is one in which words and other written traces are encoded as subcomponents of elements representing the physical surfaces carrying them rather than independently of them.
The following elements are available for this purpose:
- sourceDoc contains a transcription or other representation of a single source document potentially forming part of a dossier génétique or collection of sources.
- surface defines a written surface as a two-dimensional coordinate space, optionally grouping one or more graphic representations of that space, zones of interest within that space, and, when using an embedded transcription approach, transcriptions of the writing within them.
- zone defines any two-dimensional area within a surface element.
- line contains the transcription of a topographic line in the source document.
- seg (arbitrary segment) represents any segmentation of text below the ‘chunk’ level.
The elements surface, surfaceGrp, and zone were introduced above in section 12.1 Digital Facsimiles. When supplied within a sourceDoc element, these elements may contain transcriptions of the written content of a source in addition to or as an alternative to digital images of them. Such transcription may be placed directly within the zone element, or within one or more line elements, for cases where the writing is linear, in the sense that it is composed of discrete tokens organized physically into groups, typically organized in a sequence corresponding with the way they are intended to be read. Depending on the directionality of the writing system used, this might be any combination of top-down and left to right, or vice versa. The element line may be used to hold a complete group of such tokens. Where, however, the lineation is not considered significant, any group of tokens may be indicated using the zone element. The seg element described in section 17.3 Blocks, Segments, and Anchors may also be used to indicate smaller sequences of tokens within zone, or line as appropriate.
Returning to the preceding example, we might transcribe the content of the zone to which we gave the identifier B49rPara2 within a sourceDoc element as follows:
<surface ulx="0" uly="0" lrx="200" lry="300">
<zone ulx="0" uly="0" lrx="200" lry="300">
<graphic url="Bovelles-49r.png"/>
</zone>
<!-- ... -->
<zone ulx="28" uly="75" lrx="175" lry="178">
<line>LEs cloches ont quasi
fi</line>
<line>gures de rondes pyra</line>
<line>mides imperfaictes &
</line>
<line> irregulieres: & leur accord se</line>
<line> fait par reigle geometrique. Com</line>
<line>me si les deux cloches C
& D </line>
<line> sont <zone ulx="45" uly="125" lrx="60"
lry="130">pendans</zone> à ung mesme axe</line>
<line> ou essieu A B: je dis que
leur ac</line>
<line>cord se fera en cõtraires parties</line>
<line> cõme
voyez icy figuré. Car quãd </line>
<line> lune sera en hault, laultre
declinera embas. Aultrement si elles declinent toutes deux ensembles en une
mesme partie, elles seront discord,</line>
<line> & sera leur sonnerie
mal plaisante à oyr.</line>
</zone>
<zone ulx="105" uly="76" lrx="175"
lry="160">
<graphic url="Bovelles49r-detail.png"/>
</zone>
</surface>
</sourceDoc>
As mentioned above, some or all of the written surfaces being transcribed may be composed of physically distinct scraps. In the following example, taken from the Walt Whitman Archive, two pieces of newsprint have been glued to a piece of blue paper on which a poem is being drafted:
The two pieces of newsprint might simply be regarded as special kinds of zone, but they are also new surfaces, since they might contain additional written zones themselves (such as the numbers in this case).
<zone>
<line>Poem</line>
<line>As in Visions of — at</line>
<line>night —</line>
<line>All sorts of fancies running through</line>
<line>the head</line>
</zone>
<zone>
<surface type="newsprint"
attachment="glue" flipping="false">
<zone>Spring has just set in here, and the weather[...] a steamer </zone>
<metamark function="sequence">2</metamark>
</surface>
</zone>
<zone>
<surface type="newsprint"
attachment="glue" flipping="false">
<zone>"The shores on either side of the Sound are... The In- </zone>
<metamark function="sequence">3</metamark>
</surface>
</zone>
</surface>
The metamark element used in this example is further discussed below (12.3.4.2 Metamarks)
Note that in this example we have not included any graphic element corresponding with the zone or surface elements identified in the transcription. The encoder may choose to complement a transcription with graphic representations of its source at whatever level is considered effective, or not at all. Equally, the encoder may choose to provide only graphics without any transcription, to provide only a structured (non-embedded) transcription, or to provide any combination of the three.
This example also lacks any coordinate information to specify either the size of the two newspaper fragments or whereabouts on the parent surface element they are to be found, other than the reading order implicit in their sequence. Such information could be added if desired by specifying a coordinate system on the outermost surface element, and then indicating values within that system for each of the two fragments, as was discussed above. We discuss this in further detail in section 12.2.2.1 Advanced Uses of surface and zone below.
TEI: Advanced Uses of surface and zone⚓︎12.2.2.1 Advanced Uses of <surface> and <zone>
As a child of sourceDoc, the surface element both identifies a specific area containing writing and provides a two dimensional set of coordinates which can be used to position and define dimensions for sub-parts of it. Furthermore, surfaces may nest within other surfaces, as in the case of ‘patches’ or other written materials attached to the main writing surface. In the general case, the position and dimensions of such nested surfaces will be defined using the same coordinate system as that supplied by the parent surface element. It is also possible, however, that a different coordinate system is required for such a nested surface, perhaps because it requires a more complex granularity. We consider both possibilities.
<zone ulx="1" uly="1" lrx="10" lry="10">
<line>Poem</line>
<line>As in Visions of — at</line>
<line>night —</line>
<line>All sorts of fancies running through</line>
<line>the head</line>
</zone>
<zone ulx="4" uly="4" lrx="20" lry="20">
<surface type="newsprint"
attachment="glue" flipping="false">
<zone>Spring has just set in here, and the weather[...] a steamer </zone>
<metamark function="sequence">2</metamark>
</surface>
</zone>
</surface>
<zone ulx="1" uly="1" lrx="10" lry="10">
<line>Poem</line>
<!-- ... -->
<line>the head</line>
</zone>
<zone ulx="4" uly="4" lrx="20" lry="20">
<surface ulx="0" uly="0" lrx="100"
lry="100">
<zone ulx="10" uly="10" lrx="90" lry="95"> Spring has just set in here, and the
weather [...] a steamer </zone>
</surface>
</zone>
</surface>
All of the examples so far given have involved rectangular zones, for clarity of exposition. As noted above, the points attribute may be used to define non-rectangular zones as a series of points. For example, in the last of the Whitman examples discussed in section 12.3.4.2 Metamarks above, we might wish to record the exact shape of the zone containing the metamark Entered. Since this is not a rectangular zone, we use the points attribute to indicate the points defining a polygon which contains it. The values used are expressed in terms of a coordinate space running from 0 to 229 in the X dimension, and 0 to 160 in the Y dimension.
<graphic url="whitman-02.jpg"/>
<zone xml:id="entered"
points="142,122 155,113 178,122 208,144 198,154 178,139"/>
</surface>
uly="16.14" lrx="0" lry="0">
<graphic url="stone.jpg"/>
<zone xml:id="county"
points="4.6,6.3 5.25,5.85 6.2,6.6 8.2,7.4 9.9,6.6 10.9,6.1 11.4,6.7 8.2,8.3 6.2,7.6"/>
</surface>
<surface ulx="50" uly="20" lrx="400"
lry="280">
<!-- ... -->
</surface>
</facsimile>
<surface ulx="50" uly="20" lrx="400"
lry="280">
<zone ulx="0" uly="0" lrx="500" lry="321">
<graphic url="http://upload.wikimedia.org/wikipedia/commons/5/50/Handschrift.karlsruhe.blb.jpg"/>
</zone>
</surface>
</facsimile>
TEI: Scope of Transcriptions⚓︎12.3 Scope of Transcriptions
When transcribing a primary source, whether using text or sourceDoc, scholars may wish to record information concerning individual readings of letters, words, or larger units.They may also wish to include other editorial material, such as comments on the status or possible origin of particular readings, corrections, or text supplied to fill lacunae.
Such elements may also be used for digital transcriptions in which the object is not to represent a finished text, but rather to represent the creative process, as evidenced by different ‘layers’ or ‘traces’ of writing in one or more documents. Transcriptions of this kind are closely focussed on the physical appearance of specific documents, needing to distinguish the traces of different writing activities on them, such as additions and deletions but also other indications of how the writing is to be read, such as indications of transposition, re-affirmation of writing which has been deleted, and so on. Such distinctions are considered of particular importance when dealing with authorial manuscripts, but are also relevant in the case of historical sources such as charters or other legal documents.
In either case, it is customary in transcriptions to register certain features of the source, such as ornamentation, underlining, deletion, areas of damage and lacunae. This chapter provides ways of encoding such information:
- methods of recording editorial or other alterations to the text, such as expansion of abbreviations, corrections, conjectures, etc. (section 12.3.1 Altered, Corrected, and Erroneous Texts)
- methods of describing important extra-linguistic phenomena in the source: unusual spaces, lines, page and line breaks, changes of manuscript hand, etc. (section 12.3.2 Hands and Responsibility)
- methods of representing aspects of layout such as spacing or lines 12.4 Aspects of Layout
- methods of representing material such as running heads, catch-words, and the like (section 12.6 Headers, Footers, and Similar Matter)
The remainder of this chapter describes a model for encoding such transcriptions, in which elements such as mod, del, etc. are used to mark writing traces and their functions within the document. Each such element can be assigned to one or more editorially-defined modification groups, termed a change, by means of a global change attribute, which references a definition for the modification group concerned, typically provided within the TEI header creation element; see further 12.7 Identifying Changes and Revisions. The transcription itself may be embedded within the elements surface and zone described in section 12.1 Digital Facsimiles, or provided in parallel within a text element. Within a zone, the transcription may be organized topographically in terms of lines of writing, using the line element, or in terms of further nested zones, or as a combination of the two (12.2.2 Embedded Transcription).
These recommendations are not intended to meet every transcriptional circumstance likely to be faced by any scholar. Rather, they should be regarded as a base which can be elaborated if necessary by different scholars in different disciplines
As a rule, all elements which may be used in the course of a transcription of a single witness may also be used in a critical apparatus, i.e. within the elements proposed in chapter 13 Critical Apparatus. This can generally be achieved by nesting a particular reading containing tagged elements from a particular witness within the rdg element in an app structure.
Just as a critical apparatus may contain transcriptional elements within its record of variant readings in various witnesses, one may record variant readings in an individual witness by use of the apparatus mechanisms app and rdg. This is discussed in section 13.3 Using Apparatus Elements in Transcriptions.
TEI: Altered, Corrected, and Erroneous Texts⚓︎12.3.1 Altered, Corrected, and Erroneous Texts
In the detailed transcription of any source, it may prove necessary to record various types of actual or potential alteration of the text: expansion of abbreviations, correction of the text (either by author, scribe, or later hand, or by previous or current editors or scholars), addition, deletion, or substitution of material, and similar matters. The sections below describe how such phenomena may be encoded using either elements defined in the core module (defined in chapter 3 Elements Available in All TEI Documents) or specialized elements available only when the module described in this chapter is available.
TEI: Core Elements for Transcriptional Work⚓︎12.3.1.1 Core Elements for Transcriptional Work
In transcribing individual sources of any type, encoders may record corrections, normalizations, additions, and omissions using the elements described in section 3.5 Simple Editorial Changes. Representation of abbreviations and their expansions may also involve use of elements described in section 3.6 Names, Numbers, Dates, Abbreviations, and Addresses. Elements particularly relevant to this chapter include:
- abbr (abbreviation) contains an abbreviation of any sort.
- add (addition) contains letters, words, or phrases inserted in the source text by an author, scribe, or a previous annotator or corrector.
- choice (choice) groups a number of alternative encodings for the same point in a text.
- corr (correction) contains the correct form of a passage apparently erroneous in the copy text.
- del (deletion) contains a letter, word, or passage deleted, marked as deleted, or otherwise indicated as superfluous or spurious in the copy text by an author, scribe, or a previous annotator or corrector.
- expan (expansion) contains the expansion of an abbreviation.
- gap (gap) indicates a point where material has been omitted in a transcription, whether for editorial reasons described in the TEI header, as part of sampling practice, or because the material is illegible, invisible, or inaudible.
- sic (Latin for thus or so) contains text reproduced although apparently incorrect or inaccurate.
All of these elements bear additional attributes for specifying who is responsible for the interpretation represented by the markup, and the associated certainty. In addition, some of them bear an attribute allowing the markup to be categorized by type and source.
- att.editLike provides attributes describing the nature of an encoded scholarly intervention or
interpretation of any kind.
evidence indicates the nature of the evidence supporting the reliability or accuracy of the intervention or interpretation. Suggested values include: 1] internal; 2] external; 3] conjecture - att.global.source provides attributes used by elements to point to an external source.
source specifies the source from which some aspect of this element is drawn. - att.global.responsibility provides attributes indicating the agent responsible for some aspect of the text,
the markup or something asserted by the markup, and the degree of certainty associated
with it.
cert (certainty) signifies the degree of certainty associated with the intervention or interpretation. resp (responsible party) indicates the agency responsible for the intervention or interpretation, for example an editor or transcriber. - att.typed provides attributes that can be used to classify or subclassify elements in any way.
type characterizes the element in some sense, using any convenient classification scheme or typology. subtype (subtype) provides a sub-categorization of the element, if needed.
The specific aspect of the markup described by these attributes differs on different elements; for further discussion, see the relevant sections below, especially section 12.3.2.2 Hand, Responsibility, and Certainty Attributes.
The following sections describe how the core elements just named may be used in the transcription of primary source materials.
TEI: Abbreviation and Expansion⚓︎12.3.1.2 Abbreviation and Expansion
The writing of manuscripts by hand lends itself to the use of abbreviation to shorten scribal labour. Commonly occurring letters, groups of letters, words, or even whole phrases, may be represented by significant marks. This phenomenon of manuscript abbreviation is so widespread and so various that no taxonomy of it is here attempted. Instead, methods are shown which allow abbreviations to be encoded using the core elements mentioned above.
A manuscript abbreviation may be viewed in two ways. One may transcribe it as a particular sequence of letters or marks upon the page: thus, a ‘p with a bar through the descender’, a ‘superscript hook’, a ‘macron’. One may also interpret the abbreviation in terms of the letter or letters it is seen as standing for: thus, ‘per’, ‘re’, ‘n’. Both of these views are supported by these Guidelines.
In many cases the glyph found in the manuscript source also exists in the Unicode character set: for example the common Latin brevigraph ⁊, standing for et and often known as the ‘Tironian et’ can be directly represented in any XML document as the Unicode character with code point U+204A (see further Character References and vi.1. Language Identification). In cases where it does not, these Guidelines recommend use of the g element provided by the gaiji module described in chapter 5 Characters, Glyphs, and Writing Modes. This module allows the encoder great flexibility both in processing and in documenting non-standard characters or glyphs, including the ability to provide detailed documentation and images for them.
this ladder
<!-- elsewhere -->
<charDecl>
<char xml:id="b-er">
<!-- definition for the er brevigraph -->
</char>
<char xml:id="b-per">
<!-- definition for the per brevigraph -->
</char>
</charDecl>
- ex (editorial expansion) contains a sequence of letters added by an editor or transcriber when expanding an abbreviation.
- am (abbreviation marker) contains a sequence of letters or signs present in an abbreviation which are omitted or replaced in the expanded form of the abbreviation.
<am>
<g ref="#b-er"/>
</am>
<ex>er</ex>
</choice>y <choice>
<am>
<g ref="#b-per"/>
</am>
<ex>per</ex>
</choice>sone ...
As implied in the preceding discussion, making decisions about which of these various methods of representing abbreviation to use will form an important part of an encoder's practice. As a rule, the abbr and am elements should be preferred where it is wished to signify that the content of the element is an abbreviation, without necessarily indicating what the abbreviation may stand for. The ex and expan elements should be used where it is wished to signify that the content of the element is not present in the source but has been supplied by the transcriber, without necessarily indicating the abbreviation used in the original. The decision as to which course of action is appropriate may vary from abbreviation to abbreviation; there is no requirement that the same system be used throughout a transcription, although doing so will generally simplify processing. The choice is likely to be a matter of editorial policy. If the highest priority is to transcribe the text literatim (letter by letter), while indicating the presence of abbreviations, the choice will be to use abbr or am throughout. If the highest priority is to present a reading transcription, while indicating that some letters or words are not actually present in the original, the choice will be to use ex or expan throughout.
the final d could signify the plural ending (-es, -is, -ys>) but the
singular <hi rend="it">goode</hi> was used with the meaning <q>property</q>,
<q>wealth</q>, at this time (v. examples quoted in OED, sb. Good, C. 7, b,
c, d and 8 spec.)</note>
<choice>
<sic>goo<abbr>ɗ</abbr>
</sic>
<expan resp="#mp" cert="high">good<ex>e</ex>
</expan>
</choice> I was
welbeloued
If more than one expansion for the same abbreviation is to be recorded, multiple notes may be supplied. It may also be appropriate to use the markup for critical apparatus; an example is given in section 13.3 Using Apparatus Elements in Transcriptions.
TEI: Correction and Conjecture⚓︎12.3.1.3 Correction and Conjecture
ostendimus quod nutrimentum et
<choice>
<sic>angues</sic>
<corr>augens</corr>
</choice>.
Note that the corr element is used to provide a corrected form which is not present in the source; in the case of a correction made in the source itself, whether scribal, authorial, or by some other hand, the add, del, and subst elements described in 12.3.1.4 Additions and Deletions should be used.
the word create for we create nothing <supplied>we</supplied> develope.
As with expan and abbr, the choice as to whether to record simply that there is an apparent error, or simply that a correction has been applied, or to record both possible readings within a choice element is left to the encoder. The decision is likely to be a matter of editorial policy, which might be applied consistently throughout or decided case by case. If the highest priority is to present an uncorrected transcription while noting perceived errors in the original, the choice will typically be to use only sic throughout. If the highest priority is to present a reading transcription, while indicating that perceived errors in the original have been corrected, the choice will be to use only corr throughout.
Further information may be attached to instances of these elements by the note element and resp and cert attributes. Instances of these elements may also be classified according to any convenient typology using the type attribute.
membres maad, of generacioun And of so parfit wis a <choice xml:id="corr117">
<sic>wight</sic>
<corr>wright</corr>
</choice> ywroght?
<!-- ... -->
<note target="#corr117">This emendation of the Hengwrt copy text, based on a Latin
source and on the reading of three late and usually unauthoritative
manuscripts, was proposed by E. Talbot Donaldson in
<bibl>
<title>Speculum</title> 40 (1965) 626–33.</bibl>
</note>
<!-- somewhere in the header ... --><name xml:id="ETD">E Talbot Donaldson</name>
<!-- ... --> And of so parfit wis a <choice>
<sic>wight</sic>
<corr resp="#ETD" cert="medium">wright</corr>
</choice> ywroght?
<sic>mens</sic>
<corr>iners</corr>
</choice> que nutu dei gesta
sunt ... unde esset uiriliter
<choice xml:id="sic-2">
<corr>uegetata</corr>
<sic>negata</sic>
</choice>
graphically what the scribe should be copying but which does not make sense in
the context.</note>
<sic>mens</sic>
<corr type="graphSubs">iners</corr>
</choice> que nutu dei gesta sunt ... unde
esset uiriliter
<choice>
<corr type="graphSubs">uegetata</corr>
<sic>negata</sic>
</choice>
<sic>mens</sic>
<corr type="graphSubs">iners</corr>
<corr type="reversal">inres</corr>
</choice> que
nutu dei gesta sunt ...
<p>The following codes are used to categorize corrections identified in this
transcription: <list type="gloss">
<label>graphSubs</label>
<item>Substitution of a more familiar word which resembles graphically
what the scribe should be copying but which does not make sense in the
context.</item>
<!-- ... -->
</list>
</p>
</correction>
For a given project, it may well be desirable to limit the possible values for the type or subtype attributes automatically. This is easily done but requires customization of the TEI system using techniques described in 24.3 Customization, in particular 24.3.1.3 Modification of Attribute and Attribute Value Lists, which should be consulted for further information on this topic.
parfit wis a <choice>
<sic>wight</sic>
<corr resp="#mp" source="#Gg">wyf</corr>
</choice> ywroght?
Gg
. Each witness will be represented either by a witness element (see 13.1 The Apparatus Entry, Readings, and Witnesses) or more fully by an msDesc element (see 11 Manuscript Description):
<msIdentifier>
<settlement>Cambridge</settlement>
<repository>University Library</repository>
<idno>Gg.1. 27</idno>
</msIdentifier>
<!-- further description of the manuscript here -->
</msDesc>
<rdg wit="#Hg">wight</rdg>
<rdg wit="#Ln #Ry2 #Ld">wright</rdg>
<rdg wit="#Gg">wyf</rdg>
</app>
<rdg wit="#Hg">wight</rdg>
<rdg wit="#Ln #Ry2 #Ld">
<corr resp="#ETD">wright</corr>
</rdg>
<rdg wit="#Gg">
<corr resp="#mp">wyf</corr>
</rdg>
</app>
Like the resp attribute, the cert attribute may be used with both corr and rdg elements. When used on the rdg element, these attributes indicate confidence in and responsibility for identifying the reading within the sources specified; when used on the corr element they indicate confidence in and responsibility for the use of the reading to correct the base text. If no other source is indicated (either by the source attribute, or by the wit attribute of a parent rdg), the reading supplied within a corr has been provided by the person indicated by the resp attribute.
If it is desired to express certainty of or responsibility for some other aspect of the use of these elements, then the mechanisms discussed in chapter 22 Certainty, Precision, and Responsibility may be found useful. See also 12.3.2.2 Hand, Responsibility, and Certainty Attributes for further discussion of the issues of certainty and responsibility in the context of transcription.
TEI: Additions and Deletions⚓︎12.3.1.4 Additions and Deletions
Additions and deletions observed in a source text may be described using the following elements:
- add (addition) contains letters, words, or phrases inserted in the source text by an author, scribe, or a previous annotator or corrector.
- addSpan (added span of text) marks the beginning of a longer sequence of text added by an author, scribe, annotator or corrector (see also add).
- del (deletion) contains a letter, word, or passage deleted, marked as deleted, or otherwise indicated as superfluous or spurious in the copy text by an author, scribe, or a previous annotator or corrector.
- delSpan (deleted span of text) marks the beginning of a longer sequence of text deleted, marked as deleted, or otherwise signaled as superfluous or spurious by an author, scribe, annotator, or corrector.
Of these, add and del are included in the core module, while addSpan and delSpan are available only when using the module defined in this chapter. These particular elements are members of the att.spanning class, from which they inherit the following attribute:
- att.spanning provides attributes for elements which delimit a span of text by pointing mechanisms
rather than by enclosing it.
spanTo indicates the end of a span initiated by the element bearing this attribute.
Further characteristics of each addition and deletion, such as the hand used, its effect (complete or incomplete, for example), or its position in a sequence of such operations may conveniently be recorded as attributes of these elements, all of which are members of the att.transcriptional class:
- att.transcriptional provides attributes specific to elements encoding authorial or scribal intervention
in a text when transcribing manuscript or similar sources.
seq (sequence) assigns a sequence number related to the order in which the encoded features carrying this attribute are believed to have occurred. status indicates the effect of the intervention, for example in the case of a deletion, strikeouts which include too much or too little text, or in the case of an addition, an insertion which duplicates some of the text already present. Sample values include: 1] duplicate; 2] duplicate-partial; 3] excessStart; 4] excessEnd; 5] shortStart; 6] shortEnd; 7] partial; 8] unremarkable hand [att.written] points to a handNote element describing the hand considered responsible for the content of the element concerned.
at first sight. Others — and here is one of them — <add hand="#mb">do ever</add>
improve by recognition [...]
<handNote xml:id="mb">Max Beerbohm
holograph</handNote>
<handNote xml:id="dhl">D H Lawrence holograph</handNote>
If deletions are classified systematically, the type attribute may be useful to indicate the classification; when they are classified by the manner in which they were effected, or by their appearance, however, this will lead to a certain arbitrariness in deciding whether to use the type or the rend attribute to hold the information. In general, it is recommended that the rend attribute be used for description of the appearance or method of deletion, and that the type attribute be reserved for higher level or more abstract classifications.
RG
somewhere:
<person xml:id="RG">
<!-- information about Robert Graves here -->
</person>
</listPerson>
dictionary so much as a corpus of precedents <del hand="#RG">in the</del>:
current, obsolete, <add hand="#RG" place="above">cant,</add> cataphretic and
nonce-words are all included.
<add hand="#RG" place="above">for an abridgement</add>
</del> in explanation...
<subst>
<add>T</add>
<del>t</del>
</subst>he expressed
The add and del elements defined in the core module suffice only for the description of additions and deletions which fit within the structure of the text being transcribed, that is, which each deletion or addition is completely contained by the structural element (paragraph, line, division) within which it occurs. Where this is not the case, for example because an individual addition or deletion involves several distinct structural subdivisions, such as poems or prose items, or otherwise crosses a structural boundary in the text being encoded, special treatment is needed. The addSpan and delSpan elements are provided by this module for that purpose. (For a general discussion of the issue see further 21 Non-hierarchical Structures).
scribe="HelgiÓlafsson"/>
<!-- ... -->
<body>
<div>
<!-- text here -->
</div>
<addSpan n="added gathering" hand="#heol"
spanTo="#p025"/>
<div>
<!-- text of first added poem here -->
</div>
<div>
<!-- text of second added poem here -->
</div>
<div>
<!-- text of third added poem here -->
</div>
<div>
<!-- text of fourth added poem here -->
</div>
<anchor xml:id="p025"/>
<div>
<!-- more text here -->
</div>
</body>
<delSpan spanTo="#EPdelEnd" resp="#EP"
rend="strikethrough"/>
<l>To where Saint Mary Woolnoth kept the time,</l>
<l>With a dead sound on the final stroke of nine.</l>
<anchor xml:id="EPdelEnd"/>
<l>There I saw one I knew, and stopped him, crying "Stetson!</l>...
<delSpan rend="verticalStrike"
spanTo="#delend01"/> Tis moonlight
<del>upon</del>
<add>over</add> Oman's sky
</l>
<l>Her isles of pearl look lovelily<anchor xml:id="delend01"/>
</l>
The text deleted must be at least partially legible, in order for the encoder to be able to transcribe it. If all of part of it is not legible, the gap element should be used to indicate where text has not been transcribed, because it could not be. The unclear element described in section 12.3.3.1 Damage, Illegibility, and Supplied Text may be used to indicate areas of text which cannot be read with confidence. See further section 12.3.1.7 Text Omitted from or Supplied in the Transcription and section 12.3.3.1 Damage, Illegibility, and Supplied Text.
TEI: Substitutions⚓︎12.3.1.5 Substitutions
Substitution of one word or phrase for another is perhaps the most common of all phenomena requiring special treatment in transcription of primary textual sources. It may be simply one word written over the top of another, or deletion of one word and its replacement by another written above it by the same hand on the same occasion; the deletion and replacement may be done by different hands at different times; there may be a long chain of substitutions on the same stretch of text, with uncertainty as to the order of substitution and as to which of many possible readings should be preferred.
- subst (substitution) groups one or more deletions (or surplus text) with one or more additions when the combination is to be regarded as a single intervention in the text.
- substJoin (substitution join) identifies a series of possibly fragmented additions, deletions, or other revisions on a manuscript that combine to make up a single intervention in the text.
<delSpan rend="verticalStrike"
spanTo="#delend02"/> Tis moonlight
<subst>
<del>upon</del>
<add>over</add>
</subst> Oman's sky
</l>
<l>Her isles of pearl look lovelily<anchor xml:id="delend02"/>
</l>
with <subst>
<del seq="1">this</del>
<del seq="2">
<add seq="1">such a</add>
</del>
<add seq="2">a</add>
</subst> system, to appreciate its advantages.
A special case of a substitution may consist of a superfluous word or phrase that is silently replaced by some addition. E.g. a scribe abandons a word (without indicating it should be deleted), and then writes a replacement word immediately after. Here the encoder may interpret this as an ‘unmarked’ deletion which can then be combined with a corresponding addition to a substitution.
fann'd
<substJoin target="#change1 #change2"/>
<l>
<subst>
<del>Helping the worst amongst us</del>
<add>Dragging the worst amongt
us</add>
</subst>, who'd no boots
</l>
<l>But limped on, blood-shod. All went lame; <subst>
<del status="shortEnd">half-</del>
<add>all</add>
</subst> blind;</l>
<l>Drunk with fatigue ; deaf even to the hoots</l>
<l>Of tired, outstripped <del>fif</del> five-nines that dropped
behind.</l>
- the authorial slip (amongt for amongst) is retained without comment.
- the other two authorial corrections are marked as substitutions, each combining a deletion and an addition.
- the false start fif in the last line is simply marked as a deletion;
<rdg varSeq="1">
<del>this</del>
</rdg>
<rdg varSeq="2">
<del>
<add>such a</add>
</del>
</rdg>
<rdg varSeq="3">
<add>a</add>
</rdg>
</app> system, to appreciate its advantages.
TEI: Cancellation of Deletions and Other Markings⚓︎12.3.1.6 Cancellation of Deletions and Other Markings
An author or scribe may mark a word or phrase in some way, and then on reflection decide to cancel the marking. For example, text may be marked for deletion and the deletion then cancelled, thus restoring the deleted text. Such cancellation may be indicated by the restore element:
- restore (restore) indicates restoration of text to an earlier state by cancellation of an editorial or authorial marking or instruction.
This element bears the same attributes as the other transcriptional elements. These may be used to supply further information such as the hand in which the restoration is carried out, the type of restoration, and the person responsible for identifying the restoration as such, in the same way as elsewhere.
Another feature commonly encountered in manuscripts is the use of circles, lines, or arrows to indicate transposition of material from one point in the text to another. No specific markup for this phenomenon is proposed at this time. Such cases are most simply encoded as additions at the point of insertion and deletions at the point of encirclement or other marking.
TEI: Text Omitted from or Supplied in the Transcription⚓︎12.3.1.7 Text Omitted from or Supplied in the Transcription
Where text is not transcribed, whether because of damage to the original, or because it is illegible, or for some other reason such as editorial policy, the gap core element may be used to register the omission; where such text is transcribed, but the editor wishes to indicate that they consider it to be superfluous, for example because it is an inadvertent scribal repetition or an interpolation from another source, the surplus element may be used in preference. Where the editor believes text to be interpolated but genuine, the secl element may be used instead. Where text not present in the source is supplied (whether conjecturally or from other witnesses) to fill an apparent gap in the text, the supplied element may be used.
- gap (gap) indicates a point where material has been omitted in a transcription, whether
for editorial reasons described in the TEI header, as part of sampling practice, or
because the material is illegible, invisible, or inaudible.
reason (reason) gives the reason for omission. Suggested values include: 1] cancelled (cancelled); 2] deleted (deleted); 3] editorial (editorial); 4] illegible (illegible); 5] inaudible (inaudible); 6] irrelevant (irrelevant); 7] sampling (sampling) agent (agent) in the case of text omitted because of damage, categorizes the cause of the damage, if it can be identified. Sample values include: 1] rubbing (rubbing); 2] mildew (mildew); 3] smoke (smoke) - surplus (surplus) marks text present in the source which the editor believes to be superfluous
or redundant.
reason one or more words indicating why this text is believed to be superfluous, e.g. repeated, interpolated etc. - secl (secluded text) Secluded. Marks text present in the source which the editor believes
to be genuine but out of its original place (which is unknown).
reason one or more words indicating why this text has been secluded, e.g. interpolated etc. - supplied (supplied) signifies text supplied by the transcriber or editor for any reason; for
example because the original cannot be read due to physical damage, or because of
an obvious omission by the author or scribe.
reason one or more words indicating why the text has had to be supplied, e.g. overbinding, faded-ink, lost-folio, omitted-in-original.
unit="word"/>Sydney Smith
As noted above, the gap element should only be used where text has not been transcribed. If partially legible text has been transcribed, one of the elements damage and unclear should be used instead (these elements are described in section 12.3.3.1 Damage, Illegibility, and Supplied Text); if the text is legible and has been transcribed, but the editor wishes to indicate that they regard it is superfluous or redundant, then the element surplus may be used in preference to the core element sic used to indicate text regarded as erroneous.
Amongst the many examples cited in Hans Krummrey & Silvio Panciera's classic text on the editing of epigraphic inscriptions is the following. In a late classical inscription, the form ‘dedikararunt’ is encountered. The editor may choose any of the following three possibilities:
- mark this as an erroneous form
- additionally supply a corrected form
- indicate that the erroneous form contains surplus characters which the editor wishes to suppress
tradimento</surplus>
</l>
<l n="5">sì com' l'uccellator prende l'uccello</l>
<gap/>
<l n="43">e lettere dintorno che diriano <surplus reason="interpolated">in questa
guisa</surplus>
</l>
<l n="44">Più v'amo, dëa, che non faccio Deo</l>
<supplied reason="illegible" resp="#msm"
source="#Ry2">very humble
Servt</supplied> Sydney Smith
TEI: Hands and Responsibility⚓︎12.3.2 Hands and Responsibility
This section discusses in more detail the representation of aspects of responsibility perceived or to be recorded for the writing of a primary source. These include points at which one scribe takes over from another, or at which ink, pen, or other characteristics of the writing change. A discussion of the usage of the hand, resp, and cert attributes is also included.
TEI: Document Hands⚓︎12.3.2.1 Document Hands
For many text-critical purposes it is important to signal the person responsible (the hand) for the writing of a whole document, a stretch of text within a document, or a particular feature within the document. A hand, as the name suggests, need not necessarily be identified with a particular known (or unknown) scribe or author; it may simply indicate a particular combination of writing features recognized within one or more documents. The examples given above of the use of the hand attribute with coding of additions and deletions illustrate this.
The handNote element is used to provide information about each hand distinguished within the encoded document.
- handNote (note on hand) describes a particular style or hand distinguished within a manuscript.
A handNote element, with an identifier given by its xml:id attribute, may appear in either of two places in the TEI header, depending on which modules are included in a schema. When the transcr module defined by the present chapter is used, the element handNotes is available, within the profileDesc element of the TEI header, to hold one or more handNote elements. When the msdescription module defined in chapter 11 Manuscript Description is included, the handDesc element described in 11.7.2 Writing, Decoration, and Other Notations also becomes available as part of a structured manuscript description. The encoder may choose to place handNote elements identifying individual hands in either location without affecting their accessibility since the element is always addressed by means of its xml:id attribute. The handDesc element may be more appropriate when a full cataloguing of each manuscript is required; the handNotes element if only a brief characterization of each hand is needed. It is also possible to use the two elements together if, for example, the handDesc element contains a single summary describing all the hands discursively, while the handNotes element gives specific details of each. The choice will depend on individual encoders' priorities.
As shown above, the hand attribute is available on several elements to indicate the hand in which the content of the element (usually a deletion or addition) is carried out. The handShift element may also be used within the body of a transcription to indicate where a change of hand is detected for whatever reason.
- handShift (handwriting shift) marks the beginning of a sequence of text written in a new hand, or the beginning of a scribal stint.
A handShift element can be used to indicate a change of hand even within an element with a hand attribute. The text following the handShift must be considered to be in the new hand.
Both handShift and handNote are members of the att.handFeatures class, and thus share the following attributes:
- att.handFeatures provides attributes describing aspects of the hand in which a manuscript is written.
scribe gives a name or other identifier for the scribe believed to be responsible for this hand. script characterizes the particular script or writing style used by this hand, for example secretary, copperplate, Chancery, Italian, etc. scribeRef points to a full description of the scribe concerned, typically supplied by a person element elsewhere in the description. scriptRef points to a full description of the script or writing style used by this hand, typically supplied by a scriptNote element elsewhere in the description. medium describes the tint or type of ink, e.g. brown, or other writing medium, e.g. pencil. scope specifies how widely this hand is used in the manuscript.
A single hand may employ different writing styles and inks within a document, or may change character. For example, the writing style might shift from ‘anglicana’ to ‘secretary’, or the ink from blue to brown, or the character of the hand may change. Simple changes of this kind may be indicated by assigning a new value to the appropriate attribute within the handShift element. It is for the encoder to decide whether a change in these properties of the writing style is so marked as to require treatment as a distinct hand.
Where such a change is to be identified, the new attribute indicates the hand applicable to the material following the handShift. The sequence of such handShift elements will often, but not necessarily, correspond with the order in which the material was originally written. Where this is not the case, the facilities described in section 12.7 Identifying Changes and Revisions may be found helpful.
As might be expected, a single hand may also vary renditions within the same writing style, for example medieval scribes often indicate a structural division by emboldening all the words within a line. Such changes should be indicated by use of the rend attribute, in the same manner as underlining, emboldening, font shifts, etc. are represented in transcription of a printed text, rather than by introducing a new handShift element.
<handShift medium="greenish-ink"/>
<l>And if the cattes skynne be slyk <handShift medium="black-ink"/> and
gaye</l>
<handNote xml:id="h1" script="copperplate"
medium="brown-ink">Carefully written
with regular descenders</handNote>
<handNote xml:id="h2" script="print"
medium="pencil">Unschooled
scrawl</handNote>
</handNotes>
more introduced and Established in this Parish according to the Rules and
Ceremonies of the Church of England and as under a good Consciencious and sober
Curate there would and ought to be <handShift new="#h2" resp="#das"/> and for that
purpose the parishioners pray
When a more precise or nuanced discussion of the writing in a manuscript is required, the handNote and scriptNote elements discussed in 11.7.2 Writing, Decoration, and Other Notations should be used. Either element may serve as the target for a handShift.
TEI: Hand, Responsibility, and Certainty Attributes⚓︎12.3.2.2 Hand, Responsibility, and Certainty Attributes
<choice>
<sic>One</sic>
<corr resp="#FB">one</corr>
</choice> must have lived ...
<!-- elsewhere -->
<respStmt xml:id="FB">
<resp>editorial changes</resp>
<name>Fredson Bowers</name>
</respStmt>
<respStmt xml:id="WJ">
<resp>authorial changes</resp>
<name>William James</name>
</respStmt>
The resp attribute, by contrast, indicates the person responsible for deciding to mark up this part of the text with this particular element. In the case of the add element, for example, the resp attribute will indicate the responsibility for identifying that the addition is indeed an addition, and also (if the hand attribute is supplied) to which hand it should be attributed. In this case, Bowers is credited with identifying the hand as that of William James. In the case of the corr element, the resp attribute indicates who is responsible for supplying the intellectual content of the correction reported in the transcription: here, Bowers' correction of ‘One’ to ‘one’. In the case of a deletion, the resp attribute will similarly indicate who bears responsibility for identifying or categorizing the deletion itself, while other attributes (hand most obviously) attribute responsibility for the deletion itself. It should be noted that the source attribute may be used in a simiilar fashion to indicate, for example, when an encoding decision is based on the work of a previous editor or on an article. In that case, the source would point to a bibl in the bibliography.
In cases where both the resp and cert attributes are defined for a particular element, the two attributes refer to the same aspect of the markup. The one indicates who is intellectually responsible for some item of information, the other indicates the degree of confidence in the information. Thus, for a correction, the resp attribute signifies the person responsible for supplying the correction, while the cert attribute signifies the degree of editorial confidence felt in that correction. For the expansion of an abbreviation, the resp attribute signifies the person responsible for supplying the expansion and the cert attribute signifies the degree of editorial confidence felt in the expansion.
This close definition of the use of the resp and cert attributes with each element is intended to provide for the most frequent circumstances in which encoders might wish to make unambiguous statements regarding the responsibility for and certainty of aspects of their encoding. The resp and cert attributes, as so defined, give a convenient mechanism for this. However, there will be cases where it is desirable to state responsibility for and certainty concerning other aspects of the encoding. For example, one may wish in the case of an apparent addition to state the responsibility for the use of the add element, rather than the responsibility for identifying the hand of the addition. It may also be that one editor may make an electronic transcription of another editor's printed transcription of a manuscript text—here, one will wish to assign layers of responsibility, so as to allow the reader to determine exactly what in the final transcription was the responsibility of each editor. In these complex cases of divided editorial responsibility for and certainty concerning the content, attributes, and application of a particular element, the more general mechanisms for representing certainty and responsibility described in chapter 22 Certainty, Precision, and Responsibility should be used.
<corr xml:id="c117">wright</corr>
<sic>wight</sic>
</choice>
<certainty target="#c117" locus="value"
degree="0.7"/>
<respons target="#c117" locus="value"
resp="#ETD"/>
The above discussion supposes that in each case an encoder is able to specify exactly what it is that one wishes to state responsibility for and certainty about. Situations may arise when an encoder wishes to make a statement concerning certainty or responsibility but is unable or unwilling to specify so precisely the domain of the certainty or responsibility. In these cases, the note element may be used with the type attribute set to ‘cert’ or ‘resp’ and the content of the note giving a prose description of the state of affairs.
TEI: Damage and Conjecture⚓︎12.3.3 Damage and Conjecture
The carrier medium of a primary source may often sustain physical damage which makes parts of it hard or impossible to read. In this section we discuss elements which may be used to represent such situations and give recommendations about how these should be used in conjunction with the other related elements introduced previously in this chapter.
TEI: Damage, Illegibility, and Supplied Text⚓︎12.3.3.1 Damage, Illegibility, and Supplied Text
The gap and supplied elements described above (section 12.3.1.7 Text Omitted from or Supplied in the Transcription) should be used with appropriate attributes where the degree of damage or illegibility in a text is such that nothing can be read and the text must be either omitted or supplied conjecturally or from one or more other sources. In many cases, however, despite damage or illegibility, the text may yet be read with reasonable confidence. In these cases, the following elements should be used:
- damage (damage) contains an area of damage to the text witness.
- damageSpan (damaged span of text) marks the beginning of a longer sequence of text which is damaged in some way but still legible.
As members of the class att.damaged, these elements bear the following attributes:
- att.damaged provides attributes describing the nature of any physical damage affecting a reading.
hand [att.written] points to a handNote element describing the hand considered responsible for the content of the element concerned. agent categorizes the cause of the damage, if it can be identified. Sample values include: 1] rubbing; 2] mildew; 3] smoke degree provides a coded representation of the degree of damage, either as a number between 0 (undamaged) and 1 (very extensively damaged), or as one of the codes high, medium, low, or unknown. The damage element with the degree attribute should only be used where the text may be read with some confidence; text supplied from other sources should be tagged as supplied. group assigns an arbitrary number to each stretch of damage regarded as forming part of the same physical phenomenon.
The class att.damaged is a subclass of the class att.dimensions, itself a subclass of the class att.ranging. Consequently these elements also therefore bear at least the following attributes:
- att.dimensions provides attributes for describing the size of physical objects.
extent indicates the size of the object concerned using a project-specific vocabulary combining quantity and units in a single string of words. unit names the unit used for the measurement Suggested values include: 1] cm (centimetres); 2] mm (millimetres); 3] in (inches); 4] line; 5] char (characters) quantity specifies the length in the units specified - att.ranging provides attributes for describing numerical ranges.
min where the measurement summarizes more than one observation or a range, supplies the minimum value observed. max where the measurement summarizes more than one observation or a range, supplies the maximum value observed. atLeast gives a minimum estimated value for the approximate measurement. atMost gives a maximum estimated value for the approximate measurement.
From the att.spanning class, damageSpan inherits the following additional attribute:
- att.spanning provides attributes for elements which delimit a span of text by pointing mechanisms
rather than by enclosing it.
spanTo indicates the end of a span initiated by the element bearing this attribute.
The following examples all refer to the recto of folio 5 of the unique manuscript of the Elder Edda. Here, the manuscript of Vóluspá has been damaged through irregular rubbing so that letters in various places are obscured and in some cases cannot be read at all.
<!-- ... -->
<pb n="5r"/>
<damageSpan agent="rubbing"
extent="whole leaf" spanTo="#damageEnd"/>
</p>
<p> [...] </p>
<p> [...] <pb n="5v" xml:id="damageEnd"/>
</p>
<l>Moves <damage agent="water" group="1">on: nor all your</damage> Piety nor
Wit</l>
<l>
<damageSpan agent="water" group="1"
spanTo="#washOut"/>Shall lure it back to
cancel half a Line,
</l>
<l>Nor all your Tears wash <anchor xml:id="washOut"/> out a Word of it</l>
A more general solution to this problem is provided by the join element discussed in 17.7 Aggregation which may be used to link together arbitrary elements of any kind in the transcription. Here, several phenomena of illegibility and conjecture all result from a single cause: an area of damage to the text caused by rubbing at various points. The damage is not continuous, and affects the text at irregular points. In cases such as this, the join element may be used to indicate which tagged features are part of the same physical phenomenon.
<supplied source="#msm">aga</supplied>
</damage>
yndisniota
dreki fliugandi naþr frann neþan <gap reason="illegible" agent="rubbing"
quantity="4" unit="letter"/>
<unclear>and the proof of this is</unclear>
<gap/>
<unclear>margin</unclear>
</damage>
TEI: Use of the gap, del, damage, unclear, and supplied Elements in Combination⚓︎12.3.3.2 Use of the <gap>, <del>, <damage>, <unclear>, and <supplied> Elements in Combination
The gap, damage, unclear, supplied, and del elements may be closely allied in their use. For example, an area of damage in a primary source might be encoded with any one of the first four of these elements, depending on how far the damage has affected the readability of the text. Further, certain of the elements may nest within one another. The examples given in the last sections illustrate something of how these elements are to be distinguished in use. This may be formulated as follows:
- where the text has been rendered completely illegible by deletion or damage and no text is supplied by the editor in place of what is lost: place an empty gap element at the point of deletion or damage. Note that the gap could be wrapped in a del or damage element. Use the reason attribute to state the cause (damage, deletion, etc.) of the loss of text.
- where the text has been rendered completely illegible by deletion or damage and text is supplied by the editor in place of what is lost: surround the text supplied at the point of deletion or damage with the supplied element. Use the reason attribute to state the cause (damage, deletion, etc.) of the loss of text leading to the need to supply the text.
- where the text has been rendered partly illegible by deletion or damage so that the text can be read but without perfect confidence: transcribe the text and surround it with the unclear element. Use the reason attribute to state the cause (damage, deletion, etc.) of the uncertainty in transcription and the cert attribute to indicate the confidence in the transcription.
- where there is deletion or damage but at least some of the text can be read with perfect confidence: transcribe the text and surround it with the del element (for deletion) or the damage element (for damage). Use appropriate attribute values to indicate the cause and type of deletion or damage. Observe that the degree attribute on the damage element permits the encoding to show that a letter, word, or phrase is not perfectly preserved, though it may be read with confidence.
- where there is an area of deletion or damage and parts of the text within that area can be read with perfect confidence, other parts with less confidence, other parts not at all: in transcription, surround the whole area with the del element (for deletion; or the delSpan element where it crosses a structural boundary); or the damage element (for damage). Text within the damaged area which can be read with perfect confidence needs no further tagging. Text within the damaged area which cannot be read with perfect confidence may be surrounded with the unclear element. Places within the damaged area where the text has been rendered completely illegible and no text is supplied by the editor may be marked with the gap element. For each element, one may use appropriate attribute values to indicate the cause and type of deletion or damage and the certainty of the reading.
The rules for combinations of the add and del elements, and for the interpretation of such combinations, are similar:
- if one add element (with identifier ADD1) contains another (with identifier ADD2), then the addition ADD1 was first made to the text, and later a second addition (ADD2) was made within that added text: This is the text <add xml:id="ADD1">with some added <add xml:id="ADD2">(interlinear!)</add> material</add> as
written. - if one del element contains another, and the seq attribute does not indicate otherwise, it should be assumed that the inner deletion was made before the enclosing one. In the following example, the word redundant was deleted before a second deletion removed the entire passage:
- if a del element contains an add element, the normal interpretation will be that an addition was made within a passage which was later deleted in its entirety:
- if an add element contains a del element, the normal interpretation will be that a deletion was made from a passage which had earlier been added:
- When some text has been blackened out so thoroughly that can no longer be read, the
encoding should be: runs out the door in <gap extent="1 word"/> shirt
- For consistency, one might prefer to encode the deletion as such, using del, and containing a gap, as in the following example:
- This is something that would be necessary if one wanted to encode a subst including an illegible deletion:
- If some parts of the deleted text are readable, and other parts unreadable, it should
be encoded as in the following example: Billy in The <del>
<gap extent="1 character"/>ng<gap extent="2 characters"/>
</del> Silver Dollar..
TEI: Marking up the Writing Process⚓︎12.3.4 Marking up the Writing Process
Modifications of various kinds (correction, addition, deletion, etc.) are frequently found within a single document, and may also be inferred when different documents are compared, although it may be an open question as to whether inter-document discrepancies should be regarded in the same way as intra-document alterations. When two witnesses are collated, we may observe that a word present in one is missing from the other: this does not necessarily imply that the word was added to the first witness, nor that it was deleted from the other.
In this section we discuss a number of elements which may be useful when attempting to record traces of the writing process within a document.
TEI: Generic Modification⚓︎12.3.4.1 Generic Modification
Most, if not all, transcriptional elements imply a certain level of semantic interpretation. For instance, using the add element to encode a word or phrase that occupies interlinear space involves a decision that it has been deliberately inserted as an addition rather than an alternative, and indeed a judgment that it was written after, rather than before, the other lines. Where it is felt desirable to keep the recording of ‘what is on the page’ entirely separate from ‘what is the editor’s interpretation’, the generic mod element may be preferred.
- mod represents any kind of modification identified within a single document.
This element simply indicates any kind of modification that has been identified in the document, without prejudice as to its function. Occurrences of the mod element may be categorized by means of their type attribute, and visual aspects of their appearance can be described by means of the rend attribute, but they provide no further interpretation of the function or intention of the passage so marked up. The spanTo attribute may be used to indicate the end of a modified passage if this extends across the boundaries of some other XML element, for example from the middle of one line tagged as a line to the middle of another line some distance further on in the document.
spanTo="#enduw"/>words with wavy
underline</line> <!-- more lines here -->
<line>wavy underlining finishes
here<anchor xml:id="enduw"/> more words</line>
The distinction between an example such as that above and the simple use of hi to mark the visual salience of the underlining (apart from the use of the spanTo attribute) is that hi does not imply that the visual effect being recorded is understood to represent some kind of modification.
TEI: Metamarks⚓︎12.3.4.2 Metamarks
By metamark we mean marks such as numbers, arrows, crosses, or other symbols introduced by the writer into a document expressly for the purpose of indicating how the text is to be read. Such marks thus constitute a kind of markup of the document, rather than forming part of the text.
- metamark contains or describes any kind of graphic or written signal within a document the
function of which is to determine how it should be read rather than forming part of
the actual content of the document.
function describes the function (for example status, insertion, deletion, transposition) of the metamark. target identifies one or more elements to which the metamark applies.
Unlike marginal notes or other additions to the text, metamarks are used by the writer to indicate a deliberate alteration of the writing itself, such as ‘move this passage over there’. An addition or annotation by contrast would typically concern some property of the passage other than its intended location or status within the text flow. A metamark may contain text, or some other graphic which the encoder wishes to represent, or it may simply consist of arrows, dots, lines etc. which the encoder simply describes.
The metamark element carries a function attribute which specifies the function of the metamark, using values such as reorder, flag, delete, insert or used. The passage to which the metamark applies may be indicated in either of two ways: the target attribute may be used to point to the element or elements containing the passage concerned, or the spanTo element may be used to point to a position in the document at which the passage concerned finishes. In the latter case, the metamark itself must be supplied at the position in the document where the passage concerned begins; in the former case it may be supplied at any convenient point. Both attributes should not be supplied.
The following example is taken from an 15th century legal book from the city of Göttingen, containing regulations of everyday life issued by the city council
<metamark function="flag" target="#zone-1"
change="#L2">lege</metamark>
<zone xml:id="zone-1" change="#L1">Ock en schullen de bruwere des hilgen dages
nicht over setten noch uppe den stillen fridach bruwen.</zone>
<addSpan spanTo="#endDel" change="#L2"/>
<zone>Noch nymande over setten, se en sehin denne erst, dat uppe den bonen neyn
stro noch, huw noch flaß ligghe, by pine eyner halven roden, deme bruwere so
wol alse dem bruwheren to murende.</zone>
<anchor xml:id="endDel"/>
The change attribute used here to indicate the sequencing of these various interventions is discussed below, in section 12.7 Identifying Changes and Revisions. The elements addSpan and delSpan are discussed in section 12.3.1.4 Additions and Deletions.
The metamark element may also be used to encode the symbols etc. often found in marked-up proofs such as the following, taken from the Walt Whitman archive:
In this example, the whole of what was originally the 14th section of the poem has been marked for deletion, both by horizontal and vertical lines, and by the proofreading mark resembling the ‘deleatur’ or ‘dele’ deletion symbol to left and right of the section. The deletion itself might be encoded by using the normal del or delSpan element, and the metamarks by the metamark element. This is quite a different case from that of the next example, in which the writer does not intend to suppress the content, but only to mark that it has been copied to another manuscript or reused.
This page contains internal deletions, additions, and retracings but these are semantically quite different from the apparent ‘deletion’ signalled by the larger of the two single vertical lines, which shows that the written material has been transferred or re-used, not deleted.
<metamark function="used" rend="line"
target="#X2"/>
<zone xml:id="X2">
<line>I am that halfgrown <add>angry</add> boy, fallen asleep</line>
<line>The tears of foolish passion yet undried</line>
<line>upon my cheeks.</line>
<!-- ... -->
<line>I pass through <add>the</add> travels and <del>fortunes</del> of
<retrace>thirty</retrace>
</line>
<line>years and become old,</line>
<line>Each in its due order comes and goes,</line>
<line>And thus a message for me comes.</line>
<line>The</line>
</zone>
<metamark function="used" target="#X2">Entered - Yes</metamark>
</surface>
In this embedded transcription example, we class as metamarks both the long vertical
line and the annotation ‘Entered - yes’. Both metamarks are assumed to indicate that the whole of the written zone with
identifier X2
is marked as having been used. metamark can be similarly used within text to encode the same phenomenon as part of a transcription that privileges logical
over physical and layout structures.
TEI: Fixation and Clarification⚓︎12.3.4.3 Fixation and Clarification
A writer may sometimes rewrite material a second time without significant change and in the same place. We consider this a distinct activity from addition as usually defined because no new textual material results; instead the status of existing material is reaffirmed. We may distinguish two variants of this: fixation where the first version was a tentative draft which is subsequently reaffirmed, for example by inking it over; and clarification, where the first version was badly written and has been rewritten for clarity. The element retrace is provided for both cases; its cause attribute may be used to distinguish these or other cases.
- retrace contains a sequence of writing which has been retraced, for example by over-inking, to clarify or fix it.
<retrace cause="unclear" change="#stage1">er</retrace>
</retrace> ...
The retrace element is used only for cases where text has been written multiple times. When metamarks and other markup-like strokes have been rewritten multiple times, the redo element described in the next section should be used in preference.
TEI: Confirmation, Cancellation, and Reinstatement of Modifications⚓︎12.3.4.4 Confirmation, Cancellation, and Reinstatement of Modifications
A writer may indicate that an alteration is itself to be altered: for example, a struck-through passage may be restored via a dotted underlining, or the underlining of a passage may be deleted by a wavy line.
The following elements are provided to represent these situations:
- redo indicates one or more cancelled interventions in a document which have subsequently
been marked as reaffirmed or repeated.
target points to one or more elements representing the interventions which are being reasserted. - undo indicates one or more marked-up interventions in a document which have subsequently
been marked for cancellation.
target points to one or more elements representing the interventions which are to be reverted or undone.
The element restore (12.3.1.6 Cancellation of Deletions and Other Markings) is provided for the comparatively simple case where a simple deletion is marked as having been subsequently cancelled. The undo element discussed here is more widely applicable and may be used for any kind of cancellation. It points to the element or elements which are being cancelled. These components need not be contiguous, provided that the cancellation is clearly a single act; each distinct act of cancellation requires a distinct undo element, however. Either of the attributes target or spanTo may be used to indicate the passages concerned.
Consider the following imaginary example:
We hypothesize that the text has gone through three states or changes, as follows:
- This is just some sample text, we need a real example.
- This is not a real example.
- This is just some text, not a real example.
<undo spanTo="#Xa" rend="dotted"
change="#s3"/>just some <anchor xml:id="Xa"/> sample <undo spanTo="#Xb" rend="dotted"
change="#s3"/>text, <anchor xml:id="Xb"/> we need
</del>
<add change="#s2">not</add> a real example.
<seg xml:id="X-a">just some</seg> sample <seg xml:id="X-b">text</seg>, we
need
</del>
<add change="#s2">not</add> a real example.
<undo target="#X-a #X-b" rend="dotted"
change="#s3"/>
TEI: Transpositions⚓︎12.3.4.5 Transpositions
A transposition occurs when metamarks are found in a document indicating that passages should be moved to a different position. Typically this may be done using arrows, asterisks or numbers, or other means. By definition the result of a transposition is not present in the document, and should not therefore be encoded, if the intention is to represent the actual appearance of the document. Instead, the following elements may be used to indicate the intended reordering:
- listTranspose supplies a list of transpositions, each of which is indicated at some point in a document typically by means of metamarks.
- transpose describes a single textual transposition as an ordered list of at least two pointers specifying the order in which the elements indicated should be re-combined.
<metamark rend="underline"
function="transposition" target="#ib01" place="above">2.</metamark> og
<seg xml:id="ib02">hör</seg>
<metamark rend="underline"
function="transposition" target="#ib02" place="above">1.</metamark>
<!-- either nearby or elsewhere, typically inside <profileDesc> or <standOff>: -->
<listTranspose>
<transpose>
<ptr target="#ib02"/>
<ptr target="#ib01"/>
</transpose>
</listTranspose>
<metamark function="transposition"
place="margin-left">2.)</metamark> thi da er du med Himmelen i Pagt; —
</line>
<line xml:id="ib4">
<metamark function="transposition"
place="margin-left">1.)</metamark> da kan du
Folkets Jøkelhjerter tine;
</line>
<!-- either nearby or elsewhere, typically inside <profileDesc> or <standOff>: -->
<listTranspose>
<transpose>
<ptr target="#ib4"/>
<ptr target="#ib3"/>
</transpose>
</listTranspose>
One or more listTranspose elements may be supplied either embedded within the text, in the profileDesc of the header, or in a standOff depending on local preference. Each listTranspose can contain one or more transpose elements, each of which defines a single transposition.
TEI: Alternative Readings⚓︎12.3.4.6 Alternative Readings
In this example two alternative readings are provided, but no preference is indicated. While the author apparently first composed the line ‘Alone before his native river -’, at some later point, he entertained the possibility of using the word beside instead of before. The manuscript supplies no indication of which word Moore favours at this point, although in fact, in the first printed edition of Lalla Rookh the word beside was chosen.
<add place="above" xml:id="alt2">beside</add> his native river —
<alt target="#alt1 #alt2" mode="excl"
weights="0 1"/>
The mode attribute here indicates that the two possible readings indicated by the target attribute are mutually exclusive. The weights attribute indicates the relative importance or preference to be attached to the two readings on a scale running from zero (most improbable) to one (most probable). In this case, we have a very strong preference for the second reading because this is the one that appears in the published version of the poem. The alt element is further discussed in section 17.8 Alternation.
TEI: Instant Corrections⚓︎12.3.4.7 Instant Corrections
The use of elements such as del and add necessarily implies that the modifications they indicate were made at some time after the original writing. An exception to this is where a false start or ‘instant’ correction has been identified: the author starts to write, and then immediately corrects what has been written.
The instant attribute defined by this module may be used on any element which is a member of
the att.editLike class to modify this default assumption. When the value of instant is set to true, the addition or deletion is considered to belong to the same change as its parent
element, while false
means some change later than that of its parent.
- The letter T is written and then immediately deleted
- The word The is written, deleted, and replaced by the word His
- The added word His is then deleted
- The initial letter i of the words iron necklace is overwritten with a capital I
<mod type="subst">
<del>The</del>
<add place="above">
<del rend="overstrike">His</del>
</add>
</mod>
<mod type="subst">
<del rend="overwritten">i</del>
<add place="superimposed">I</add>
</mod>ron necklace
TEI: Aspects of Layout⚓︎12.4 Aspects of Layout
<surfaceGrp type="quire" n="1">
<surfaceGrp type="leaf" n="1">
<surface type="recto" n="1r">
<!-- ... -->
</surface>
<surface type="verso" n="1v">
<!-- ... -->
</surface>
</surfaceGrp>
<surfaceGrp type="leaf" n="2">
<surface type="recto" n="2r">
<!-- ... -->
</surface>
<surface type="verso" n="2v">
<!-- ... -->
</surface>
</surfaceGrp>
<!-- other leaves in first quire -->
</surfaceGrp>
<!-- other quires here -->
</sourceDoc>
In some cases, it may be preferable to define surfaces corresponding with each two page opening, for example where it is clear that the writer regarded each such opening as a single writing surface, with written zones or other features crossing the page divide. An example is shown here:
The coloured lines added to this image indicate a number of zones of writing, colour coded to indicate the order in which they were written (purple, then green, then red). For example, the zone marked in red on the left contains a note referring to the purple zone on the right.
This approach assumes that the transcription will primarily be organized in the same way as the physical layout of the source, using embedded transcription elements. Alternatively, where the a non-embedded transcription has been provided, using the text element, it is still possible to record gathering breaks, page breaks, column breaks, line breaks etc in the source, using the elements described in section 3.11 Reference Systems. Detailed metadata about the physical make-up of a source will usually be summarized by the physDesc component of an msDesc element discussed in 11.7 Physical Description.
TEI: Space⚓︎12.4.1 Space
The author or scribe may have left space for a word, or for an initial capital, and for some reason the word or capital was never supplied and the space left empty. The presence of significant space in the text being transcribed may be indicated by the space element.
- space (space) indicates the location of a significant space in the text.
resp (responsible party) (responsible party) indicates the individual responsible for identifying and measuring the space.
Note that this element should not be used to mark normal inter-word space or the like.
god if wommen had writen storyes As <space quantity="7" unit="chars"/> han within her
oratoryes
<supplied reason="space" resp="#ETD"
source="#Hg">preestes</supplied> han within
her oratoryes
TEI: Lines⚓︎12.4.2 Lines
declaring I would prosecute by law — hindered a man's proceedings who <hi rend="underline">had obtained all the letters to Mr Boyd</hi>
by law — hindered a man's proceedings who <hi xml:id="cstart1" rend="underline">had
obtained all the letters to Mr Boyd</hi>
<!-- ... -->
<certainty target="#cstart1" locus="start"
degree="0.70">
<desc>may begin with previous word</desc>
</certainty>
Where the area of text marked overlaps other areas of text, for example crossing a structural division, one of the spanning mechanisms mentioned above must be used; for example where the line is thought to mark a deletion, the delSpan element may be used. Where it is desired simply to record the marking of a span of text in circumstances where it is not possible to surround the text with a hi element, the span element may be used with the rend or type attribute indicating the style of line-marking.
More work needs to be done on clarifying the treatment of other textual features marked by lines which might so overlap or nest. For example, in many Middle English manuscripts (e.g. the Jesus and Digby verse collections), marginal sidebars may indicate metrical structure: couplets may be linked in pairs, with the pairs themselves linked into stanzas. Or, marginal sidebars may indicate emphasis, or may point out a region of text on which there is some annotation: in many manuscripts of Chaucer's Wife of Bath's Prologue lines 655–8 are marked with nesting parentheses against which the scribe has written nota.
Such features could be captured by use of the note element, containing a prose description of the manuscript at this point, enhanced by a link to a visual representation (or facsimile) of the feature in question. For example, in the Chaucer example just cited, one may wish to record that the nota is written in the Hengwrt manuscript in the right margin against a single large left parenthesis bracketing the four lines, with two right parentheses in the right margin bracketing two overlapping pairs of lines: the first and third, the second and fourth. The note element allows us to record that the scribe wrote nota, but is not well-adapted to show that the nota points both at all four lines and at two pairs of lines within the four lines. The metamark element discussed in section 12.3.4.2 Metamarks above provides better facilities for this kind of complex annotation.
TEI: Transcription and Ruby⚓︎12.5 Transcription and Ruby
These Guidelines also provide special elements to support the encoding of ruby annotations, which are common in East Asian textual traditions. These elements provide a method of capturing a specific type of annotation, in addition to the generic methods like the note or interp elements. Both the specific and general methods should integrate well with the transcriptional elements described above, allowing authorial and scribal features to be captured in conjunction with ruby base text and annotations. See 3.4.2 Ruby Annotations for more information about these elements.
TEI: Headers, Footers, and Similar Matter⚓︎12.6 Headers, Footers, and Similar Matter
Such information as page numbers, signatures, or catchwords may be recorded in a specialized fw element provided for that purpose. Although the name derives from the term forme work, used in description of early printed documents (the ‘forme’ being the block used to hold movable type), the fw element may be used for such features of any document, written or printed. Note that the purpose of this element is to record page numbers etc. actually present in the document being encoded, not necessarily to provide a complete or accurate pagination of it.
Information about pagination etc. may also be provided using the n attribute of the pb or gb elements, or by other appropriate milestone elements, as further discussed in section 3.11 Reference Systems: since this information is usually provided by the encoder, it is not subject to the constraint that it should be present only if textually present in the source being encoded. In text-critical situations it may be useful to provide both a normalized version of the pagination and a representation of the catch-word or numbering, especially when the latter presents a variant reading, or is significant for compositor identification.
- fw (forme work) contains a running head (e.g. a header, footer), catchword, or similar material appearing on the current page.
The fw element may be used to encode any of the unchanging portions of a page forme, such as:
- running heads (whether repeated or changing on every page, or alternating pages)
- running footers
- page numbers
- catch-words
- other material repeated from page to page, which falls outside the stream of the text
It should not be used for marginal glosses, annotations, or textual variants, which should be tagged using gloss, note, or the text-critical tags described in chapter 13 Critical Apparatus, respectively.
<fw type="pageNum" place="top-right">29</fw>
<fw type="sig" place="bot-centre">E3</fw>
<fw type="catch" place="bot-right">TEMPLE</fw>
TEI: Identifying Changes and Revisions⚓︎12.7 Identifying Changes and Revisions
A major purpose of genetic editing is the identification of ‘revision campaigns’ or, more generally, changes. An editor may wish to regard a particular set of alterations (deletions, additions, substitutions, transpositions, etc.) or any other act of writing as a single object for which we use the general term change, to indicate both that one or more of such phenomena preceded or followed another and also to indicate that they are related in some way, for example that one is a consequence of the other. They might also wish to group together certain revisions, regardless of when they might have occurred, based on a variety of other shared characteristics (e.g., corrections of factual errors or revisions that incorporate suggestions made by a given reader). To document this we need:
- a system to assign phenomena to a particular ‘change’ as defined above
- a way to characterize each such change, in itself and in relation to others.
The element creation (within the TEI header profile description) contains all information relating to the genesis or production of a text. It may contain a listChange element which contains a number of change elements, one for each set of alterations identified:
- listChange groups a number of change descriptions associated with either the creation of a source
text or the revision of an encoded text.
ordered indicates whether the ordering of its child change elements is to be considered significant or not. - change (change) documents a change or set of changes made during the production of a source document, or during the revision of an electronic file.
In the following example an editor has identified four distinct sets of alterations:
<creation>
<listChange ordered="true">
<change xml:id="ST-1">First stage, written in ink </change>
<change xml:id="ST-2">Second stage, with revisions written in the author's hand
using pencil</change>
<change xml:id="ST-3">Fixation of the pencilled revisions together with further
revisions in the author's hand using ink</change>
<change xml:id="ST-4">Additions in a different hand, probably at a later
stage</change>
</listChange>
</creation>
</profileDesc>
The listChange element carries an attribute ordered, which can take the values true
or false
(the default). The attribute specifies whether the order of child elements signifies
a temporal order for the revision campaigns which they document. In the example above,
the editor has asserted that the four sets distinguished are ordered chronologically
according to the order of the change elements. If necessary, listChange elements can be nested hierarchically. This may be helpful in two cases. Firstly
one can build up hypotheses about related revisions step-by-step, starting with change elements of smaller coverage, whose members are certainly related, and then in a
subsequent pass grouping these in turn, thereby extending their reach.
<creation>
<listChange>
<change xml:id="o">An unrelated change note</change>
<listChange xml:id="m">
<change xml:id="m1">Alterations on one manuscript page, certainly
related</change>
<change xml:id="m2">Alterations on another manuscript page, certainly
related</change>
</listChange>
<change xml:id="p">Another unrelated change note</change>
</listChange>
</creation>
</profileDesc>
A nested listChange element is also useful to indicate a partial ordering of change elements.
<change xml:id="ST1">The first stage</change>
<listChange ordered="false">
<!-- We have no information about the order of these changes, except that they both followed ST1 and preceded STX -->
<change xml:id="ST-rev1">A revision of the first stage</change>
<change xml:id="ST-rev2">Another revision of the first stage</change>
</listChange>
<change xml:id="STX">The last stage</change>
</listChange>
In addition to the possibility of being ordered by their sequence within a listChange element, change elements may carry a number of attributes from the att.datable class (period, when, notBefore, notAfter, from, and to) which allow each element to be dated as exactly or inexactly as necessary, in the same way as is currently possible for the TEI date element.
<creation>
<date notAfter="1816-07-18"/>
<listChange ordered="true">
<change xml:id="mod1" when="1816-07-16">The first draft of
<title>Persuasion</title>, completed by the date <date>July 16 1816</date>
which is written after the word <q>Finis</q> at <ref target="#pers-30">page
30</ref>.</change>
<change xml:id="mod2"
notBefore="1816-07-16">After the <date>16th of July</date>
Austen starts revision of the two final chapters, by rewriting the end and
adding a new zone (<ref target="#transp-1">pages 32-35</ref>) to be inserted at
<ref target="#insertion-p1">page 19</ref>. This stage is documented by the
deletion of the date (<date>July 16 1816</date>) at <ref target="#pers-30">page
30</ref>, and the addition of more text and of a new date (<date>July 18.
1816</date>) at <ref target="#pers-31">page 31</ref>
</change>
<change notBefore="1816-07-18">Before publication, after <date>July 18th,
1816</date> chapters 10-11 were broken into three chapters, 10, 11, 12, as
witnessed by the print.</change>
</listChange>
</creation>
</profileDesc>
Each change element, apart from declaring a distinct moment or phase in the creation of the document, may also contain references to other annotations contained within the teiHeader or in the document (as shown in the previous example). Such references, along with the textual content, are purely documentary. The association between a textual component and a change element is always made explicitly, either by using the target attribute on the change element to point to one or more textual elements, or by pointing from the element or elements concerned to the change element by means of their change attribute. If a change element is associated with some element, it is also associated with all of that element's children, unless otherwise indicated, for example by a new value for the change attribute.
<del>house</del>
<add>mouse</add>
</mod>.</line>
Elements such as add and del and the like carry an implied semantics concerning the order in which events in the writing of a document was carried out: something which is deleted must have been written before it was deleted; something which is added must have been added at a later stage of the writing. Even when a combination of such elements is used, the chronology can usually be inferred (see further 12.3.3.2 Use of the gap, del, damage, unclear, and supplied Elements in Combination). Explicit indication of the set of alterations to which some modification belongs is mostly useful in situations where all the alterations identified in a document are to be grouped, for example chronologically.
<creation>
<listChange ordered="true">
<change target="#zone_1 #subst_3">First stage, written in ink by a
scribe</change>
<change target="#zone_2 #mod_1 #line_1 #line_2 #subst_1 #subst_2 #subst_4 #delSpan_1">Revised by Goethe using pencil</change>
<change target="#redo_1 #redo_2 #redo_3 #subst_1 #subst_2 #delSpan_1 #add_1">Fixation of the revised passages and further revisions by Goethe using
ink</change>
</listChange>
</creation>
</profileDesc>
<sourceDoc>
<surface>
<zone xml:id="zone_1">
<line xml:id="line_1">
<handShift new="#g_bl"/>
<retrace hand="#g_t" xml:id="redo_1">Nun</retrace>
</line>
<line>
<handShift new="#jo_t"/>Ihr wanſtige Schuften mit den Feuerbacken</line>
<line xml:id="line_2">
<handShift new="#g_bl"/>
<retrace hand="#g_t" xml:id="redo_2">feiſt</retrace>
</line>
<line>Ihr glüht ſo recht vom Höllen Schwefel ſatt.</line> [...] </zone>
</surface>
</sourceDoc>
<text>
<body>
<l n="11656">
<subst xml:id="subst_1">
<del>Ihr</del>
<add>Nun</add>
</subst> wanſtige Schuften mit den Feuerbacken</l>
<l n="11657">Ihr glüht ſo recht vom Höllen Schwefel <subst xml:id="subst_2">
<del>ſatt</del>
<add>feiſt</add>
</subst>.</l>
</body>
</text>
The documentary transcription stresses the writing process, while the textual transcription emphasizes textual alterations. In either case, the change of writing activity associated with a particular feature in the transcript is explicitly indicated. From the documentary perspective, by assigning particular modifications to a specific change element, we describe the writing process, in that they specify which segment has been written when . From the textual perspective, the markup concentrates simply on the existence of textual alterations and makes no explicit claims about the order of writing.
TEI: Other Primary Source Features not Covered in these Guidelines⚓︎12.8 Other Primary Source Features not Covered in these Guidelines
We repeat the advice given at the beginning of this chapter, that these recommendations are not intended to meet every transcriptional circumstance ever likely to be faced by any scholar. They are intended rather as a base to enable encoding of the most common phenomena found in the course of scholarly transcription of primary source materials. These guidelines particularly do not address the encoding of physical description of textual witnesses: the materials of the carrier, the medium of the inscribing implement, the organisation of the carrier materials themselves (as quiring, collation, etc.), and authorial instructions or scribal markup, except insofar as these are involved in the broader question of manuscript description, as addressed by the msdescription module described in chapter 11 Manuscript Description.
TEI: Module for Transcription of Primary Sources⚓︎12.9 Module for Transcription of Primary Sources
The module described in this chapter makes available the following components:
- Module transcr: Transcription of primary sources
-
- Elements defined: addSpan am damage damageSpan delSpan ex facsimile fw handNotes handShift line listTranspose metamark mod path redo restore retrace secl sourceDoc space subst substJoin supplied surface surfaceGrp surplus transpose undo zone
- Classes defined: att.coordinated att.global.change att.global.facs
The selection and combination of modules to form a TEI schema is described in 1.2 Defining a TEI Schema.