PGTEI 0.5
Note: PGTEI is no longer used at DP. Information on this page may be out of date.
PGTEI 0.5 is the new and upcoming version of the PGTEI conversion toolchain.
This page is mainly to inform people of upcoming and planned changes to the PGTEI spec and to allow them to comment. Please comment on the talk page (discussion).
PGTEI 0.5 includes many requested new features, numerous bug-fixes and the completely new SuckMore™ engine! You have been heard! Now go back and proof more books.
State of the World
Backends for HTML, PDF and TXT are complete. You can see for yourself what already works and what does not.
Find examples of converted files along with TEI sources.
- Alice (pictures, fancy formatting)
- Life on the Mississippi (long, footnotes, tables)
- Candide (LOTE, i.e. a language other than English; elaborate title pages, footnotes)
- Bowerbird (how to build your own markup language on top of TEI)
- The Guide (converter torture test)
Features
(New features are in italics.)
- supports TEI version 5
- full XPath selectors
Outputs:
- HTML
- Plain Text
Supports:
- embedded TeX
- embedded SVG
- embedded MathML
- embedded Lilypond (for music scores)
- embedded Forsyth-Edwards-Notation (for chess boards)
Changes from v0.4 to v0.5
This section contains preliminary information subject to change without notice!
TEI P5
Version 0.5 will use the new TEI P5 Guidelines released November 1, 2007.
Here is some advice on migration from P4 to P5 by the TEI Consortium.
Most of the conversion can be automated by using the P4 to P5 Style Sheet by Sebastian Rahtz. <Index> elements need extra attention though.
pgtei Namespace
TEI P5 is now namespace aware. All PGTEI extension elements have been moved into the pgtei namespace, e.g., <pgtei:style>, <pgtei:extension> and <pgtei:charmap>.
If you want to use PGTEI extensions, you must include the pgtei namespace in your <TEI> node:
<?xml version="1.0" encoding="utf-8" ?>
<TEI xmlns="http://www.tei-c.org/ns/1.0"
     xmlns:pgtei="http://www.gutenberg.org/tei/marcello/0.5/ns"
     xml:lang="en">
  <teiHeader>
  ...
N.B. a namespace is a URI (and not a URL). It is perfectly legal for a URI to point to nowhere.
New XPath selectors
Embedded stylesheets now use XPath selectors. (The very limited CSS selector support of v0.4 has been dropped.)
Rules in PGTEI stylesheets get applied in the following order, with later declarations overriding earlier ones:
- The builtin stylesheet of the converter, from first line to last line. (You don't normally see it, but it can be found in the file builtin.css in the sources.)
- Your embedded stylesheet(s) from first line to last line.
- Declarations found in the rend attribute.
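The override order above can be pictured as a simple merge of declaration sets. The following is only a sketch in Python, not the converter's actual code: the function names and the sample declarations are made up for illustration, and the rend parser is deliberately naive (it would break on a semicolon inside a quoted string).

```python
def parse_rend(rend):
    """Parse a rend attribute such as 'font-size: 80%; color: gray' (naive split)."""
    declarations = {}
    for part in rend.split(";"):
        if ":" in part:
            name, value = part.split(":", 1)
            declarations[name.strip()] = value.strip()
    return declarations


def cascade(*layers):
    """Merge declaration dicts in order; later layers override earlier ones."""
    merged = {}
    for layer in layers:
        merged.update(layer)
    return merged


# Hypothetical declarations that all match the same element.
builtin = {"text-indent": "1em", "text-align": "justify"}  # builtin stylesheet
embedded = {"text-indent": "2em"}                           # embedded stylesheet
rend = parse_rend("text-align: center")                     # rend attribute

result = cascade(builtin, embedded, rend)
print(result)  # {'text-indent': '2em', 'text-align': 'center'}
```

The embedded stylesheet wins over the builtin one for text-indent, and the rend attribute wins over both for text-align.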
For further details on XPath see the XPath 1.0 W3C Recommendation.
Examples
To select all paragraphs write:
elem(//tei:p) { text-indent: 2em }
Note: the elem(...) wrapper is not part of the XPath expression. The XPath expression is just: //tei:p.
Note: in P5 all TEI elements are in a namespace. Because XPath (unlike XML) has no default namespace, you must prefix all references to TEI nodes with tei:.
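The effect of the missing default namespace can be tried outside the converter. The sketch below uses Python's standard xml.etree.ElementTree, which implements only a subset of XPath and is not what PGTEI itself uses; the tiny sample document is made up. Note that ElementTree wants the relative form .//tei:p when searching from an element.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"

# A made-up, minimal document in the TEI namespace.
doc = ET.fromstring(
    '<TEI xmlns="http://www.tei-c.org/ns/1.0">'
    "<text><body><p>One</p><p>Two</p></body></text>"
    "</TEI>"
)

ns = {"tei": TEI_NS}
unprefixed = doc.findall(".//p")        # looks for p in no namespace
prefixed = doc.findall(".//tei:p", ns)  # looks for p in the TEI namespace

print(len(unprefixed), len(prefixed))  # 0 2
```

The unprefixed query finds nothing, although the source document never writes a prefix on its p elements.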
The elem(...) wrapper selects the element. The before(...) wrapper selects the :before pseudo-element. The after(...) wrapper selects the :after pseudo-element.
To center all headers and give subheaders a smaller font write:
elem(//tei:head) { font-size: 150%; text-align: center }
elem(//tei:head[@type='sub']) { font-size: 80% }
The font-size: declaration in the second rule will override the one in the first rule for all subheaders.
To select all subheaders of level 1 write:
elem(//tei:div/tei:head[count (ancestor::*[tei:head]) = 1 and @type='sub']) { text-align: center; font-size: 80% }
- tei:div/tei:head selects me if I am a div/head but not if I am a table/head, figure/head etc.
- count (ancestor::*[tei:head]) = 1 counts how many of my ancestors of any element type have a child of type head. My own parent has myself as child, that explains why it's 1 and not 0!
- @type='sub' only matches me if I am a subheader.
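To see why the count is 1 for a level 1 subheader and 2 for a level 2 subheader, the three conditions can be replayed by hand. This is a sketch in Python with a made-up sample document. The standard ElementTree module has neither the ancestor axis nor count(), so the helper functions below (hypothetical names) walk a parent map instead.

```python
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"

# Made-up sample: a chapter with a subheader, containing a section with a subheader.
doc = ET.fromstring(
    '<body xmlns="http://www.tei-c.org/ns/1.0">'
    '<div><head>Chapter</head><head type="sub">Level 1 sub</head>'
    '<div><head>Section</head><head type="sub">Level 2 sub</head></div>'
    "</div></body>"
)

# ElementTree nodes do not know their parent, so build a child -> parent map.
parent_of = {child: parent for parent in doc.iter() for child in parent}


def ancestors(node):
    """Yield the parent, grandparent, ... of node."""
    while node in parent_of:
        node = parent_of[node]
        yield node


def is_level1_subhead(node):
    parent = parent_of.get(node)
    heads_above = sum(1 for a in ancestors(node) if a.find(TEI + "head") is not None)
    return (
        node.tag == TEI + "head"
        and parent is not None
        and parent.tag == TEI + "div"   # tei:div/tei:head
        and heads_above == 1            # count(ancestor::*[tei:head]) = 1
        and node.get("type") == "sub"   # @type='sub'
    )


matches = [n.text for n in doc.iter() if is_level1_subhead(n)]
print(matches)  # ['Level 1 sub']
```

The level 2 subheader is rejected because both its own div and the enclosing div have a head child, which makes the count 2.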
Local Style Sheets
You can now use an embedded style sheet anywhere in the document. The context of a local style sheet is the parent node and the scope is all descendants of the parent node.
<pgStyleSheet> is now called <pgtei:style>.
To get a global style sheet, just use a local style sheet on the <TEI> node.
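The context and scope rule can be illustrated with a few lines of Python. This is a sketch, not converter code: the sample document is made up, and the script only finds which paragraphs fall into the scope of a local style sheet, without applying any styles.

```python
import xml.etree.ElementTree as ET

TEI = "http://www.tei-c.org/ns/1.0"
PGTEI = "http://www.gutenberg.org/tei/marcello/0.5/ns"

# Made-up sample: only the first div carries a local style sheet.
doc = ET.fromstring(
    f'<body xmlns="{TEI}" xmlns:pgtei="{PGTEI}">'
    "<div><pgtei:style>elem(.) { color: gray }</pgtei:style>"
    "<p>inside</p><div><p>nested</p></div></div>"
    "<div><p>outside</p></div>"
    "</body>"
)

parent_of = {child: parent for parent in doc.iter() for child in parent}

style = doc.find(".//pgtei:style", {"pgtei": PGTEI})
context = parent_of[style]  # the context is the parent node of the style sheet

# The scope is everything below the context node.
in_scope = [p.text for p in context.iter(f"{{{TEI}}}p")]
print(in_scope)  # ['inside', 'nested']
```

The paragraph in the second div is out of scope because it is not a descendant of the style sheet's parent.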
CSS generated content
You can now generate content using the content: declaration.
For further details see the CSS 2.1 W3C Candidate Recommendation - Chapter 12.
Examples
before(//tei:pb) { content: "-" attr(n) "-"; color: gray; font-size: 80% }
Note: attr(n) copies the value of the "n" attribute of the element, e.g. 42 for <pb n="42" />.
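How a content: value turns into text can be sketched in a few lines of Python. This is an illustration only and assumes a value made up of double-quoted strings and attr(...) calls; the function name render_content is hypothetical and other CSS content keywords are ignored.

```python
import re
import xml.etree.ElementTree as ET

# Matches either a double-quoted string literal or an attr(name) call.
TOKEN = re.compile(r'"([^"]*)"|attr\((\w+)\)')


def render_content(value, element):
    """Concatenate string literals and attribute values, in order of appearance."""
    out = []
    for literal, attr_name in TOKEN.findall(value):
        if attr_name:
            out.append(element.get(attr_name, ""))  # attr(n) -> value of n
        else:
            out.append(literal)
    return "".join(out)


pb = ET.fromstring('<pb n="42" />')
print(render_content('"-" attr(n) "-"', pb))     # -42-
print(render_content('"[pg " attr(n) "]"', pb))  # [pg 42]
```

The second call uses the value that reproduces the old v0.4 page marker.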
<tei:pb> no longer outputs a page break marker
The v0.4 behaviour was to always output a "[pg n]" marker. Users complained that this was too Anglo-centric. The new CSS content: feature can be used to output any kind of page break marker.
To re-enable v0.4 behaviour, use this stylesheet rule:
elem(//tei:pb) { content: "[pg " attr(n) "]" }
<tei:pb> new attribute pgtei:proofers
To accommodate DP requirements, an attribute pgtei:proofers has been added to the <tei:pb> element. The attribute may contain one or more URIs. Usually these URIs will point to a <tei:respStmt> in the <tei:teiHeader>.
Example
In the text:
<pb n="42" pgtei:proofers="#pr1 #pr2" />
...
<pb n="43" pgtei:proofers="#pr1 #pr3" />
and in the header:
<respStmt id="pr1">
  <resp>Round 1</resp>
  <name>John Doe</name>
</respStmt>
<respStmt id="pr2">
  <resp>Round 2</resp>
  <name>zakk</name>
</respStmt>
<respStmt id="pr3">
  <resp>Round 2</resp>
  <name>Alice</name>
</respStmt>
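A tool that wants to credit proofers per page could resolve the URIs roughly as follows. This is a sketch in Python, not part of PGTEI: the function name proofers_of is hypothetical, only same-document references of the form #id are handled, and the lookup accepts both the plain id attribute used in the example above and xml:id, which is the form TEI P5 itself prescribes.

```python
import xml.etree.ElementTree as ET

TEI = "http://www.tei-c.org/ns/1.0"
PGTEI = "http://www.gutenberg.org/tei/marcello/0.5/ns"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"
NS = {"tei": TEI}

# The example from above, trimmed to the bare minimum (not a valid TEI file).
doc = ET.fromstring(f"""<TEI xmlns="{TEI}" xmlns:pgtei="{PGTEI}">
  <teiHeader>
    <respStmt id="pr1"><resp>Round 1</resp><name>John Doe</name></respStmt>
    <respStmt id="pr2"><resp>Round 2</resp><name>zakk</name></respStmt>
    <respStmt id="pr3"><resp>Round 2</resp><name>Alice</name></respStmt>
  </teiHeader>
  <text>
    <pb n="42" pgtei:proofers="#pr1 #pr2" />
    <pb n="43" pgtei:proofers="#pr1 #pr3" />
  </text>
</TEI>""")

# Index every respStmt by its identifier (xml:id or plain id).
by_id = {}
for stmt in doc.iter(f"{{{TEI}}}respStmt"):
    by_id[stmt.get(XML_ID) or stmt.get("id")] = stmt


def proofers_of(pb):
    """Return (round, name) pairs for the respStmt elements a pb points to."""
    result = []
    for ref in pb.get(f"{{{PGTEI}}}proofers", "").split():
        stmt = by_id.get(ref.lstrip("#"))
        if stmt is not None:
            result.append((
                stmt.findtext("tei:resp", namespaces=NS),
                stmt.findtext("tei:name", namespaces=NS),
            ))
    return result


for pb in doc.iter(f"{{{TEI}}}pb"):
    print(pb.get("n"), proofers_of(pb))
# 42 [('Round 1', 'John Doe'), ('Round 2', 'zakk')]
# 43 [('Round 1', 'John Doe'), ('Round 2', 'Alice')]
```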
<tei:pb> new attribute facs
TEI P5 has a new attribute facs which may point to a facsimile of the digitized page.
<pb n="42" facs="069.png" />
<pb n="42" facs="http://gallica.bnf.fr/some/path/to/069.jpg" />
<tei:lb/> no longer outputs a line break
This was a misguided departure from the TEI philosophy in the first place.
TEI is a text-feature preserving markup; it gives no thought to reproducing the text. In TEI, <tei:lb/> is used to record line breaks found in original manuscripts.
PGTEI is a TEI application geared towards reproduction of texts. A tag that outputs a line break is very handy in reproducing texts, albeit not necessary. The behaviour of v0.4 was to output a line break for the <tei:lb/> tag. This has been removed in favor of conforming to the TEI standard.
To output a line break use:
<lb rend="line-break-before: always" />
To re-enable v0.4 behaviour, use this stylesheet rule:
elem(//tei:lb) { line-break-before: always }
<tei:milestone unit="tb"/> no longer outputs a thought break
This also heavily clashed with TEI philosophy. In TEI <tei:milestone/> divides a text according to some supplemental reference system. For example, many English novels were first published as serial works, individual parts of which did not always contain a whole number of chapters.
You can now output thought breaks by writing:
<figure type="gap" rend="height: 2cm" />
This outputs 2cm of vertical blank space.
<figure rend="content: '* * * * *'; margin: 2em auto; text-align: center" />
This outputs 5 stars horizontally centered on the page.
<figure type="rule" rend="width: 50%; margin: 2em auto" />
This outputs a rule of 50% page width, horizontally centered on the page.
<tei:q> behaviour has substantially changed
The rend attribute declarations pre: and post: as suggested by TEI have been withdrawn in favor of a CSS 2.1 compliant implementation.
Example
To approximate v0.4 behaviour, use this stylesheet:
/* define quotation marks for English and French (P5 uses xml:lang) */
elem(//tei:*[@xml:lang='en']) { quotes: "“" "”" "‘" "’"; }
elem(//tei:*[@xml:lang='fr']) { quotes: "«" "»" "‹" "›"; }

/* specify some quote classes */
before(//tei:q) { content: open-quote }
after (//tei:q) { content: close-quote }
/* #pre: opening mark only, so suppress the closing mark */
after (//tei:q[contains(@rendition, '#pre')]) { content: no-close-quote }
/* #post: closing mark only, so suppress the opening mark */
before(//tei:q[contains(@rendition, '#post')]) { content: no-open-quote }
If you write:
- <p><q rendition="#pre">Yuck!</q></p>
you get:
- “Yuck!
with an opening quotation mark but no closing one.
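The outcome for the #pre case can be reproduced with a small Python sketch. It is a simplification, not the converter's logic: the function name render_q and the QUOTES table are made up, only first-level quotation marks are handled, and quote nesting is ignored.

```python
# First-level quotation marks per language, as in the quotes: declarations.
QUOTES = {"en": ("“", "”"), "fr": ("«", "»")}


def render_q(text, lang="en", rendition=""):
    """Render a <q> element's text with the quotation marks the rules produce."""
    open_quote, close_quote = QUOTES[lang]
    before = open_quote    # before(//tei:q) { content: open-quote }
    after = close_quote    # after(//tei:q)  { content: close-quote }
    if "#pre" in rendition:
        after = ""         # no-close-quote: the closing mark is suppressed
    return before + text + after


print(render_q("Yuck!"))                    # “Yuck!”
print(render_q("Yuck!", rendition="#pre"))  # “Yuck!
print(render_q("Beurk !", lang="fr"))       # «Beurk !»
```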
<pgIf> withdrawn
The element <pgIf>, used for conditional text inclusion, has been withdrawn in favor of the CSS display declaration and (local) style sheets.
Example
To approximate v0.4 behaviour use:
<back rend="page-break-before: right">
  <pgtei:style>
    @media pdf {
      /* hide back if pdf */
      elem(.) { display: none }
    }
  </pgtei:style>
  <div xml:id="footnotes">
    <divGen type="footnotes" />
  </div>
</back>
Note: the local style sheet resides in the context of the parent element, in this case in the context of <back>. Thus elem(.) refers to the <back> element. display: none hides the element.
<pgVar> withdrawn
This was undocumented, but just in case you figured it out from the sources: it is gone.
PDF backend now uses Apache FOP
In v0.4 I used TeX to produce PDF. I switched to XSL-FO and Apache FOP mainly because Unicode support in pdfTeX is abysmal and XSL-FO is an XML format.
You can still write TeX formulas in your TEI source. They will be rendered into PNGs and then embedded.