User:Solol/Fr Sandbox/Formatting on the Character Level
Placement of Inline Formatting Markup
Inline formatting refers to markup such as <i> </i>, <b> </b>, <sc> </sc>, <f> </f>, or <g> </g>. Place punctuation outside the tags unless the markup is around an entire sentence or paragraph, or the punctuation is itself part of the phrase, title, or abbreviation that you are marking. If the formatting goes on for multiple paragraphs, put the markup around each paragraph.
The periods that mark an abbreviated word in the title of a journal such as Phil. Trans. are part of the title, so they are included within the tags, thus: <i>Phil. Trans.</i>.
Many typefaces found in older books used the same design for numbers in both regular text and italics or bold. For dates and similar phrases, format the entire phrase with one set of markup, rather than marking the words as italics (or bold) and not the numbers.
If there is a series/list of words or phrases (such as names, titles, etc.), mark each item of the list individually.
In poetry, mark each line of the poem separately if the formatting goes on for multiple lines. See the Tables section for handling markup in tables.
Examples:
Original Image: | Correctly Formatted Text: |
---|---|
Enacted 4 July, 1776 | <i>Enacted 4 July, 1776</i> |
It cost 9l. 4s. 1d. | It cost 9<i>l.</i> 4<i>s.</i> 1<i>d.</i> |
God knows what she saw in me! I spoke in such an affected manner. | <b>God knows what she saw in me!</b> I spoke in such an affected manner. |
As in many other of these Studies, and | As in many other of these <i>Studies</i>, and |
(Psychological Review, 1898, p. 160) | (<i>Psychological Review</i>, 1898, p. 160) |
L. Robinson, art. "Ticklishness," | L. Robinson, art. "<sc>Ticklishness</sc>," |
December 3, morning. 1323 Picadilly Circus | /* <i>December 3, morning.</i> 1323 Picadilly Circus */ |
Volunteers may be tickled pink to read Ticklishness, Tickling and Laughter, Remarks on Tickling and Laughter and Ticklishness, Laughter and Humour. | Volunteers may be tickled pink to read <i>Ticklishness</i>, <i>Tickling and Laughter</i>, <i>Remarks on Tickling and Laughter</i> and <i>Ticklishness, Laughter and Humour</i>. |
“That's the idea!” exclaimed Tacks. | "<i>That's the idea!</i>" exclaimed Tacks. |
The professor set the reading assignment for Erlebnis Geschichte Deutschland seit 1845. | The professor set the reading assignment for <g>Erlebnis Geschichte Deutschland seit 1845</g>. |
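The rule that multi-paragraph formatting is marked around each paragraph can be checked mechanically. The following is a minimal sketch, not an official tool: the function name `unbalanced_paragraphs` is hypothetical, and it assumes paragraphs are separated by blank lines. It only tests that every inline tag opened in a paragraph is closed in that same paragraph; it does not judge punctuation placement.

```python
import re

# Inline tags named in this section.
TAG_RE = re.compile(r"<(/?)(i|b|sc|f|g)>")


def unbalanced_paragraphs(text):
    """Return indices of paragraphs whose inline tags are not balanced."""
    problems = []
    # Assumption: paragraphs are separated by one or more blank lines.
    paragraphs = re.split(r"\n\s*\n", text.strip())
    for index, paragraph in enumerate(paragraphs):
        stack = []
        ok = True
        for match in TAG_RE.finditer(paragraph):
            closing, name = match.group(1), match.group(2)
            if not closing:
                stack.append(name)
            elif stack and stack[-1] == name:
                stack.pop()
            else:
                # Closing tag with no matching opener in this paragraph.
                ok = False
                break
        if not ok or stack:
            problems.append(index)
    return problems
```

A paragraph is reported when its markup spills across the paragraph break, which is the situation the guideline asks formatters to avoid.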
Italics
Format italicized text with <i> inserted at the start and </i> inserted at the end of the italics. (Note the "/" in the closing tag.)
See also Placement of Inline Formatting Markup.
Bold Text
Format bold text (text printed in a heavier typeface) with <b> inserted before the bold text and </b> after it. (Note the "/" in the closing tag.)
See also Placement of Inline Formatting Markup and Chapter Headings.
Underlined Text
Format underlined text as italics, using <i> and </i>. (Note the "/" in the closing tag.) Underlining was often used to indicate emphasis when the typesetter was unable to italicize the text, for example in a typewritten document.
See also Placement of Inline Formatting Markup.
Some Project Managers may specify in the Project Comments that underlined text be marked up with the <u> and </u> tags.
Spaced Out Text (gesperrt)
Format spaced out text with <g> inserted before the text and </g> after it. (Note the "/" in the closing tag.) Remove the extra spaces between letters in each word. This was a typesetting technique used for emphasis in some older books, especially in German.
See also Placement of Inline Formatting Markup and Chapter Headings.
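As an illustration of removing the extra spaces from spaced out text, here is a minimal sketch. The helper `mark_gesperrt` is hypothetical, and it relies on an assumption that real OCR output may not satisfy: single spaces separate the letters of a word, while runs of two or more spaces separate words.

```python
import re


def mark_gesperrt(spaced):
    """Collapse letter-spaced text and wrap it in <g> ... </g>.

    Assumes single spaces separate letters and runs of two or more
    spaces separate words.
    """
    words = re.split(r"\s{2,}", spaced.strip())
    joined = " ".join(word.replace(" ", "") for word in words)
    return f"<g>{joined}</g>"
```

When the OCR does not preserve wider gaps between words, the word boundaries have to be restored by eye against the page image.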
Font Changes
Some Project Managers may request that you mark a change of font within a paragraph or line of normal text by inserting <f> before the change in font and </f> after it. (Note the "/" in the closing tag.) This markup may be used to identify a special font or other formatting that does not already have its own markup (such as italics and bold).
Possible uses of this markup include:
- antiqua (a variant of roman font) inside fraktur
- blackletter within a section of regular font
- smaller or larger font only if it is within a paragraph in regular font (for a whole paragraph in a different font or size, see the block quotation section)
- upright font inside of a paragraph of italicized text
The particular use or uses of this markup in a project will usually be spelled out in the Project Comments. Formatters should post in the Project Discussion if the markup appears to be needed and has not yet been requested.
See also Placement of Inline Formatting Markup.
Words in Small Capitals
The formatting differs for mixed-case small caps and all small caps:
Format words that are printed in mixed-case small caps as Mixed Upper and Lowercase. Format words that are printed in all small caps as ALL-CAPS. In both cases, surround the text with <sc> and </sc> markup.
Headings (Chapter Headings, Section Headings, Captions, etc.) may appear to be in all small caps, but this is usually the result of a change in font size and should not be marked as small caps.
See also Placement of Inline Formatting Markup.
Original Image: | Correctly Formatted Text: |
---|---|
This is Small Caps | <sc>This is Small Caps</sc> |
You cannot be serious about aardvarks! | You cannot be serious about <sc>AARDVARKS</sc>! |
Font Size Changes
Normally we do not do anything to mark changes in font size. The exceptions to this are when it indicates a block quotation or when the font size changes within a single paragraph or line of text (see Font Changes).
Extra Spaces or Tabs Between Words
Extra spaces between words are common in OCR output. You generally don't need to remove these, because that can be done automatically during post-processing.
However, extra spaces around punctuation, em-dashes, quote marks, etc. do need to be removed when they separate the symbol from the word. In addition, within the /* */ markup that preserves spacing, be sure to remove any extra spaces since they will not be automatically removed later on. Finally, if you find any tab characters in the text you should remove them.
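The clean-up described above could be sketched as follows. This is an illustrative helper only (`tidy_spacing` is a hypothetical name), and it makes two assumptions not stated in the guidelines: a tab is replaced by a single space rather than deleted outright, and a space before an ellipsis is left alone. Quote marks are not handled, because whether the space belongs before or after depends on whether the quote opens or closes.

```python
import re


def tidy_spacing(line):
    """Remove tabs and spaces that separate punctuation from a word."""
    # Assumption: a tab stood in for a word gap, so swap it for one space.
    line = line.replace("\t", " ")
    # Drop spaces before closing punctuation: "word ." -> "word."
    # A period that starts a run of dots (an ellipsis) is left untouched.
    line = re.sub(r" +([,;:!?]|\.(?!\.))", r"\1", line)
    # Drop spaces on either side of an em-dash (U+2014).
    line = re.sub(r" *\u2014 *", "\u2014", line)
    return line
```

Runs of spaces between ordinary words are deliberately not collapsed here; inside /* */ markup they still need to be removed by hand or with an additional step.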
Superscripts
Older books often abbreviated words as contractions and printed the final letters as superscripts. Format these by inserting a single caret (^) followed by the superscripted text. If the superscript is more than one character long, also surround the text with curly braces { and }. For example:
Original Image: | Correctly Formatted Text: |
---|---|
Genʳˡ Washington defeated Lᵈ Cornwall's army. | Gen^{rl} Washington defeated L^d Cornwall's army. |
In scientific and technical works, surround superscripted characters with curly braces { and } even if only one character is superscripted. For example:
Original Image: | Correctly Formatted Text: |
---|---|
... up to xⁿ elements in the array. | ... up to x^{n} elements in the array. |
If the superscript represents a footnote marker, then see the Footnotes section instead.
The Project Manager may specify in the Project Comments that superscripted text be marked differently.
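The two brace rules for superscripts can be summarized in a small sketch. The function name `superscript` and its `scientific` flag are hypothetical, introduced only for illustration, and the sketch ignores footnote markers and any project-specific overrides.

```python
def superscript(text, scientific=False):
    """Return caret markup for superscripted text.

    Braces are used when the superscript has more than one character,
    or always when scientific is True.
    """
    if len(text) > 1 or scientific:
        return "^{" + text + "}"
    return "^" + text
```

For example, `"Gen" + superscript("rl")` gives `Gen^{rl}`, `"L" + superscript("d")` gives `L^d`, and `"x" + superscript("n", scientific=True)` gives `x^{n}`.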
Subscripts
Subscripted text is often found in scientific works, but is not common in other material. Format subscripted text by inserting an underscore character (_) followed by the text surrounded by curly braces { and }. For example:
Original Image: | Correctly Formatted Text: |
---|---|
H₂O. | H_{2}O. |
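If the text being formatted happens to contain Unicode subscript digits, the markup can be produced automatically. The sketch below is illustrative only: `mark_subscripts` is a hypothetical helper, and it assumes the subscripts survive as the characters U+2080 to U+2089, which ordinary OCR output often does not provide.

```python
import re

# Map Unicode subscript digits (U+2080..U+2089) to plain digits.
SUBSCRIPT_DIGITS = {0x2080 + d: str(d) for d in range(10)}


def mark_subscripts(text):
    """Replace runs of Unicode subscript digits with _{...} markup."""
    def convert(match):
        return "_{" + match.group(0).translate(SUBSCRIPT_DIGITS) + "}"

    return re.sub(r"[\u2080-\u2089]+", convert, text)
```

A run of adjacent subscript digits is grouped under one pair of braces, so a two-digit subscript becomes a single `_{..}` unit.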
Page References "See p. 123"
Format page number references within the text such as (see p. 123) as they appear in the image.
Check the Project Comments to see if the Project Manager has special requirements for page references.