Formatting Guidelines
Version 1.9.e, revised July 19, 2007
The Formatting Summary is a short, 2-page printer-friendly (.pdf) document that summarizes the main points of these Guidelines, and gives examples of how to format. Beginning formatters are encouraged to print out this document and keep it handy while formatting.
You may need to download and install a .pdf reader. You can get one free from Adobe®.
This document is written to explain the formatting rules we use to maintain consistency when formatting a single book that is distributed among many formatters, each of whom is working on different pages. This helps us all do formatting the same way, which in turn makes it easier for the post-processor to eventually combine all these pages into one e-book.
It is not intended as any kind of a general editorial or typesetting rulebook.
We've included in this document all the items that new users have asked about formatting and proofreading. If there are any items missing, or items that you consider should be done differently, or if something is vague, please let us know.
This document is a work in progress. Help us to progress by posting your suggested changes in the Documentation Forum in this thread.
On the Project Page where you start formatting pages, there is a section called "Project Comments" containing information specific to that project (book). Read these before you start formatting pages! If the Project Manager wants you to format something in this book differently from the way specified in these Guidelines, that will be noted here. Instructions in the Project Comments override the rules in these Guidelines, so follow them. (This is also where the Project Manager may give you interesting tidbits of information about the author or the project.)
Please also read the Project Thread (discussion): The Project Manager may clarify project-specific guidelines here, and it is often used by volunteers to alert other volunteers to recurring issues within the project and how they can best be addressed. (See below).
On the Project Page, the link 'Images, Pages Proofread, & Differences' allows you to see how other volunteers have made changes. This Forum thread discusses different ways to use this information.
On the Project Page where you start formatting pages, on the line "Forum", there is a link titled "Discuss this Project" (if the discussion has already started), or "Start a discussion on this Project" (if it hasn't). Clicking on that link will take you to a thread in the projects forum dedicated to this specific project. That is the place to ask questions about this book, inform the Project Manager about problems, etc. Using this project forum thread is the recommended way to communicate with the Project Manager and other volunteers who are working on this book.
When you select a project for formatting, the Project Page is loaded. This page contains links to pages from this project that you have recently worked on. (If you haven't formatted any pages yet, there will be no links shown.)
Pages listed under either "DONE" or "IN PROGRESS" are available to make corrections or to finish formatting. Just click on the link to the page. So if you discover that you made a mistake on a page, or marked something incorrectly, you can click on that page here and reopen it to fix the error.
You may also use the "Images, Pages Proofread, & Differences" or "Just My Pages" links on the Project Page. These pages will display an "Edit" link next to the pages you have worked on in the current round that can still be corrected.
For more detailed information, refer to either the Standard Proofreading Interface Help or the Enhanced Proofreading Interface Help, depending on which interface you are using.
Format all the text, just as it was printed on the page, whether all capitals, upper and lower case, etc., including the years of publication or copyright.
Older books often show the first letter as a large ornate graphic—format this as just the letter.
Format the Table of Contents just as it is printed in the book, whether all capitals, upper and lower case, etc. and surround it with /* and */. Leave a blank line between these markers and the rest of the text. Page number references should be retained and be placed at least six spaces past the end of the line.
Remove any periods or asterisks (leaders) used to align the page numbers.
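For formatters who like to check a rule with a small script, here is a minimal Python sketch of the leader and page-number handling described above. The function name `format_toc_line` and the regular expression are illustrative assumptions, not part of any DP tool, and the sketch only handles lines that end in an Arabic page number preceded by at least two spaces, periods, or asterisks.

```python
import re

# Title, then a run of leader characters (spaces, periods, asterisks),
# then the page number at the end of the line.
TOC_LINE = re.compile(r"^(?P<title>.*?)[ .*]{2,}(?P<page>\d+)\s*$")


def format_toc_line(line):
    """Drop the leaders and put the page number six spaces past the title."""
    match = TOC_LINE.match(line)
    if match is None:
        # No recognizable page number: leave the line as it is.
        return line
    return match.group("title").rstrip() + " " * 6 + match.group("page")


print(format_toc_line("Chapter I . . . . . 12"))
```

The result still has to be placed between /* and */ markers by hand, and any line the pattern does not recognize is returned unchanged so that nothing is lost.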
Format a blank page as [Blank Page] if both the text and the image are blank.
If there is text in the formatting text area and a blank image, or if there is an image but no text, follow the directions for a Bad Image or Bad Text.
Remove page headers and page footers, but not footnotes, from the text.
The page headers are normally at the top of the image and have a page number opposite them. Page headers may be the same all through the book (often the title of the book and the author's name), they may be the same for each chapter (often the chapter number), or they may be different on each page (describing the action on that page). Remove them all, regardless, including the page number.
A chapter header will start further down the page and won't have a page number on the same line. See the next section for a specific example.
Format chapter headers as they appear in the text.
A chapter header may start a bit farther down the page than the page header and won't have a page number on the same line. Chapter Headers are often printed all caps; if so, keep them as all caps. Chapter Headers are usually printed in a different or larger font which may appear to be bold or spaced out, but we do not mark them as a different font or as bold or spaced text; however you should include italics or small-caps markup if it appears in the header.
Put 4 blank lines before the "CHAPTER XXX". Include these blank lines even if the chapter starts on a new page; there are no 'pages' in an e-book, so the blank lines are needed. Then leave one blank line between each additional part of the chapter header, such as a chapter description, opening quote, etc., and finally leave two blank lines before the start of the text of the chapter.
Old books often printed the first word or two of every chapter in all caps or small caps; change these to upper and lower case (first letter only capitalized).
Watch out for a missing double quote at the start of the first paragraph, which some publishers did not include or which the OCR missed due to a large capital in the original. If the author started the paragraph with dialog, insert the double quote.
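The blank-line spacing for chapter headers described above can be sketched in Python as follows. This is only an illustration: the function name `format_chapter_header` and its arguments are hypothetical, and the sketch does not change capitalization or add any markup.

```python
def format_chapter_header(header_parts, first_paragraph):
    """Lay out a chapter header using the blank-line counts given above.

    header_parts is a list such as ["CHAPTER I", "A short description"].
    """
    lines = [""] * 4                     # 4 blank lines before the header
    for index, part in enumerate(header_parts):
        if index > 0:
            lines.append("")             # 1 blank line between header parts
        lines.append(part)
    lines.extend(["", ""])               # 2 blank lines before the chapter text
    lines.append(first_paragraph)
    return "\n".join(lines)


print(format_chapter_header(["CHAPTER I", "A short description"],
                            "Once upon a time,"))
```

Major divisions such as a Preface or an Appendix use the same spacing, so the same sketch would apply to them.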
Some texts have sections within chapters. Format these headers as they appear in the text. Leave 2 blank lines before the header and one after, unless the Project Manager has requested otherwise. If you are not sure if a header indicates a chapter or a section, post a question in the Project Thread, noting the page number. Section Headers are often printed in a different or larger font which may appear to be bold or spaced out, but we do not mark them as a different font or as bold or spaced text; however you should include italics or small-caps markup if it appears in the header.
Major Divisions in the text such as Preface, Foreword, Table of Contents, Introduction, Prologue, Epilogue, Appendix, References, Conclusion, Glossary, Summary, Acknowledgements, Bibliography, etc., should be formatted in the same way as Chapter Headers, i.e. 4 blank lines before the heading and 2 blank lines before the start of the text.
Some books will have short descriptions of the paragraph along the side of the text. These are called sidenotes. Move sidenotes to just above the paragraph that they belong to. A sidenote should be surrounded by a sidenote tag [Sidenote: and ], with the text of the sidenote placed in between. Format the sidenote text as it is printed, preserving the line breaks, italics, etc. Leave a blank line after the sidenote, so that it does not get merged into the paragraph when the text is rewrapped during post-processing.
If there are multiple sidenotes for a single paragraph, put them one after another at the start of the paragraph. Leave a blank line separating each of them.
If the paragraph began on a previous page, put the sidenote at the top of the page and mark it with * so that the post-processor can see that it belongs on the previous page, like this: *[Sidenote: (text of sidenote)]. The post-processor will move it to the appropriate place.
Sometimes a Project Manager will request that you put sidenotes next to the sentence they apply to, rather than at the top or bottom of the paragraph. In this case, don't separate them out with blank lines.
Put a blank line before the start of paragraphs, even if a paragraph starts at the top of a page. You should not indent the start of paragraphs, but if paragraphs are already indented, don't bother removing those spaces—that can be done automatically during post-processing.
See the Chapter Headers image/text for an example.
Format ordinary text that has been printed in two columns as a single column.
Spans of multiple-column text within single column sections should be formatted as a single column by placing the text from the left-most column first, the text from the next one after it, and so on. You do not need to mark where the columns were split, just join them together.
If the columns are lists of items, mark the start of the list with /* and the end with */ so that the lines do not get rewrapped during post-processing. Leave a blank line between these markers and the rest of the text.
See also the Indexes, Lists of Items and Tables sections of these Guidelines.
Text for an illustration should be surrounded by an illustration tag [Illustration: and ], with the caption text placed in between. Format the caption text as it is printed, preserving the line breaks, italics, etc.
If an illustration has no caption, add a tag [Illustration].
If the illustration is in the middle of or at the side of a paragraph, move the illustration tag to before or after the paragraph and leave a blank line to separate them. Rejoin the paragraph by removing any blank lines left by doing so.
If there is no paragraph break on the page, mark the illustration tag with an * like so *[Illustration: (text of caption)], move it to the top of the page, and leave a blank line after it.
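As a quick illustration of the illustration tag rules above, here is a small Python sketch. The function name `illustration_tag` and its parameters are hypothetical; deciding whether a page has a paragraph break is still up to the formatter.

```python
def illustration_tag(caption="", no_paragraph_break=False):
    """Build an illustration tag, with or without a caption.

    Set no_paragraph_break=True when the page has no paragraph break,
    so the tag is marked with * for the post-processor.
    """
    tag = f"[Illustration: {caption}]" if caption else "[Illustration]"
    return "*" + tag if no_paragraph_break else tag


print(illustration_tag("The old mill"))
print(illustration_tag())
print(illustration_tag("The old mill", no_paragraph_break=True))
```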
Footnotes are placed out-of-line; that is, the text of the footnote is left at the bottom of the page and a tag placed where it is referenced in the text.
During formatting, this means:
1. The number, letter, or other character that marks a footnote location should be surrounded with square brackets ([ and ]) and placed right next to the word being footnoted[1] or its punctuation mark,[2] as shown in the text and in the two examples in this sentence.
When footnotes are marked with a series of special characters (*, †, ‡, §, etc.) we replace these with capital letters in order (A, B, C, etc.).
2. A footnote should be surrounded by a footnote tag [Footnote #: and ], with the footnote text placed in between, and the footnote number or letter placed where the # is shown in the tag. Format the footnote text as it is printed, preserving the line breaks, italics, etc. Leave the footnote text at the bottom of the page. Be sure to use the same tag in the footnote as you used in the text where the footnote was referenced. Place each footnote on a separate line in order of appearance. Place a blank line between each footnote if there is more than one.
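The two steps above can be sketched in Python. This is a minimal illustration, assuming the formatter supplies the symbols in their order of appearance on the page; the names `letters_for_symbols`, `footnote_anchor`, and `footnote_tag` are hypothetical helpers, not part of the proofreading interface.

```python
import string


def letters_for_symbols(symbols_in_order):
    """Map symbol markers (*, †, ‡, ...) to A, B, C, ... in order of appearance."""
    return {symbol: string.ascii_uppercase[index]
            for index, symbol in enumerate(symbols_in_order)}


def footnote_anchor(marker):
    """Marker placed in the text, right next to the footnoted word."""
    return f"[{marker}]"


def footnote_tag(marker, text):
    """Tag placed around the footnote text at the bottom of the page."""
    return f"[Footnote {marker}: {text}]"


letters = letters_for_symbols(["*", "†"])
print("the word being footnoted" + footnote_anchor(letters["*"]))
print(footnote_tag(letters["*"], "Text of the first footnote."))
```

Using one mapping for both the anchor and the tag keeps the marker in the text and the marker in the footnote identical, which is what the guideline asks for.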
In some books, the Project Manager may ask that you move the footnotes in-line; read the Project Comments for instructions in this case.
See the Page Headers/Page Footers image/text for a sample footnote.
If there's a footnote at the bottom of the page with no footnote marker in the text, especially if it starts mid-sentence or mid-word, it's probably a continuation of a footnote from a previous page. Leave it at the bottom of the page near the other footnotes, and surround it with *[Footnote: (text of footnote)] (without any footnote number or marker). The * indicates that the footnote was continued, and brings it to the attention of the post-processor.
If a footnote continues on the next page (the page ends before the footnote does), leave the footnote at the bottom of the page, and just put an asterisk * where the footnote ends, like this: [Footnote 1: (text of footnote)]*. (The * indicates that the footnote ended prematurely, and brings it to the attention of the post-processor, who will eventually join it up with the rest of the footnote text.)
If a continued footnote ends or starts on a hyphenated word, mark both the footnote
and the word with *, thus:
[Footnote 1: This footnote is continued and the last word in it is also con-*]*
for the leading fragment, and
*[Footnote: *tinued onto the next page.].
If a footnote or endnote is referenced in the text but does not appear on that page, keep the footnote/endnote number or marker and surround it with square brackets [ and ]. This is common in scientific and technical books, where footnotes are often grouped at the end of chapters. See "Endnotes" below.
In some books, footnotes are separated from the main text by a horizontal line. We don't keep this so please just leave a blank line between the main text and the footnotes. (See example above.)
Endnotes are just footnotes that have been located together at the end of a chapter or at the end of the book, instead of on the bottom of each page. These are formatted in the same manner as footnotes. Where you find an endnote reference in the text, just surround it with [ and ]. If you are formatting one of the ending pages with the endnotes text on it, surround the text of each note with [Footnote #: (text of endnote)], with the endnote text placed in between, and the endnote number or letter placed where the # is. Put a blank line after each endnote so that they remain separate paragraphs when the text is rewrapped during post-processing.
Footnotes in Poetry or Tables should be treated the same as other footnotes. Volunteers should tag them and leave them at the bottom of the page; the post-processor will decide on the final placement.
Format italicized text with <i> inserted at the start and </i> inserted at the end of the italics. (Note the "/" in the closing tag.)
Punctuation goes outside the italics, unless it is an entire sentence or section that is italicized, or the punctuation is itself part of a phrase, title, or abbreviation that is italicized.
The periods that mark an abbreviated word in the title of a journal such as Phil. Trans. are part of the title for italicization purposes, and are included within the italic tags, thus: <i>Phil. Trans.</i>.
For dates and similar phrases, format the entire phrase as italics, rather than marking the words as italics and the numbers as non-italics. The reason is that many typefaces found in older texts used the same design for numbers in both regular and italics.
If the italicized text consists of a series/list of words or names, mark these up with italics tags individually.
Examples—Italics:
| Original Text: | Correctly Formatted Text: |
|---|---|
| Enacted 4 July, 1776 | <i>Enacted 4 July, 1776</i> |
| God knows what she saw in me! I spoke in such an affected manner. | <i>God knows what she saw in me!</i> I spoke in such an affected manner. |
| As in many other of these Studies, and | As in many other of these <i>Studies</i>, and |
| (Psychological Review, 1898, p. 160) | (<i>Psychological Review</i>, 1898, p. 160) |
| L. Robinson, art. "Ticklishness," | L. Robinson, art. "<i>Ticklishness</i>," |
| December 3, morning. 1323 Picadilly Circus | /* <i>December 3, morning.</i> 1323 Picadilly Circus */ |
| Volunteers may be tickled pink to read Ticklishness, Tickling and Laughter, Remarks on Tickling and Laughter and Ticklishness, Laughter and Humour. | Volunteers may be tickled pink to read <i>Ticklishness</i>, <i>Tickling and Laughter</i>, <i>Remarks on Tickling and Laughter</i> and <i>Ticklishness, Laughter and Humour</i>. |
Format bold text (text printed in a heavier typeface) with <b> inserted before the bold text and </b> after it. (Note the "/" in the closing tag.)
Punctuation goes outside the bold tags, unless it is an entire sentence or section that is in bold, or the punctuation is itself part of a phrase, title, or abbreviation that is in bold type.
See the Page Headers/Page Footers image/text for an example.
Some Project Managers may specify in the Project Comments that bold text be rendered as all caps.
Older books often abbreviated words as contractions, and printed them as
superscripts. For example:
Genrl Washington defeated Ld Cornwall's army.
Format these by inserting a single caret followed by the superscripted text, like this:
Gen^rl Washington defeated L^d Cornwall's army.
In scientific & technical works, format superscripted characters with curly braces
{ and } surrounding them, even if there is only one character superscripted.
For example:
... up to xn-1 elements in the array.
would be formatted as
... up to x^{n-1} elements in the array.
The Project Manager may specify in the Project Comments that superscripted text be marked up differently.
Subscripted text is often found in scientific works, but is not common in other
material. Format subscripted text by inserting an underline character _ and
surrounding the text with curly braces { and }.
For example:
H2O.
would be formatted as
H_{2}O.
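The superscript and subscript markup above can be summarized in a short Python sketch. The function names and the `technical` flag are assumptions made for illustration; check the Project Comments first, since the Project Manager may ask for different markup.

```python
def superscript(text, technical=False):
    """Caret markup; scientific and technical works also get curly braces."""
    return "^{" + text + "}" if technical else "^" + text


def subscript(text):
    """Underline character plus curly braces."""
    return "_{" + text + "}"


print("Gen" + superscript("rl") + " Washington")
print("x" + superscript("n-1", technical=True))
print("H" + subscript("2") + "O")
```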
Format underlined text as italics, with <i> and </i>. (Note the "/" in the closing tag.)
Underlining was often used to indicate emphasis when the typesetter was unable to actually italicize the text, for example in a typewritten document.
Some Project Managers may specify in the Project Comments that underlined text be marked up with the <u> and </u> tags.
Format s p a c e d o u t text with <g> inserted before the text and </g> after it. (Note the "/" in the closing tag.) Remove the extra spaces between letters in each word.
Punctuation goes outside the tags, unless it is an entire sentence or section that is spaced out, or the punctuation is itself part of a phrase that is spaced out.
This was a typesetting technique used to emphasize a piece of text in some older books, especially in German.
Format a change of font within a paragraph or line of normal text by inserting <f> before the change in font and </f> after it. (Note the "/" in the closing tag.) Use this markup to identify any special font or other formatting, except bold, italic, small capped, and spaced out text, which have their own tags.
Possible uses of this markup include:
The particular use or uses of this markup in a project will usually be spelled out in the Project Comments. Formatters should post in the Project Discussion if the markup appears to be needed and has not yet been discussed.
Punctuation goes outside the tags, unless it is an entire sentence that is in a different font, or the punctuation is itself part of a phrase, title, or abbreviation in the different font.
Normally we do not do anything to mark changes in font size.
The exceptions to this are when the font size changes to indicate a block quotation, or when the font size changes within a single paragraph or line of text (see Font Changes).
Format words that are printed in all capital letters as all capital letters.
The exception to this is the first word of a chapter: many old books typeset the first word of these in all caps; this should be changed to upper and lower case, so "ONCE upon a time," becomes "Once upon a time,"
The markup is different for Mixed Case Small Caps and all small caps:
Format words that are printed in Mixed Small Caps
as mixed upper and lowercase, and surround the text with <sc> and </sc>
markup.
Example:
This is Small Caps
would correctly be:
<sc>This is Small Caps</sc>.
Format words that are printed in all small caps
as ALL-CAPS, and surround the text with <sc> and </sc> markup.
Example:
You cannot be serious about
aardvarks!
would correctly be:
You cannot be serious about
<sc>AARDVARKS</sc>!
Words in headings (Chapter Headings, Section Headings, Captions, etc.) that are entirely all-capped should be formatted as all-caps without any <sc> </sc>. The first word of a chapter that is in Small Caps should be changed to mixed case without the tags.
Format a large and ornate graphic first letter of a chapter, section, or paragraph as if it were an ordinary letter.
There are generally four such marks you will see in books: hyphens, en-dashes, em-dashes, and long dashes.
Note: If an em-dash appears at the start or end of a line of your OCR'd text, join it with the other line so that there are no spaces or line breaks around it. Only if the author used an em-dash to start or end the paragraph or line of poetry or dialog should you leave it at the start or end of a line. See the examples below.
Examples—Dashes, Hyphens, and Minus Signs:
| Original Image: | Correctly Formatted Text: | Type |
|---|---|---|
| semi-detached | semi-detached | Hyphen |
| three- and four-part harmony | three- and four-part harmony | Hyphen |
| discoveries which the Crus- aders made and brought home with | discoveries which the Crusaders made and brought home with | Hyphen |
| factors which mold char- acter—environment, training and heritage, | factors which mold character--environment, training and heritage, | Hyphen |
| See pages 21–25 | See pages 21-25 | En-dash |
| –14° below zero | -14° below zero | En-dash |
| X – Y = Z | X - Y = Z | En-dash |
| 2–1/2 | 2-1/2 | En-dash |
| I am hurt;—A plague on both your houses!—I am dead. | I am hurt;--A plague on both your houses!--I am dead. | Em-dash |
| sensations—sweet, bitter, salt, and sour —if even all of these are simple tastes. What | sensations--sweet, bitter, salt, and sour--if even all of these are simple tastes. What | Em-dash |
| senses—touch, smell, hearing, and sight— with which we are here concerned, | senses--touch, smell, hearing, and sight--with which we are here concerned, | Em-dash |
| It is the east, and Juliet is the sun!— | It is the east, and Juliet is the sun!-- | Em-dash |
| "Three hundred——" "years," she was going to say, but the left-hand cat interrupted her. | "Three hundred----" "years," she was going to say, but the left-hand cat interrupted her. | Long dash |
| As the witness Mr. —— testified, | As the witness Mr. ---- testified, | Long dash |
| As the witness Mr. S—— testified, | As the witness Mr. S---- testified, | Long dash |
| the famous detective of ——B Baker St. | the famous detective of ----B Baker St. | Long dash |
| “You —— Yankee”, she yelled. | "You ---- Yankee", she yelled. | Long dash |
| “I am not a d—d Yankee”, he replied. | "I am not a d--d Yankee", he replied. | Em-dash |
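The character substitutions shown in the table can be sketched in Python as below. This is a minimal illustration under a few assumptions: the function name `normalize_dashes` is hypothetical, a long dash is taken to be two em-dash characters in a row, and the Unicode minus sign is treated like an en-dash. It does not remove stray spaces or line breaks around dashes; that still has to be done by eye.

```python
def normalize_dashes(text):
    """Replace Unicode dashes with the plain-ASCII forms used in the table."""
    text = text.replace("\u2014\u2014", "----")  # long dash (two em-dashes)
    text = text.replace("\u2014", "--")          # em-dash
    text = text.replace("\u2013", "-")           # en-dash
    text = text.replace("\u2212", "-")           # minus sign (assumption)
    return text


print(normalize_dashes("See pages 21\u201325"))
print(normalize_dashes("As the witness Mr. S\u2014\u2014 testified,"))
```

The long dash is replaced first so that its two em-dashes are not converted separately; the result would be the same four hyphens here, but the order keeps the intent of each step clear.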
Where a hyphen appears at the end of a line, join the two halves of the hyphenated word back together. If it is really a hyphenated word like well-meaning, join the two halves leaving the hyphen in between. But if it was just hyphenated because it wouldn't fit on the line, and is not a word that is usually hyphenated, then join the two halves and remove the hyphen. Keep the joined word on the top line, and put a line break after it to preserve the line formatting—this makes it easier for the volunteers who come after you. See the Dashes, Hyphens, and Minus Signs section of these Guidelines for examples of each kind (nar-row turns into narrow, but low-lying keeps the hyphen). If the word is followed by punctuation, then carry that punctuation onto the top line, too.
Words like to-day and to-morrow that we don't commonly hyphenate now were often hyphenated in the old books we are working on. Leave them hyphenated the way the author did. If you're not sure if the author hyphenated it or not, leave the hyphen, put an * after it, and join the word together like this: to-*day. The asterisk will bring it to the attention of the post-processor, who has access to all the pages, and can determine how the author typically wrote this word.
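The end-of-line rule above (join the word on the top line, carry any punctuation with it, and keep the hyphen only for genuinely hyphenated words) can be sketched in Python. This is an illustration only: `rejoin_eol_hyphens` and the `keep_hyphen` set are hypothetical, and a script cannot know which words the author really hyphenated, so the set has to be supplied by the formatter. The sketch would also wrongly join a suspended hyphen such as "three- and four-part" if it fell at the end of a line.

```python
TRAILING_PUNCTUATION = ".,;:!?\"')"


def rejoin_eol_hyphens(lines, keep_hyphen=frozenset()):
    """Join words split by an end-of-line hyphen onto the top line."""
    out = list(lines)
    for i in range(len(out) - 1):
        top = out[i].rstrip()
        # Only a single trailing hyphen counts; "--" is an em-dash.
        if not top.endswith("-") or top.endswith("--"):
            continue
        pieces = out[i + 1].lstrip().split(" ", 1)
        tail = pieces[0]                      # second half, with its punctuation
        if not tail:
            continue
        rest = pieces[1] if len(pieces) > 1 else ""
        start = top.rfind(" ") + 1
        head = top[start:-1]                  # first half, hyphen removed
        hyphenated = head + "-" + tail.rstrip(TRAILING_PUNCTUATION)
        joined = head + ("-" if hyphenated in keep_hyphen else "") + tail
        out[i] = top[:start] + joined
        out[i + 1] = rest
    return out


page = ["discoveries which the Crus-", "aders made and brought home with"]
print(rejoin_eol_hyphens(page))
```

Passing `keep_hyphen={"low-lying"}` keeps the hyphen for that word while still moving its second half up to the top line.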
Format end-of-page hyphens or em-dashes by leaving the hyphen or em-dash at the end of the last line and marking it with a * placed directly after it.
For example, format:
something Pat had already become accus-
as:
something Pat had already become accus-*
On pages that start with part of a word from the previous page or an
em-dash, place a * before the partial word or em-dash.
To continue the above example, format:
tomed to from having to do his own family
as:
*tomed to from having to do his own family
These markings indicate to the post-processor that the word must be rejoined when the pages are combined to produce the final e-book.
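The page-boundary marking above can be sketched in Python. The function names are hypothetical, and whether a page starts with part of a word is something only the formatter can judge, so the second function takes that as an explicit argument.

```python
def mark_page_end(last_line):
    """Add * after a hyphen or em-dash (--) that ends the last line of a page."""
    stripped = last_line.rstrip()
    if stripped.endswith("-"):
        return stripped + "*"
    return last_line


def mark_page_start(first_line, continues_previous_page):
    """Add * before a partial word or em-dash that starts a page."""
    return "*" + first_line if continues_previous_page else first_line


print(mark_page_end("something Pat had already become accus-"))
print(mark_page_start("tomed to from having to do his own family", True))
```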
Format a single word printed by itself at the bottom of a page by deleting it, even if it's the second half of a hyphenated word.
In some older books, the single word at the bottom of the page (called a "catchword", usually printed near the right margin) indicates the first word on the next page of the book (called an "incipit"). It was used to alert the printer to print the correct reverse (called "verso"), to make it easier for printers' helpers to make up the pages prior to binding, and to help the reader avoid turning over more than one page.
Remove any extra space in contractions: for example, would n't should be formatted as wouldn't.
This was often an early printers' convention, where the space was retained to indicate that 'would' and 'not' were originally separate words. It is also sometimes an artifact of the OCR. Remove the extra space in either case.
Some Project Managers may specify in the Project Comments not to remove extra spaces in contractions, particularly in the case of texts that contain slang, dialect, or are written in languages other than English.
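Removing the extra space in contractions can be sketched with a regular expression in Python. This is a rough illustration: the function name and the short list of endings (n't, 'll, 've, 're) are assumptions, the pattern can misfire on text where an opening single quote happens to be followed by one of those letter pairs, and it should not be used when the Project Comments say to keep the spaces.

```python
import re

# A word character, one space, then a contraction ending.
SPACED_CONTRACTION = re.compile(r"(\w) (n't|'ll|'ve|'re)\b")


def remove_contraction_spaces(text):
    """Close up spaced contractions such as "would n't"."""
    return SPACED_CONTRACTION.sub(r"\1\2", text)


print(remove_contraction_spaces("I would n't, and they 'll agree."))
```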
This section applies to an occasional Poem or Epigram in a mainly non-poetry book. For an entire book of poetry, see the special guidelines for Poetry Books.
Mark poetry or epigrams so the post-processor can find it more quickly. Insert a separate line with /* at the start of the poetry or epigram and a separate line with */ at the end. Leave a blank line between these markers and the rest of the text.
Preserve the relative indentation of the individual lines of the poem or epigram by adding 2, 4, 6 (or more) spaces in front of the indented lines to make them resemble the original.
When a line of verse is too long for the printed page, many texts wrap the continuation onto the next printed line and place a wide indentation in front of it. These continuation lines should be rejoined with the line above. Continuation lines usually start with a lower case letter. They appear at irregular places, unlike normal indentation, which occurs at regular intervals in the metre of the poem.
If the poetry is centered on the printed page, don't try to center the lines of poetry during formatting. Move the lines to the left margin, and preserve the relative indentation of the lines.
Footnotes in poetry should be treated the same as regular footnotes during formatting. See footnotes for details.
Line Numbers in poetry should be kept. Put them at the end of the line, leaving at least 6 spaces between them and the end of the text. See Line Numbers for details.
Check the Project Comments for the specific text you are formatting. Books of poetry often have special instructions from the Project Manager. Many times, you won't have to follow all these formatting guidelines for a book that is mostly or entirely poetry.
Format letters and correspondence as you would paragraphs. Put a blank line before the start of the letter; you do not need to duplicate any indenting.
Surround consecutive heading or footer lines (such as addresses, date blocks, salutations, or signatures) with /* and */ markers. Leave a blank line between the markers and the rest of the text. The markers will ensure the individual lines are kept in post-processing and not rewrapped.
Don't indent the heading or footer lines, even if they are indented or right justified in the original—just put them at the left margin. The post-processor will format them as needed.
Surround lists with /* and */ markers. Leave a blank line between these markers and the rest of the text. The markers will ensure the individual lines are not rewrapped during post-processing. Use this markup for any such list that should not be reformatted, including lists of questions & answers, items in a recipe, etc.
Surround tables with /* and */ markers. Leave a blank line between these markers and the rest of the text. The markers will ensure the individual lines are not rewrapped during post-processing. Format the table with spaces to look approximately like the original table. Don't make the table wider than 75 characters. Project Gutenberg's guidelines go on to say "...except where it can't be helped. Never, ever longer than 80...".
Do not use tabs for formatting—use space characters only. Tab characters will line up differently between computers, and your careful formatting will not always display the same way.
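A formatter could check a finished table against the width and tab rules above with a few lines of Python. This is a minimal sketch; the function name `check_table_lines` and the wording of the messages are made up for illustration, while the limits of 75 and 80 characters come from the text above.

```python
def check_table_lines(lines, soft_limit=75, hard_limit=80):
    """Return (line number, problem) pairs for tabs and overlong lines."""
    problems = []
    for number, line in enumerate(lines, start=1):
        if "\t" in line:
            problems.append((number, "contains a tab character"))
        if len(line) > hard_limit:
            problems.append((number, f"longer than {hard_limit} characters"))
        elif len(line) > soft_limit:
            problems.append((number, f"wider than {soft_limit} characters"))
    return problems


table = ["Year   Titles", "1907\t12", "x" * 76]
for number, problem in check_table_lines(table):
    print(f"line {number}: {problem}")
```

The check only reports problems; how to narrow a table that is too wide is still a judgment call for the formatter.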
It's often hard to format tables in plain ASCII text; just do your best. This is much easier if you use a mono-spaced font such as DPCustomMono or Courier. Remember that the goal is to preserve the Author's meaning, while producing a readable table in an e-book. Sometimes this requires sacrificing the original format of the table on the printed page. Check the Project Comments and discussion thread because other volunteers may have settled on a specific format. If there is nothing there, you might find something useful in the Gallery of Table Layouts forum thread.
Footnotes in tables should go at the end of the table. See footnotes for details.
Surround block quotations with /# and #/ markers. Leave a blank line between these markers and the rest of the text. The markers will ensure the block quotation is formatted properly during post-processing.
Apart from adding the markers, block quotations should be formatted as any other text.
Block quotations are long quotations (typically several lines and sometimes several pages) and are often (but not always) printed with wider margins or in a smaller font size—sometimes both.
Format double quotation marks as plain ASCII " double quotes. Do not change double quotes to single quotes. Leave them as the Author wrote them.
For quotes from non-English languages, use the quotation marks appropriate to that language if they are available. The French equivalent, guillemets, «like this», are available from the pulldown menus in the proofreading interface, since they are part of Latin-1. Remember to remove any space between the guillemets and the quoted text; if needed, it will be added in post-processing. The same applies to languages which use reversed guillemets, »like this«.
The quotation marks used in some texts (in German or other languages), „like this”, are not available in the pulldown menus, as they are not in Latin-1. In that case, follow the instructions in the Project Comments.
The Project Manager may instruct you in the Project Comments to format non-English language quotation marks differently for a particular book.
Format single quotation marks as the plain ASCII ' single quote (apostrophe). Do not change single quotes to double quotes. Leave them as the Author wrote them.
When a quotation is printed with a quotation mark at the beginning of every line, remove all of them except for the one at the start of the first line of the quotation.
If the quotation goes on for multiple paragraphs, each paragraph should have an opening quote mark on the first line of the paragraph.
Often there is no closing quotation mark until the very end of the quoted section of text, which may not be on the same page you are formatting. Leave it that way—do not add closing quotation marks that are not in the page image.
There are some language-specific exceptions. In French, for example, dialog within quotations uses a combination of different punctuation to indicate various speakers. If you are not familiar with a particular language, check the Project Comments or leave a message for the Project Manager in the Project Discussion for clarification.
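For a passage that really is printed with a quotation mark at the start of every line, the clean-up described above can be sketched as follows. This is an illustration only: the function name is hypothetical, it works on one paragraph at a time, and it should not be run over text where a later line legitimately begins a new quotation.

```python
def strip_repeated_line_quotes(paragraph_lines, quote='"'):
    """Keep the opening quote on the first line; drop it from later lines."""
    if not paragraph_lines:
        return []
    cleaned = [paragraph_lines[0]]
    for line in paragraph_lines[1:]:
        if line.startswith(quote):
            # Remove the repeated mark and any space printed after it.
            line = line[len(quote):].lstrip(" ")
        cleaned.append(line)
    return cleaned
```

Because each paragraph is handled separately, every paragraph keeps its own opening quote mark, and no closing mark is ever added.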
Format periods that end sentences with a single space after them.
You do not need to remove extra spaces after periods if they're already in the OCR'd text—we can do that automatically during post-processing. See the Chapter Headers image and text for an example.
In general, there should be no space before punctuation characters except opening quotation marks. If the OCR'd text has a space before punctuation, remove it. This applies even to languages, such as French, which normally use spaces before punctuation characters.
Spaces before punctuation sometimes appear because books typeset in the 1700s and 1800s often used partial spaces before punctuation such as a semicolon or comma.
| Scanned Text: | Correctly Formatted Text: |
|---|---|
| and so it goes ; ever and ever. | and so it goes; ever and ever. |
Leave all line breaks in so that the next formatter and the post-processor can easily compare the lines in the text to the lines in the image. Be especially careful about this when rejoining hyphenated words or moving words around em-dashes. If the previous volunteer removed the line breaks, please replace them so that they once again match the image.
Extra blank lines that are not in the image should be removed except where we intentionally add them for formatting. But blank lines at the bottom of the page are fine—these are removed when you save the page.
Extra spaces and tab characters between words are common in OCR output. You don't need to bother removing these—that can be done automatically during post-processing.
However, extra spaces around punctuation, em-dashes, quote marks, etc. do need to be removed when they separate the symbol from the word.
For example, in "A horse ;  my kingdom for a horse." the space between the word "horse" and the semicolon should be removed. But the 2 spaces after the semicolon are fine; you don't have to delete one of them.
Do not bother inserting spaces at the ends of lines of text. It is a waste of your time for something that we can take care of automatically later. Similarly do not waste your time removing extra spaces at the ends of lines.
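The spacing rules above can be sketched as a small clean-up function: remove the space in front of a punctuation mark, and leave any extra spaces after it alone. This is a minimal sketch under stated assumptions, not an official tool. The name `tighten_punctuation` is hypothetical, only the marks ; : , ! ? and a lone period are handled, and a space before a run of dots is kept because English ellipses need it. Quotation marks and dashes are left for a human to judge.

```python
import re

# Whitespace followed by ; : , ! ? or by a period that does not start a run of dots.
_SPACE_BEFORE_PUNCT = re.compile(r"[ \t]+([;:,!?]|\.(?!\.))")


def tighten_punctuation(line):
    """Remove spaces that separate a punctuation mark from the preceding word."""
    return _SPACE_BEFORE_PUNCT.sub(r"\1", line)
```

Spaces after the punctuation are deliberately untouched, since those can be cleaned up automatically during post-processing.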
Keep line numbers. Place them at least six spaces past the right-hand end of the line, even if they are on the left side of the poetry/text in the original image.
Line numbers are numbers in the margin for each line, or sometimes every fifth or tenth line, and are common in books of poetry. Since poetry will not be reformatted in the e-book version, the line numbers will be useful to readers.
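As an illustration of the placement rule, the sketch below moves a line number printed on the left to the right-hand end of the line, separated by six spaces. The function name and the `gap` parameter are hypothetical. The sketch assumes the number is the first thing on the line, and it discards any indentation between the number and the text, so indented poetry would need that indentation restored by hand.

```python
import re

# A number at the start of the line, then whitespace, then the text itself.
_LEFT_NUMBER = re.compile(r"^\s*(\d+)\s+(\S.*)$")


def move_line_number_right(line, gap=6):
    """Move a leading line number to the end of the line, after `gap` spaces."""
    match = _LEFT_NUMBER.match(line)
    if match is None:
        return line  # no leading number: leave the line as it is
    number, text = match.groups()
    return text.rstrip() + " " * gap + number
```

A line of text that genuinely begins with a number would be changed too, so the result should always be compared with the page image.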
Most paragraphs start on the line immediately after the end of the previous one. Sometimes two paragraphs are separated to indicate a "thought break." A "thought break" may take the form of a line of stars, hyphens, or some other character, a plain or floridly decorated horizontal line, a simple decoration, or even just an extra blank line or two.
A "thought break" may represent a change of scene or subject, a lapse in time, or a bit of suspense. This is intended by the author, so we preserve it by putting a blank line, <tb>, and then another blank line.
Project Managers and/or Post-Processors may request that additional information be retained in the thought break markup. For example, some projects distinguish different types of breaks by using different styles, such as a line of stars in one place and a blank line in another. In these cases, the Project Comments may request that these be marked up differently, for example as <tb stars> and <tb>. Please, as always, read the Project Comments carefully so that you know what is required for each project. Also be careful not to carry these special requests into other projects with different requirements.
Sometimes printers used decorative lines to mark the ends of chapters. As we already mark Chapter Headers, there is no need to add a "thought break" marker.
The proofreading interface has the "thought break" marker available to cut and paste.
The guidelines for ellipses (...) are different for English and Languages Other Than English (LOTE).
ENGLISH: Leave a space before the three dots, and a space after. The exception is at the end of a sentence, where there is no space before the dots, four dots in all, and a space after. The same applies after any other ending punctuation mark: the three dots follow immediately, without any space.
For example:
That I know ... is true.
This is the end....
Wherefore art thou Romeo?...
Sometimes you will see it with the punctuation at the end; so format it that way:
Wherefore art thou Romeo...?
Remove extra dots, if any, or add new ones, if necessary, to bring the number to three (or four) as appropriate.
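The English rules above can be turned into a rough checker that reports runs of dots with the wrong count. This is a sketch under stated assumptions rather than a definitive implementation: the function name is hypothetical, a run directly attached to a word is assumed to be a sentence end (so four dots are expected), and the start or end of a line is treated like a space. It only reports; deciding whether to add or remove dots still means looking at the page image.

```python
import re

_DOT_RUN = re.compile(r"\.{2,}")


def check_english_ellipses(line):
    """Return (column, message) pairs for dot runs with an unexpected length."""
    problems = []
    for match in _DOT_RUN.finditer(line):
        start, end = match.span()
        before = line[start - 1] if start > 0 else " "
        after = line[end] if end < len(line) else " "
        if before == " " or before in ("?", "!") or after in ("?", "!"):
            expected = 3  # " ... ", "?..." or "...?"
        else:
            expected = 4  # sentence end: "end...."
        if end - start != expected:
            problems.append((start, "expected %d dots" % expected))
    return problems
```

All four example lines given above pass this check with no problems reported.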
LOTE: (Languages Other Than English) Use the general rule "Follow closely the style used in the printed page." In particular, insert spaces if there are spaces before or between the periods, and use the same number of periods as appear in the image. Sometimes the printed page is unclear; in that case, insert a [**unclear] to draw the attention of the post-processor. (Note: Post-Processors should replace those regular spaces with non-breaking spaces.)
Please format accented and other non-ASCII characters using the proper Latin-1 characters, where possible. See Diacritical marks for ways to format some non-Latin-1 characters.
If they are not on your keyboard, there are several ways of inputting these characters:
The original Project Gutenberg will post, as a minimum, 7-bit ASCII versions of texts, but versions using other character encodings that can preserve more of the information from the original text are also accepted. Project Gutenberg Europe publishes UTF-8 as its default encoding, but other appropriate encodings are also welcomed.
Currently for Distributed Proofreaders this means using Latin-1 or ISO 8859-1 and -15, and in the future will include Unicode.
Distributed Proofreaders Europe already uses Unicode.
For Windows:
| Windows Shortcuts for Latin-1 symbols | |||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| ` grave | ´ acute (aigu) | ^ circumflex | ~ tilde | ¨ umlaut | ° ring | Æ ligature | |||||||
| à | Alt-0224 | á | Alt-0225 | â | Alt-0226 | ã | Alt-0227 | ä | Alt-0228 | å | Alt-0229 | æ | Alt-0230 |
| À | Alt-0192 | Á | Alt-0193 | Â | Alt-0194 | Ã | Alt-0195 | Ä | Alt-0196 | Å | Alt-0197 | Æ | Alt-0198 |
| è | Alt-0232 | é | Alt-0233 | ê | Alt-0234 | ë | Alt-0235 | ||||||
| È | Alt-0200 | É | Alt-0201 | Ê | Alt-0202 | Ë | Alt-0203 | ||||||
| ì | Alt-0236 | í | Alt-0237 | î | Alt-0238 | ï | Alt-0239 | ||||||
| Ì | Alt-0204 | Í | Alt-0205 | Î | Alt-0206 | Ï | Alt-0207 | / slash | Œ ligature | ||||
| ò | Alt-0242 | ó | Alt-0243 | ô | Alt-0244 | õ | Alt-0245 | ö | Alt-0246 | ø | Alt-0248 | œ | Use [oe] |
| Ò | Alt-0210 | Ó | Alt-0211 | Ô | Alt-0212 | Õ | Alt-0213 | Ö | Alt-0214 | Ø | Alt-0216 | Œ | Use [OE] |
| ù | Alt-0249 | ú | Alt-0250 | û | Alt-0251 | ü | Alt-0252 | ||||||
| Ù | Alt-0217 | Ú | Alt-0218 | Û | Alt-0219 | Ü | Alt-0220 | currency | mathematics | ||||
| ñ | Alt-0241 | ÿ | Alt-0255 | ¢ | Alt-0162 | ± | Alt-0177 | ||||||
| Ñ | Alt-0209 | £ | Alt-0163 | × | Alt-0215 | ||||||||
| çedilla | Icelandic | marks | accents | punctuation | ¥ | Alt-0165 | ÷ | Alt-0247 | |||||
| ç | Alt-0231 | Þ | Alt-0222 | © | Alt-0169 | ´ | Alt-0180 | ¿ | Alt-0191 | $ | Alt-0036 | ¬ | Alt-0172 |
| Ç | Alt-0199 | þ | Alt-0254 | ® | Alt-0174 | ¨ | Alt-0168 | ¡ | Alt-0161 | ¤ | Alt-0164 | ° | Alt-0176 |
| superscripts | Ð | Alt-0208 | ™ | Alt-0153 | ¯ | Alt-0175 | « | Alt-0171 | µ | Alt-0181 | |||
| ¹ | Alt-0185 | ð | Alt-0240 | ¶ | Alt-0182 | ¸ | Alt-0184 | » | Alt-0187 | ordinals | ¼ 1 | Alt-0188 | |
| ² | Alt-0178 | sz ligature | § | Alt-0167 | · | Alt-0183 | º | Alt-0186 | ½ 1 | Alt-0189 | |||
| ³ | Alt-0179 | ß | Alt-0223 | ¦ | Alt-0166 | * | Alt-0042 | ª | Alt-0170 | ¾ 1 | Alt-0190 | ||
Note 1: Unless specifically requested by the Project Comments, please do not use the fraction symbols, but instead use the guidelines for Fractions (1/2, 1/4, 3/4, etc.).
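The four-digit numbers in the Windows table are not arbitrary: on a Western-European Windows setup, Alt plus a number typed with a leading zero selects a character by its value in the Windows-1252 code page, which agrees with Latin-1 for values 160 to 255. The sketch below shows that relationship. The function name is hypothetical, and the code page is an assumption about the reader's system, since other Windows locales use a different one.

```python
def alt_code_to_char(alt_code):
    """Map a four-digit code such as "0233" to the character it types.

    Assumes the Windows-1252 code page (Western-European Windows).
    """
    if len(alt_code) != 4 or not alt_code.startswith("0") or not alt_code.isdigit():
        raise ValueError("expected a four-digit code with a leading zero")
    # bytes() rejects values above 255, which are not valid Alt codes here.
    return bytes([int(alt_code)]).decode("cp1252")
```

For example, `alt_code_to_char("0233")` gives é, matching the Alt-0233 entry in the table. The ™ entry (Alt-0153) falls in the range where Windows-1252 differs from Latin-1.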
For Apple Macintosh:
| Apple Mac Shortcuts for Latin-1 symbols | |||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| ` grave | ´ acute (aigu) | ^ circumflex | ~ tilde | ¨ umlaut | ° ring | Æ ligature | |||||||
| à | Opt-`, a | á | Opt-e, a | â | Opt-i, a | ã | Opt-n, a | ä | Opt-u, a | å | Opt-a | æ | Opt-' |
| À | Opt-`, A | Á | Opt-e, A | Â | Opt-i, A | Ã | Opt-n, A | Ä | Opt-u, A | Å | Opt-A | Æ | Opt-" |
| è | Opt-`, e | é | Opt-e, e | ê | Opt-i, e | ë | Opt-u, e | ||||||
| È | Opt-`, E | É | Opt-e, E | Ê | Opt-i, E | Ë | Opt-u, E | ||||||
| ì | Opt-`, i | í | Opt-e, i | î | Opt-i, i | ï | Opt-u, i | ||||||
| Ì | Opt-`, I | Í | Opt-e, I | Î | Opt-i, I | Ï | Opt-u, I | / slash | Œ ligature | ||||
| ò | Opt-`, o | ó | Opt-e, o | ô | Opt-i, o | õ | Opt-n, o | ö | Opt-u, o | ø | Opt-o | œ | Use [oe] |
| Ò | Opt-`, O | Ó | Opt-e, O | Ô | Opt-i, O | Õ | Opt-n, O | Ö | Opt-u, O | Ø | Opt-O | Œ | Use [OE] |
| ù | Opt-`, u | ú | Opt-e, u | û | Opt-i, u | ü | Opt-u, u | ||||||
| Ù | Opt-`, U | Ú | Opt-e, U | Û | Opt-i, U | Ü | Opt-u, U | currency | mathematics | ||||
| ñ | Opt-n, n | ÿ | Opt-u, y | ¢ | Opt-4 | ± | Shift-Opt-= | ||||||
| Ñ | Opt-n, N | £ | Opt-3 | × | (none) ‡ | ||||||||
| çedilla | Icelandic | marks | accents | punctuation | ¥ | Opt-y | ÷ | Opt-/ | |||||
| ç | Opt-c | Þ | (none) ‡ | © | Opt-g | ´ | Opt-E | ¿ | Opt-? | $ | Shift-4 | ¬ | Opt-l |
| Ç | Opt-C | þ | (none) ‡ | ® | Opt-r | ¨ | Opt-U | ¡ | Opt-1 | ¤ | (none) ‡ | ° | Shift-Opt-8 |
| superscripts | Ð | (none) ‡ | ™ | Opt-2 | ¯ | Shift-Opt-, | « | Opt-\ | µ | Opt-m | |||
| ¹ | (none) ‡ | ð | (none) ‡ | ¶ | Opt-7 | ¸ | Opt-Z | » | Shift-Opt-\ | ordinals | ¼ | (none) ‡1 | |
| ² | (none) ‡ | sz ligature | § | Opt-6 | · | Shift-Opt-9 | º | Opt-0 | ½ | (none) ‡1 | |||
| ³ | (none) ‡ | ß | Opt-s | ¦ | (none) ‡ | * | Shift-8 | ª | Opt-9 | ¾ | (none) ‡1 | ||
‡ Note: No equivalent shortcut, use drop-down menus.
Note 1: Unless specifically requested by the Project Comments, please do not use the fraction symbols, but instead use the guidelines for Fractions (1/2, 1/4, 3/4, etc.).
In some projects, you will find characters with special marks either above or below the normal Latin A...Z character. These are called diacritical marks, and indicate a special pronunciation for this character. For formatting, we indicate them in our normal ASCII text by using a specific coding, such as: ă becomes [)a] for a breve (the u-shaped accent) above an a, or [a)] for a breve below.
Be sure to include the square brackets ([ ]) around these, so the post-processor knows which letter the mark applies to. The post-processor will eventually replace these with whatever symbol works in each version of the text being produced, such as 7-bit ASCII, 8-bit, Unicode, or HTML.
Note that when some of these marks appear on some characters (mainly vowels), our standard Latin-1 character set already includes that character with the diacritical mark. In those cases, use the Latin-1 character, which is available from the drop-down lists in the proofreading interface.
The table below lists the special codings currently used:
The "x" represents a character with a diacritical mark.
When formatting, use the actual character from the text, not the x shown in the examples.
| Proofreading Symbols for Diacritical Marks | |||
|---|---|---|---|
| diacritical mark | sample | above | below |
| macron (straight line) | ¯ | [=x] | [x=] |
| 2 dots (dieresis, umlaut) | ¨ | [:x] | [x:] |
| 1 dot | · | [.x] | [x.] |
| grave accent | ` | [`x] or [\x] | [x`] or [x\] |
| acute accent (aigu) | ´ | ['x] or [/x] | [x'] or [x/] |
| circumflex | ˆ | [^x] | [x^] |
| caron (v-shaped symbol) | ∨ | [vx] | [xv] |
| breve (u-shaped symbol) | ∪ | [)x] | [x)] |
| tilde | ˜ | [~x] | [x~] |
| cedilla | ¸ | [,x] | [x,] |
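To show why the brackets and the position of the symbol matter, here is a minimal sketch of how a post-processor might turn these codings into Unicode characters. It is an assumption about one possible approach, not a description of any actual post-processing tool. The names are hypothetical, the cedilla coding is left out, and anything that does not match (such as [oe]) is returned unchanged.

```python
import re
import unicodedata

# symbol -> (combining mark when above, combining mark when below)
_MARKS = {
    "=": ("\u0304", "\u0331"),   # macron
    ":": ("\u0308", "\u0324"),   # dieresis
    ".": ("\u0307", "\u0323"),   # dot
    "`": ("\u0300", "\u0316"),   # grave
    "\\": ("\u0300", "\u0316"),  # grave, alternate coding
    "'": ("\u0301", "\u0317"),   # acute
    "/": ("\u0301", "\u0317"),   # acute, alternate coding
    "^": ("\u0302", "\u032d"),   # circumflex
    "v": ("\u030c", "\u032c"),   # caron
    ")": ("\u0306", "\u032e"),   # breve
    "~": ("\u0303", "\u0330"),   # tilde
}
_BRACKETED = re.compile(r"\[(.)(.)\]")


def decode_diacritics(text):
    """Replace codings such as [)a] or [h.] with the marked letter."""
    def replace(match):
        first, second = match.group(1), match.group(2)
        if first in _MARKS and second.isalpha():
            combined = second + _MARKS[first][0]   # symbol first: mark above
        elif second in _MARKS and first.isalpha():
            combined = first + _MARKS[second][1]   # symbol last: mark below
        else:
            return match.group(0)                  # not a coding we know
        # NFC picks the single precomposed character when Unicode has one.
        return unicodedata.normalize("NFC", combined)
    return _BRACKETED.sub(replace, text)
```

With this sketch, [)a] becomes ă and [=o] becomes ō. Without the brackets there would be no reliable way to tell which letter a symbol belongs to.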
Some projects contain text printed in non-Latin characters; that is, characters other than the Latin A...Z—for example, Greek, Cyrillic (used in Russian, Slavic, and other languages), Hebrew, or Arabic characters.
For Greek, you should attempt a transliteration. Transliteration involves converting each character of the foreign text into the equivalent Latin letter(s). A Greek transliteration tool is provided in the proofreading interface to make this task much easier.
Press the "Greek Transliterator" button near the bottom of the proofreading interface to pop up the tool. In the tool, click on the Greek characters that match the word or phrase you are transliterating, and the appropriate Latin-1 characters will appear in the text box. When you are done, simply cut and paste this transliterated text into the page you are formatting. Surround the transliterated text with the Greek markers [Greek: and ]. For example, Βιβλος would become [Greek: Biblos]. ("Book"—so appropriate for DP!)
If you are uncertain about your transliteration, mark it with ** to bring it to the attention of the next formatter or the post-processor.
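The idea behind transliteration can be sketched as a simple letter-by-letter lookup. This is only an illustration and is not the Greek Transliterator tool itself: the function name and the mapping table are assumptions, only plain unaccented letters are covered, and accents and breathing marks are not handled at all. Use the tool in the proofreading interface for real work.

```python
# Illustrative mapping only (an assumption, not the tool's actual table).
_GREEK_TO_LATIN = {
    "α": "a", "β": "b", "γ": "g", "δ": "d", "ε": "e", "ζ": "z",
    "η": "ê", "θ": "th", "ι": "i", "κ": "k", "λ": "l", "μ": "m",
    "ν": "n", "ξ": "x", "ο": "o", "π": "p", "ρ": "r", "σ": "s",
    "ς": "s", "τ": "t", "υ": "u", "φ": "ph", "χ": "ch", "ψ": "ps",
    "ω": "ô",
}


def transliterate_greek(text):
    """Transliterate plain Greek letters and wrap the result in [Greek: ]."""
    pieces = []
    for char in text:
        latin = _GREEK_TO_LATIN.get(char.lower())
        if latin is None:
            pieces.append(char)                # unknown character: keep as is
        elif char.isupper():
            pieces.append(latin.capitalize())  # e.g. capital theta gives "Th"
        else:
            pieces.append(latin)
    return "[Greek: " + "".join(pieces) + "]"
```

Applied to the example above, `transliterate_greek("Βιβλος")` returns [Greek: Biblos].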
For other languages that cannot be so easily transliterated, such as Cyrillic, Hebrew, or Arabic, surround the text with the appropriate markers, [Cyrillic: **], [Hebrew: **], or [Arabic: **], and leave it as scanned. Include the ** so the post-processor can address it later.