PP with Gedit
Gedit is a powerful text editor with full Unicode support, including bi-directional text. It supports plugins that add more powerful features and link to external tools such as Gutcheck, Jeebies, validators, etc. Many of these techniques can be used with other editors as well. This tutorial assumes you are running Windows. If you are running Linux, much of the setup will be easier.
I am still learning how to use it and as I learn, I'll be updating this with details on how to PP using it. For now, I'll include my checklist. As I go I'll add instructions on how to do each task.
Note: At this point I've stopped working with Gedit as the spellcheck does not work in Windows. Until that is fixed I will not be continuing this tutorial.
Save only as often as you wish not to lose your data
Set up gedit
- gedit does not do regular expressions by default. There is a Regular Expression Plugin that will do it.
- Download it and unarchive it.
- It is in tar.gz format, so you will need something that can open that file. Windows does not understand either tar or gz by default. A program such as 7-Zip can open both.
- Put the contents of it (a folder called regex_replace and a file called regex_replace.gedit-plugin) into C:\Program Files\gedit\lib\gedit-2\plugins
- On Windows the plugin does not work without Glade. This is a large file that takes a while to install. If anyone can find another, less bloated way to get regex to work, let me know.
- Download and install Glade
- Open gedit, click Edit > Preferences > Plugins and put a check next to "Regex Search and Replace".
- Restart gedit.
- Now you can do regular expression searching by clicking Search > Regular Expressions
- You can turn on some other useful plugins at Edit > Preferences > Plugins:
- Change Case - allows you to change the case of selected text
- Indent Lines - allows you to indent a selection of text. You can alter the indentation amount within Edit > Preferences > Editor by changing the "Tab Width"
- Join Lines/Split Lines - Allows you to rewrap the text. You can change the right margin at Edit > Preferences > View. Check "Display Right Margin" and change the "Right margin at column" to be the width you want it at (72 is the PG standard)
- Snippets - Allows you to store snippets of code that you use often. Especially helpful for HTML, as you can store your CSS header information, the way you like your TNs to look, page numbers, or whatever bits of code you use often.
- Spell checker - Can't work without it.
- You will need an image viewer to look at the images and edit any illustrations.
- IrfanView is good for flipping through pictures and for minor edits. Has a good batch editor.
- XnView is good for flipping through images and edits. It has more options than IrfanView. Has a great batch editor.
- GIMP is great for major edits, but it cannot flip through the images and has no easily usable batch options.
- Install DPCustomMono on your computer
- Follow the directions to install it
- In gedit, click Edit > Preferences > Font and Color and set the editor font to DPCustomMono
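If no archive program is at hand, the tar.gz plugin archive from the setup steps above can also be unpacked with Python's standard tarfile module. This is only a sketch and not part of the original instructions: the function name unpack_plugin is made up for illustration, and you pass in whatever archive name you downloaded and the plugin folder named above.

```python
import tarfile
from pathlib import Path


def unpack_plugin(archive, plugin_dir):
    """Extract a .tar.gz archive into plugin_dir and return the member names.

    Hypothetical helper: neither the function nor its arguments come from
    the gedit or plugin documentation.
    """
    dest = Path(plugin_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive, "r:gz") as tar:
        names = tar.getnames()   # list the contents before extracting
        tar.extractall(dest)     # only do this with archives you trust
    return names
```

After running it, the returned list should show the regex_replace folder and the regex_replace.gedit-plugin file mentioned above, which makes it easy to confirm they landed in the right place.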
prep (get the file ready)
- Decide on Transcriber's Notes: It is best to make a decision on your TNs early. Keep a text file with everything you want in your TN, or create a space in the file to put them as you work. I like to use a separate file so that I don't have to flip around in the file so much.
- Check for missing pages and scan readability:
- Use an image viewer to flip through each of the PNGs and make sure that every page is there.
- While you flip, make sure that everything is readable. Check footnotes and other small text as well.
- You should check that the page numbering does not skip any pages,
- that blank pages and title pages are all there,
- and that nothing is missing prior to the first numbered page (i.e., if 005.png is real page number vi or 6, then there is a missing page prior to this. It could be a blank, but we want all pages accounted for.)
- If anything is missing, check the project comments to see if the PM knew and has verified that it is a printer's error (these happen); if not, contact the PM about it.
- If anything is unreadable, it should be replaced as well and run through the proofing rounds.
- If the PM does not respond, then it is up to you to find it (the PM may also ask you to do it). You can find it yourself via the Internet or a library, or you can contact the MP finders.
- Replace each carriage return plus new line pair with just a new line. Search for "\r\n" and replace all with "\n"
- Remove blank lines before page separators. Search for "\n\n(-----File: .+)" Replace with "\n\1"
- Remove page separators: For now, those big clunky page separators can be replaced with a placeholder like {{{###}}}. Later, remove the placeholders for the plain text and replace them with the proper HTML for the HTML file. Search for "-----File: ([0-9]+)\.png---.+" Replace with "{{{\1}}}"
- Remove [Blank Page] markers: regex search for "\[Blank Page\]\n" and replace with nothing. Rarely there is a blank page at the end of the file. Do a search for [Blank Page] to verify that none were missed by the above search.
- Check Proofers' Notes: Search for "*". Fix any proofers' notes. Some may need to be left for later, but it is good to at least look at them early so you know what to expect.
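The search-and-replace steps in this prep list can also be run in one go outside the editor. Below is a minimal Python sketch that uses the same patterns as above, in the same order. The function name prep_text is made up for illustration, and the sketch assumes the whole file has been read into one string.

```python
import re


def prep_text(text):
    """Apply the prep-stage replacements in the order listed above."""
    # Carriage return + new line -> just a new line.
    text = text.replace("\r\n", "\n")
    # Remove the blank line before each page separator.
    text = re.sub(r"\n\n(-----File: .+)", r"\n\1", text)
    # Swap each page separator for a {{{###}}} placeholder.
    text = re.sub(r"-----File: ([0-9]+)\.png---.+", r"{{{\1}}}", text)
    # Remove [Blank Page] markers that are followed by a line break.
    text = re.sub(r"\[Blank Page\]\n", "", text)
    return text
```

As with the manual steps, it is still worth searching the result for [Blank Page] afterwards, since a marker at the very end of the file with no line break after it is not matched by the last pattern.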
First pass
Flip through each page, quickly checking the text against the scan and looking for anything odd.
Check that the following are formatted correctly:
- Front Matter: Some people center it with the right side being at 72. I prefer to have it left aligned. The title should be in an <h1> and nothing else in the front matter should be in an <h#>. <h#> is semantic; it can be used by some automatic processes, such as the ePub creator, to build an outline of the book. For more information on this, see the ePub guide to Table of Contents.
- Table of Contents: Normally the text is left aligned and the page numbers right aligned. Some people will create one, if one does not exist. Should be indented 2 spaces as a whole.[1]
- List of Illustrations: Like the ToC. Some people will create one, if one does not exist.
- Tables: Look at them all, make sure they look right in the ASCII version. Should be indented 2 spaces as a whole.[1]
- Poetry: Check indenting. Should be indented 2 spaces as a whole.[1]
- Blockquotes: Should be indented 2 spaces as a whole.[1]
- Hanging Indents: This is when the first line of the para is indented 2 spaces and the rest is indented 4 spaces. <<<Look into how to do this with gedit>>>
- Illustrations: Make sure they are all formed by the guidelines and they all exist as expected.
- Footnotes: Check that they exist and that the numbering matches at the marker as well as the footnote.
- Sidenotes: Like footnotes
- Glossary: Check to make sure it looks like you want it.
- Index: Check to make sure it looks how you want it to.
- Language: If there are other languages in the text, you will want to know about them. Give them a quick look over and make sure the proofers did them right, e.g. [Hebrew: **], [Greek: tês chronologias] etc.
- Other: Sometimes there are other odd things. Make sure they look right.
Footnote 1: To indent, select the section to indent and type <ctrl>+T or hit Edit > Indent.
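For the hanging indents and the 2-space block indents in the checklist above, one option outside gedit is Python's standard textwrap module. This is a sketch only: the function names are made up for illustration, and the width of 72 is the PG margin mentioned in the setup section.

```python
import textwrap


def hanging_indent(paragraph, width=72, first="  ", rest="    "):
    """Rewrap one paragraph: first line indented 2 spaces, the rest 4."""
    # Collapse existing line breaks and runs of spaces before rewrapping.
    words = " ".join(paragraph.split())
    return textwrap.fill(
        words, width=width, initial_indent=first, subsequent_indent=rest
    )


def indent_block(text, prefix="  "):
    """Indent every non-blank line of a block (ToC, table, poetry, quote)."""
    return textwrap.indent(text, prefix)
```

hanging_indent works on a single paragraph at a time, so a longer section would need to be split on blank lines first. indent_block leaves blank lines untouched, which avoids creating end-of-line spaces.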
Markup check
Closely check markup
Check that all markup is well formed (begin and end, punctuation is in the right place, lists are separated, etc.)
- bold <b>, smallcaps <sc>, italics <i>, gesperrt <g>, should be formed by guidelines. You can search for each one separately to check them, or search for "<" to find all of them.
- Ellipses: search for ".." and make sure that they are formed properly
- Check Initials: You can standardize them with a regex (search: (\b[A-Z])\. ([A-Z])\. replace: $1.$2.) or you can just verify that they match the scans.
- Thoughtbreaks: Make sure they are formed properly (<tb>)
- Superscript (^) and subscript (_)
- Check em dashes and long dashes
- search for "---" to look for malformed em and long dashes
- normalize long and em dashes: check that the proofers used them in the same way throughout, e.g. long dashes, not em dashes, are used when a word is omitted
- Fractions: If 1/2, 1/4 and 3/4 only, replace with Latin-1 fractions, otherwise, leave them as is.
- Check for Orphaned Brackets <<<How to do this in gedit?>>>
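I have not found a way to check for orphaned brackets inside gedit itself. As a stopgap, a short Python sketch can report brackets that have no partner. The function name and the choice of bracket pairs are assumptions for illustration. It is a simple stack check, so it will also flag things like "a)" in a lettered list; treat its output as places to look, not as errors.

```python
def find_orphaned_brackets(text, pairs=(("[", "]"), ("(", ")"), ("{", "}"))):
    """Return sorted (line_number, character) for brackets without a partner."""
    closers = {close: open_ for open_, close in pairs}
    openers = set(closers.values())
    stack = []    # openers still waiting for their closer
    orphans = []  # closers that did not match the most recent opener
    for lineno, line in enumerate(text.splitlines(), start=1):
        for ch in line:
            if ch in openers:
                stack.append((lineno, ch))
            elif ch in closers:
                if stack and stack[-1][1] == closers[ch]:
                    stack.pop()
                else:
                    orphans.append((lineno, ch))
    # Anything left on the stack was opened but never closed.
    orphans.extend(stack)
    return sorted(orphans)
```

Each result gives the line to jump to in the editor and the bracket character that looks unmatched.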
Clean up
- Check Transliterations: If there are any non-Roman script languages you need to do something with them. You will probably need a UTF-8 version (see below), but for the plain text you will need to either use [Arabic] (or whatever the language is) to let the reader know they are missing something, or transliterate it, like we do for Greek.
- Check section and chapter breaks: search for \n\n\n (three line breaks in a row) to check that section and chapter breaks are correct and no extra line breaks are inserted
- Check for alphanumerics <<<How?>>>
- Check hyphens <<<how?>>>
- Run Reg-Ex checks <<<how?>>>
- Run Scanno check <<<how?>>>
- Run Spell check: You should have installed the spellchecker above. Hit <shift>+F7 or click Tools > Check Spelling
- Run Jeebies <<<how?>>>
- Run Gutcheck <<<how?>>>
- Run PPV Check <<<how?>>>
- Remove end-of-line spaces <<<how?>>>
- Footnote Fixup <<<how?>>>
- Sidenote Fixup
- Search "[Side" to look for malformed sidenotes, such as [Sideline
- Search "[Sidenote" and step through each one. Make sure they are in the correct place.
- <<<how?>>>
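Two of the clean-up items above are easy to script while the gedit way is still an open question: removing end-of-line spaces and finding malformed sidenote tags. The Python sketch below is one possible approach, not a tested part of this workflow. Both function names are made up, the file is assumed to already use plain "\n" line endings (see the prep section), and "[Sidenote:" is assumed to be the only correct form of the tag.

```python
import re


def strip_trailing_spaces(text):
    """Remove spaces and tabs at the end of every line."""
    return re.sub(r"[ \t]+$", "", text, flags=re.MULTILINE)


def odd_sidenote_tags(text):
    """Return (line_number, tag) for '[Side...' tags that are not '[Sidenote:'."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for match in re.finditer(r"\[Side\w*:?", line):
            if match.group(0) != "[Sidenote:":
                hits.append((lineno, match.group(0)))
    return hits
```

odd_sidenote_tags only reports where to look; checking that each sidenote sits in the correct place still has to be done by stepping through them.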
Final pass
- Page through the document looking for anything odd.
- Check Transcriber's Notes
- Reread them for errors
- Check markup
- Spellcheck them
- Verify all proofers notes are taken care of
- Save and copy for plain text and HTML. Save for UTF-8 if needed.
Latin 1
- [oe]->oe
- Check any other non-Latin-1 characters and decide how to handle them
- Replace <i>, <b>, <g> with _ (and + if needed)
- Remove <sc> or uppercase contents, or replace with _ if needed
- If any other HTML, take care of it
- If you need ASCII, save for that file
- ReWrap
- Correct Tables
- Page through the document looking for anything odd
- Clean up rewrap markers
- Gutcheck again
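To find the characters that will not survive a save as Latin-1, a small Python sketch can list every character above code point 255, which is the top of the Latin-1 range. The function name is made up for illustration, and the text is assumed to be already decoded into a Python string.

```python
def non_latin1_chars(text):
    """Return (line_number, column, character) for characters outside Latin-1."""
    found = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if ord(ch) > 0xFF:  # Latin-1 covers code points 0 to 255
                found.append((lineno, col, ch))
    return found
```

An empty list means the file can be saved as Latin-1 as is; anything reported needs a decision, such as the [oe] style of markup or a transliteration.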
SR
- Upload for SR
- Correct any errors found in SR
ASCII
Only create an ASCII file if automated changing of Latin-1 characters would mess up the tables, or something.
- Make a copy from Latin-1
- Change accented characters to [xx] format
- Add to Transcriber's Note about what that format means
- ReWrap
- Clean up rewrap markers
- Correct Tables
- Page through, looking for any problems
- Gutcheck again
UTF-8
Only create if there are characters in the document not found in Latin-1
- Replace [xx] markup with UTF characters
- Save copy for HTML
- ReWrap
- Clean up rewrap markers
- Correct Tables
- Page through, looking for any problems
- Remove the Byte Order Mark
- Gutcheck again
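If the editor gives no option to save without a Byte Order Mark, the three BOM bytes can be removed from the saved file afterwards. Here is a minimal Python sketch; the function name strip_bom is made up, and it assumes the file is UTF-8.

```python
import codecs


def strip_bom(path):
    """Remove a leading UTF-8 byte order mark from the file at `path`.

    Returns True if a BOM was found and removed, False otherwise.
    """
    with open(path, "rb") as f:
        data = f.read()
    if not data.startswith(codecs.BOM_UTF8):
        return False
    with open(path, "wb") as f:
        f.write(data[len(codecs.BOM_UTF8):])
    return True
```

The file is rewritten in place, so run it on a copy if you want to keep the original.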
HTML
- Check page labels
- Auto Gen HTML
- Clean up HTML (validate)
- Remove Auto Gen ToC
- Clean up Front Matter
- Fix Title
- Check em-dashes and long dashes
- [oe]->œ (and any other odd characters as needed)
- Check Page Numbers
- Check Fractions
- Fix Tables
- Check Footnotes
- Check Sidenotes
- Links for See
- ToC and LoI links
- Index Links
- Illustration
- Turn transcriber's notes into rollovers
- Page through looking for anything that needs to be fixed
- Link Checker
- Tidy
- CSS Validation
- HTML Validation
- PP HTML
- Check in IE and FF
- Spellcheck and smooth read transcriber's notes
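The {{{###}}} placeholders from the prep section have to be dealt with differently for each output: removed for the plain text, turned into markup for the HTML. The Python sketch below shows one way to do both. The HTML it produces is an assumption for illustration only: the "pagenum" class and the "png_" id prefix are made-up names, and the number in the placeholder is the image number, not necessarily the printed page number, so the labels still need checking by hand.

```python
import re

# Matches the {{{###}}} placeholders created in the prep step.
MARKER = re.compile(r"\{\{\{([0-9]+)\}\}\}")


def remove_page_markers(text):
    """Plain text: delete each placeholder together with its line break."""
    return re.sub(r"\{\{\{[0-9]+\}\}\}\n?", "", text)


def page_markers_to_html(text, css_class="pagenum"):
    """HTML: replace each placeholder with a span (hypothetical markup)."""
    def to_span(match):
        num = match.group(1)
        return '<span class="%s" id="png_%s">[%s]</span>' % (css_class, num, num)
    return MARKER.sub(to_span, text)
```

Run remove_page_markers on the copy saved for plain text and page_markers_to_html on the copy saved for HTML, then adjust the span to whatever your CSS expects.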
Upload
- If working in Linux, run unix2dos for all documents
- Upload
- When posted, do a dance.
- Update your wiki page with the link.
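If unix2dos is not installed, the same line-ending conversion can be sketched in a few lines of Python. The function names are made up for illustration. The conversion works on raw bytes so that it does not disturb the file's encoding, and it first normalizes any existing "\r\n" so that lines are not doubled up into "\r\r\n".

```python
def to_crlf(data):
    """Convert LF line endings in a bytes object to CRLF."""
    # Normalize first so existing CRLF pairs are not converted twice.
    return data.replace(b"\r\n", b"\n").replace(b"\n", b"\r\n")


def convert_file(path):
    """Rewrite the file at `path` in place with CRLF line endings."""
    with open(path, "rb") as f:
        data = f.read()
    with open(path, "wb") as f:
        f.write(to_crlf(data))
```

Call convert_file once for each document before uploading; running it a second time on the same file changes nothing.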