User:Miller

From DPWiki

This walkthrough assumes that you have already downloaded and installed: Guiguts, Aspell, Tidy, Irfanview and Xnview and you have Windows. Please make special note of the title. This is M's post-processing process. It will absolutely not be anyone else's. Please feel free, and please do, edit grammar errors. Please do not change the instructions to what you are sure is correct—I'd encourage you to document your own method too. Thank you so much. My GG is 1.0.25 with Preferences/Appearance set to the Old Menu style. If you use a different version or a newer menu style, some of the tools will be in different locations.

It is highly recommended that first-time post-processors choose easy books without footnotes or indexes (not tables of contents). This is so that you do not get discouraged by the amount of detail and work involved in those projects right at the start. The sections you see listed at the end of this walkthrough are intended for those with a little more experience in our process to plug into the process where appropriate. You are free to choose such a project as your first in any case. If you've chosen such a book, you are free to set it aside, choose an easier one, and then go back to the more difficult project. Have I emphasized that enough yet? Okay, let's get started!

Getting Started

Making a place to keep notes

Open a text file with Notepad and save it as "notes" in whichever folder your project will be in. For example, for this walkthrough, I am using Sure Pop and the Safety Scouts. I will save this file in my DP folder in a folder of its own. I have a folder on my computer dedicated to JUST DP stuff. I've named this new project folder "surepop". Always use lowercase letters when naming your files as it will save you heartache later when you're renaming your illustrations. Trust me.

Go to your project's home page.

Download your project files

Scroll down until you see "Download Zipped Images." Click on this link. It will immediately start downloading. The more images your project has the longer this will take as your Project Manager will have included larger, high-resolution versions of your illustrations and possibly cover, end papers, etc. Once your files are downloaded, choose "Extract" or "Unzip" and pick C:\DP\surepop\pngs as the destination. (You'll have to type in your project's name and the pngs bit.) This will create a folder within your folder that holds all of your page images. Now delete the zip archive. You don't need it and it will just take up space on your computer. (Worst case, you can always download it again from DP.)


Now go back to the project's home page and this time choose "Download Zipped Text". This download will take about 3 seconds. In your new zip you will find
1) The Good words list
2) The Bad words list
3) The proofed and formatted project text named something like: projectID48d4451eaf6c4.txt and
4) a link to the project discussion named projectID48d4451eaf6c4_comments.html

Now. The awful truth. I never use anything except the proofed and formatted text. So, I right click on the projectID48d4451eaf6c4.txt and choose extract. This goes into C:\DP\surepop. This will also take about three seconds and it may seem like nothing happened. You can check by going to My Computer->C drive->DP->surepop. In surepop will be a folder titled "pngs," a text file named "notes," and a text file named projectID48d4451eaf6c4 (or whatever your numbers are).

You are now done downloading and your project is all tidy in one place.

Read the Discussion

While your images are downloading, scroll up to the link "Discuss this project." Read through any comments on that thread so that you're aware of any anomalies facing you. Put any notes from the thread that you might need on your saved "notes" page to remember to address the issues when you find them. For example, Eileen writes on the thread, "Page 7 has a description of the diagram on page 37. PPer may want to link that in the html." Copy that onto your notepad file to decide later after looking at it.


Illustrations

First things first. Let's get the illustrations out of the way and done. This will be very basic. Hopefully your projects do not need a lot of fancying up. There are people who are willing to help out if they do, so don't despair. We're just going to assume that they are straightforward though.

Open Irfanview. Click on the open file button at the top left. Browse until you find your pngs folder. Double-click on that and scroll to the end (In Irfanview this will be sideways) until you find the first illustration file. It will be after the numbered pngs and be named something like i001 or illus-001 or cover. Double-click on the first one and it will open for you.

Often images need to be cropped because there is a lot of useless white-space around the actual illustration. To do this: click in the upper left corner, hold the mouse down and drag to the lower right corner. You've now highlighted the part you want to save. Click Edit->Crop selection.

Next it is time to resize. A good rule of thumb is that the longest side of your image should be no more than 400-600px. Click Image->Resize/Resample. I am using 450 px for my longest side as in the original the illustrations do not take up the whole width of the page. Your illustrations should be sized relative to the original. For example, an illustrated drop cap usually takes up only a fraction of a full page and so would usually be only about 100-125 px high.

Make sure that the checkbox "Preserve aspect ratio" is checked. Then I put 450 in the width box and the height box adjusts itself automatically.
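The arithmetic behind "Preserve aspect ratio" can be sketched in a few lines of Python. This is only an illustration of the proportions, not Irfanview's own code; the function name and the sample dimensions are made up.

```python
def resize_longest_side(width, height, longest=450):
    """Return (new_width, new_height) with the longest side scaled to `longest`.

    Both sides are multiplied by the same factor, so the shape of the
    picture is preserved.
    """
    scale = longest / max(width, height)
    return round(width * scale), round(height * scale)


# A hypothetical 1800 x 1200 px scan, resized so the longest side is 450 px.
print(resize_longest_side(1800, 1200))
```

For a tall image the same function puts the 450 on the height instead of the width.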

If your picture is black and white, such as a line drawing, you'll need to decrease the color depth to make your illustration's file size smaller. Click on Image->Decrease color depth. Check the little box in the middle for 16 colors.

There is a little floppy disk in the tool bar. Click on this to "Save as". A menu pops up; use the up arrow on that little tool bar on the little window to get to "surepop." Use the folder with the little asterisk icon to create a new folder (or right click in the big white space). Name the new folder "images". All lower case. Now double-click on your newly created folder. At the bottom Irfanview will already have taken the original name of the illustration image. As long as it is lowercase with no spaces, you can just leave it as that. You'll need to decide whether to use jpg or png. If your illustration is simply black and white, then png is what you want. For illustrations with more color use jpg. Use the menu beside "Save as file type" to find either png or jpg. Click save and you are done. Irfanview will remember your last file type, so if all of your illustrations are black and white you can just leave it on png. If you have both though, be careful to change it back and forth appropriately.

For Surepop, the illustrations are all color so I can just crop, clean up a bit if necessary and save.

So to sum up: Crop->Clean up if necessary->Resize->Reduce color if black and white->Save.

Starting the Text

Time to work on the new project. Open GG. It will say No File Loaded. So let's load one. Go to File->Open->surepop->projectID48d4451eaf6c4.

Save your file as surepop-firstpass.txt or whatever name best fits. This will leave you with an untouched text file that you can always go back to (projectID48d4451eaf6c4.txt) and also your working copy (surepop-firstpass.txt). Never, ever, ever keep that long projectID48d445 thing in your project's name. It isn't descriptive and sends PPVs and Wwers into fits. Short, descriptive, lowercase, unspaced names.

First Pass

We're going to start the first pass (looking for oddities) and adjust page numbers at the same time. Click anywhere past the first

-----File: 001.png---\RJMAustin\SMH\Janet\JHowse\tenaj\--------------------

line.

Adjusting Page Numbers

You'll notice that the second line from the bottom of GG has a lot of little boxes. Nine to be specific. Right click on the box named "Lbl: None." This will open your page numbering tool. Go back to the main GG window and click on the "See Image" box. This will open your first png with Xnview. Don't worry if you see png.002, pushing "home" on your keypad will always take you to the first png as "end" will take you to the very last illustration.

Change the Title Page Mark-up

The formatting rule is that everything on the title page gets wrapped in /* */ markup. I change that to only include things that I really don't want rewrapped such as any poetry or the publisher's information. If you leave the /* above the title, then GG will not convert it as it should, so if nothing else, move it below the title, above the author's name.

Looking for Oddities

Look at the first page. Often it is blank. For me it is the Title of the book. It already has the four blank lines before it as I like, so I can click "page down" on my keyboard and go to the next page. The second page has a quotation. My formatters saw this as a poetry quote but I think it is a blockquotation so I change the /* */ to /# and #/ and put the signature in its own /* */. This is the point of a first pass: making sure things are the way you'd like them to be. So far none of my pages have visible page numbers. That's okay. Just keep going through the first few pages making sure that your Title page, Copyright page, Table of Contents, etc. have the markup and line spacing that you want. Especially watch for missing italics or smallcaps. These pages have a lot going on and stuff is easy to miss.

When I get to the Table of Contents, I see that it says that the book's introduction is on Arabic page number 1. When I turn to the next page, which is a pledge, the bottom of that page says it is Roman numeral page vi. This is my png.006. That matches nicely. I go to the Configure Page Numbers tool and change the first box to Roman. I leave the Start @ 001 in place as that's just what I want this time. The book's Arabic page 1 is my png.007. So I change the box next to Image# 007 on the CPN tool from " to Arabic. Then change the +1 to Start @ and type a 1 in the box. At the very top, press "Recalculate" and you'll see those first numbers change to Roman and the rest start at 1 at Image# 007.
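To see what the recalculation amounts to, here is a small Python sketch that assigns labels the same way: Roman numerals for the front matter, then Arabic numbers restarting at 1. It is not GG's code. The function names, the lowercase Roman style, and the optional `skipped` argument (for images left out of the count) are assumptions made for illustration.

```python
def to_roman(n):
    """Convert a positive integer to a lowercase Roman numeral."""
    pairs = [(1000, "m"), (900, "cm"), (500, "d"), (400, "cd"),
             (100, "c"), (90, "xc"), (50, "l"), (40, "xl"),
             (10, "x"), (9, "ix"), (5, "v"), (4, "iv"), (1, "i")]
    out = []
    for value, numeral in pairs:
        while n >= value:
            out.append(numeral)
            n -= value
    return "".join(out)


def page_labels(image_count, arabic_start_image, skipped=()):
    """Map 1-based image numbers to page labels.

    Images before `arabic_start_image` get Roman numerals starting at i.
    From `arabic_start_image` on, Arabic numbering restarts at 1.
    Images listed in `skipped` get no label and do not advance the count.
    """
    labels = {}
    roman = arabic = 0
    for image in range(1, image_count + 1):
        if image in skipped:
            labels[image] = None
        elif image < arabic_start_image:
            roman += 1
            labels[image] = to_roman(roman)
        else:
            arabic += 1
            labels[image] = str(arabic)
    return labels


# Sure Pop: png 006 is page vi, png 007 is page 1.
labels = page_labels(10, arabic_start_image=7)
print(labels[6], labels[7], labels[8])
```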

Okay, back to "First Pass"ing. Go back to Xnview and use the page down key on your key pad to pretty quickly flip through the pages of your book. Stop if you see anything that you want to check or change. For example, on png.017 of my book, there is an illustration that is at the bottom of the page. I want to be sure it landed between paragraphs. I go back to GG and in the same row we've been clicking boxes on, click on the third box, "Img: 007." A little Goto Page Number box pops up and I type in 017 and enter or Okay. That takes me right to the page of the text I wanted to see. Looking at the image and text, I notice that that particular image has to be right aligned to make sense, so I make a note on my "notes" notepad page. As these notes will only ever be seen by you, you only have to make them make sense for you. In this case, I typed: "right align illo on png.017."

Words split across pages: As you scroll through, you'll probably find some words split across pages. You can:
1) rejoin those words on your first pass (Don't worry, if you miss any GG will find them for you later)
2) wait until you remove page separators and rejoin them then. or
3) the more controversial, let GG rejoin them knowing that
a) if the line in the html ends oddly your word could be split visually and
b) some PPVers hate this.
"3" maintains the original integrity better in that it shows that the word was split in the original but, usually, I don't think this is a semantic worth saving as we rejoin all other words split over the ends of lines. I tend to do "1" as I like how it makes "Removing Page Separators" that much faster. I inevitably miss one or two, but as I said, GG catches them for me and lets me fix them.

Watch for correct chapter spacing as well. I just found a chapter header with only 3 blank lines before it. GG will NOT find those for you and so will not format those properly in HTML. Also watch for wrong spacing or missing blank lines after #/ and */ markup. Here is one I just found on one of my pages, with no blank line after the markup:

/*
--<sc>Sure Pop</sc>
*/
[Illustration]

A very confused GG would have wrapped this into

/*--<sc>Sure Pop</sc>*/[Illustration]
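Since GG will not flag this for you, a quick scan of the text file can help. The Python sketch below lists every closing */ or #/ line that is not followed by a blank line. The function name is made up, and it assumes the closing markers sit on lines of their own, as in the example above.

```python
def missing_blank_after_close(text):
    """Return 1-based line numbers of */ or #/ lines not followed by a blank line."""
    lines = text.split("\n")
    problems = []
    for i, line in enumerate(lines):
        if line.strip() in ("*/", "#/"):
            has_next = i + 1 < len(lines)
            # A closing marker on the very last line has nothing after it to check.
            if has_next and lines[i + 1].strip() != "":
                problems.append(i + 1)
    return problems


sample = "/*\n--<sc>Sure Pop</sc>\n*/\n[Illustration]\n"
print(missing_blank_after_close(sample))
```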

Other notes I leave for myself:

move ! into </i> [This is AFTER the text is done and I am working on the html so that the italic words will not bump into ! ? ; :]
fix note on 063 [This note is more unusual than the other notes in the book as it has more than one line of centered text and some blockquoted text]

You'll find as you go things that you'll want to be reminded of and things you'll just do automatically. Notes are good for me if RL takes over and I have to come back to a project after a few weeks and have no idea what I've done. I always put at the top of my "notes" what I did and am going to do next, i.e. "Finished proofer's notes; time to -*"

Illustrations and Blank Pages in the Page Count

Frequently on a first pass, you will find full page illustrations followed by blank pages. These are most usually not numbered. Sure Pop has none of these, but this is where adjusting your page numbers comes in as well. First, say the illustration and blank page following are on png.023 and 024. Go to "Configure Page Labels" and for those two Image#s, click the button labeled "+1" next to them until it says "No Count". Then push Recalculate and you'll see that it now skips those Image#s in its count. Check to make sure that the next page after the Illustration and Blank Page matches what your Configure Page Labels now says. Delete all [Blank Page] notation. This is only to hold the place while proofing and should not appear in your finished project. You can delete all [Blank Page] notes by using this regex (Regular Expression). Open your Search and Replace tool found, cleverly enough, under Search->Search & Replace. Make sure the "Regex" and "Start at Beginning" boxes are checked.

Search for: \n\[Blank Page\]\n
Replace with: \n
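If you want to see what that search and replace does before running it on your file, the same pattern can be tried in Python. This uses Python's `re` module as a stand-in for GG's regex engine, which is an assumption; the function name and sample text are made up.

```python
import re


def remove_blank_page_notes(text):
    """Replace each '[Blank Page]' line, with its surrounding newlines, by one newline."""
    return re.sub(r"\n\[Blank Page\]\n", "\n", text)


sample = "end of page\n\n[Blank Page]\n\nnext page\n"
print(repr(remove_blank_page_notes(sample)))
```

The square brackets are escaped with backslashes because unescaped brackets mean something else in a regex.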

Watch for: [Illustration: ] This is not right and needs to be fixed to [Illustration].

Final page number

Now look at your book's last page with text on it. Mine is 130. I check to be sure that matches the last Label on Configure Page Labels. If it doesn't, it means that I made a mistake somewhere. Usually by forgetting to not count a blank page and/or illustration. Use the "page up" key on your keyboard to quickly flip back to the last illustration and check it. Once the numbers are correct, click "Use These Values" at the top of the CPL. The box will close and you will see that the box at the bottom that used to say Lbl: None now says something like Lbl: Pg 130. That will be very helpful for any transcriber's notes.

Page Separators

Now that we've been through our first pass, time to remove the page separators. GG knows where it is and will pause at ANYthing it finds odd, so on the task bar of GG go to ->Fixup->Fix Page Separators. A little box will pop up. I immediately grab the title bar and move it to the bottom of the window so that I can see the main GG window. Check the Full Auto box. Yes, it will be okay but if you're unsure feel free to save a copy of your file, just in case. Now push the refresh button and it will highlight the very first page separator. As the right number of blank lines are already there, I can just push "Delete." You'll see there are many options to choose from. "Delete" is the one you will use most often, but if you find a blank line is missing, just push that button and it will add it and move on. It will stop on any line with blank lines after it, before any line that starts with a capital letter, or any punctuation or <x> tag or after any that end with * or – or >, etc. After you fix or adjust anything, just push Refresh and it will take you on. When you reach the end of your file, close that window and save again.

Proofer's Notes

Proofer's Notes: The proofers and formatters will have left you notes with questions that they had. On the task bar of GG, go to Search->Search & Replace. A long window opens. Grab that and move it to the bottom (You'll do this pretty much every time so that it doesn't obstruct your view or disappear behind the main GG window.) Uncheck Whole Word and search for [*

This covers most malformed notes. My first note says

goodby[**goodbye]

I've already seen "goodby" in the text so I know that is at least one of the ways this book spells it. Just to be sure though, I search for "goodbye". It only shows up in proofer's notes, so those are safe to delete and ignore. I also search for "good-bye" and find that it isn't there either. I put [* back into the search window and go on.

Goodby[** typo?] can again just be deleted.
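To get a feel for how many notes are waiting before you start clicking through them, a few lines of Python can list every line containing [*. This is just a rough sketch outside GG; the function name and sample text are made up.

```python
def find_proofer_notes(text):
    """Return (line_number, line) pairs for every line containing '[*'."""
    return [(n, line)
            for n, line in enumerate(text.splitlines(), start=1)
            if "[*" in line]


sample = (
    "He said goodby[**goodbye] and left.\n"
    "Nothing to see on this line.\n"
    "Goodby[** typo?] again."
)
for number, line in find_proofer_notes(sample):
    print(number, line)
```

Searching for the two characters [* rather than the full [** also catches notes where a proofer left off an asterisk.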

Missing or wrong punctuation will need to be fixed or noted by you. On my notepad "Notes" I scroll down a bit and start a section entitled "Transcriber's Notes:" Here is where I'll put things that I changed or anomalies that I don't want Project Gutenberg to be annoyed by getting errata reports for. For punctuation, I almost always just fix all obvious errors and then include a blanket note. "Obvious punctuation errors repaired." On fiction, especially, wrong or missing punctuation is rampant and my list of Transcriber's Notes would be enormous if I noted every one. Do be absolutely sure that you are really only fixing broken things. For example, semicolons were used a lot more than they are now in dialogue. "Look out," she whispered; "I don't want you to fall!" is not uncommon. Both exclamation points and question marks can occur in the middle of a sentence. "Don't go! can you not just wait a moment?" is something that you might see and is not, for this time period, incorrect. A lot of our books used single quotes where we would now use double and also had double quotes within double-quotes. If you have any questions, ask in the forums. The depth of grammar knowledge on DP is amazing.

Words that are missing or misspelled, you have a choice: 1) Leave it and put a note at the end that you did so; 2) Fix it and note it at the end. I tend to fix and note as I think that is what the author would have preferred and it will make it easier on the reader. Notes look something like:

Transcriber's Notes:

Obvious punctuation errors repaired.

Page 91, word "to" added to text (minute or two to)

Page 103, word "as" added to the text (just as she had)

Page 104, "hedge-hog" changed to "hedgehog" (send the hedgehog to)

I include the actual book's page number to give the reader an idea of where the text is and also a small bit of the text so that they may search for it. This way if I've made an error, they can find and repair it and they also will know what the original said. Be wary of fixing spelling that was accurate then. For example, it would have been an error to change all of the "goodby"s to "goodbye" or "good-bye." That was correct for the time and this text. Once again, if you are unsure, ask in the forums and often someone will be able to find it in an old dictionary.

Questionably broken words or -*

Once you are through all of the [**notes], you can search for -* to make them as consistent as you reasonably can.

The first one that GG stops on for me is "house-*keeper". First I search for it as "usekeeper" and find it doesn't appear. Then I try "use-keeper" and find that isn't in the text either. I leave off the first letter or so in case it appears capitalized. This lets me see every time it shows up. Except it doesn't in this case. Hmm. This is now my decision. It is usually safe to retain the hyphen as hyphens were much more prevalent then. In this case however, I think "housekeeper" is more usual, so I remove the -* and go on. For "street-car" I search for "streetcar" and don't find it but "street-car" shows up mid-line on another page so I know that the hyphen should be retained. Sometimes you will find words with and without hyphens mid-line. "streetcar" and "street-car" could both appear due to printer's whims. In that case, I usually use the more common form for the end-of-line one, and put a note at the end.

Both "streetcar" and "street-car" were used in this text. This was retained.

This should, again, help PG not to get errata reports on this issue.

Remember to save your file often and save a copy with a different name at all major edits.

Thoughtbreaks

Convert all of your DP-style thoughtbreaks <tb> into Project Gutenberg-style thought breaks.

        *       *       *       *       *

We used to have to search and replace for those but now GG does it with one click. Go to Text Processing->Convert <tb> to asterisk break.
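For reference, the old search-and-replace approach amounts to something like the Python sketch below. It is not what GG runs internally. The function name is made up, and the spacing of the asterisk line is copied from the example above, so adjust it if your target format differs.

```python
# Five asterisks, spaced as in the example in this walkthrough (an assumption).
THOUGHT_BREAK = "        *       *       *       *       *"


def convert_thought_breaks(text):
    """Replace every DP-style <tb> marker with an asterisk thought break."""
    return text.replace("<tb>", THOUGHT_BREAK)


sample = "End of scene.\n\n<tb>\n\nNext scene."
print(convert_thought_breaks(sample))
```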

Orphaned markup

Once you've handled all of the -*, it's time to look for lonely markup. On the GG task bar go to Search->Find Orphaned Brackets and Markup. Again, move it to the bottom as this one will dive behind your main window almost every single time. This will search for /* that end with #/ or with nothing at all. Or ( that open but don't close. Or <i/ instead of <i>. Just click through each little radio button. It doesn't find things often but you'll be very very glad that it did if it does. One more orphan check to run through and this bit is done. Go to Fixup->HTML Fixup. A BIG menu with lots and lots of options opens. Ignore all of them at this point except: on the middle right "Find orphaned markup". This will search for <i>, <b>, <sc> that open but do not close. Or close but never opened.

My search finds:

<i>He looks where he goes and keeps to the right.</i>
He crosses at regular crossings, not in the middle of the block.</i>

which is not only a missing <i> but a missing paragraph break on this page. If nothing appears, then your file is good!
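The idea behind this kind of check can be sketched in Python: walk through the text, remember which of <i>, <b>, and <sc> are currently open, and report any closer without an opener or opener without a closer. This is a simplified illustration and not GG's implementation; the function name is made up.

```python
import re

TAG = re.compile(r"<(/?)(i|b|sc)>")


def orphaned_markup(text):
    """Return sorted (line_number, tag) pairs for unmatched <i>, <b>, <sc> tags."""
    problems = []
    open_tags = {}  # tag name -> line number where it was opened
    for n, line in enumerate(text.splitlines(), start=1):
        for match in TAG.finditer(line):
            closing = match.group(1) == "/"
            name = match.group(2)
            if closing:
                if name in open_tags:
                    del open_tags[name]
                else:
                    problems.append((n, "</%s>" % name))
            else:
                if name in open_tags:
                    # Opened again before the earlier one was closed.
                    problems.append((open_tags[name], "<%s>" % name))
                open_tags[name] = n
    # Anything still open at the end of the text was never closed.
    problems.extend((n, "<%s>" % name) for name, n in open_tags.items())
    return sorted(problems)


sample = (
    "<i>He looks where he goes and keeps to the right.</i>\n"
    "He crosses at regular crossings, not in the middle of the block.</i>"
)
print(orphaned_markup(sample))
```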

You can quickly run through your italic endings for missing punctuation. Search for </i>. Remember if the full sentence or phrase is italic the .!? goes inside the markup.

Jeebies

Time for jeebies! Go to Fixup->Run jeebies. This will give you a list, hopefully not too long, of every "he" that maybe should be "be" and vice versa. Almost all of these will be false positives but it's better to check than to have your PPVer or WWer point it out to you later. Double-click on the first question and read the text that shows up in the window. If it's right, right-click and it will be removed from the list of questions and move on to the next one. If it's wrong, check the original and fix it and note it if the original was wrong too. Then right-click and on to the next one.


Word Frequency

This is one of my favorite steps. Go to ->Fixup->Run Word Frequency Routine. It automatically runs on Frq which I find not as helpful as Alphabetically, so I check "Alph" immediately and run it again. Then I work my way through most of the buttons.

Emdashes: I run quickly down the list of em-dashes to check for any "suit--cases" that should be "suit-cases" and the like. Double-clicking on any of the words in the list will take me to the one that appears in the text. If I want to compare it to the original I click on the "See Image" button on the bottom of the main GG window.

Hyphens: Here the first line will tell me any "suspects". These occur when a word shows up as both hyphenated and not in the text. Mine says 32 words with hyphens, 1 suspect (marked ****). While I am most interested in that suspect, I also check as I go down the list for words that shouldn't be hyphenated at all. Such as "in-the." (That wasn't in my text, it was just an example.) My one suspect was "tiptoed" which also shows up as "tip-toed." Once each. After searching for "tiptoe" and "tip-toe" to see if it shows up in another form, I find that those are the only two instances, so I can't get a majority ruling. I decide to leave a note in my Transcriber's Notes: Both "tiptoe" and "tip-toe" were used in this text. So far, for this text, this is my first actual transcriber's note. If it turns out to be the only one, I may not bother with the note at all as it does match the original text. If there were many inconsistencies like this, I'd leave the note.
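A "suspect" here is simply a hyphenated word whose unhyphenated twin also appears in the text. A rough Python sketch of that comparison follows. It is an illustration of the idea and not GG's routine; the function name and sample sentence are made up.

```python
import re
from collections import Counter


def hyphen_suspects(text):
    """Map each hyphenated word to (its count, joined form, joined form's count)
    when the joined form also occurs in the text."""
    # Words are runs of letters joined by single hyphens; '--' is not a hyphen.
    words = Counter(
        w.lower() for w in re.findall(r"[A-Za-z]+(?:-[A-Za-z]+)*", text)
    )
    suspects = {}
    for word, count in words.items():
        if "-" in word:
            joined = word.replace("-", "")
            if joined in words:
                suspects[word] = (count, joined, words[joined])
    return suspects


sample = "She tiptoed in. He tip-toed out. The street-car passed."
print(hyphen_suspects(sample))
```

Like the hyphen list, this only pairs exact forms, so "tiptoe" and "tip-toed" would still need a manual search.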

Alpha/num: This is a very quick check to find any lone1y numbers floating amongst words as that 1 in lonely. Your list will consist mainly of 1st, 12mo, etc. which are correct.
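The same kind of list can be produced with a short Python sketch that collects every token containing both a letter and a digit. The function name and sample sentence are made up for illustration.

```python
import re


def mixed_alnum_words(text):
    """Return the sorted set of tokens that contain both letters and digits."""
    tokens = re.findall(r"[A-Za-z0-9]+", text)
    return sorted(
        {t for t in tokens
         if re.search(r"[A-Za-z]", t) and re.search(r"[0-9]", t)}
    )


sample = "The 1st edition, 12mo, felt lone1y in 1912."
print(mixed_alnum_words(sample))
```

Plain numbers such as 1912 are left out, so only the mixed tokens need a look.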

All Words and Check Spelling are buttons that I skip. We'll be running the more thorough Spellcheck later.

Ital/Bold: This button will list all italic or bold words just as it says. Check those for punctuation that is in that should be out. We'll run a check for out that should be in, later.

ALL CAPS: Another button that I skip.

Mixed Case: Here is where we check that all MacArthurs ended up as MacArthurs and not Macarthurs.

Initial Caps: I don't use this button now, but I may use it later if I find some anomalies in proper names. For example, one of my books used Molly most of the time but Mollie a few times. Those few Mollies I changed and left Transcriber's Notes about.

Character Counts: This will show you which characters are there. There should be the same number of [ as ], but we checked that already with Orphaned Brackets. At the very bottom of the list are the very odd characters. The British pound sign sometimes shows up here and a double-click will often show it was in place of an "f." "o£" This time though, Surepop is clean.

Check , Upper: This checks for every comma that is followed by an upper case letter. Now most of them will be correct due to proper names. Keep an eye out for , The; , What; , There; etc. This check will show up again in Stealtho checking. So you can do it thoroughly now, or thoroughly then. On this one, I paused over ", Your" as that seemed odd. Double clicking on it took me to the place in the text and it showed: "'Oh, Your Majesty, let" which is of course, fine.

Check . Lower: This will show you every time a period/full-stop is followed by a lowercase letter. This will show up most often for things like, etc. A.M. gym. i.e. and so on. Every now and again though, it will show up in place of a comma, which will need fixing. If it is that way in the original, then fix with a TN, if it is correct in the original text, then just fix it and move on merrily. This book hasn't got a single one.
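Both of these checks boil down to two small regular expressions. The Python sketch below shows one way to write them. The patterns and function name are assumptions for illustration and are not taken from GG.

```python
import re

# A comma, whitespace, then a word starting with an uppercase letter.
COMMA_UPPER = re.compile(r",\s+[A-Z]\w*")
# A period, whitespace, then a word starting with a lowercase letter.
PERIOD_LOWER = re.compile(r"\.\s+[a-z]\w*")


def punctuation_queries(text):
    """Return (comma-upper matches, period-lower matches) found in the text."""
    return COMMA_UPPER.findall(text), PERIOD_LOWER.findall(text)


sample = "Oh, Your Majesty, let me stay. he said so, etc. and left."
commas, periods = punctuation_queries(sample)
print(commas)
print(periods)
```

As in GG, most hits will be legitimate (proper names, abbreviations), so each one still needs a human decision.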

Check accents: This is a great check. It takes every accent and then lists any words that show up similarly. For example: resumé and resume and even resumè; coördinate and coordinate. On words like coördinate, it will not find co-ordinate. You'll want to check for that yourself. Again, being an easy kiddie-lit book, no accents to be found.
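One way to picture this check is to strip the accents from every word and group the words that end up identical. The Python sketch below does that with the standard `unicodedata` module. It is an illustration of the idea only; the function names are made up.

```python
import unicodedata
from collections import defaultdict


def strip_accents(word):
    """Return the word with combining accent marks removed."""
    decomposed = unicodedata.normalize("NFD", word)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def accent_variants(words):
    """Group words that differ only by accents; keep groups with 2+ forms."""
    groups = defaultdict(set)
    for word in words:
        groups[strip_accents(word.lower())].add(word.lower())
    return {base: sorted(forms)
            for base, forms in groups.items() if len(forms) > 1}


print(accent_variants(["resumé", "resume", "coördinate", "coordinate", "pie"]))
```

A hyphenated form such as co-ordinate would land in a different group, which is why that one still has to be searched for by hand.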

Unicode > FF: This is a button I've never used. No real reason except that it doesn't ever find anything for me.

Stealtho Check: Skip this button as you're going to do a more thorough check now.

Close the Word Frequency window.

Stealth Scanno

In the upper tool bar, go to Search->Stealth Scannos. A window will pop up with three files.

Regex

For no really good reason, I tend to go with Regex first. Either double-click regex or click and choose Open. The search and replace window will open. Grab it by the title bar and move it down so you can still see your GG window above. You'll see that the regex box is already checked for you.

At the bottom of the Search & Replace, check the box that says "Auto Advance." This will make it skip checks that are not present in your text. It's very clever. It is now going to run through a list of regexes (computer terminology for code that looks for certain things like two spaces after a . instead of one). If you click Search and nothing happens, click it again to be sure and then you can rest assured that that item is not in your text. The more complicated your text is, the more checks the regex will pause on. Click "Next Stealtho." For me it stops on "R. W." because it used to be a DP rule that all initials had the spaces removed. Now the rule is match the scan so we can leave it and move on. Sometimes this check is very handy for checking for consistency between A. M. and A.M., etc. In this book, that's not a problem.

This is where your more thorough check for , Upper can take place.

And so the checks go on. For me, it paused on "heartstrings" because there were five consonants in a row without a vowel and it thought that might be a problem; it wasn't, so I clicked "Search" again and it found nothing else. On to "Next Stealtho."

Next it highlighted each repeated word (dittograph, if it's an error). For example:
He had had cherry pie for dinner.
He went to the the store.
The second would probably be an error and need to be repaired and noted in the Transcriber's Notes. I see that all of mine are correct and move on.
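A repeated-word check is a classic use of a regex backreference. The Python sketch below shows one possible pattern. It is an assumption for illustration and may differ from the regex in GG's stealth scanno file.

```python
import re

# A word, whitespace, then the same word again (case-insensitive).
REPEATED = re.compile(r"\b(\w+)\s+\1\b", re.IGNORECASE)


def repeated_words(text):
    """Return every doubled word found in the text, as it appears."""
    return [m.group(0) for m in REPEATED.finditer(text)]


sample = "He had had cherry pie for dinner.\nHe went to the the store."
print(repeated_words(sample))
```

The pattern cannot tell a correct "had had" from a mistaken "the the"; that call stays with you.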

Save. Often. With different names. This phrase will show up a lot. Once you're completely comfortable with PPing, you'll save different versions less often.

It paused on a line that was over 75 characters long. We can ignore that check right now. It will be fixed up later.
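If you are curious how many long lines are waiting for you, a quick count is easy to sketch in Python. The 75-character limit comes from the check above; the function name is made up.

```python
def long_lines(text, limit=75):
    """Return (line_number, length) for every line longer than `limit` characters."""
    return [(n, len(line))
            for n, line in enumerate(text.splitlines(), start=1)
            if len(line) > limit]


sample = "short line\n" + "x" * 76
print(long_lines(sample))
```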

Then it will search for combinations of letters that it thinks might be questionable. "tli" that possibly could have been "th" in the original. Since it is in the middle of "whistling," I'm safe to ignore. I skip most of the letter combination checks because I'm going to do a thorough spellcheck later.

It will also point anything in brackets out for you. This will help spot any malformed proofer's notes, the [oe] and [OE] markup that you'll handle later, (add this to your list of notes to yourself if it finds one), and any footnote markers.

At the end of the regex stealtho check, the Search and Replace pop-up will have 36/36 at the bottom, meaning that all 36 checks have been done.

en-comm

Go back to Search->Stealth Scannos and choose "en-comm". I skip the "misspelled" as, again, I'll be doing a thorough spellcheck soon. En-comm checking will list stealth words that could be other words. It will usually list the uncommon word first. Click the "Whole Word" box so that you can skip checking every single date in your book for the 1 check. You are looking for 1s that should be Is. All of my ones were correct. They were parts of lists and prices and the first page in the table of contents.

Next it stops on "bad" which could be "had." Click through them checking. If you are unsure, you can always click "See Image" at the bottom of the GG window. A new button has appeared on the Search and Replace window named "Swap Terms." This button will do just what it says: stop looking for the "bad" and look for all of the "had." I am not going to check the 191 instances of "had." If a proofer had noted many had/bad errors or I'd seen any, then I would. Sometimes you have to make the call about what is the best use of your time and what will give you the best end result.

I do check all of the instances of "ball" and "hall". Sometimes these checks show up not only stealthos but inconsistencies in the text. It has on occasion found instances where every other hall-way is hyphenated, but there is a hall way lurking.

Keep clicking through the stealthos. We've already checked the "be/he" with jeebies so that is one giant check we can skip. Again, I skip checking the 1636 instances of "the."

Spellcheck

Always make sure that you've run Word Frequency before running spellcheck. Doing this will make your spellchecking much faster. As long as you've run it since opening GG, you're good to go. You can rerun it at any time. Go to Search->Spellcheck.

I drag the window to the far right of my screen and resize it by clicking and dragging on the left side of the Spellcheck window so that the "resume at bookmark" button is completely invisible. I won't be using it and it gives me more of the GG window to see. This book says I have only 66 words to check. You will probably not be so fortunate. The first word it questions for me is "Pigmy." It only occurs one time in the text (with mixed case, anyway) so I click on "See Image" to check it. It is correct, so I click the bottom right button "Add to project dictionary." Then it will hop to the next word automatically. The image number at the bottom of the GG window will help me with this checking. I just go to the already open xnview image. I click "home" on my keyboard if I'm not already at the start of the book. Then I use "page down" on my keyboard to get to the right page. This helps so that I don't have 13 xnview windows open at one time. At the top of your Spellcheck window it will tell you how many times a word appears in the text. If it says it appears "0" times, it means that it is part of a hyphenated word. If a word appears 13 times, you are usually safe to add it to the dictionary without checking each one. With some experience, you'll find your own threshold of how many occurrences are enough to justify adding without checking each one. You may not find many misspelled words, but you will probably discover some inconsistencies. I just found that "Pellmell" and "pell-mell" show up once each. Since neither was over the end of a line, I have no way of knowing which the author preferred. I add a note to the end after the "tip-toe" note: This text also uses Pellmell and pell-mell. Some PPers do not bother with this type of note. I do because I know that the errata team at PG gets a LOT of false positive notes questioning their texts. This may stop at least one of those notes appearing.

Gutcheck

Spellchecking done, now for the first Gutcheck. Save your file. Go to Fixup->Run Gutcheck. A new window will pop up. Don't even worry about what it says yet. First thing, click the GC View Options button. Another new window will pop open. Now check the things you don't want to see right now:

  • Asterisk (that will show you every /**/)
  • Forward slash (that will show you all HTML tags and all of the wrap/don't wrap markers)
  • HTML symbol (we know those are there)
  • HTML tag (again)
  • Long line (will be handled later, I promise)
  • Short line (same as above)

Close that window and go back to the Gutcheck window. Pull it down so that you have the GG window on top and Gutcheck on the bottom. You may need to resize. Scroll down past the line that will say something like -->181 queries. The thing to remember is that just because Gutcheck questions something doesn't mean it is wrong. It is just checking. You have the final say. My first stop is for <sc>. As Gutcheck doesn't recognize this markup, it will question every use of it. It will usually say: "Paragraph starts with lowercase"; "Query word sc"; "Query word sc". Feel free to right-click these and move on. Double-clicking on an item in the list will take you to the place it occurs. Right-clicking will remove it from the list. The second thing my Gutcheck asks about is "No punctuation at paragraph end." This is because my book's title is split into two lines. As it isn't really a paragraph, I can just right-click and move on. This will take me to the next item automatically. Check any questions against the image and then right-click them.

If any of your quoted passages run over more than one paragraph, Gutcheck will question them with "Mismatched quotes." Just check to be sure they are correct and right-click the query away. On this project, it did find one set where the proofers had missed the opening quote. After checking the original, I replaced it. Had it not been in the original text, I'd have added a note to my TNs concerning it. If there are a lot of printing punctuation errors, I will just include a blanket note: All punctuation errors repaired. If there are only one or two changes to make, I'll just list each one separately:

Page 33, opening quotation mark added. ("For today it is)

The words in the parentheses allow the reader to search for that exact text if they wish. In our text versions we don't retain page numbers, so the page number alone only gives them a general idea of where the change is.

Save. Often. With different names. This phrase will show up a lot. Once you're completely comfortable with PPing, you'll save different versions less often.

Split the Files

Split the files. Now I go to File->Save as. Save as surepop.html. Then do it again. File->Save as. Save as surepop.txt. Now I have a separate file with all of the fixes set for htmling. Right this minute though, we're going to tidy up the text file which is almost done!

Text Only File

Then I go to the very bottom of the file, add a couple of blank lines and then go to: Text processing->Add a thoughtbreak. I add a couple of blank lines after that and then copy and paste my list of TN there.

Converting Tags

Go to Text processing->Convert italics. All of your <i> and </i> have now been changed to _. Do the same for Convert bold, just in case you have any. These will be changed to =. One of the whitewashers insists that an explanation be given for any markup signal besides the _ for italics. If I have bold text, I usually include the italics in the note as well.

   Transcriber's Note: Italic text is denoted by _underscores_ and bold text by =equal signs=.

I put this note at the very front of my text, above the title by four lines.
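If you'd like to see what those two menu items amount to, here is a minimal Python sketch of the same substitution. It is not GG's own code; the function name and the sample sentence are made up for illustration.

```python
def convert_inline_markup(text: str) -> str:
    """Swap <i>/<b> tags for the plain-text markers described above."""
    for tag, marker in (("i", "_"), ("b", "=")):
        # Opening and closing tags both become the same marker character.
        text = text.replace(f"<{tag}>", marker).replace(f"</{tag}>", marker)
    return text


print(convert_inline_markup("a <i>very</i> <b>bold</b> move"))
```

The sketch only handles the bare <i> and <b> tags used in the proofed text; anything with attributes would be left alone.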

For <sc>, you'll need a regex. Go to Search->Search and Replace. Check the regex box.

Put this in the search: <sc>((.|\n)+?)</sc>
In the replace: \U$1\E

Click replace all.

This magical code will capitalize all of those and remove the tags.
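For the curious, the same idea can be tried outside of GG. This is a rough Python sketch, not what GG runs internally: Python has no \U...\E in its replacement strings, so I've assumed a small function that uppercases the captured text instead. The names and sample text are hypothetical.

```python
import re

# Same search pattern as above; (.|\n) lets a match run across line breaks.
SC_PATTERN = re.compile(r"<sc>((.|\n)+?)</sc>")


def convert_small_caps(text: str) -> str:
    """Uppercase whatever sits between <sc> and </sc>, and drop the tags."""
    return SC_PATTERN.sub(lambda match: match.group(1).upper(), text)


sample = "<sc>Sure Pop</sc> smiled at the\n<sc>Safety\nScouts</sc>."
print(convert_small_caps(sample))
```

The +? keeps each match as short as possible, so two tagged phrases in one paragraph are handled separately rather than swallowed as one.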

For [oe], do a search and replace to change it to just oe; for [OE], change them all to Oe. Make sure regex is NOT checked on your search and replace. If your project has a LOT of these, you may want to consider making a UTF-8 version in addition to your plain text version that actually uses the real ligatures, œ and Œ. An example is if your main character is named Phœbe.
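Here is a small Python sketch of both choices side by side, the plain-text replacement and the UTF-8 one. It is only an illustration under my own assumptions; the function names and the sample sentence are invented.

```python
def ligatures_plain(text: str) -> str:
    """Plain-text version: [oe] becomes oe and [OE] becomes Oe."""
    return text.replace("[oe]", "oe").replace("[OE]", "Oe")


def ligatures_utf8(text: str) -> str:
    """UTF-8 version: use the real ligature characters instead."""
    return text.replace("[oe]", "\u0153").replace("[OE]", "\u0152")


sample = "Ph[oe]be read about [OE]dipus."
print(ligatures_plain(sample))
print(ligatures_utf8(sample))
```

These are literal (non-regex) replacements, which matters because [ and ] mean something special in a regex.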

Table of Contents

I tidy up the table of contents before rewrapping. Align the numbers and chapter headers so that it looks nice and clean. Rewrapping will add about 8 spaces before each line. If that will make your lines too long, (i.e. longer than say 72), you may want to do this by hand. Line everything up the way you want it to be, then highlight the table of contents by clicking and dragging over the whole thing. Then go to Selection->Indent selection 1. Twice. That will indent the whole selection 2 spaces. PG asks that things that are indented be so at least two spaces. You can do it more if you like, just watch that right margin. If you place your cursor after a number in the table of contents, the first box on the bottom of the GG window will tell you how far to the right you are. Mine ends as Col. 65, so I am set. Now, if you have indented your TOC yourself, immediately change the /* and */ surrounding it to /$ $/. This tells GG to just ignore that bit when rewrapping.

Another option is to put a number in brackets right after the opening /*[2]. This will tell GG that for this section, you want a two space indentation instead of your usual eight spaces, (or whatever you've got your preferences set to).

Rewrapping

Hit ctrl & a. This will highlight your entire file. Then go to Selection->Rewrap selection.

Checking /* indentation after Rewrap

Now I quickly run through all /*. If you've chosen a poetry book for your first book, this will take a long time. Otherwise, I just check them to be sure everything looks as I'd like it to. Surepop has little quotations from "Colonel Sure Pop" with his "signature" following each one. I'd like those to be sort of right-aligned as they are in the original. So I just space them over. This check also allows me to check for any poetry that has gone over too far to the right. Gutcheck will find this for us soon when we run our last Gutcheck so it's not important to catch all of those right now, but if I see them, I fix them.

Removing the Markers

Go to Fixup->Clean up rewrap markers. This will delete all /# /* and /$ markers and their mates.

Remove End-of-Line Spaces

Now go to Fixup->Remove End-of-line spaces. Then save. If you do not do this now, Gutcheck will complain about every one.

Final Gutcheck

Final Gutcheck. Go to Fixup->Run Gutcheck. Remember those boxes we checked? Uncheck them now as we need to know if there are any lingering long or short lines, etc. Just like last time, make your way through the questions, right-clicking correct things, fixing anything that is broken.

Remove End-of-Line Spaces Again

Again go to Fixup->Remove End-of-line spaces just in case one got added during fixing things. Your text version is now DONE.
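If you ever want to double-check a finished text file outside of GG, the two things this last stretch worries about (end-of-line spaces and long lines) are easy to sketch in Python. This is not how GG or Gutcheck do it; the function names are hypothetical, and the limit of 72 is just the rule of thumb from the Table of Contents section, not a figure taken from Gutcheck.

```python
def remove_eol_spaces(text: str) -> str:
    """Strip trailing spaces and tabs from every line, keeping the line breaks."""
    return "\n".join(line.rstrip(" \t") for line in text.split("\n"))


def overlong_lines(text: str, limit: int = 72) -> list:
    """Return (line number, length) for each line longer than the limit."""
    return [
        (number, len(line))
        for number, line in enumerate(text.split("\n"), start=1)
        if len(line) > limit
    ]


print(repr(remove_eol_spaces("one  \ntwo\n three \n")))
print(overlong_lines("short\n" + "x" * 80))
```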

HTML

Deep breath. On to HTML.

Go to File->the second thing on your list should be surepop.html. Click on that.

Setting up the Table of Contents

Go to your table of contents. If any line begins with a space, remove it. You'll be happier later. This will often occur if your formatters have aligned your chapter numbers for the text for you. Very helpful for the text. Not as helpful for the HTML. Sometimes there will be a "PAGE" designation on the right above the number. Remove the spaces before it to make it land on the left margin. Now here is where I do something wacky that helps me later. I type "Blah" right before "PAGE" and then add some spaces after it. So:

PAGE

now looks like

Blah            PAGE

This will help me in making the table shortly. "Blah" is my space holder and will be replaced with &nbsp; when I actually make the table. Putting &nbsp; in right now would be a Bad Idea as GG will convert the & into code. Before finishing the file completely, I can run a check for "Blah" and make sure none got left behind. Make sure that each page number is spaced at least 2 spaces from the title. DP requires six but GG only needs two.

Auto-generating the HTML

Here we go. Go to Fixup->HTML Fixup.† That big pop-up with all of the cool buttons returns. In the upper left click the button that says Autogenerate HTML. It will immediately save a copy just in case for you. Then it will go to work. And work. And give you a file loaded with coded html. Save. (something like surepop-html-1.html) If you want you can immediately go to External->Pass open file to default handler. It will open your browser with your new file and show you how it is starting to look like a real e-book!

†For newer GG versions, HTML Fixup was split into two menus. The first is HTML Generator and Checks and the other is HTML Markup.

HTML Generator

Fill in the boxes for TITLE and AUTHOR as you want them to appear in Title Case. GG will try to fill in these boxes for you but sometimes gets things wrong. Then click autogenerate HTML at the bottom.

ascii bug

Some versions of GG will list your encoding on the fifth text line as "ascii"; it needs to read "us-ascii".

<meta http-equiv="Content-Type" content="text/html;charset=ascii" />

should be

<meta http-equiv="Content-Type" content="text/html;charset=us-ascii" />

If your code is something else like: <meta http-equiv="Content-Type" content="text/html;charset=iso-8859-1" /> then this is a fix you can ignore.
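The fix is a plain find-and-replace, and it can be sketched in a couple of lines of Python. This is only an illustration; the function name is made up, and it assumes the meta line looks exactly like the one shown above.

```python
def fix_ascii_charset(html: str) -> str:
    """Change charset=ascii to charset=us-ascii; other charsets are untouched."""
    return html.replace("charset=ascii", "charset=us-ascii")


line = '<meta http-equiv="Content-Type" content="text/html;charset=ascii" />'
print(fix_ascii_charset(line))
```

Running it a second time changes nothing, since "charset=us-ascii" no longer contains the text being searched for.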

Fixing the Title

First thing, the ninth line of your html file will say something like:

The Project Gutenberg eBook of Sure Pop And The Safety Scouts, by Roy Rutherford Bailey.

Check this to be sure it is accurate. GG will take the first line of whatever is written on your file and the first thing that says "By" followed by something and place it in the author spot. If it cannot find a "By" it will just say "By AUTHOR." The allcaps are so that you see it and remember to fix it. If the first thing in your file is an illustration, that will be what it chose as the title. Fix it up to match the real title and author if it is not correct. As you can see, it did pretty well with Sure Pop but I need to change the case of the "And The" to be correct.

The Project Gutenberg eBook of Sure Pop and the Safety Scouts, by Roy Rutherford Bailey.


Cutting Out Unused CSS

Now there is a bunch of code following. This is your auto-CSS. (CSS just means Cascading Style Sheets but you really never need to know that.)

You should remove anything that you are not using. For example if you have no footnotes, you can remove all of that code:


   .footnotes        {border: dashed 1px;}
   .footnote         {margin-left: 10%; margin-right: 10%; font-size: 0.9em;}
   .footnote .label  {position: absolute; right: 84%; text-align: right;}
   .fnanchor         {vertical-align: super; font-size: .8em; text-decoration: none;}

You probably do not have line numbers or sidenotes either. If you are unsure if something is in your book, then leave that bit there. A little bit of unused CSS is okay. The more experience you have, the more you will change your CSS to fit what you like and what works best for you. For example, I have indented paragraphs, a bit for right-aligned text, a bit for signatures, my own poetry mark-up, etc.[2]

If you have an older version of GG, then
/*<![CDATA[ XML blockout */
<!--
may appear at the top of your CSS and
/* // --> */
/* XML end ]]> */
at the bottom of your text. Delete those lines as they just bug the whitewashers no end. It is wise to open your GG file, find the text file that is named "Header" and delete those lines from it now. Then, from now on, those lines will never appear again.

If you have

Page Numbers

I remove all of the Pg from the page numbers. I think the numbers speak for themselves and sometimes they end up split across two lines and that just looks odd. So Search and Replace for "Pg " (that is "Pg" with a space after it) and leave the Replace box empty. Replace all. Save. At any time that you've made a change to your HTML and saved it, you can refresh that browser that showed you what it looked like. Then you can check that what you did turned out the way you wanted it to.

From here on, you can do each of these steps in almost any order. Sometimes putting illustrations in first will be the wisest course, sometimes deleting the auto-TOC will make sense to do first. Save versions as you go. Undo what you don't like.

Unused Chapter Links

Getting rid of chapter links that you won't use. GG automatically inserts links for its automatic Table of Contents. I make my own more tidy Table of Contents, so we do not need those links. Search and Replace them away. Check the regex box.
For versions 1.0.1 through 1.0.19 use
Search: <h2><a name="([\w\s\p{IsPunct}\n]+?)" id="([\w\s\p{IsPunct}\n]+?)">((.|\n)+?)</a>
Replace: <h2>$3

Replace all.

For versions 1.0.20 and later use:
Search: <h2><a name="([\w\s\p{IsPunct}\n]+?)" id="([\w\s\p{IsPunct}\n]+?)"></a>
Replace: <h2>

Replace all.
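To show what those two searches do, here is a Python sketch that handles both heading styles in one go. A few loud assumptions: Python's built-in re module does not understand \p{IsPunct}, so I've swapped in [^"]+ (anything up to the closing quote) for the attribute values; the function name and the sample headings are made up; and this is not GG's code.

```python
import re

# Newer style: an empty anchor sits right after <h2>.
EMPTY_ANCHOR = re.compile(r'<h2><a name="[^"]+" id="[^"]+"></a>')
# Older style: the anchor wraps the heading text.
WRAPPING_ANCHOR = re.compile(r'<h2><a name="[^"]+" id="[^"]+">((?:.|\n)+?)</a>')


def strip_chapter_anchors(html: str) -> str:
    """Remove the auto-inserted chapter anchors but keep the heading text."""
    # Empty anchors go first so the wrapping pattern never sees them.
    html = EMPTY_ANCHOR.sub("<h2>", html)
    return WRAPPING_ANCHOR.sub(lambda match: "<h2>" + match.group(1), html)


older = '<h2><a name="CHAPTER_I" id="CHAPTER_I">CHAPTER I</a></h2>'
newer = '<h2><a name="CHAPTER_I" id="CHAPTER_I"></a>CHAPTER I</h2>'
print(strip_chapter_anchors(older))
print(strip_chapter_anchors(newer))
```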

Delete the Auto-generated TOC

Now scroll down in your file until you find GG's autogenerated Table of Contents. It will start something like:

<!-- Autogenerated TOC. Modify or delete as required. -->
<p>
<a href="#Sure_Pop_and_the_Safety_Scouts"><b>Sure Pop and the Safety Scouts</b></a><br />
<a href="#SURE_POP_AND"><b>SURE POP AND</b></a><br />
<a href="#CONTENTS"><b>CONTENTS</b></a><br />
<a href="#THE_SAFETY_SCOUTS_PLATFORM"><b>THE SAFETY SCOUT'S PLATFORM</b></a><br />
<a href="#INTRODUCTION"><b>INTRODUCTION</b></a><br />
and end with something like:
<a href="#HOW_CAN_YOU_TELL_A_GOOD_SCOUT"><b>HOW CAN YOU TELL A GOOD SCOUT?</b></a><br />
<a href="#THE_BEST_OF_GIFTS_A_BOOK"><b>THE BEST OF GIFTS—A BOOK</b></a><br />
</p>
<!-- End Autogenerated TOC. -->

and DELETE THE WHOLE THING. You won't use it. You'll make a much nicer one later on.

Title Page

Make sure that the title is in <h1></h1> tags and the author in <h2></h2> tags. Anything that you want centered can go in <div class='center'>Whatever it is</div>. If you need line breaks or blank lines, add <br /> at the end of a line. Just make sure that every <br /> is contained in a <div> of some kind or a <p> or a heading of some kind.

It is probably wise to center the copyright as well. It just looks nice.

Table of Contents

Make sure that each line is left justified and that there are at least two spaces between each title and page number. I usually just put in 5 or 6 to be sure. If your TOC has Roman chapter numbers, and most do, put some spaces between the . after the number and the chapter title. I do a search for . with a space after it and replace with . with a few spaces after it. If you have a title with Mr. Whosit, don't do this search and replace unless you want to just go back and fix that one by hand. If you have a "Blah" place holder, replace that now with &nbsp; Okay, highlight the whole Table of Contents. On the HTML-fixup menu, click "Auto Table".

The second line will say: <table border="0" cellpadding="4" cellspacing="0" summary="">. You'll want to make that cellpadding number smaller. I change mine to 0 and add some spaces by hand, but that is too tight for some people. In the summary="", put Contents or Table of Contents.

If your TOC has a "CHAPTER" heading above the chapter numbers, change
<td align="left">CHAPTER</td>
to
<td align="left" colspan="2">CHAPTER</td>

This will make your chapter heading overlap the chapter title column a bit, making things a little tidier.

If your TOC has a "PAGE" heading, you'll want to change
<td align="left">PAGE</td>
to
<td align="right">PAGE</td>

You'll want those Roman Numerals to be right-aligned if they are in the original. UNCHECK regex. Highlight the whole table.

Search for: <tr><td align="left">
Replace with: <tr><td align="right">

To put a space after each Roman Numeral, highlight the whole table, then

Search for: .</td><td align="left">
Replace with: .&nbsp;</td><td align="left">

Now we want the TOC to be linked to the actual page numbers. Check the regex box on your Search and Replace. Highlight the whole table.

Search for: "left">(\d+)
Replace with: "right"><a href="#Page_$1">$1</a>

This will both right-align the numbers and link them.
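Here is the same page-number search sketched in Python, in case it helps to see it work on one row. The function name and the sample row are hypothetical; the only real difference from the GG version is that Python writes the captured group as \1 instead of $1.

```python
import re


def link_toc_pages(table_html: str) -> str:
    """Right-align every cell that starts with digits and link it to #Page_N."""
    return re.sub(
        r'"left">(\d+)',
        r'"right"><a href="#Page_\1">\1</a>',
        table_html,
    )


row = (
    '<tr><td align="right">I.&nbsp;</td>'
    '<td align="left">The Scouts</td>'
    '<td align="left">1</td></tr>'
)
print(link_toc_pages(row))
```

One thing the sketch makes visible: any left-aligned cell that begins with a digit gets linked, so a chapter title starting with a number would be caught too and would need fixing by hand.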

Keep refreshing the browser window that has your completed project showing. Make sure that you are liking what you see. Make sure you save with different names (i.e. surepop-contents2.html) between refreshes to see the difference. If you don't like the change, go back to an earlier save. Eventually, you will save fewer versions, but until you are comfortable making changes, it's best to be cautious.

Chapter Centering

GG has already centered your first line of Chapter Heading. Usually something like

CHAPTER XIV

It puts them in <h2> tags for you. You need to center the title yourself by pulling it into the same <h2>. This makes a better generated table of contents for ebooks on handheld devices. Check the regex box.

Search for: </h2>\n\n<p>((.|\n)+?)</p>
Replace with: <br />\n\n<small>$1</small></h2>

Replace all.
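What that search and replace does to one chapter heading can be sketched in Python like this. It assumes, as the search pattern does, that the chapter title is the first <p> paragraph after the heading with exactly one blank line between them. The function name and the sample chapter title are invented for the example.

```python
import re

HEADING_THEN_TITLE = re.compile(r"</h2>\n\n<p>((?:.|\n)+?)</p>")


def merge_chapter_title(html: str) -> str:
    """Pull the paragraph after each </h2> into the heading as <small> text."""
    return HEADING_THEN_TITLE.sub(
        lambda match: "<br />\n\n<small>" + match.group(1) + "</small></h2>",
        html,
    )


before = "<h2>CHAPTER XIV</h2>\n\n<p>A HYPOTHETICAL TITLE</p>\n\n<p>Text.</p>"
print(merge_chapter_title(before))
```

Because it grabs whatever paragraph follows every </h2>, a heading that is followed directly by ordinary text would be merged as well, so it pays to look over the result.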

Placing Illustrations

If your book had a cover image included, you'll need to create an image tag. Your proofers and formatters didn't know it was included, so this is your job. At the end of your CSS, after the

      </style>
</head>
<body>

Type in <p>[Illustration: Cover]</p>

On your HTML fixup menu, there is a button on the upper right for Auto Illus Search. GG will search for your first <p>[Illustration]</p> tag. It will then pop up your project list; choose your "images" folder. Find the corresponding image number. If your illustration had no caption, then in the first box on the Image window, under Alt text, type in a short description. (This will help people using screen readers know what they are missing.) The jury is out on whether you should also type the same thing in the "Title text" window. Accessibility people say, "No! It just repeats information." The strictest HTMLers say: "Yes! You should never have empty ""!" Your call. Decide whether you want your illustration to be on the left, in the center or on the right and check the appropriate box. For the cover, I choose center. Click okay. Now, since my cover really doesn't have a caption in the original, I erase the line that says: <span class="caption">Cover</span>. Do this for all of your illustrations that do not have captions. You type it into the box so that the (alt="Cover") bit of the illustration code is already filled in. Repeat this process until GG cannot find another Illustration tag.

Transcriber's Notes

Transcriber's Notes for the HTML. Wherever possible I use hover tags. These are tags that pop up when you hover your cursor over them. They underline the corrected word with a dotted line so you know where they are.

In your CSS put

ins {text-decoration:none; border-bottom: thin dotted gray;}

.tnote {border: dashed 1px; margin-left: 10%; margin-right: 10%;padding-bottom: .5em; padding-top: .5em; padding-left: .5em; padding-right: .5em;}

Find your first correction. This is where those words you put in parentheses for the text version in your notes come in handy. Replace the corrected word with

<ins title="Transcriber's Note: original reads 'Molly'">Mollie</ins>

When you've replaced them all, go to the very end of your file. Put in a break like the chapter ones:

<hr style="width: 65%;" />

<div class='tnote'><h3>Transcriber's Notes:</h3> <p>Obvious punctuation errors repaired.</p>

<p>The remaining corrections made are indicated by dotted lines under the corrections. Scroll the mouse over the word and the original text will <ins title="Transcriber's Note: original reads 'apprear'">appear</ins>.</p></div>

You can change the wording as much as you like. Since Sure Pop is so short and there were no real corrections, my TN for this project looks like:

<hr style="width: 65%;" />

<div class='tnote'><h3>Transcriber's Notes:</h3> <p>Both "tiptoe" and "tip-toe" were used in this text. This text also uses Pellmell and pell-mell.</p> </div>

Your notes: Now go back and go over the notes that you left for yourself. For example, I like my ! ? ; : to slant with the rest of the italic text so I'd left a note "Put ! into </i>"

Search for: </i>! Replace: !</i>

Replace all. Repeat for ? : ; Make sure that "Whole Word" is not checked or it won't find any. As there are no spaces, it assumes that is part of a word.
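The four replacements can be written as one little loop. This Python sketch is just to illustrate the idea; the function name and sample sentence are hypothetical.

```python
def pull_punctuation_into_italics(html: str) -> str:
    """Move a ! ? : or ; that directly follows </i> to just inside the tag."""
    for mark in "!?:;":
        html = html.replace("</i>" + mark, mark + "</i>")
    return html


sample = "<i>Stop</i>! <i>Why</i>? said he."
print(pull_punctuation_into_italics(sample))
```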

Replace any [oe] with &oelig; and [OE] with &OElig;. Make sure the regex box is NOT checked for these search and replaces.

Here is my list of notes for this Project.

  move ! into </i>
  right-align illo on png.017
  fix note on 063
  hanging indents on 084
  hanging indents on 092
  center single line on 099
  hanging indents on 135; 136

Fixing up those Italics

The Best Practices document asks that we, where appropriate, change our italic tags to more realistically show what they are representing. Open your Search and Replace menu. You will need to choose for each italic set a category.

You can read more about the categories here. Here is a quick list:

 <em> </em> is used when a word is italic for emphasis in the text.
 <cite> </cite> is used when a reference is cited. The New York Times, Washington Irving, etc.
 <i lang="fr" xml:lang="fr"> </i> is used to indicate a language other than the original's in the text.
   ("Bonjour, m'ami!" he shouted.)

In this case I've indicated "fr" for French as that is the language my projects tend to use most. This will help screen reading devices with such capabilities in pronouncing the words properly. The link above will help you choose which code to use for whatever language your text has. If your italic text does not fit any of the above categories then the italic tags stay just as they are.

Where it says toward the bottom: "Replacement Text Terms- Single Multi", check Multi. Also check "Regex" and "Start at Beginning."

In the search box search for: <i>((.|\n)+?)</i>
In the first replace box put: <em>$1</em>
In the second replace box put: <cite>$1</cite>
In the third replace box put: <i lang="fr" xml:lang="fr">$1</i>

Then make your way through the text choosing to replace with one of the above options or leaving the italics as printed. Ask in the Best Practices Thread or in the No Dumb Questions for PPers if you're unsure.

Last Pass

After taking care of your notes, it's time to go through and look your project over carefully. Refresh the one in the browser and start at the top, fixing anything that you find. If you don't know how to fix it, ask in the Help! HTML thread or The Official "No Dumb Questions" thread for PPers.

You must know by now that DPers love to answer questions. You will probably get more than one answer or solution. Keep at them until you understand and get an answer that works best for your text.

One of the things that I had to fix was the title of the book right before the first chapter. The first line of the title was in the <h2> tag as it should be but the second line was off on its own. I inserted a break <br /> after the first line instead of the closing </h2> and put the closing </h2> after the end of the second line. Save your file.

The Final Checks

Find Orphaned Markup

After fixing everything that you wanted fixed, it's time to run some checks. First: Find Orphaned Markup. This is the second time we've used this button on the middle right of the HTML Markup menu. Chances are, this time it will find more than it did last time. For me, it found an opening <p> on a page number that I'd moved without a closing </p>. That was easy to fix. Keep pushing it and fixing until it takes you to the end of your file because that means it's done. Save your file.

Link Checker

This checks to be sure that everything that links in your project, like the TOC goes somewhere, and isn't broken. It also checks to be sure that you've used all of the illustrations in your "images" file. At the bottom of your HTML Markup menu on the left is a button "Link Checker." Click it and a new window opens. Anything with (CRITICAL) after its title, needs to be fixed. (INFORMATIONAL) will be full of page numbers that you didn't use, which is fine. If there are no CRITICAL issues, close that window. Save your file.
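To give a feel for the kind of thing a link check looks for, here is a very rough Python sketch that lists internal links with nowhere to land. It is not GG's Link Checker and does far less (it ignores images entirely). The function name is made up, and it assumes every attribute uses double quotes.

```python
import re


def broken_internal_links(html: str) -> list:
    """Return the #targets that are linked to but never defined by id or name."""
    targets = set(re.findall(r'\b(?:id|name)="([^"]+)"', html))
    links = re.findall(r'href="#([^"]+)"', html)
    return sorted({link for link in links if link not in targets})


page = (
    '<a href="#Page_1">1</a> <a href="#Page_9">9</a> '
    '<span id="Page_1"></span>'
)
print(broken_internal_links(page))
```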

Tidy Check

Go back to HTML Markup and next to "Link Checker" push "HTML Tidy". If you are lucky that first line says:

INFO: Doctype given is "-//W3C//DTD XHTML 1.0 Strict//EN"

This means that your file is perfect as far as Tidy is concerned. If not, fix up the errors or warnings that it finds.

Save your file.

Validator

Validator: this is a web validation service. You must run it and it works really, really well.

Validator

Browse to find the file. For me: C:\DP\surepop\surepop.html The icon for the HTML file has the blue E next to it as it thinks IE is my usual browser. I've just never changed it to Firefox which is what I use for most things. (You should check how your HTML looks in at least those two browsers.) Push Enter or Check.

Chances are the bright red:

Errors found while checking this document as XHTML 1.0 Strict!

will show up. With more practice, it will turn green most of the time. I have an error on this one. Do NOT panic with the number it gives you. For me it says: 850 Errors, 1 warning(s) I just roll my eyes. I know that I do NOT have 850 errors and neither do you. Scroll down the error page until you see

Validation Output: 850 errors.

It will then tell you what line it thinks your heinous error is in.

Line 248, Column 4: document type does not allow element "h2" here; missing one of "object", "ins", "del", "map", "button" start-tag <h2>SURE POP AND THE<br />

Now I know that is wrong. So I go to the GG window. On the bottom left, it says: Ln: 4317/4321 (meaning my cursor is currently on line 4317 of the total 4321 lines in the project). I click on that box and a little Goto Line Number window pops up. I type in 248. I know that line is fine so I start to look UP the file for something odd. Omigosh. I realise what I've done. It is almost always a very simple error and this is no exception. Remember how I told you to save before going to Validator? I didn't. So of course I have errors. They're already fixed, but Validator didn't see that, as it saw the LAST save I did before fixing errors. I save the file. I refresh the Validator window, click okay when it tells me about the PostData, and get the happy green: This document was successfully checked as XHTML 1.0 Strict!

What this taught us, besides take your own advice, was that one error, that lost <h2> tag that was fixed by fixing the title before the first chapter, caused Validator to panic and tell me that everything past that point was Wrong, Wrong, Wrong. Validator does that. It errs on the side of Panic. Don't Panic with it.

Do not include the link that the Validator offers you to prove that your file is valid. The whitewashers and your PPV will check it themselves.

Zip it Up

Though it is hard to believe, you are pretty much done. It's time to zip it all up for PPV.

I use zipcentral, but its website has gone the way of the dodo. I asked in this thread about ways other people zip. There are a lot of options. The Winzip one is probably the easiest for now.

In your zip you should have, named for your project, of course:

   * Your text file: surepop.txt
   * Your html file: surepop.html
   * Your txt bin file: surepop.txt.bin
   * Your html bin file: surepop.html.bin
   * A folder with your images: images

Once your project is zipped up, check it for a file called thumbs.db. Delete that file immediately. It is something "helpful" that Windows adds to folders of images; it is huge and useless for our purposes. It is a FAIL for whitewashing (the final step in Post-processing where our projects have to pass someone at PG). I have turned off the option in Windows Explorer and have never missed it. If you want to as well, open any folder in Windows Explorer. Go to Tools->Folder Options. Go to the View tab. Scroll down and check the box titled: Do not cache thumbnails. Click Apply and close the window. Now it will never include them again.
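If you would rather build the zip with a script than with a zip program, something along these lines would do it. This is a sketch under my own assumptions, not a DP tool: the function names are invented, and it assumes the four files and the images folder sit directly inside the project folder, named as in the list above.

```python
import zipfile
from pathlib import Path


def wanted_in_zip(relative_names, project):
    """Pick the deliverables out of a list of forward-slash relative paths."""
    deliverables = {project + ext for ext in (".txt", ".html", ".txt.bin", ".html.bin")}
    keep = []
    for name in relative_names:
        if name.rsplit("/", 1)[-1].lower() == "thumbs.db":
            continue  # never ship the thumbnail cache
        if name in deliverables or name.startswith("images/"):
            keep.append(name)
    return keep


def zip_project(folder, project):
    """Write <project>.zip inside folder using the files wanted_in_zip selects."""
    folder = Path(folder)
    # The file list is gathered before the archive is opened for writing.
    names = sorted(
        path.relative_to(folder).as_posix()
        for path in folder.rglob("*")
        if path.is_file()
    )
    zip_path = folder / (project + ".zip")
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for name in wanted_in_zip(names, project):
            archive.write(folder / name, name)
    return zip_path
```

For the walkthrough's project the call would be zip_project(r"C:\DP\surepop", "surepop"). Everything else in the folder (notes, the pngs folder, older saves) is simply left out.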

PPV

Go to your project's Project Page. Scroll to the very bottom and you'll find a button: Upload for Verification. Browse for your zipped file. Type any explanations or comments to your prospective PPVer in that box. Things you might include are page numbering inconsistencies you noticed, or that the author changed your character's name halfway through and how you handled it. Anything that a quick word from you would explain, explain here. Click upload. If this is your first project, go post in the "New PP-ers waiting for their first project to PPV" thread.

You are done! For now.

Your PPV wants you to succeed. They want you to get to see your project on Gutenberg. They will:
1) Download your project.
2) Check it over with a fine-tooth comb.
3) Spellcheck it.
4) Run all of the checks that you did (or were supposed to)
5) Take notes and either
   a) Send it back to you to fix some things, or
   b) Write to you and ask what you'd like them to do about some things and then post to Gutenberg, or
   c) Post it to Gutenberg and send you feedback. This one is much more likely after you've done a few.

You will receive feedback no matter which of the three above they choose. It will be encouraging and, even so, you'll probably feel frustrated and embarrassed if there are changes to be made. This is normal, but not what they are aiming for. Be patient with the time it takes and with their suggestions and corrections. If you don't like what they've changed, tell them. Remember that they, like you, are volunteers. They've volunteered to take on this job and want you to feel good and do well!


Further Notes

[1]<sc> If your book uses a.d., b.c., a.m., p.m. and so on, you'll need to fix that for the HTML.


Other searches: If your book uses periods (full stops) after Mr., Dr., Mrs., Mme., Mlle., and so on, it's sometimes wise to search for each abbreviation followed by a space, to catch any that are missing their periods.



[2] If you find yourself adding the same thing over and over to your CSS files, such as right-aligned text, you can add it to GG's header file so that it always appears. To edit the file, go to your GG folder. Open the text file that's named, obviously enough, "Header." I immediately save it as "GG original header." Then go back and reopen "Header." Somewhere in the list of ".somethings" add:

   .right    {text-align: right;}

For example, mine is right above

   .u        {text-decoration: underline;} 

GG doesn't care what order they are in. Then save your file and thenceforth, all of your autogenerated files will have the right-aligned CSS code in place.

Extra Information that you probably won't see on an Easy Fiction

Footnotes

I do footnotes twice. Once for the text version and once for the HTML version. This is because I put them in different places. For this walkthrough I'll be using "Roses and Rose Growing" as "Surepop" had no footnotes.

Text Version Footnotes

After the text has been saved as roses.txt and before rewrapping:

Go to Fixup->Footnote Fixup

The buttons on the menu:

  • See Anchor: That takes me to the place in the text with the [1] for whatever number is in that drop down menu in the second line.
  • See Footnote: Takes me to the footnote.
  • Last Footnote: Takes me to the footnote right before the one I am looking at presently. If I am looking at #1 it won't show me a thing, obviously.
  • Drop Down: Lets me choose any of my footnotes to see.
  • Next Footnote: Shows you the next footnote in order. Use this to step through your footnotes and make sure they are all the way you'd like them to be.
  • Three check boxes: This lets you choose to have only one type of footnote marker. You will use this most often. Sometimes however, your text will have both numbered and lettered or numbered and Roman. I've already found on working on "Roses" that it has both numbered and lettered so I'll be leaving all of the boxes unchecked.
  • Number, Letter, Roman: This button changes whatever footnote you are looking at to a number, a letter or a Roman numeral. We will not be using these buttons.
  • Sadly, I cannot tell you what Join with Previous, Adjust Bounds, or Set Anchor do, as I've never used them.
  • First Pass: This button runs through your document and counts up and checks all of your footnotes. Although it says "First", you'll use it more often than just once usually.
  • Inline/Out-of-line: Only on the rarest of occasions would you use Inline. It gets in the way of reading and is generally frowned upon. In all of my books, I've used it once and am not sure even then if it was the best choice. It was an ancient history book with numerous year footnotes. Out-of-line is the default and should remain checked.
  • Re Index: This renumbers all of your footnotes in sequence: 1, 2, 3; A, B, C; or I, II, III. If you didn't check "All to" above, it will simply renumber them in whatever form they already have. Mine will be both 1, 2, 3 and A.
  • Autoset buttons: This will tell GG where you want your Footnotes to land. For the text version, the usual is Chapter End. For the HTML version, the usual is End. It will add a heading of FOOTNOTES: to the places you've chosen to have footnotes.
  • Next and Last Landing Zone: Does what you'd think. Lets you scroll through your landing zones.
  • Unlimited Anchor Search: This should be checked at all times. It lets GG search for anchors that are farther away from the actual footnote than you might expect.
  • Move Footnotes to Landing Zones: This does what it says. It collects all of your footnotes and puts them where you've told it to. The text on this button will be gray until you choose a landing zone.
  • Tidy Up Footnotes: This is the final step for the Text version. You will NEVER use this button for the HTML version. It changes

[Footnote 1: See pruning, p. 17.] to

[1] See pruning, p. 17.

It tidies them up.

  • Check Footnotes: This is another button that you may use a lot. After First Pass, click this button and a window will open that lists all of your footnotes and any issues that might happen with them. They will be color-coded. White is what you want to see.
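To make the Tidy Up Footnotes change concrete, here is a rough Python sketch of the same rewrite. It is not GG's own code. The pattern and function name are my assumptions, and it only handles footnotes that sit on a single line.

```python
import re

# Matches a one-line footnote such as "[Footnote 1: See pruning, p. 17.]"
FOOTNOTE = re.compile(r"\[Footnote ([0-9A-Za-z]+): (.*?)\]$", re.MULTILINE)

def tidy_footnotes(text):
    """Rewrite "[Footnote N: text]" as "[N] text" (single-line notes only)."""
    return FOOTNOTE.sub(r"[\1] \2", text)

print(tidy_footnotes("[Footnote 1: See pruning, p. 17.]"))
print(tidy_footnotes("[Footnote A: Perpetual flowering.]"))
```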

The actual process:

Click: First Pass. Watch GG zip through your file counting up and checking footnotes. At the top of the window I now have #1/16, which tells me I have 16 footnotes and am looking at the first one.

Click: Check footnotes: All white. Sometimes, when you make your first pass through the text and move the footnotes out of the middle of paragraphs, you will accidentally place a footnote BEFORE its anchor instead of after it. Simply go to the footnote in question and move it. Click First Pass again, then Check Footnotes. It should now be resolved. Sometimes a page will stop mid-paragraph with a [1] in it, and the paragraph will continue on the next page with another [1]. Now you have two [1]s in the same paragraph. Change the second anchor to a 2 and the corresponding footnote to a 2 as well. (If you have a lot of footnotes you may have a lot of reordering to do.) Once all of your footnotes are white (or brown if they are just large), on to:

Click: Re Index. Before pushing this button, decide if they can all be numbers, letters or Roman. If so, check the appropriate box above. Mine cannot. I notice that my one letter footnote has multiple anchors and one note.

 [A]Blanc double de Coubert.
 [A]Conrad F. Meyer.
 [A]Delicata.
 [A]Madame Georges Bruant.

All for

 [Footnote A: Perpetual flowering.]

For my text version, this is fine. For my HTML, it will take special handling.

Okay, it has now been re-indexed. I'm choosing end of chapter as that way it isn't a long way for the text reader to have to go to find the reference. Autoset Chap. LZ. Now the text on the Move Footnotes to Landing Zones is no longer grayed out. Click that next.

Finally, click Tidy Up Footnotes. That looks nicer. The final step I take here is purely cosmetic. I search for FOOTNOTES: and if a section has only one, I change it to FOOTNOTE: It just seems more correct. Three of my eight chapters with footnotes had only one.

Note on Tables: Often you will want to leave footnotes that reference table data with the table instead of moving them to a chapter or book end. You can follow all of the steps and just cut and paste them back. My one letter footnote conveniently landed at the end of the chapter in any case, so I didn't have to move it at all.

Now it is safe to rewrap.

Footnotes for HTML

DO THIS BEFORE running AUTOGENERATE HTML! The steps are identical except: Choose End as your landing zone and do NOT Tidy Footnotes. Make sure before moving your notes to the end that there is at least one blank line at the end of your text. Otherwise it will come out something like

  /*
  <i>SOUTHWELL,
  NOTTS.</i>
  */
  FOOTNOTES:

I try to make sure there are two blank lines above the FOOTNOTES: tag.

Other things to be aware of for html. Sometimes, GG puts the closing </div> one footnote too early. Check your second to the last footnote. If it has </div></div>, move one of them so that your LAST footnote ends </div></div>.

Now this moved my [A] footnote to the end as well. After I generate the HTML, I'll be moving it back to the end of the list it references.

If you have a multiple-anchored footnote, as my [A] is, GG will only link the last anchor. You'll need to do the rest by hand.

Simply copy: <a href="#Footnote_A_3" class="fnanchor">[A]</a>

and paste it for each [A].

Do not include the: <a name="FNanchor_A_3" id="FNanchor_A_3"></a>

on every one. Each named anchor must be unique, therefore, each anchor may only be "named" one time in a file. You may have multiple anchors to the same footnote, but only the first one has the "name" and "id".
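If you want to double-check that you didn't paste a "name" and "id" twice, something like this Python sketch will list any id that appears more than once. The function name is my own, and it assumes ids are written with double quotes, as in the anchor shown above.

```python
import re
from collections import Counter

# Assumes ids are written as id="..." with double quotes.
ID_ATTR = re.compile(r'\bid="([^"]+)"')

def duplicate_ids(html):
    """Return a sorted list of id values that occur more than once."""
    counts = Counter(ID_ATTR.findall(html))
    return sorted(name for name, n in counts.items() if n > 1)

sample = (
    '<a name="FNanchor_A_3" id="FNanchor_A_3"></a>'
    '<a href="#Footnote_A_3" class="fnanchor">[A]</a>\n'
    '<a name="FNanchor_A_3" id="FNanchor_A_3"></a>'
    '<a href="#Footnote_A_3" class="fnanchor">[A]</a>\n'
)
print(duplicate_ids(sample))
```

An empty list means every id in the file is unique.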

Index

On your first pass, make sure that any continued items are rejoined. For example:

Amateur's Rose Guide, 2, 7, 21-3, 30, 35-6, 60-1,
*/
-------
/*
72, 82, 110-12, 123.

Arsenate of lead, 146-9.

should become:

Amateur's Rose Guide, 2, 7, 21-3, 30, 35-6, 60-1, 72, 82, 110-12, 123.
*/
-------
/*

Arsenate of lead, 146-9.

Or if you want to tidy it up now:

Amateur's Rose Guide, 2, 7, 21-3, 30, 35-6, 60-1, 72, 82, 110-12, 123.
-------

Arsenate of lead, 146-9.

Make sure when you are removing page separators that anything that needs to be indented stays indented. Often you will find a lot of printer's errors in the index. Often the person who makes the index is not the author of the book. A lot of transcriber's notes come out of indexes.

Text

For the text version, you simply need to be sure that the entire thing is indented at least two spaces from the left. After the opening /* put [2]

/*[2]
A.

Abol syringe, 138, 148.

Abol, White's Superior, 141, 148.

Aphis.
  _See_ Green Fly.

Aphis Lion, 140.

Arsenate of lead, 146-9.

GG reads that [2] and will indent that section 2 spaces. If you have some very long lists of numbers

Amateur's Rose Guide, 2, 7, 21-3, 30, 35-6, 60-1, 72, 82, 110-12, 123.

You'll need to break those within your width limit (I usually choose 70) and indent the next line at least 6 spaces. Remember that your index will be indented two so break the line at least before 68. The first gray box in the second to the bottom line of GG will tell you which Line and Column your cursor is resting on. Column means how far over.

Ln: 5869/6294 - Col: 74

Amateur's Rose Guide, 2, 7, 21-3, 30, 35-6, 60-1, 72,
          82, 110-12, 123.

Your final Gutcheck will alert you to any long lines left over. Adjust and click Fixup->Remove End of Line Spaces when you are done. Other than that, for the text, you're finished.
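For a long index, the breaking and indenting can be sketched with Python's textwrap module. This is just an illustration of the idea, not part of GG. The function name and the ten-space continuation indent are my assumptions (anything of six spaces or more fits the rule above).

```python
import textwrap

def wrap_index_entry(entry, width=68, continuation_indent=10):
    """Break one index entry at `width`, indenting the continuation lines."""
    return textwrap.fill(
        entry,
        width=width,
        subsequent_indent=" " * continuation_indent,
        break_long_words=False,   # never split a page number in two
        break_on_hyphens=False,   # keep ranges such as 110-12 together
    )

entry = "Amateur's Rose Guide, 2, 7, 21-3, 30, 35-6, 60-1, 72, 82, 110-12, 123."
print(wrap_index_entry(entry))
```

The default width of 68 leaves room for the two spaces that the [2] adds.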

HTML

Indexes for HTML are a bit more complicated which is why they are not recommended for new post-processors. You can leave the [2] after the opening /* if you like. It will just be ignored for the html version.

After auto-generating your HTML your index will look something like:

<p>
A.<br />
<br />
Abol syringe, 138, 148.<br />
<br />
Abol, White's Superior, 141, 148.<br />
<br />
Aphis.<br />
<span style="margin-left: 1em;"><i>See</i> Green Fly.</span><br />
<br />
Aphis Lion, 140.<br />
<br />
Arsenate of lead, 146-9.<br />

Put a bookmark at the top of the index so that you can find it over and over again. To place a bookmark, hold down ctrl and shift and then choose any number from 1-5. I choose 5 as it's the end of the book. To go to a bookmark hold down ctrl and click the number.

Change the opening <p> to a <div> and the closing one as well.

Linking pages: Highlight the entire index. If there are a lot of ads after the index, I add a lot of blank lines before them so that I can more easily see where to stop when I click and drag to highlight. Extra blank lines will not show up in the final product, so it's safe to add them now. After highlighting, we'll do a regex search and replace:

Search: , (\d+)
Replace: , <a href="#Page_$1">$1</a>

The comma is very important. Without it you will link every number in your index and that will have some bad unintended consequences. As with all of the steps: Save multiple versions as you go along so you can always go back a step. roses-index1.html to start, roses-index2.html after your first pass of replacing, and so on. Check your file often in a browser such as IE or Firefox to be sure it's looking as you hope.

This replace may have to be run more than once. Sometimes GG doesn't see every number in a line. Keep highlighting and running it until it doesn't find any more. It will change them to:

Abol syringe, <a href="#Page_138">138</a>, <a href="#Page_148">148</a>.
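The same comma-and-number replace can be tried outside GG. Here is a minimal Python sketch of it. The function name is my own, and Python writes the captured group as \1 where GG writes $1.

```python
import re

# A comma, a space, then a page number, as in the GG search above.
PAGE_AFTER_COMMA = re.compile(r", (\d+)")

def link_pages(line):
    """Wrap each page number that follows ", " in a link to #Page_N."""
    return PAGE_AFTER_COMMA.sub(r', <a href="#Page_\1">\1</a>', line)

print(link_pages("Abol syringe, 138, 148."))
```

Python's sub replaces every match in one pass, so there is no need to run it twice. Running it again on an already linked line changes nothing, because each comma is then followed by a tag rather than a digit.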

Roses took two passes through to catch all the basic links. Now, if you are lucky, your index with groups of successive pages will look like 236-247. The first will have been linked already to

<a href="#Page_236">236</a>-247

For those we simply change our regex a bit:

Search: -(\d+)
Replace: -<a href="#Page_$1">$1</a>

Repeat as above.

For Roses, however, our printer was very concerned about over-using ink. He shrank most of the continued references to:

Arsenate of lead, 146-9

I obviously do not want my readers to go to page nine to see the end of the Arsenate of lead reference so every one of those will be coded by hand.

I will use the same search and replace as above but instead of choosing: Replace all, I will look at each one, click "Replace" and edit the:

-<a href="#Page_9">9</a>

that it gives me to:

-<a href="#Page_149">9</a>

so the reader still sees -9 but goes to page 149.
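If there are a great many of these, the "borrow the leading digits from the first page" rule can be sketched in Python. This is only a sketch under my own assumptions: the function names are hypothetical, the first page of each range must already be linked, and a shortened end page is assumed to share its leading digits with the start page. Every result should still be checked by eye.

```python
import re

def expand_range_end(start, end):
    """Expand a shortened end page: ("146", "9") gives "149"."""
    if len(end) >= len(start):
        return end  # already written out in full, e.g. 236-247
    return start[: len(start) - len(end)] + end

# An already linked first page, a hyphen, then the unlinked end page.
RANGE = re.compile(r'(<a href="#Page_(\d+)">\d+</a>)-(\d+)')

def link_range_ends(line):
    """Link the end page of each range to its expanded page number."""
    def repl(match):
        target = expand_range_end(match.group(2), match.group(3))
        return f'{match.group(1)}-<a href="#Page_{target}">{match.group(3)}</a>'
    return RANGE.sub(repl, line)

print(link_range_ends('Arsenate of lead, <a href="#Page_146">146</a>-9.'))
```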

Whew. So that is the number references done. Now for the "See", "See also", etc. This will work for those references within your text as well. Some do not link these and it is not required. I just think it looks nicer. (For the ones in the text that say "See page 212" simply

search for: page (\d+) replace: page <a href="#Page_$1">$1</a>

Make absolutely sure as you replace those that it is not referencing a different book entirely. See Elston's Music Dictionary, page 212 is not a link we want hyperlinked.)

Within the index, you'll notice that my first "see" entry is:

Aphis.<br />
<span style="margin-left: 1em;"><i>See</i> Green Fly.</span><br />

I put a bookmark right here. Then I go find the entry for Green Fly. I highlight "Green fly" under G and using my HTML fixup menu, I click "Named Anchor." It is right in the middle of that pop-up menu. It will add:

<a name="Green_fly" id="Green_fly"></a>Green fly,

Now I use my bookmark to go back to the "See" entry. I highlight "Green Fly" there. Using the same HTML fixup menu, I choose "Internal link". It is two to the left of "Named Anchor". A new window pops up and the first link it offers me is "#Green_fly". I double-click on that and my link is created.

<span style="margin-left: 1em;"><i>See</i> <a href="#Green_fly">Green Fly</a>.</span>

Repeat until all the "See"s are done. Look over your file in a browser. Make sure all numbers are linked. Make sure no years got linked.

Finish your HTML file as noted above.

Greek

First, unless you are a scholar in ancient Greek, post links to your original page and also copies of what your proofers and formatters came up with here or here. Within a few hours Greek scholars will give you any corrections or tell you it is all just perfect!

You are going to want to make three different files this time instead of the usual two. One plain text (or Latin-1) which will have the transliteration of the Greek, one UTF-8 that will have the real Greek letters visible, and one HTML which will also have the Greek letters visible as well as a transliteration.

The Whitewashers require that each of these have the -names that are listed below.

Plain Text

For this, you simply leave the [Greek: Biblos] tags in place in the text. Save this one as yourtextname-ltn-1.txt.

UTF-8

For this one, you'll need to find the letters themselves. GG will help you with this, as will the Greek scholars if you need them. In GG, under Fixup, at the very bottom, you will see "Find Greek." It will zip you right to your first Greek tag and pop up a little tool. (It may pop up BEHIND your working text file depending on how large you've got your text box set.) There in the text will be the word as transliterated. My first is "Epiphaneia". I check the UTF-8 box, then I just look at the original image and at the choices GG gives me and pick out the letters I need, which comes out something like: Επιφανεια. I copy and paste the Greek word into the text in place of the [Greek: Epiphaneia] tag. You can use the "transfer" button, but then erase the [Greek: ] brackets and the transliterated word, because it is obvious what that is, and make sure that only Επιφανεια remains in the text. While you have the Greek box open, you can also get the codes for the HTML version if you want. Do the same steps as for the UTF-8 but click the "HTML code" button instead.

Save this UTF-8 file as yourtextname-utf8.txt

HTML

The jury is out on whether it is best to use the HTML codes for Greek letters or the letters themselves. The codes for the word above come out as: & #917;& #960;& #953;& #966;& #945;& #957;& #949;& #953;& #945; (with no spaces at all in the word. The spaces are present to prevent your browser from interpreting the codes.) I translate into the codes and copy them while I am doing the UTF-8. Save them in a separate notepad file with their transliterated word next to them.

& #917;& #960;& #953;& #966;& #945;& #957;& #949;& #953;& #945; - Epiphaneia (remember no spaces after the &s)
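If you ever need the codes without the GG tool, each one is simply the character's Unicode number between an ampersand-hash and a semicolon. A small Python sketch (the function name is my own) builds them without any spaces:

```python
def to_numeric_entities(word):
    """Turn each character into its decimal HTML code, with no spaces."""
    return "".join(f"&#{ord(ch)};" for ch in word)

# Print the code numbers for the word used in this walkthrough.
for ch in "Επιφανεια":
    print(ch, ord(ch))
```

Calling to_numeric_entities on the Greek word gives the full run of codes, ready to paste into the HTML file.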

An added thing that you can do to help your reader in your HTML file is to use "Insert" or "hover tags". These will provide the reader with a little popup transliteration of the word if they hover their cursor over the Greek word.

In your CSS put:

   ins {text-decoration:none;  border-bottom: thin dotted gray;}

around your Greek word put:

   <ins title="Greek: Epiphaneia">& #917;& #960;& #953;& #966;& #945;& #957;& #949;& #953;& #945;</ins>

which will come out in your text as

 of our Saviour to the Gentiles; and the name
Epiphany (Επιφανεια), which signifies an appearance
from above, was given to it in allusion to the

Except yours will have a grey-dotted line underneath. Hover tags do not yet work in all handheld ebooks, but maybe someday.

Save this file as yourtextname-htm.htm