User:Camomiletea/PM Routine
Follow A personal View of PMing, with exceptions as follows:
Checking for previous versions or projects "claimed"
- This script is not enough.
- Search the forums, particularly Providing Content forum.
- Search in-progress on DP-Canada.
- Harvesting Toronto claims
- Harvesting Americana claims
Obtaining Individual Images
- Harvesting high-resolution images
- This can be useful for getting
- images of title and verso for copyright clearance, and
- original scans for badly cropped pages.
Abbyy
- Open images, making sure to check boxes "Determine page orientation" and "Enable image processing".
- Step through images.
- Make sure read order is sensible, columns are treated correctly, and text is not needlessly split into multiple blocks.
- On blank pages remove text areas, draw a picture area.
- Delete text areas around gunk or within illustrations.
- "Read all pages"
- Step through images again
- Make sure chapter headers are recognized as body text, not "headers/footers".
- Remove page numbering if needed.
- "Read page" after making changes to the blocks.
- Remove any glaring gunk (I will also look again after Guiprep).
- Save:
- text with linebreaks
- text without linebreaks
- images as PNGs.
Some notes:
- Tends to misrecognize y as v, so I've been checking for "v" at the end of words (Regex: v\b) and finding things like "readilv".
- May misrecognize "or" as "01*", "no" as "110", "on" as "011", and I don't think I have scanno checks for those.
- If a single paragraph is split into multiple blocks, there will be a paragraph break between the parts. Delete the multiple parts; use "Add Area Part" to draw a section to add to the block.
- I don't adjust the text areas to remove page numbers until I see how Abbyy reads the pages; that way on some pages Abbyy will have already determined that they are "headers/footers" and ignored them, and I have less work to do.
- Area properties: this allows you to renumber blocks, specify a different language from the default...
- Image properties: this shows the DPI of image and allows you to change it.
- Quick Access Toolbar: I hide the main toolbar, and just have this little row of buttons that seemed the most useful:
- Undo and Redo
- Previous Page, Next Page, Go to Page
- Open Images, Open Finereader document
- Save Finereader document, Save images, Save as text, Options
- Reorder tool, Read selected pages, Read all pages, Show/Hide unprintable character.
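The y/v and digit misreads mentioned in the notes above could be flagged in one pass over the saved text. The following is only a rough sketch, not part of the routine itself: the function name `find_suspects` and the digit pattern are assumptions made for illustration, and every hit still has to be checked against the scan.

```python
import re

# Words ending in "v": a final "y" is sometimes read as "v" (e.g. "readilv").
TRAILING_V = re.compile(r"\b\w+v\b")

# Hypothetical pattern (an assumption, not an existing scanno check):
# standalone tokens of two or three 0/1 digits, optionally followed by "*",
# which may be misread "or", "no", "on" (e.g. "01*", "110", "011").
DIGIT_SCANNO = re.compile(r"(?<!\w)[01]{2,3}\*?(?!\w)")


def find_suspects(text):
    """Return (line_number, token) pairs worth checking against the scan."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for pattern in (TRAILING_V, DIGIT_SCANNO):
            for match in pattern.finditer(line):
                hits.append((lineno, match.group()))
    return hits


if __name__ == "__main__":
    sample = "He readilv agreed\n01* 110 men"
    for lineno, token in find_suspects(sample):
        print(lineno, token)
```

Both patterns will produce false positives (real words ending in "v", genuine numbers such as "10" or "100"), so this is meant as a list of places to look, not an automatic fix.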
Guiprep and More
- If book contains "McNames", see this forum topic.
- If there are oe ligatures, etc. disable "Convert to ISO 8859-1" -> fix in the next stage.
Coming as I do from Post-Processing, I find the automatic changes really annoying, because in many instances the change is incorrect. Therefore, I'm trying out a different procedure.
- Run Guiprep with minimal settings, such as
- remove spaces before punctuation,
- multiple spaces to one
- fix ellipses
- Concatenate files with Guiguts (File -> Import Prep Text Files; File -> Save)
- Run Guiprep with most settings on, and Fix scannos checked.
- Concatenate files again with Guiguts.
- Run WinMerge on the two versions.
- Since most of Guiprep changes are good, use the latest as the main one, making changes to it as needed.
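The comparison step above uses WinMerge; where that is not available, a similar list of differing lines can be produced with Python's standard `difflib`. This is a minimal sketch under that assumption, and the function name `changed_lines` is hypothetical.

```python
import difflib


def changed_lines(minimal_text, full_text):
    """List lines that differ between the two Guiprep outputs.

    Lines prefixed "- " come from the minimal-settings version,
    lines prefixed "+ " from the version with most settings on.
    """
    diff = difflib.ndiff(minimal_text.splitlines(), full_text.splitlines())
    # Keep only removed/added lines; drop unchanged lines and "? " hint lines.
    return [line for line in diff if line.startswith(("- ", "+ "))]


if __name__ == "__main__":
    minimal = "tbe cat\nsat down"
    full = "the cat\nsat down"
    for line in changed_lines(minimal, full):
        print(line)
```

`ndiff` can be slow on a whole book, so in practice it may be worth comparing page by page rather than feeding it the full concatenated files.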
In Guiguts:
- Save with a new name.
- Remove tabs (Search \t, check Regex, replace with one space)
- Fix UTF-8 characters if needed (œ ligatures to [oe]; Greek to transliteration.)
- Run Scanno checks.
- Fix punctuation which was shifted to next line -- Search: \n([,.:;!?])\s? Replace: $1\n
- Run Word Frequency Routine:
- Alpha/Num tab.
- MixedCase tab.
- Sort Alpha, review Character Counts tab for *, /, {, and other characters that you know can't be there.
- Possibly some Gutcheck.
- Compare the working copy with the previous version (making sure no page separators are missing).
- If page separators are missing, reinsert them and delete the .bin file.
- File -> Export As Prep Text Files.
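For reference, the tab and shifted-punctuation replacements from the list above translate directly into Python's `re` module, and the page-separator comparison can be roughly automated as well. This is a sketch only: the function names are hypothetical, and the separator pattern assumes lines of the form `-----File: 001.png-----...`, which should be checked against the actual working file.

```python
import re

# Assumed page-separator format: "-----File: <image name>" followed by dashes.
SEPARATOR = re.compile(r"^-----File: (\S+?)-+$", re.MULTILINE)


def tidy_text(text):
    """Apply the tab and shifted-punctuation fixes described above."""
    # Remove tabs: replace each tab with one space.
    text = text.replace("\t", " ")
    # Punctuation shifted to the next line: move it back up.
    # Same search/replace as in Guiguts ($1 becomes \1 in Python).
    text = re.sub(r"\n([,.:;!?])\s?", r"\1\n", text)
    return text


def missing_separators(previous_text, working_text):
    """Return image names whose separators are in the previous version only."""
    kept = set(SEPARATOR.findall(working_text))
    return [name for name in SEPARATOR.findall(previous_text) if name not in kept]


if __name__ == "__main__":
    print(repr(tidy_text("Hello\n, world\tagain")))
    before = "-----File: 001.png-----\nabc\n-----File: 002.png-----\ndef"
    after = "-----File: 001.png-----\nabc\ndef"
    print(missing_separators(before, after))
```

Unlike the interactive search in Guiguts, `re.sub` changes every match at once, so it is safer to run this on a copy and compare the result before exporting.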
Illustrations
- Do not delete non-illustration images. Move them to a separate folder.
- Step through each non-illustration image to double-check that there are no illustrations (or weird symbols that might need to be treated as illos).
Project comments
<p style="background-color: #336633; color: #ffffff; font-size: larger; text-align: center;">Proofreading:</p> <p><b>There are no deviations from the Guidelines.</b> </p>
Mention unusual items in the text.
<p>Reminder on the use of WordCheck: you can always simply match the flagged words with the scan without unflagging/suggesting them; but if you'd like to help future proofreaders, see the following guidelines—</p> <ul> <li>Do suggest proper names, including uncommon or unusually spelled names.</li> <li>Do suggest words in other languages {dialect/ye olde english} if they match the scan.</li> <li>Do suggest abbreviated words.</li> <li>Don't suggest parts of words split across page breaks.</li> <li>Don't suggest suspected typos in the original - please use [** ] to place your comment.</li> </ul> <p style="background-color: #336633; color: #ffffff; font-size: larger; text-align: center;">Formatting:</p> <p><b>There are no deviations from the Guidelines.</b> </p>
Mention formatting items that occur in the text. Mention how to recognize chapters / sections. Other reminders.
<p style="background-color: #336633; color: #ffffff; font-size: larger; text-align: center;">Post-Processing:</p> <p>If you would like to post-process this book, let me know. Otherwise I will post-process it myself. An HTML version is requested. There are no illustrations and no index.</p>
Skip if I intend to PP. HTML will be required if there are illustrations, index, or complex layout items (tables, family trees, etc.); requested otherwise. Mention if there are illustrations, index, etc.
<p style="background-color: #336633; color: #ffffff; font-size: larger; text-align: center;">About the book:</p>
- Explain the difficulty level rating (for proofreading/formatting).
- What is the book about?
- Any interesting facts?
- Is this a first by a particular author?
- Blurb/incipit of a book if desired.
- Link to the images source.
- Any related projects?
- Is it from one of the books wanted lists?
<p>Useful link: <a href="">Table of Contents</a>.</p>
Review Common PM Mistakes
I've granted you Project Management status at DP. Your PM mentor should look through your project after you have it loaded onto the server but before you release it for proofing and give an OK to release it. Please do this for your first 2-3 projects.
The mistakes that new PMs make that cause the most grief for our system are as follows:
1, 2, 3, and 4: Not making sure that new projects have all their pages. Please, please check to be sure that all the proofing images are readable, that all the pages are present, and that high resolution versions of all illustrations have been uploaded. I cannot emphasize strongly enough how important this is!
5. Proofing images are too large. The proofing images are the page images that the proofers and formatters will use. We try to keep these down to around 50K each (and certainly under 100K) because many of our volunteers are still working on dial-up or usage-metered internet connections. Generally a 300 dpi B&W image that has been scaled to 1000 pixels wide will fall into that range. Some projects with small type, poor contrast, or other issues may require larger proofing images in order to be readable. That's OK when really required, but it shouldn't apply to a significant number of projects. On the other hand, don't make the proofing images so small that they are difficult to read!
6. Reasonable page margins: Some projects come in from new PMs with absolutely huge amounts of white space around the text block in the proofing image. This is not likely to happen if the project was scanned by hand, but does turn up occasionally in scans that were harvested from some of the large book-scanning projects. Don't worry about a bit of white space around the text block. Having some does make the page easier to read. But when that white space is as wide as or wider than the text block itself, that's a problem.
7. No inline markup! Please be sure that all inline markup has been removed before a new project is loaded. This is really more a CP issue, but since many PMs are also CPs I mention it here.
8. Please make sure that you keep your Good and Bad Word Lists up to date.
9. Please monitor the discussion threads for your projects and try to deal with issues right away. In particular, be sure to change project comments to reflect any changes to, or clarifications of, the Guidelines.
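Two of the checks above (all pages present, proofing images under about 100K) lend themselves to a quick script run before upload. The following is a minimal sketch, not an official DP tool: it assumes the proofing images are PNG files in one folder with purely numeric names such as 001.png, and the function names are hypothetical.

```python
from pathlib import Path

LIMIT_BYTES = 100 * 1024  # "certainly under 100K"; the target is around 50K


def oversized_images(folder, limit=LIMIT_BYTES):
    """Return (file name, size in bytes) for PNG files larger than the limit."""
    return sorted(
        (path.name, path.stat().st_size)
        for path in Path(folder).glob("*.png")
        if path.stat().st_size > limit
    )


def missing_page_numbers(folder):
    """Return gaps in the numeric file names (assumes names like 001.png)."""
    numbers = sorted(
        int(path.stem) for path in Path(folder).glob("*.png") if path.stem.isdigit()
    )
    if not numbers:
        return []
    present = set(numbers)
    return [n for n in range(numbers[0], numbers[-1] + 1) if n not in present]


if __name__ == "__main__":
    folder = "."  # hypothetical: the folder holding the proofing images
    for name, size in oversized_images(folder):
        print(f"{name}: {size // 1024}K")
    print("Missing page numbers:", missing_page_numbers(folder))
```

A gap check like this cannot notice pages missing from the very start or end of the book, and it says nothing about readability, so it supplements stepping through the images by eye; it does not replace it.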
If you know that you will be away from DP for more than a day or two while you have projects actively in the rounds, please drop a note to one of the PFs to ask that the PFs (as a group) look after your projects while you are away.
To upload your files to ~dpscans, use the project upload script (the first time you use it, it will create a dpscans folder for you): http://www.pgdp.net/noncvs/project_upload.php
Happy PMing!