User:Mairis/Workflow

Work in progress...

Project Setup

Software

Reference Folder

\clearance
\illo_jpg
\illo_tifs
\img_jp2
\img_png
\img_raw_tifs
\img_ref
\img_split
\img_st
\scantailor
\text
\textw
\upload
notes.txt
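
The reference folder can also be generated rather than built by hand. A minimal sketch, assuming the folder names listed above; the function name is made up for illustration:

```python
from pathlib import Path

# Folder names copied from the reference folder listing.
REFERENCE_SUBFOLDERS = [
    "clearance", "illo_jpg", "illo_tifs", "img_jp2", "img_png",
    "img_raw_tifs", "img_ref", "img_split", "img_st", "scantailor",
    "text", "textw", "upload",
]


def create_reference_folder(root):
    """Create the empty reference folder tree and notes.txt under root."""
    root = Path(root)
    for name in REFERENCE_SUBFOLDERS:
        (root / name).mkdir(parents=True, exist_ok=True)
    # An empty notes file, ready to fill in during image prep.
    (root / "notes.txt").touch(exist_ok=True)
    return root
```

Running it a second time is harmless: existing folders and notes.txt are left as they are.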

Preparation

Check Suitability

Check the work is in the public domain
- USA published ≤ 1930
- UK author's death ≤ 1955
Search the In Progress List
Compare sources for best scans:
- Colour preferred over b&w
- Good quality scans
  - Clear, legible print
  - HQ illustrations (600dpi)
- Complete
  - No missing pages
  - Includes all illustrations listed

Create Project Folder

Make a copy of reference folder
Rename to book title
Download images from source
1. DL raw JP2 files (zip or tar)
2. Extract to \img_jp2
3. Delete blank pages before/after book content
Prep TP&V (title page and verso) and save to \clearance
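
The copy, rename and extract steps can be scripted. A rough sketch, assuming the download is a single zip or tar archive; the function and parameter names are hypothetical:

```python
import shutil
from pathlib import Path


def create_project_folder(reference_dir, book_title, archive_path=None):
    """Copy the reference folder to a sibling folder named after the book.

    If an archive of JP2 files is given, unpack it into img_jp2.
    """
    reference_dir = Path(reference_dir)
    project_dir = reference_dir.parent / book_title
    # copytree raises FileExistsError if the project folder already exists.
    shutil.copytree(reference_dir, project_dir)
    if archive_path is not None:
        # unpack_archive picks zip or tar handling from the file extension.
        shutil.unpack_archive(str(archive_path), str(project_dir / "img_jp2"))
    return project_dir
```

If the archive wraps the images in its own top-level folder, they will land one level below img_jp2 and need moving up. Blank pages still have to be deleted by eye.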

Clearance Request

Fill out a clearance request
- Upload TP&V images
- Missing information can be found at WorldCat or national libraries.
- Include explanations and links to sources
Submit and wait for approval
- Usually takes a few days to be reviewed
- Receive email with results

Image Prep

Keep notes.txt open while you work and record anything that might be useful.

Convert JP2 files to TIF

Open jp2 image in XnView
Browse (folder icon)
Select all images
Batch convert (icon)
Select tif.xbs preset
Convert
Close XnView

Renumber TIF files

Open \img_raw_tifs in Bulk Rename Utility
Rename all pages prior to page 1 using "front matter" preset
- Add > prefix > 000
- Number > insert > at 3
- Type > a-z
- Remove > last 3
Rename remaining pages using "renumber" preset
- Numbering
  - Mode: prefix
  - Pad: 3
- Remove > last 3
Check last page; the page number and file number should match
Close Bulk Rename Utility
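
The intended result of the two presets can be checked with a short script before renaming anything. This is a sketch only: it assumes front matter ends up as 000a, 000b, ... and the remaining pages as 001, 002, ..., which is my reading of the presets rather than their exact output. The function name is made up.

```python
import string


def plan_renumbering(filenames, front_matter_count, extension=".tif"):
    """Map scan filenames (in sorted order) to their new names.

    Assumed naming: the first front_matter_count files become 000a, 000b, ...;
    the rest become 001, 002, ... so the file number matches the page number.
    """
    letters = string.ascii_lowercase
    if front_matter_count > len(letters):
        raise ValueError("more than 26 front-matter pages; extend the letter scheme")
    plan = {}
    for index, name in enumerate(sorted(filenames)):
        if index < front_matter_count:
            new_stem = "000" + letters[index]
        else:
            new_stem = f"{index - front_matter_count + 1:03d}"
        plan[name] = new_stem + extension
    return plan
```

The last entry in the plan should carry the number printed on the last page. If it does not, a page is missing or an unnumbered plate has shifted the count.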

Scan Tailor

Open ST and create new project
- Input: \img_raw_tifs
- Output: \img_st
- Select all images except cover
- Set 600dpi for all pages
- Save to \scantailor (save periodically as you work)
Fix orientation
1. Rotate first page the right way
2. Apply to > every other page
3. Do the same with the next page (opposite direction)
Select content
1. Press arrow button for auto
2. Click ‘beep when finished’ and wait
3. Arrange by height and check top and bottom images
  - Resize box as necessary until the pages are more or less the same size
  - ‘Remove content box’ from empty pages
4. Repeat for arrange by width
5. Scan image thumbnails and alter anything that doesn’t look right
Margins
1. Set all margins to 5.0 > apply to all pages
2. Edit alignment of pages
  1. Order by height and focus on the top of the list
  2. Title page, dedication, etc. should be centred
  3. Set chapters that start midway down the page to bottom
Output
1. Set output resolution to 600 > apply to all pages
2. Black and white > apply to all pages
3. Set despeckling to none > apply to all pages
4. Select title page and press the arrow to apply to all pages
5. Select ‘beep when finished’ and wait

Split Pages

Index
1. Start a new project in Scan Tailor
  1. Input > \img_raw_tifs
  2. Select pages that need to be split
  3. Output > \img_st_index
2. Follow the same steps above, except:
  1. Split the page in half > apply to all and adjust
  2. Include the header only in the first image, crop to exclude in all others
3. Run output
4. Rename files using Bulk Rename
  1. Add “r_” prefix
  2. Replace “_1L” with “a”
  3. Replace “_2R” with “b”
5. Move the reference images from \img_st to \img_ref
6. Copy the split files from \img_st_index to \img_st
7. Open the first split file and the first reference file in Paint
  1. Copy the full title onto the split page and save
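
The three rename rules in step 4 can be written as one function. This sketch reads them literally, applying all three to each file name, which is an assumption about how the Bulk Rename steps are meant to combine; the function name is hypothetical.

```python
from pathlib import Path


def rename_split_file(filename, prefix="r_"):
    """Apply the prefix and the _1L/_2R replacements to one file name."""
    path = Path(filename)
    # Scan Tailor marks the left and right halves of a split page this way.
    stem = path.stem.replace("_1L", "a").replace("_2R", "b")
    return prefix + stem + path.suffix
```

For example, rename_split_file("215_1L.tif") gives "r_215a.tif". Pass prefix="" to apply only the two replacements.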
Change image format
1. Open XnView
2. Tools > Batch convert
  1. Input:
    1. Add TIF files from \book\img_st
  2. Output
    1. Output folder to \img_png
    2. Filename: {Filename}
    3. Format: PNG
  3. Click ‘convert’ and wait
3. Batch convert again
  1. Actions
    1. Resize shortest side to 1000px
      1. (if format and resize are changed at the same time, the size gets messed up)
4. In the file explorer, sort the images and
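
To check that a resize came out right, the expected dimensions can be worked out from the original ones. A small sketch of the arithmetic behind "shortest side to 1000px"; the function name is made up, and XnView may round differently by a pixel:

```python
def resize_to_shortest_side(width, height, target=1000):
    """Return (width, height) scaled so the shorter side equals target."""
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    scale = target / min(width, height)
    return round(width * scale), round(height * scale)
```

A 2000 x 3000 page should come out as 1000 x 1500. If the converted PNGs are far off the computed size, the format change and the resize were probably applied in one pass.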

Text Prep

gImageReader

Open \img_st
Select all TIF files > Recognise all English > Batch mode…
Leave options boxes unticked and click OK
When finished, close gImageReader

Guiprep

Add all txt files from \img_st to \textw in the book folder
Change directory > \book
Process Text
1. Leave ‘rename txt files’ unchecked
2. Leave ‘convert to ISO 8859-1’ unchecked
3. Leave ‘fix old English’ unchecked (unless necessary)
4. Click ‘Start processing’ and wait.
Check headers/footers and remove if necessary
Close Guiprep

Python Script

Run post_guiprep.py (corrects quotes, removes problem characters, etc)
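
post_guiprep.py itself is not reproduced here. As an illustration only, a cleanup of this kind might look like the sketch below. It assumes "corrects quotes" means turning curly quotes into straight ones and "problem characters" means stray control characters; both are guesses, and the names are hypothetical.

```python
import re

# Assumed mapping: curly quotes and apostrophes to their straight forms.
QUOTE_MAP = str.maketrans({
    "\u201c": '"',
    "\u201d": '"',
    "\u2018": "'",
    "\u2019": "'",
})

# Control characters other than tab, newline and carriage return.
CONTROL_CHARS = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f\x7f]")


def clean_ocr_text(text):
    """Straighten quotes and drop control characters from OCR text."""
    text = text.translate(QUOTE_MAP)
    return CONTROL_CHARS.sub("", text)
```

Applied to each file in \text, this leaves line breaks untouched, so page layout is preserved for proofreaders.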

Illustration Prep

MOVE cover to folder illo_tifs
COPY the title page and other illos to same folder
Open Scan Tailor
New project > open illo_tifs
Crop and rotate
Save project to same folder

Change illustration format

In XnView batch convert, select files from \book\illo_tifs\out
Actions unchecked
Output
1. Folder: \illo_jpg
2. Filename: {Filename}
3. Format: JPG
4. Convert

Create ZIP

Copy images, text, and illustrations (and ref images) into \book\book
Zip this folder
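
The zip step can be scripted with the standard library. A minimal sketch, assuming the archive should contain the folder itself and sit next to it; the function name is made up:

```python
import shutil
from pathlib import Path


def zip_upload_folder(folder):
    """Create folder.zip next to folder and return the archive path."""
    folder = Path(folder)
    archive = shutil.make_archive(
        str(folder),             # archive path without the .zip extension
        "zip",
        root_dir=folder.parent,  # paths in the archive are relative to this
        base_dir=folder.name,    # only this folder goes into the archive
    )
    return Path(archive)
```

An existing archive of the same name is overwritten, so rerunning after a fix is safe.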

Upload

Create Project

Create a new project on DP
1. PM tab > Create project
2. Fill in the information (it should match the clearance request)
  - Include publishing date in title: [1873]
  - Author: Surname, First Name
  - Add extra character sets if necessary
  - Copy and paste standard format to project comments
  - Fill in information (links to author wiki, information about context of book, things to look out for)
3. Add project comments for proofreaders
  - Are there other languages included? Do they have WordCheck dictionaries available?
  - Have extra character sets or special characters been added?
  - Are there characters without unicode that need special handling?
  - Is there anything that might make the characters difficult to read? (fading, ink blots)
  - Any other useful info from notes.txt?
Upload files to DP
1. Upload the zip folders to file manager
2. On project page, enter zip file location in the Add/Replace field
3. Delete files from file manager
Update the word lists
1. On project page, click ‘edit project words list’
2. Review and add words to GWL
3. If there are words the OCR has repeatedly read wrong:
  1. Open Notepad++
  2. Search > Find in files
  3. Fill in the find & replace
  4. Run in \text
  5. Repeat for all misread words
  6. Zip \text and reupload to replace current files
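
When there are many misread words, the Notepad++ find and replace can be done in one pass instead. A sketch, assuming the text files are UTF-8 and that whole-word matching is wanted; the function name is hypothetical:

```python
import re
from pathlib import Path


def replace_misread_words(text_dir, replacements, encoding="utf-8"):
    """Apply whole-word replacements to every .txt file in text_dir.

    replacements maps each misread word to its correction.
    Returns the names of the files that were changed.
    """
    patterns = [
        (re.compile(r"\b" + re.escape(wrong) + r"\b"), right)
        for wrong, right in replacements.items()
    ]
    changed = []
    for path in sorted(Path(text_dir).glob("*.txt")):
        original = path.read_text(encoding=encoding)
        updated = original
        for pattern, right in patterns:
            # A function replacement keeps any backslashes in right literal.
            updated = pattern.sub(lambda match, r=right: r, updated)
        if updated != original:
            path.write_text(updated, encoding=encoding)
            changed.append(path.name)
    return changed
```

For example, replace_misread_words("text", {"tlie": "the"}) fixes every standalone "tlie" but leaves "tlien" alone. The match is case-sensitive, so "Tlie" needs its own entry.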
Review project page
1. Project Quick Check
2. Check images (page numbers correct, illustrations include cover and title page)
Add holds (P1 Waiting, F1 Waiting)
Change state of project: New Project > P1: Unavailable > P1: Waiting

Release the Project

Remove the P1 Waiting hold
Monitor project:
- Review suggestions and keep good/bad words list up-to-date
- Answer questions in the project thread
- Search concatenated text file for proofer’s notes
- Add any salient info to the project comments
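
Searching the concatenated text for notes is easy to script. This sketch assumes notes use the [**note] form and that each note sits on a single line; the function name is made up:

```python
import re

# Assumed note format: [** followed by anything up to the closing bracket.
NOTE_PATTERN = re.compile(r"\[\*\*[^\]]*\]")


def find_proofers_notes(text):
    """Return (line_number, note) pairs for every [**...] note in text."""
    found = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        for match in NOTE_PATTERN.finditer(line):
            found.append((line_number, match.group(0)))
    return found
```

Notes that run over a line break are missed, so a quick manual search for "[**" is still worth doing on long projects.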

Formatting

When the project reaches the F1 waiting hold:

Review notes.txt for anything relevant
Look through the project for anything that may need special formatting instructions
If there is anything unusual, message dp-format for advice
Update the project comments with the formatting template and any special instructions
Include any relevant info for the PPer
Remove the F1 waiting hold

Notes

I use P1 Waiting holds to:
1. limit the number of projects available in P1 (PM limit is 13/round), leaving a couple of extra slots for special day projects or other projects I want to go straight into P1
2. control the number of projects available in different languages

Some of the processes can be saved as templates in their respective programs to save time.
