User:Mairis/Workflow
Work in progress...
Project Setup
Software
- Scan Tailor
- gImageReader
- Guiprep
- Notepad++
- IrfanView
- XnView
- Hathi Download Helper
- Bulk Rename Utility
Reference Folder
- \clearance
- \illo_jpg
- \illo_tifs
- \img_jp2
- \img_png
- \img_raw_tifs
- \img_ref
- \img_split
- \img_st
- \scantailor
- \text
- \textw
- \upload
- notes.txt
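The reference folder could also be created by a short script instead of by hand. A minimal Python sketch, assuming the folder names are exactly those listed above; the function name `make_reference_folder` is hypothetical:

```python
from pathlib import Path

# Subfolders copied from the Reference Folder list above.
SUBFOLDERS = [
    "clearance", "illo_jpg", "illo_tifs", "img_jp2", "img_png",
    "img_raw_tifs", "img_ref", "img_split", "img_st",
    "scantailor", "text", "textw", "upload",
]


def make_reference_folder(root):
    """Create the empty folder skeleton plus notes.txt under `root`."""
    root = Path(root)
    for name in SUBFOLDERS:
        (root / name).mkdir(parents=True, exist_ok=True)
    # touch() does not overwrite the contents of an existing notes.txt.
    (root / "notes.txt").touch(exist_ok=True)
    return root
```

Calling `make_reference_folder("reference")` once gives a template that can then be copied for each new book.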
Preparation
Check Suitability
- Check the work is in the public domain
- USA: published in 1929 or earlier
- UK: author died in 1954 or earlier
- Search the In Progress List
- Compare sources for best scans:
- Colour preferred over b&w
- Good quality scans
- Clear, legible print
- HQ illustrations (600dpi)
- Complete
- No missing pages
- Includes all illustrations listed
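The two date rules in the public domain check can be written down as simple comparisons. A small sketch only: the years are the ones given above, the function names are hypothetical, and whether a work must pass one rule or both depends on where it will be posted, so the two checks are kept separate here.

```python
# Cut-off years copied from the checklist above. They appear to be
# rolling limits, so treat them as values to review, not constants.
US_LATEST_PUBLICATION_YEAR = 1929
UK_LATEST_DEATH_YEAR = 1954


def us_rule_met(publication_year):
    """True if the publication year is within the US cut-off."""
    return publication_year <= US_LATEST_PUBLICATION_YEAR


def uk_rule_met(author_death_year):
    """True if the author's year of death is within the UK cut-off."""
    return author_death_year <= UK_LATEST_DEATH_YEAR
```

For example, a book published in 1873 whose author died in 1901 passes both checks. This is not a substitute for the clearance request.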
Create Project Folder
- Make a copy of reference folder
- Rename to book title
- Download images from source
- DL raw JP2 files (zip or tar)
- Extract to \img_jp2
- Delete blank pages before/after book content
- Prep title page and verso (TP&V) images and save to \clearance
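The extraction step can be scripted with the Python standard library. A minimal sketch, assuming the download is a .zip or .tar archive and the project folder already contains \img_jp2; the function name `extract_scans` is hypothetical:

```python
import shutil
from pathlib import Path


def extract_scans(archive, project_dir):
    """Unpack a downloaded zip or tar archive into <project_dir>/img_jp2."""
    target = Path(project_dir) / "img_jp2"
    target.mkdir(parents=True, exist_ok=True)
    # unpack_archive picks the format from the file extension (.zip, .tar, ...).
    shutil.unpack_archive(str(archive), str(target))
    # Return the extracted JP2 files so blank pages can be reviewed by hand.
    return sorted(target.rglob("*.jp2"))
```

Deleting the blank pages before and after the book content is left as a manual step, since it needs a look at each image.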
Clearance Request
- Fill out a clearance request
- Upload TP&V images
- Missing information can be found at WorldCat or national libraries.
- Include explanations and links to sources
- Submit and wait for approval
- Usually takes a few days to be reviewed
- Receive email with results
Image Prep
Keep notes.txt open while you work and record anything that might be useful.
Convert JP2 files to TIF
- Open a JP2 image in XnView
- Browse (folder icon)
- Select all images
- Batch convert (icon)
- Select tif.xbs preset
- Convert
- Close XnView
Renumber TIF files
- Open \img_raw_tifs in Bulk Rename Utility
- Rename all pages prior to page 1 using "front matter" preset
- Add > prefix > 000
- Number > insert > at 3
- Type > a-z
- Remove > last 3
- Rename remaining pages using "renumber" preset
- Numbering
- Mode: prefix
- Pad: 3
- Remove > last 3
- Check last page; the page number and file number should match
- Close Bulk Rename Utility
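The renumbering scheme can also be expressed as a renaming plan. This sketch is one reading of the two presets, not a copy of them: it assumes front matter becomes 000a, 000b, ... and the remaining pages become 001, 002, ..., that the files sort into scan order by name, and that the extension is .tif. The function name `plan_renames` is hypothetical.

```python
import string


def plan_renames(filenames, front_matter_count):
    """Map scan-order filenames to 000a.tif, 000b.tif, ... then 001.tif, 002.tif, ..."""
    if front_matter_count > len(string.ascii_lowercase):
        raise ValueError("more than 26 front-matter pages; a-z is not enough")
    ordered = sorted(filenames)
    plan = {}
    for i, old in enumerate(ordered):
        if i < front_matter_count:
            # Front matter: 000 plus a letter, so it sorts before page 001.
            new = "000" + string.ascii_lowercase[i] + ".tif"
        else:
            # Body pages: three-digit number starting at 001.
            new = "{:03d}.tif".format(i - front_matter_count + 1)
        plan[old] = new
    return plan
```

The plan only computes names. Checking that the last entry matches the last printed page number, as in the final step above, is still worth doing before renaming anything on disk.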
Scan Tailor
- Open ST and create new project
- Input: \img_raw_tifs
- Output: \img_st
- Select all images except cover
- Set 600dpi for all pages
- Save to \scantailor (save periodically as you work)
- Fix orientation
- Rotate first page the right way
- Apply to > every other page
- Do the same with the next page (opposite direction)
- Select content
- Press arrow button for auto
- Click ‘beep when finished’ and wait
- Arrange by height and check top and bottom images
- Resize box as necessary until the pages are more or less the same size
- ‘Remove content box’ from empty pages
- Repeat for arrange by width
- Scan image thumbnails and alter anything that doesn’t look right
- Margins
- Set all margins to 5.0 > apply to all pages
- Edit alignment of pages
- Order by height and focus on the top of the list
- Title page, dedication, etc should be centred
- Chapters that start midway down the page set to bottom
- Output
- Set output resolution to 600 > apply to all pages
- Black and white > apply to all pages
- Set despeckling to none > apply to all pages
- Select title page and press the arrow to apply to all pages
- Select ‘beep when finished’ and wait
Split Pages
- Index
- Start a new project in Scan Tailor
- Input > \img_raw_tifs
- Select pages that need to be split
- Output > img_st_index
- Follow the same steps above, except:
- Split the page in half > apply to all and adjust
- Include the header only in the first image, crop to exclude in all others
- Run output
- Rename files using Bulk Rename
- Add “r_” prefix
- Replace “_1L” with “a”
- Replace “_2R” with “b”
- Move the reference images from \img_st to \img_ref
- Copy the split files from img_st_index to img_st
- Open the first split file and the first reference file in Paint
- Copy the full title onto the split page and save
- Start a new project in Scan Tailor
- Change image format
- Open XnView
- Tools > Batch convert
- Input:
- Add TIF files from \book\img_st
- Output
- Output folder: \img_png
- Filename: {Filename}
- Format: PNG
- Click ‘convert’ and wait
- Batch convert again
- Actions
- Resize shortest side to 1000px
- (if format and resize are changed at the same time, the size gets messed up)
- In the file explorer, sort the images and
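The Bulk Rename steps in the Index part of this section (the "r_" prefix and the "_1L" and "_2R" replacements) amount to simple string operations. A sketch, assuming the prefix goes on the unsplit reference images and the replacements go on the split files; both function names are hypothetical:

```python
def split_name(filename):
    """Swap the _1L and _2R suffixes for a and b, e.g. 210_1L.tif -> 210a.tif."""
    return filename.replace("_1L", "a").replace("_2R", "b")


def reference_name(filename):
    """Prefix an unsplit reference image with r_, e.g. 210.tif -> r_210.tif."""
    return "r_" + filename
```

With these, the two halves of page 210 sort next to each other as 210a and 210b, and the reference copy stays recognisable as r_210.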
Text Prep
gImageReader
- Open \img_st
- Select all TIF files > Recognise all English > Batch mode…
- Leave options boxes unticked and click OK
- When finished, close gImageReader
Guiprep
- Add all txt files from \img_st to \textw in the book folder
- Change directory > \book
- Process Text
- Leave ‘rename txt files’ unchecked
- Leave ‘convert to ISO 8859-1’ unchecked
- Leave ‘fix old English’ unchecked (unless necessary)
- Click ‘Start processing’ and wait.
- Check headers/footers and remove if necessary
- Close Guiprep
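The first Guiprep step, getting the OCR text files from \img_st into \textw, can be done with a few lines of Python. A minimal sketch, assuming the files are copied rather than moved so the originals stay in place; the function name `copy_ocr_text` is hypothetical:

```python
import shutil
from pathlib import Path


def copy_ocr_text(book_dir):
    """Copy every .txt file from img_st into textw; returns the copied names."""
    book_dir = Path(book_dir)
    dest = book_dir / "textw"
    dest.mkdir(exist_ok=True)
    copied = []
    for txt in sorted((book_dir / "img_st").glob("*.txt")):
        # copy2 keeps the file timestamps along with the contents.
        shutil.copy2(txt, dest / txt.name)
        copied.append(txt.name)
    return copied
```

The returned list should have one entry per page image, which is a quick way to spot pages that were missed by the OCR run.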
Notepad++
- Use Notepad++ to remove any tabs in the text files (this step can possibly be skipped)
- Search > Find in Files
- Find what: \t
- Replace with: (one space)
- Directory: \text
- Search mode: Extended
- Click "Replace in Files" and OK when the window pops up "Are you sure?"
- Close Notepad++
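The same tab replacement can be run without Notepad++. A minimal sketch, assuming the text files are UTF-8 or another encoding where a tab is a single byte; the function name `replace_tabs` is hypothetical:

```python
from pathlib import Path


def replace_tabs(text_dir):
    """Replace each tab with one space in every .txt file; returns files changed."""
    changed = []
    for path in sorted(Path(text_dir).glob("*.txt")):
        data = path.read_bytes()
        if b"\t" in data:
            # Working on bytes leaves the encoding and line endings as they were.
            # Assumption: not UTF-16 or similar, where a tab is not one byte.
            path.write_bytes(data.replace(b"\t", b" "))
            changed.append(path.name)
    return changed
```

Files without tabs are not rewritten, so the returned list shows exactly which pages were touched.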
Illustration Prep
- MOVE cover to folder illo_tifs
- COPY the title page and other illos to same folder
- Open Scan Tailor
- New project > open illo_tifs
- Crop and rotate
- Save project to same folder
Change illustration format
- Select files from \book\illo_tifs\out
- Actions unchecked
- Output
- Folder: \illo_jpg
- Filename: {Filename}
- Format: JPG
- Convert
Create ZIP
- Copy images, text, and illustrations (and ref images) into \book\book
- Zip this folder
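The copy-and-zip step can be sketched in Python. This version differs from the steps above in that it writes the files straight into a flat zip instead of copying them into \book\book first. The default folder names (img_png, text, illo_jpg) are an assumption about which folders hold the finished images, text, and illustrations, and the function name `build_upload_zip` is hypothetical.

```python
import zipfile
from pathlib import Path


def build_upload_zip(book_dir, zip_path, folders=("img_png", "text", "illo_jpg")):
    """Write every file from the given subfolders into one flat zip archive."""
    book_dir = Path(book_dir)
    names = []
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for folder in folders:
            for path in sorted((book_dir / folder).iterdir()):
                if path.is_file():
                    # arcname drops the folder so all files sit at the zip root.
                    archive.write(path, arcname=path.name)
                    names.append(path.name)
    return names
```

To include the reference images as well, pass `folders=("img_png", "text", "illo_jpg", "img_ref")`. File names must be unique across the folders, otherwise the archive ends up with duplicate entries.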
Upload
Create Project
- Create a new project on DP
- PM tab > Create project
- Fill in the information (it should match the clearance request)
- Include publishing date in title: [1873]
- Author: Surname, First Name
- Add extra character sets if necessary
- Copy and paste standard format to project comments
- Fill in information (links to author wiki, information about context of book, things to look out for)
- Add project comments for proofreaders
- Are there other languages included? Do they have WordCheck dictionaries available?
- Have extra character sets or special characters been added?
- Are there characters without Unicode that need special handling?
- Is there anything that might make the characters difficult to read? (fading, ink blots)
- Upload files to DP
- Upload the zip folders to file manager
- On project page, enter zip file location in the Add/Replace field
- Delete files from file manager
- Update the word lists
- On project page, click ‘edit project words list’
- Review project page
- Project Quick Check
- Check images
- Message dp-format for advice on what to include in the notes for formatters
- Change state of project from “New Project” to “Proofreading Round 1: Unavailable”
Release the Project
- When dp-format provides feedback:
- If there are no notes, proceed to release the project
- If there are notes:
- Write the notes for formatters and save in a message draft
- Add a hold in F1 Waiting
- Release into P1
- Change project status to “Proofreading Round 1: Waiting for Release”
- Monitor project
- Review suggestions and keep good/bad words list up-to-date
- Answer questions in the project thread
- Search concatenated text file for proofers’ notes
- Add any salient info to the project comments
- When the project reaches F1 Waiting:
- Add the formatting notes to the project comments
- Remove the hold
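Searching the concatenated text file for proofers' notes, as in the monitoring steps above, can be automated. A minimal sketch, assuming the notes follow the `[**note text]` convention and contain no nested brackets; the function name `find_proofer_notes` is hypothetical:

```python
import re

# Matches notes written as [**some text]; assumed not to contain a closing bracket.
NOTE_PATTERN = re.compile(r"\[\*\*[^\]]*\]")


def find_proofer_notes(text):
    """Return (line_number, note) pairs for every [**...] note in the text."""
    found = []
    for number, line in enumerate(text.splitlines(), start=1):
        for match in NOTE_PATTERN.finditer(line):
            found.append((number, match.group(0)))
    return found
```

Passing in the contents of the downloaded concatenated text gives a list that can be skimmed for anything worth adding to the project comments.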