User:Mairis/Workflow

Work in progress...

Project Setup

Software

Reference Folder

  • \clearance
  • \illo_jpg
  • \illo_tifs
  • \img_jp2
  • \img_png
  • \img_raw_tifs
  • \img_ref
  • \img_split
  • \img_st
  • \scantailor
  • \text
  • \textw
  • \upload
  • notes.txt
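
The reference folder can also be generated by script rather than by hand. The following is a minimal sketch: the subfolder names are copied from the list above, while the function name `create_reference_folder` is just an illustrative choice.

```python
from pathlib import Path

# Subfolder names copied from the reference folder list above.
SUBFOLDERS = [
    "clearance", "illo_jpg", "illo_tifs", "img_jp2", "img_png",
    "img_raw_tifs", "img_ref", "img_split", "img_st", "scantailor",
    "text", "textw", "upload",
]


def create_reference_folder(root):
    """Create the empty reference folder tree plus a blank notes.txt."""
    root = Path(root)
    for name in SUBFOLDERS:
        (root / name).mkdir(parents=True, exist_ok=True)
    (root / "notes.txt").touch(exist_ok=True)
    return root
```

Calling `create_reference_folder("reference")` would build the tree in the current directory; running it again leaves existing folders untouched.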

Preparation

Check Suitability

  1. Check that the work is in the public domain
    • USA: published in or before 1929
    • UK: author died in or before 1954
  2. Search the In Progress List
  3. Compare sources for best scans:
    • Colour preferred over b&w
    • Good quality scans
      • Clear, legible print
      • HQ illustrations (600dpi)
    • Complete
      • No missing pages
      • Includes all illustrations listed
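
The two date rules in step 1 can be written as a quick sanity check. This is only a sketch that mirrors the cutoff years given above (the constant and function names are made up for illustration); it does not replace the clearance request.

```python
# Cutoff years copied from the checklist above; update them
# whenever the checklist changes.
US_LATEST_PUBLICATION_YEAR = 1929
UK_LATEST_AUTHOR_DEATH_YEAR = 1954


def date_checks(publication_year, author_death_year=None):
    """Return which of the two date rules a work satisfies."""
    return {
        "usa": publication_year <= US_LATEST_PUBLICATION_YEAR,
        # An unknown death year is treated as failing the UK rule.
        "uk": (author_death_year is not None
               and author_death_year <= UK_LATEST_AUTHOR_DEATH_YEAR),
    }
```

For example, `date_checks(1925, 1960)` passes the USA rule but not the UK one.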

Create Project Folder

  1. Make a copy of reference folder
  2. Rename to book title
  3. Download images from source
    1. DL raw JP2 files (zip or tar)
    2. Extract to \img_jp2
    3. Delete blank pages before/after book content
  4. Prep the title page and verso (TP&V) images and save to \clearance
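
Steps 1 to 3 above (copy, rename, extract) could be scripted roughly as follows. This is a sketch under assumptions: `create_project_folder` and its parameters are hypothetical names, and the archive is assumed to be a zip or tar that Python's standard library can unpack.

```python
import shutil
from pathlib import Path


def create_project_folder(reference_dir, projects_dir, book_title, jp2_archive=None):
    """Copy the reference folder to a new folder named after the book.

    If an archive of raw JP2 scans is given, unpack it into img_jp2.
    """
    project = Path(projects_dir) / book_title
    # copytree fails if the project folder already exists, which
    # protects an existing project from being overwritten.
    shutil.copytree(reference_dir, project)
    if jp2_archive is not None:
        # unpack_archive picks zip or tar handling from the file extension.
        shutil.unpack_archive(str(jp2_archive), str(project / "img_jp2"))
    return project
```

If the archive wraps the scans in a subfolder of its own, that subfolder is preserved inside \img_jp2 and the files still need moving up a level. Deleting blank pages remains a manual step.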

Clearance Request

  1. Fill out a clearance request
    • Upload TP&V images
    • Missing information can be found at WorldCat or national libraries.
    • Include explanations and links to sources
  2. Submit and wait for approval
    • Usually takes a few days to be reviewed
    • Receive email with results

Image Prep

Keep notes.txt open while you work and record anything that might be useful.

Convert JP2 files to TIF

  1. Open jp2 image in XnView
  2. Browse (folder icon)
  3. Select all images
  4. Batch convert (icon)
  5. Select tif.xbs preset
  6. Convert
  7. Close XnView

Renumber TIF files

  1. Open \img_raw_tifs in Bulk Rename Utility
  2. Rename all pages prior to page 1 using "front matter" preset
    • Add > prefix > 000
    • Number > insert > at 3
    • Type > a-z
    • Remove > last 3
  3. Rename remaining pages using "renumber" preset
    • Numbering
      • Mode: prefix
      • Pad: 3
    • Remove > last 3
  4. Check last page; the page number and file number should match
  5. Close Bulk Rename Utility
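
For comparison, the same renumbering can be sketched in Python. The naming scheme here is an assumption inferred from the preset settings above: front-matter pages become 000a, 000b, and so on, and the remaining pages become 001, 002, and so on, so that file numbers match page numbers. The function names are illustrative only.

```python
import string
from pathlib import Path


def plan_renumbering(filenames, front_matter_count):
    """Return (old, new) name pairs for scans sorted into page order."""
    letters = string.ascii_lowercase
    if front_matter_count > len(letters):
        raise ValueError("this sketch handles at most 26 front-matter pages")
    plan = []
    for index, old in enumerate(sorted(filenames)):
        suffix = Path(old).suffix
        if index < front_matter_count:
            new = f"000{letters[index]}{suffix}"
        else:
            new = f"{index - front_matter_count + 1:03d}{suffix}"
        plan.append((old, new))
    return plan


def apply_renumbering(folder, plan):
    """Rename files in two passes so no file is overwritten midway."""
    folder = Path(folder)
    for old, _ in plan:
        (folder / old).rename(folder / f"tmp_{old}")
    for old, new in plan:
        (folder / f"tmp_{old}").rename(folder / new)
```

The plan is built separately from the rename so it can be inspected first, much like the preview column in Bulk Rename Utility. The check in step 4 still applies afterwards.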

Scan Tailor

  1. Open ST and create new project
    • Input: \img_raw_tifs
    • Output: \img_st
    • Select all images except cover
    • Set 600dpi for all pages
    • Save to \scantailor (save periodically as you work)
  2. Fix orientation
    1. Rotate first page the right way
    2. Apply to > every other page
    3. Do the same with the next page (opposite direction)
  3. Select content
    1. Press arrow button for auto
    2. Click ‘beep when finished’ and wait
    3. Arrange by height and check top and bottom images
      • Resize box as necessary until the pages are more or less the same size
      • ‘Remove content box’ from empty pages
    4. Repeat for arrange by width
    5. Scan image thumbnails and alter anything that doesn’t look right
  4. Margins
    1. Set all margins to 5.0 > apply to all pages
    2. Edit alignment of pages
      1. Order by height and focus on the top of the list
      2. Title page, dedication, etc. should be centred
      3. Set chapters that start midway down the page to bottom alignment
  5. Output
    1. Set output resolution to 600 > apply to all pages
    2. Black and white > apply to all pages
    3. Set despeckling to none > apply to all pages
    4. Select title page and press the arrow to apply to all pages
    5. Select ‘beep when finished’ and wait

Split Pages

  1. Index
    1. Start a new project in Scan Tailor
      1. Input > \img_raw_tifs
      2. Select pages that need to be split
      3. Output > \img_st_index
    2. Follow the same steps as above, except:
      1. Split the page in half > apply to all and adjust
      2. Include the header only in the first image; crop it out of all others
    3. Run output
    4. Rename files using Bulk Rename
      1. Add “r_” prefix
      2. Replace “_1L” with “a”
      3. Replace “_2R” with “b”
    5. Move the reference images from \img_st to \img_ref
    6. Copy the split files from \img_st_index to \img_st
    7. Open the first split file and the first reference file in Paint
      1. Copy the full title onto the split page and save
  2. Change image format
    1. Open XnView
    2. Tools > Batch convert
      1. Input:
        1. Add TIF files from \book\img_st  
      2. Output
        1. Output folder: \img_png
        2. Filename: {Filename}
        3. Format: PNG
      3. Click ‘convert’ and wait
    3. Batch convert again
      1. Actions
        1. Resize shortest side to 1000px (do this as a separate batch; if format and size are changed at the same time, the size comes out wrong)
    4. In the file explorer, sort the images and

Text Prep

gImageReader

  • Open \img_st
  • Select all TIF files > Recognise all English > Batch mode…
  • Leave options boxes unticked and click OK
  • When finished, close gImageReader

Guiprep

  1. Add all txt files from \img_st to \textw in the book folder
  2. Change directory > \book
  3. Process Text
    1. Leave ‘rename txt files’ unchecked
    2. Leave ‘convert to ISO 8859-1’ unchecked
    3. Leave ‘fix old English’ unchecked (unless necessary)
    4. Click ‘Start processing’ and wait.
  4. Check headers/footers and remove if necessary
  5. Close Guiprep

Python Script

  • Run post_guiprep.py (corrects quotes, removes problem characters, etc)
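
The script itself is not reproduced on this page. The following is only a guess at the kind of cleanup such a step might do, assuming that "corrects quotes" means straightening curly quotes and that "problem characters" means control characters. The name `clean_page_text` is hypothetical and is not taken from post_guiprep.py.

```python
import unicodedata

# Assumed mapping: curly quotes to their straight equivalents.
QUOTE_MAP = str.maketrans({
    "\u2018": "'", "\u2019": "'",
    "\u201c": '"', "\u201d": '"',
})


def clean_page_text(text):
    """Straighten curly quotes and drop control characters."""
    text = text.translate(QUOTE_MAP)
    # Keep newlines and tabs; remove every other character in the
    # Unicode control category (Cc), such as form feeds.
    return "".join(
        ch for ch in text
        if ch in "\n\t" or unicodedata.category(ch) != "Cc"
    )
```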

Illustration Prep

  1. MOVE the cover to \illo_tifs
  2. COPY the title page and other illos to the same folder
  3. Open Scan Tailor
  4. New project > open \illo_tifs
  5. Crop and rotate
  6. Save project to same folder

Change illustration format

  1. In XnView batch convert, select files from \book\illo_tifs\out
  2. Leave Actions unchecked
  3. Output
    1. Folder: \illo_jpg
    2. Filename: {Filename}
    3. Format: JPG
  4. Convert

Create ZIP

  1. Copy images, text, and illustrations (and ref images) into \book\book
  2. Zip this folder
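
A scripted version of these two steps might look like the sketch below. The choice of source folders (\img_png, \text, \illo_jpg, \img_ref) is an assumption about what "images, text, and illustrations (and ref images)" refers to, and `build_upload_zip` is an illustrative name.

```python
import shutil
from pathlib import Path


def build_upload_zip(book_dir, sources=("img_png", "text", "illo_jpg", "img_ref")):
    """Copy files from the source folders into \\book\\book and zip it."""
    book_dir = Path(book_dir)
    staging = book_dir / book_dir.name  # the \book\book folder
    staging.mkdir(exist_ok=True)
    for name in sources:
        folder = book_dir / name
        if not folder.is_dir():
            continue  # skip folders this project does not use
        for path in sorted(folder.iterdir()):
            if path.is_file():
                shutil.copy2(path, staging / path.name)
    # Writes <book>.zip next to the staging folder, with the files
    # at the top level of the archive.
    archive = shutil.make_archive(str(staging), "zip", root_dir=str(staging))
    return Path(archive)
```

Files are copied flat, so two files with the same name in different source folders would overwrite each other.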

Upload

Create Project

  1. Create a new project on DP
    1. PM tab > Create project
    2. Fill in the information (it should match the clearance request)
      • Include publishing date in title: [1873]
      • Author: Surname, First Name
      • Add extra character sets if necessary
      • Copy and paste standard format to project comments
      • Fill in information (links to author wiki, information about context of book, things to look out for)
    3. Add project comments for proofreaders
      • Are there other languages included? Do they have WordCheck dictionaries available?
      • Have extra character sets or special characters been added?
      • Are there characters without Unicode code points that need special handling?
      • Is there anything that might make the characters difficult to read? (fading, ink blots)
      • Any other useful info from notes.txt?
  2. Upload files to DP
    1. Upload the zip file to the file manager
    2. On the project page, enter the zip file location in the Add/Replace field
    3. Delete the files from the file manager
  3. Update the word lists
    1. On the project page, click ‘edit project words list’
    2. Review and add words to GWL
    3. If there are words the OCR has repeatedly read wrong:
      1. Open Notepad++
      2. Search > Find in files
      3. Fill in the find & replace
      4. Run in \text
      5. Repeat for all misread words
      6. Zip \text and reupload to replace current files
  4. Review project page
    1. Project Quick Check
    2. Check images (page numbers correct, illustrations include cover and title page)
  5. Add holds (P1 Waiting, F1 Waiting)
  6. Change state of project: New Project > P1: Unavailable > P1: Waiting
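
The Notepad++ find and replace in step 3 can also be done in one pass with a short script. This is a sketch, assuming the files in \text are UTF-8 and that only whole-word matches should be replaced; `replace_in_text_files` is a made-up name.

```python
import re
from pathlib import Path


def replace_in_text_files(text_dir, replacements, encoding="utf-8"):
    """Apply each wrong -> right pair to every .txt file in text_dir.

    Returns the names of the files that were changed.
    """
    changed = []
    for path in sorted(Path(text_dir).glob("*.txt")):
        original = path.read_text(encoding=encoding)
        updated = original
        for wrong, right in replacements.items():
            # \b limits the match to whole words, so "tbe" does not
            # touch words that merely contain those letters.
            pattern = re.compile(rf"\b{re.escape(wrong)}\b")
            updated = pattern.sub(lambda match, r=right: r, updated)
        if updated != original:
            path.write_text(updated, encoding=encoding)
            changed.append(path.name)
    return changed
```

For instance, `replace_in_text_files("text", {"tbe": "the"})` would fix that misreading across every page and report which files it touched. The folder still has to be zipped and reuploaded afterwards.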

Release the Project

  1. Remove the P1 Waiting hold
  2. Monitor project:
    • Review suggestions and keep the good/bad word lists up to date
    • Answer questions in the project thread
    • Search the concatenated text file for proofers’ notes
    • Add any salient info to the project comments
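
Searching the concatenated text for notes can be automated with a regular expression. The sketch below assumes notes use the usual `[**...]` form and sit on a single line; `find_proofer_notes` is an illustrative name.

```python
import re

# Matches a note such as [**typo for the?] up to its closing bracket.
NOTE_PATTERN = re.compile(r"\[\*\*[^\]]*\]")


def find_proofer_notes(text):
    """Return (line_number, note) pairs for every [**...] note."""
    results = []
    for number, line in enumerate(text.splitlines(), start=1):
        for match in NOTE_PATTERN.finditer(line):
            results.append((number, match.group(0)))
    return results
```

Notes that wrap across a line break would be missed by this version, so a manual search for `[**` is still a useful cross-check.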

Formatting

When the project reaches the F1 Waiting hold:

  1. Review notes.txt for anything relevant
  2. Look through the project for anything that may need special formatting instructions
  3. If there is anything unusual, message dp-format for advice
  4. Update the project comments with the formatting template and any special instructions
  5. Include any relevant info for the PPer
  6. Remove the F1 Waiting hold

Notes

  • I use P1 Waiting holds to:
    1. limit the number of projects available in P1 (PM limit is 13/round), leaving a couple of extra slots for special day projects or other projects I want to go straight into P1
    2. control the number of projects available in different languages
  • Some of the processes can be saved as templates in their respective programs to save time.

See also

Official Docs

User Guides