User:Eschaal/PPTools/Mac

From DPWiki

Introduction

This topic is about the post-processing tools that I use on my Mac, all of which are free and readily available. This topic is not meant to be all-inclusive, since there is always another way of doing things.

Life without Guiguts

Guiguts is a popular program for post-processing. Some would even say that it is almost essential to post-processing. Unfortunately, on the Mac, I have found it to be difficult to install and clunky to use.

Project Gutenberg does not require the use of Guiguts, but it does require the use of gutcheck, a spelling checker, and other tools "to confirm regularity of formatting, accurate spelling, proper line widths and headers, etc." Fortunately, you don't need Guiguts to use gutcheck. Instead you can use the online local gutcheck.

I used Guiguts in the past because it was useful and because there was so much information on how to use it for post processing. I gave up on it when I had problems trying to reinstall it after I upgraded to Lion. Now, I use a combination of AppleScripts, Automator, Bash scripts, and Perl scripts, as discussed below. That combination has worked well, but it is still a work-in-progress.

Automation

Mac OS X has a variety of tools for automation of repetitive work, and there is a lot of repetitive work in post processing.

Automator

Probably the easiest-to-use automation tool is Automator. This is drag-and-drop programming, where one drags actions from a list of possible actions into a workspace. The result can be saved as a workflow, a stand-alone application, or a service that is readily available throughout the operating system. The greatest strength of Automator is its ease of use, and its greatest weakness is that it is limited to the actions that have been created for it.

A nice feature of an Automator service is that it can be assigned a keyboard shortcut, and it can be set up so that the service only appears, and its keyboard shortcut only works, within a chosen application.

MakePDF Service

For instance, one service that I use is only available when I am in a Finder window. That service takes whatever folder is selected in the Finder and tries to make a PDF file of the images within that folder. I use that service to make a PDF of all the page images of a project, using a keyboard shortcut for it (command-8) that only works when I am in a Finder window. I explain how I actually use this service in post processing in the discussion below about viewing images.

This is what that service looks like as a workflow:

MakePDFWorkflow.png

For more information about this particular service, see the Make PDF of images Service.

AppleScripts

Automator is very easy to use, but it is limited to what actions are available. While the number of actions continues to increase, they are still somewhat limited. That is why it is sometimes better to use AppleScript, which is a regular scripting language. Although AppleScript is designed to be easier to use than other scripting languages, it is still programming.

Not every application running on the Mac is scriptable with AppleScript, but one application that is very scriptable is TextWrangler (a free and powerful text editor). TextWrangler even has a special folder for the scripts used with it: Library/Application Support/TextWrangler/Scripts. While the Library folder is normally hidden from view, it can be found by using the Finder's Go menu (while holding down the Option key).

Jeebies AppleScript

For instance, I have an AppleScript in TextWrangler that uses "regular expressions" to make special searches. Regular expressions are a very, very powerful type of search and replace. For instance, the search expression </?(i|f|sc|b)> will search for all instances of HTML tags for italics, bold, formatted, and small cap. Regular expressions can be your friend.
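The same tag search can be tried outside TextWrangler. The following is a minimal sketch, assuming the text file is passed as the first argument; the function name find_markup_tags is hypothetical, and grep stands in for TextWrangler's own search.

```bash
# Hypothetical helper: list every line (with its line number) that contains
# an opening or closing <i>, <f>, <sc>, or <b> tag.
find_markup_tags() {
  grep -n -E '</?(i|f|sc|b)>' "$1"
}
```

Running find_markup_tags book.txt prints each matching line prefixed by its line number. A tag such as <blockquote> is not reported, because the expression requires the closing > right after the tag name.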

One search I use is the Find All for \bhe\b|\bbe\b|\bis\b|\bas\b which searches for possible he/be and is/as errors. I prefer this method to using Jeebie (a checker for he/be errors) because I find moving through search results from TextWrangler easier than moving through Jeebie results.

A script that uses this regular expression is described in the Jeebie script, and is shown below.

JeebieScpt.png
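The he/be and is/as search can also be approximated from the command line. This is a sketch only, not the AppleScript pictured above: it assumes the text file is the first argument, the function name find_hebe is hypothetical, and grep's -w option (match whole words only) replaces the \b word boundaries of the original expression.

```bash
# Hypothetical helper: show numbered lines containing the whole words
# he, be, is, or as (lowercase only, as in the original search).
find_hebe() {
  grep -n -w -E 'he|be|is|as' "$1"
}
```

Words such as "there" or "this" are not reported, since -w only accepts matches that are complete words.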

Information on Regular Expressions

For more information about the use of regular expressions, check out:

Bash

One of the things that has made both Automator and AppleScripts more useful is that they have the ability to include bash scripts. Bash (Bourne Again Shell) is the free version of the Bourne shell distributed with Linux and GNU operating systems. It is similar to the original, but has added features such as command line editing. This is what one normally uses in the Terminal application to modify files.

Bash is most useful for commands at the UNIX level, to do file and folder management. I tend to use it primarily to create various compressed files, like the files sent to smooth reading or uploaded to Project Gutenberg.

"pp zip with images" AppleScript

For instance, one of my AppleScripts creates a zip compressed file of a text file, HTML file, and a folder of images. Within that script, after getting the short name of a project and storing it in the variable "filename" it does the following piece of code:

Zipclip.png

That code tells the Terminal application to do a bash script. For example, if the name of the project is "calloftown" (for the book "Call of the Town" by J. A. Hammerton), then the script does the following in the Terminal app: First, it changes the directory to "dp/pp3/calloftown/3-ppv/" and lists the files in that directory. Then it creates a zip file called "calloftown.zip" and adds to that zip file the folder "images" and its contents. Then it adds to that zip file the text file "calloftown.txt" and the HTML file "calloftown.html". Finally it lists the contents of the zip file.
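The steps just described can be sketched as plain shell commands. This is not the code from the screenshot: the function name make_pp_zip and its second argument (the project folder) are assumptions for illustration, while "filename" is the short project name mentioned above. It relies on the standard zip and unzip command-line tools.

```bash
# Hypothetical helper: build <filename>.zip from the images folder, the text
# file, and the HTML file found in the given project folder.
# The body runs in a subshell, so the caller's directory is left unchanged.
make_pp_zip() (
  filename=$1                 # short project name, e.g. calloftown
  projdir=$2                  # e.g. "$HOME/Desktop/dp/pp3/calloftown/3-ppv"
  cd "$projdir" || exit 1
  ls                                                     # list the files
  zip -r "$filename.zip" images                          # folder and contents
  zip "$filename.zip" "$filename.txt" "$filename.html"   # add text and HTML
  unzip -l "$filename.zip"                               # list the zip contents
)
```

For the example project, the call would be make_pp_zip calloftown "$HOME/Desktop/dp/pp3/calloftown/3-ppv".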

"removeDS" AppleScript

If there is a .DS_Store file listed in the contents of the zip file, I run another AppleScript that uses the "zip -d" command to remove that file, and then uses "unzip -l" to list the contents again:

  • cd Desktop/dp/pp3/calloftown/3-ppv/ - changes the directory to dp/pp3/calloftown/3-ppv/.
  • ls - lists the files in that directory.
  • zip -d calloftown.zip images/.DS_Store - removes the .DS_Store file from calloftown.zip.
  • unzip -l calloftown.zip - lists the contents of the revised zip file.
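The commands above could be wrapped in a small shell function. This is a sketch under assumptions: the name remove_ds_store is hypothetical, and instead of naming images/.DS_Store explicitly it uses the pattern '*.DS_Store' so that a .DS_Store entry in any folder of the archive is removed.

```bash
# Hypothetical helper: delete .DS_Store entries from a zip file, then list
# what remains. zip -d reports an error when nothing matches, so the
# archive listing is checked first.
remove_ds_store() (
  archive=$1
  if unzip -l "$archive" | grep -q '\.DS_Store'; then
    zip -d "$archive" '*.DS_Store'
  fi
  unzip -l "$archive"
)
```

For the example project, this would be run as remove_ds_store calloftown.zip from inside the 3-ppv folder.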

Using Bash in Automator Services

Both AppleScripts and Automator can use Bash. One advantage of using an Automator service instead of an AppleScript is that a service can be assigned a keyboard shortcut.

For instance, when in TextWrangler, one Automator service simply runs the following Bash script:

Smcap.jpg

The result is that it takes what is selected and puts the small cap html tags around it. I assigned it the keyboard shortcut command-caret so that I can quickly add small cap tags where needed.
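A script of this kind can be very short. The sketch below is not the one in the screenshot: it assumes that the service passes the selected text on standard input and replaces the selection with the output, and that the tags wanted are <sc> and </sc> (the small cap tag from the regular expression earlier). The function name wrap_smallcaps is hypothetical.

```bash
# Hypothetical helper: read the selected text from standard input and print
# it wrapped in small cap tags. The tag name defaults to "sc".
wrap_smallcaps() {
  tag=${1:-sc}
  selection=$(cat)
  printf '<%s>%s</%s>' "$tag" "$selection" "$tag"
}
```

For example, printf 'Chapter One' | wrap_smallcaps prints <sc>Chapter One</sc>.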

Perl

While Bash has its uses, a more robust programming language is Perl, which is also scriptable in both AppleScripts and Automator.

One example of a Perl program is the image program, which is used to adjust the page number markers, starting at a specified page number, and adding a specified value (plus or minus) to the page number. (code here)

Both AppleScripts and Automator can call Perl programs and embed Perl scripts within them.
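To show the idea of that adjustment, here is a rough sketch written with awk in a shell function rather than in Perl. It is not the linked program. It assumes markers of the form "-----Image: 001-----" on lines of their own, with three-digit numbers; the function name adjust_page_markers and its arguments are hypothetical.

```bash
# Hypothetical helper: read text on standard input and add an offset to every
# page marker whose number is at or above a starting number.
#   $1 = first page number to change
#   $2 = value to add (may be negative)
adjust_page_markers() {
  awk -v start="$1" -v delta="$2" '
    /^-----Image: [0-9]+-----$/ {
      n = $0
      sub(/^-----Image: /, "", n)   # strip the leading part of the marker
      sub(/-----$/, "", n)          # strip the trailing dashes
      n = n + 0                     # treat what is left as a number
      if (n >= start) n += delta
      printf "-----Image: %03d-----\n", n
      next
    }
    { print }                       # every other line passes through
  '
}
```

For example, adjust_page_markers 10 -2 < book.txt leaves markers below 010 alone and lowers every marker from 010 upward by two.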

An example is an Automator workflow that does the following:

  • Uses an action Ask for Finder Items that asks what finder item to select.
  • Uses an action Copy Finder Items that copies the selected finder item to ppvtest folder.
  • Uses an action Name Single Item that renames the copy to "book.txt".
  • Uses an action Run Shell Script that runs a bash script that changes the directory to the ppvtest folder, runs the program pptxt.pl, and removes the file "book.txt".
  • Uses an action Get Specified Finder Items that selects the file ppvtxt.log.
  • Uses an action Open Finder Items that opens that file.
  • Uses an action Move Finder Items to Trash that deletes that file.
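The copy, rename, run, and clean-up steps of this workflow can be sketched as a shell function. This is an illustration, not the actual Run Shell Script action: the function name run_text_check is hypothetical, and the checker command is passed in as arguments because how pptxt.pl is invoked is not stated here. The sketch assumes only what the workflow implies, namely that the checker reads "book.txt" in the current folder and writes its own log there.

```bash
# Hypothetical helper:
#   $1 = text file to check
#   $2 = working folder (ppvtest in the workflow above)
#   remaining arguments = the checker command to run inside that folder
run_text_check() {
  src=$1
  workdir=$2
  shift 2
  cp "$src" "$workdir/book.txt" || return 1   # copy and rename in one step
  ( cd "$workdir" && "$@" )                   # run the checker in the folder
  status=$?
  rm -f "$workdir/book.txt"                   # remove the temporary copy
  return $status
}
```

Under those assumptions, a call might look like run_text_check mybook.txt "$HOME/ppvtest" perl pptxt.pl, after which the log can be opened from the ppvtest folder.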

This workflow basically runs a program by Roger Frank that does a quality check on a text file being post processed.

Images

To deal with images, one needs something to view them and something to manipulate them.

Viewing images

I use Preview to view the images of scans. I used to open all the images of an image folder by using File > Open... and selecting the folder of the images. Later, I used an Automator service that opens the folder of the images. Now, I use the Automator MakePDF Service mentioned above, which makes a PDF file of the images in the folder, and I use that PDF to thumb through the images when doing post-processing.

One way I use this PDF file is to move it into iTunes and load it onto my iPad, so that I can see the page images in the iBooks app on my iPad while working with the text file in TextWrangler, with the text window as large as possible. This is more important when I work on the MacBook Air than on my iMac. The iMac is large enough to display both the image PDF file and the text file in large windows. A nice feature of iTunes is that you can sort PDF files into collections and categories, so you can keep your ebooks separate from your PDFs, and keep your product manual PDFs separate from the DP project page image PDFs.

Manipulating images

There are two image manipulation programs that I use in post processing.

  • Graphic Converter is shareware that has the strengths of being great for manipulating a variety of files at the same time, and it runs directly in Lion. This is the only application that I discuss here that is not free.
  • Gimp is free software that runs under X11 in Lion. While it isn't as Mac oriented as Graphic Converter, its strengths are its power and similarity to PhotoShop. I first moved to Gimp when I moved to an Intel Mac that couldn't run my version of PhotoShop. Using custom keyboard shortcuts makes it easy to handle a series of tasks, including cropping, scaling, level adjustments, retouching (including using masks), flattening, and converting the image type.

Usually I use the same workflow for images, so I assigned keystrokes shift-1 through shift-5 to common functions:

  • Open window (Keystroke: control-o)
  • Straighten the image (optional) (Keystroke: shift-r)
  • Convert the image to grayscale (optional) (Keystroke: shift-1)
  • Crop the image (Keystroke: shift-c)
  • Scale the image (Keystroke: shift-3)
  • Adjust the color levels (Keystroke: shift-4)
  • Clone to remove image flaws (Keystroke: c)
  • Flatten image (Keystroke: shift-5)
  • Save image (Keystroke: control-s)
  • Close window (Keystroke: control-w)

Usually, I use Gimp for image manipulation of individual images and I use Graphic Converter for those times when I have to make batch changes, like changing a folder of jpg images to png.

Once upon a time I used iPhoto for image manipulation, but the editing capability was limited. It is a great piece of software, but not for the type of photo manipulation needed for post processing.

Text Editing

TextWrangler has been a very popular editor on the Macintosh for many years. Bare Bones Software announced that it reached the end of its life in 2017 and will no longer be maintained. It will not even run on the most recent versions of macOS. However, Bare Bones Software's commercial product, BBEdit, will run in a free "reduced-feature" mode that is equivalent to the features of TextWrangler. There is no limit to how long you can use BBEdit's free version. A longer explanation is here.

BBEdit is actively being maintained and enhanced. It has a lot of features that make editing and processing books much less time-consuming. It is robust, it shows line numbers, it is very scriptable, its search and replace supports "regular expressions", common tasks can be automated by using TextFactories, and a whole lot more. It can be downloaded here.

Some of the AppleScripts that can be used within BBEdit perform the following functions (though these scripts will have to be modified slightly to work with BBEdit):

  • Initial cleanup of text file. (code here)
  • Search to check accented characters. (code here)
  • Search to check questionable hyphens. (code here)
  • Search to check for proofer comments. (code here)
  • Search to check HTML tags. (code here)
  • Search to check footnotes. (code here)
  • Search to check markups. (code here)
  • Search to check illustrations. (code here)
  • Search to check for he/be and as/is errors. (code here)
  • Initial formatting of text file. (code here)
  • Initial formatting of html file (variations).
  • Creation of Illustrations tags.
  • Creation of Footnotes.
  • Creation of Table of Contents (variations).
  • Creation of Tables.
  • Creation of Index.

In addition to AppleScripts within BBEdit, I use a variety of AppleScripts and Automator scripts to perform the following functions:

HTML Issues

Adjusting page numbers

There is some controversy about page number markers. Some suggest that not only are they not needed, but the presence of page numbers actually interferes with making ebook versions of the book, while others would argue that they are not only needed, but essential. There are even those who argue that not only should page numbers always be included, but they MUST have anchors so that a future user could use those anchors at a time uncertain. I take sort of a middle view. I will always include page numbers if the project manager demands it or if there is an index. If neither of those conditions is met, I will use page numbers if I feel that they would add value worth the time and effort of keeping them in.

Dealing with page numbers requires time and effort, even if you are using Guiguts, which supposedly automatically generates them. The problem is that it won't put in the right numbers unless you are careful to adjust the image numbers to correspond to the real page numbers. Even if you are really, really careful about adjusting the image numbers to correspond to the real page numbers there is a significant risk that Guiguts will place the page numbers in the wrong position. Thus, even if you use Guiguts (which I don't), if you use page numbers then you WILL have to check each page number manually to confirm that the page number is both right as to the number and as to the location.

I use a three step approach.

In step one, which occurs when I first deal with the project, I replace the page number lines with a simplified page number marker that strips the names of proofers from the page number line. Thus, the line:

 "-----File: 001.png---\nameofP1\nameofP12\nameofP3\nameofF1\nameofF2\--"

becomes

 "-----Image: 001-----"

To do this, I use a Grep search with a regular expression that uses as its search term:

 ^-----File: ([0-9]{3})(.*?)$

and uses as its replacement string:

 -----Image: \1-----
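For those who prefer the command line, step one can be approximated with sed. This is a sketch, not the TextWrangler search itself: the function name simplify_page_lines is hypothetical, and because sed's extended regular expressions have no lazy ".*?", a plain ".*" up to the end of the line is used instead, which gives the same result here.

```bash
# Hypothetical helper: read text on standard input and turn each
# "-----File: NNN.png---\proofers..." line into "-----Image: NNN-----".
simplify_page_lines() {
  sed -E 's/^-----File: ([0-9]{3}).*$/-----Image: \1-----/'
}
```

It would be used as simplify_page_lines < book.txt > book-step1.txt, where both file names are only examples.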

Step two, which I do early in the post-processing workflow, involves going through the text file manually to replace the number of the image with the actual page number, and moving the marker to the correct place. I usually do this in TextWrangler, by doing a Find All search for "-----Image:" and then scrolling through the search results and making the changes according to the page images.

Step three converts the page number markers to the necessary code, using a Grep search with a regular expression that uses as its search term:

 -----Image: (.*?)-----

and uses as its replacement string:

 <span class="pagenum">[Pg\&nbsp;\1]</span>
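Step three can be sketched with sed in the same way. The function name markers_to_html is hypothetical. Two details differ from the TextWrangler version: sed has no lazy ".*?", so the page number is matched as "anything that is not a hyphen", and the search allows more than one marker on a line. As in the replacement string above, the ampersand has to be escaped, because an unescaped & stands for the whole match.

```bash
# Hypothetical helper: read text on standard input and convert each
# "-----Image: N-----" marker into a pagenum span.
markers_to_html() {
  sed -E 's|-----Image: ([^-]*)-----|<span class="pagenum">[Pg\&nbsp;\1]</span>|g'
}
```

For example, the marker -----Image: 12----- becomes <span class="pagenum">[Pg&nbsp;12]</span>.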

After that, I review the html in a browser to see if the page numbers are correct. As I said, using page number markers requires time and effort to do right.

Creating Illustrations

Illustrations markup can be converted to html by use of regular expressions. The regular expressions used would depend upon the type of illustrations markup used and the type of illustration HTML that you want to create. For instance, the following is described in PPTools/Guiguts/HTML:

 <div class="figcenter/left/right" style="width: widthpx;">
 <img src="path-to-image" width="width" height="height"
  alt="Alt text" title="Title text" />
 <span class="caption">Alt text.</span>
 </div>

A different form is used for Punch periodicals, as described in Periodicals/Punch:

 <div class="figcenter" style="width:***WIDTH***%;">
   <a href="images/***.png"><img width="100%" src="images/***.png" alt="" /></a>
   <p>***CAPTION***</p>
 </div>

Other differences occur as to whether or not to include the width and height constraints. For instance, The_Proofreader's_Guide_to_EPUB recommends not using such width and height constraints.

For instance, a Grep search using such a template, with the search term

 ^\[Illustration: ([\d\D]+)]

and the replacement term:

 <div class="figcenter" style="width:***WIDTH***%;">\r
 <a href="images/***.png"><img width="100%" src="images/***.png" alt="" /></a>\r
 <p>\1</p>\r</div>

would work for Punch illustrations.
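The Punch conversion can also be sketched as a shell function built on awk. This is a simplified stand-in for the Grep search above: it only handles captions that sit on a single line in the form "[Illustration: caption]", and the function name punch_illustrations is hypothetical. The *** placeholders are left in the output to be filled in by hand, exactly as in the template.

```bash
# Hypothetical helper: read text on standard input and replace each
# one-line "[Illustration: caption]" with the Punch-style HTML template.
punch_illustrations() {
  awk '
    /^\[Illustration: .*\]$/ {
      # "[Illustration: " is 15 characters; drop it and the closing bracket
      caption = substr($0, 16, length($0) - 16)
      print "<div class=\"figcenter\" style=\"width:***WIDTH***%;\">"
      print "<a href=\"images/***.png\"><img width=\"100%\" src=\"images/***.png\" alt=\"\" /></a>"
      print "<p>" caption "</p>"
      print "</div>"
      next
    }
    { print }
  '
}
```

Captions that run over several lines would still need the multi-line search shown above, or would have to be joined onto one line first.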

Testing

For text files, I use:

  • Roger Frank's pptxt application, which does a series of checks on a Latin-1 text file. There is an online version, but I use an Automator app that works with a command-line version of pptxt.

For HTML files, I use:

  • For link checking, I use LinkChecker 7.0 for Mac OS X, which is free software that is available here.