File and Project Management
Some of the File menu's options are the standard ones found in most programs; others, particularly the ones on the Project sub-menu, provide capabilities tailored to the way DP and post-processing operate.
Overview of Guiguts File Management
The core unit of proofreading, formatting, and many aspects of post-processing is the page in the original book: its image and the editable text in that image. When using Guiguts, you see the text in Guiguts' main window, and can see the corresponding image in the external image viewer of your choice.
Guiguts is a Plain Text editor. All of its files are encoded as UTF-8 and each line ends with the two-character Carriage-Return/Line-Feed (CRLF) sequence used by DP and Project Gutenberg.
The .bin File
Guiguts keeps the editable text and the matching page image synchronized through the use of an extra file it creates and maintains in the same folder as the text file you are editing. It's called the .bin file; its name is the same as the text file's name, including the filetype .txt or .html, but with an additional filetype of .bin. For example, if you've just opened a newly-downloaded text file named projectID5c4e178999ae5.txt and you save it (without changing its name), Guiguts will create and save a file named projectID5c4e178999ae5.txt.bin. The .bin file does not contain the text; it contains information about where each page begins in the text (at the -----Separator lines-----), and some other information that was unique to this particular project when the file was saved. (Despite the implications of the filetype, this is a Plain Text file.)
Each time you save the text file, with whatever name you choose, Guiguts will save a corresponding .bin file. Even after you've removed the Page Separator lines, the .bin file will let Guiguts keep track of the page boundaries and the page images that correspond to the text. The .bin file contains other project-specific information as well, e.g., where the cursor was when the file was saved, and the names of the proofreaders (if that information was available).
Some file management systems, such as the Wiki, maintain a history of all changes since a document's creation. Guiguts does not do that. It can keep a few backups of each file you're working on, but eventually, if you just use simple "Saves" that retain the same file name, Guiguts will delete the oldest copy.
The nature of post-processing is such that you don't necessarily develop in a straight line: sometimes, you'll want to go back several steps to try something different, or to retrieve an earlier version of a portion of the text. So, it's good practice to use "Save As" frequently, specifying a new file name that meaningfully describes the file's state.
Also, you will reach a point in post-processing beyond which the Plain Text and HTML versions must diverge. You can use "Save a Copy As" to create what will become the HTML version, and then keep working with what will become the Plain Text, using the same name as before, or a name you choose when using the normal "Save As".
Opening a File
If Guiguts already is running, select File>Open or click the Open icon in the toolbar to bring up a file-open dialog. Use it to navigate to the desired file. Guiguts loads the file you choose as the current document and displays it in the window.
Guiguts also searches for a matching .bin file and opens it to obtain page-boundary and other information about the file. Guiguts also scans the text for page-separator lines, unless this action has been disabled. See below for managing page separators and the .bin file.
Guiguts can read files with various kinds of line endings, but always saves files using DP-style line endings: CRLF. It does this on all platforms, including Unix, Linux, Mac, and Windows.
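As a rough illustration of that save behaviour, the following Python sketch converts any mix of line endings to CRLF and encodes the result as UTF-8. It is not Guiguts' own code, and the function name to_dp_bytes is hypothetical.

```python
def to_dp_bytes(text: str) -> bytes:
    """Return text as UTF-8 bytes with every line ending converted to CRLF."""
    # Collapse CRLF and bare CR to LF first, so that no ending is doubled
    # when LF is expanded to CRLF in the next step.
    unified = text.replace("\r\n", "\n").replace("\r", "\n")
    return unified.replace("\n", "\r\n").encode("utf-8")
```

Normalizing to a single ending before expanding it is what keeps a file that already uses CRLF from ending up with CR-CR-LF sequences.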
Recent Files List
Once a file has been opened, its name is pushed onto the top of the list of recently-opened files below the Open command in the File menu, where it is convenient for opening another time. Once the list grows to a certain size (9, by default), subsequent opens and saves will cause the oldest entry in the list to be dropped (only from the list).
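The behaviour of the list can be pictured with a small Python sketch. This is only an illustration under stated assumptions: the function name push_recent is hypothetical, and the manual does not say how Guiguts treats a file that is already in the list, so the sketch simply moves it to the top.

```python
def push_recent(recent: list[str], path: str, limit: int = 9) -> list[str]:
    """Return a new recent-files list with path on top, trimmed to limit entries."""
    # Assumption: a file already in the list is moved to the top
    # rather than being listed twice.
    updated = [path] + [entry for entry in recent if entry != path]
    # Entries beyond the limit fall off the end of the list;
    # the files themselves are untouched.
    return updated[:limit]
```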
Saving a File
To save the file in a different folder or under a different name, use File>Save As. A file-save dialog opens. Navigate to the desired folder and enter the filename to use. Guiguts also saves a .bin file in that folder.
To save under a different name and continue editing under the current name, use Save a Copy As.
Backup Files Automatically
You can enable and disable periodic auto-save using Toggle Auto Save on the Preferences>Backup sub-menu, or by shift-right-clicking the Save icon in the toolbar. The icon's background will be green when Auto Save is enabled. Use Auto Save Interval on the same sub-menu to set the period. Each time that period elapses, Guiguts saves the file. This ensures that you can't lose more than a few minutes of work to a crash.
Guiguts preserves up to three of the most recent versions of the current file (with the same name): the file itself and two backups. When Auto Backups is enabled, and you save the file mybook.txt which already exists on disk, the existing file is renamed to mybook.txt.bk1. If mybook.txt.bk1 already exists, it is renamed to mybook.txt.bk2, and if mybook.txt.bk2 already exists, it is deleted. Manual saves operate the same way.
The autosave timer is automatically reset when a save is done for any other reason. For a few seconds before an autosave is to be done, the Save icon in the toolbar flashes yellow. Right-click on it to reset the autosave interval, thus preventing the impending save.
Caution: when you enable autosave and auto backups together, each autosave action creates a new generation of backup file. Thus the two backup levels (.bk1 and .bk2) will soon contain the last two autosaved versions.
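The backup rotation described above can be sketched in Python as follows. This is a minimal illustration of the renaming order, not Guiguts' actual implementation; the function names rotate_backups and save_with_backups are hypothetical.

```python
import os


def rotate_backups(path: str) -> None:
    """Shift the copy of path on disk to .bk1, and any old .bk1 to .bk2."""
    if not os.path.exists(path):
        return  # first save: there is nothing to back up yet
    bk1, bk2 = path + ".bk1", path + ".bk2"
    if os.path.exists(bk1):
        # os.replace overwrites an existing .bk2, discarding the oldest copy.
        os.replace(bk1, bk2)
    os.replace(path, bk1)


def save_with_backups(path: str, text: str) -> None:
    """Rotate the backups, then write the new contents under the original name."""
    rotate_backups(path)
    with open(path, "w", encoding="utf-8", newline="") as handle:
        handle.write(text)
```

After four saves of mybook.txt, the folder holds the fourth version under the original name, the third as mybook.txt.bk1, and the second as mybook.txt.bk2. The first is gone, which is why frequent autosaves push older work out of the backups quickly.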
Reverting to Saved
Unlike some editors, Guiguts does not have a "Revert to Saved" command in the File menu. However, you can revert to the last-saved version of the file by selecting the file's name in the recent-files list in the File menu, which re-opens it. If you have made changes, you are prompted to save the file first; just click the "No" button to reload the file, cancelling all changes since the save.
Including (inserting) a File
Select File>Include to insert the entire contents of another file into the current document. A file-open dialog appears. Navigate to the desired file and click Open. The contents of the selected file are inserted at the current insertion point location. If there is a selection, the file data replaces it.
The insertion can be undone with Edit>Undo, by clicking the Undo icon on the toolbar, or by the standard keyboard shortcut, ctrl-z.
The Project Sub-Menu
See Image
Like the button of the same name on the Status bar, this opens the image of the current page in the external image viewer you've previously selected.
View Operations History
If you've selected Preferences>Processing>Track Operations History, this "View" option will display a log of the tools you've used since then. Tracking is not selected by default, so this "View" option is "(disabled)" by default.
View Project Comments
In a Browser window, this opens the Project Comments file that was included in the download materials for this project. Those are the same comments you will find on the Project Page at DP.
View Project Page
In a Browser window, this opens the Project Page at DP.
View Project Discussion
In a Browser window, this opens the Project Discussion at DP.
Set Project Language
Like the Lang button on the Status bar, this shows the current language (English in this example) and lets you change it.
Set Project ID
This displays the current project ID and lets you change it. Normally, you will not want to do so, unless there isn't one, in which case, use this option to enter it manually. Without the correct project ID, some other options won't be able to show you information that must be obtained from the Project Page at the DP website.
Set Image Directory
The image directory contains the page images for your project. By default, it is the pngs sub-folder in the folder containing the text file you are editing. This option lets you select a different folder.
Page Markers and Page Labels
The first three options in the bottom section of the Project sub-menu are for problem-recovery situations that rarely if ever arise. They probably were more necessary in Guiguts' earlier days.
You will want to use the fourth option, Configure Page Labels, with just about every project, and the best time to do so is as soon as you start editing the project, even before going through the Proofer's Notes. Right-clicking Lbl: None on the Status bar is the quickest way to access this dialog.
Display/Adjust Page Markers
This is used primarily when [some of] the page markers go missing or move to the wrong place; normally, you will not need to use it. It routinely appears when you use the < or > buttons on the Status bar to flip through consecutive pages; unless there's a problem, just ignore it.
After you remove the page separators, the page-image boundaries exist only as pointers in the .bin file associated with the .txt or .html file you see on the screen. (Each pointer is the line number and character position within that line at which a given page begins.) Guiguts calls them markers, and if any of them become incorrect, you can try to correct them by using the Adjust Page Markers dialog to see and move them. To open this dialog, right-click (Mac: control-click) in the Img: field of the status bar.
As the dialog opens, the invisible page markers are revealed as bright-yellow insertions in the text. To close the dialog and hide the markers, right-click in the Img: field again.
The number of the page marker next after the insertion point is shown. Use the Previous Marker and Next Marker buttons to step through the page markers in sequence. To jump to a particular marker, edit the yellow field at the top to show a page number one higher or lower than the marker you want. Then click Previous Marker or Next Marker. The document scrolls to center the line containing the marker you want.
Managing Page Markers
The page markers represent the points where Guiguts will insert visible or invisible page anchors when you generate the HTML for the book. Normally you would not want to move, delete, or add to these markers because doing so makes the "pagination" of the HTML version different from that of the original book.
One possible reason to insert a marker: sometimes the people who scan a book scan the blank pages and sometimes they do not; and sometimes they scan some blank pages and not others. You may want to insert a marker to account for an unscanned page, so that the number of markers agrees exactly with the number of surfaces in the actual book.
You might receive a book that has no page markers. (Perhaps the book has been partly processed, and the .bin file, where the marker information is kept, has been lost.) In that case you could install page markers. The command File>Project>Guess Page Markers will create markers evenly spaced through the document. Then you can use this dialog to insert, delete, and move the markers to match the page images.
To move the current marker in the text, click the four move buttons. The marker slides one character left or right, or one line up or down.
To add a new page marker at the end of the book, set the insertion point in the text somewhere after the last existing marker. Then click Add. To insert a new page marker between existing ones, set the insertion point where the marker is to go, and click Insert. A new marker is inserted and following markers are incremented by 1 if necessary to prevent duplicates. (See next paragraph before using these buttons.)
WARNING: The "Add," "Insert," and "Remove" buttons also delete ALL Page Labels, not just the one associated with the page marker being added, inserted, or deleted. If you are using Page Labels and you save your file after clicking any of these buttons, there will be no Page Labels. You can bypass this by opening "Configure Page Labels" before using these buttons and clicking "Use these values" (in "Configure Page Labels") afterwards. Otherwise, you may be able to recover from this by replacing the damaged .bin file with the most recent undamaged one (do that when Guiguts is not running).
To remove a marker, use the dialog to navigate to the marker you want to remove, and click Remove. The marker is taken out of the book, all Page Labels are deleted (see previous paragraph), but no other changes are made.
Clicking Insert Page Markers inserts a text string of the form [Pg 003] (with leading zeros if appropriate) into the text following every (invisible) page marker. These markers correspond to the .png names of the page images of the original book. You can apply a regex replacement to give them a different appearance. Note that these numbers normally do not match the page numbers printed in the original book and are unlikely to be useful to people reading any version of the final eBook.
By contrast, clicking Insert Page Labels inserts a text string of the form [Pg 3] (without leading zeros) into the text following every (invisible) page marker to which you have previously assigned a page label. These markers correspond to those page labels, which should match the page numbers of the original book. You can apply a regex replacement to style these as you want, for example as visible page numbers in the Plain Text version. These numbers may be useful to people reading the Plain Text version of the final eBook, particularly when there is an index and/or internal references to other pages in the text. This is not intended for use with the HTML version, which is able to use the "pagenum" class and id="Page_nnn" created during HTML Auto Generation.
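To illustrate the kind of regex replacement meant here, the Python sketch below rewrites tags such as [Pg 003] as [Page 3]. The target style and the function name restyle_page_tags are invented for the example, and the pattern is written in Python's regex syntax. Inside Guiguts you would use its own search-and-replace dialog instead, so treat this as an illustration of the idea only.

```python
import re

# Matches [Pg 003] or [Pg 3]; leading zeros are left outside the captured group.
PAGE_TAG = re.compile(r"\[Pg 0*(\d+)\]")


def restyle_page_tags(text: str) -> str:
    """Rewrite numeric tags such as [Pg 003] as [Page 3]."""
    return PAGE_TAG.sub(r"[Page \1]", text)
```

Tags whose label is not purely numeric, such as a Roman-numeral page label, do not match the pattern and are left unchanged.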
The Page Offset field and Renumber button can be used to renumber the page markers from the current point onwards, for example to make a gap in the numbering in which to insert some page markers, or to close a gap by using a negative offset.
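The effect of an offset can be sketched as simple arithmetic on the marker numbers. In this hypothetical Python example the markers are reduced to a plain list of numbers, and the duplicate check is an assumption added for safety; the manual does not say how Guiguts itself reacts to a clash.

```python
def renumber_from(numbers: list[int], start_index: int, offset: int) -> list[int]:
    """Add offset to every page-marker number from start_index onwards."""
    result = numbers[:start_index] + [n + offset for n in numbers[start_index:]]
    # Assumption: refuse an offset that would give two markers the same number.
    if len(set(result)) != len(result):
        raise ValueError("offset would create duplicate marker numbers")
    return result
```

For example, applying an offset of 2 at the third marker turns 1, 2, 3, 4 into 1, 2, 5, 6, leaving room for two new markers; an offset of -2 at the same point closes the gap again.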
Page Numbers vs. Folios
The scanned pages we receive usually, but not always, are numbered sequentially from 1 and we casually refer to these as "page numbers." Confusion arises because these numbers often are not the same as the numbers printed on the pages of the book. The technical term for the number that is printed on a book page is folio. If we use these terms separately, we can avoid some confusion:
- page number: the sequence number of a scanned page image.
- folio: the number printed on the page of a book.
The distinction is important because the numbers in a table of contents, an index, or a cross-reference ("see p.196") are folios—not page numbers.
Page numbers are a simple count of every scanned surface. Folios are assigned using complex and not terribly consistent rules that reflect the logistics of printing technology. For example, the "front matter" (comprising title pages, preface, contents, etc.) was usually the last part of the book finished, after the body had been set up in type. That meant body folios had to be assigned before the quantity of front-matter pages was known. Sometimes the front matter was left un-folio'd; sometimes it got its own series of folios, often lowercase roman numerals; then the body started over again with folio "1." When one or more glossy "plates" were inserted in the book, they might be numbered sequentially with the pages, or they might be referenced as "facing page nn." If a plate had a blank reverse, the blank face might count in the folio sequence or not—and it might have been scanned, giving it a page number, or not.
In summary: page numbers rarely match folios and the numerical relationship between the two can change as you go through the book. However, when Guiguts auto-generates HTML it inserts an anchor at each page boundary with the folio (number) of the page. It is important that these anchors reflect the folios, not the page numbers. Then it is easy to link the page references in the contents, the index, and in cross-references to the correct page.
Guess Page Markers
This is something to consider using only as a last resort, if you've removed the Page Separators (which always must be done at some point) and something's gone wrong with the .bin file. The results will not be a substitute for the lost information.
Set Page Markers
Guiguts uses the Page Separator lines to associate page images with the text. If the Page Separators are available, and there's no .bin file yet, Guiguts does this option automatically. Otherwise, you can use it to override what's already in the .bin file, and the new results will be stored there on the next save.
Configure Page Labels
Guiguts associates a numeric label with each page marker. Initially, the labels are the same as the image numbers: image 001 has label Pg001. It is these labels that are used when creating the HTML page-anchors usually displayed in the right margin, referenced in the Index and sometimes within the text. You use the Configure Page Labels dialog to change the labels so that they properly reflect the folios throughout the book.
Please note that the appearance and method of using this Dialog changed beginning with Guiguts version 1.3. Among the benefits: much-improved scrolling and support for books that are thousands of pages in length.
To open the Configure Page Labels dialog, right-click in the Lbl: field in the status bar (the field just to the right of the See Image button), or select it on the File>Project sub-menu. (On a Mac, ctrl-click may work on earlier versions of Mac OS, but does not work in 10.5 and 10.6.) If you have not yet set the page labels, the field reads Lbl: None.
Note: You may need to drag the window wider in order to expose all the columns.
This dialog has settings in the top row of boxes and a table with one row for each page image in the book. The columns of this table show:
- the image file numbers in sequence, e.g. 019:. Guiguts takes these from the names of the files in the "pngs" folder. If there is no "pngs" folder, Guiguts will ask you to select the folder containing the page images.
- the Style of the page number: Arabic, Roman, or a ditto mark meaning "same as preceding page."
- which of three actions Guiguts will use to recalculate the page number: Start@, +1, or No Count. The default always is +1 (next consecutive number in ascending sequence).
- A numeric field showing the starting number to use when the action is Start@. You enter this in the "Action" box at the top of the Dialog using only Arabic numerals, but if the Style for that page number is Roman, the resulting page number will be shown in Roman after you click "Recalculate".
- The page numbers assigned to each image, initially empty. When you click "Recalculate," updated number assignments will appear to the right of the -->, e.g., Pg 19. After you've used this Dialog to match the page numbers to what's printed in the book, and clicked "Recalculate", these numbers will update, e.g., to Pg 3.
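The way Style and Action combine when you click "Recalculate" can be sketched in Python. This is an illustration of the rules as described in the columns above, not Guiguts' own code. The names Row, to_roman and recalculate are hypothetical, and lowercase Roman numerals and an empty label for "No Count" pages are assumptions.

```python
from typing import NamedTuple, Optional


class Row(NamedTuple):
    style: str       # "Arabic", "Roman", or '"' (ditto: same as preceding page)
    action: str      # "Start@", "+1", or "No Count"
    start: int = 1   # only used when action is "Start@"


def to_roman(n: int) -> str:
    """Convert a positive integer to lowercase Roman numerals (an assumed style)."""
    pairs = [(1000, "m"), (900, "cm"), (500, "d"), (400, "cd"), (100, "c"),
             (90, "xc"), (50, "l"), (40, "xl"), (10, "x"), (9, "ix"),
             (5, "v"), (4, "iv"), (1, "i")]
    out = []
    for value, symbol in pairs:
        while n >= value:
            out.append(symbol)
            n -= value
    return "".join(out)


def recalculate(rows: list[Row]) -> list[Optional[str]]:
    """Return one label per page image; None marks a page that is not counted."""
    labels: list[Optional[str]] = []
    counter = 0
    style = "Arabic"
    for row in rows:
        if row.style != '"':
            style = row.style          # a ditto keeps the preceding style
        if row.action == "No Count":
            labels.append(None)        # counter is not advanced
            continue
        counter = row.start if row.action == "Start@" else counter + 1
        number = to_roman(counter) if style == "Roman" else str(counter)
        labels.append(f"Pg {number}")
    return labels
```

With two uncounted pages, then Roman pages starting at 3, then Arabic pages starting at 1 with a two-page uncounted plate in the middle, the sketch yields no label, no label, Pg iii, Pg iv, Pg 1, Pg 2, no label, no label, Pg 3.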
Using the Page Labels Controls
To change the settings for any page:
- click any place in its row and it will be highlighted
- click one of the three choices in "Style" to select Arabic / Roman / ditto. Shortcut: shift+left-click to rotate through the three choices
- click one of the three choices in "Action" to set the page number. Shortcut: ctrl+left-click to rotate through the three choices. When "Start @" is the choice, the text box in the "Action" box grabs the keyboard focus and you can type in a page number.
To see the results and make sure you are matching the page numbers printed in the original book, click "Recalculate" from time to time.
There are three ways to see the image of the currently-selected page:
- double-click its row in the list. You can do that as you select and highlight it
- click the "View Img" button
- turn on "Auto Img" in the "Img" box. The image will appear even when you single-click the row to select it
If you have not yet told Guiguts where to find your preferred image viewer, the first time you attempt to display a page image, Guiguts will ask you to select the viewer by using a standard file dialog. Your choice will be stored in the Preferences>File Paths>Set File Paths area.
Scrolling the Page Labels List
You can use most of the standard scrolling techniques with this Dialog, except for keyboard operations such as the arrows. There also are several ways of scrolling rapidly through the page list, although not all of them will work on all systems:
- roll the mouse wheel
- hold down the wheel (or middle button) in the list, then drag up or down (very fast)
- click above or below the slider
- hold and drag the slider (very fast)
- click middle button (or press the wheel down) in an empty part of the scrollbar (moves to approximately that percent location)
- click and hold left button in the list, then drag up or down
- ctrl+click above or below the slider to move to the first or last row of the list
Dealing with Front Matter
Pages such as the title page and blank pages at the beginning of a book often are unnumbered, and other early pages often use Roman numerals. You can use combinations of "Roman," "No Count", "Start @", "+1", and eventually "Arabic" to match your page numbering to what's in the book, then click "Recalculate" to see and check the results.
Dealing with Plates
Some illustrations in our books are on unnumbered pages that were ignored in the page numbering sequence. They usually are photographs that were printed separately on higher-quality paper, and their reverses tend to be blank. We refer to such pages here as "plates". They were inserted between consecutively-numbered folios during the binding process, and we need to adjust our page numbering accordingly. We do this by using "No Count" (sometimes in combination with "Start @").
In the book used in this example, Image 37 was a photograph and Image 38 (the reverse of that piece of paper) was blank. Both were being counted in our page numbering, so we just change the "Action" for those two pages to "No Count" and click "Recalculate".
Dealing with Skipped Blank Pages
Just as there are blank pages near the beginning of the book, there may be blank or other kinds of unnumbered pages elsewhere. We can adjust the numbering in such areas in the same way as we did in the Front Matter: set the "Action" for the blank or unnumbered pages to "No Count", then set the "Action" for the next numbered page to "Start @" and enter the number shown in the image for that page.
The options on the Content Providing sub-menu are not used by post-processors and their use is beyond the scope of this manual, so the information below is incomplete. The Import and Export options will open a File Open or Save dialog to let you select the file to import or the name under which to export.
Importing and Exporting Prep files
Although Guiguts is intended primarily as a complete post-processing tool, it also has some functionality intended for other purposes, such as the ability to import and export prep files.
Import Prep Text Files is used to load a document stored in an older Project Gutenberg format, in which each page is stored in a single file nnn.txt, where nnn is the page number. It can also be used to import files for pre-processing before they are first uploaded for proofreading.
Guiguts presents a file-open dialog. Navigate to the folder containing the files in this form and click Open. Guiguts searches that folder for files with names in the expected format and loads the contents of each in numeric sequence. It records a page separator between each file's data. After loading the file(s), Guiguts goes directly into the "file save as" dialog, to save the consolidated file (don't forget the .txt suffix) and create a .bin file for it. The save dialog opens in the file detail directory, although generally one would save the file in that directory's parent. Word Frequency Character Counts requires that the file be saved before it will work.
Export As Prep Text Files lets you write the current document, regardless of its source, in the older format. Guiguts presents a file-open dialog. Navigate to the folder where you want to store the many small files nnn.txt and click Open. Guiguts writes a file for each page.
Export/Import One File with Page Separators
These are not used by post-processors, and their use is beyond the scope of this manual.
Import TIA Abbyy OCR File
This is a feature for Content Providers (CP) and Project Managers (PM) who do not have OCR software. It takes an Abbyy file, downloaded from The Internet Archive, and converts it into a text file suitable for the PM to process and upload for proofing.
Highlight WF Characters Not in Selected Suites highlights any characters shown in the Word Frequency Character Count list that are not in the selected character suites. If you control-click a highlighted character in the WF dialog, you are asked whether you wish to enable a character suite that contains that character. (If more than one suite contains the character, the suite suggested is the first alphabetically.) If the character you control-click is not in any of the DP character suites, a warning is issued instead. See Character Suites for available suites.
Manage Character Suites displays a dialog that lists all the character suites available at DP. It is possible to enable and disable any of these except for Basic Latin, which is required in all projects. (The list may expand over time.)
CP Character Substitutions
The Content Providing menu contains a button to quickly replace any occurrences in the whole file of the tab, emdash and curly quote characters (with space, double hyphen and straight quotes respectively) since these characters are not permitted in projects during the proofing and formatting stages.
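The substitutions themselves amount to a small character mapping, sketched here in Python. The name cp_substitute is hypothetical, and mapping both single and double curly quotes is an assumption about what "curly quote characters" covers.

```python
# Characters not permitted during proofing and formatting, with replacements.
CP_SUBSTITUTIONS = str.maketrans({
    "\t": " ",        # tab -> space
    "\u2014": "--",   # em dash -> double hyphen
    "\u201c": '"',    # left double curly quote -> straight double quote
    "\u201d": '"',    # right double curly quote -> straight double quote
    "\u2018": "'",    # left single curly quote (assumed to be included)
    "\u2019": "'",    # right single curly quote (assumed to be included)
})


def cp_substitute(text: str) -> str:
    """Replace every tab, em dash and curly quote in text in one pass."""
    return text.translate(CP_SUBSTITUTIONS)
```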
Exit closes the current file and then Guiguts itself, along with the command-line window that has been controlling it. If the current file has been changed since the last Save, Guiguts will prompt you to save it first.
Page Markers and the Metadata File
Many useful Guiguts features depend on knowing where the text from each page-image begins and ends. For example, this is what allows Guiguts to display the page-image file for the cursor location.
Page Separator Lines
Initially, data on page-image boundaries comes from the page-separator lines in the downloaded text file. Each separator line names the page-image file and, if you are using a copy that includes proofers' names, the proofers who worked on that page.
When Guiguts opens a file and does not find a .bin file with the same name, it scans the text looking for the page-separator lines. For each page separator, Guiguts notes the page-image filename and the names of the proofers.
The scan for page-separator lines can take several seconds. If you do not want Guiguts to make this scan by default, you can disable it in the Preferences menu. You can initiate a scan for separators manually using File>Project>Set Page Markers.
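The scan for page-separator lines can be sketched in Python as follows. The separator layout used here (five hyphens, "File: ", the image name, more hyphens, then optional backslash-delimited proofer names) is an assumption made for the example, as are the names SEPARATOR and scan_separators; check it against the separators in your own file.

```python
import re

# Assumed layout:  -----File: 001.png---\name1\name2\------
SEPARATOR = re.compile(r"^-----File: (?P<image>[^\s\\-]+)-+(?P<rest>.*)$")


def scan_separators(text: str) -> list[tuple[int, str, list[str]]]:
    """Return (line number, image filename, proofer names) for each separator."""
    pages = []
    for number, line in enumerate(text.splitlines(), start=1):
        match = SEPARATOR.match(line)
        if match:
            # Drop the trailing hyphens, then split the names on backslashes.
            names = [n for n in match["rest"].rstrip("-").split("\\") if n]
            pages.append((number, match["image"], names))
    return pages
```

Each result records where a page begins, which image belongs to it, and who proofed it, which is the same information the text says Guiguts notes for each separator.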
The .bin Metadata File
Page-separator lines are not a permanent part of the book. Part-way through the post-proofing workflow you tell Guiguts to delete them. In order to be able to keep track of page-image boundaries after the visible separator lines have been deleted, Guiguts records the page boundary points in a separate file. This .bin file also holds other metadata about the file, such as the page label numbers and bookmarks you have set. But the most important data in the .bin file is a list of the page boundary locations as offsets in the text, along with the names of the proofers for each page, if they were present. The .bin file is rewritten every time you save the file.
The full name of the .bin file is filename.type.bin; that is, .bin is appended to the full filename of the document. If the document is volume_2.txt, then its .bin file is named volume_2.txt.bin.
If you use File>Save As to create volume_2.html, you will then have four files: volume_2.txt and volume_2.txt.bin, and volume_2.html with its related volume_2.html.bin.
Cautions on .bin File Use
Guiguts looks for the .bin file only in the same folder as the document file. When you move a file from one folder to another, you need to move its matching .bin file also. If you do not, Guiguts cannot load your bookmarks and data on page boundaries and proofers.
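If you move files with a script rather than by hand, a helper along these lines keeps the pair together. It is a hypothetical Python sketch (the name move_with_bin is invented) and relies only on the naming rule that the .bin file is the document's full name plus .bin.

```python
import shutil
from pathlib import Path


def move_with_bin(document: str, dest_dir: str) -> Path:
    """Move a document and, if present, its matching .bin file to dest_dir."""
    src = Path(document)
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    # The companion file is the full document name with .bin appended.
    companion = src.with_name(src.name + ".bin")
    target = Path(shutil.move(str(src), str(dest / src.name)))
    if companion.exists():
        shutil.move(str(companion), str(dest / companion.name))
    return target
```

Run it only while Guiguts does not have the document open, so that a save in progress cannot recreate the files in the old folder.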
You can tell when Guiguts lacks page-boundary information. After the file is open and you click in it, the status bar ought to display the Page nnn and See Image buttons. If it does not, there was no .bin file or it was not readable.
If you lose the .bin file after you have cleared the page separator lines, there is no good way to recover the data. You can select File>Project>Guess Page Markers, which generates some page boundaries at fixed intervals of text lines, but these are unlikely to correspond to the original pages.
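To see why guessed markers rarely line up with the real pages, consider this Python sketch of evenly spaced boundaries. It is only an illustration: how Guiguts chooses its interval is not described here, so the sketch assumes you supply the number of pages, and the name guess_markers is hypothetical.

```python
def guess_markers(total_lines: int, page_count: int) -> list[int]:
    """Return 1-based line numbers for page_count evenly spaced page starts."""
    if total_lines < 1 or page_count < 1:
        raise ValueError("total_lines and page_count must be positive")
    # Page i starts after i/page_count of the lines, rounded down.
    return [1 + (i * total_lines) // page_count for i in range(page_count)]
```

A 100-line text split into 4 pages gets boundaries at lines 1, 26, 51 and 76, regardless of where chapters, illustrations or short pages actually fall in the book.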
If you use a different editor to modify a document (for example, WordPad, BBEdit, vi or emacs), that editor will not know about the .bin file, and will not update it to reflect changes in the relative position of text as Guiguts would do. Thus once you have modified a document outside of Guiguts, Guiguts may no longer have correct page boundary information. The results can be unimportant (Guiguts displays the wrong page image), or they can be serious (Guiguts generates improper HTML). If Guiguts already has generated the HTML version and the pagenums are in place, the file still is usable, even though Guiguts sometimes may display the wrong page image to you. Otherwise, consider the consequences of the external edit.
An earlier version of this manual suggested deleting the .bin file after using an external editor, but doing so also will have serious consequences, so that advice has been replaced with this paragraph.
Opening Multiple Files
Toward the end of the PP process you typically will have two versions of a book: Plain Text and HTML. Often, as you edit one version, you will find errors that should be corrected in both versions. You can open, fix and save each version serially, but that is tedious.
It would be convenient to have all versions of a book open at the same time, so you can make fixes in each and save them simultaneously. To do this in Guiguts, you need to launch multiple copies of the Guiguts program. Each instance of the program edits one document file, but you can copy from one document and paste into another.
Guiguts is not written to be used in this way but it does work. As long as each instance of Guiguts is editing a different document file, there is no interference. (It would be unwise to edit the same document in two instances of Guiguts; unpredictable things would happen if both tried to autosave at the same time.)
When each copy of Guiguts is terminated, it updates the settings file, so your preferences (for example, your saved search-and-replace patterns and window position on the screen) will reflect the usage in the copy of Guiguts that terminates last.
Very Large Files
Beginning with version 1.3, the Configure Page Labels Dialog can handle books that are thousands of pages in length, so the information formerly in this section is obsolete.