Reconstructing a Guiguts Bin File

From DPWiki

Introduction

In the discussion that follows, all references to the ".bin" file mean references to the ".html.bin" file created by Guiguts.

You can reconstruct the .bin file fairly easily if you have a .html file that contains generated page numbers. These can be visible page numbers, hidden page numbers, or comments of some sort.

It is worth considering, however, whether you need to reconstruct a corrupted .bin file at all. For instance, you do not need a working .bin file to process .txt and .html files in Guiguts. Nor is a working .bin file a strict requirement for uploading finished .txt and .html files for PPV. PPVers are not obliged to use Guiguts in their work, so a working .bin file only helps those PPVers who do use it.

Some background - how the procedure works

The main purpose of the .bin file is to allow Guiguts to automatically locate and display the scanned page images as you move through the text in the Guiguts window. It uses XnView for this purpose. While this is a very convenient feature in Guiguts, it is not an essential one. You can always run XnView independently of Guiguts to find and display the scanned page images you want to view.

These scanned page images are stored as PNG format files in the /pngs sub-folder of your project folder. These PNG files have the names 001.png, 002.png, ... and so on. There is one file for each page of the book and they are always named with numbers that increase consecutively from 001.

An important point to understand is that this number (001, 002, ...) in the PNG file name does not correspond with actual page numbers in the book. The link between PNG file and book page number has to be created by Guiguts. This mapping information is then stored in the .bin file. It is through this information that you can click on the "See Img" button at the bottom of the Guiguts window and have the scanned page image that corresponds to the text in the window, automatically displayed by XnView.

Guiguts creates the .bin file and the mapping it contains the first time you SAVE the .txt file extracted from the project zip archive.

To build the mapping, Guiguts uses the page marker info that is found in this .txt file. PPers will be familiar with the general appearance of this file. The start of this file for the book I am currently PPing looks something like this:

-----File: 001.png---\profer1\profer2\profer3\formatter1\formatter2\--------------
[Blank Page]
-----File: 002.png---\profer1\profer2\profer3\formatter1\formatter2\--------------




THE ROMANCE OF MODERN MECHANISM
-----File: 003.png---\profer1\profer2\profer3\formatter1\formatter2\--------------

A line beginning -----File: is a page marker. The string of the form 999.png in the page marker (three digits followed by .png) is the name of a PNG file in the /pngs sub-folder. This file contains a scanned page image. The lines of text that follow a page marker are the proofed and formatted text produced from that scanned page image. In the case of the first page marker above, we can see that the file /pngs/001.png contains a scan of a blank page.

Guiguts makes a note of the location of each page marker within the text and stores this information in the .bin file it just created. When you re-open the .txt file in Guiguts you can safely delete these page markers. Guiguts will remember this mapping through its .bin file. It is this mapping that you lose when the .bin file is corrupted or deleted.
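The idea of noting where each page marker sits can be illustrated with a short Python sketch. This is only an illustration of the concept: the function name locate_page_markers and the choice of recording line numbers are assumptions made for the example, and the sketch does not reproduce the actual contents or format of the Guiguts .bin file.

```python
import re

# Matches a page marker line such as
# -----File: 002.png---\p1\p2\p3\f1\f2\--------------
PAGE_MARKER = re.compile(r"^-----File: (\d+\.png)---.*$", re.MULTILINE)


def locate_page_markers(text):
    """Return a list of (png_name, line_number) pairs, one per page marker.

    line_number is the 1-based line on which the marker appears.
    (Hypothetical helper; not how Guiguts stores its mapping.)
    """
    markers = []
    for match in PAGE_MARKER.finditer(text):
        line_number = text.count("\n", 0, match.start()) + 1
        markers.append((match.group(1), line_number))
    return markers


# A shortened version of the sample shown earlier.
sample = (
    "-----File: 001.png---\\p1\\p2\\p3\\f1\\f2\\--------------\n"
    "[Blank Page]\n"
    "-----File: 002.png---\\p1\\p2\\p3\\f1\\f2\\--------------\n"
    "\n"
    "THE ROMANCE OF MODERN MECHANISM\n"
    "-----File: 003.png---\\p1\\p2\\p3\\f1\\f2\\--------------\n"
)

for png_name, line_number in locate_page_markers(sample):
    print(png_name, "starts at line", line_number)
```

Once positions like these have been recorded somewhere else, the marker lines themselves are no longer needed in the text, which is why they can be deleted safely.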

The trick used to reconstruct a corrupted .bin file is to replicate this process in your .html file. You insert fake page markers immediately before each HTML pagenum span in the .html file. You then delete your old .bin file and re-open the edited .html file in Guiguts. This creates a new .bin file with the required mapping rebuilt. You then delete the fake markers from your .html file and you are finished!

Applying the procedure in more detail

A knowledge of regular expressions (regexes for short) will be helpful in following the steps described below. However most PPers should be able to modify the regexes I use so they work correctly in their own .html files. It will help to understand these regexes better if I illustrate the two ways in which HTML pagenum span code appears in my .html file:

<p>The old-fashioned watch was a bulky affair, protected<span class="pagenum"><a name="Page_19" id="Page_19">[19]</a></span>
by an outer case of ample proportions. From year to
...
even been constructed small enough to form part of a ring
or earring, without losing their time-keeping properties.</p>

<p><span class="pagenum"><a name="Page_20" id="Page_20">[20]</a></span></p>

<p>For practical purposes, however, it is advantageous to ...

The HTML pagenum code for Page_20 is where we want it to be; that is, on a line of its own. The purpose of the 'prepare your .html file' step described below is to move the HTML pagenum code for Page_19 to a line of its own.
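For readers who find it easier to see the transformation outside Guiguts, here is a minimal Python sketch of the same idea. It is an assumption-laden illustration, not part of the procedure: the function name isolate_pagenum_spans is made up for the example, and the pattern is a simplified version of the search regex given in the steps below (it captures the whole span as one group instead of capturing each number separately).

```python
import re

# A pagenum span preceded by any character other than ">".
# Spans that directly follow a tag such as <p> are left alone.
INLINE_PAGENUM = re.compile(
    r'([^>])(<span class="pagenum"><a name="Page_\d+" id="Page_\d+">\[\d+\]</a></span>)'
)


def isolate_pagenum_spans(html):
    """Move each inline pagenum span to the start of a new line."""
    return INLINE_PAGENUM.sub(r"\1\n\2", html)


sample = (
    "<p>The old-fashioned watch was a bulky affair, protected"
    '<span class="pagenum"><a name="Page_19" id="Page_19">[19]</a></span>\n'
    "by an outer case of ample proportions.</p>\n"
    "\n"
    '<p><span class="pagenum"><a name="Page_20" id="Page_20">[20]</a></span></p>\n'
)

print(isolate_pagenum_spans(sample))
```

Note that in this sketch a newline also counts as "not >", so running the substitution a second time on already-prepared text would insert extra blank lines. Apply it once only.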

The following steps outline the procedure to follow to rebuild the corrupted .bin file:

  • Run XnView to display the scanned page images of the book. These images will be used as a reference in the remaining steps.
  • Prepare your .html file by moving each HTML pagenum span onto a line of its own. I used the following regexes in the Guiguts "Search" tool to do this.
    In the Search: field
    ([^>])<span class="pagenum"><a name="Page_(\d+?)" id="Page_(\d+?)">\[(\d+?)\]</a></span>
    and in the Replace: field
    $1\n<span class="pagenum"><a name="Page_$2" id="Page_$3">[$4]</a></span>
  • For frontmatter pages with no page number on them, or which use Roman numerals, insert fake page marker records into your .html file by hand.
    • A fake page marker record will look like "-----File: 001.png---\p1\p2\p3\f1\f2\--------------" (without the double quotes). Any text between the backslashes ("\") will do. In a real page marker record, this text simply records the IDs of the proofers/foofers who worked on the page at each stage from P1 to F2.
    • Position each fake page marker record in your .html file by reference to the scanned page images displayed by XnView.
    • Adjust the "File: 001.png" (i.e. the PNG file name) part of the inserted page marker record accordingly. The current PNG file name is always at the bottom of the XnView window.
  • For the rest of the book, where page numbers use Arabic numerals, you can automate the process of inserting fake page markers by using regexes. However some caution is needed here because the offset between a book page number, and the name of the PNG file that contains the scanned image of that page, will not remain constant if there are discontinuities in book page numbering. A discontinuity typically occurs where plates and blank pages are inserted in the book. These pages are not usually numbered.
    In the Search: field
    (^.*?<span class="pagenum"><a name="Page_(\d+?)" id="Page_(\d+?)">.+?\]</a></span>.*?$)
    and in the Replace: field
    -----File: \C sprintf("%03d", $2+3)\E.png---\a\b\c\d\e\--------------\n$1
    • The initial offset value I used in the "Replace" regex above is 3 (see the $2+3 parameter of sprintf). The initial offset value to use may be different for your own book.
    • Use the "Search" button to find the start of the next instance of the HTML pagenum span record.
    • Then use the "R&S" button repeatedly to insert a new fake page marker record and search forward for the start of the next instance of HTML pagenum span code. Stop at the HTML pagenum span record of the book page after the next page numbering discontinuity. It is best if these discontinuities (if any) are located and noted before you start this step.
    • The offset value will typically increase by 2 at this point; that is, add 1 for the plate plus 1 for the blank page that follows it. The exact offset value required at any point in this step can be obtained by inspection. Compare the page number found on a page image seen in XnView with the PNG file name that contains that page image.
  • When no more fake page markers need to be inserted, save all the changes you have made to the .html file and close it. Delete the current .html.bin file. Now re-open the .html file in Guiguts and it will immediately create a new .html.bin file with the mapping fixed. Save a copy of this .html file as it is important to preserve a version of your .html file with the fake page markers still present.
  • The final step of the procedure is to remove all the fake page markers from your production .html file and the job is done! The following search and replace regexes, when used with the "Rpl All" button, will find every fake page marker we added to the .html file and replace each with the null string. This will have the effect of closing up the space used by the fake page marker.
    In the Search: field
    ^-----File: .+?$\n
    and leave the Replace: field empty
    • Use the "Rpl All" button to apply these regexes as a single operation throughout the whole of the file. Then save and close the .html file.
  • That's it!!
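The offset arithmetic and the marker insertion and removal steps above can also be sketched in Python. This is a rough illustration under stated assumptions, not a replacement for working in Guiguts: the function names, the offsets table, and the sample page numbers are all hypothetical, and the patterns are Python adaptations of the regexes used in the steps. The offsets list stands in for the discontinuities you noted beforehand: each entry gives the first book page to which an offset applies.

```python
import re

# A line containing a pagenum span; group 1 is the book page number.
PAGENUM_LINE = re.compile(
    r'^.*?<span class="pagenum"><a name="Page_(\d+)" id="Page_\d+">.+?\]</a></span>.*$',
    re.MULTILINE,
)
# Any line that starts like a page marker, including its newline.
FAKE_MARKER = re.compile(r"^-----File: .+?$\n", re.MULTILINE)


def offset_for(page, offsets):
    """Return the offset of the last (first_page, offset) entry with first_page <= page.

    offsets must be sorted by first_page.
    """
    current = 0
    for first_page, offset in offsets:
        if page >= first_page:
            current = offset
    return current


def insert_fake_markers(html, offsets):
    """Insert a fake page marker line before each line holding a pagenum span."""
    def add_marker(match):
        page = int(match.group(1))
        png = "%03d.png" % (page + offset_for(page, offsets))
        return "-----File: " + png + "---\\a\\b\\c\\d\\e\\--------------\n" + match.group(0)
    return PAGENUM_LINE.sub(add_marker, html)


def remove_fake_markers(html):
    """Delete every line that begins with -----File: (real or fake)."""
    return FAKE_MARKER.sub("", html)


html = (
    '<p><span class="pagenum"><a name="Page_19" id="Page_19">[19]</a></span></p>\n'
    "<p>Text of page 19.</p>\n"
    '<span class="pagenum"><a name="Page_20" id="Page_20">[20]</a></span>\n'
    "<p>Text of page 20.</p>\n"
)

# Hypothetical offsets: 3 from page 1, then 5 from page 20
# (a plate and a blank page assumed to precede page 20).
offsets = [(1, 3), (20, 5)]

marked = insert_fake_markers(html, offsets)
print(marked)
print(remove_fake_markers(marked) == html)
```

With these made-up values, page 19 is paired with 022.png and page 20 with 025.png. As in the manual procedure, the offsets still have to be checked against the page images in XnView; the sketch only does the arithmetic.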