From DPWiki

Two procedures for creating a formatted and linked Index during post-processing are explained here. The first uses Guiguts' "Auto Index". If that doesn't work for you, or you're not using Guiguts, the second uses a series of regular expressions that should work with any editor that supports them.

GUIGUTS' Auto Index

This is very easy, and when it works, it's almost instantaneous. Steps 1-6 must be done on the Plain Text, just BEFORE running Auto Generate; steps 8 and 9 clean up the generated HTML afterwards:

1. find the start of the Index and bookmark it (shift-ctrl-1)
2. replace the opening /* with /x and a blank line
3. find the end of the Index and bookmark it (shift-ctrl-2)
4. replace the closing */ with a blank line and x/
5. highlight the index but not what was done in steps 2 and 4
6. on the HTML menu, click: HTML Auto Index (List)
7. run Auto Generate
8. find the start and end of the Index and remove the <pre> and </pre> that were auto-generated
9. move the page numbers, as AutoGen puts them in the wrong (and invalid) place:
	Select the Index, then Search/Replace ALL within it with this regex:
	S: \n<span class="pagenum">(.+?)</span>(.+?)</li>(\n+?)<li(.+?)</li>
	R: \n$2</li>$3<li$4<span class="pagenum">$1</span></li>
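
As a rough illustration of what the step 9 regex does, here is a minimal Python sketch. The sample markup is hypothetical (real Auto Index output may differ in detail), and the function name move_pagenums is invented for this example. Python writes $1 as \g<1>; otherwise the patterns are the ones given above.

```python
import re

# Search and replace patterns from step 9, translated to Python syntax
# ($1 becomes \g<1>, and so on).
SEARCH = re.compile(
    r'\n<span class="pagenum">(.+?)</span>(.+?)</li>(\n+?)<li(.+?)</li>'
)
REPLACE = r'\n\g<2></li>\g<3><li\g<4><span class="pagenum">\g<1></span></li>'


def move_pagenums(index_html: str) -> str:
    """Move each stray pagenum span to the end of the following <li>."""
    return SEARCH.sub(REPLACE, index_html)


# Hypothetical sample: the pagenum span sits outside any <li>, which is invalid.
sample = (
    '<ul class="index">\n'
    '<li class="ifrst">Apples, 12</li>\n'
    '<span class="pagenum">[Pg 301]</span><li class="indx">Bananas, 14</li>\n'
    '<li class="indx">Cherries, 20</li>\n'
    '</ul>\n'
)

result = move_pagenums(sample)
print(result)
```

In this sample the span ends up inside the "Cherries" entry, just before its closing </li>, so it is no longer a direct child of the <ul>.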

Check the results. If it didn't work and is not easily salvageable, use the multi-step procedure below instead, AFTER using Auto Generate (in other words, go back to the original Plain Text before you started doing any of the above). If it did work, and the Index contains "See"-type references, skip down to the section describing how to link them.


This is the CSS that Guiguts auto-generates by default, and that Auto Index uses:

ul.index {list-style-type: none;}
li.ifrst {margin-top: 1em;}
li.indx  {margin-top: .5em;}
li.isub1 {text-indent: 1em;}
li.isub2 {text-indent: 2em;}
li.isub3 {text-indent: 3em;}

REGEXPS to format and link a multi-level HTML Index.

This has been tested with Guiguts, but should work with any editor that supports regular expressions:

PART ONE: Convert index into ul/li LIST notation

1.  A. bookmark start and end of index; for safety, do all S&R's within this range.
	B. precede <h2> with: <div class="p4 index">
	C. add class="p0" to <h2>
	D. change starting <p> to <ul>
	E. add a <br /> at the end of the last entry of the index;
	F. change ending </p> to </ul></div>
			Tidy should warn of missing <li> at the very beginning
			Validator will have a few errors
	G. indicate where letter changes occur:
		S: \n<br />\n<br />
		R: \n@
2.  move pagenums to the end of the line and replace 'span' with placeholders ('~~~~' in the opening tag, 'abcd' in the closing tag);
		Pagenums still will be one line too soon; will adjust later.
		Must be done over the range of the Index.
		Still need to handle <p>-type pagenums.
	S: <span class="pagenum">(.+?)</span>(.*?)<br />
	R: $2<~~~~ class="pagenum">$1</abcd><br />
3. Remove unneeded <br />'s (note intentionally omitted '>' in several of these):
	S: <br />\n<br /
	R: <br /

4. sub-entries (note leading spaces for visual identification and uniqueness;
		second variable handles trailing pagenums; second S&R rarely needed):
	A. S: <span style="margin-left: 1em;">(.+?)</span>(.*?)<br /
	   R:   <li class="sec">$1$2</li
	B. S: <span style="margin-left: 2em;">(.+?)</span>(.*?)<br /
	   R:     <li class="tri">$1$2</li
5. Main entries:
	S: \n([^ ])([^\n]+?)<br /
	R: \n<li>$1$2</li

6. new letters (if letter headings, use the 'pix' css for "let"):
	S: @\n<li>
	R: \n<li class="let">
	S: @\n([A-Z])<br />\n<li>
	R: \n<li class="pix">$1</li>\n<li>
7. Move pagenums to end of next entry (to the correct page) and fix placeholders:
	S: <~~~~ class="pagenum"(.+?)</abcd>(.*?)</li>((.|\n)+?)</li>
	R: $2</li>$3<span class="pagenum"$1</span></li>

8-10. (reserved for future expansion)
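
The Part One search-and-replace steps can be sketched as a small Python pipeline. This is only an illustration under stated assumptions: the sample index is invented, it assumes steps 1A-1F have already been applied by hand with <ul> on its own line, and the name convert_index is hypothetical. The second variant of step 6 (letter headings) is left out. Python writes $1 as \g<1>.

```python
import re

# (search, replace) pairs from Part One, in the order given above.
STEPS = [
    # 1G: mark letter changes
    (r'\n<br />\n<br />', r'\n@'),
    # 2: move pagenums to end of line, using placeholder tag names
    (r'<span class="pagenum">(.+?)</span>(.*?)<br />',
     r'\g<2><~~~~ class="pagenum">\g<1></abcd><br />'),
    # 3: remove unneeded <br />'s
    (r'<br />\n<br /', r'<br /'),
    # 4A / 4B: sub-entries (leading spaces are intentional)
    (r'<span style="margin-left: 1em;">(.+?)</span>(.*?)<br /',
     r'  <li class="sec">\g<1>\g<2></li'),
    (r'<span style="margin-left: 2em;">(.+?)</span>(.*?)<br /',
     r'    <li class="tri">\g<1>\g<2></li'),
    # 5: main entries (lines that do not start with a space)
    (r'\n([^ ])([^\n]+?)<br /', r'\n<li>\g<1>\g<2></li'),
    # 6: new letters (first variant only)
    (r'@\n<li>', r'\n<li class="let">'),
    # 7: move pagenums to the end of the next entry and restore 'span'
    (r'<~~~~ class="pagenum"(.+?)</abcd>(.*?)</li>((.|\n)+?)</li>',
     r'\g<2></li>\g<3><span class="pagenum"\g<1></span></li>'),
]


def convert_index(index_html: str) -> str:
    """Apply each Part One search/replace once, in order."""
    for search, replace in STEPS:
        index_html = re.sub(search, replace, index_html)
    return index_html


# Hypothetical index after steps 1A-1F.
sample = (
    '<div class="p4 index">\n'
    '<h2 class="p0">INDEX</h2>\n'
    '<ul>\n'
    'Abbey, 12<br />\n'
    '<span style="margin-left: 1em;">ruins of, 14</span><br />\n'
    '<br />\n'
    '<span class="pagenum">[Pg 301]</span>Acorn, 20<br />\n'
    '<br />\n'
    '<br />\n'
    'Baker, 31<br />\n'
    '</ul></div>\n'
)

result = convert_index(sample)
print(result)
```

Each pattern is applied once here; in an editor, a "Replace All" may need repeating on a real index, so check the result after every step rather than trusting a single pass.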

PART TWO: Add Links

11. change ---- to double emdashes (not a regex, do for entire document):
	S: ----
	R: ——
12. Regex to find 4-digit numbers (e.g., 'years' in the index)
	S: , (\d{4})
	R: @@$1     (uses @@ as a temporary placeholder so that step 13 skips the years; step 16 restores them)
13. Regex to link the index (after zapping years):
	THIS USUALLY MUST BE RUN MORE THAN ONCE; repeat till no more found;
	run within a selection (entire Index), can't do whole document:
	S: , (\d+)                             (BEGINS WITH COMMA THEN SPACE)
	R: , <a href="#Page_$1">$1</a>              (DITTO)

14. Ranges:
	S: (\d)</a>-(\d+?)(?!\d)
	R: $1–$2</a>
	S:  to (\d+?)<                                 ("to" style ranges)
	R:  to <a href="#Page_$1">$1</a><

15. Extra pass for a quote mark before the page number
	(some indices use a single quote rather than a double, so modify if necessary):
	S: ," (\d+)([^\d])
	R: ," <a href="#Page_$1">$1</a>$2
		(or, with curly quotes:)
	S: ,” (\d+)([^\d])
	R: ,” <a href="#Page_$1">$1</a>$2

16. Restore years:
	S: @@
	R: ,     (a comma followed by one space)

17-20. (reserved for future expansion)
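
To show how steps 12, 13, 14 and 16 fit together, here is a minimal Python sketch run on two invented index entries. The function name link_index and the sample text are assumptions made for illustration; steps 11 and 15 are omitted. Python writes $1 as \g<1>.

```python
import re


def link_index(entry: str) -> str:
    """Link page numbers in one index entry, protecting 4-digit years."""
    # Step 12: hide years behind the @@ placeholder.
    text = re.sub(r', (\d{4})', r'@@\g<1>', entry)

    # Step 13: link page numbers; repeat until nothing more is found.
    while True:
        text, count = re.subn(
            r', (\d+)', r', <a href="#Page_\g<1>">\g<1></a>', text
        )
        if count == 0:
            break

    # Step 14: pull hyphenated ranges inside the link (\u2013 is an en dash).
    text = re.sub(r'(\d)</a>-(\d+?)(?!\d)', '\\g<1>\u2013\\g<2></a>', text)
    # Step 14: link the second number of "to" style ranges.
    text = re.sub(r' to (\d+?)<', r' to <a href="#Page_\g<1>">\g<1></a><', text)

    # Step 16: restore the years.
    return text.replace('@@', ', ')


# Hypothetical entries as they might look after Part One.
sample = [
    '<li>Battle of Hastings, 1066, 12-15, 20</li>',
    '<li>Castle, 40 to 45</li>',
]

linked = [link_index(line) for line in sample]
for line in linked:
    print(line)
```

The year 1066 stays unlinked, while 12-15 becomes a single link to Page_12 with an en dash in its text. A four-digit page number would be mistaken for a year by step 12, so an index that reaches page 1000 needs a manual check.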


This is the CSS used by the procedure. It replaces what Guiguts auto-generates by default. Depending on the Index, not all of these definitions will be needed:

.index {margin-left: 5%;}
.index ul {padding-left: 0;}
.index li {list-style-type: none; padding-left: 3em; text-indent: -3em;}
.index li.sec {padding-left: 4em;}
.index li.let {padding-top: 1em;}

li.pix {margin-top: 2em; margin-bottom: 1em; font-weight: bold; text-indent: 0;  padding-left: 4em;}


21.	Simple method for simple references; may not always be applicable:
	1. use Notepad++ (or GG or any text editor) to find each "See"
	2. in GG, start at top of Index, look for the definition of each word and highlight it
	3. on HTML Markup menu, click "Named anchor" (keep the menu open, as you'll use it repeatedly)
	4. repeat steps 1-3 until all 'See' references have been found
	5. in GG, start at top of Index, look for each "See" with this no-case regex:
		S: (see) \b(.+?)\b
		R: $1 <a href="#$2">$2</a>
	6. if string doesn't contain ligatures, spaces, or punctuation, REPLACE & Search
	6A. else, highlight the string, and on Markup menu, click "Internal Link",
		and double-click the anchor.
	7. repeat until no more "see"; if anything was not found, add the anchor manually.
	8. run the local HTML Validator and Link Checker; correct any errors.
22. Dashes between years (entire document; for ancient history, use {1,} instead of {4}):
	S: (\d{4})-(\d{4})
	R: $1–$2

23. Simple fix to find unlinked numbers (if missing space, add it);
	this finds ranges but doesn't encompass them (do so manually):	
	S: ([ ,])(\d+?)([^\d])
	R: $1<a href="#Page_$2">$2</a>$3
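
The regexes from steps 21 to 23 can be tried out with a short Python sketch. The function names and sample strings are invented for this example. The step 21 sketch assumes each anchor name is exactly the referenced word, which only holds for simple one-word references. For step 23, the sketch only reports unlinked numbers rather than replacing them, since each hit needs a manual look anyway; the page number is the second captured group.

```python
import re


def link_see(text: str) -> str:
    """Step 21: link the first word after each 'see' (case-insensitive)."""
    return re.sub(
        r'(see) \b(.+?)\b',
        r'\g<1> <a href="#\g<2>">\g<2></a>',
        text,
        flags=re.IGNORECASE,
    )


def fix_year_ranges(text: str) -> str:
    """Step 22: replace the hyphen between two 4-digit years (\u2013 is an en dash)."""
    return re.sub(r'(\d{4})-(\d{4})', '\\g<1>\u2013\\g<2>', text)


def find_unlinked(text: str) -> list[str]:
    """Step 23: list numbers preceded by a space or comma (not yet linked)."""
    return [m.group(2) for m in re.finditer(r'([ ,])(\d+?)([^\d])', text)]


print(link_see('See Abbey.'))
print(fix_year_ranges('war of 1914-1918'))
print(find_unlinked('<li>Abbey, <a href="#Page_12">12</a>, 14</li>'))
```

Numbers already inside a link are preceded by "_" or ">" rather than a space or comma, which is why find_unlinked skips them. Years in the index text will also be reported, so not every hit needs a link.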