SmallCapsForEpub

From DPWiki

Small Caps (aka Smallcaps or Small-caps or Small Capitals)

This page was begun 2018 Feb 6.

According to Wikipedia:

“In typography, small capitals (usually abbreviated small caps) are lowercase characters typeset with glyphs that resemble uppercase letters ("capitals") but reduced in height and weight, close to the surrounding lowercase (small) letters or text figures.[1] Note that this is technically not a case-transformation, but a substitution of glyphs, although the effect is often simulated by case-transformation and scaling. Small caps are used in running text as a form of emphasis that is less dominant than all uppercase text, and as a method of emphasis or distinctiveness for text alongside or instead of italics, or when boldface is inappropriate. For example, the text "Text in small caps" appears as Text in small caps in small caps. Small caps can be used to draw attention to the opening phrase or line of a new section of text, or to provide an additional style in a dictionary entry where many parts must be typographically differentiated.

“Well-designed small capitals are not simply scaled-down versions of normal capitals; they normally retain the same stroke weight as other letters and have a wider aspect ratio for readability.”

Unfortunately, well-designed small capitals are not supported in the handheld formats epub and mobi, nor, probably, in the html edition that you are reading on your own desktop or laptop computer. Even if the ebook reader software nominally supports the css property {font-variant: small-caps;}, the fonts that contain the well-designed glyphs are not included on the device (as far as I know). Even more sadly, Adobe Digital Editions, and the widely used software based upon it, do not support {font-variant: small-caps;} at all.

Small caps are not often used semantically in books. Therefore, small caps could often be changed to something else—e.g. all capitals or title case or italics or bold &c.—or even eliminated. Nonetheless, sometimes we want to preserve the original small caps typography, and we want it to appear correctly in all venues including epub, not just the html. This essay provides a method for constructing typography that looks like small caps—unless you are a typography connoisseur—for the html, epub, and mobi editions.

DP Small Caps, Current Practice

The Guide_to_smallcaps covers current DP expectations for how post processors should handle small caps. There's a lot of good advice there, but quite a lot of it depends on the css properties {font-variant:small-caps;} and {text-transform:uppercase;} and {text-transform:lowercase;}. None of these properties works in Adobe Digital Editions ver. 4.5.7.179634, and there's no good reason to predict that they will ever work well in epub, since small caps are not usually used semantically in books. The Formatting Guidelines cover expectations for how formatters should handle small caps during the F1, F2, and F3 rounds.

Small Caps Method for HTML, EPUB, and MOBI Editions

Unfortunately, the Formatting Guidelines are arguably self-contradictory. This leads to difficulty and error during the F1-F3 rounds, and therefore the post processor must examine each instance of small caps carefully, irrespective of the method proposed below. But my method makes this issue even more significant because the first step of my method requires breaking up all phrases of small caps into individual words.

The Formatting Guidelines give this example:

20180206-01.jpg

The first step of my method for the first example shown above would be to break

<sc>This is Small Caps</sc>

up into individual words, thus:

<sc>This</sc> <sc>is</sc> <sc>Small</sc> <sc>Caps</sc>;

but now “is” does not conform to the guidelines, and should be changed to

<sc>IS</sc>.
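
For anyone who would rather script this first step than do it by hand, the splitting and the guideline fix can be sketched in Python. This is only an illustration, not part of Guiguts or of the Formatting Guidelines. The name split_sc_words is made up for the example, and the rule "a word that starts with a lower-case letter goes to all capitals" is an assumption that still has to be checked by eye.

```python
import re

# One <sc>...</sc> phrase, possibly running over a line break.
SC_PHRASE = re.compile(r"<sc>(.*?)</sc>", re.DOTALL)


def split_sc_words(text):
    """Give every word of each <sc> phrase its own <sc> pair."""

    def _split(match):
        # The capturing group makes re.split keep the whitespace separators,
        # so the original spacing and line breaks survive.
        parts = re.split(r"(\s+)", match.group(1).strip())
        out = []
        for part in parts:
            if not part or part.isspace():
                out.append(part)
                continue
            # Assumption: a word beginning in lower case becomes all capitals.
            word = part.upper() if part[0].islower() else part
            out.append(f"<sc>{word}</sc>")
        return "".join(out)

    return SC_PHRASE.sub(_split, text)


print(split_sc_words("<sc>This is Small Caps</sc>"))
# <sc>This</sc> <sc>IS</sc> <sc>Small</sc> <sc>Caps</sc>
```

Punctuation stays attached to its word here, which matches the later advice to keep adjacent punctuation inside the span.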

The next step is to install html elements appropriately, using the following css.

div,
span {
 margin: 0;
 padding: 0;
 text-indent: 0;
 text-align: center;
}
.nowrap,
.smcap {
 display: inline-block;
}
.smcap,
.smmaj {
 font-style: normal;
 text-transform: uppercase;
 letter-spacing: 0.05em;
}
b {
 font-weight: normal;
 margin: 0;
 padding: 0;
 text-indent: 0;
}
b,
.smmaj {
 font-size: 0.75em;
}

In this css, the properties {text-transform: uppercase;} and {letter-spacing: 0.05em;} have no effect in the epub edition. The letter-spacing probably works only in the html edition. The text-transform is only a formality; it would make the small caps appear correctly in the html edition in case I forget to do the conversion manually during installation. The installed html for this example would be:

 <span class="smcap">T<b>HIS</b></span>
 <span class="smmaj">IS</span>
 <span class="smcap">S<b>MALL</b></span>
 <span class="smcap">C<b>APS</b></span>
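
The same installation can be sketched as a small Python function, assuming the text already carries one <sc> pair per word. The names SC_WORD and install_spans are hypothetical, and the sketch deliberately ignores punctuation, initials, and numbers, which still need the manual checks discussed on this page.

```python
import re

# A single-word <sc> element: no whitespace and no nested markup inside.
SC_WORD = re.compile(r"<sc>([^<\s]+)</sc>")


def _to_span(match):
    word = match.group(1)
    if word[0].islower() or word == word.upper():
        # Whole word at reduced height ("small majuscule").
        return f'<span class="smmaj">{word.upper()}</span>'
    # Full-size initial capital, remainder upper-cased at reduced height.
    return f'<span class="smcap">{word[0]}<b>{word[1:].upper()}</b></span>'


def install_spans(text):
    """Replace per-word <sc> markup with span.smcap or span.smmaj."""
    return SC_WORD.sub(_to_span, text)


print(install_spans("<sc>This</sc> <sc>IS</sc>"))
# <span class="smcap">T<b>HIS</b></span> <span class="smmaj">IS</span>
```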

A few comments are in order.

1. My use of <b> might be questioned, but it seems legitimate (see here, for instance). If you don't like <b> you can use <span> or <small> instead.
2. A phrase or string of small caps words must be split into individual words because of the inline-block attribute on span.smcap. The inline-block attribute is necessary to prevent inappropriate line-breaks inside words.
3. Reader apps enjoy placing line-breaks between </span> and adjacent punctuation on the right. Due to the inline-block specification on span.smcap, we can move punctuation inside the </span> to prevent such inappropriate line-breaking. Other punctuation, such as parentheses, can also be moved inside span.smcap. For span.smmaj, we need to add another layer of inline block (span.nowrap) to keep, for example, “(AARDVARKS).” all together. The html for that would be
 <span class="nowrap">(<span class="smmaj">AARDVARKS</span>).</span>
4. A known disadvantage of my code is that my small-caps words cannot be auto-hyphenated by browsers. (I stand corrected: in the example shown below, we see that Kindle Previewer can auto-hyphenate a word contained in an inline-block; who knew? This surprise does raise the question, however, whether my inline-blocks are really as effective against inappropriate line-breaks in mobi as I thought. Here is a short answer.) A paragraph full of small-caps text should possibly be left-aligned, not justified. Or else, if the small caps are purely for show, then perhaps remove them.
5. Obviously, we could substitute <small> for <span.smmaj>, but GG autogenerate makes spans, so I've used them instead of converting them.
6. The specification 0.75em for the height of <span.smmaj> and <b> coded characters is approximate; the exact number needed will depend upon your personal preferences and the font you are using.
7. The css statement span{text-indent:0;} is critical, and sometimes people forget to include it in their standard css. I certainly forgot it while setting up this example.
8. The class name “.smmaj” is short for “small majuscule”.
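
Comment 3 above, on wrapping span.smmaj and its adjacent punctuation in span.nowrap, could be automated roughly as follows. This is a sketch under assumptions: the sets of leading and trailing punctuation are guesses that would need extending for a real book, the name wrap_nowrap is hypothetical, and the function is not safe to run twice on the same text because it would wrap again.

```python
import re

# Assumed punctuation sets; extend them to suit the book in hand.
LEAD = r"[(\[\"'“‘]*"
TRAIL = r"[)\]\"'”’.,;:!?]*"
SMMAJ = r'<span class="smmaj">[^<]*</span>'
SMMAJ_WITH_PUNCT = re.compile(f"({LEAD})({SMMAJ})({TRAIL})")


def wrap_nowrap(html):
    """Wrap span.smmaj plus touching punctuation in span.nowrap."""

    def _wrap(match):
        lead, span, trail = match.groups()
        if not lead and not trail:
            return span  # nothing adjacent, so leave the span alone
        return f'<span class="nowrap">{lead}{span}{trail}</span>'

    return SMMAJ_WITH_PUNCT.sub(_wrap, html)


print(wrap_nowrap('(<span class="smmaj">AARDVARKS</span>).'))
# <span class="nowrap">(<span class="smmaj">AARDVARKS</span>).</span>
```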

Example

Here is an example of small cap markup coming from the rounds:

<sc>The Testing of Paper, Mechanical, Chemical, and Microscopical</sc>

In the text edition this goes to title case or all-caps, since it happens to be from a table of contents, and the small caps typography is purely decorative. For the html edition, my code would be

<div style="text-align: left;
          padding-left: 1.5em;
          text-indent: -1.5em;">
 <span class="smcap">T<b>HE</b></span>
 <span class="smcap">T<b>ESTING</b></span>
 <span class="smmaj">OF</span>
 <span class="smcap">P<b>APER</b>,</span>
 <span class="smcap">M<b>ECHANICAL</b>,</span>
 <span class="smcap">C<b>HEMICAL</b>,</span>
 <span class="smmaj">AND</span>
 <span class="smcap">M<b>ICROSCOPICAL</b></span></div>

The result produced by ebookmaker is:

20160206-02.jpg.

Regexes

Installation of this small cap code can be tedious. Regexes do help, but there are many ways to go astray, and considerable careful checking is needed, always. The following regexes are coded for Guiguts version 1.0.25.

BEFORE GENERATING HTML, IF POSSIBLE

 SPLIT WORDS
 S "<sc> *([^ \n<]+)([ \n])"
 R "<sc>$1</sc>$2<sc>" /*repeat ad nauseam*/

 INSTALL LINE-BREAKS TO MORE EASILY INSPECT <sc> MARKUP
 S " <sc"
 R "\n<sc"
 /*one pass should do it;
 NOTE you won't enjoy the result of this step if you 
 have much small caps inside tables*/

 NOW CHECK FOR SINGLE CHARACTERS OR INITIALS
 OR NUMBERS OR PUNCTUATION THAT SHOULD BE
 FULL-SIZED AND REMOVE OFFENDING <sc> MARKUP; ALSO TEST FOR
 ONE OR TWO CHARACTER STRINGS THAT SHOULD BE SMALL
 CAPITALS, AND CHANGE THOSE <sc> TO span.smmaj.
 S "<sc>(\d.+?|.\.?|..[.,;:]?|.\..\.|[\d\p{IsPunct}]+?)</sc>"
 R "$1" OR
 R "<span class="smmaj">\U$1\E</span>"
 /*check each instance carefully*/

 CHANGE NON-GUIDELINES-CONFORMING MARKUP TO CONFORMING
 S "<sc>(\p{IsLower}.+?)</sc>"
 R "<sc>\U$1\E</sc>"

If you failed to do the above steps prior to HTML generation, the regexes may be altered to work in the generated html, but it will be much more cumbersome.
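
The "check for single characters or initials" search can also be run outside Guiguts to get a list of candidates for review. The sketch below is a Python approximation, not a Guiguts feature: Python's standard re module does not understand \p{IsPunct}, so ASCII punctuation from string.punctuation is substituted, which is an assumption and will miss curly quotes and the like. The name find_suspects is hypothetical.

```python
import re
import string

# Stand-in for \p{IsPunct}: ASCII punctuation only (assumption).
PUNCT = re.escape(string.punctuation)

# Same alternatives as the Guiguts search: leading digit, single character,
# two characters, two-letter initials, or digits/punctuation only.
SUSPECT = re.compile(
    rf"<sc>(\d.+?|.\.?|..[.,;:]?|.\..\.|[\d{PUNCT}]+?)</sc>"
)


def find_suspects(text):
    """Return (line_number, matched_markup) pairs that deserve a manual look."""
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        for match in SUSPECT.finditer(line):
            hits.append((number, match.group(0)))
    return hits


sample = "<sc>A.</sc> <sc>Smith</sc>\n<sc>1st</sc> <sc>of</sc>"
for number, markup in find_suspects(sample):
    print(number, markup)
```

The script only reports; deciding whether each hit loses its markup or becomes span.smmaj is still a judgment call for the post processor.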

AFTER GENERATION OF HTML

 FINISH SMALL MAJUSCULE
 S "(<span class="smcap">)(\p{IsUpper}\p{IsPunct}?\p{IsUpper}[^<]*?)(</span>)"
 R "<span class="smmaj">\U$2\E$3"

 FINISH SMCAP
 S "(<span class="smcap">)(\p{IsUpper})([^<]+)(</span>)"
 R "$1$2<b>\U$3\E</b>$4"
 /*inspect, inspect; for example for something like
 <sc>Lincoln's-Inn</sc>, wherein both the L and the
 I are supposed to be full sized, this regex gets it 
 wrong.*/

The above two regexes pull into the reduced-height character range not only alphabetic characters, but also punctuation and anything else that might be on the right side of the markup. For example, a "word" like "<sc>Aardvark).'</sc>" would be transformed to the code "<span class="smcap">A<b>ARDVARK).'</b></span>", whereas the following is probably better, and is hereby recommended: "<span class="smcap">A<b>ARDVARK</b>).'</span>". That's one reason the following step is absolutely necessary. Do it.
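
Moving trailing punctuation out of the <b> element is mechanical enough to sketch in Python. The name move_punct_out is hypothetical, and the sketch assumes that anything at the end of <b> that is not a letter, digit, or underscore is punctuation that belongs at full size. Internal punctuation, as in Lincoln's-Inn, is left alone and still needs fixing by hand.

```python
import re

# Inside span.smcap: the run of non-word characters just before </b>.
B_TRAILING_PUNCT = re.compile(
    r'(<span class="smcap">[^<]*<b>[^<]*?)([^\w<]+)</b>'
)


def move_punct_out(html):
    """Shift trailing punctuation from inside <b> to just after </b>."""
    return B_TRAILING_PUNCT.sub(r"\1</b>\2", html)


print(move_punct_out("""<span class="smcap">A<b>ARDVARK).'</b></span>"""))
# <span class="smcap">A<b>ARDVARK</b>).'</span>
```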

 FINALLY, CHECK EACH AND ALL THE SMCAP AND SMMAJ.
 S ""smcap"|"smmaj""
 Fix errors, and add span.nowrap around adjacent punctuation, as in the “(AARDVARKS).” case mentioned above.
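
A script cannot replace this final inspection, but it can point at the spans most likely to be wrong. The following Python sketch flags any span.smcap that does not have the shape "initial, <b>, capitals, </b>, optional punctuation" and any span.smmaj that contains lower-case letters or whitespace. The expected shapes are assumptions drawn from the examples on this page, and the name audit is hypothetical; a deliberately hand-coded word such as Lincoln's-Inn will be flagged too.

```python
import re

SPAN = re.compile(r'<span class="(smcap|smmaj)">(.*?)</span>', re.DOTALL)

# Assumed shape: optional punctuation, one full-size character,
# <b>reduced-height text</b>, optional punctuation.
SMCAP_SHAPE = re.compile(r"[^\w\s<]*\w<b>[^<\s]+</b>[^\w\s<]*")


def audit(html):
    """Return (line_number, markup) for spans that need a second look."""
    flagged = []
    for match in SPAN.finditer(html):
        cls, body = match.groups()
        line = html.count("\n", 0, match.start()) + 1
        if cls == "smcap":
            suspicious = SMCAP_SHAPE.fullmatch(body) is None
        else:
            suspicious = body != body.upper() or re.search(r"\s", body) is not None
        if suspicious:
            flagged.append((line, match.group(0)))
    return flagged


page = (
    '<span class="smcap">T<b>HE</b></span>\n'
    '<span class="smmaj">of</span>\n'
    '<span class="smcap">Paper,</span>'
)
for line, markup in audit(page):
    print(line, markup)
```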

20180207 15:12—I welcome and request constructive suggestions, comments, and questions—Richard.