Guide to smallcaps

From DPWiki
DP Official Documentation - Post-Processing and Post-Processing Verification
Content of this page is being reviewed. If you have questions, please contact one of the page editors (shown in the footer at the bottom of this page).

This article is aimed at post-processors who need to convert <sc> markup into a final form in an etext. During the rounds, refer to the proofreading or formatting guidelines as appropriate.

Project-level Checks

When post-processing your book, before splitting to HTML and plaintext, confirm that <sc> has been used (and closed) correctly. The best way to do this, unfortunately, is to look at each instance, using a search for <sc>.
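As a first pass before the manual review described above, a small script can flag tags that are obviously unbalanced. This is only a sketch in Python; the function name find_sc_problems is hypothetical, and it assumes that <sc> spans are never meant to nest. It does not replace looking at each instance.

```python
import re

def find_sc_problems(text):
    """Return (line_number, message) pairs for unbalanced or nested <sc> tags."""
    problems = []
    open_line = None  # line number of the currently open <sc>, if any
    for match in re.finditer(r"</?sc>", text):
        line_no = text.count("\n", 0, match.start()) + 1
        if match.group() == "<sc>":
            if open_line is not None:
                problems.append(
                    (line_no, "nested <sc> (previous one opened on line %d)" % open_line)
                )
            open_line = line_no
        else:
            if open_line is None:
                problems.append((line_no, "</sc> without a matching <sc>"))
            open_line = None
    if open_line is not None:
        problems.append((open_line, "<sc> never closed"))
    return problems
```

An empty list means only that the tags pair up; whether each span covers the right words still has to be checked by eye.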

Next, consider doing related checks, for reasons which at this moment must be all too obvious. For example, if you have some <sc>A.D.</sc>'s in your project, it's a great plan to search for just A.D. (case insensitive, so that it picks up a.d. too). Note that some common abbreviations may be printed in either capital or small letters, so P.M. and p.m. may both appear quite validly in DP-era books. Our Aim Here is to Match the Printer's Text!
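The related check can be sketched in Python as a case-insensitive search that reports only the occurrences lying outside any <sc> span. The function name find_unmarked is hypothetical, and the sketch assumes the <sc> tags are already balanced. Every hit still needs a human decision, since an unmarked a.d. may be exactly what the printer set.

```python
import re

def find_unmarked(text, abbreviation):
    """List (line_number, matched_text) for occurrences outside any <sc>...</sc> span."""
    # Character ranges covered by existing small-caps markup.
    sc_spans = [m.span() for m in re.finditer(r"<sc>.*?</sc>", text, re.DOTALL)]
    hits = []
    for m in re.finditer(re.escape(abbreviation), text, re.IGNORECASE):
        inside = any(start <= m.start() and m.end() <= end for start, end in sc_spans)
        if not inside:
            line_no = text.count("\n", 0, m.start()) + 1
            hits.append((line_no, m.group()))
    return hits
```

For example, find_unmarked(book_text, "A.D.") would list each bare A.D. or a.d. with its line number, where book_text is a placeholder for the contents of your project file.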

Correct DP-internal formatting, fresh from the rounds, looks like this:

  • This is Small Caps formats as <sc>This is Small Caps</sc>
  • You cannot be serious about AARDVARKS! formats as You cannot be serious about <sc>AARDVARKS</sc>!

[This last one trips up many formatters. Even if it looks smaller than the font elsewhere on the page, when it's a heading in ALLCAPS that's all the same size, it does not get smallcap treatment. It's a font size change, which we don't mark, and the blank lines for the heading are the only markup needed to set it apart.]

You may also see the old-DP-style <sc>a.d.</sc> - this is plain wrong according to the formatting guidelines. Uppercase these before going any further.

The Plaintext Version

Decisions, Decisions

Most of the time, you will simply change Small Caps into ALL CAPS for the plain-text file. But sometimes this will not work.

  • If every other word in the book is small-capped, ALL CAPS will make it seem as if you are shouting at the reader.
  • If the text uses both Small Caps and ALL CAPS, you may want to distinguish them.
  • If abbreviations like "Sec" and "SEC" mean different things, you have to distinguish them or risk losing meaning.

In situations like these, think of small caps as just another kind of text formatting, alongside italics and boldface. Pick a markup such as +, = or # that you're not using for anything else, and use it for small caps. For all small caps (as in A.D.), you will probably want to leave them as CAPITALS without further markup.

If you use characters like #This# or something similar to mark small caps, it's best to explain the meaning in a transcriber's note so that readers will know what it is.


Most of the time the following regular expression search and replace is all that you will need for small caps in the plaintext version. It UPPERCASES all smallcap-marked text (whether it started as upper or mixed upper and lower case) and removes the <sc> tags.

Search: <sc>((.|\n)+?)</sc>
Replace: \U$1

Note: The \U (capitalization) option is not available in all editors.
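If your editor lacks \U, the same replacement can be done with a short script. The following is a minimal sketch in Python, whose standard re module has no case-changing replacement escapes, so a replacement function does the uppercasing instead. The name sc_to_upper is hypothetical.

```python
import re

# Non-greedy match of everything between <sc> and </sc>, across line breaks.
SC_SPAN = re.compile(r"<sc>(.+?)</sc>", re.DOTALL)

def sc_to_upper(text):
    """Uppercase the contents of every <sc> span and drop the tags."""
    return SC_SPAN.sub(lambda m: m.group(1).upper(), text)
```

Any other markup nested inside a span (such as <i>) would be uppercased along with the text, so it is safest to check for that before running it.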

If you want to treat ALL SMALL CAPS differently from Mixed Case Small Caps, use the following regex to find only ALL CAPS small caps spans. This version simply removes the markup, leaving plain all caps:

Search: <sc>(\P{IsLower}+?\n?)</sc>
Replace: $1

After you've changed the all small caps, the only remaining <sc> tags should be for mixed case. If you want to insert some markers around it like =this= while keeping the mixed case, use this regex:

Search: <sc>((.|\n)+?)</sc>
Replace: =$1=
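The two-step treatment above (strip the tags from all-caps spans, then wrap mixed-case spans in a marker) can be sketched as a single pass in Python. Python's re module does not support \P{IsLower}, so this sketch tests each span for lower-case letters with str.islower() instead. The function name sc_to_plaintext and its marker parameter are hypothetical.

```python
import re

SC_SPAN = re.compile(r"<sc>(.+?)</sc>", re.DOTALL)

def sc_to_plaintext(text, marker="="):
    """Strip tags from all-caps spans; wrap mixed-case spans in the marker."""
    def replace(m):
        inner = m.group(1)
        if any(ch.islower() for ch in inner):
            return marker + inner + marker  # mixed case: keep the case, add markers
        return inner  # no lower-case letters: plain capitals, no markup
    return SC_SPAN.sub(replace, text)
```

Because each span is classified as it is replaced, the order-of-operations concern of the two separate regexes does not arise here.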

Once you've also dealt with <i>, <b> and <tb>, it's worth doing a quick search for < and > to make sure all the little varmints have been seen to. You don't want them messing with the precious bodily fluids of your plaintext!
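That final sweep is easy to script as well. A minimal Python sketch, with the hypothetical name leftover_markup, lists every line that still contains an angle bracket:

```python
def leftover_markup(text):
    """Return (line_number, line) for every line still containing < or >."""
    return [
        (number, line)
        for number, line in enumerate(text.splitlines(), start=1)
        if "<" in line or ">" in line
    ]
```

A hit is not necessarily an error, since a book may legitimately print angle brackets, but each one deserves a look.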

The HTML Version

There are various options here. Pick One and Stick To It. If you have a good reason to change this, do—but make it an informed decision, and if you're not sure, standardise.

CSS and HTML Background Information

CSS provides a font-variant: small-caps; option for applying small caps. You can create a class for this with whatever name you choose, such as "smcap":

.smcap   { font-variant:small-caps; }

Any text marked with class="smcap" will then have the CSS small caps property, which causes lower-case letters to appear as shorter capital letters.

Another CSS option often used when handling small caps is text-transform, which allows you to display text as uppercase or lowercase while the underlying text may be different. For example:

.lowercase   { text-transform:lowercase; }

Often PPers use the <span> tag for applying small caps in the HTML, but this is not always necessary. If a small-capped phrase already has markup all around it, you can apply the "smcap" class directly to that tag, as in a heading:

<h2 class="smcap">Chapter VII</h2>
instead of: <h2><span class="smcap">Chapter VII</span></h2>

Usual Handling

Put both of these CSS classes into the CSS style section of your HTML file:

.smcap   { font-variant:small-caps; }
.lowercase   { text-transform:lowercase; }

then use both these regexes:

Search: <sc>(\P{IsLower}+?\n?)</sc>
Replace: <span class="smcap lowercase">$1</span>

followed by

Search: <sc>((.|\n)+?)</sc>
Replace: <span class="smcap">$1</span>

Both regexes apply class="smcap", but the first one adds a second class, "lowercase", to the <sc> words and phrases that contain no lower-case letters. The "lowercase" style switches A-Z into a-z on the screen, and when the small-caps styling is then applied the text ends up as lower-case small caps. The underlying text is still written in all capitals (so people who copy and paste the HTML text will get capital letters, not lower case). This way you get true lower-case small caps without shrinking the font, which is prone to resizing differently from mixed-case small caps. Ugh! BOTH regexes are needed, in the order shown: the second regex matches any <sc> span, so it must run after the first has dealt with the all-caps ones, and Mixed Case Smallcaps would obviously look odd if the "lowercase" class were applied to them, making them ALL SMALLER-SIZE.
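For those who prefer a script to editor regexes, the usual handling can be sketched in Python as follows. Python's re module does not support \P{IsLower}, so the sketch checks each span for lower-case letters with str.islower(); the function name sc_to_html is hypothetical, and the class names are the ones defined in the CSS above.

```python
import re

SC_SPAN = re.compile(r"<sc>(.+?)</sc>", re.DOTALL)

def sc_to_html(text):
    """Turn <sc> spans into <span> tags, adding "lowercase" to all-caps spans."""
    def replace(m):
        inner = m.group(1)
        if any(ch.islower() for ch in inner):
            css_class = "smcap"            # mixed case: small caps only
        else:
            css_class = "smcap lowercase"  # all caps: display as lower-case small caps
        return '<span class="%s">%s</span>' % (css_class, inner)
    return SC_SPAN.sub(replace, text)
```

The underlying text is left untouched in both cases; only the classes differ.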

Other Situations

Lower Case

If your all-small-cap phrases seem to be a form of emphasized lower case, you may want to convert the letters to actual lower case. For example, in a linguistics text if the phrase "To have" appears at the start of a sentence and "to have" in the middle, all of the letters except for the taller capital "T" make sense as lower case letters; the book could have shown it as To have and to have or the equivalent using italics or quotation marks. If you want to lower-case the letters inside <sc> markup, use this regex:

Search: <sc>(\P{IsLower}+?\n?)</sc>
Replace: <span class="smcap">\L$1\E</span>

Note: The \L (lower-case) option is not available in all editors.

The preceding regex is not recommended for text that semantically is capitalized, as in A.D.
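Where \L is unavailable, the lower-casing can be sketched in Python with a replacement function. The name sc_all_caps_to_lower is hypothetical; mixed-case spans are deliberately left alone so that they can be handled by a later step.

```python
import re

SC_SPAN = re.compile(r"<sc>(.+?)</sc>", re.DOTALL)

def sc_all_caps_to_lower(text):
    """Lower-case the text of all-caps <sc> spans and wrap it in a smcap span."""
    def replace(m):
        inner = m.group(1)
        if any(ch.islower() for ch in inner):
            return m.group(0)  # mixed case: leave the <sc> markup untouched
        return '<span class="smcap">%s</span>' % inner.lower()
    return SC_SPAN.sub(replace, text)
```

Unlike the "lowercase" CSS class, this changes the underlying text, so copied text will come out in lower case.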

The <small> Tag

If your all-small-capped phrases are generally abbreviations such as A.D. you may want to mark them with the HTML <small> tag or something similar. This would be done using:

Search: <sc>(\P{IsLower}+?\n?)</sc>
Replace: <small>$1</small>


Since the <span> tag has no meaning unless it applies some CSS, if the reader views the text with CSS turned off the <span> will have no effect. If the small caps in your book are a form of emphasis, then you may want to consider using the <em> (emphasis) tag instead to apply small-cap styling. Here is the CSS:

em.smcap {font-style: normal; font-weight: normal; font-variant: small-caps;}

In the HTML you would use the same markup as any of the options shown above, but with an <em> tag instead of a <span>.

If CSS is turned off, then the default meaning of <em> will apply (usually italics); with CSS turned on it will appear as (unitalicized) small caps. Thus, in either case the reader will see some form of emphasis.

To comment or request edits to this page, please contact lhamilton or hdmtrad.
