PPTools/Ppgen/Tutorial/NbspNgram


Non-breaking spaces look like any other space. The difference is that Ppgen will not put a line break where there is a non-breaking space. Let's see this in action. In a recent book, I noticed this in the generated text version:

Rfrank-06320511.png

Source Code

 “Say!” Maben turned on him in mock fierceness,
 “I’m of a mind to kick you for overstunting
 on that plane wing. No use being
 too risky—just plain foolishness, that. But,
 kid,” the aviator’s habitually tense face relaxed
 into a boyish grin. “I’ll say you made
 that come down O. K.,—all jake! An oldtimer
 couldn’t have done it prettier. Listen,
 I got a proposition I want to make you!”
 // 064.png

Look at that next-to-last line. The "K." is split from the "O." and that's certainly not what we want. The solution is to put a non-breaking space between the "O." and the "K.".

There are three ways to put a non-breaking space into the source code. One is to type an actual Unicode non-breaking space character (U+00A0). This works, but I don't recommend it because the character looks like an ordinary space in most editors, so you can't tell by looking whether it is really there. I'd rather see it in the source code. For that, Ppgen provides two visible forms: "\ " and "\_". (The second form is useful at the end of a line, where many editors would strip the trailing space if the first form were used.) So I'll change the source to use "\ " and regenerate.
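If you have many such spots to handle, a small script can help. The following is only a sketch in Python (my choice for illustration; it is not part of Ppgen), and the function names `show_nbsp` and `protect_initials` are made up for this example. It assumes the two initials sit on the same source line, and its output should be treated as suggestions to review, since a pattern like "O. K." can also occur where a sentence ends on a single capital letter.

```python
import re

NBSP = "\u00a0"  # the literal Unicode non-breaking space character

# A single capital letter plus period, followed by an ordinary space,
# followed by another capital-plus-period (e.g. "O. K." or "J. R. R.").
INITIAL_PAIR = re.compile(r"\b([A-Z]\.) (?=[A-Z]\.)")


def show_nbsp(line):
    """Rewrite literal U+00A0 characters into Ppgen's visible forms.

    A non-breaking space at the very end of the line becomes "\\_",
    so that an editor stripping trailing spaces cannot remove it.
    Every other one becomes backslash-space.
    """
    body = line.rstrip("\n")       # strip only the newline, not U+00A0
    newline = line[len(body):]
    if body.endswith(NBSP):
        body = body[:-1] + "\\_"
    return body.replace(NBSP, "\\ ") + newline


def protect_initials(line):
    """Insert backslash-space between adjacent single-letter initials."""
    return INITIAL_PAIR.sub(lambda m: m.group(1) + "\\ ", line)
```

Running `protect_initials` over the line containing `O. K.,` yields that line with `O.\ K.,` in it, which is the same change I make by hand here.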

Rfrank-06320512.png

Source Code

 “Say!” Maben turned on him in mock fierceness,
 “I’m of a mind to kick you for overstunting
 on that plane wing. No use being
 too risky—just plain foolishness, that. But,
 kid,” the aviator’s habitually tense face relaxed
 into a boyish grin. “I’ll say you made
 that come down O.\ K.,—all jake! An oldtimer
 couldn’t have done it prettier. Listen,
 I got a proposition I want to make you!”
 // 064.png

Looking at that, the "O. K." is correct, but now I notice that "oldtimer" doesn't look right. Might that be hyphenated? Google's Ngram Viewer is helpful in this situation. For example, in older books the word "tonight" was hyphenated. In modern spelling, it isn't. When did that change? Let's take a look. Red is "to-night," blue is "tonight," and the bottom axis is marked by decade:

Rfrank-06320513.png

Now let's try the Ngram Viewer on "oldtimer" and "old-timer".

Rfrank-06320514.png

Hmmm. This book was printed in 1930. Let's look at the original page image.

Rfrank-06320515.png

We see that the original page was ambiguous. The proofers correctly flagged the word as a maybe-hyphen with "old-*timer", but the flag somehow got lost in post-processing (for the sake of this example). The PPer makes the final call. The ngram for "old-timer" in 1930, together with the fact that the word is hyphenated in another place in this same book, makes it an easy decision. It becomes "old-timer" and we're done.

Rfrank-06320516.png

Source Code

 “Say!” Maben turned on him in mock fierceness,
 “I’m of a mind to kick you for overstunting
 on that plane wing. No use being
 too risky—just plain foolishness, that. But,
 kid,” the aviator’s habitually tense face relaxed
 into a boyish grin. “I’ll say you made
 that come down O.\ K.,—all jake! An old-timer
 couldn’t have done it prettier. Listen,
 I got a proposition I want to make you!”
 // 064.png
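
Part of that decision rested on finding the word hyphenated elsewhere in the same book. That check can be roughly automated. The following is a sketch in Python under simple assumptions: the helper name `hyphenation_conflicts` is hypothetical, words are taken to be runs of ASCII letters joined by plain hyphens, and comparison is case-insensitive. It does not query the Ngram Viewer; it only looks inside the text you give it.

```python
import re
from collections import Counter

# Runs of letters, optionally joined by single hyphens ("old-timer").
WORD = re.compile(r"[A-Za-z]+(?:-[A-Za-z]+)*")


def hyphenation_conflicts(text):
    """Find words that occur both with and without a hyphen.

    Returns a dict mapping each hyphenated form to a tuple
    (count of hyphenated form, count of joined form).
    """
    counts = Counter(w.lower() for w in WORD.findall(text))
    conflicts = {}
    for word, n in counts.items():
        if "-" in word:
            joined = word.replace("-", "")
            if joined in counts:
                conflicts[word] = (n, counts[joined])
    return conflicts
```

Given the whole book as one string, an entry for "old-timer" in the result means both spellings are present and one of them deserves a look at the page image.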