Byte-order mark

From DPWiki

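A byte-order mark (BOM) is a special Unicode character, U+FEFF, that some programs place at the very start of a text file. In a UTF-8 file it appears as the three bytes EF BB BF. See the Wikipedia article for more details.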
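For the curious, the following Python 3 sketch (standard library only) shows what the BOM looks like at the byte level. The character U+FEFF, encoded as UTF-8, is the three bytes EF BB BF, which Python also provides as the constant `codecs.BOM_UTF8`.

```python
import codecs

# The byte-order mark is the Unicode character U+FEFF.
BOM_CHAR = "\ufeff"

# Encoded as UTF-8, that single character becomes three bytes.
bom_bytes = BOM_CHAR.encode("utf-8")

print(bom_bytes)                     # b'\xef\xbb\xbf'
print(bom_bytes == codecs.BOM_UTF8)  # True
print(bom_bytes.hex(" ").upper())    # EF BB BF  (bytes.hex(sep) needs Python 3.8+)
```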

The BOM should be removed from UTF-8 plain-text or HTML files before uploading them for Smooth Reading, PPV, or directly to Project Gutenberg. (Submitters use different toolsets, and it is not always easy to tell whether a BOM is present or to remove it if it is. So don't panic if you are not sure: Project Gutenberg has automation in place to ensure that errant BOMs are not included in the final release.)
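If you are not sure whether a file has a BOM, one way to check is to compare its first three bytes with EF BB BF. Below is a minimal Python 3 sketch; the function name `has_utf8_bom` and the sample file names are made up for illustration and are not part of any DP or Project Gutenberg tool.

```python
import codecs
import os
import tempfile

def has_utf8_bom(path: str) -> bool:
    """Return True if the file at `path` starts with the UTF-8 BOM (EF BB BF)."""
    with open(path, "rb") as f:  # binary mode: compare raw bytes, no decoding
        return f.read(3) == codecs.BOM_UTF8

# Usage: create two small sample files, one with and one without a BOM.
tmp = tempfile.mkdtemp()
with_bom = os.path.join(tmp, "with_bom.txt")
without_bom = os.path.join(tmp, "without_bom.txt")
with open(with_bom, "wb") as f:
    f.write(codecs.BOM_UTF8 + "Chapter I\n".encode("utf-8"))
with open(without_bom, "wb") as f:
    f.write("Chapter I\n".encode("utf-8"))

print(has_utf8_bom(with_bom))     # True
print(has_utf8_bom(without_bom))  # False
```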

Removing the BOM

How the BOM is removed depends on your computer's operating system.


Windows OS

Notepad

  1. Back up the file.
  2. Open Windows Notepad.
  3. Open the file, manually specifying ANSI encoding: File -> Open -> choose the file (the "Encoding" box will switch to UTF-8 automatically) -> change Encoding to ANSI -> click Open.
  4. Delete the first three characters of the first line of the file (they should appear as ï»¿).
  5. Save the file.
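Steps 3 to 5 work because Notepad, told to read the file as ANSI, shows each of the three BOM bytes as a separate character, so deleting those three characters deletes the BOM. The Python 3 sketch below does the same thing at the byte level. It assumes "ANSI" means the Windows-1252 code page, the usual one on Western-language Windows systems; `strip_bom` is a hypothetical helper name.

```python
import codecs

BOM = codecs.BOM_UTF8  # b'\xef\xbb\xbf'

# Read through the Windows-1252 ("ANSI") code page, the three BOM bytes
# become three ordinary characters, which is what Notepad displays.
shown = BOM.decode("cp1252")  # 'ï»¿'

def strip_bom(data: bytes) -> bytes:
    """Remove a leading UTF-8 BOM if present; otherwise return data unchanged."""
    if data.startswith(BOM):
        return data[len(BOM):]
    return data

raw = BOM + "The Project Gutenberg eBook".encode("utf-8")
clean = strip_bom(raw)
print(clean[:3])                  # b'The'
print(strip_bom(clean) == clean)  # True: stripping a second time changes nothing
```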

Notepad++

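Under the Encoding menu, you can check the current character encoding of your file. If "Encode in UTF-8" is marked, the BOM is present. To remove it, select Encoding -> Convert to UTF-8 without BOM. If you check the encoding again, it should now show "Encode in UTF-8 without BOM". (Newer versions of Notepad++ label these differently: "Encode in UTF-8-BOM" means the BOM is present, "Encode in UTF-8" means it is not, and Encoding -> Convert to UTF-8 removes it.)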
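The effect of converting to UTF-8 without a BOM can be approximated in Python 3 with the standard `utf-8-sig` codec, which drops a leading BOM when decoding; re-encoding as plain `utf-8` then writes the text back without one. This is a sketch of the idea, not Notepad++'s actual implementation, and `convert_to_utf8_without_bom` and `sample.html` are hypothetical names.

```python
import tempfile
from pathlib import Path

def convert_to_utf8_without_bom(path: Path) -> None:
    """Rewrite a UTF-8 file in place so that it has no BOM.

    Decoding with "utf-8-sig" drops a leading BOM if one is present (and
    changes nothing otherwise); re-encoding with plain "utf-8" never adds one.
    Working with bytes instead of text mode keeps line endings (e.g. CRLF) intact.
    """
    text = path.read_bytes().decode("utf-8-sig")
    path.write_bytes(text.encode("utf-8"))

# Usage with a throwaway file.
sample = Path(tempfile.mkdtemp()) / "sample.html"
sample.write_bytes("\ufeff<p>café</p>\r\n".encode("utf-8"))

convert_to_utf8_without_bom(sample)
print(sample.read_bytes())  # b'<p>caf\xc3\xa9</p>\r\n'
```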

Mac OS

Look through your text editor's preferences. If you find an option for BOM, turn it off by default. It is much easier to leave it out in the first place than to search and destroy it each time.
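To see the difference such a setting makes, this Python 3 sketch writes the same text twice: once with the `utf-8-sig` codec, which adds a BOM (like an editor with the BOM option turned on), and once with plain `utf-8`, which does not. It then prints the first three bytes of each file. The file names are arbitrary examples.

```python
import tempfile
from pathlib import Path

folder = Path(tempfile.mkdtemp())
text = "Chapter I\n"

# "utf-8-sig" writes a BOM first, like saving with a BOM option turned on.
(folder / "with_bom.txt").write_text(text, encoding="utf-8-sig")
# Plain "utf-8" writes no BOM, which is what you want for DP and PG files.
(folder / "no_bom.txt").write_text(text, encoding="utf-8")

for name in ("with_bom.txt", "no_bom.txt"):
    first_three = (folder / name).read_bytes()[:3]
    print(name, first_three.hex())
# with_bom.txt efbbbf
# no_bom.txt 436861
```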

There are several text editors that make it easy to save without the BOM.

TextWrangler and BBEdit

Make sure that the Text Status Display tab in Preferences is set to show the status bar, and that at least the Text encoding checkbox is checked. Then you can set the character encoding using the pull-down menu at the bottom of any text window. The choice you want is Unicode (UTF-8).

There may be minor differences, but TextWrangler and BBEdit appear to be nearly identical for this preference.

SubEthaEdit

In Preferences, go to the Edit tab and use the "mode" pop-up menu to select Default (which applies to .txt files). At the very bottom of the window, make sure the "Save UTF-8 encoded files with BOM" checkbox is unchecked. Next, use the pop-up menu to select HTML mode. If you have never changed its settings, it will say "use default" and everything will be grayed out, so you don't need to do anything. If you do need different settings for HTML than for .txt, make sure the BOM checkbox is unchecked there too.

Linux OS

Open the file in Vim or gVim. Press Esc to make sure you are in Normal mode, then type :set nobomb and press Enter. Save the file (for example, with :w).
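Vim's :set nobomb handles one file at a time. If a project has many .txt and .html files, a short script can strip the BOM from all of them at once. The sketch below assumes Python 3 is installed (it usually is on Linux, and the same script runs on Windows and Mac OS); `strip_boms` and the list of file extensions are illustrative choices, so back up your files before running it on real work.

```python
import codecs
import tempfile
from pathlib import Path

BOM = codecs.BOM_UTF8  # b'\xef\xbb\xbf'

def strip_boms(folder: Path, suffixes=(".txt", ".htm", ".html")):
    """Remove a leading UTF-8 BOM from every matching file under `folder`.

    Returns the list of files that were changed. Files without a BOM are
    left untouched (not even rewritten).
    """
    changed = []
    for path in sorted(folder.rglob("*")):
        if path.is_file() and path.suffix.lower() in suffixes:
            data = path.read_bytes()
            if data.startswith(BOM):
                path.write_bytes(data[len(BOM):])
                changed.append(path)
    return changed

# Usage on a throwaway folder; point it at your own project directory instead.
project = Path(tempfile.mkdtemp())
(project / "book.txt").write_bytes(BOM + b"Title\r\n")
(project / "book.html").write_bytes(b"<html></html>\n")
(project / "images").mkdir()

for p in strip_boms(project):
    print("removed BOM from", p.name)  # removed BOM from book.txt
```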