UTF-8 is a widely used, standardized method of encoding Unicode characters as a sequence of bytes (also called octets: numbers between 0 and 255, inclusive). One benefit is that the first 128 Unicode characters are encoded exactly as they are in ASCII, so any pure ASCII text is already valid UTF-8.
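The ASCII compatibility can be checked with Python's built-in `str.encode`. This is only a small sketch; the sample string and the variable names are chosen here for illustration.

```python
# Sample text containing only ASCII characters (illustrative choice).
text = "Distributed Proofreaders"

ascii_bytes = text.encode("ascii")
utf8_bytes = text.encode("utf-8")

# For pure ASCII text the two encodings produce identical bytes.
same_bytes = ascii_bytes == utf8_bytes

# Each of the first 128 code points encodes to a single byte of the same value.
all_match = all(chr(n).encode("utf-8") == bytes([n]) for n in range(128))

print(same_bytes, all_match)
```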
Byte values from 0 to 127, inclusive, represent the usual ASCII characters. Byte values from 128 to 191, inclusive, are continuation bytes, each carrying a block of 6 bits from a larger Unicode code point. Byte values of 192 and above are lead bytes: their high bits determine how many 6-bit blocks follow, and their remaining low bits hold the first few bits of the code point. (The values 192, 193, and 245 to 255 never occur in valid UTF-8.)
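The byte layout can be sketched as a small hand-written encoder. The function name `utf8_encode` is hypothetical and exists only for this illustration; in practice Python's built-in `str.encode("utf-8")` does the same job.

```python
def utf8_encode(code_point: int) -> bytes:
    """Encode one Unicode code point as UTF-8 (illustrative helper)."""
    if not 0 <= code_point <= 0x10FFFF or 0xD800 <= code_point <= 0xDFFF:
        raise ValueError("not an encodable Unicode code point")

    # 0 to 127: a single byte, identical to ASCII.
    if code_point < 0x80:
        return bytes([code_point])

    # Choose the lead-byte prefix and the number of 6-bit blocks that follow.
    if code_point < 0x800:
        prefix, n_cont = 0xC0, 1   # lead byte 110xxxxx
    elif code_point < 0x10000:
        prefix, n_cont = 0xE0, 2   # lead byte 1110xxxx
    else:
        prefix, n_cont = 0xF0, 3   # lead byte 11110xxx

    # The lead byte carries the prefix plus the highest bits of the code point.
    out = [prefix | (code_point >> (6 * n_cont))]

    # Each continuation byte is 10xxxxxx (128 to 191) and carries 6 bits.
    for i in range(n_cont - 1, -1, -1):
        out.append(0x80 | ((code_point >> (6 * i)) & 0x3F))
    return bytes(out)


# Compare the sketch against the built-in encoder for a few characters.
for ch in ("A", "\u00e9", "\u20ac", "\U0001F600"):
    assert utf8_encode(ord(ch)) == ch.encode("utf-8")
    print(f"U+{ord(ch):04X}", list(utf8_encode(ord(ch))))
```

The sketch handles one code point at a time and rejects surrogates, which UTF-8 does not encode.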
Incidentally, Latin-1 characters with Unicode numbers from 128 to 191, inclusive, are encoded as a byte with value 194 followed by (the code of) the character itself; Latin-1 characters from 192 to 255 are encoded as 195 followed by the character code minus 64.
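The two-byte pattern for the upper half of Latin-1 can be checked against Python's built-in encoder. In this sketch the lead byte comes out as 194 for code points 128 to 191 and as 195 for code points 192 to 255. The helper name `latin1_high_to_utf8` is hypothetical.

```python
def latin1_high_to_utf8(code_point: int) -> tuple:
    """Return the (lead, trail) UTF-8 byte values for a Latin-1 code point in 128..255."""
    if not 128 <= code_point <= 255:
        raise ValueError("expected a code point from 128 to 255")
    if code_point < 192:
        # 128..191: lead byte 194, trail byte equals the character code.
        return (194, code_point)
    # 192..255: lead byte 195, trail byte is the character code minus 64.
    return (195, code_point - 64)


# Verify the shortcut against the real encoder for the whole range.
for cp in range(128, 256):
    assert bytes(latin1_high_to_utf8(cp)) == chr(cp).encode("utf-8")

print(latin1_high_to_utf8(0xA9))  # copyright sign
print(latin1_high_to_utf8(0xE9))  # e with acute accent
```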
For information about Post-Processing and UTF-8, please review the Post-Processing FAQ.