Unicode

From DPWiki

Unicode is a much larger character set than ASCII. It achieves this by widening the range of numbers used to identify characters: ASCII uses 7 bits (128 values), whereas Unicode code points run from 0 to 0x10FFFF, which needs 21 bits (fixed-width storage such as UTF-32 uses 32 bits per character). Besides ASCII and characters from related alphabets, the repertoire includes Asian and Arabic scripts, including historical ones, as well as mathematical, technical and other symbols. The intention is to give every character a unique code point. The set of assigned characters is still growing and is maintained by the Unicode Consortium.
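
As a small illustration of code points, here is a minimal Python sketch. Python and the sample characters are chosen only for illustration, and the helper name code_point_label is hypothetical. It prints the number assigned to a few characters in the usual U+XXXX notation.

```python
# Sample characters written as escapes so the source file stays pure ASCII:
# A, e-acute, Greek capital omega, a CJK ideograph, and an emoji.
samples = ["A", "\u00e9", "\u03a9", "\u4e2d", "\U0001F600"]


def code_point_label(ch):
    """Return the conventional U+XXXX label for a single character."""
    # ord() gives the character's code point as an integer.
    return f"U+{ord(ch):04X}"


labels = [code_point_label(ch) for ch in samples]
for ch, label in zip(samples, labels):
    print(ch, label)
```

The built-in chr() goes the other way, from a code point number back to the character.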

The most common characters all have code points below 2^16 = 65536, and the first 256 code points correspond to ASCII with the Latin-1 extensions. Text using the Unicode character set is usually saved or transmitted in a special character encoding such as UTF-8 or UTF-16. These can represent any character from the Unicode character set. UTF-8 wastes very little space when most of the characters are from the ASCII subset, because it stores each ASCII character in a single byte; UTF-16 uses two bytes for each character below 65536 and four bytes for the rest.
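
To make the space trade-off concrete, the following Python sketch compares how many bytes a few strings occupy in UTF-8 and UTF-16. The helper name encoded_sizes and the sample strings are assumptions made for this example. The "utf-16-le" codec is used so that no byte order mark is added to the count.

```python
def encoded_sizes(text):
    """Return (utf8_bytes, utf16_bytes) for text, with no byte order mark."""
    return len(text.encode("utf-8")), len(text.encode("utf-16-le"))


# Pure ASCII, ASCII plus one Latin-1 letter, two CJK ideographs, one emoji.
for sample in ["hello", "h\u00e9llo", "\u4e2d\u6587", "\U0001F600"]:
    utf8_len, utf16_len = encoded_sizes(sample)
    print(repr(sample), "UTF-8:", utf8_len, "bytes", "UTF-16:", utf16_len, "bytes")
```

For the pure ASCII string, UTF-8 needs one byte per character while UTF-16 needs two. For the CJK sample the relation reverses, since those characters take three bytes each in UTF-8 and two in UTF-16.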