UTF-16

From DPWiki

In computing, UTF-16 (16-bit Unicode Transformation Format) is a variable-length character encoding for Unicode, capable of encoding the entire Unicode repertoire. The encoding form maps each Unicode code point to a sequence of one or two 16-bit words, called code units.
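The mapping from code points to code units can be sketched in a few lines of Python. This is an illustrative implementation of the standard UTF-16 algorithm, not part of the original article: code points below U+10000 become a single code unit, and supplementary code points are split into a surrogate pair.

```python
def utf16_code_units(cp: int) -> list[int]:
    """Map a Unicode code point to its UTF-16 code unit(s)."""
    if cp < 0x10000:
        # Code points in the Basic Multilingual Plane
        # fit in a single 16-bit code unit.
        return [cp]
    # Supplementary code points (U+10000..U+10FFFF) are encoded
    # as a surrogate pair: subtract 0x10000, then split the
    # remaining 20 bits into two 10-bit halves.
    v = cp - 0x10000
    high = 0xD800 + (v >> 10)    # high (lead) surrogate
    low = 0xDC00 + (v & 0x3FF)   # low (trail) surrogate
    return [high, low]
```

For example, `utf16_code_units(0x1D11E)` yields `[0xD834, 0xDD1E]`, matching the musical G clef row in the table below.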

UTF-16 is the native internal representation of text in the Microsoft Windows NT/2000/XP/CE, Qualcomm BREW, and Symbian operating systems; the Java and .NET bytecode environments; Mac OS X's Cocoa and Core Foundation frameworks; and the Qt cross-platform graphical widget toolkit.

Older Windows NT systems (prior to Windows 2000) support only UCS-2. The Python language environment has used UCS-2 internally since version 2.1, although newer versions can be built to use UCS-4 internally, storing supplementary characters directly instead of as UTF-16 surrogate pairs.

Examples

code point         | character        | UTF-16 code unit(s) | glyph*
122 (hex 7A)       | small z (Latin)  | 007A                | z
27700 (hex 6C34)   | water (Chinese)  | 6C34                | 水
119070 (hex 1D11E) | musical G clef   | D834 DD1E           | 𝄞

* Note: the last example lies outside the Basic Multilingual Plane, so it is encoded as a surrogate pair; its glyph sometimes displays as corrupt in fonts or applications without support for supplementary characters.
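The code-unit values in the table can be checked with Python's built-in UTF-16 codec (this snippet is an illustration added here, not from the original article). Encoding as big-endian UTF-16 without a byte order mark yields the code units directly:

```python
# Verify the table values using Python's UTF-16 codec
# ("utf-16-be" = big-endian, no byte order mark).
for ch in ("z", "\u6C34", "\U0001D11E"):
    raw = ch.encode("utf-16-be")
    # Group the bytes into 16-bit code units and print them in hex.
    units = " ".join(
        f"{(raw[i] << 8) | raw[i + 1]:04X}" for i in range(0, len(raw), 2)
    )
    print(f"U+{ord(ch):04X} -> {units}")
```

Running this prints `U+007A -> 007A`, `U+6C34 -> 6C34`, and `U+1D11E -> D834 DD1E`, matching the three rows of the table.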