UTF-16
In computing, UTF-16 (16-bit Unicode Transformation Format) is a variable-length character encoding for Unicode, capable of encoding the entire Unicode repertoire. The encoding form maps each code point (character) to a sequence of one or two 16-bit words, called code units.
UTF-16 is the native internal representation of text in the Microsoft Windows NT/2000/XP/CE, Qualcomm BREW, and Symbian operating systems; the Java and .NET bytecode environments; Mac OS X's Cocoa and Core Foundation frameworks; and the Qt cross-platform graphical widget toolkit.
Older Windows NT systems (prior to Windows 2000) support only UCS-2, a fixed-width 16-bit encoding that cannot represent supplementary characters. The Python language environment has used UCS-2 internally since version 2.1, although newer versions can use UCS-4 (instead of UTF-16) to store supplementary characters.
Examples
Code point | Character | UTF-16 code unit(s) (hex) | Glyph* |
---|---|---|---|
122 (hex 7A) | small Z (Latin) | 007A | z |
27700 (hex 6C34) | water (Chinese) | 6C34 | 水 |
119070 (hex 1D11E) | musical G clef | D834 DD1E | 𝄞 |
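* The glyph in the last example may not display correctly: U+1D11E lies outside the Basic Multilingual Plane, and some fonts and software do not support such characters.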
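
The code values in the table can be reproduced with a short Python sketch. The function name `utf16_code_units` is hypothetical and chosen only for illustration. It assumes the standard UTF-16 rule: a code point below hex 10000 is stored as a single code unit, while a higher code point has hex 10000 subtracted and the remaining 20 bits split across a high surrogate (starting at D800) and a low surrogate (starting at DC00).

```python
from typing import List


def utf16_code_units(code_point: int) -> List[int]:
    """Return the UTF-16 code units for one Unicode code point (illustrative helper)."""
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError("code point outside the Unicode range")
    if 0xD800 <= code_point <= 0xDFFF:
        raise ValueError("surrogate code points cannot be encoded")
    if code_point < 0x10000:
        # Basic Multilingual Plane: the code point is its own code unit.
        return [code_point]
    # Supplementary planes: split the 20-bit offset into two 10-bit halves.
    offset = code_point - 0x10000
    high = 0xD800 + (offset >> 10)    # high (leading) surrogate
    low = 0xDC00 + (offset & 0x3FF)   # low (trailing) surrogate
    return [high, low]


for cp in (0x7A, 0x6C34, 0x1D11E):
    units = utf16_code_units(cp)
    print(f"U+{cp:04X} -> " + " ".join(f"{u:04X}" for u in units))
# U+007A -> 007A
# U+6C34 -> 6C34
# U+1D11E -> D834 DD1E
```

The same result can be checked against Python's built-in codec: `"\U0001D11E".encode("utf-16-be")` yields the bytes D8 34 DD 1E.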