Categories of Code Conversions

Apache C++ Standard Library User's Guide

40.2 Categories of Code Conversions

Code conversions fall into various categories depending on the properties of the character encodings involved. There are:

Constant-size conversions
Multibyte conversions, which are further divided into:

State-independent conversions
State-dependent conversions

Constant-size conversions are conversions between character encodings in which all characters are of equal size. All single-byte and wide-character encodings are examples of such encodings. Each character stands for itself and can be recognized and translated independently of its context. Conversions between ASCII and EBCDIC, or between Unicode and ISO 10646, are examples of constant-size conversions.
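As a minimal sketch of a constant-size conversion, the following code translates a small subset of ASCII (space, digits, and uppercase letters) to EBCDIC. The function names are hypothetical and not part of the library, and the fallback to the EBCDIC substitute character for unmapped input is a simplification chosen for this illustration. The point is only that each input character yields exactly one output character, without looking at its neighbors.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: maps one ASCII code to its EBCDIC code.
// Only space, digits, and uppercase letters are covered; every other
// input maps to the EBCDIC substitute character 0x3F (a simplification).
unsigned char ascii_to_ebcdic_char(unsigned char c)
{
    if (c == 0x20)              // space
        return 0x40;
    if (c >= 0x30 && c <= 0x39) // '0'..'9'
        return static_cast<unsigned char>(0xF0 + (c - 0x30));
    if (c >= 0x41 && c <= 0x49) // 'A'..'I'
        return static_cast<unsigned char>(0xC1 + (c - 0x41));
    if (c >= 0x4A && c <= 0x52) // 'J'..'R'
        return static_cast<unsigned char>(0xD1 + (c - 0x4A));
    if (c >= 0x53 && c <= 0x5A) // 'S'..'Z'
        return static_cast<unsigned char>(0xE2 + (c - 0x53));
    return 0x3F;
}

// Converts a whole string one character at a time. No state is carried
// from one character to the next.
std::string ascii_to_ebcdic(const std::string& in)
{
    std::string out;
    out.reserve(in.size());
    for (std::size_t i = 0; i < in.size(); ++i) {
        const unsigned char c = static_cast<unsigned char>(in[i]);
        out.push_back(static_cast<char>(ascii_to_ebcdic_char(c)));
    }
    return out;
}
```

Because the mapping is applied per character, the output always has the same length as the input.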

Multibyte conversions involve multibyte encodings, in which characters vary in size. Some multibyte characters consist of two or more bytes, while others are represented by just one byte.

There is a substantial difference between code conversions involving state-dependent character encodings, and conversions between state-independent encodings. (Again, see Section 23.3.)

State-dependent multibyte conversions involve one character encoding that is state-dependent. In state-dependent character encodings, character sequences can have different meanings depending on the current context. State-dependent encodings typically have modes and escape sequences that allow switching between modes. An example of a state-dependent character conversion is the conversion between the state-dependent JIS encoding for Japanese characters and the Unicode wide-character encoding.
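The effect of modes can be illustrated with a toy decoder. This is a sketch only, loosely modeled on two escape sequences used by ISO-2022-JP (ESC $ B to select a two-byte character set, ESC ( B to return to ASCII). The names Mode and count_chars_stateful are hypothetical, and a real decoder would handle more escape sequences and report errors. The sketch shows that the same payload bytes are counted differently depending on the mode in effect when they are read.

```cpp
#include <cstddef>
#include <string>

// Shift state of the toy decoder (hypothetical names).
enum Mode { SingleByte, DoubleByte };

// Counts the characters in a byte string. Decoding starts in one-byte
// mode; ESC $ B switches to two-byte mode, ESC ( B switches back.
// Escape sequences change the mode and are not counted as characters.
// A truncated two-byte character at the end is counted as one character.
std::size_t count_chars_stateful(const std::string& bytes)
{
    Mode mode = SingleByte;
    std::size_t count = 0;
    std::size_t i = 0;

    while (i < bytes.size()) {
        if (bytes[i] == '\x1B' && i + 2 < bytes.size()) {
            if (bytes[i + 1] == '$' && bytes[i + 2] == 'B') {
                mode = DoubleByte;
                i += 3;
                continue;
            }
            if (bytes[i + 1] == '(' && bytes[i + 2] == 'B') {
                mode = SingleByte;
                i += 3;
                continue;
            }
        }
        // The width of the next character depends on the current mode,
        // not on the byte values themselves.
        i += (mode == DoubleByte) ? 2 : 1;
        ++count;
    }
    return count;
}
```

For example, the four bytes 0x30 0x21 0x30 0x21 are counted as four characters when read in one-byte mode, but as two characters when they are enclosed between ESC $ B and ESC ( B. The decoder has to remember the mode between calls, which is why state-dependent conversions need a conversion state object.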

State-independent multibyte conversions involve encodings that do not have modes. A sequence of characters can always be interpreted independently of its context. An example of a state-independent multibyte conversion is the conversion between EUC, which is a state-independent multibyte encoding, and Unicode.
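As a rough illustration of state independence, the sketch below assumes the Japanese variant of EUC (EUC-JP), in which the first byte of a character is enough to determine how many bytes the character occupies. The function names are hypothetical, and error handling is reduced to skipping one byte on an invalid lead byte.

```cpp
#include <cstddef>
#include <string>

// Returns the length in bytes of an EUC-JP character, judged from its
// lead byte alone, or 0 if the byte cannot start a character.
std::size_t euc_jp_char_length(unsigned char lead)
{
    if (lead < 0x80)
        return 1;               // ASCII range
    if (lead == 0x8E)
        return 2;               // single shift 2: one byte follows
    if (lead == 0x8F)
        return 3;               // single shift 3: two bytes follow
    if (lead >= 0xA1 && lead <= 0xFE)
        return 2;               // two-byte character
    return 0;                   // not a valid lead byte
}

// Counts characters without carrying any mode between characters.
// An invalid lead byte is skipped and counted as one unit (a
// simplification for this sketch).
std::size_t count_chars_euc_jp(const std::string& bytes)
{
    std::size_t count = 0;
    std::size_t i = 0;
    while (i < bytes.size()) {
        std::size_t len =
            euc_jp_char_length(static_cast<unsigned char>(bytes[i]));
        if (len == 0)
            len = 1;
        i += len;
        ++count;
    }
    return count;
}
```

Unlike a state-dependent decoder, this loop needs no variable that records a current mode: the bytes of each character carry all the information required to find where the character ends.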