Apache C++ Standard Library User's Guide

40.5 Example 2: Defining a Multibyte Character Code Conversion (JIS <-> Unicode)

Let us consider the example of a state-dependent code conversion. As mentioned previously, this type of conversion would occur between JIS, which is a state-dependent multibyte encoding for Japanese characters, and Unicode, which is a wide-character encoding. As usual, we assume that the external device uses multibyte encoding, and the internal processing uses wide-character encoding.

Here is what you must do to implement and use a state-dependent code conversion facet:

  1. Define a new conversion state type if necessary.

  2. Define a new character traits type if necessary, or instantiate the character traits template with the new state type.

  3. Define the code conversion facet.

  4. Instantiate new stream types using the new character traits type.

  5. Imbue a file stream's buffer with a locale that carries the new code conversion facet.

These steps are explained in detail in the following sections.

40.5.1 Define a New Conversion State Type

While parsing or creating a sequence of multibyte characters in a state-dependent multibyte encoding, the code conversion facet has to maintain a conversion state. By default, this state is of type mbstate_t, the implementation-dependent state type defined in <cwchar>. If this type does not suffice to keep track of the conversion state, you must provide your own conversion state type that satisfies the CopyConstructible requirements.
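
As an illustration, a conversion state for an encoding that switches character sets by escape sequences might simply record which character set is currently selected. The type name JISstate_t and its members below are hypothetical; what the state must actually hold depends on the encoding being implemented.

```cpp
#include <cassert>

// Hypothetical conversion state type; the name and members are
// illustrative and not part of the library.
struct JISstate_t
{
    // Character set selected by the most recent escape sequence.
    enum Charset { ascii, jis_x0208 };

    Charset charset;

    // A default-constructed object represents the initial conversion state.
    JISstate_t() : charset(ascii) {}

    bool is_initial() const { return charset == ascii; }
};

// The implicitly generated copy constructor makes the type CopyConstructible.
inline void state_demo()
{
    JISstate_t shifted;
    shifted.charset = JISstate_t::jis_x0208;

    JISstate_t copy(shifted);          // a copy carries the shift state along
    assert(!copy.is_initial());
    assert(JISstate_t().is_initial());
}
```

Making the default-constructed value the initial state is a design choice of this sketch; it lets a stream buffer create a fresh state without knowing anything about the encoding.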

40.5.2 Define a New Character Traits Type

The conversion state type is part of the character traits. Hence, with a new conversion state type, you need a new character traits type.

If you do not want to rely on a nonstandard and thus non-portable feature of the library, you must define a new character traits type and redefine the necessary types:
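
A minimal sketch of such a traits type is shown below. The names JIStraits and JISstate_t are hypothetical, and the state type is reduced to a single member. The sketch assumes that only state_type and pos_type, which depends on it, need to change, while the character operations of std::char_traits<wchar_t> can be inherited unchanged.

```cpp
#include <cassert>
#include <ios>       // std::fpos
#include <string>    // std::char_traits

// Hypothetical conversion state type, reduced to a single member here.
struct JISstate_t
{
    int charset;
    JISstate_t() : charset(0) {}
};

// Hypothetical traits type: inherit the wchar_t character operations and
// redefine only the types that depend on the conversion state.
struct JIStraits : public std::char_traits<wchar_t>
{
    typedef JISstate_t            state_type;
    typedef std::fpos<state_type> pos_type;
};

inline void traits_demo()
{
    // Character operations are still those of std::char_traits<wchar_t>.
    assert(JIStraits::length(L"abc") == 3u);

    // A stream position can now carry the new conversion state.
    JISstate_t shifted;
    shifted.charset = 1;
    JIStraits::pos_type pos(0);
    pos.state(shifted);
    assert(pos.state().charset == 1);
}
```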

40.5.3 Define the Code Conversion Facet

Just as in the first example, you must define the actual code conversion facet. The steps are basically the same as before, too: define a new class template for the new code conversion type and specialize it. The code would look like this:
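
A full JIS conversion table is too large for a short listing, so the following is only a simplified, portable approximation. It assumes a hypothetical facet named StateDependentCodecvt that derives from the standard specialization on std::mbstate_t, instead of specializing a new class template on a new state type, and it shows only the members that declare the conversion as state-dependent.

```cpp
#include <cassert>
#include <cstddef>   // std::size_t
#include <cwchar>    // std::mbstate_t
#include <locale>

// Hypothetical facet. Unlike the facet discussed in this section, it keeps
// the default state type std::mbstate_t so that the sketch stays portable.
class StateDependentCodecvt
    : public std::codecvt<wchar_t, char, std::mbstate_t>
{
    typedef std::codecvt<wchar_t, char, std::mbstate_t> base_type;

public:
    explicit StateDependentCodecvt(std::size_t refs = 0) : base_type(refs) {}

protected:
    // -1 identifies the external encoding as state-dependent.
    int do_encoding() const noexcept override { return -1; }

    // false: the stream buffer must always call in() and out().
    bool do_always_noconv() const noexcept override { return false; }

    // A real facet would also override do_in(), do_out(), do_unshift(),
    // do_length() and do_max_length(); they are omitted here.
};

// Installs the facet in a locale and queries it through the base interface.
inline int installed_encoding()
{
    typedef std::codecvt<wchar_t, char, std::mbstate_t> facet_type;
    std::locale loc(std::locale::classic(), new StateDependentCodecvt);
    const facet_type& cvt = std::use_facet<facet_type>(loc);
    assert(!cvt.always_noconv());
    return cvt.encoding();
}
```

Because the derived class inherits the id of its base, the locale stores it in place of the standard wchar_t/char conversion facet, and any code that asks for that facet reaches the overrides.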

In this case, the member function do_encoding() has to return -1, which identifies the code conversion as state-dependent. Again, the member functions in() and out() must conform to the error indication policy explained under class codecvt in the Apache C++ Standard Library Reference Guide.

The distinguishing characteristic of a state-dependent conversion is that the conversion state argument to in() and out() is used for communication between the file stream buffer and the code conversion facet. The file stream buffer is responsible for creating, maintaining, and deleting the conversion state. At the beginning, the file stream buffer creates a conversion state object that represents the initial conversion state and hands it over to the code conversion facet. The facet modifies it according to the conversion it performs. The file stream buffer receives it back and stores it between two subsequent code conversions.
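
The hand-over of the state can be sketched without a complete facet. The free function jis_in() below is hypothetical and only mirrors the parameter list of do_in(); JISstate_t is the illustrative state type assumed throughout. It recognizes the escape sequences ESC ( B (ASCII) and ESC $ B (JIS X 0208), and as a placeholder it packs each double-byte character into one wide character instead of looking up its Unicode value. The caller plays the role of the file stream buffer.

```cpp
#include <cassert>
#include <cstddef>   // std::ptrdiff_t
#include <locale>    // std::codecvt_base

// Hypothetical conversion state: the character set currently selected.
struct JISstate_t
{
    enum Charset { ascii, jis_x0208 };
    Charset charset;
    JISstate_t() : charset(ascii) {}
};

const char ESC = '\x1b';

// Free function shaped like codecvt::do_in().
inline std::codecvt_base::result
jis_in(JISstate_t& state,
       const char* from, const char* from_end, const char*& from_next,
       wchar_t* to, wchar_t* to_end, wchar_t*& to_next)
{
    from_next = from;
    to_next = to;
    while (from_next != from_end) {
        const std::ptrdiff_t avail = from_end - from_next;
        if (*from_next == ESC) {
            if (avail < 3)
                return std::codecvt_base::partial;  // escape sequence incomplete
            if (from_next[1] == '(' && from_next[2] == 'B')
                state.charset = JISstate_t::ascii;
            else if (from_next[1] == '$' && from_next[2] == 'B')
                state.charset = JISstate_t::jis_x0208;
            else
                return std::codecvt_base::error;    // unsupported escape sequence
            from_next += 3;
            continue;
        }
        if (to_next == to_end)
            return std::codecvt_base::partial;      // output buffer full
        if (state.charset == JISstate_t::ascii) {
            *to_next++ = static_cast<unsigned char>(*from_next++);
        } else {
            if (avail < 2)
                return std::codecvt_base::partial;  // second byte missing
            // Placeholder: packs the two bytes instead of looking up the
            // Unicode value in a JIS X 0208 table.
            const unsigned hi = static_cast<unsigned char>(from_next[0]);
            const unsigned lo = static_cast<unsigned char>(from_next[1]);
            *to_next++ = static_cast<wchar_t>((hi << 8) | lo);
            from_next += 2;
        }
    }
    return std::codecvt_base::ok;
}

// Plays the role of the stream buffer: owns the state and passes it to
// every conversion call.
inline void chunked_read_demo()
{
    JISstate_t state;                   // initial conversion state
    wchar_t out[4];
    const char* from_next;
    wchar_t* to_next;

    const char chunk1[] = { 'A', ESC, '$', 'B', 0x30, 0x21 };
    std::codecvt_base::result r =
        jis_in(state, chunk1, chunk1 + sizeof chunk1, from_next,
               out, out + 4, to_next);
    assert(r == std::codecvt_base::ok && to_next - out == 2);
    assert(out[0] == L'A' && out[1] == 0x3021);
    assert(state.charset == JISstate_t::jis_x0208);  // kept between calls

    const char chunk2[] = { 0x30, 0x22, ESC, '(', 'B', 'Z' };
    r = jis_in(state, chunk2, chunk2 + sizeof chunk2, from_next,
               out, out + 4, to_next);
    assert(r == std::codecvt_base::ok && to_next - out == 2);
    assert(out[0] == 0x3022 && out[1] == L'Z');
    assert(state.charset == JISstate_t::ascii);
}
```

The second chunk starts in the middle of a double-byte run and is decoded correctly only because the caller kept the state from the first call.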

40.5.4 Use the New Code Conversion Facet

Here is an example of how the new code conversion facet can be used:

//1  Our Unicode-JIS code conversion needs a conversion state type different from the default type std::mbstate_t. Since the conversion state type is contained in the character traits, we must create a new file stream type.
//2  Here the stream buffer's locale is replaced by a copy of the global locale that has a Unicode-JIS code conversion facet.
//3  The content of the JIS encoded file "/tmp/fil" is read, automatically converted to Unicode, and written to std::wcout.
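
The steps described in notes //2 and //3 can be approximated portably as follows. This sketch makes two simplifying assumptions: it uses the standard std::wifstream rather than a stream type instantiated on new character traits, and it accepts any facet derived from std::codecvt<wchar_t, char, std::mbstate_t>. The helper copy_converted() and the scratch file name are hypothetical.

```cpp
#include <cassert>
#include <cstdio>    // std::remove
#include <cwchar>    // std::mbstate_t
#include <fstream>
#include <locale>
#include <sstream>
#include <string>

typedef std::codecvt<wchar_t, char, std::mbstate_t> codecvt_type;

// Hypothetical helper: reads the file `path` through `facet` and writes the
// converted wide characters to `out`. The locale takes ownership of `facet`.
inline void copy_converted(const char* path, codecvt_type* facet,
                           std::wostream& out)
{
    std::wifstream inp;

    // Replace the stream buffer's locale by a copy of the global locale
    // that carries the code conversion facet.
    std::locale loc(std::locale(), facet);
    inp.rdbuf()->pubimbue(loc);

    // Read the file, convert it, and write the result to `out`.
    inp.open(path);
    out << inp.rdbuf();
}

// Usage with the library's own facet and a scratch file; a custom facet
// derived from codecvt_type would be passed in the same way.
inline std::wstring round_trip_demo()
{
    const char* path = "codecvt_demo.tmp";   // hypothetical scratch file name
    {
        std::ofstream f(path);
        f << "plain ASCII";
    }
    std::wostringstream out;
    copy_converted(path, new codecvt_type, out);
    std::remove(path);
    assert(!out.str().empty());
    return out.str();
}
```

Passing std::wcout as the last argument of copy_converted() gives the behavior described in note //3. The sketch imbues the buffer before opening the file so that no characters are read with the previous facet.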

