Diffstat (limited to 'manual/charset.texi')
-rw-r--r-- | manual/charset.texi | 96 |
1 files changed, 32 insertions, 64 deletions
diff --git a/manual/charset.texi b/manual/charset.texi
index 147d9c579a..1867ace485 100644
--- a/manual/charset.texi
+++ b/manual/charset.texi
@@ -98,9 +98,8 @@ designed to keep one character of a wide character string. To maintain the similarity there is also a type corresponding to @code{int} for those functions that take a single wide character.
-@comment stddef.h
-@comment ISO
 @deftp {Data type} wchar_t
+@standards{ISO, stddef.h}
 This data type is used as the base type for wide character strings. In other words, arrays of objects of this type are the equivalent of @code{char[]} for multibyte character strings. The type is defined in
@@ -123,9 +122,8 @@ resorting to multi-wide-character encoding contradicts the purpose of the @code{wchar_t} type. @end deftp
-@comment wchar.h
-@comment ISO
 @deftp {Data type} wint_t
+@standards{ISO, wchar.h}
 @code{wint_t} is a data type used for parameters and variables that contain a single wide character. As the name suggests this type is the equivalent of @code{int} when using the normal @code{char} strings. The
@@ -143,18 +141,16 @@ As there are for the @code{char} data type macros are available for specifying the minimum and maximum value representable in an object of type @code{wchar_t}.
-@comment wchar.h
-@comment ISO
 @deftypevr Macro wint_t WCHAR_MIN
+@standards{ISO, wchar.h}
 The macro @code{WCHAR_MIN} evaluates to the minimum value representable by an object of type @code{wint_t}. This macro was introduced in @w{Amendment 1} to @w{ISO C90}. @end deftypevr
-@comment wchar.h
-@comment ISO
 @deftypevr Macro wint_t WCHAR_MAX
+@standards{ISO, wchar.h}
 The macro @code{WCHAR_MAX} evaluates to the maximum value representable by an object of type @code{wint_t}.
@@ -163,9 +159,8 @@ This macro was introduced in @w{Amendment 1} to @w{ISO C90}. Another special wide character value is the equivalent to @code{EOF}.
-@comment wchar.h
-@comment ISO
 @deftypevr Macro wint_t WEOF
+@standards{ISO, wchar.h}
 The macro @code{WEOF} evaluates to a constant expression of type @code{wint_t} whose value is different from any member of the extended character set.
@@ -402,18 +397,16 @@ conversion functions (as shown in the examples below). The @w{ISO C} standard defines two macros that provide this information.
-@comment limits.h
-@comment ISO
 @deftypevr Macro int MB_LEN_MAX
+@standards{ISO, limits.h}
 @code{MB_LEN_MAX} specifies the maximum number of bytes in the multibyte sequence for a single character in any of the supported locales. It is a compile-time constant and is defined in @file{limits.h}. @pindex limits.h @end deftypevr
-@comment stdlib.h
-@comment ISO
 @deftypevr Macro int MB_CUR_MAX
+@standards{ISO, stdlib.h}
 @code{MB_CUR_MAX} expands into a positive integer expression that is the maximum number of bytes in a multibyte character in the current locale. The value is never greater than @code{MB_LEN_MAX}. Unlike
@@ -463,9 +456,8 @@ Since the conversion functions allow converting a text in more than one step we must have a way to pass this information from one call of the functions to another.
-@comment wchar.h
-@comment ISO
 @deftp {Data type} mbstate_t
+@standards{ISO, wchar.h}
 @cindex shift state A variable of type @code{mbstate_t} can contain all the information about the @dfn{shift state} needed from one call to a conversion
@@ -501,9 +493,8 @@ state. This is necessary, for example, to decide whether to emit escape sequences to set the state to the initial state at certain sequence points. Communication protocols often require this.
-@comment wchar.h
-@comment ISO
 @deftypefun int mbsinit (const mbstate_t *@var{ps})
+@standards{ISO, wchar.h}
 @safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} @c ps is dereferenced once, unguarded. This would call for @mtsrace:ps, @c but since a single word-sized field is (atomically) accessed, any
@@ -564,9 +555,8 @@ of the multibyte character set.
 In such a scenario, each ASCII character stands for itself, and all other characters have at least a first byte that is beyond the range @math{0} to @math{127}.
-@comment wchar.h
-@comment ISO
 @deftypefun wint_t btowc (int @var{c})
+@standards{ISO, wchar.h}
 @safety{@prelim{}@mtsafe{}@asunsafe{@asucorrupt{} @ascuheap{} @asulock{} @ascudlopen{}}@acunsafe{@acucorrupt{} @aculock{} @acsmem{} @acsfd{}}} @c Calls btowc_fct or __fct; reads from locale, and from the @c get_gconv_fcts result multiple times. get_gconv_fcts calls
@@ -628,9 +618,8 @@ this, using @code{btowc} is required. @noindent There is also a function for the conversion in the other direction.
-@comment wchar.h
-@comment ISO
 @deftypefun int wctob (wint_t @var{c})
+@standards{ISO, wchar.h}
 @safety{@prelim{}@mtsafe{}@asunsafe{@asucorrupt{} @ascuheap{} @asulock{} @ascudlopen{}}@acunsafe{@acucorrupt{} @aculock{} @acsmem{} @acsfd{}}} The @code{wctob} function (``wide character to byte'') takes as the parameter a valid wide character. If the multibyte representation for
@@ -648,9 +637,8 @@ multibyte representation to wide characters and vice versa. These functions pose no limit on the length of the multibyte representation and they also do not require it to be in the initial state.
-@comment wchar.h
-@comment ISO
 @deftypefun size_t mbrtowc (wchar_t *restrict @var{pwc}, const char *restrict @var{s}, size_t @var{n}, mbstate_t *restrict @var{ps})
+@standards{ISO, wchar.h}
 @safety{@prelim{}@mtunsafe{@mtasurace{:mbrtowc/!ps}}@asunsafe{@asucorrupt{} @ascuheap{} @asulock{} @ascudlopen{}}@acunsafe{@acucorrupt{} @aculock{} @acsmem{} @acsfd{}}} @cindex stateful The @code{mbrtowc} function (``multibyte restartable to wide
@@ -743,9 +731,8 @@ away. Unfortunately there is no function to compute the length of the wide character string directly from the multibyte string. There is, however, a function that does part of the work.
-@comment wchar.h
-@comment ISO
 @deftypefun size_t mbrlen (const char *restrict @var{s}, size_t @var{n}, mbstate_t *@var{ps})
+@standards{ISO, wchar.h}
 @safety{@prelim{}@mtunsafe{@mtasurace{:mbrlen/!ps}}@asunsafe{@asucorrupt{} @ascuheap{} @asulock{} @ascudlopen{}}@acunsafe{@acucorrupt{} @aculock{} @acsmem{} @acsfd{}}} The @code{mbrlen} function (``multibyte restartable length'') computes the number of at most @var{n} bytes starting at @var{s}, which form the
@@ -827,9 +814,8 @@ this conversion might be quite expensive. So it is necessary to think about the consequences of using the easier but imprecise method before doing the work twice.
-@comment wchar.h
-@comment ISO
 @deftypefun size_t wcrtomb (char *restrict @var{s}, wchar_t @var{wc}, mbstate_t *restrict @var{ps})
+@standards{ISO, wchar.h}
 @safety{@prelim{}@mtunsafe{@mtasurace{:wcrtomb/!ps}}@asunsafe{@asucorrupt{} @ascuheap{} @asulock{} @ascudlopen{}}@acunsafe{@acucorrupt{} @aculock{} @acsmem{} @acsfd{}}} @c wcrtomb uses a static, non-thread-local unguarded state variable when @c PS is NULL. When a state is passed in, and it's not used
@@ -1015,9 +1001,8 @@ defines conversions on entire strings. However, the defined set of functions is quite limited; therefore, @theglibc{} contains a few extensions that can help in some important situations.
-@comment wchar.h
-@comment ISO
 @deftypefun size_t mbsrtowcs (wchar_t *restrict @var{dst}, const char **restrict @var{src}, size_t @var{len}, mbstate_t *restrict @var{ps})
+@standards{ISO, wchar.h}
 @safety{@prelim{}@mtunsafe{@mtasurace{:mbsrtowcs/!ps}}@asunsafe{@asucorrupt{} @ascuheap{} @asulock{} @ascudlopen{}}@acunsafe{@acucorrupt{} @aculock{} @acsmem{} @acsfd{}}} The @code{mbsrtowcs} function (``multibyte string restartable to wide character string'') converts the NUL-terminated multibyte character
@@ -1100,9 +1085,8 @@ consumed from the input string.
 This way the problem of @code{mbsrtowcs}'s example above could be solved by determining the line length and passing this length to the function.
-@comment wchar.h
-@comment ISO
 @deftypefun size_t wcsrtombs (char *restrict @var{dst}, const wchar_t **restrict @var{src}, size_t @var{len}, mbstate_t *restrict @var{ps})
+@standards{ISO, wchar.h}
 @safety{@prelim{}@mtunsafe{@mtasurace{:wcsrtombs/!ps}}@asunsafe{@asucorrupt{} @ascuheap{} @asulock{} @ascudlopen{}}@acunsafe{@acucorrupt{} @aculock{} @acsmem{} @acsfd{}}} The @code{wcsrtombs} function (``wide character string restartable to multibyte string'') converts the NUL-terminated wide character string at
@@ -1146,9 +1130,8 @@ input characters. One has to place the NUL wide character at the correct place or control the consumed input indirectly via the available output array size (the @var{len} parameter).
-@comment wchar.h
-@comment GNU
 @deftypefun size_t mbsnrtowcs (wchar_t *restrict @var{dst}, const char **restrict @var{src}, size_t @var{nmc}, size_t @var{len}, mbstate_t *restrict @var{ps})
+@standards{GNU, wchar.h}
 @safety{@prelim{}@mtunsafe{@mtasurace{:mbsnrtowcs/!ps}}@asunsafe{@asucorrupt{} @ascuheap{} @asulock{} @ascudlopen{}}@acunsafe{@acucorrupt{} @aculock{} @acsmem{} @acsfd{}}} The @code{mbsnrtowcs} function is very similar to the @code{mbsrtowcs} function. All the parameters are the same except for @var{nmc}, which is
@@ -1199,9 +1182,8 @@ Since we don't insert characters in the strings that were not in there right from the beginning and we use @var{state} only for the conversion of the given buffer, there is no problem with altering the state.
-@comment wchar.h
-@comment GNU
 @deftypefun size_t wcsnrtombs (char *restrict @var{dst}, const wchar_t **restrict @var{src}, size_t @var{nwc}, size_t @var{len}, mbstate_t *restrict @var{ps})
+@standards{GNU, wchar.h}
 @safety{@prelim{}@mtunsafe{@mtasurace{:wcsnrtombs/!ps}}@asunsafe{@asucorrupt{} @ascuheap{} @asulock{} @ascudlopen{}}@acunsafe{@acucorrupt{} @aculock{} @acsmem{} @acsfd{}}} The @code{wcsnrtombs} function implements the conversion from wide character strings to multibyte character strings. It is similar to
@@ -1344,9 +1326,8 @@ conversion functions.} @node Non-reentrant Character Conversion @subsection Non-reentrant Conversion of Single Characters
-@comment stdlib.h
-@comment ISO
 @deftypefun int mbtowc (wchar_t *restrict @var{result}, const char *restrict @var{string}, size_t @var{size})
+@standards{ISO, stdlib.h}
 @safety{@prelim{}@mtunsafe{@mtasurace{}}@asunsafe{@asucorrupt{} @ascuheap{} @asulock{} @ascudlopen{}}@acunsafe{@acucorrupt{} @aculock{} @acsmem{} @acsfd{}}} The @code{mbtowc} (``multibyte to wide character'') function when called with non-null @var{string} converts the first multibyte character
@@ -1379,9 +1360,8 @@ returns nonzero if the multibyte character code in use actually has a shift state. @xref{Shift State}. @end deftypefun
-@comment stdlib.h
-@comment ISO
 @deftypefun int wctomb (char *@var{string}, wchar_t @var{wchar})
+@standards{ISO, stdlib.h}
 @safety{@prelim{}@mtunsafe{@mtasurace{}}@asunsafe{@asucorrupt{} @ascuheap{} @asulock{} @ascudlopen{}}@acunsafe{@acucorrupt{} @aculock{} @acsmem{} @acsfd{}}} The @code{wctomb} (``wide character to multibyte'') function converts the wide character code @var{wchar} to its corresponding multibyte
@@ -1419,9 +1399,8 @@ Similar to @code{mbrlen} there is also a non-reentrant function that computes the length of a multibyte character. It can be defined in terms of @code{mbtowc}.
-@comment stdlib.h
-@comment ISO
 @deftypefun int mblen (const char *@var{string}, size_t @var{size})
+@standards{ISO, stdlib.h}
 @safety{@prelim{}@mtunsafe{@mtasurace{}}@asunsafe{@asucorrupt{} @ascuheap{} @asulock{} @ascudlopen{}}@acunsafe{@acucorrupt{} @aculock{} @acsmem{} @acsfd{}}} The @code{mblen} function with a non-null @var{string} argument returns the number of bytes that make up the multibyte character beginning at
@@ -1458,9 +1437,8 @@ convert entire strings instead of single characters. These functions suffer from the same problems as their reentrant counterparts from @w{Amendment 1} to @w{ISO C90}; see @ref{Converting Strings}.
-@comment stdlib.h
-@comment ISO
 @deftypefun size_t mbstowcs (wchar_t *@var{wstring}, const char *@var{string}, size_t @var{size})
+@standards{ISO, stdlib.h}
 @safety{@prelim{}@mtsafe{}@asunsafe{@asucorrupt{} @ascuheap{} @asulock{} @ascudlopen{}}@acunsafe{@acucorrupt{} @aculock{} @acsmem{} @acsfd{}}} @c Odd... Although this was supposed to be non-reentrant, the internal @c state is not a static buffer, but an automatic variable.
@@ -1501,9 +1479,8 @@ mbstowcs_alloc (const char *string) @end deftypefun
-@comment stdlib.h
-@comment ISO
 @deftypefun size_t wcstombs (char *@var{string}, const wchar_t *@var{wstring}, size_t @var{size})
+@standards{ISO, stdlib.h}
 @safety{@prelim{}@mtsafe{}@asunsafe{@asucorrupt{} @ascuheap{} @asulock{} @ascudlopen{}}@acunsafe{@acucorrupt{} @aculock{} @acsmem{} @acsfd{}}} The @code{wcstombs} (``wide character string to multibyte string'') function converts the null-terminated wide character array @var{wstring}
@@ -1674,9 +1651,8 @@ data type. Just like other open--use--close interfaces the functions introduced here work using handles and the @file{iconv.h} header defines a special type for the handles used.
-@comment iconv.h
-@comment XPG2
 @deftp {Data Type} iconv_t
+@standards{XPG2, iconv.h}
 This data type is an abstract type defined in @file{iconv.h}.
 The user must not assume anything about the definition of this type; it must be completely opaque.
@@ -1689,9 +1665,8 @@ the conversions for which the handles stand for have to. @noindent The first step is the function to create a handle.
-@comment iconv.h
-@comment XPG2
 @deftypefun iconv_t iconv_open (const char *@var{tocode}, const char *@var{fromcode})
+@standards{XPG2, iconv.h}
 @safety{@prelim{}@mtsafe{@mtslocale{}}@asunsafe{@asucorrupt{} @ascuheap{} @asulock{} @ascudlopen{}}@acunsafe{@acucorrupt{} @aculock{} @acsmem{} @acsfd{}}} @c Calls malloc if tocode and/or fromcode are too big for alloca. Calls @c strip and upstr on both, then gconv_open. strip and upstr call
@@ -1763,9 +1738,8 @@ the handle returned by @code{iconv_open}. Therefore, it is crucial to free all the resources once all conversions are carried out and the conversion is not needed anymore.
-@comment iconv.h
-@comment XPG2
 @deftypefun int iconv_close (iconv_t @var{cd})
+@standards{XPG2, iconv.h}
 @safety{@prelim{}@mtsafe{}@asunsafe{@asucorrupt{} @ascuheap{} @asulock{} @ascudlopen{}}@acunsafe{@acucorrupt{} @aculock{} @acsmem{}}} @c Calls gconv_close to destruct and release each of the conversion @c steps, release the gconv_t object, then call gconv_close_transform.
@@ -1795,9 +1769,8 @@ therefore, the most general interface: it allows conversion from one buffer to another. Conversion from a file to a buffer, vice versa, or even file to file can be implemented on top of it.
-@comment iconv.h
-@comment XPG2
 @deftypefun size_t iconv (iconv_t @var{cd}, char **@var{inbuf}, size_t *@var{inbytesleft}, char **@var{outbuf}, size_t *@var{outbytesleft})
+@standards{XPG2, iconv.h}
 @safety{@prelim{}@mtsafe{@mtsrace{:cd}}@assafe{}@acunsafe{@acucorrupt{}}} @c Without guarding access to the iconv_t object pointed to by cd, call @c the conversion function to convert inbuf or flush the internal
@@ -2356,9 +2329,8 @@ conversion and the second describes the state etc.
 There are really two type definitions like this in @file{gconv.h}. @pindex gconv.h
-@comment gconv.h
-@comment GNU
 @deftp {Data type} {struct __gconv_step}
+@standards{GNU, gconv.h}
 This data structure describes one conversion a module can perform. For each function in a loaded module with conversion functions there is exactly one object of this type. This object is shared by all users of
@@ -2424,9 +2396,8 @@ conversion function. @end table @end deftp
-@comment gconv.h
-@comment GNU
 @deftp {Data type} {struct __gconv_step_data}
+@standards{GNU, gconv.h}
 This is the data structure that contains the information specific to each use of the conversion functions.
@@ -2557,9 +2528,8 @@ this use of the conversion functions. There are three data types defined for the three module interface functions and these define the interface.
-@comment gconv.h
-@comment GNU
 @deftypevr {Data type} int {(*__gconv_init_fct)} (struct __gconv_step *)
+@standards{GNU, gconv.h}
 This specifies the interface of the initialization function of the module. It is called exactly once for each conversion the module implements.
@@ -2714,9 +2684,8 @@ The function called before the module is unloaded is significantly easier. It often has nothing at all to do; in which case it can be left out completely.
-@comment gconv.h
-@comment GNU
 @deftypevr {Data type} void {(*__gconv_end_fct)} (struct gconv_step *)
+@standards{GNU, gconv.h}
 The task of this function is to free all resources allocated in the initialization function. Therefore only the @code{__data} element of the object pointed to by the argument is of interest. Continuing the
@@ -2737,9 +2706,8 @@ get quite complicated for complex character sets. But since this is not of interest here, we will only describe a possible skeleton for the conversion function.
-@comment gconv.h
-@comment GNU
 @deftypevr {Data type} int {(*__gconv_fct)} (struct __gconv_step *, struct __gconv_step_data *, const char **, const char *, size_t *, int)
+@standards{GNU, gconv.h}
 The conversion function can be called for two basic reasons: to convert text or to reset the state. From the description of the @code{iconv} function it can be seen why the flushing mode is necessary. What mode