summary refs log tree commit diff
path: root/manual/locale.texi
blob: d2d7557ea9bb6b9a55d7f55f99dd31cb024ac213 (plain) (blame)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
@node Locales, Searching and Sorting, Extended Characters, Top
@chapter Locales and Internationalization

Different countries and cultures have varying conventions for how to
communicate.  These conventions range from very simple ones, such as the
format for representing dates and times, to very complex ones, such as
the language spoken.

@cindex internationalization
@cindex locales
@dfn{Internationalization} of software means programming it to be able
to adapt to the user's favorite conventions.  In ANSI C,
internationalization works by means of @dfn{locales}.  Each locale
specifies a collection of conventions, one convention for each purpose.
The user chooses a set of conventions by specifying a locale (via
environment variables).

All programs inherit the chosen locale as part of their environment.
Provided the programs are written to obey the choice of locale, they
will follow the conventions preferred by the user.

@menu
* Effects of Locale::           Actions affected by the choice of
                                 locale. 
* Choosing Locale::             How the user specifies a locale.
* Locale Categories::           Different purposes for which you can
                                 select a locale. 
* Setting the Locale::          How a program specifies the locale
                                 with library functions. 
* Standard Locales::            Locale names available on all systems.
* Numeric Formatting::          How to format numbers according to the
                                 chosen locale. 
@end menu

@node Effects of Locale, Choosing Locale,  , Locales
@section What Effects a Locale Has

Each locale specifies conventions for several purposes, including the
following:

@itemize @bullet
@item
What multibyte character sequences are valid, and how they are
interpreted (@pxref{Extended Characters}).

@item
Classification of which characters in the local character set are
considered alphabetic, and upper- and lower-case conversion conventions
(@pxref{Character Handling}).

@item
The collating sequence for the local language and character set
(@pxref{Collation Functions}).

@item
Formatting of numbers and currency amounts (@pxref{Numeric Formatting}).

@item
Formatting of dates and times (@pxref{Formatting Date and Time}).

@item
What language to use for output, including error messages.
(The C library doesn't yet help you implement this.)

@item
What language to use for user answers to yes-or-no questions.

@item
What language to use for more complex user input.
(The C library doesn't yet help you implement this.)
@end itemize

Some aspects of adapting to the specified locale are handled
automatically by the library subroutines.  For example, all your program
needs to do in order to use the collating sequence of the chosen locale
is to use @code{strcoll} or @code{strxfrm} to compare strings.

Other aspects of locales are beyond the comprehension of the library.
For example, the library can't automatically translate your program's
output messages into other languages.  The only way you can support
output in the user's favorite language is to program this more or less
by hand.  (Eventually, we hope to provide facilities to make this
easier.)

This chapter discusses the mechanism by which you can modify the current
locale.  The effects of the current locale on specific library functions
are discussed in more detail in the descriptions of those functions.

@node Choosing Locale, Locale Categories, Effects of Locale, Locales
@section Choosing a Locale

The simplest way for the user to choose a locale is to set the
environment variable @code{LANG}.  This specifies a single locale to use
for all purposes.  For example, a user could specify a hypothetical
locale named @samp{espana-castellano} to use the standard conventions of
most of Spain.

The set of locales supported depends on the operating system you are
using, and so do their names.  We can't make any promises about what
locales will exist, except for one standard locale called @samp{C} or
@samp{POSIX}.

@cindex combining locales
A user also has the option of specifying different locales for different
purposes---in effect, choosing a mixture of multiple locales.

For example, the user might specify the locale @samp{espana-castellano}
for most purposes, but specify the locale @samp{usa-english} for
currency formatting.  This might make sense if the user is a
Spanish-speaking American, working in Spanish, but representing monetary
amounts in US dollars.

Note that both locales @samp{espana-castellano} and @samp{usa-english},
like all locales, would include conventions for all of the purposes to
which locales apply.  However, the user can choose to use each locale
for a particular subset of those purposes.

@node Locale Categories, Setting the Locale, Choosing Locale, Locales
@section Categories of Activities that Locales Affect
@cindex categories for locales
@cindex locale categories

The purposes that locales serve are grouped into @dfn{categories}, so
that a user or a program can choose the locale for each category
independently.  Here is a table of categories; each name is both an
environment variable that a user can set, and a macro name that you can
use as an argument to @code{setlocale}.

@table @code
@comment locale.h
@comment ANSI
@item LC_COLLATE
@vindex LC_COLLATE
This category applies to collation of strings (functions @code{strcoll}
and @code{strxfrm}); see @ref{Collation Functions}.

@comment locale.h
@comment ANSI
@item LC_CTYPE
@vindex LC_CTYPE
This category applies to classification and conversion of characters,
and to multibyte and wide characters;
see @ref{Character Handling} and @ref{Extended Characters}.

@comment locale.h
@comment ANSI
@item LC_MONETARY
@vindex LC_MONETARY
This category applies to formatting monetary values; see @ref{Numeric
Formatting}.

@comment locale.h
@comment ANSI
@item LC_NUMERIC
@vindex LC_NUMERIC
This category applies to formatting numeric values that are not
monetary; see @ref{Numeric Formatting}.

@comment locale.h
@comment ANSI
@item LC_TIME
@vindex LC_TIME
This category applies to formatting date and time values; see
@ref{Formatting Date and Time}.

@ignore   This is apparently a feature that was in some early
draft of the POSIX.2 standard, but it's not listed in draft 11.  Do we
still support this anyway?  Is there a corresponding environment
variable?

@comment locale.h
@comment GNU
@item LC_RESPONSE
@vindex LC_RESPONSE
This category applies to recognizing ``yes'' or ``no'' responses to
questions.
@end ignore

@comment locale.h
@comment ANSI
@item LC_ALL
@vindex LC_ALL
This is not an environment variable; it is only a macro that you can use
with @code{setlocale} to set a single locale for all purposes.

@comment locale.h
@comment ANSI
@item LANG
@vindex LANG
If this environment variable is defined, its value specifies the locale
to use for all purposes except as overridden by the variables above.
@end table

@node Setting the Locale, Standard Locales, Locale Categories, Locales
@section How Programs Set the Locale

A C program inherits its locale environment variables when it starts up.
This happens automatically.  However, these variables do not
automatically control the locale used by the library functions, because
ANSI C says that all programs start by default in the standard @samp{C}
locale.  To use the locales specified by the environment, you must call
@code{setlocale}.  Call it as follows:

@smallexample
setlocale (LC_ALL, "");
@end smallexample

@noindent
to select a locale based on the appropriate environment variables.

@cindex changing the locale
@cindex locale, changing
You can also use @code{setlocale} to specify a particular locale, for
general use or for a specific category.

@pindex locale.h
The symbols in this section are defined in the header file @file{locale.h}.

@comment locale.h
@comment ANSI
@deftypefun {char *} setlocale (int @var{category}, const char *@var{locale})
The function @code{setlocale} sets the current locale for 
category @var{category} to @var{locale}.

If @var{category} is @code{LC_ALL}, this specifies the locale for all
purposes.  The other possible values of @var{category} specify an
individual purpose (@pxref{Locale Categories}).

You can also use this function to find out the current locale by passing
a null pointer as the @var{locale} argument.  In this case,
@code{setlocale} returns a string that is the name of the locale
currently selected for category @var{category}.

The string returned by @code{setlocale} can be overwritten by subsequent
calls, so you should make a copy of the string (@pxref{Copying and
Concatenation}) if you want to save it past any further calls to
@code{setlocale}.  (The standard library is guaranteed never to call
@code{setlocale} itself.)

You should not modify the string returned by @code{setlocale}.
It might be the same string that was passed as an argument in a 
previous call to @code{setlocale}.

When you read the current locale for category @code{LC_ALL}, the value
encodes the entire combination of selected locales for all categories.
In this case, the value is not just a single locale name.  In fact, we
don't make any promises about what it looks like.  But if you specify
the same ``locale name'' with @code{LC_ALL} in a subsequent call to
@code{setlocale}, it restores the same combination of locale selections.

When the @var{locale} argument is not a null pointer, the string returned
by @code{setlocale} reflects the newly modified locale.

If you specify an empty string for @var{locale}, this means to read the
appropriate environment variable and use its value to select the locale
for @var{category}.

If you specify an invalid locale name, @code{setlocale} returns a null
pointer and leaves the current locale unchanged.
@end deftypefun

Here is an example showing how you might use @code{setlocale} to
temporarily switch to a new locale.

@smallexample
#include <stddef.h>
#include <locale.h>
#include <stdlib.h>
#include <string.h>

void
with_other_locale (char *new_locale,
                   void (*subroutine) (int),
                   int argument)
@{
  char *old_locale, *saved_locale;

  /* @r{Get the name of the current locale.}  */
  old_locale = setlocale (LC_ALL, NULL);
  
  /* @r{Copy the name so it won't be clobbered by @code{setlocale}.} */
  saved_locale = strdup (old_locale);
  if (old_locale == NULL)
    fatal ("Out of memory");
  
  /* @r{Now change the locale and do some stuff with it.} */
  setlocale (LC_ALL, new_locale);
  (*subroutine) (argument);
  
  /* @r{Restore the original locale.} */
  setlocale (LC_ALL, saved_locale);
  free (saved_locale);
@}
@end smallexample

@strong{Portability Note:} Some ANSI C systems may define additional
locale categories.  For portability, assume that any symbol beginning
with @samp{LC_} might be defined in @file{locale.h}.

@node Standard Locales, Numeric Formatting, Setting the Locale, Locales
@section Standard Locales

The only locale names you can count on finding on all operating systems
are these three standard ones:

@table @code
@item "C"
This is the standard C locale.  The attributes and behavior it provides
are specified in the ANSI C standard.  When your program starts up, it
initially uses this locale by default.

@item "POSIX"
This is the standard POSIX locale.  Currently, it is an alias for the
standard C locale.

@item ""
The empty name says to select a locale based on environment variables.
@xref{Locale Categories}.
@end table

Defining and installing named locales is normally a responsibility of
the system administrator at your site (or the person who installed the
GNU C library).  Some systems may allow users to create locales, but
we don't discuss that here.
@c ??? If we give the GNU system that capability, this place will have
@c ??? to be changed.

If your program needs to use something other than the @samp{C} locale,
it will be more portable if you use whatever locale the user specifies
with the environment, rather than trying to specify some non-standard
locale explicitly by name.  Remember, different machines might have
different sets of locales installed.

@node Numeric Formatting,  , Standard Locales, Locales
@section Numeric Formatting

When you want to format a number or a currency amount using the
conventions of the current locale, you can use the function
@code{localeconv} to get the data on how to do it.  The function
@code{localeconv} is declared in the header file @file{locale.h}.
@pindex locale.h
@cindex monetary value formatting
@cindex numeric value formatting

@comment locale.h
@comment ANSI
@deftypefun {struct lconv *} localeconv (void)
The @code{localeconv} function returns a pointer to a structure whose
components contain information about how numeric and monetary values
should be formatted in the current locale.

You shouldn't modify the structure or its contents.  The structure might
be overwritten by subsequent calls to @code{localeconv}, or by calls to
@code{setlocale}, but no other function in the library overwrites this
value.
@end deftypefun

@comment locale.h
@comment ANSI
@deftp {Data Type} {struct lconv}
This is the data type of the value returned by @code{localeconv}.
@end deftp

If a member of the structure @code{struct lconv} has type @code{char},
and the value is @code{CHAR_MAX}, it means that the current locale has
no value for that parameter.

@menu
* General Numeric::             Parameters for formatting numbers and
                                 currency amounts.
* Currency Symbol::             How to print the symbol that identifies an
                                 amount of money (e.g. @samp{$}).
* Sign of Money Amount::        How to print the (positive or negative) sign
                                 for a monetary amount, if one exists.
@end menu

@node General Numeric, Currency Symbol,  , Numeric Formatting
@subsection Generic Numeric Formatting Parameters

These are the standard members of @code{struct lconv}; there may be
others.

@table @code
@item char *decimal_point
@itemx char *mon_decimal_point
These are the decimal-point separators used in formatting non-monetary
and monetary quantities, respectively.  In the @samp{C} locale, the
value of @code{decimal_point} is @code{"."}, and the value of
@code{mon_decimal_point} is @code{""}.
@cindex decimal-point separator

@item char *thousands_sep
@itemx char *mon_thousands_sep
These are the separators used to delimit groups of digits to the left of
the decimal point in formatting non-monetary and monetary quantities,
respectively.  In the @samp{C} locale, both members have a value of
@code{""} (the empty string).

@item char *grouping
@itemx char *mon_grouping
These are strings that specify how to group the digits to the left of
the decimal point.  @code{grouping} applies to non-monetary quantities
and @code{mon_grouping} applies to monetary quantities.  Use either
@code{thousands_sep} or @code{mon_thousands_sep} to separate the digit
groups.
@cindex grouping of digits

Each string is made up of decimal numbers separated by semicolons.
Successive numbers (from left to right) give the sizes of successive
groups (from right to left, starting at the decimal point).  The last
number in the string is used over and over for all the remaining groups.

If the last integer is @code{-1}, it means that there is no more
grouping---or, put another way, any remaining digits form one large
group without separators.

For example, if @code{grouping} is @code{"4;3;2"}, the correct grouping
for the number @code{123456787654321} is @samp{12}, @samp{34},
@samp{56}, @samp{78}, @samp{765}, @samp{4321}.  This uses a group of 4
digits at the end, preceded by a group of 3 digits, preceded by groups
of 2 digits (as many as needed).  With a separator of @samp{,}, the
number would be printed as @samp{12,34,56,78,765,4321}.

A value of @code{"3"} indicates repeated groups of three digits, as
normally used in the U.S.

In the standard @samp{C} locale, both @code{grouping} and
@code{mon_grouping} have a value of @code{""}.  This value specifies no
grouping at all.

@item char int_frac_digits
@itemx char frac_digits
These are small integers indicating how many fractional digits (to the
right of the decimal point) should be displayed in a monetary value in
international and local formats, respectively.  (Most often, both
members have the same value.)

In the standard @samp{C} locale, both of these members have the value
@code{CHAR_MAX}, meaning ``unspecified''.  The ANSI standard doesn't say
what to do when you find this the value; we recommend printing no
fractional digits.  (This locale also specifies the empty string for
@code{mon_decimal_point}, so printing any fractional digits would be
confusing!)
@end table

@node Currency Symbol, Sign of Money Amount, General Numeric, Numeric Formatting
@subsection Printing the Currency Symbol
@cindex currency symbols

These members of the @code{struct lconv} structure specify how to print
the symbol to identify a monetary value---the international analog of
@samp{$} for US dollars.

Each country has two standard currency symbols.  The @dfn{local currency
symbol} is used commonly within the country, while the
@dfn{international currency symbol} is used internationally to refer to
that country's currency when it is necessary to indicate the country
unambiguously.

For example, many countries use the dollar as their monetary unit, and
when dealing with international currencies it's important to specify
that one is dealing with (say) Canadian dollars instead of U.S. dollars
or Australian dollars.  But when the context is known to be Canada,
there is no need to make this explicit---dollar amounts are implicitly
assumed to be in Canadian dollars.

@table @code
@item char *currency_symbol
The local currency symbol for the selected locale.

In the standard @samp{C} locale, this member has a value of @code{""}
(the empty string), meaning ``unspecified''.  The ANSI standard doesn't
say what to do when you find this value; we recommend you simply print
the empty string as you would print any other string found in the
appropriate member.

@item char *int_curr_symbol
The international currency symbol for the selected locale.

The value of @code{int_curr_symbol} should normally consist of a
three-letter abbreviation determined by the international standard
@cite{ISO 4217 Codes for the Representation of Currency and Funds},
followed by a one-character separator (often a space).

In the standard @samp{C} locale, this member has a value of @code{""}
(the empty string), meaning ``unspecified''.  We recommend you simply
print the empty string as you would print any other string found in the
appropriate member.

@item char p_cs_precedes
@itemx char n_cs_precedes
These members are @code{1} if the @code{currency_symbol} string should
precede the value of a monetary amount, or @code{0} if the string should
follow the value.  The @code{p_cs_precedes} member applies to positive
amounts (or zero), and the @code{n_cs_precedes} member applies to
negative amounts.

In the standard @samp{C} locale, both of these members have a value of
@code{CHAR_MAX}, meaning ``unspecified''.  The ANSI standard doesn't say
what to do when you find this value, but we recommend printing the
currency symbol before the amount.  That's right for most countries.
In other words, treat all nonzero values alike in these members.

The POSIX standard says that these two members apply to the
@code{int_curr_symbol} as well as the @code{currency_symbol}.  The ANSI
C standard seems to imply that they should apply only to the
@code{currency_symbol}---so the @code{int_curr_symbol} should always
precede the amount.

We can only guess which of these (if either) matches the usual
conventions for printing international currency symbols.  Our guess is
that they should always preceed the amount.  If we find out a reliable
answer, we will put it here.

@item char p_sep_by_space
@itemx char n_sep_by_space
These members are @code{1} if a space should appear between the
@code{currency_symbol} string and the amount, or @code{0} if no space
should appear.  The @code{p_sep_by_space} member applies to positive
amounts (or zero), and the @code{n_sep_by_space} member applies to
negative amounts.

In the standard @samp{C} locale, both of these members have a value of
@code{CHAR_MAX}, meaning ``unspecified''.  The ANSI standard doesn't say
what you should do when you find this value; we suggest you treat it as
one (print a space).  In other words, treat all nonzero values alike in
these members.

These members apply only to @code{currency_symbol}.  When you use
@code{int_curr_symbol}, you never print an additional space, because
@code{int_curr_symbol} itself contains the appropriate separator.

The POSIX standard says that these two members apply to the
@code{int_curr_symbol} as well as the @code{currency_symbol}.  But an
example in the ANSI C standard clearly implies that they should apply
only to the @code{currency_symbol}---that the @code{int_curr_symbol}
contains any appropriate separator, so you should never print an
additional space.

Based on what we know now, we recommend you ignore these members when
printing international currency symbols, and print no extra space.
@end table

@node Sign of Money Amount,  , Currency Symbol, Numeric Formatting
@subsection Printing the Sign of an Amount of Money

These members of the @code{struct lconv} structure specify how to print
the sign (if any) in a monetary value.

@table @code
@item char *positive_sign
@itemx char *negative_sign
These are strings used to indicate positive (or zero) and negative
(respectively) monetary quantities.

In the standard @samp{C} locale, both of these members have a value of
@code{""} (the empty string), meaning ``unspecified''.

The ANSI standard doesn't say what to do when you find this value; we
recommend printing @code{positive_sign} as you find it, even if it is
empty.  For a negative value, print @code{negative_sign} as you find it
unless both it and @code{positive_sign} are empty, in which case print
@samp{-} instead.  (Failing to indicate the sign at all seems rather
unreasonable.)

@item char p_sign_posn
@itemx char n_sign_posn
These members have values that are small integers indicating how to
position the sign for nonnegative and negative monetary quantities,
respectively.  (The string used by the sign is what was specified with
@code{positive_sign} or @code{negative_sign}.)  The possible values are
as follows:

@table @code
@item 0
The currency symbol and quantity should be surrounded by parentheses.

@item 1
Print the sign string before the quantity and currency symbol.

@item 2
Print the sign string after the quantity and currency symbol.

@item 3
Print the sign string right before the currency symbol.

@item 4
Print the sign string right after the currency symbol.

@item CHAR_MAX
``Unspecified''.  Both members have this value in the standard
@samp{C} locale.
@end table

The ANSI standard doesn't say what you should do when the value is
@code{CHAR_MAX}.  We recommend you print the sign after the currency
symbol.
@end table

It is not clear whether you should let these members apply to the
international currency format or not.  POSIX says you should, but
intuition plus the examples in the ANSI C standard suggest you should
not.  We hope that someone who knows well the conventions for formatting
monetary quantities will tell us what we should recommend.