From b8a46c1d5abdbb5224e3f0776abbfd76a0820b41 Mon Sep 17 00:00:00 2001 From: Ulrich Drepper Date: Sat, 22 Jan 2000 09:20:14 +0000 Subject: Update. * manual/message.texi: Document new interfaces. --- manual/message.texi | 314 +++++++++++++++++++++++++++++++++++++++++++++++++++- 1 file changed, 309 insertions(+), 5 deletions(-) (limited to 'manual') diff --git a/manual/message.texi b/manual/message.texi index 294b9f0a77..232f087431 100644 --- a/manual/message.texi +++ b/manual/message.texi @@ -226,7 +226,7 @@ When an error occured the global variable @var{errno} is set to @item EBADF The catalog does not exist. @item ENOMSG -The set/message ttuple does not name an existing element in the +The set/message tuple does not name an existing element in the message catalog. @end table @@ -470,7 +470,7 @@ This is the interface defined in the X/Open standard. If no @var{Input-File} parameter is given input will be read from standard input. Multiple input files will be read as if they are concatenated. If @var{Output-File} is also missing, the output will be written to -standard output. To provide the interface one is used from other +standard output. To provide the interface one is used to from other programs a second interface is provided. @smallexample @@ -604,10 +604,10 @@ gencat -H ex.h -o ex.cat ex.msg This generates a header file with the following content: @smallexample -#define SetTwoSet 0x2 /* u.msg:8 */ +#define SetTwoSet 0x2 /* ex.msg:8 */ -#define SetOneSet 0x1 /* u.msg:4 */ -#define SetOnetwo 0x2 /* u.msg:6 */ +#define SetOneSet 0x1 /* ex.msg:4 */ +#define SetOnetwo 0x2 /* ex.msg:6 */ @end smallexample As can be seen the various symbols given in the source file are mangled @@ -768,6 +768,8 @@ categories: @menu * Translation with gettext:: What has to be done to translate a message. * Locating gettext catalog:: How to determine which catalog to be used. +* Advanced gettext functions:: Additional functions for more complicated + situations. 
* Using gettextized software:: The possibilities of the user to influence the way @code{gettext} works. @end menu @@ -800,6 +802,8 @@ the @file{libintl.h} header file. On systems where these functions are not part of the C library they can be found in a separate library named @file{libintl.a} (or accordingly different for shared libraries). +@comment libintl.h +@comment GNU @deftypefun {char *} gettext (const char *@var{msgid}) The @code{gettext} function searches the currently selected message catalogs for a string which is equal to @var{msgid}. If there is such a @@ -845,6 +849,8 @@ uses the @code{gettext} functions but since it must not depend on a currently selected default message catalog it must specify all ambiguous information. +@comment libintl.h +@comment GNU @deftypefun {char *} dgettext (const char *@var{domainname}, const char *@var{msgid}) The @code{dgettext} functions acts just like the @code{gettext} function. It only takes an additional first argument @var{domainname} @@ -857,6 +863,8 @@ As for @code{gettext} the return value type is @code{char *} which is an anachronism. The returned string must never be modified. @end deftypefun +@comment libintl.h +@comment GNU @deftypefun {char *} dcgettext (const char *@var{domainname}, const char *@var{msgid}, int @var{category}) The @code{dcgettext} adds another argument to those which @code{dgettext} takes. This argument @var{category} specifies the last @@ -990,6 +998,8 @@ domain named @code{foo}. The important point is that at any time exactly one domain is active. This is controlled with the following function. +@comment libintl.h +@comment GNU @deftypefun {char *} textdomain (const char *@var{domainname}) The @code{textdomain} function sets the default domain, which is used in all future @code{gettext} calls, to @var{domainname}. Please note that @@ -1019,6 +1029,8 @@ This possibility is questionable to use since the domain @code{messages} really never should be used. 
 @end deftypefun
+@comment libintl.h
+@comment GNU
 @deftypefun {char *} bindtextdomain (const char *@var{domainname}, const char *@var{dirname})
 The @code{bindtextdomain} function can be used to specify the directory which contains the message catalogs for domain @var{domainname} for the
@@ -1056,6 +1068,298 @@ variable @var{errno} is set accordingly.
 @end deftypefun
+@node Advanced gettext functions
+@subsubsection Additional functions for more complicated situations
+
+The functions of the @code{gettext} family described so far (and all the
+@code{catgets} functions as well) have one problem in the real world
+which has been neglected completely in all existing approaches. What
+is meant here is the handling of plural forms.
+
+Looking through Unix source code before the time anybody thought about
+internationalization (and, sadly, even afterwards) one can often find
+code similar to the following:
+
+@smallexample
+   printf ("%d file%s deleted", n, n == 1 ? "" : "s");
+@end smallexample
+
+@noindent
+After the first complaints from people internationalizing the code, people
+either completely avoided formulations like this or used strings like
+@code{"file(s)"}. Both look unnatural and should be avoided. First
+attempts to solve the problem correctly looked like this:
+
+@smallexample
+   if (n == 1)
+     printf ("%d file deleted", n);
+   else
+     printf ("%d files deleted", n);
+@end smallexample
+
+But this does not solve the problem. It helps languages where the
+plural form of a noun is not simply constructed by adding an `s' but
+that is all. Once again people fell into the trap of believing the
+rules their language is using are universal. But the handling of plural
+forms differs widely between the language families. There are two
+things which can differ between (and even inside) language families:
+
+@itemize @bullet
+@item
+The way plural forms are built differs. This is a problem with
+languages which have many irregularities.
German, for instance, is a
+drastic case. Though English and German are part of the same language
+family (Germanic), the almost regular forming of plural noun forms
+(appending an `s') is hardly found in German.
+
+@item
+The number of plural forms differs. This is somewhat surprising for
+those who only have experience with Romance and Germanic languages
+since here the number is the same (there are two).
+
+But other language families have only one form or many forms. More
+information on this is given in an extra section.
+@end itemize
+
+The consequence of this is that application writers should not try to
+solve the problem in their code. This would be localization since it is
+only usable for certain, hardcoded language environments. Instead the
+extended @code{gettext} interface should be used.
+
+These extra functions take two strings and a numerical argument instead
+of the one key string. The idea behind this is that using
+the numerical argument and the first string as a key, the implementation
+can select the right plural form using rules specified by the
+translator. The two string arguments then will be used to provide a return
+value in case no message catalog is found (similar to the normal
+@code{gettext} behaviour). In this case the rules for Germanic languages
+are used and it is assumed that the first string argument is the singular
+form, the second the plural form.
+
+This has the consequence that programs without language catalogs can
+display the correct strings only if the program itself is written using
+a Germanic language. This is a limitation but since the GNU C library
+(as well as the GNU @code{gettext} package) is written as part of the
+GNU package and the coding standards for the GNU project require programs
+to be written in English, this solution nevertheless fulfills its
+purpose.
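The selection and fallback behaviour described above can be sketched in
C. This is only an illustration under stated assumptions: the names
@code{plural_rule_fn}, @code{struct plural_entry}, @code{germanic_rule},
@code{single_form_rule}, and @code{select_plural} are hypothetical,
invented for this sketch, and are not part of @file{libintl.h}. The
translator's rule is modelled here as a plain C function, which is a
simplification.

```c
#include <stddef.h>

/* Hypothetical type: a translator-supplied rule mapping N to the
   index of the plural form to use.  */
typedef unsigned long int (*plural_rule_fn) (unsigned long int n);

/* Hypothetical stand-in for one catalog entry and its plural forms.  */
struct plural_entry
{
  size_t nplurals;           /* Number of forms the language has.  */
  plural_rule_fn rule;       /* Computes the index of the form.  */
  const char *const *forms;  /* The NPLURALS translated strings.  */
};

/* Germanic rule: index 0 (singular) for exactly one, else index 1.  */
static unsigned long int
germanic_rule (unsigned long int n)
{
  return n != 1;
}

/* Rule for a language with a single form: always index 0.  */
static unsigned long int
single_form_rule (unsigned long int n)
{
  (void) n;
  return 0;
}

/* Return the translated form chosen by ENTRY's rule.  If there is no
   usable entry, fall back to MSGID1 and MSGID2 using the Germanic
   rule, as the text describes for a missing catalog.  */
static const char *
select_plural (const struct plural_entry *entry, const char *msgid1,
               const char *msgid2, unsigned long int n)
{
  if (entry != NULL && entry->rule != NULL)
    {
      unsigned long int idx = entry->rule (n);
      if (idx < entry->nplurals)
        return entry->forms[idx];
    }
  return germanic_rule (n) == 0 ? msgid1 : msgid2;
}
```

Called with a null @code{entry}, the sketch returns its second argument
only when @code{n} is 1 and its third argument otherwise, which is why
an untranslated program must supply English-style singular and plural
strings.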
+
+@comment libintl.h
+@comment GNU
+@deftypefun {char *} ngettext (const char *@var{msgid1}, const char *@var{msgid2}, unsigned long int @var{n})
+The @code{ngettext} function is similar to the @code{gettext} function
+as it finds the message catalogs in the same way. But it takes two
+extra arguments. The @var{msgid1} parameter must contain the singular
+form of the string to be converted. It is also used as the key for the
+search in the catalog. The @var{msgid2} parameter is the plural form.
+The parameter @var{n} is used to determine the plural form. If no
+message catalog is found @var{msgid1} is returned if @code{n == 1},
+otherwise @var{msgid2}.
+
+An example for the use of this function is:
+
+@smallexample
+  printf (ngettext ("%d file removed", "%d files removed", n), n);
+@end smallexample
+
+Please note that the numeric value @var{n} has to be passed to the
+@code{printf} function as well. It is not sufficient to pass it only to
+@code{ngettext}.
+@end deftypefun
+
+@comment libintl.h
+@comment GNU
+@deftypefun {char *} dngettext (const char *@var{domain}, const char *@var{msgid1}, const char *@var{msgid2}, unsigned long int @var{n})
+The @code{dngettext} function is similar to the @code{dgettext} function in the
+way the message catalog is selected. The difference is that it takes
+two extra parameters to provide the correct plural form. These two
+parameters are handled in the same way @code{ngettext} handles them.
+@end deftypefun
+
+@comment libintl.h
+@comment GNU
+@deftypefun {char *} dcngettext (const char *@var{domain}, const char *@var{msgid1}, const char *@var{msgid2}, unsigned long int @var{n}, int @var{category})
+The @code{dcngettext} function is similar to the @code{dcgettext} function in the
+way the message catalog is selected. The difference is that it takes
+two extra parameters to provide the correct plural form. These two
+parameters are handled in the same way @code{ngettext} handles them.
+@end deftypefun + +@subsubheading The problem of plural forms + +A description of the problem can be found at the beginning of the last +section. Now there is the question how to solve it. Without the input +of linguists (which was not available) it was not possible to determine +whether there are only a few different forms in which plural forms are +formed or whether the number can increase with every new supported +language. + +Therefore the solution implemented is to allow the translator to specify +the rules of how to select the plural form. Since the formula varies +with every language this is the only viable solution except for +harcoding the information in the code (which still would require the +possibility of extensionsto not prevent the use of new languages). The +details are explained in the GNU @code{gettext} manual. Here only a a +bit of information is provided. + +The information about the plural form selection has to be stored in the +header entry (the one with the empty (@code{msgid} string). There shoud +be something like: + +@smallexample + nplurals=2; plural=n == 1 ? 0 : 1 +@end smallexample + +The @code{nplurals} value must be a decimal number which specifies how +many different plural forms exist for this language. The string +following @code{plural} is an expression which is using the C language +syntax. Exceptions are that no negative number are allowed, numbers +must be decimal, and the only variable allowed is @code{n}. This +expression will be evaluated whenever one of the functions +@code{ngettext}, @code{dngettext}, or @code{dcngettext} is called. The +numeric value passed to these functions is then substituted for all uses +of the variable @code{n} in the expression. The resulting value then +must be greater or equal to zero and smaller than the value given as the +value of @code{nplurals}. + +@noindent +The following rules are known at this point. The language with families +are listed. 
But this does not necessarily mean the information can be
+generalized for the whole family (as can be easily seen in the table
+below).@footnote{Additions are welcome. Send appropriate information to
+@email{bug-glibc-manual@@gnu.org}.}
+
+@table @asis
+@item Only one form
+Some languages only require one single form. There is no distinction
+between the singular and plural form. An appropriate header entry
+would look like this:
+
+@smallexample
+nplurals=1; plural=0
+@end smallexample
+
+@noindent
+Languages with this property include:
+
+@table @asis
+@item Finno-Ugric family
+Hungarian
+@item Asian family
+Japanese
+@item Turkic/Altaic family
+Turkish
+@end table
+
+@item Two forms, singular used for one only
+This is the form used in most existing programs since it is what English
+uses. A header entry would look like this:
+
+@smallexample
+nplurals=2; plural=n != 1
+@end smallexample
+
+(Note: this uses the feature of C expressions that boolean expressions
+have the value zero or one.)
+
+@noindent
+Languages with this property include:
+
+@table @asis
+@item Germanic family
+Danish, Dutch, English, German, Norwegian, Swedish
+@item Finno-Ugric family
+Finnish
+@item Latin/Greek family
+Greek
+@item Semitic family
+Hebrew
+@item Romance family
+Italian, Spanish
+@item Artificial
+Esperanto
+@end table
+
+@item Two forms, singular used for zero and one
+Exceptional case in the language family. The header entry would be:
+
+@smallexample
+nplurals=2; plural=n>1
+@end smallexample
+
+@noindent
+Languages with this property include:
+
+@table @asis
+@item Romance family
+French
+@end table
+
+@item Three forms, special cases for one and two
+The header entry would be:
+
+@smallexample
+nplurals=3; plural=n==1 ? 0 : n==2 ?
 1 : 2
+@end smallexample
+
+@noindent
+Languages with this property include:
+
+@table @asis
+@item Celtic
+Gaeilge
+@end table
+
+@item Three forms, special case for one and all numbers ending in 2, 3, or 4
+The header entry would look like this:
+
+@smallexample
+nplurals=3; plural=n==1 ? 0 : n%10>=2 && n%10<=4 ? 1 : 2
+@end smallexample
+
+@noindent
+Languages with this property include:
+
+@table @asis
+@item Slavic family
+Russian
+@end table
+
+@item Three forms, special case for one and some numbers ending in 2, 3, or 4
+The header entry would look like this:
+
+@smallexample
+nplurals=3; plural=n==1 ? 0 : \
+  n%10>=2 && n%10<=4 && (n%100<10 || n%100>=20) ? 1 : 2
+@end smallexample
+
+(Continuation in the next line is possible.)
+
+@noindent
+Languages with this property include:
+
+@table @asis
+@item Slavic family
+Polish
+@end table
+
+@item Four forms, special case for one and all numbers ending in 2, 3, or 4
+The header entry would look like this:
+
+@smallexample
+nplurals=4; plural=n==1 ? 0 : n%10==2 ? 1 : n%10==3 || n%10==4 ? 2 : 3
+@end smallexample
+
+@noindent
+Languages with this property include:
+
+@table @asis
+@item Slavic family
+Slovenian
+@end table
+@end table
+
+
 @node Using gettextized software
 @subsubsection User influence on @code{gettext}
-- cgit 1.4.1