path: root/localedata/locales/iso14651_t1_common
Commit message | Author | Age | Files | Lines
* Keep expected behaviour for [a-z] and [A-z] (Bug 23393). | Carlos O'Donell | 2018-07-25 | 1 | -964/+964

  In commit 9479b6d5e08eacce06c6ab60abc9b2f4eb8b71e4 we updated all of the collation data to harmonize with the new version of ISO 14651, which is derived from Unicode 9.0.0. This collation update brought with it some changes to locales that some users found undesirable. In particular, it altered the meaning of the locale-dependent range expressions [a-z] and [A-Z], and for en_US it caused uppercase letters to be matched by [a-z] for the first time. The matching of uppercase letters by [a-z] is already known to users of other locales that have this property, but the change could cause significant problems for en_US and similar locales that had never behaved this way before.

  Whether this behaviour is desirable or not is contentious. GNU Awk has this to say on the topic:
  https://www.gnu.org/software/gawk/manual/html_node/Ranges-and-Locales.html

  The POSIX standard says further, under "RE Bracket Expression":
  http://pubs.opengroup.org/onlinepubs/9699919799/xrat/V4_xbd_chap09.html

  "The current standard leaves unspecified the behavior of a range expression outside the POSIX locale. ... As noted above, efforts were made to resolve the differences, but no solution has been found that would be specific enough to allow for portable software while not invalidating existing implementations."

  In glibc we implement the requirement of ISO POSIX-2:1993 and use collation element order (CEO) to construct the range expression; the internal API is __collseq_table_lookup(). Because we use CEO and also have 4-level weights on each collation rule, we can in practice reorder the collation rules in iso14651_t1_common (the new data) to provide consistent range expression resolution *and* the weights should maintain the expected total order.

  Therefore this patch does three things:

  * Reorder the collation rules for the LATIN script in iso14651_t1_common to deinterlace uppercase and lowercase letters in the collation element orders.
  * Add new test data en_US.UTF-8.in for sort-test.sh which exercises strcoll* and strxfrm* and ensures the ISO 14651 collation remains.
  * Add back tests to tst-fnmatch.input and tst-regexloc.c which exercise that [a-z] does not match A or Z.

  The reordering of the ISO 14651 data is done in an entirely mechanical fashion using the following program attached to the bug:
  https://sourceware.org/bugzilla/show_bug.cgi?id=23393#c28

  It is up for discussion whether the iso14651_t1_common data should be refined further to have 3 very tight collation element ranges that include only a-z, A-Z, and 0-9, which would implement the solution sought after in:
  https://sourceware.org/bugzilla/show_bug.cgi?id=23393#c12
  and implemented here:
  https://www.sourceware.org/ml/libc-alpha/2018-07/msg00854.html

  No regressions on x86_64.

  Verified that removal of the iso14651_t1_common change causes tst-fnmatch to regress with:

    422: fnmatch ("[a-z]", "A", 0) = 0 (FAIL, expected FNM_NOMATCH) ***
    ...
    425: fnmatch ("[A-Z]", "z", 0) = 0 (FAIL, expected FNM_NOMATCH) ***
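The kind of check quoted in the tst-fnmatch output above can be tried by hand with a small C helper. This is a minimal sketch and not part of the commit: range_matches_in_locale is a hypothetical name, and it relies only on the standard setlocale() and fnmatch() calls. In the C locale the answer is fixed; for en_US.UTF-8 it depends on the installed glibc version and on whether that locale is available on the system.

```c
#include <assert.h>
#include <fnmatch.h>
#include <locale.h>
#include <stddef.h>

/* Hypothetical helper (not from glibc): returns 1 if PATTERN matches S
   under the locale NAME, 0 if it does not match, and -1 if the locale
   could not be selected (for example because it is not installed).  */
int range_matches_in_locale(const char *name, const char *pattern,
                            const char *s)
{
    assert(name != NULL && pattern != NULL && s != NULL);

    /* Range expressions such as [a-z] are resolved through LC_COLLATE,
       so the locale has to be selected before calling fnmatch().  */
    if (setlocale(LC_ALL, name) == NULL)
        return -1;

    /* fnmatch() returns 0 on a match; any other value (FNM_NOMATCH or
       an error) is reported here as "no match".  */
    return fnmatch(pattern, s, 0) == 0;
}
```

Calling range_matches_in_locale("en_US.UTF-8", "[a-z]", "A") on a system with the reordered data should give 0, which corresponds to the FNM_NOMATCH result the test suite expects; a return of -1 only means the locale is missing.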
* Adapt collation in several locales to the new iso14651_t1_common file | Mike FABIAN | 2018-02-27 | 1 | -1/+0

  [BZ #22550] - es_ES locale (and other es_* locales): collation should treat ñ as a primary different character, sync the collation for Spanish with CLDR
  [BZ #21547] - Tibetan script collation broken (Dzongkha and Tibetan)

  * localedata/Makefile: Add new test files.
  * localedata/lv_LV.UTF-8.in: Adapt test file to new collation order.
  * localedata/sv_SE.ISO-8859-1.in: Adapt test file to new collation order.
  * localedata/uk_UA.UTF-8.in: Adapt test file to new collation order.
  * localedata/am_ET.UTF-8.in: New test file.
  * localedata/az_AZ.UTF-8.in: Likewise.
  * localedata/be_BY.UTF-8.in: Likewise.
  * localedata/ber_DZ.UTF-8.in: Likewise.
  * localedata/ber_MA.UTF-8.in: Likewise.
  * localedata/bg_BG.UTF-8.in: Likewise.
  * localedata/br_FR.UTF-8.in: Likewise.
  * localedata/cmn_TW.UTF-8.in: Likewise.
  * localedata/crh_UA.UTF-8.in: Likewise.
  * localedata/csb_PL.UTF-8.in: Likewise.
  * localedata/cv_RU.UTF-8.in: Likewise.
  * localedata/cy_GB.UTF-8.in: Likewise.
  * localedata/dz_BT.UTF-8.in: Likewise.
  * localedata/eo.UTF-8.in: Likewise.
  * localedata/es_ES.UTF-8.in: Likewise.
  * localedata/fa_IR.UTF-8.in: Likewise.
  * localedata/fi_FI.UTF-8.in: Likewise.
  * localedata/fil_PH.UTF-8.in: Likewise.
  * localedata/fur_IT.UTF-8.in: Likewise.
  * localedata/gez_ER.UTF-8@abegede.in: Likewise.
  * localedata/ha_NG.UTF-8.in: Likewise.
  * localedata/ig_NG.UTF-8.in: Likewise.
  * localedata/ik_CA.UTF-8.in: Likewise.
  * localedata/kk_KZ.UTF-8.in: Likewise.
  * localedata/ku_TR.UTF-8.in: Likewise.
  * localedata/ky_KG.UTF-8.in: Likewise.
  * localedata/ln_CD.UTF-8.in: Likewise.
  * localedata/mi_NZ.UTF-8.in: Likewise.
  * localedata/ml_IN.UTF-8.in: Likewise.
  * localedata/mn_MN.UTF-8.in: Likewise.
  * localedata/mr_IN.UTF-8.in: Likewise.
  * localedata/mt_MT.UTF-8.in: Likewise.
  * localedata/nb_NO.UTF-8.in: Likewise.
  * localedata/om_KE.UTF-8.in: Likewise.
  * localedata/os_RU.UTF-8.in: Likewise.
  * localedata/ps_AF.UTF-8.in: Likewise.
  * localedata/ro_RO.UTF-8.in: Likewise.
  * localedata/ru_RU.UTF-8.in: Likewise.
  * localedata/sc_IT.UTF-8.in: Likewise.
  * localedata/se_NO.UTF-8.in: Likewise.
  * localedata/sq_AL.UTF-8.in: Likewise.
  * localedata/sv_SE.UTF-8.in: Likewise.
  * localedata/szl_PL.UTF-8.in: Likewise.
  * localedata/tg_TJ.UTF-8.in: Likewise.
  * localedata/tk_TM.UTF-8.in: Likewise.
  * localedata/tt_RU.UTF-8.in: Likewise.
  * localedata/tt_RU.UTF-8@iqtelif.in: Likewise.
  * localedata/ug_CN.UTF-8.in: Likewise.
  * localedata/uz_UZ.UTF-8.in: Likewise.
  * localedata/vi_VN.UTF-8.in: Likewise.
  * localedata/yi_US.UTF-8.in: Likewise.
  * localedata/yo_NG.UTF-8.in: Likewise.
  * localedata/zh_CN.UTF-8.in: Likewise.
  * localedata/locales/am_ET: Adapt collation rules to new iso14651_t1_common file and fix bugs in the collation.
  * localedata/locales/az_AZ: Likewise.
  * localedata/locales/be_BY: Likewise.
  * localedata/locales/ber_DZ: Likewise.
  * localedata/locales/ber_MA: Likewise.
  * localedata/locales/bg_BG: Likewise.
  * localedata/locales/br_FR: Likewise.
  * localedata/locales/br_FR@euro: Likewise.
  * localedata/locales/ca_ES: Likewise.
  * localedata/locales/cns11643_stroke: Likewise.
  * localedata/locales/crh_UA: Likewise.
  * localedata/locales/cs_CZ: Likewise.
  * localedata/locales/csb_PL: Likewise.
  * localedata/locales/cv_RU: Likewise.
  * localedata/locales/cy_GB: Likewise.
  * localedata/locales/da_DK: Likewise.
  * localedata/locales/dz_BT: Likewise.
  * localedata/locales/en_CA: Likewise.
  * localedata/locales/eo: Likewise.
  * localedata/locales/es_CU: Likewise.
  * localedata/locales/es_EC: Likewise.
  * localedata/locales/es_ES: Likewise.
  * localedata/locales/es_US: Likewise.
  * localedata/locales/et_EE: Likewise.
  * localedata/locales/fa_IR: Likewise.
  * localedata/locales/fi_FI: Likewise.
  * localedata/locales/fil_PH: Likewise.
  * localedata/locales/fur_IT: Likewise.
  * localedata/locales/gez_ER@abegede: Likewise.
  * localedata/locales/ha_NG: Likewise.
  * localedata/locales/hr_HR: Likewise.
  * localedata/locales/hsb_DE: Likewise.
  * localedata/locales/hu_HU: Likewise.
  * localedata/locales/ig_NG: Likewise.
  * localedata/locales/ik_CA: Likewise.
  * localedata/locales/is_IS: Likewise.
  * localedata/locales/iso14651_t1_pinyin: Likewise.
  * localedata/locales/kk_KZ: Likewise.
  * localedata/locales/ku_TR: Likewise.
  * localedata/locales/ky_KG: Likewise.
  * localedata/locales/ln_CD: Likewise.
  * localedata/locales/lt_LT: Likewise.
  * localedata/locales/lv_LV: Likewise.
  * localedata/locales/mi_NZ: Likewise.
  * localedata/locales/ml_IN: Likewise.
  * localedata/locales/mn_MN: Likewise.
  * localedata/locales/mr_IN: Likewise.
  * localedata/locales/mt_MT: Likewise.
  * localedata/locales/nb_NO: Likewise.
  * localedata/locales/om_KE: Likewise.
  * localedata/locales/os_RU: Likewise.
  * localedata/locales/pl_PL: Likewise.
  * localedata/locales/ps_AF: Likewise.
  * localedata/locales/ro_RO: Likewise.
  * localedata/locales/ru_RU: Likewise.
  * localedata/locales/ru_UA: Likewise.
  * localedata/locales/sc_IT: Likewise.
  * localedata/locales/se_NO: Likewise.
  * localedata/locales/si_LK: Likewise.
  * localedata/locales/sq_AL: Likewise.
  * localedata/locales/sv_FI: Likewise.
  * localedata/locales/sv_FI@euro: Likewise.
  * localedata/locales/sv_SE: Likewise.
  * localedata/locales/szl_PL: Likewise.
  * localedata/locales/tg_TJ: Likewise.
  * localedata/locales/ti_ER: Likewise.
  * localedata/locales/tk_TM: Likewise.
  * localedata/locales/tl_PH: Likewise.
  * localedata/locales/tr_TR: Likewise.
  * localedata/locales/tt_RU: Likewise.
  * localedata/locales/tt_RU@iqtelif: Likewise.
  * localedata/locales/ug_CN: Likewise.
  * localedata/locales/uk_UA: Likewise.
  * localedata/locales/uz_UZ: Likewise.
  * localedata/locales/uz_UZ@cyrillic: Likewise.
  * localedata/locales/vi_VN: Likewise.
  * localedata/locales/yi_US: Likewise.
  * localedata/locales/yo_NG: Likewise.
* Add sections for various scripts to the iso14651_t1_common file | Mike FABIAN | 2018-02-27 | 1 | -9/+68

  * localedata/locales/iso14651_t1_common: Add sections for various scripts to the iso14651_t1_common file.
* iso14651_t1_common: make the fourth level the codepoint for characters which are ignorable on all 4 levels | Mike FABIAN | 2018-02-27 | 1 | -457/+457

  Entries for characters which have “IGNORE” on all 4 levels like:

    <U0001> IGNORE;IGNORE;IGNORE;IGNORE % START OF HEADING (in ISO 6429)

  are changed into:

    <U0001> IGNORE;IGNORE;IGNORE;<U0001> % START OF HEADING (in ISO 6429)

  i.e. putting the code point of the character into the fourth level instead of “IGNORE”.

  Without that change, all such characters would compare equal, which would make a wcscoll test case fail. It is better to have a clearly defined sort order even for characters like this, so it is good to use the code point as a tie-break.

  * localedata/locales/iso14651_t1_common: Use the code point of a character in the fourth collation level instead of IGNORE for all entries which have IGNORE on all 4 levels.
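The effect of the tie-break can be illustrated with a toy level-by-level comparison. This is a simplified sketch under stated assumptions, not glibc's implementation: struct coll_entry, compare_entries and compare_ignorable are hypothetical names, the value 0 merely stands in for the IGNORE keyword, and real collation compares whole strings rather than single entries.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LEVELS 4
#define IGNORE 0u   /* assumption: 0 stands in for the IGNORE keyword */

/* Hypothetical flattened view of one collation entry.  */
struct coll_entry {
    uint32_t code_point;
    uint32_t weight[LEVELS];
};

/* Compare two entries level by level; the first differing level decides.
   Returns a negative value, zero, or a positive value.  */
int compare_entries(const struct coll_entry *a, const struct coll_entry *b)
{
    assert(a != NULL && b != NULL);
    for (int level = 0; level < LEVELS; level++) {
        if (a->weight[level] != b->weight[level])
            return a->weight[level] < b->weight[level] ? -1 : 1;
    }
    return 0;
}

/* Compare two characters that are ignorable on the first three levels.
   With use_code_point == 0 the fourth level is IGNORE as well (old data);
   otherwise the fourth level holds the code point (new data).  */
int compare_ignorable(uint32_t cp_a, uint32_t cp_b, int use_code_point)
{
    struct coll_entry a = { cp_a, { IGNORE, IGNORE, IGNORE,
                                    use_code_point ? cp_a : IGNORE } };
    struct coll_entry b = { cp_b, { IGNORE, IGNORE, IGNORE,
                                    use_code_point ? cp_b : IGNORE } };
    return compare_entries(&a, &b);
}
```

With the old layout, compare_ignorable(0x0001, 0x0002, 0) reports the two characters as equal; with the code point in the fourth level, compare_ignorable(0x0001, 0x0002, 1) orders U+0001 before U+0002.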
* Add convenience symbols like <AFTER-A>, <BEFORE-A> to iso14651_t1_common | Mike FABIAN | 2018-02-27 | 1 | -0/+120

  * localedata/locales/iso14651_t1_common: Add some convenient collation symbols like <AFTER-A>, <BEFORE-A> to make tailoring easier using rules similar to those in CLDR.
* Fixing syntax errors after updating the iso14651_t1_common file | Mike FABIAN | 2018-02-27 | 1 | -4/+32811

  * localedata/locales/iso14651_t1_common: The new version of this file downloaded from ISO contained several syntax errors which are fixed by this patch.
* iso14651_t1_common: <U\([0-9A-F][0-9A-F][0-9A-F][0-9A-F][0-9A-F]\)> → <U000\1> | Mike FABIAN | 2018-02-27 | 1 | -13294/+13294

  * localedata/locales/iso14651_t1_common: Replace all <U.....> with <U000.....> because glibc understands only the 4-digit or 8-digit form.
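The substitution in the commit subject is a plain regular-expression rewrite. The same transformation can be sketched as a C function; this is an illustration only, not the tool used for the commit, and pad_five_digit_symbols is a hypothetical name. It rewrites symbols with exactly five uppercase hex digits and leaves the 4-digit and 8-digit forms alone.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

static int is_upper_hex(char c)
{
    return (c >= '0' && c <= '9') || (c >= 'A' && c <= 'F');
}

/* Copy IN to OUT, rewriting every <UXXXXX> (exactly five uppercase hex
   digits) as <U000XXXXX>.  Returns 0 on success, or -1 if OUT is too
   small to hold the result and its terminating NUL.  */
int pad_five_digit_symbols(const char *in, char *out, size_t out_size)
{
    assert(in != NULL && out != NULL);
    if (out_size == 0)
        return -1;

    size_t o = 0;
    while (*in != '\0') {
        /* Each test short-circuits at the NUL terminator, so the scan
           never reads past the end of IN.  */
        int five = in[0] == '<' && in[1] == 'U';
        for (int i = 2; five && i < 7; i++)
            five = is_upper_hex(in[i]);
        five = five && in[7] == '>';

        if (five) {
            if (o + 11 >= out_size)
                return -1;
            memcpy(out + o, "<U000", 5);
            memcpy(out + o + 5, in + 2, 6);   /* five digits and '>' */
            o += 11;
            in += 8;
        } else {
            if (o + 1 >= out_size)
                return -1;
            out[o++] = *in++;
        }
    }
    out[o] = '\0';
    return 0;
}
```

For example, the input "<U1D400> <U0041>" comes out as "<U0001D400> <U0041>": the five-digit symbol is padded, the four-digit one is untouched.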
* Necessary changes after updating the iso14651_t1_common file | Mike FABIAN | 2018-02-27 | 1 | -8/+16

  * localedata/locales/iso14651_t1_common: Necessary changes to make the file downloaded from ISO usable by glibc.
* Update iso14651_t1_common file to ISO14651_2016_TABLE1_en.txt [BZ #14095] | Mike FABIAN | 2018-02-27 | 1 | -9494/+52571

  [BZ #14095] - Review / update collation data from Unicode / ISO 14651

  File downloaded from:
  http://standards.iso.org/iso-iec/14651/ed-4/ISO14651_2016_TABLE1_en.txt

  Updating this file alone is not enough; there are problems in the new file which need to be fixed, and the collation rules for many locales need to be adapted. This is done by the following patches.

  This update also fixes the problem that many characters are treated as identical when sorting because they were not yet in the old iso14651_t1_common file, see:
  https://bugzilla.redhat.com/show_bug.cgi?id=1336308 - Infinite (∞) and empty set (∅) are treated as if they were the same character by sort and uniq

  [BZ #14095]
  * localedata/locales/iso14651_t1_common: Update file to latest version from ISO (ISO14651_2016_TABLE1_en.txt).
* Collation fix: make forward accent sorting the default [BZ #17750] | Alexandre Oliva | 2017-11-29 | 1 | -3/+3

  [BZ #17750]
  * Makefile: add fr_CA.UTF-8 to test-input and LOCALES.
  * localedata/fr_CA.UTF-8.in: New file with test data for backward accents sorting.
  * localedata/fr_FR.UTF-8.in: Fix test data for forward accents sorting.
  * localedata/locales/cs_CZ (LC_COLLATE): Remove “define DIACRIT_FORWARD”
  * localedata/locales/de_DE (LC_COLLATE): Likewise.
  * localedata/locales/hu_HU (LC_COLLATE): Likewise.
  * localedata/locales/lb_LU (LC_COLLATE): Likewise.
  * localedata/locales/yuw_PG (LC_COLLATE): Likewise.
  * localedata/locales/fr_CA (LC_COLLATE): Add “define DIACRIT_BACKWARD”
  * localedata/locales/iso14651_t1_common: Use “ifdef DIACRIT_FORWARD” instead of “ifdef DIACRIT_BACKWARD”.

  The only locale which currently needs backward accents sorting is fr_CA. Therefore, forward accents sorting should be the default.

  Before this patch, backwards accent sorting was the default and all locales except fr_CA had to use

    define DIACRIT_FORWARD

  before

    copy "iso14651_t1"

  Most locales didn’t do that and thus got the inappropriate backwards accents sorting by accident.

  Now only the fr_CA locale needs to use

    define DIACRIT_BACKWARD

  before

    copy "iso14651_t1"

  Original patch slightly modified by: Mike FABIAN <mfabian@redhat.com>
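The difference between the two directions shows up when two words differ only in where their accents sit. The sketch below is a toy model, not glibc code: compare_accents and compare_cote_words are hypothetical names, and the weights (0 for an unaccented letter, 1 for an accented one) are illustrative stand-ins for real secondary weights. It compares "côte" and "coté", which are identical at the primary level.

```c
#include <assert.h>
#include <stddef.h>

/* Compare two equal-length arrays of secondary (accent) weights.
   backward == 0: scan from the first letter (forward, the new default).
   backward != 0: scan from the last letter (as with DIACRIT_BACKWARD).  */
int compare_accents(const unsigned *a, const unsigned *b, size_t n,
                    int backward)
{
    assert(a != NULL && b != NULL);
    for (size_t k = 0; k < n; k++) {
        size_t i = backward ? n - 1 - k : k;
        if (a[i] != b[i])
            return a[i] < b[i] ? -1 : 1;
    }
    return 0;
}

/* Illustrative weights only: 0 = no accent, 1 = accented letter.  */
static const unsigned COTE_CIRCUMFLEX[4] = { 0, 1, 0, 0 };  /* "côte" */
static const unsigned COTE_ACUTE[4]      = { 0, 0, 0, 1 };  /* "coté" */

/* Negative: "côte" sorts first; positive: "coté" sorts first.  */
int compare_cote_words(int backward)
{
    return compare_accents(COTE_CIRCUMFLEX, COTE_ACUTE, 4, backward);
}
```

Scanning forward, the accent on the second letter of "côte" is seen first, so "coté" sorts before it; scanning backward, the accent on the last letter of "coté" is seen first, so "côte" sorts before it. The second result is the order the fr_CA test data is meant to preserve.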
* Correct collation rules for Malayalam. | Santhosh Thottingal | 2017-06-11 | 1 | -4/+22

  [BZ #19922]
  * locales/iso14651_t1_common: Add collation rules for U+07DA to U+07DF.

  [BZ #19919]
  * locales/iso14651_t1_common: Correct collation of U+0D36 and U+0D37.

* localedata: standardize copyright/license information [BZ #11213] | Mike Frysinger | 2016-03-21 | 1 | -0/+7

  Use the language from the FSF in all locale files to disclaim any license/copyright on locale data.

  See https://sourceware.org/ml/libc-locales/2013-q1/msg00048.html
* Fix whitespaces | Ulrich Drepper | 2011-05-15 | 1 | -1/+1
* Move Dzonghka collation rules to common collation rules file | Ulrich Drepper | 2011-05-15 | 1 | -0/+1383
* Fix sorting of malayalam letter 'na'. | Pravin Satpute | 2010-02-03 | 1 | -2/+2
* Fix whitespaces. | Ulrich Drepper | 2010-02-03 | 1 | -2/+2
* Move Tamil collation data to common source file. | Pravin Satpute | 2010-02-03 | 1 | -0/+221
* Implement Burmese language locale for Myanmar. | Keith Stribley | 2009-10-30 | 1 | -0/+3286
* * localedata/locales/bn_BD: Remove comment about missing collation | Ulrich Drepper | 2009-05-04 | 1 | -0/+245

  rules.
  * localedata/locales/iso14651_t1_common: Add Bengali collation rules.
  Patch by Pravin Satpute <psatpute@redhat.com>.

* [BZ #9759] | Ulrich Drepper | 2009-03-15 | 1 | -8/+8

  * dirent/dirent.h: Adjust prototypes of scandir, scandir64, alphasort, alphasort64, versionsort, and versionsort64 to POSIX 2008.
  * dirent/alphasort.c: Adjust implementation to type change.
  * dirent/alphasort64.c: Likewise.
  * dirent/scandir.c: Likewise.
  * dirent/versionsort.c: Likewise.
  * dirent/versionsort64.c: Likewise.
  * sysdeps/wordsize-64/alphasort.c: Add hack to hide alphasort64 declaration.
  * sysdeps/wordsize-64/versionsort.c: Add hack to hide versionsort64 declaration.

* * locales/iso14651_t1_common: Add rules for sorting Malayalam. | Ulrich Drepper | 2009-02-11 | 1 | -0/+325

  Patch by Santhosh Thottingal <santhosh.thottingal@gmail.com>.
* * locales/iso14651_t1_common: Fix sorting of U+0AB3. [cvs/fedora-glibc-20090102T0809] | Ulrich Drepper | 2008-12-31 | 1 | -3/+3

  Patch by Pravin Satpute <psatpute@redhat.com>.

* [BZ #6867] | Ulrich Drepper | 2008-10-31 | 1 | -0/+84

  * sysdeps/powerpc/elf/rtld-global-offsets.sym: Fix typo.

* * locales/iso14651_t1_common: Add Kannada collation support. | Ulrich Drepper | 2008-07-11 | 1 | -0/+270

  Patch by Pravin Satpute <psatpute@redhat.com>.

* * locales/iso14651_t1_common: Add support for Gurumukhi script. | Ulrich Drepper | 2008-06-24 | 1 | -0/+226

  Patch by Pravin Satpute <psatpute@redhat.com>.

* Remove U0C0D entry added for Telugu. | Ulrich Drepper | 2008-05-21 | 1 | -3/+0
* * string/strcasestr.c (CMP_FUNC): Use __strncasecmp, not strncasecmp. [cvs/fedora-glibc-20080516T2152] | Ulrich Drepper | 2008-05-16 | 1 | -1/+2

* [BZ #6442] | Ulrich Drepper | 2008-05-15 | 1 | -0/+273

  * string/endian.h: Add macros for fixed-size endian conversion.
  * bits/byteswap.h: Allow inclusion from <endian.h>.
  * sysdeps/i386/bits/byteswap.h: Likewise.
  * sysdeps/ia64/bits/byteswap.h: Likewise.
  * sysdeps/s390/bits/byteswap.h: Likewise.
  * sysdeps/x86_64/bits/byteswap.h: Likewise.
  * string/Makefile (tests): Add tst-endian.
  * string/tst-endian.c: New file.

* Fix first weight for U+1E60, U+1E62, U+1E64, U+1E66, and U+1E68. | Ulrich Drepper | 2008-04-07 | 1 | -423/+423
* * locales/iso14651_t1_common: Add support for Gujarati script. | Ulrich Drepper | 2008-03-31 | 1 | -0/+260

  Patch by Pravin Satpute <psatpute@redhat.com>.

* * locales/iso14651_t1_common: Add support for Devanagari script. | Ulrich Drepper | 2008-03-24 | 1 | -0/+302

  * locales/mr_IN: Adjust Devanagari sorting for mr_IN.
  Patch by Pravin Satpute <psatpute@redhat.com>.

* * locale/programs/locfile-token.h: Remove tok_elif, add tok_elifdef | Ulrich Drepper | 2007-10-11 | 1 | -0/+4

  and tok_elifndef.
  * locale/programs/locfile-kw.gperf: Likewise.
  * locale/programs/ld-collate.c: Implement primitive preprocessor.

* * po/pt_BR.po: Fix typo. | Ulrich Drepper | 2007-09-30 | 1 | -0/+216

* * locale/programs/ld-collate.c (collate_read): Allow order_start | Ulrich Drepper | 2007-04-28 | 1 | -0/+2424

  after copy.