path: root/localedata/unicode-gen
Commit message | Author | Age | Files | Lines
* Bug 24307: Update to Unicode 12.0.0 | Mike FABIAN | 2019-03-08 | 6 | -247/+1021
    Unicode 12.0.0 Support: Character encoding, character type info, and
    transliteration tables are all updated to Unicode 12.0.0, using the
    generator scripts contributed by Mike FABIAN (Red Hat).

    Some info about the number of characters added or changed:

    Total added characters in newly generated CHARMAP: 554
    Total added characters in newly generated WIDTH: 106
    alpha: Missing 8 characters of old ctype in new ctype
      (These are combining marks, apparently they were removed from alpha
      on purpose)
    alpha: Added 295 characters in new ctype which were not in old ctype
    combining: Missing 2 characters of old ctype in new ctype
      (U+1CF2 VEDIC SIGN ARDHAVISARGA and U+1CF3 VEDIC SIGN ROTATED
      ARDHAVISARGA, these are now "Alphabetic" in Unicode 12.0.0)
    combining: Added 37 characters in new ctype which were not in old ctype
    combining_level3: Missing 2 characters of old ctype in new ctype
      (U+1CF2 VEDIC SIGN ARDHAVISARGA and U+1CF3 VEDIC SIGN ROTATED
      ARDHAVISARGA, these are now "Alphabetic" in Unicode 12.0.0)
    combining_level3: Added 26 characters in new ctype which were not in old ctype
    graph: Added 554 characters in new ctype which were not in old ctype
    lower: Added 6 characters in new ctype which were not in old ctype
    print: Added 554 characters in new ctype which were not in old ctype
    punct: Missing 29 characters of old ctype in new ctype
      (These characters have all become "Alphabetic" in Unicode 12.0.0.
      Therefore, they are not in "punct" anymore (see: is_punct() in
      unicode_utils.py))
    punct: Added 296 characters in new ctype which were not in old ctype
    tolower: Added 7 characters in new ctype which were not in old ctype
    totitle: Added 7 characters in new ctype which were not in old ctype
    toupper: Added 7 characters in new ctype which were not in old ctype
    upper: Added 7 characters in new ctype which were not in old ctype

    [BZ #24307]
    * localedata/unicode-gen/Makefile (UNICODE_VERSION): Set to 12.0.0.
    * localedata/unicode-gen/DerivedCoreProperties.txt: Update to Unicode 12.0.0.
    * localedata/unicode-gen/EastAsianWidth.txt: Likewise.
    * localedata/unicode-gen/PropList.txt: Likewise.
    * localedata/unicode-gen/UnicodeData.txt: Likewise.
    * localedata/unicode-gen/ctype_compatibility_test_cases.py: U+108D
      became "Alphabetic" in Unicode 12.0.0. Adapt test case.
    * localedata/charmaps/UTF-8: Regenerate.
    * localedata/locales/i18n_ctype: Likewise.
    * localedata/locales/tr_TR: Likewise.
    * localedata/locales/translit_circle: Likewise.
    * localedata/locales/translit_cjk_compat: Likewise.
    * localedata/locales/translit_combining: Likewise.
    * localedata/locales/translit_compat: Likewise.
    * localedata/locales/translit_font: Likewise.
    * localedata/locales/translit_fraction: Likewise.
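The "Missing N characters" and "Added N characters" lines in the message above read like the output of a set comparison between the old and the regenerated character classes. A minimal sketch of such a comparison follows. The function names `compare_ctype` and `report` and the toy data are hypothetical, and this is not the code of the verifier scripts in this directory.

```python
def compare_ctype(old, new):
    """Return (missing, added) code points between two character-class sets.

    `old` and `new` are sets of integer code points for one class,
    for example "combining" or "punct".
    """
    missing = sorted(old - new)   # in the old ctype, gone from the new one
    added = sorted(new - old)     # in the new ctype only
    return missing, added


def report(name, old, new):
    """Build summary lines in the style quoted in the commit message."""
    missing, added = compare_ctype(old, new)
    lines = []
    if missing:
        lines.append(
            f"{name}: Missing {len(missing)} characters of old ctype in new ctype")
    if added:
        lines.append(
            f"{name}: Added {len(added)} characters in new ctype "
            "which were not in old ctype")
    return lines


# Toy data (assumption): U+1CF2 and U+1CF3 leave the class as the message
# describes; U+E000 is only a placeholder for a newly added character.
old_combining = {0x0300, 0x1CF2, 0x1CF3}
new_combining = {0x0300, 0xE000}

for line in report("combining", old_combining, new_combining):
    print(line)
```

Working on sets keeps the comparison independent of the order in which the classes are listed in the locale file.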
* Update copyright dates with scripts/update-copyrights. | Joseph Myers | 2019-01-01 | 13 | -13/+13
    * All files with FSF copyright notices: Update copyright dates
      using scripts/update-copyrights.
    * locale/programs/charmap-kw.h: Regenerated.
    * locale/programs/locfile-kw.h: Likewise.
* Put the correct Unicode version number 11.0.0 into the generated files | Mike FABIAN | 2018-07-10 | 2 | -41/+75
    In some places there was still the old Unicode version 10.0.0 in the files.

    * localedata/charmaps/UTF-8: Use correct Unicode version 11.0.0 in comment.
    * localedata/locales/i18n_ctype: Use correct Unicode version in
      comments and headers.
    * localedata/unicode-gen/utf8_gen.py: Add option to specify Unicode version
    * localedata/unicode-gen/Makefile: Use option to specify Unicode
      version for utf8_gen.py
* Bug 23308: Update to Unicode 11.0.0 | Mike FABIAN | 2018-07-04 | 5 | -267/+1240
    Unicode 11.0.0 Support: Character encoding, character type info, and
    transliteration tables are all updated to Unicode 11.0.0, using the
    generator scripts contributed by Mike FABIAN (Red Hat).

    Some info about the number of characters added:

    Total added characters in newly generated CHARMAP: 684
    Total added characters in newly generated WIDTH: 119
    alpha: Added 380 characters in new ctype which were not in old ctype
    combining: Added 56 characters in new ctype which were not in old ctype
    combining_level3: Added 37 characters in new ctype which were not in old ctype
    graph: Added 684 characters in new ctype which were not in old ctype
    lower: Added 82 characters in new ctype which were not in old ctype
    print: Added 684 characters in new ctype which were not in old ctype
    punct: Added 304 characters in new ctype which were not in old ctype
    tolower: Added 79 characters in new ctype which were not in old ctype
    totitle: Added 33 characters in new ctype which were not in old ctype
    toupper: Added 79 characters in new ctype which were not in old ctype
    upper: Added 79 characters in new ctype which were not in old ctype

    No characters were removed.

    [BZ #23308]
    * unicode-gen/Makefile (UNICODE_VERSION): Set to 11.0.0.
    * localedata/unicode-gen/DerivedCoreProperties.txt: Update to Unicode 11.0.0.
    * localedata/unicode-gen/EastAsianWidth.txt: likewise.
    * localedata/unicode-gen/PropList.txt: likewise.
    * localedata/unicode-gen/UnicodeData.txt: likewise.
    * localedata/charmaps/UTF-8: Regenerate.
    * localedata/locales/i18n_ctype: likewise.
    * localedata/locales/tr_TR: likewise.
    * localedata/locales/translit_circle: likewise.
    * localedata/locales/translit_cjk_compat: likewise.
    * localedata/locales/translit_combining: likewise.
    * localedata/locales/translit_compat: likewise.
    * localedata/locales/translit_font: likewise.
    * localedata/locales/translit_fraction: likewise.
* Update copyright dates with scripts/update-copyrights. | Joseph Myers | 2018-01-01 | 13 | -13/+13
    * All files with FSF copyright notices: Update copyright dates
      using scripts/update-copyrights.
    * locale/programs/charmap-kw.h: Regenerated.
    * locale/programs/locfile-kw.h: Likewise.
* localedata: Once again correct and regenerate i18n_ctype. | Rafal Luzynski | 2017-10-31 | 1 | -1/+1
    Following the previous work by Carlos O'Donell the category of
    LC_CTYPE is correctly set to "i18n:2012" rather than "unicode:2014"
    and the i18n_ctype file is once again regenerated from scratch to
    make sure it does not contain any manual additions except the
    copyright message.

    Reviewed-by: Carlos O'Donell <carlos@redhat.com>

    * localedata/unicode-gen/gen_unicode_ctype.py (output_head):
      category of LC_CTYPE set to "i18n:2012".
    * localedata/locales/i18n_ctype: Regenerate.
* localedata: Fix unicode-gen check target. | Carlos O'Donell | 2017-10-25 | 1 | -2/+2
    After the transition to generating a distinct file for Unicode ctype
    information e.g. i18n_ctype, the check target was left with the wrong
    target name. This patch fixes the check target and regenerates the
    files with more information than previously used, filling in the
    LC_IDENTIFICATION data.

    Tested on x86_64 by regenerating from Unicode source files, and
    running checks. Tested by subsequently rebuilding all locales.

    No regressions in testsuite.

    Signed-off-by: Carlos O'Donell <carlos@redhat.com>
    Reported-by: Rafal Luzynski <digitalfreak@lingonborough.com>
* localedata: Reorganize Unicode LC_CTYPE inclusion. | Carlos O'Donell | 2017-10-13 | 1 | -13/+13
    The commit does the following things:

    * Move non-transliteration Unicode generated data to i18n_ctype.
    * Copy the i18n_ctype data into i18n and add transliteration.

    In the future, any locale which needs Unicode LC_CTYPE data can also
    just use `copy i18n_ctype` and get the base character classes and
    maps without transliteration.

    Tested by compiling all the locales and my prototype C.UTF-8 which
    uses it.

    Signed-off-by: Carlos O'Donell <carlos@redhat.com>
* Improve utf8_gen.py to set the width for characters with Prepended_Concatenation_Mark property to 1 | Mike FABIAN | 2017-09-06 | 3 | -7/+1648
    [BZ #22070]
    * localedata/unicode-gen/utf8_gen.py: Set the width for characters
      with Prepended_Concatenation_Mark property to 1
    * localedata/charmaps/UTF-8: Updated using the improved script.
* Write all ranges of neighbouring characters with the same width using the range notation in charmaps/UTF-8 | Mike FABIAN | 2017-09-06 | 1 | -13/+38
    Writing ranges of neighbouring characters with the same width like this

    <U000E0100>...<U000E01EF> 0

    in charmaps/UTF-8 is more efficient than writing many single
    character lines like:

    <U000E0100> 0
    <U000E0101> 0
    ...

    [BZ #21750]
    * unicode-gen/utf8_gen.py: Write all ranges of neighbouring
      characters with the same width using the range notation in
      charmaps/UTF-8.
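The range notation described in this commit can be illustrated with a short sketch that merges neighbouring code points sharing a width into one `start...end width` line. The helper names `ucs_symbol`, `format_entry` and `width_lines` are hypothetical, the 4-digit form for code points below U+10000 is an assumption, and the single-space separator simply follows the example quoted in the message. This is not the actual utf8_gen.py code.

```python
def ucs_symbol(code_point):
    """Format a code point as a charmap symbol such as <U000E0100>.

    Assumption: 4 hex digits below U+10000, 8 hex digits otherwise.
    """
    if code_point < 0x10000:
        return f"<U{code_point:04X}>"
    return f"<U{code_point:08X}>"


def format_entry(start, end, width):
    """Format either a single code point or an inclusive range."""
    if start == end:
        return f"{ucs_symbol(start)} {width}"
    return f"{ucs_symbol(start)}...{ucs_symbol(end)} {width}"


def width_lines(widths):
    """Turn {code_point: width} into lines, merging runs of neighbouring
    code points that share the same width into range entries."""
    lines = []
    start = prev = None
    for cp in sorted(widths):
        if start is not None and cp == prev + 1 and widths[cp] == widths[start]:
            prev = cp                      # extend the current run
            continue
        if start is not None:
            lines.append(format_entry(start, prev, widths[start]))
        start = prev = cp                  # begin a new run
    if start is not None:
        lines.append(format_entry(start, prev, widths[start]))
    return lines


# The 240 code points from the commit message collapse into one line.
example = {cp: 0 for cp in range(0xE0100, 0xE01F0)}
print("\n".join(width_lines(example)))
```

A run is broken whenever the next code point is not adjacent or carries a different width, so two adjacent blocks with different widths stay on separate lines.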
* Resolve some historically special cases of ambiguous width | Thorsten Glaser | 2017-08-17 | 1 | -0/+12
    [BZ #21750]
    * unicode-gen/utf8_gen.py (U+00AD): Set width to 1.
    * unicode-gen/utf8_gen.py (U+1160..U+11FF): Set width to 0.
    * unicode-gen/utf8_gen.py (U+3248..U+324F): Set width to 2.
    * unicode-gen/utf8_gen.py (U+4DC0..U+4DFF): Likewise.
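One way to picture the four special cases listed in this commit is as a small override table applied after the general width computation. The sketch below assumes widths are held in a `{code_point: width}` dictionary; `WIDTH_OVERRIDES` and `apply_overrides` are hypothetical names, and the real script may structure this differently.

```python
# (first, last, width) triples taken from the ChangeLog entry above.
WIDTH_OVERRIDES = (
    (0x00AD, 0x00AD, 1),   # U+00AD
    (0x1160, 0x11FF, 0),   # U+1160..U+11FF
    (0x3248, 0x324F, 2),   # U+3248..U+324F
    (0x4DC0, 0x4DFF, 2),   # U+4DC0..U+4DFF
)


def apply_overrides(widths):
    """Force the special-case widths, replacing any earlier value.

    `widths` maps integer code points to column widths and is
    modified in place and returned.
    """
    for first, last, width in WIDTH_OVERRIDES:
        for cp in range(first, last + 1):
            widths[cp] = width
    return widths


# The starting value 0 for U+00AD is only toy input.
print(apply_overrides({0x00AD: 0})[0x00AD])
```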
* Handle more cases of combining characters | Thorsten Glaser | 2017-08-17 | 1 | -1/+1
    [BZ #21750]
    * unicode-gen/utf8_gen.py: Treat category Me and Mn as combining.
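The category test named in this entry can be sketched with Python's standard `unicodedata` module. This is only a stand-in: the generator reads UnicodeData.txt, whereas `unicodedata` reflects whatever Unicode version ships with the interpreter. The names `COMBINING_CATEGORIES` and `is_combining` are hypothetical.

```python
import unicodedata

# General categories treated as combining, per the ChangeLog entry:
# Me (Mark, enclosing) and Mn (Mark, nonspacing).
COMBINING_CATEGORIES = ("Me", "Mn")


def is_combining(code_point):
    """Return True if the code point's general category is Me or Mn."""
    return unicodedata.category(chr(code_point)) in COMBINING_CATEGORIES


for cp in (0x0301, 0x20DD, 0x0903, 0x0041):
    print(f"U+{cp:04X}", unicodedata.category(chr(cp)), is_combining(cp))
```

Spacing marks (category Mc, such as U+0903) are not matched by this check.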
* UnicodeData has precedence over EastAsianWidth | Thorsten Glaser | 2017-08-17 | 1 | -17/+9
    [BZ #19852] [BZ #21750]
    * unicode-gen/utf8_gen.py: Process EastAsianWidth lines before
      UnicodeData lines so the latter have precedence; remove hack to
      group output by EastAsianWidth ranges.
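The ordering rule in this entry amounts to "the source applied last wins". A minimal sketch, assuming both inputs have already been reduced to `{code_point: width}` dictionaries, which is a simplification of whatever the script does internally; `build_widths` is a hypothetical name.

```python
def build_widths(east_asian_widths, unicode_data_widths):
    """Merge two width tables so UnicodeData-derived values take precedence.

    Both arguments map integer code points to column widths. The
    EastAsianWidth values are applied first and then overwritten
    wherever UnicodeData also assigns a width.
    """
    widths = {}
    widths.update(east_asian_widths)     # lower precedence, applied first
    widths.update(unicode_data_widths)   # higher precedence, applied last
    return widths


# Toy input: both sources assign a width to code point 0x3099.
merged = build_widths({0x3041: 2, 0x3099: 2}, {0x3099: 0})
print(merged)
```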
* Bug 21533: Update to Unicode 10.0.0 | Mike FABIAN | 2017-06-22 | 4 | -93/+1355
    * Unicode 10.0.0 Support: Character encoding, character type info,
      and transliteration tables are all updated to Unicode 10.0.0, using
      generator scripts contributed by Mike FABIAN (Red Hat).
* Bug 20313: Update to Unicode 9.0.0 | Mike FABIAN | 2017-02-21 | 4 | -132/+1953
    * Unicode 9.0.0 Support: Character encoding, character type info,
      and transliteration tables are all updated to Unicode 9.0.0, using
      generator scripts contributed by Mike FABIAN (Red Hat).
* Update copyright dates with scripts/update-copyrights. | Joseph Myers | 2017-01-01 | 13 | -13/+13
* unicode-gen: include standard comment file header | Mike Frysinger | 2016-06-11 | 7 | -0/+17
    We deployed this header to all the locale files, so make sure we
    include it in the generated ones too so we don't lose it.
* Update copyright dates with scripts/update-copyrights. | Joseph Myers | 2016-01-04 | 13 | -13/+13
* Automate LC_CTYPE generation for tr_TR, update to Unicode 8.0.0 (bug 18491). | Joseph Myers | 2015-12-11 | 3 | -5/+37
    This patch makes the automation of Unicode LC_CTYPE generation also
    support generating the modified LC_CTYPE used for Turkish (where case
    conversions of 'i' and 'I' differ from ASCII conventions), so
    allowing that to be more readily kept in sync for future Unicode
    updates. The patch includes the locale update generated by the
    scripts.

    Tested for x86_64.

    [BZ #18491]
    * unicode-gen/unicode_utils.py (to_upper_turkish): New function.
      (to_lower_turkish): Likewise.
    * unicode-gen/gen_unicode_ctype.py (output_tables): Support
      producing output with Turkish case conversions.
      (--turkish): New command-line option.
    * unicode-gen/Makefile (GENERATED): Add tr_TR.
      (tr_TR): New rule.
    * locales/tr_TR: Regenerate LC_CTYPE.
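The Turkish exception mentioned above (dotless and dotted i) can be sketched as a thin wrapper around an ordinary case mapping. The function names `to_upper_turkish` and `to_lower_turkish` come from the ChangeLog entry, but the bodies below are an assumption, and `to_upper` / `to_lower` are stand-ins built on Python's own case tables rather than on UnicodeData.txt.

```python
def to_upper(code_point):
    """Stand-in simple upper-case mapping using Python's str.upper().

    Code points whose upper-case form is more than one character are
    returned unchanged, since a simple mapping is one-to-one.
    """
    upper = chr(code_point).upper()
    return ord(upper) if len(upper) == 1 else code_point


def to_lower(code_point):
    """Stand-in simple lower-case mapping using Python's str.lower()."""
    lower = chr(code_point).lower()
    return ord(lower) if len(lower) == 1 else code_point


def to_upper_turkish(code_point):
    """Upper-case mapping with the Turkish rule: i maps to U+0130."""
    if code_point == 0x0069:      # LATIN SMALL LETTER I
        return 0x0130             # LATIN CAPITAL LETTER I WITH DOT ABOVE
    return to_upper(code_point)


def to_lower_turkish(code_point):
    """Lower-case mapping with the Turkish rule: I maps to U+0131."""
    if code_point == 0x0049:      # LATIN CAPITAL LETTER I
        return 0x0131             # LATIN SMALL LETTER DOTLESS I
    return to_lower(code_point)


print(hex(to_upper_turkish(0x0069)), hex(to_lower_turkish(0x0049)))
```

Only the two ASCII letters are special-cased; every other code point falls through to the regular mapping.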
* Update to Unicode 8.0.0. | Mike FABIAN | 2015-12-10 | 5 | -267/+2504
    Update __STDC_ISO_10646__ to 201505L for Unicode 8.0.0. Update
    character encoding, ctype, and transliteration tables. New scripts
    autogenerate transliteration tables.
* Update transliteration support to Unicode 7.0.0. | Carlos O'Donell | 2015-12-09 | 11 | -665/+2112
    The transliteration files are now autogenerated from upstream
    Unicode data.
* Amendments to Unicode 7 update. | Alexandre Oliva | 2015-02-23 | 5 | -14/+14
    for ChangeLog

    * include/stdc-predef.h (__STDC_ISO_10646__): Update to 201304L,
      for Unicode 7.

    for localedata/ChangeLog

    * unicode-gen/ctype_compatibility.py: Use date ranges in copyright
      notice.
    * unicode-gen/ctype_compatibility_test_cases.py: Likewise.
    * unicode-gen/gen_unicode_ctype.py: Likewise.
    * unicode-gen/utf8_compatibility.py: Likewise.
    * unicode-gen/utf8_gen.py: Likewise. Use upper case for global
      variables, use tuples for global constant arrays. From Mike FABIAN.
      Suggested by Mike Frysinger <vapier@gentoo.org>.
* Unicode 7.0.0 update; added generator scripts. | Alexandre Oliva | 2015-02-20 | 10 | -0/+43265
    for localedata/ChangeLog

    [BZ #17588] [BZ #13064] [BZ #14094] [BZ #17998]
    * unicode-gen/Makefile: New.
    * unicode-gen/unicode-license.txt: New, from Unicode.
    * unicode-gen/UnicodeData.txt: New, from Unicode.
    * unicode-gen/DerivedCoreProperties.txt: New, from Unicode.
    * unicode-gen/EastAsianWidth.txt: New, from Unicode.
    * unicode-gen/gen_unicode_ctype.py: New generator, from Mike FABIAN
      <mfabian@redhat.com>.
    * unicode-gen/ctype_compatibility.py: New verifier, from Pravin
      Satpute <psatpute@redhat.com> and Mike FABIAN.
    * unicode-gen/ctype_compatibility_test_cases.py: New verifier
      module, from Mike FABIAN.
    * unicode-gen/utf8_gen.py: New generator, from Pravin Satpute and
      Mike FABIAN.
    * unicode-gen/utf8_compatibility.py: New verifier, from Pravin
      Satpute and Mike FABIAN.
    * charmaps/UTF-8: Update.
    * locales/i18n: Update.
    * gen-unicode-ctype.c: Remove.
    * tst-ctype-de_DE.ISO-8859-1.in: Adjust, islower now returns true
      for ordinal indicators.