make nl_langinfo(CODESET) always return "UTF-8" - mirror/musl - mirror of git://git.musl-libc.org/musl

author	Rich Felker <dalias@aerifal.cx>	2015-09-09 05:13:33 +0000
committer	Rich Felker <dalias@aerifal.cx>	2015-09-09 05:13:33 +0000
commit	844212d94f582c4e3c5055e0a1524931e89ebe76 (patch)
tree	030ff98a01574f22af211ce6953160164b3ed43d /include/arpa
parent	426a0e2912c07f0e86feee2ed12f24a808eac2f4 (diff)

make nl_langinfo(CODESET) always return "UTF-8"

this restores the original behavior prior to the addition of the
byte-based C locale and fixes what is effectively a regression in
musl's property of always providing working UTF-8 support.

commit 1507ebf837334e9e07cfab1ca1c2e88449069a80 introduced the codeset
name "UTF-8-CODE-UNITS" for the byte-based C locale to represent that
the semantic content is UTF-8 but that it is being processed as code
units (bytes) rather than whole multibyte characters. however, many
programs assume that the codeset name is usable with iconv and/or
comes from a set of standard/widely-used names known to the
application. such programs are likely to produce warnings or errors,
run with reduced functionality, or mangle character data when run
explicitly in the C locale.
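
as an illustration of the assumption described above, the following is a minimal sketch of how an application might check whether a codeset name is usable with iconv. the helper names codeset_usable_with_iconv and locale_codeset_usable_with_iconv are hypothetical and do not appear in musl or in this commit; only iconv_open, iconv_close and nl_langinfo are standard interfaces.

```c
#include <iconv.h>
#include <langinfo.h>
#include <stdbool.h>

/* hypothetical helper: report whether iconv accepts `name` as a
 * source encoding for conversion to UTF-8. */
bool codeset_usable_with_iconv(const char *name)
{
    iconv_t cd = iconv_open("UTF-8", name);
    if (cd == (iconv_t)-1)
        return false;   /* name not recognized by this iconv */
    iconv_close(cd);
    return true;
}

/* hypothetical helper: apply the same check to whatever name the
 * current locale reports via nl_langinfo(CODESET). */
bool locale_codeset_usable_with_iconv(void)
{
    return codeset_usable_with_iconv(nl_langinfo(CODESET));
}
```

a program written this way would be expected to accept "UTF-8" but reject a name such as "UTF-8-CODE-UNITS" on an iconv implementation that does not know it, which is the kind of failure the paragraph above describes.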

the standard places basically no requirements on the string returned
by nl_langinfo(CODESET) or on how it interacts with other interfaces, so
returning "UTF-8" is permissible. moreover, it seems like the right
thing to do, since the identity of the character encoding as "UTF-8"
is independent of whether it is being processed as bytes or characters
by the standard library functions.
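
a program that does need to know whether the library is processing bytes or multibyte characters can ask a different question than the codeset name. the sketch below assumes that a byte-based locale reports MB_CUR_MAX as 1; the helper name locale_processes_bytes is hypothetical and is not part of this commit.

```c
#include <locale.h>
#include <stdbool.h>
#include <stdlib.h>

/* hypothetical helper: true when the current locale treats every
 * byte as a complete character (assumption: MB_CUR_MAX == 1 in a
 * byte-based locale such as "C"). */
bool locale_processes_bytes(void)
{
    return MB_CUR_MAX == 1;
}
```

this keeps the two properties separate: the codeset name identifies the encoding, while MB_CUR_MAX reflects how the multibyte functions consume it in the current locale.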

Diffstat (limited to 'include/arpa')

0 files changed, 0 insertions, 0 deletions
