author Florian Weimer <fweimer@redhat.com> 2022-07-05 09:05:45 +0200
committer Florian Weimer <fweimer@redhat.com> 2022-07-05 09:06:50 +0200
commit b15538d77c6a7893c8bb42831dcd3a1a12b727d4
tree 0ebaa0c09cc8f21437021e25d06a22d62a82cb72 /localedata
parent 7dcaabb94caa00c9dd68a207ea62fef5a2551ac4
locale: localedef input files are now encoded in UTF-8

Previously, they were assumed to be in ISO-8859-1, and the output
charset was assumed to overlap with ISO-8859-1 for the characters
actually used.
However, this did not work as intended on many architectures even for
an ISO-8859-1 output encoding because of the char signedness bug in
lr_getc.  Therefore, this commit switches to UTF-8 without making
provisions for backwards compatibility.
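
The signedness problem can be illustrated with a minimal sketch.  This
is not the actual lr_getc code; the helper names read_byte_buggy and
read_byte_fixed are hypothetical, and signed char is used to model an
architecture where plain char is signed.  A byte such as 0xE9 (é in
ISO-8859-1) widens to a negative int unless it is converted through
unsigned char first; a 0xFF byte would even widen to -1, the value glibc
uses for EOF.

```c
#include <stddef.h>

/* Hypothetical helpers for illustration only, not taken from glibc.
   signed char stands in for plain char on architectures where char
   is signed.  */

/* Widens the stored byte directly: bytes >= 0x80 come back negative.  */
static int
read_byte_buggy (const signed char *buf, size_t idx)
{
  return buf[idx];
}

/* Converts through unsigned char first: the result is always 0..255.  */
static int
read_byte_fixed (const signed char *buf, size_t idx)
{
  return (unsigned char) buf[idx];
}
```

With a buffer holding the bit pattern 0xE9 (stored as -23 in a signed
char), read_byte_buggy returns -23 while read_byte_fixed returns 233, so
only the second form is usable as a character code or table index.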

The following Elisp code can be used to convert locale definition files
to UTF-8:

(defun glibc/convert-localedef (from to)
  (interactive "r")
  (save-excursion
    (save-restriction
      (narrow-to-region from to)
      (goto-char (point-min))
      (save-match-data
	(while (re-search-forward "<U\\([0-9a-fA-F]+\\)>" nil t)
	  (let* ((codepoint (string-to-number (match-string 1) 16))
		 (converted
		  (cond
		   ((memq codepoint '(?/ ?\ ?< ?>))
		    (string ?/ codepoint))
		   ((= codepoint ?\") "<U0022>")
		   (t (string codepoint)))))
	    (replace-match converted t t)))))))

Reviewed-by: Carlos O'Donell <carlos@redhat.com>
Tested-by: Carlos O'Donell <carlos@redhat.com>