stdio-common: Add tests for formatted printf output specifiers - mirror/glibc - mirror of git://sourceware.org/git/glibc.git

diff options
author	Maciej W. Rozycki <macro@redhat.com>	2024-11-07 06:14:24 +0000
committer	Maciej W. Rozycki <macro@redhat.com>	2024-11-07 06:14:24 +0000
commit	7ec4d7e3d1c0c6da11dbad1292fd9d94124c57ca (patch)
tree	0830b8fd35b67d528a6fb3189a8d4d14c35f46a2 /stdio-common/tst-printf-format-s.sh
parent	1b70a0a024f024328a12e31216d4d725f22e78b5 (diff)
download	glibc-7ec4d7e3d1c0c6da11dbad1292fd9d94124c57ca.tar.gz glibc-7ec4d7e3d1c0c6da11dbad1292fd9d94124c57ca.tar.xz glibc-7ec4d7e3d1c0c6da11dbad1292fd9d94124c57ca.zip
stdio-common: Add tests for formatted printf output specifiers
This is a collection of tests for formatted printf output specifiers
covering the d, i, o, u, x, and X integer conversions, the e, E, f, F,
g, and G floating-point conversions, the c character conversion, and the
s string conversion.  Also the hh, h, l, and ll length modifiers are
covered with the integer conversions as is the L length modifier with
the floating-point conversions.

The -, +, space, #, and 0 flags are iterated over, as permitted by the
conversion handled, in tuples of 1..5, including tuples with repetitions
of 2, and combined with field width and/or precision, again as permitted
by the conversion.  The resulting format string is then used to produce
output from respective sets of input data corresponding to the specific
conversion under test.  POSIX extensions beyond ISO C are not used.

Output is produced in the form of records which include both the format
string (and width and/or precision where given in the form of separate
arguments) and the conversion result, and is verified with GNU AWK using
the format obtained from each such record against the reference value
also supplied, relying on the fact that GNU AWK has its own independent
implementation of format processing, striving to be ISO C compatible.

In the course of implementation I have determined that in the non-bignum
mode GNU AWK uses system sprintf(3) for the floating-point conversions,
defeating the objective of doing the verification against an independent
implementation.  Additionally the bignum mode (using MPFR) is required
to correctly output wider integer and floating-point data.  Therefore
for the conversions affected the relevant shell scripts sanity-check AWK
and terminate with unsupported status if the bignum mode is unavailable
for floating-point data or where data is output incorrectly.

The f and F floating-point conversions are build-time options for GNU
AWK, depending on the environment, so they are probed for before being
used.  Similarly the a and A floating-point conversions, however they
are currently not used, see below.  Also GNU AWK does not handle the b
or B integer conversions at all at the moment, as at 5.3.0.  Support for
the a, A, b, and B conversions can however be easily added following the
approach taken for the f and F conversions.

Output produced by gawk for the a and A floating-point conversions does
not match one produced by us: insufficient precision is used where one
hasn't been explicitly given, e.g. for the negated maximum finite IEEE
754 64-bit value of -1.79769313486231570814527423731704357e+308 and "%a"
format we produce -0x1.fffffffffffffp+1023 vs gawk's -0x1.000000p+1024
and a different exponent is chosen otherwise, such as with "%.a" where
we output -0x2p+1023 vs gawk's -0x1p+1024 for the same value, or "%.20a"
where -0x1.fffffffffffff0000000p+1023 is our output, but gawk produces
-0xf.ffffffffffff80000000p+1020 instead.  Consequently I chose not to
include a and A conversions in testing at this time.

And last but not least there are numerous corner cases that GNU AWK does
not handle correctly, which are worked around by explicit handling in
the AWK script.  These are in particular:

- extraneous leading 0 produced for the alternative form with the o
  conversion, e.g. { printf "%#.2o", 1 } produces "001" rather than
  "01",

- unexpected 0 produced where no characters are expected for the input
  of 0 and the alternative form with the precision of 0 and the integer
  hexadecimal conversions, e.g. { printf "%#.x", 0 } produces "0" rather
  than "",

- missing + character in the non-bignum mode only for the input of 0
  with the + flag, precision of 0 and the signed integer conversions,
  e.g. { printf "%+.i", 0 } produces "" rather than "+",

- missing space character in the non-bignum mode only for the input of 0
  with the space flag, precision of 0 and the signed integer
  conversions, e.g. { printf "% .i", 0 } produces "" rather than " ",

- for released gawk versions of up to 4.2.1 missing - character for the
  input of -NaN with the floating-point conversions, e.g. { printf "%e",
  "-nan" }' produces "nan" rather than "-nan",

- for released gawk versions from 5.0.0 onwards + character output for
  the input of -NaN with the floating-point conversions, e.g. { printf
  "%e", "-nan" }' produces "+nan" rather than "-nan",

- for released gawk versions from 5.0.0 onwards + character output for
  the input of Inf or NaN in the absence of the + or space flags with
  the floating-point conversions, e.g. { printf "%e", "inf" }' produces
  "+inf" rather than "inf",

- for released gawk versions of up to 4.2.1 missing + character for the
  input of Inf or NaN with the + flag and the floating-point
  conversions, e.g. { printf "%+e", "inf" }' produces "inf" rather than
  "+inf",

- for released gawk versions of up to 4.2.1 missing space character for
  the input of Inf or NaN with the space flag and the floating-point
  conversions, e.g. { printf "% e", "nan" }' produces "nan" rather than
  " nan",

- for released gawk versions from 5.0.0 onwards + character output for
  the input of Inf or NaN with the space flag and the floating-point
  conversions, e.g. { printf "% e", "inf" }' produces "+inf" rather than
  " inf",

- for released gawk versions from 5.0.0 onwards the field width is
  ignored for the input of Inf or NaN and the floating-point
  conversions, e.g. { printf "%20e", "-inf" }' produces "-inf" rather
  than "                -inf",

NB for released gawk versions of up to 4.2.1 floating-point conversion
issues apply to the bignum mode only, as in the non-bignum mode system
sprintf(3) is used.  As from version 5.0.0 specialized handling has been
added for [-]Inf and [-]NaN inputs and the issues listed apply to both
modes.  The '--posix' flag makes gawk versions from 5.0.0 onwards avoid
the issue with field width and the + character unconditionally output
for the input of Inf or NaN, however not the remaining issues and then
the 'gensub' function is not supported in the POSIX mode, so to go this
path I deemed not worth it.

Each test completes within single seconds except for the long double
one.  There the F/f formats produce a large number of digits, which
appears to be computationally intensive and CPU-bound.  Standalone
execution time for 'tst-printf-format-p-ldouble --direct f' is in the
range of 00m36s for POWER9@2.166GHz and 09m52s for FU740@1.2GHz and
output redirected locally to /dev/null, and 10m11s for FU740 and output
redirected over 100Mbps network via SSH to /dev/null, so the throughput
of the network adds very little (~3.2% in this case) to the processing
time.  This is with IEEE 754 quad.

So I have scaled the timeout for 'tst-printf-format-skeleton-ldouble'
accordingly.  Regardless, following recent practice the test has been
added to the standard rather than extended set.  However, unlike most
of the remaining tests it has been split by the conversion specifier,
so as to allow better parallelization of this long-running test.  As
a side effect this lets the test report the unsupported status for the
F/f conversions where applicable, so 'tst-printf-format-p-double' has
been split for consistency as well.

Only printf itself is handled at the moment, but the infrastructure
provides for all the printf family functions to be verified, changes
for which to be supplied separately.  The complication around having
some tests iterating over all the relevant conversion specifiers and
other verifying conversion specifiers individually combined with
iterating over printf family functions has hit a peculiarity in GNU
make where the use of multiple targets with a pattern rule is handled
differently from such use with an ordinary rule.  Consequently it
seems impossible to bulk-define a pattern rule using '$(foreach ...)',
where each target would simply trigger the recipe according to the
pattern and matching dependencies individually (such a rule does work,
but implies all targets to be updated with a single recipe execution).

Therefore as a compromise a single single-target pattern rule has been
defined that has listed all the conversion-specific scripts and all the
test executables as dependencies.  Consequently tests will be rerun in
the absence of changes to their actual sources or scripts whenever an
unrelated file has changed that has been listed.  Also all the formatted
printf output tests will always be built whenever any single one is to
be run.  This only affects test development and not test runs in the
field, though it does change the order of execution of the individual
steps and also acts as a Makefile barrier in parallel runs.  As the
execution time dominates the compilation time for these tests it is not
seen as a serious shortcoming.

As pointed out by Florian Weimer <fweimer@redhat.com> the malloc tracing
facility can take a substantial amount of time in calling dladdr(3) to
determine the caller's location.  This is not needed by the verification
made with these tests, so I chose to interpose the symbol with a stub
implementation that always fails in the shared skeleton.  We have total
control over the test environment, so I think it is a safe and minimal
impact approach.  If there's ever anything else added to the tests that
would actually rely on dladdr(3) returning usable results, only then we
can think of a different approach.

Reviewed-by: DJ Delorie <dj@redhat.com>
Diffstat (limited to 'stdio-common/tst-printf-format-s.sh')
-rw-r--r--
stdio-common/tst-printf-format-s.sh
1 files changed, 34 insertions, 0 deletions

context:
space:
mode: