| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
|
| |
add a member of appropriate type to the fpos_t union so that accesses
are well-defined. use long long instead of off_t since off_t is not
always exposed in stdio.h and there's no namespace-clean alias for it.
access is still performed using pointer casts rather than by naming
the union member as a matter of style; to the extent possible, the
naming of fields in opaque types defined in the public headers is not
treated as an API contract with the implementation. access via the
pointer cast is valid as long as the union has a member of matching
type.
|
|
|
|
|
| |
this is the idiom that's used elsewhere and should be more efficient
or at least no worse.
|
| |
|
| |
|
|
|
|
|
|
| |
they seem to be relics of e3cd6c5c265cd481db6e0c5b529855d99f0bda30
where this code was refactored from a check that previously masked
against (F_ERR|F_NOWR) instead of just F_NOWR.
|
|
|
|
|
|
|
|
| |
formally, calling readv with a zero-length first iov component should
behave identically to calling read on just the second component, but
presence of a zero-length iov component has triggered bugs in some
kernels and performs significantly worse than a simple read on some
file types.
|
|
|
|
|
|
|
|
|
|
| |
the stdio FILE read backend's return type is size_t, not ssize_t, and
all of the special (non-fd-backed) FILE types already return the
number of bytes read (zero) on error or eof. only __stdio_read leaked
a syscall error return into its return value.
fread had a workaround for this behavior going all the way back to the
original check-in. remove the workaround since it's no longer needed.
|
|
|
|
|
| |
replace with simple conditional that doesn't rely on assumption that
cnt is either 0 or -1.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
when a null buffer pointer is passed to fmemopen, requesting it
allocate its own memory buffer, extremely large size arguments near
SIZE_MAX could overflow and result in underallocation. this results
from omission of the size of the cookie structure in the overflow
check but inclusion of it in the calloc call.
instead of accounting for individual small contributions to the total
allocation size needed, simply reject sizes larger than PTRDIFF_MAX,
which will necessarily fail anyway. then adding arbitrary fixed-size
structures is safe without matching up the expressions in the
comparison and the allocation.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
commit 78897b0dc00b7cd5c29af5e0b7eebf2396d8dce0 wrongly simplified
Dmitry Levin's original submitted patch fixing alt-form octal with the
zero flag and field width present, omitting the special case where the
value is zero. as a result, printf("%#o",0) wrongly prints "00" rather
than "0".
the logic prior to this commit was actually better, in that it was
aligned with how the alt-form flag (#) for printf is specified ("it
shall increase the precision"). at the time there was no good way to
avoid the zero flag issue with the old logic, but commit
167dfe9672c116b315e72e57a55c7769f180dffa added tracking of whether an
explicit precision was provided.
revert commit 78897b0dc00b7cd5c29af5e0b7eebf2396d8dce0 and switch to
using the explicit precision indicator for suppressing the zero flag.
|
|
|
|
| |
In all cases this is just a change from two volatile int to one.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
notes added by maintainer:
this function is a GNU extension. it was chosen over the similar BSD
function funopen because the latter depends on fpos_t being an
arithmetic type as part of its public API, conflicting with our
definition of fpos_t and with the intent that it be an opaque type. it
was accepted for inclusion because, despite not being widely used, it
is usually very difficult to extricate software using it from the
dependency on it.
calling pattern for the read and write callbacks is not likely to
match glibc or other implementations, but should work with any
reasonable callbacks. in particular the read function is never called
without at least one byte being needed to satisfy its caller, so that
spurious blocking is not introduced.
contracts for what callbacks called from inside libc/stdio can do are
always complicated, and at some point still need to be specified
explicitly. at the very least, the callbacks must return or block
indefinitely (they cannot perform nonlocal exits) and they should not
make calls to stdio using their own FILE as an argument.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
previously, fgetwc left all but the first byte of an illegal sequence
unread (available for subsequent calls) when reading out of the FILE
buffer, but dropped all bytes contibuting to the error when falling
back to reading a byte at a time. neither behavior was ideal. in the
buffered case, each malformed character produced one error per byte,
rather than one per character. in the unbuffered case, consuming the
last byte that caused the transition from "incomplete" to "invalid"
state potentially dropped (and produced additional spurious encoding
errors for) the next valid character.
to handle both cases uniformly without duplicate code, revise the
buffered case to only cover situations where a complete and valid
character is present in the buffer, and fall back to byte-at-a-time
for all other cases. this allows using mbtowc (stateless) instead of
mbrtowc, which may slightly improve performance too.
when an encoding error has been hit in the byte-at-a-time case, leave
the final byte that produced the error unread (via ungetc) except in
the case of single-byte errors (for UTF-8, bytes c0, c1, f5-ff, and
continuation bytes with no lead byte). single-byte errors are fully
consumed so as not to leave the caller in an infinite loop repeating
the same error.
none of these changes are distinguished from a conformance standpoint,
since the file position is unspecified after encoding errors. they are
intended merely as QoI/consistency improvements.
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
fgetwc does not set the stream's error indicator on encoding errors,
making ferror insufficient to distinguish between error and eof
conditions. feof is also insufficient, since it will return true if
the file ended with a partial character encoding error.
whether fgetwc should be setting the error indicator itself is a
question with conflicting answers. the POSIX text for the function
states it as a requirement, but the ISO C text seems to require that
it not. this may be revisited in the future based on the outcome of
Austin Group issue #1170.
|
|
|
|
|
| |
Update the buffer position according to the bytes consumed into st when
decoding an incomplete character at the end of the buffer.
|
|
|
|
|
| |
this is mandated by C and POSIX standards and is in accordance with
glibc behavior.
|
|
|
|
|
|
| |
commit c002668eb0352e619ea7064e4940b397b4a6e68d inadvertently moved
the check for unflushed write buffer outside of the scope of the
existing lock.
|
|
|
|
|
|
| |
The switch statement has no 'default:' case and the function ends
immediately following the switch, so the extra comparison did not
communicate any extra information to the compiler.
|
|
|
|
|
| |
commit 58e2396a9aa23c132faf4198ca4d779c84955b38 missed that the same
code was duplicated in implementation of vfwprintf.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
the code being removed was written to optimize for size assuming the
compiler cannot collapse code paths for different types with the same
underlying representation. modern compilers sometimes succeed in
making this optimization themselves, but either way it's a small size
difference and not worth the source-level complexity or the UB
involved in this hack.
some incorrect use of va_arg still remains, particularly use of void *
where the actual argument has a different pointer type. fixing this
requires some actual code additions, rather than just removing cruft,
so I'm leaving it to be done later as a separate commit.
|
| |
|
|
|
|
|
|
|
|
| |
the swprintf write callback never reset its buffer pointers, so after
its 256-byte buffer filled up, it would keep repeating those bytes
over and over in the output until the destination buffer filled up. it
also failed to set the error indicator for the stream on EILSEQ,
potentially allowing output to continue after the error.
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
the old snprintf design setup the FILE buffer pointers to point
directly into the destination buffer; if n was actually larger than
the buffer size, the pointer arithmetic to compute the buffer end
pointer was undefined. this affected sprintf, which is implemented in
terms of snprintf, as well as some unusual but valid direct uses of
snprintf.
instead, setup the FILE as unbuffered and have its write function
memcpy to the destination. the printf core sets up its own temporary
buffer for unbuffered streams.
|
|
|
|
|
|
|
|
| |
in nearest rounding mode exact halfway cases were not following the
round to even rule if the rounding happened at a base 1000000000 digit
boundary of the internal representation and the previous digit was odd.
e.g. printf("%.0f", 1.5) printed 1 instead of 2.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
this patch fixes a large number of missed internal signed-overflow
checks and errors in determining when the return value (output length)
would exceed INT_MAX, which should result in EOVERFLOW. some of the
issues fixed were reported by Alexander Cherepanov; others were found
in subsequent review of the code.
aside from the signed overflows being undefined behavior, the
following specific bugs were found to exist in practice:
- overflows computing length of floating point formats with huge
explicit precisions, integer formats with prefix characters and huge
explicit precisions, or string arguments or format strings longer
than INT_MAX, resulted in wrong return value and wrong %n results.
- literal width and precision values outside the range of int were
misinterpreted, yielding wrong behavior in at least one well-defined
case: string formats with precision greater than INT_MAX were
sometimes truncated.
- in cases where EOVERFLOW is produced, incorrect values could be
written for %n specifiers past the point of exceeding INT_MAX.
in addition to fixing these bugs, we now stop producing output
immediately when output length would exceed INT_MAX, rather than
continuing and returning an error only at the end.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
if the requested precision is close to INT_MAX, adding
LDBL_MANT_DIG/3+8 overflows. in practice the resulting undefined
behavior manifests as a large negative result, which is then used to
compute the new end pointer (z) with a wildly out-of-bounds value
(more overflow, more undefined behavior). the end result is at least
incorrect output and character count (return value); worse things do
not seem to happen, but detailed analysis has not been done.
this patch fixes the overflow by performing the intermediate
computation as unsigned; after division by 9, the final result
necessarily fits in int.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
previously, fflush_unlocked was an alias for an internal backend that
was called by fflush, either for its argument or in a loop for each
file if a null pointer was passed. since the logic for the latter was
in the main fflush function, fflush_unlocked crashed when passed a
null pointer, rather than flushing all open files. since
fflush_unlocked is not a standard function and has no specification,
it's not clear whether it should be expected to accept null pointers
like fflush does, but a reasonable argument could be made that it
should.
this patch eliminates the helper function, simplifying fflush, and
makes fflush_unlocked an alias for fflush, which is valid because the
two functions agree in their behavior in all cases where their
behavior is defined (the unlocked version has undefined behavior if
another thread could hold locks).
|
|
|
|
|
|
|
|
|
|
| |
commit b91cdbe2bc8b626aa04dc6e3e84345accf34e4b1, in fixing another
issue, changed the logic for how alt-form octal adds the leading zero
to adjust the precision rather than using a prefix character. this
wrongly suppressed the zero flag by mimicing an explicit precision
given by the format string. switch back to using a prefix character.
based on bug report and patch by Dmitry V. Levin, but simplified.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
commit 7e816a6487932cbb3cb71d94b609e50e81f4e5bf (version 1.1.11
release cycle) moved the code that performs wchar_t to multibyte
conversion across code that used the resulting length in bytes,
thereby breaking the unget buffer space check in ungetwc and
clobbering up to three bytes below the start of the buffer.
for allocated FILEs (all read-enabled FILEs except stdin), the
underflow clobbers at most the FILE-specific locale pointer. no stores
are performed through this pointer, but subsequent loads may result in
a crash or mismatching encoding rule (UTF-8 multibyte vs byte-based).
for stdin, the buffer lies in .bss and the underflow may clobber
another object. in practice, for libc.so the adjacent object seems to
be stderr's buffer, which is completely unused, but this could vary
with linking options, or when static linking.
applications which do not attempt to use more than one character of
ungetwc pushback, or which do not use ungetwc, are not affected.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
the comparison f->wpos > f->buf has undefined behavior when f->wpos is
a null pointer, despite the intuition (and actual compiler behavior,
for all known compilers) being that NULL > ptr is false for all valid
pointers ptr.
the purpose of the comparison is to determine if the write buffer is
non-empty, and the idiom used elsewhere for that is comparison against
f->wbase, which is either a null pointer when not writing, or equal to
f->buf when writing. in the former case, both f->wpos and f->wbase are
null; in the latter they are both non-null and point into the same
array.
|
|
|
|
|
|
|
| |
the idiom fprintf(f, "%.*s", n, "") was wrongly used in vfwprintf as a
means of producing n spaces; instead it produces no output. the
correct form is fprintf(f, "%*s", n, ""), using width instead of
precision, since for %s the later is a maximum rather than a minimum.
|
|
|
|
|
|
|
|
|
|
| |
internally, the idiom of passing nmemb=1 to fwrite and interpreting
the return value of fwrite (which is necessarily 0 or 1) as
failure/success is fairly widely used. this is not correct, however,
when the size argument is unknown and may be zero, since C requires
fwrite to return 0 in that special case. previously fwrite always
returned nmemb on success, but this was changed for conformance with
ISO C by commit 500c6886c654fd45e4926990fee2c61d816be197.
|
|
|
|
|
|
|
| |
when the size argument was zero but nmemb was nonzero, these functions
were returning nmemb, despite no data having been written.
conceptually this is not wrong, but the standard requires a return
value of zero in this case.
|
|
|
|
|
|
|
|
|
|
|
|
| |
when a write error occurred while flushing output due to a newline,
fwrite falsely reported all bytes up to and including the newline as
successfully written. in general, due to buffering such "spurious
success" returns are acceptable for stdio; however for line-buffered
mode it was subtly wrong. errors were still visible via ferror() or as
a short-write return if there was more data past the newline that
should have been written, but since the contract for line-buffered
mode is that everything up through the newline be written out
immediately, a discrepency was observable in the actual file contents.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
previously, getdelim was allocating twice the space needed every time
it expanded its buffer to implement exponential buffer growth (in
order to avoid quadratic run time). however, this doubling was
performed even when the final buffer length needed was already known,
which is the common case that occurs whenever the delimiter is in the
FILE's buffer.
this patch makes two changes to remedy the situation:
1. over-allocation is no longer performed if the delimiter has already
been found when realloc is needed.
2. growth factor is reduced from 2x to 1.5x to reduce the relative
excess allocation in cases where the delimiter is not initially in the
buffer, including unbuffered streams.
in theory these changes could lead to quadratic time if the same
buffer is reused to process a sequence of lines successively
increasing in length, but once this length exceeds the stdio buffer
size, the delimiter will not be found in the buffer right away and
exponential growth will still kick in.
|
|
|
|
|
|
|
|
|
|
|
| |
getdelim was updating *n, the caller's stored buffer size, before
calling realloc. if getdelim then failed due to realloc failure, the
caller would see in *n a value larger than the actual size of the
allocated block, and use of that value is unsafe. in particular,
passing it again to getdelim is unsafe.
now, temporary storage is used for the desired new size, and *n is not
written until realloc succeeds.
|
|
|
|
|
|
|
|
|
|
| |
the buffer enlargement logic here accounted for the terminating null
byte, but not for the possibility of hitting the delimiter in the
buffer-refill code path that uses getc_unlocked, in which case two
additional bytes (the delimiter and the null termination) are written
without another chance to enlarge the buffer.
this patch and the corresponding bug report are by Felix Janda.
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
the specification for these functions requires that the buffer/size
exposed to the caller be valid after any successful call to fflush or
fclose on the stream. the implementation's approach is to update them
only at flush time, but that misses the case where fflush or fclose is
called without any writes having taken place, in which case the write
flushing callback will not be called.
to fix both the observable bug and the desired invariant, setup empty
buffers at open time and fail the open operation if no memory is
available.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
this fixes a bug reported by Nuno Gonçalves. previously, calling
fclose on stdin or stdout resulted in deadlock at exit time, since
__stdio_exit attempts to lock these streams to flush/seek them, and
has no easy way of knowing that they were closed.
conceptually, leaving a FILE stream locked on fclose is valid since,
in the abstract machine, it ceases to exist. but to satisfy the
implementation-internal assumption in __stdio_exit that it can access
these streams unconditionally, we need to unlock them.
it's also necessary that fclose leaves permanent streams in a state
where __stdio_exit will not attempt any further operations on them.
fortunately, the call to fflush already yields this property.
|
|
|
|
|
|
|
| |
tempnam uses an uninitialized buffer which is filled using memcpy and
__randname. It is therefore necessary to explicitly null-terminate it.
based on patch by Felix Janda.
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
functions which open in-memory FILE stream variants all shared a tail
with __fdopen, adding the FILE structure to stdio's open file list.
replacing this common tail with a function call reduces code size and
duplication of logic. the list is also partially encapsulated now.
function signatures were chosen to facilitate tail call optimization
and reduce the need for additional accessor functions.
with these changes, static linked programs that do not use stdio no
longer have an open file list at all.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
this patch adjusts libc components which use the multibyte functions
internally, and which depend on them operating in a particular
encoding, to make the appropriate locale changes before calling them
and restore the calling thread's locale afterwards. activating the
byte-based C locale without these changes would cause regressions in
stdio and iconv.
in the case of iconv, the current implementation was simply using the
multibyte functions as UTF-8 conversions. setting a multibyte UTF-8
locale for the duration of the iconv operation allows the code to
continue working.
in the case of stdio, POSIX requires that FILE streams have an
encoding rule bound at the time of setting wide orientation. as long
as all locales, including the C locale, used the same encoding,
treating high bytes as UTF-8, there was no need to store an encoding
rule as part of the stream's state.
a new locale field in the FILE structure points to the locale that
should be made active during fgetwc/fputwc/ungetwc on the stream. it
cannot point to the locale active at the time the stream becomes
oriented, because this locale could be mutable (the global locale) or
could be destroyed (locale_t objects produced by newlocale) before the
stream is closed. instead, a pointer to the static C or C.UTF-8 locale
object added in commit commit aeeac9ca5490d7d90fe061ab72da446c01ddf746
is used. this is valid since categories other than LC_CTYPE will not
affect these functions.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
commit 58165923890865a6ac042fafce13f440ee986fd9 added these optional
cancellation points on the basis that cancellable stdio could be
useful, to unblock threads stuck on stdio operations that will never
complete. however, the only way to ensure that cancellation can
achieve this is to violate the rules for side effects when
cancellation is acted upon, discarding knowledge of any partial data
transfer already completed. our implementation exhibited this behavior
and was thus non-conforming.
in addition to improving correctness, removing these cancellation
points moderately reduces code size, and should significantly improve
performance on i386, where sysenter/syscall instructions can be used
instead of "int $128" for non-cancellable syscalls.
|
|
|
|
|
|
|
|
|
|
|
|
| |
the old idiom, f->mode |= f->mode+1, was adapted from the idiom for
setting byte orientation, f->mode |= f->mode-1, but the adaptation was
incorrect. unless the stream was alreasdy set byte-oriented, this code
incremented f->mode each time it was executed, which would eventually
lead to overflow. it could be fixed by changing it to f->mode |= 1,
but upcoming changes will require slightly more work at the time of
wide orientation, so it makes sense to just call fwide. as an
optimization in the single-character functions, fwide is only called
if the stream is not already wide-oriented.
|
|
|
|
|
| |
this is undefined, but supported in our implementation of the normal
printf, so for consistency the wide variant should support it too.
|
| |
|
| |
|
|
|
|
|
|
| |
aside from being invalid, the early check only optimized the error
case, and likely pessimized the common case by separating the
two branches on isascii(c) at opposite ends of the function.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
these functions were written to handle clearing eof status, but failed
to account for the __toread function's handling of eof. with this
patch applied, __toread still returns EOF when the file is in eof
status, so that read operations will fail, but it also sets up valid
buffer pointers for read mode, which are set to the end of the buffer
rather than the beginning in order to make the whole buffer available
to ungetc/ungetwc.
minor changes to __uflow were needed since it's now possible to have
non-zero buffer pointers while in eof status. as made, these changes
remove a 'fast path' bypassing the function call to __toread, which
could be reintroduced with slightly different logic, but since
ordinary files have a syscall in f->read, optimizing the code path
does not seem worthwhile.
the __stdio_read function is also updated not to zero the read buffer
pointers on eof/error. while not necessary for correctness, this
change avoids the overhead of calling __toread in ungetc after
reaching eof, and it also reduces code size and increases consistency
with the fmemopen read operation which does not zero the pointers.
|
| |
|