Diffstat (limited to 'REORG.TODO/manual/arith.texi')
-rw-r--r-- | REORG.TODO/manual/arith.texi | 3227 |
1 files changed, 3227 insertions, 0 deletions
diff --git a/REORG.TODO/manual/arith.texi b/REORG.TODO/manual/arith.texi new file mode 100644 index 0000000000..dec12a06ae --- /dev/null +++ b/REORG.TODO/manual/arith.texi @@ -0,0 +1,3227 @@ +@node Arithmetic, Date and Time, Mathematics, Top +@c %MENU% Low level arithmetic functions +@chapter Arithmetic Functions + +This chapter contains information about functions for doing basic +arithmetic operations, such as splitting a float into its integer and +fractional parts or retrieving the imaginary part of a complex value. +These functions are declared in the header files @file{math.h} and +@file{complex.h}. + +@menu +* Integers:: Basic integer types and concepts +* Integer Division:: Integer division with guaranteed rounding. +* Floating Point Numbers:: Basic concepts. IEEE 754. +* Floating Point Classes:: The five kinds of floating-point number. +* Floating Point Errors:: When something goes wrong in a calculation. +* Rounding:: Controlling how results are rounded. +* Control Functions:: Saving and restoring the FPU's state. +* Arithmetic Functions:: Fundamental operations provided by the library. +* Complex Numbers:: The types. Writing complex constants. +* Operations on Complex:: Projection, conjugation, decomposition. +* Parsing of Numbers:: Converting strings to numbers. +* Printing of Floats:: Converting floating-point numbers to strings. +* System V Number Conversion:: An archaic way to convert numbers to strings. +@end menu + +@node Integers +@section Integers +@cindex integer + +The C language defines several integer data types: integer, short integer, +long integer, and character, all in both signed and unsigned varieties. +The GNU C compiler extends the language to contain long long integers +as well. +@cindex signedness + +The C integer types were intended to allow code to be portable among +machines with different inherent data sizes (word sizes), so each type +may have different ranges on different machines. 
The problem with +this is that a program often needs to be written for a particular range +of integers, and sometimes must be written for a particular size of +storage, regardless of what machine the program runs on. + +To address this problem, @theglibc{} contains C type definitions +you can use to declare integers that meet your exact needs. Because the +@glibcadj{} header files are customized to a specific machine, your +program source code doesn't have to be. + +These @code{typedef}s are in @file{stdint.h}. +@pindex stdint.h + +If you require that an integer be represented in exactly N bits, use one +of the following types, with the obvious mapping to bit size and signedness: + +@itemize @bullet +@item int8_t +@item int16_t +@item int32_t +@item int64_t +@item uint8_t +@item uint16_t +@item uint32_t +@item uint64_t +@end itemize + +If your C compiler and target machine do not allow integers of a certain +size, the corresponding above type does not exist. + +If you don't need a specific storage size, but want the smallest data +structure with @emph{at least} N bits, use one of these: + +@itemize @bullet +@item int_least8_t +@item int_least16_t +@item int_least32_t +@item int_least64_t +@item uint_least8_t +@item uint_least16_t +@item uint_least32_t +@item uint_least64_t +@end itemize + +If you don't need a specific storage size, but want the data structure +that allows the fastest access while having at least N bits (and +among data structures with the same access speed, the smallest one), use +one of these: + +@itemize @bullet +@item int_fast8_t +@item int_fast16_t +@item int_fast32_t +@item int_fast64_t +@item uint_fast8_t +@item uint_fast16_t +@item uint_fast32_t +@item uint_fast64_t +@end itemize + +If you want an integer with the widest range possible on the platform on +which it is being used, use one of the following. If you use these, +you should write code that takes into account the variable size and range +of the integer. 
+ +@itemize @bullet +@item intmax_t +@item uintmax_t +@end itemize + +@Theglibc{} also provides macros that tell you the maximum and +minimum possible values for each integer data type. The macro names +follow these examples: @code{INT32_MAX}, @code{UINT8_MAX}, +@code{INT_FAST32_MIN}, @code{INT_LEAST64_MIN}, @code{UINTMAX_MAX}, +@code{INTMAX_MAX}, @code{INTMAX_MIN}. Note that there are no macros for +unsigned integer minima. These are always zero. Similarly, there +are macros such as @code{INTMAX_WIDTH} for the width of these types. +Those macros for integer type widths come from TS 18661-1:2014. +@cindex maximum possible integer +@cindex minimum possible integer + +There are similar macros for use with C's built-in integer types which +should come with your C compiler. These are described in @ref{Data Type +Measurements}. + +Don't forget you can use the C @code{sizeof} operator with any of these +data types to get the number of bytes of storage each uses. + + +@node Integer Division +@section Integer Division +@cindex integer division functions + +This section describes functions for performing integer division. These +functions are redundant when GNU CC is used, because in GNU C the +@samp{/} operator always rounds towards zero. But in other C +implementations, @samp{/} may round differently with negative arguments. +@code{div} and @code{ldiv} are useful because they specify how to round +the quotient: towards zero. The remainder has the same sign as the +numerator. + +These functions are specified to return a result @var{r} such that the value +@code{@var{r}.quot*@var{denominator} + @var{r}.rem} equals +@var{numerator}. + +@pindex stdlib.h +To use these facilities, you should include the header file +@file{stdlib.h} in your program. + +@comment stdlib.h +@comment ISO +@deftp {Data Type} div_t +This is a structure type used to hold the result returned by the @code{div} +function. 
It has the following members: + +@table @code +@item int quot +The quotient from the division. + +@item int rem +The remainder from the division. +@end table +@end deftp + +@comment stdlib.h +@comment ISO +@deftypefun div_t div (int @var{numerator}, int @var{denominator}) +@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} +@c Functions in this section are pure, and thus safe. +The function @code{div} computes the quotient and remainder from +the division of @var{numerator} by @var{denominator}, returning the +result in a structure of type @code{div_t}. + +If the result cannot be represented (as in a division by zero), the +behavior is undefined. + +Here is an example, albeit not a very useful one. + +@smallexample +div_t result; +result = div (20, -6); +@end smallexample + +@noindent +Now @code{result.quot} is @code{-3} and @code{result.rem} is @code{2}. +@end deftypefun + +@comment stdlib.h +@comment ISO +@deftp {Data Type} ldiv_t +This is a structure type used to hold the result returned by the @code{ldiv} +function. It has the following members: + +@table @code +@item long int quot +The quotient from the division. + +@item long int rem +The remainder from the division. +@end table + +(This is identical to @code{div_t} except that the components are of +type @code{long int} rather than @code{int}.) +@end deftp + +@comment stdlib.h +@comment ISO +@deftypefun ldiv_t ldiv (long int @var{numerator}, long int @var{denominator}) +@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} +The @code{ldiv} function is similar to @code{div}, except that the +arguments are of type @code{long int} and the result is returned as a +structure of type @code{ldiv_t}. +@end deftypefun + +@comment stdlib.h +@comment ISO +@deftp {Data Type} lldiv_t +This is a structure type used to hold the result returned by the @code{lldiv} +function. It has the following members: + +@table @code +@item long long int quot +The quotient from the division. 
+ +@item long long int rem +The remainder from the division. +@end table + +(This is identical to @code{div_t} except that the components are of +type @code{long long int} rather than @code{int}.) +@end deftp + +@comment stdlib.h +@comment ISO +@deftypefun lldiv_t lldiv (long long int @var{numerator}, long long int @var{denominator}) +@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} +The @code{lldiv} function is like the @code{div} function, but the +arguments are of type @code{long long int} and the result is returned as +a structure of type @code{lldiv_t}. + +The @code{lldiv} function was added in @w{ISO C99}. +@end deftypefun + +@comment inttypes.h +@comment ISO +@deftp {Data Type} imaxdiv_t +This is a structure type used to hold the result returned by the @code{imaxdiv} +function. It has the following members: + +@table @code +@item intmax_t quot +The quotient from the division. + +@item intmax_t rem +The remainder from the division. +@end table + +(This is identical to @code{div_t} except that the components are of +type @code{intmax_t} rather than @code{int}.) + +See @ref{Integers} for a description of the @code{intmax_t} type. + +@end deftp + +@comment inttypes.h +@comment ISO +@deftypefun imaxdiv_t imaxdiv (intmax_t @var{numerator}, intmax_t @var{denominator}) +@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} +The @code{imaxdiv} function is like the @code{div} function, but the +arguments are of type @code{intmax_t} and the result is returned as +a structure of type @code{imaxdiv_t}. + +See @ref{Integers} for a description of the @code{intmax_t} type. + +The @code{imaxdiv} function was added in @w{ISO C99}. +@end deftypefun + + +@node Floating Point Numbers +@section Floating Point Numbers +@cindex floating point +@cindex IEEE 754 +@cindex IEEE floating point + +Most computer hardware has support for two different kinds of numbers: +integers (@math{@dots{}-3, -2, -1, 0, 1, 2, 3@dots{}}) and +floating-point numbers. 
Floating-point numbers have three parts: the +@dfn{mantissa}, the @dfn{exponent}, and the @dfn{sign bit}. The real +number represented by a floating-point value is given by +@tex +$(s \mathrel? -1 \mathrel: 1) \cdot 2^e \cdot M$ +@end tex +@ifnottex +@math{(s ? -1 : 1) @mul{} 2^e @mul{} M} +@end ifnottex +where @math{s} is the sign bit, @math{e} the exponent, and @math{M} +the mantissa. @xref{Floating Point Concepts}, for details. (It is +possible to have a different @dfn{base} for the exponent, but all modern +hardware uses @math{2}.) + +Floating-point numbers can represent a finite subset of the real +numbers. While this subset is large enough for most purposes, it is +important to remember that the only reals that can be represented +exactly are rational numbers that have a terminating binary expansion +shorter than the width of the mantissa. Even simple fractions such as +@math{1/5} can only be approximated by floating point. + +Mathematical operations and functions frequently need to produce values +that are not representable. Often these values can be approximated +closely enough for practical purposes, but sometimes they can't. +Historically there was no way to tell when the results of a calculation +were inaccurate. Modern computers implement the @w{IEEE 754} standard +for numerical computations, which defines a framework for indicating to +the program when the results of calculation are not trustworthy. This +framework consists of a set of @dfn{exceptions} that indicate why a +result could not be represented, and the special values @dfn{infinity} +and @dfn{not a number} (NaN). + +@node Floating Point Classes +@section Floating-Point Number Classification Functions +@cindex floating-point classes +@cindex classes, floating-point +@pindex math.h + +@w{ISO C99} defines macros that let you determine what sort of +floating-point number a variable holds. 
+ +@comment math.h +@comment ISO +@deftypefn {Macro} int fpclassify (@emph{float-type} @var{x}) +@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} +This is a generic macro which works on all floating-point types and +which returns a value of type @code{int}. The possible values are: + +@vtable @code +@item FP_NAN +The floating-point number @var{x} is ``Not a Number'' (@pxref{Infinity +and NaN}) +@item FP_INFINITE +The value of @var{x} is either plus or minus infinity (@pxref{Infinity +and NaN}) +@item FP_ZERO +The value of @var{x} is zero. In floating-point formats like @w{IEEE +754}, where zero can be signed, this value is also returned if +@var{x} is negative zero. +@item FP_SUBNORMAL +Numbers whose absolute value is too small to be represented in the +normal format are represented in an alternate, @dfn{denormalized} format +(@pxref{Floating Point Concepts}). This format is less precise but can +represent values closer to zero. @code{fpclassify} returns this value +for values of @var{x} in this alternate format. +@item FP_NORMAL +This value is returned for all other values of @var{x}. It indicates +that there is nothing special about the number. +@end vtable + +@end deftypefn + +@code{fpclassify} is most useful if more than one property of a number +must be tested. There are more specific macros which only test one +property at a time. Generally these macros execute faster than +@code{fpclassify}, since there is special hardware support for them. +You should therefore use the specific macros whenever possible. + +@comment math.h +@comment ISO +@deftypefn {Macro} int iscanonical (@emph{float-type} @var{x}) +@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} +In some floating-point formats, some values have canonical (preferred) +and noncanonical encodings (for IEEE interchange binary formats, all +encodings are canonical). This macro returns a nonzero value if +@var{x} has a canonical encoding. It is from TS 18661-1:2014. 
+ +Note that some formats have multiple encodings of a value which are +all equally canonical; @code{iscanonical} returns a nonzero value for +all such encodings. Also, formats may have encodings that do not +correspond to any valid value of the type. In ISO C terms these are +@dfn{trap representations}; in @theglibc{}, @code{iscanonical} returns +zero for such encodings. +@end deftypefn + +@comment math.h +@comment ISO +@deftypefn {Macro} int isfinite (@emph{float-type} @var{x}) +@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} +This macro returns a nonzero value if @var{x} is finite: not plus or +minus infinity, and not NaN. It is equivalent to + +@smallexample +(fpclassify (x) != FP_NAN && fpclassify (x) != FP_INFINITE) +@end smallexample + +@code{isfinite} is implemented as a macro which accepts any +floating-point type. +@end deftypefn + +@comment math.h +@comment ISO +@deftypefn {Macro} int isnormal (@emph{float-type} @var{x}) +@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} +This macro returns a nonzero value if @var{x} is finite and normalized. +It is equivalent to + +@smallexample +(fpclassify (x) == FP_NORMAL) +@end smallexample +@end deftypefn + +@comment math.h +@comment ISO +@deftypefn {Macro} int isnan (@emph{float-type} @var{x}) +@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} +This macro returns a nonzero value if @var{x} is NaN. It is equivalent +to + +@smallexample +(fpclassify (x) == FP_NAN) +@end smallexample +@end deftypefn + +@comment math.h +@comment ISO +@deftypefn {Macro} int issignaling (@emph{float-type} @var{x}) +@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} +This macro returns a nonzero value if @var{x} is a signaling NaN +(sNaN). It is from TS 18661-1:2014. +@end deftypefn + +@comment math.h +@comment ISO +@deftypefn {Macro} int issubnormal (@emph{float-type} @var{x}) +@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} +This macro returns a nonzero value if @var{x} is subnormal. It is +from TS 18661-1:2014. 
+@end deftypefn + +@comment math.h +@comment ISO +@deftypefn {Macro} int iszero (@emph{float-type} @var{x}) +@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} +This macro returns a nonzero value if @var{x} is zero. It is from TS +18661-1:2014. +@end deftypefn + +Another set of floating-point classification functions was provided by +BSD. @Theglibc{} also supports these functions; however, we +recommend that you use the ISO C99 macros in new code. Those are standard +and will be available more widely. Also, since they are macros, you do +not have to worry about the type of their argument. + +@comment math.h +@comment BSD +@deftypefun int isinf (double @var{x}) +@comment math.h +@comment BSD +@deftypefunx int isinff (float @var{x}) +@comment math.h +@comment BSD +@deftypefunx int isinfl (long double @var{x}) +@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} +This function returns @code{-1} if @var{x} represents negative infinity, +@code{1} if @var{x} represents positive infinity, and @code{0} otherwise. +@end deftypefun + +@comment math.h +@comment BSD +@deftypefun int isnan (double @var{x}) +@comment math.h +@comment BSD +@deftypefunx int isnanf (float @var{x}) +@comment math.h +@comment BSD +@deftypefunx int isnanl (long double @var{x}) +@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} +This function returns a nonzero value if @var{x} is a ``not a number'' +value, and zero otherwise. + +@strong{NB:} The @code{isnan} macro defined by @w{ISO C99} overrides +the BSD function. This is normally not a problem, because the two +routines behave identically. 
However, if you really need to get the BSD +function for some reason, you can write + +@smallexample +(isnan) (x) +@end smallexample +@end deftypefun + +@comment math.h +@comment BSD +@deftypefun int finite (double @var{x}) +@comment math.h +@comment BSD +@deftypefunx int finitef (float @var{x}) +@comment math.h +@comment BSD +@deftypefunx int finitel (long double @var{x}) +@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} +This function returns a nonzero value if @var{x} is finite, that is, +neither infinite nor a ``not a number'' value, and zero otherwise. +@end deftypefun + +@strong{Portability Note:} The functions listed in this section are BSD +extensions. + + +@node Floating Point Errors +@section Errors in Floating-Point Calculations + +@menu +* FP Exceptions:: IEEE 754 math exceptions and how to detect them. +* Infinity and NaN:: Special values returned by calculations. +* Status bit operations:: Checking for exceptions after the fact. +* Math Error Reporting:: How the math functions report errors. +@end menu + +@node FP Exceptions +@subsection FP Exceptions +@cindex exception +@cindex signal +@cindex zero divide +@cindex division by zero +@cindex inexact exception +@cindex invalid exception +@cindex overflow exception +@cindex underflow exception + +The @w{IEEE 754} standard defines five @dfn{exceptions} that can occur +during a calculation. Each corresponds to a particular sort of error, +such as overflow. + +When exceptions occur (when exceptions are @dfn{raised}, in the language +of the standard), one of two things can happen. By default the +exception is simply noted in the floating-point @dfn{status word}, and +the program continues as if nothing had happened. The operation +produces a default value, which depends on the exception (see the table +below). Your program can check the status word to find out which +exceptions happened. + +Alternatively, you can enable @dfn{traps} for exceptions. 
In that case, +when an exception is raised, your program will receive the @code{SIGFPE} +signal. The default action for this signal is to terminate the +program. @xref{Signal Handling}, for how you can change the effect of +the signal. + +@findex matherr +In the System V math library, the user-defined function @code{matherr} +is called when certain exceptions occur inside math library functions. +However, the Unix98 standard deprecates this interface. We support it +for historical compatibility, but recommend that you do not use it in +new programs. When this interface is used, exceptions may not be +raised. + +@noindent +The exceptions defined in @w{IEEE 754} are: + +@table @samp +@item Invalid Operation +This exception is raised if the given operands are invalid for the +operation to be performed. Examples are +(see @w{IEEE 754}, @w{section 7}): +@enumerate +@item +Addition or subtraction: @math{@infinity{} - @infinity{}}. (But +@math{@infinity{} + @infinity{} = @infinity{}}). +@item +Multiplication: @math{0 @mul{} @infinity{}}. +@item +Division: @math{0/0} or @math{@infinity{}/@infinity{}}. +@item +Remainder: @math{x} REM @math{y}, where @math{y} is zero or @math{x} is +infinite. +@item +Square root if the operand is less than zero. More generally, any +mathematical function evaluated outside its domain produces this +exception. +@item +Conversion of a floating-point number to an integer or decimal +string, when the number cannot be represented in the target format (due +to overflow, infinity, or NaN). +@item +Conversion of an unrecognizable input string. +@item +Comparison via predicates involving @math{<} or @math{>}, when one or +other of the operands is NaN. You can prevent this exception by using +the unordered comparison functions instead; see @ref{FP Comparison Functions}. +@end enumerate + +If the exception does not trap, the result of the operation is NaN. 
+ +@item Division by Zero +This exception is raised when a finite nonzero number is divided +by zero. If no trap occurs the result is either @math{+@infinity{}} or +@math{-@infinity{}}, depending on the signs of the operands. + +@item Overflow +This exception is raised whenever the result cannot be represented +as a finite value in the precision format of the destination. If no trap +occurs the result depends on the sign of the intermediate result and the +current rounding mode (@w{IEEE 754}, @w{section 7.3}): +@enumerate +@item +Round to nearest carries all overflows to @math{@infinity{}} +with the sign of the intermediate result. +@item +Round toward @math{0} carries all overflows to the largest representable +finite number with the sign of the intermediate result. +@item +Round toward @math{-@infinity{}} carries positive overflows to the +largest representable finite number and negative overflows to +@math{-@infinity{}}. + +@item +Round toward @math{@infinity{}} carries negative overflows to the +most negative representable finite number and positive overflows +to @math{@infinity{}}. +@end enumerate + +Whenever the overflow exception is raised, the inexact exception is also +raised. + +@item Underflow +The underflow exception is raised when an intermediate result is too +small to be calculated accurately, or if the operation's result rounded +to the destination precision is too small to be normalized. + +When no trap is installed for the underflow exception, underflow is +signaled (via the underflow flag) only when both tininess and loss of +accuracy have been detected. If no trap handler is installed the +operation continues with an imprecise small value, or zero if the +destination precision cannot hold the small exact result. + +@item Inexact +This exception is signalled if a rounded result is not exact (such as +when calculating the square root of two) or a result overflows without +an overflow trap. 
+@end table + +@node Infinity and NaN +@subsection Infinity and NaN +@cindex infinity +@cindex not a number +@cindex NaN + +@w{IEEE 754} floating point numbers can represent positive or negative +infinity, and @dfn{NaN} (not a number). These three values arise from +calculations whose result is undefined or cannot be represented +accurately. You can also deliberately set a floating-point variable to +any of them, which is sometimes useful. Some examples of calculations +that produce infinity or NaN: + +@ifnottex +@smallexample +@math{1/0 = @infinity{}} +@math{log (0) = -@infinity{}} +@math{sqrt (-1) = NaN} +@end smallexample +@end ifnottex +@tex +$${1\over0} = \infty$$ +$$\log 0 = -\infty$$ +$$\sqrt{-1} = \hbox{NaN}$$ +@end tex + +When a calculation produces any of these values, an exception also +occurs; see @ref{FP Exceptions}. + +The basic operations and math functions all accept infinity and NaN and +produce sensible output. Infinities propagate through calculations as +one would expect: for example, @math{2 + @infinity{} = @infinity{}}, +@math{4/@infinity{} = 0}, atan @math{(@infinity{}) = @pi{}/2}. NaN, on +the other hand, infects any calculation that involves it. Unless the +calculation would produce the same result no matter what real value +replaced NaN, the result is NaN. + +In comparison operations, positive infinity is larger than all values +except itself and NaN, and negative infinity is smaller than all values +except itself and NaN. NaN is @dfn{unordered}: it is not equal to, +greater than, or less than anything, @emph{including itself}. @code{x == +x} is false if the value of @code{x} is NaN. You can use this to test +whether a value is NaN or not, but the recommended way to test for NaN +is with the @code{isnan} function (@pxref{Floating Point Classes}). In +addition, @code{<}, @code{>}, @code{<=}, and @code{>=} will raise an +exception when applied to NaNs. 
+ +@file{math.h} defines macros that allow you to explicitly set a variable +to infinity or NaN. + +@comment math.h +@comment ISO +@deftypevr Macro float INFINITY +An expression representing positive infinity. It is equal to the value +produced by mathematical operations like @code{1.0 / 0.0}. +@code{-INFINITY} represents negative infinity. + +You can test whether a floating-point value is infinite by comparing it +to this macro. However, this is not recommended; you should use the +@code{isfinite} macro instead. @xref{Floating Point Classes}. + +This macro was introduced in the @w{ISO C99} standard. +@end deftypevr + +@comment math.h +@comment GNU +@deftypevr Macro float NAN +An expression representing a value which is ``not a number''. This +macro is a GNU extension, available only on machines that support the +``not a number'' value---that is to say, on all machines that support +IEEE floating point. + +You can use @samp{#ifdef NAN} to test whether the machine supports +NaN. (Of course, you must arrange for GNU extensions to be visible, +such as by defining @code{_GNU_SOURCE}, and then you must include +@file{math.h}.) +@end deftypevr + +@comment math.h +@comment ISO +@deftypevr Macro float SNANF +@deftypevrx Macro double SNAN +@deftypevrx Macro {long double} SNANL +These macros, defined by TS 18661-1:2014, are constant expressions for +signaling NaNs. +@end deftypevr + +@comment fenv.h +@comment ISO +@deftypevr Macro int FE_SNANS_ALWAYS_SIGNAL +This macro, defined by TS 18661-1:2014, is defined to @code{1} in +@file{fenv.h} to indicate that functions and operations with signaling +NaN inputs and floating-point results always raise the invalid +exception and return a quiet NaN, even in cases (such as @code{fmax}, +@code{hypot} and @code{pow}) where a quiet NaN input can produce a +non-NaN result. Because some compiler optimizations may not handle +signaling NaNs correctly, this macro is only defined if compiler +support for signaling NaNs is enabled. 
That support can be enabled +with the GCC option @option{-fsignaling-nans}. +@end deftypevr + +@w{IEEE 754} also allows for another unusual value: negative zero. This +value is produced when you divide a positive number by negative +infinity, or when a negative result is smaller than the limits of +representation. + +@node Status bit operations +@subsection Examining the FPU status word + +@w{ISO C99} defines functions to query and manipulate the +floating-point status word. You can use these functions to check for +untrapped exceptions when it's convenient, rather than worrying about +them in the middle of a calculation. + +These constants represent the various @w{IEEE 754} exceptions. Not all +FPUs report all the different exceptions. Each constant is defined if +and only if the FPU you are compiling for supports that exception, so +you can test for FPU support with @samp{#ifdef}. They are defined in +@file{fenv.h}. + +@vtable @code +@comment fenv.h +@comment ISO +@item FE_INEXACT + The inexact exception. +@comment fenv.h +@comment ISO +@item FE_DIVBYZERO + The divide by zero exception. +@comment fenv.h +@comment ISO +@item FE_UNDERFLOW + The underflow exception. +@comment fenv.h +@comment ISO +@item FE_OVERFLOW + The overflow exception. +@comment fenv.h +@comment ISO +@item FE_INVALID + The invalid exception. +@end vtable + +The macro @code{FE_ALL_EXCEPT} is the bitwise OR of all exception macros +which are supported by the FP implementation. + +These functions allow you to clear exception flags, test for exceptions, +and save and restore the set of exceptions flagged. + +@comment fenv.h +@comment ISO +@deftypefun int feclearexcept (int @var{excepts}) +@safety{@prelim{}@mtsafe{}@assafe{@assposix{}}@acsafe{@acsposix{}}} +@c The other functions in this section that modify FP status register +@c mostly do so with non-atomic load-modify-store sequences, but since +@c the register is thread-specific, this should be fine, and safe for +@c cancellation. 
As long as the FP environment is restored before the +@c signal handler returns control to the interrupted thread (like any +@c kernel should do), the functions are also safe for use in signal +@c handlers. +This function clears all of the supported exception flags indicated by +@var{excepts}. + +The function returns zero in case the operation was successful, a +non-zero value otherwise. +@end deftypefun + +@comment fenv.h +@comment ISO +@deftypefun int feraiseexcept (int @var{excepts}) +@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} +This function raises the supported exceptions indicated by +@var{excepts}. If more than one exception bit in @var{excepts} is set, +the order in which the exceptions are raised is undefined except that +overflow (@code{FE_OVERFLOW}) or underflow (@code{FE_UNDERFLOW}) are +raised before inexact (@code{FE_INEXACT}). Whether the inexact +exception is also raised for overflow or underflow is implementation +dependent. + +The function returns zero in case the operation was successful, a +non-zero value otherwise. +@end deftypefun + +@comment fenv.h +@comment ISO +@deftypefun int fesetexcept (int @var{excepts}) +@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} +This function sets the supported exception flags indicated by +@var{excepts}, like @code{feraiseexcept}, but without causing enabled +traps to be taken. @code{fesetexcept} is from TS 18661-1:2014. + +The function returns zero in case the operation was successful, a +non-zero value otherwise. +@end deftypefun + +@comment fenv.h +@comment ISO +@deftypefun int fetestexcept (int @var{excepts}) +@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} +Test whether the exception flags indicated by the parameter @var{excepts} +are currently set. If any of them are, a nonzero value is returned +which specifies which exceptions are set. Otherwise the result is zero. +@end deftypefun + +To understand these functions, imagine that the status word is an +integer variable named @var{status}. 
@code{feclearexcept} is then +equivalent to @samp{status &= ~excepts} and @code{fetestexcept} is +equivalent to @samp{(status & excepts)}. The actual implementation may +be very different, of course. + +Exception flags are only cleared when the program explicitly requests it, +by calling @code{feclearexcept}. If you want to check for exceptions +from a set of calculations, you should clear all the flags first. Here +is a simple example of the way to use @code{fetestexcept}: + +@smallexample +@{ + double f; + int raised; + feclearexcept (FE_ALL_EXCEPT); + f = compute (); + raised = fetestexcept (FE_OVERFLOW | FE_INVALID); + if (raised & FE_OVERFLOW) @{ /* @dots{} */ @} + if (raised & FE_INVALID) @{ /* @dots{} */ @} + /* @dots{} */ +@} +@end smallexample + +You cannot explicitly set bits in the status word. You can, however, +save the entire status word and restore it later. This is done with the +following functions: + +@comment fenv.h +@comment ISO +@deftypefun int fegetexceptflag (fexcept_t *@var{flagp}, int @var{excepts}) +@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} +This function stores in the variable pointed to by @var{flagp} an +implementation-defined value representing the current setting of the +exception flags indicated by @var{excepts}. + +The function returns zero in case the operation was successful, a +non-zero value otherwise. +@end deftypefun + +@comment fenv.h +@comment ISO +@deftypefun int fesetexceptflag (const fexcept_t *@var{flagp}, int @var{excepts}) +@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} +This function restores the flags for the exceptions indicated by +@var{excepts} to the values stored in the variable pointed to by +@var{flagp}. + +The function returns zero in case the operation was successful, a +non-zero value otherwise. +@end deftypefun + +Note that the value stored in @code{fexcept_t} bears no resemblance to +the bit mask returned by @code{fetestexcept}. The type may not even be +an integer. 
Do not attempt to modify an @code{fexcept_t} variable. + +@comment fenv.h +@comment ISO +@deftypefun int fetestexceptflag (const fexcept_t *@var{flagp}, int @var{excepts}) +@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} +Test whether the exception flags indicated by the parameter +@var{excepts} are set in the variable pointed to by @var{flagp}. If +any of them are, a nonzero value is returned which specifies which +exceptions are set. Otherwise the result is zero. +@code{fetestexceptflag} is from TS 18661-1:2014. +@end deftypefun + +@node Math Error Reporting +@subsection Error Reporting by Mathematical Functions +@cindex errors, mathematical +@cindex domain error +@cindex range error + +Many of the math functions are defined only over a subset of the real or +complex numbers. Even if they are mathematically defined, their result +may be larger or smaller than the range representable by their return +type without loss of accuracy. These are known as @dfn{domain errors}, +@dfn{overflows}, and +@dfn{underflows}, respectively. Math functions do several things when +one of these errors occurs. In this manual we will refer to the +complete response as @dfn{signalling} a domain error, overflow, or +underflow. + +When a math function suffers a domain error, it raises the invalid +exception and returns NaN. It also sets @var{errno} to @code{EDOM}; +this is for compatibility with old systems that do not support @w{IEEE +754} exception handling. Likewise, when overflow occurs, math +functions raise the overflow exception and, in the default rounding +mode, return @math{@infinity{}} or @math{-@infinity{}} as appropriate +(in other rounding modes, the largest finite value of the appropriate +sign is returned when appropriate for that rounding mode). They also +set @var{errno} to @code{ERANGE} if returning @math{@infinity{}} or +@math{-@infinity{}}; @var{errno} may or may not be set to +@code{ERANGE} when a finite value is returned on overflow. 
When +underflow occurs, the underflow exception is raised, and zero +(appropriately signed) or a subnormal value, as appropriate for the +mathematical result of the function and the rounding mode, is +returned. @var{errno} may be set to @code{ERANGE}, but this is not +guaranteed; it is intended that @theglibc{} should set it when the +underflow is to an appropriately signed zero, but not necessarily for +other underflows. + +When a math function has an argument that is a signaling NaN, +@theglibc{} does not consider this a domain error, so @code{errno} is +unchanged, but the invalid exception is still raised (except for a few +functions that are specified to handle signaling NaNs differently). + +Some of the math functions are defined mathematically to result in a +complex value over parts of their domains. The most familiar example of +this is taking the square root of a negative number. The complex math +functions, such as @code{csqrt}, will return the appropriate complex value +in this case. The real-valued functions, such as @code{sqrt}, will +signal a domain error. + +Some older hardware does not support infinities. On that hardware, +overflows instead return a particular very large number (usually the +largest representable number). @file{math.h} defines macros you can use +to test for overflow on both old and new hardware. + +@comment math.h +@comment ISO +@deftypevr Macro double HUGE_VAL +@comment math.h +@comment ISO +@deftypevrx Macro float HUGE_VALF +@comment math.h +@comment ISO +@deftypevrx Macro {long double} HUGE_VALL +An expression representing a particular very large number. On machines +that use @w{IEEE 754} floating point format, @code{HUGE_VAL} is infinity. +On other machines, it's typically the largest positive number that can +be represented. + +Mathematical functions return the appropriately typed version of +@code{HUGE_VAL} or @code{@minus{}HUGE_VAL} when the result is too large +to be represented. 
+@end deftypevr + +@node Rounding +@section Rounding Modes + +Floating-point calculations are carried out internally with extra +precision, and then rounded to fit into the destination type. This +ensures that results are as precise as the input data. @w{IEEE 754} +defines four possible rounding modes: + +@table @asis +@item Round to nearest. +This is the default mode. It should be used unless there is a specific +need for one of the others. In this mode results are rounded to the +nearest representable value. If the result is midway between two +representable values, the even representable is chosen. @dfn{Even} here +means the lowest-order bit is zero. This rounding mode prevents +statistical bias and guarantees numeric stability: round-off errors in a +lengthy calculation will remain smaller than half of @code{FLT_EPSILON}. + +@c @item Round toward @math{+@infinity{}} +@item Round toward plus Infinity. +All results are rounded to the smallest representable value +which is greater than the result. + +@c @item Round toward @math{-@infinity{}} +@item Round toward minus Infinity. +All results are rounded to the largest representable value which is less +than the result. + +@item Round toward zero. +All results are rounded to the largest representable value whose +magnitude is less than that of the result. In other words, if the +result is negative it is rounded up; if it is positive, it is rounded +down. +@end table + +@noindent +@file{fenv.h} defines constants which you can use to refer to the +various rounding modes. Each one will be defined if and only if the FPU +supports the corresponding rounding mode. + +@vtable @code +@comment fenv.h +@comment ISO +@item FE_TONEAREST +Round to nearest. + +@comment fenv.h +@comment ISO +@item FE_UPWARD +Round toward @math{+@infinity{}}. + +@comment fenv.h +@comment ISO +@item FE_DOWNWARD +Round toward @math{-@infinity{}}. + +@comment fenv.h +@comment ISO +@item FE_TOWARDZERO +Round toward zero. 
+@end vtable + +Underflow is an unusual case. Normally, @w{IEEE 754} floating point +numbers are always normalized (@pxref{Floating Point Concepts}). +Numbers smaller than @math{2^r} (where @math{r} is the minimum exponent, +@code{FLT_MIN_EXP-1} for @code{float}) cannot be represented as +normalized numbers. Rounding all such numbers to zero or @math{2^r} +would cause some algorithms to fail at 0. Therefore, they are left in +denormalized form. That produces loss of precision, since the leading +bits of the mantissa become zeros that only mark the position of the +binary point. + +If a result is too small to be represented as a denormalized number, it +is rounded to zero. However, the sign of the result is preserved; if +the exact result was negative, the result is @dfn{negative zero}. +Negative zero can also result from some operations on infinity, such as +@math{4/-@infinity{}}. + +At any time, one of the above four rounding modes is selected. You can +find out which one with this function: + +@comment fenv.h +@comment ISO +@deftypefun int fegetround (void) +@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} +Returns the currently selected rounding mode, represented by one of the +values of the defined rounding mode macros. +@end deftypefun + +@noindent +To change the rounding mode, use this function: + +@comment fenv.h +@comment ISO +@deftypefun int fesetround (int @var{round}) +@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} +Changes the currently selected rounding mode to @var{round}. If +@var{round} does not correspond to one of the supported rounding modes, +nothing is changed. @code{fesetround} returns zero if it changed the +rounding mode, or a nonzero value if the mode is not supported. +@end deftypefun + +You should avoid changing the rounding mode if possible. It can be an +expensive operation; also, some hardware requires you to compile your +program differently for it to work. The resulting code may run slower. +See your compiler documentation for details. 
+@c This section used to claim that functions existed to round one number +@c in a specific fashion. I can't find any functions in the library +@c that do that. -zw + +@node Control Functions +@section Floating-Point Control Functions + +@w{IEEE 754} floating-point implementations allow the programmer to +decide whether traps will occur for each of the exceptions, by setting +bits in the @dfn{control word}. In C, traps result in the program +receiving the @code{SIGFPE} signal; see @ref{Signal Handling}. + +@strong{NB:} @w{IEEE 754} says that trap handlers are given details of +the exceptional situation, and can set the result value. C signals do +not provide any mechanism to pass this information back and forth. +Trapping exceptions in C is therefore not very useful. + +It is sometimes necessary to save the state of the floating-point unit +while you perform some calculation. The library provides functions +which save and restore the exception flags, the set of exceptions that +generate traps, and the rounding mode. This information is known as the +@dfn{floating-point environment}. + +The functions to save and restore the floating-point environment all use +a variable of type @code{fenv_t} to store information. This type is +defined in @file{fenv.h}. Its size and contents are +implementation-defined. You should not attempt to manipulate a variable +of this type directly. + +To save the state of the FPU, use one of these functions: + +@comment fenv.h +@comment ISO +@deftypefun int fegetenv (fenv_t *@var{envp}) +@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} +Store the floating-point environment in the variable pointed to by +@var{envp}. + +The function returns zero in case the operation was successful, a +non-zero value otherwise. +@end deftypefun + +@comment fenv.h +@comment ISO +@deftypefun int feholdexcept (fenv_t *@var{envp}) +@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} +Store the current floating-point environment in the object pointed to by +@var{envp}. 
Then clear all exception flags, and set the FPU to trap no +exceptions. Not all FPUs support trapping no exceptions; if +@code{feholdexcept} cannot set this mode, it returns a nonzero value. If it +succeeds, it returns zero. +@end deftypefun + +The functions which restore the floating-point environment can take these +kinds of arguments: + +@itemize @bullet +@item +Pointers to @code{fenv_t} objects, which were initialized previously by a +call to @code{fegetenv} or @code{feholdexcept}. +@item +@vindex FE_DFL_ENV +The special macro @code{FE_DFL_ENV}, which represents the floating-point +environment as it was available at program start. +@item +Implementation-defined macros with names starting with @code{FE_} and +having type @code{fenv_t *}. + +@vindex FE_NOMASK_ENV +If possible, @theglibc{} defines a macro @code{FE_NOMASK_ENV} +which represents an environment where every exception raised causes a +trap to occur. You can test for this macro using @code{#ifdef}. It is +only defined if @code{_GNU_SOURCE} is defined. + +Some platforms might define other predefined environments. +@end itemize + +@noindent +To set the floating-point environment, you can use either of these +functions: + +@comment fenv.h +@comment ISO +@deftypefun int fesetenv (const fenv_t *@var{envp}) +@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} +Set the floating-point environment to that described by @var{envp}. + +The function returns zero in case the operation was successful, a +non-zero value otherwise. +@end deftypefun + +@comment fenv.h +@comment ISO +@deftypefun int feupdateenv (const fenv_t *@var{envp}) +@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} +Like @code{fesetenv}, this function sets the floating-point environment +to that described by @var{envp}. However, if any exceptions were +flagged in the status word before @code{feupdateenv} was called, they +remain flagged after the call. 
In other words, after @code{feupdateenv} +is called, the status word is the bitwise OR of the previous status word +and the one saved in @var{envp}. + +The function returns zero in case the operation was successful, a +non-zero value otherwise. +@end deftypefun + +@noindent +TS 18661-1:2014 defines additional functions to save and restore +floating-point control modes (such as the rounding mode and whether +traps are enabled) while leaving other status (such as raised flags) +unchanged. + +@vindex FE_DFL_MODE +The special macro @code{FE_DFL_MODE} may be passed to +@code{fesetmode}. It represents the floating-point control modes at +program start. + +@comment fenv.h +@comment ISO +@deftypefun int fegetmode (femode_t *@var{modep}) +@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} +Store the floating-point control modes in the variable pointed to by +@var{modep}. + +The function returns zero in case the operation was successful, a +non-zero value otherwise. +@end deftypefun + +@comment fenv.h +@comment ISO +@deftypefun int fesetmode (const femode_t *@var{modep}) +@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} +Set the floating-point control modes to those described by +@var{modep}. + +The function returns zero in case the operation was successful, a +non-zero value otherwise. +@end deftypefun + +@noindent +To control, for individual exceptions, whether raising them causes a +trap to occur, you can use the following functions. + +@strong{Portability Note:} These functions are all GNU extensions. + +@comment fenv.h +@comment GNU +@deftypefun int feenableexcept (int @var{excepts}) +@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} +This function enables traps for each of the exceptions as indicated by +the parameter @var{excepts}. The individual exceptions are described in +@ref{Status bit operations}. Only the specified exceptions are +enabled; the status of the other exceptions is not changed. 
+ +The function returns the previous enabled exceptions in case the +operation was successful, @code{-1} otherwise. +@end deftypefun + +@comment fenv.h +@comment GNU +@deftypefun int fedisableexcept (int @var{excepts}) +@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} +This function disables traps for each of the exceptions as indicated by +the parameter @var{excepts}. The individual exceptions are described in +@ref{Status bit operations}. Only the specified exceptions are +disabled, the status of the other exceptions is not changed. + +The function returns the previous enabled exceptions in case the +operation was successful, @code{-1} otherwise. +@end deftypefun + +@comment fenv.h +@comment GNU +@deftypefun int fegetexcept (void) +@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} +The function returns a bitmask of all currently enabled exceptions. It +returns @code{-1} in case of failure. +@end deftypefun + +@node Arithmetic Functions +@section Arithmetic Functions + +The C library provides functions to do basic operations on +floating-point numbers. These include absolute value, maximum and minimum, +normalization, bit twiddling, rounding, and a few others. + +@menu +* Absolute Value:: Absolute values of integers and floats. +* Normalization Functions:: Extracting exponents and putting them back. +* Rounding Functions:: Rounding floats to integers. +* Remainder Functions:: Remainders on division, precisely defined. +* FP Bit Twiddling:: Sign bit adjustment. Adding epsilon. +* FP Comparison Functions:: Comparisons without risk of exceptions. +* Misc FP Arithmetic:: Max, min, positive difference, multiply-add. +@end menu + +@node Absolute Value +@subsection Absolute Value +@cindex absolute value functions + +These functions are provided for obtaining the @dfn{absolute value} (or +@dfn{magnitude}) of a number. The absolute value of a real number +@var{x} is @var{x} if @var{x} is positive, @minus{}@var{x} if @var{x} is +negative. 
For a complex number @var{z}, whose real part is @var{x} and +whose imaginary part is @var{y}, the absolute value is @w{@code{sqrt +(@var{x}*@var{x} + @var{y}*@var{y})}}. + +@pindex math.h +@pindex stdlib.h +Prototypes for @code{abs}, @code{labs} and @code{llabs} are in @file{stdlib.h}; +@code{imaxabs} is declared in @file{inttypes.h}; +@code{fabs}, @code{fabsf} and @code{fabsl} are declared in @file{math.h}; +@code{cabs}, @code{cabsf} and @code{cabsl} are declared in @file{complex.h}. + +@comment stdlib.h +@comment ISO +@deftypefun int abs (int @var{number}) +@comment stdlib.h +@comment ISO +@deftypefunx {long int} labs (long int @var{number}) +@comment stdlib.h +@comment ISO +@deftypefunx {long long int} llabs (long long int @var{number}) +@comment inttypes.h +@comment ISO +@deftypefunx intmax_t imaxabs (intmax_t @var{number}) +@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} +These functions return the absolute value of @var{number}. + +Most computers use a two's complement integer representation, in which +the absolute value of @code{INT_MIN} (the smallest possible @code{int}) +cannot be represented; thus, @w{@code{abs (INT_MIN)}} is not defined. + +@code{llabs} and @code{imaxabs} are new to @w{ISO C99}. + +See @ref{Integers} for a description of the @code{intmax_t} type. + +@end deftypefun + +@comment math.h +@comment ISO +@deftypefun double fabs (double @var{number}) +@comment math.h +@comment ISO +@deftypefunx float fabsf (float @var{number}) +@comment math.h +@comment ISO +@deftypefunx {long double} fabsl (long double @var{number}) +@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} +These functions return the absolute value of the floating-point number +@var{number}. 
+@end deftypefun + +@comment complex.h +@comment ISO +@deftypefun double cabs (complex double @var{z}) +@comment complex.h +@comment ISO +@deftypefunx float cabsf (complex float @var{z}) +@comment complex.h +@comment ISO +@deftypefunx {long double} cabsl (complex long double @var{z}) +@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} +These functions return the absolute value of the complex number @var{z} +(@pxref{Complex Numbers}). The absolute value of a complex number is: + +@smallexample +sqrt (creal (@var{z}) * creal (@var{z}) + cimag (@var{z}) * cimag (@var{z})) +@end smallexample + +This function should always be used instead of the direct formula +because it takes special care to avoid losing precision. It may also +take advantage of hardware support for this operation. See @code{hypot} +in @ref{Exponents and Logarithms}. +@end deftypefun + +@node Normalization Functions +@subsection Normalization Functions +@cindex normalization functions (floating-point) + +The functions described in this section are primarily provided as a way +to efficiently perform certain low-level manipulations on floating point +numbers that are represented internally using a binary radix; +see @ref{Floating Point Concepts}. These functions are required to +have equivalent behavior even if the representation does not use a radix +of 2, but of course they are unlikely to be particularly efficient in +those cases. + +@pindex math.h +All these functions are declared in @file{math.h}. + +@comment math.h +@comment ISO +@deftypefun double frexp (double @var{value}, int *@var{exponent}) +@comment math.h +@comment ISO +@deftypefunx float frexpf (float @var{value}, int *@var{exponent}) +@comment math.h +@comment ISO +@deftypefunx {long double} frexpl (long double @var{value}, int *@var{exponent}) +@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} +These functions are used to split the number @var{value} +into a normalized fraction and an exponent. 
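As a rough illustration of this split, the sketch below takes a number apart with @code{frexp} and puts it back together with @code{ldexp}. The helper name @code{split_and_rebuild} and its output parameters are hypothetical, chosen only for this example.

```c
#include <math.h>

/* Hypothetical helper: split VALUE with frexp, then rebuild it with
   ldexp.  The fraction and exponent found along the way are passed
   back through FRACTION and EXPONENT.  */
double
split_and_rebuild (double value, double *fraction, int *exponent)
{
  *fraction = frexp (value, exponent);
  /* Scaling by a power of two is exact unless the result overflows or
     underflows, so this should give back VALUE.  */
  return ldexp (*fraction, *exponent);
}
```

For a finite, nonzero @var{value} the stored fraction has a magnitude of at least 1/2 and less than 1, and the returned value compares equal to the input.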
+ +If the argument @var{value} is not zero, the return value is @var{value} +times a power of two, and its magnitude is always in the range 1/2 +(inclusive) to 1 (exclusive). The corresponding exponent is stored in +@code{*@var{exponent}}; the return value multiplied by 2 raised to this +exponent equals the original number @var{value}. + +For example, @code{frexp (12.8, &exponent)} returns @code{0.8} and +stores @code{4} in @code{exponent}. + +If @var{value} is zero, then the return value is zero and +zero is stored in @code{*@var{exponent}}. +@end deftypefun + +@comment math.h +@comment ISO +@deftypefun double ldexp (double @var{value}, int @var{exponent}) +@comment math.h +@comment ISO +@deftypefunx float ldexpf (float @var{value}, int @var{exponent}) +@comment math.h +@comment ISO +@deftypefunx {long double} ldexpl (long double @var{value}, int @var{exponent}) +@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} +These functions return the result of multiplying the floating-point +number @var{value} by 2 raised to the power @var{exponent}. (It can +be used to reassemble floating-point numbers that were taken apart +by @code{frexp}.) + +For example, @code{ldexp (0.8, 4)} returns @code{12.8}. +@end deftypefun + +The following functions, which come from BSD, provide facilities +equivalent to those of @code{ldexp} and @code{frexp}. See also the +@w{ISO C} function @code{logb} which originally also appeared in BSD. + +@comment math.h +@comment BSD +@deftypefun double scalb (double @var{value}, double @var{exponent}) +@comment math.h +@comment BSD +@deftypefunx float scalbf (float @var{value}, float @var{exponent}) +@comment math.h +@comment BSD +@deftypefunx {long double} scalbl (long double @var{value}, long double @var{exponent}) +@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} +The @code{scalb} function is the BSD name for @code{ldexp}. 
+@end deftypefun + +@comment math.h +@comment BSD +@deftypefun double scalbn (double @var{x}, int @var{n}) +@comment math.h +@comment BSD +@deftypefunx float scalbnf (float @var{x}, int @var{n}) +@comment math.h +@comment BSD +@deftypefunx {long double} scalbnl (long double @var{x}, int @var{n}) +@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} +@code{scalbn} is identical to @code{scalb}, except that the exponent +@var{n} is an @code{int} instead of a floating-point number. +@end deftypefun + +@comment math.h +@comment BSD +@deftypefun double scalbln (double @var{x}, long int @var{n}) +@comment math.h +@comment BSD +@deftypefunx float scalblnf (float @var{x}, long int @var{n}) +@comment math.h +@comment BSD +@deftypefunx {long double} scalblnl (long double @var{x}, long int @var{n}) +@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} +@code{scalbln} is identical to @code{scalb}, except that the exponent +@var{n} is a @code{long int} instead of a floating-point number. +@end deftypefun + +@comment math.h +@comment BSD +@deftypefun double significand (double @var{x}) +@comment math.h +@comment BSD +@deftypefunx float significandf (float @var{x}) +@comment math.h +@comment BSD +@deftypefunx {long double} significandl (long double @var{x}) +@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} +@code{significand} returns the mantissa of @var{x} scaled to the range +@math{[1, 2)}. +It is equivalent to @w{@code{scalb (@var{x}, (double) -ilogb (@var{x}))}}. + +This function exists mainly for use in certain standardized tests +of @w{IEEE 754} conformance. +@end deftypefun + +@node Rounding Functions +@subsection Rounding Functions +@cindex converting floats to integers + +@pindex math.h +The functions listed here perform operations such as rounding and +truncation of floating-point values. Some of these functions convert +floating point numbers to integer values. They are all declared in +@file{math.h}. 
+ +You can also convert floating-point numbers to integers simply by +casting them to @code{int}. This discards the fractional part, +effectively rounding towards zero. However, this only works if the +result can actually be represented as an @code{int}---for very large +numbers, this is impossible. The functions listed here return the +result as a @code{double} instead to get around this problem. + +The @code{fromfp} functions use the following macros, from TS +18661-1:2014, to specify the direction of rounding. These correspond +to the rounding directions defined in IEEE 754-2008. + +@vtable @code +@comment math.h +@comment ISO +@item FP_INT_UPWARD +Round toward @math{+@infinity{}}. + +@comment math.h +@comment ISO +@item FP_INT_DOWNWARD +Round toward @math{-@infinity{}}. + +@comment math.h +@comment ISO +@item FP_INT_TOWARDZERO +Round toward zero. + +@comment math.h +@comment ISO +@item FP_INT_TONEARESTFROMZERO +Round to nearest, ties round away from zero. + +@comment math.h +@comment ISO +@item FP_INT_TONEAREST +Round to nearest, ties round to even. +@end vtable + +@comment math.h +@comment ISO +@deftypefun double ceil (double @var{x}) +@comment math.h +@comment ISO +@deftypefunx float ceilf (float @var{x}) +@comment math.h +@comment ISO +@deftypefunx {long double} ceill (long double @var{x}) +@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} +These functions round @var{x} upwards to the nearest integer, +returning that value as a @code{double}. Thus, @code{ceil (1.5)} +is @code{2.0}. +@end deftypefun + +@comment math.h +@comment ISO +@deftypefun double floor (double @var{x}) +@comment math.h +@comment ISO +@deftypefunx float floorf (float @var{x}) +@comment math.h +@comment ISO +@deftypefunx {long double} floorl (long double @var{x}) +@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} +These functions round @var{x} downwards to the nearest +integer, returning that value as a @code{double}. Thus, @code{floor +(1.5)} is @code{1.0} and @code{floor (-1.5)} is @code{-2.0}. 
+@end deftypefun + +@comment math.h +@comment ISO +@deftypefun double trunc (double @var{x}) +@comment math.h +@comment ISO +@deftypefunx float truncf (float @var{x}) +@comment math.h +@comment ISO +@deftypefunx {long double} truncl (long double @var{x}) +@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} +The @code{trunc} functions round @var{x} towards zero to the nearest +integer (returned in floating-point format). Thus, @code{trunc (1.5)} +is @code{1.0} and @code{trunc (-1.5)} is @code{-1.0}. +@end deftypefun + +@comment math.h +@comment ISO +@deftypefun double rint (double @var{x}) +@comment math.h +@comment ISO +@deftypefunx float rintf (float @var{x}) +@comment math.h +@comment ISO +@deftypefunx {long double} rintl (long double @var{x}) +@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} +These functions round @var{x} to an integer value according to the +current rounding mode. @xref{Floating Point Parameters}, for +information about the various rounding modes. The default +rounding mode is to round to the nearest integer; some machines +support other modes, but round-to-nearest is always used unless +you explicitly select another. + +If @var{x} was not initially an integer, these functions raise the +inexact exception. +@end deftypefun + +@comment math.h +@comment ISO +@deftypefun double nearbyint (double @var{x}) +@comment math.h +@comment ISO +@deftypefunx float nearbyintf (float @var{x}) +@comment math.h +@comment ISO +@deftypefunx {long double} nearbyintl (long double @var{x}) +@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} +These functions return the same value as the @code{rint} functions, but +do not raise the inexact exception if @var{x} is not an integer. 
+@end deftypefun + +@comment math.h +@comment ISO +@deftypefun double round (double @var{x}) +@comment math.h +@comment ISO +@deftypefunx float roundf (float @var{x}) +@comment math.h +@comment ISO +@deftypefunx {long double} roundl (long double @var{x}) +@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} +These functions are similar to @code{rint}, but they round halfway +cases away from zero instead of to the nearest integer (or other +current rounding mode). +@end deftypefun + +@comment math.h +@comment ISO +@deftypefun double roundeven (double @var{x}) +@comment math.h +@comment ISO +@deftypefunx float roundevenf (float @var{x}) +@comment math.h +@comment ISO +@deftypefunx {long double} roundevenl (long double @var{x}) +@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} +These functions, from TS 18661-1:2014, are similar to @code{round}, +but they round halfway cases to even instead of away from zero. +@end deftypefun + +@comment math.h +@comment ISO +@deftypefun {long int} lrint (double @var{x}) +@comment math.h +@comment ISO +@deftypefunx {long int} lrintf (float @var{x}) +@comment math.h +@comment ISO +@deftypefunx {long int} lrintl (long double @var{x}) +@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} +These functions are just like @code{rint}, but they return a +@code{long int} instead of a floating-point number. +@end deftypefun + +@comment math.h +@comment ISO +@deftypefun {long long int} llrint (double @var{x}) +@comment math.h +@comment ISO +@deftypefunx {long long int} llrintf (float @var{x}) +@comment math.h +@comment ISO +@deftypefunx {long long int} llrintl (long double @var{x}) +@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} +These functions are just like @code{rint}, but they return a +@code{long long int} instead of a floating-point number. 
+@end deftypefun + +@comment math.h +@comment ISO +@deftypefun {long int} lround (double @var{x}) +@comment math.h +@comment ISO +@deftypefunx {long int} lroundf (float @var{x}) +@comment math.h +@comment ISO +@deftypefunx {long int} lroundl (long double @var{x}) +@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} +These functions are just like @code{round}, but they return a +@code{long int} instead of a floating-point number. +@end deftypefun + +@comment math.h +@comment ISO +@deftypefun {long long int} llround (double @var{x}) +@comment math.h +@comment ISO +@deftypefunx {long long int} llroundf (float @var{x}) +@comment math.h +@comment ISO +@deftypefunx {long long int} llroundl (long double @var{x}) +@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} +These functions are just like @code{round}, but they return a +@code{long long int} instead of a floating-point number. +@end deftypefun + +@comment math.h +@comment ISO +@deftypefun intmax_t fromfp (double @var{x}, int @var{round}, unsigned int @var{width}) +@comment math.h +@comment ISO +@deftypefunx intmax_t fromfpf (float @var{x}, int @var{round}, unsigned int @var{width}) +@comment math.h +@comment ISO +@deftypefunx intmax_t fromfpl (long double @var{x}, int @var{round}, unsigned int @var{width}) +@comment math.h +@comment ISO +@deftypefunx uintmax_t ufromfp (double @var{x}, int @var{round}, unsigned int @var{width}) +@comment math.h +@comment ISO +@deftypefunx uintmax_t ufromfpf (float @var{x}, int @var{round}, unsigned int @var{width}) +@comment math.h +@comment ISO +@deftypefunx uintmax_t ufromfpl (long double @var{x}, int @var{round}, unsigned int @var{width}) +@comment math.h +@comment ISO +@deftypefunx intmax_t fromfpx (double @var{x}, int @var{round}, unsigned int @var{width}) +@comment math.h +@comment ISO +@deftypefunx intmax_t fromfpxf (float @var{x}, int @var{round}, unsigned int @var{width}) +@comment math.h +@comment ISO +@deftypefunx intmax_t fromfpxl (long double @var{x}, int @var{round}, unsigned int 
@var{width}) +@comment math.h +@comment ISO +@deftypefunx uintmax_t ufromfpx (double @var{x}, int @var{round}, unsigned int @var{width}) +@comment math.h +@comment ISO +@deftypefunx uintmax_t ufromfpxf (float @var{x}, int @var{round}, unsigned int @var{width}) +@comment math.h +@comment ISO +@deftypefunx uintmax_t ufromfpxl (long double @var{x}, int @var{round}, unsigned int @var{width}) +@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} +These functions, from TS 18661-1:2014, convert a floating-point number +to an integer according to the rounding direction @var{round} (one of +the @code{FP_INT_*} macros). If the integer is outside the range of a +signed or unsigned (depending on the return type of the function) type +of width @var{width} bits (or outside the range of the return type, if +@var{width} is larger), or if @var{x} is infinite or NaN, or if +@var{width} is zero, a domain error occurs and an unspecified value is +returned. The functions with an @samp{x} in their names raise the +inexact exception when a domain error does not occur and the argument +is not an integer; the other functions do not raise the inexact +exception. +@end deftypefun + + +@comment math.h +@comment ISO +@deftypefun double modf (double @var{value}, double *@var{integer-part}) +@comment math.h +@comment ISO +@deftypefunx float modff (float @var{value}, float *@var{integer-part}) +@comment math.h +@comment ISO +@deftypefunx {long double} modfl (long double @var{value}, long double *@var{integer-part}) +@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} +These functions break the argument @var{value} into an integer part and a +fractional part (between @code{-1} and @code{1}, exclusive). Their sum +equals @var{value}. Each of the parts has the same sign as @var{value}, +and the integer part is always rounded toward zero. + +@code{modf} stores the integer part in @code{*@var{integer-part}}, and +returns the fractional part. 
For example, @code{modf (2.5, &intpart)} +returns @code{0.5} and stores @code{2.0} into @code{intpart}. +@end deftypefun + +@node Remainder Functions +@subsection Remainder Functions + +The functions in this section compute the remainder on division of two +floating-point numbers. Each is a little different; pick the one that +suits your problem. + +@comment math.h +@comment ISO +@deftypefun double fmod (double @var{numerator}, double @var{denominator}) +@comment math.h +@comment ISO +@deftypefunx float fmodf (float @var{numerator}, float @var{denominator}) +@comment math.h +@comment ISO +@deftypefunx {long double} fmodl (long double @var{numerator}, long double @var{denominator}) +@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} +These functions compute the remainder from the division of +@var{numerator} by @var{denominator}. Specifically, the return value is +@code{@var{numerator} - @w{@var{n} * @var{denominator}}}, where @var{n} +is the quotient of @var{numerator} divided by @var{denominator}, rounded +towards zero to an integer. Thus, @w{@code{fmod (6.5, 2.3)}} returns +@code{1.9}, which is @code{6.5} minus @code{4.6}. + +The result has the same sign as the @var{numerator} and has magnitude +less than the magnitude of the @var{denominator}. + +If @var{denominator} is zero, @code{fmod} signals a domain error. +@end deftypefun + +@comment math.h +@comment BSD +@deftypefun double drem (double @var{numerator}, double @var{denominator}) +@comment math.h +@comment BSD +@deftypefunx float dremf (float @var{numerator}, float @var{denominator}) +@comment math.h +@comment BSD +@deftypefunx {long double} dreml (long double @var{numerator}, long double @var{denominator}) +@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} +These functions are like @code{fmod} except that they round the +internal quotient @var{n} to the nearest integer instead of towards zero +to an integer. For example, @code{drem (6.5, 2.3)} returns @code{-0.4}, +which is @code{6.5} minus @code{6.9}. 
+ +The absolute value of the result is less than or equal to half the +absolute value of the @var{denominator}. The difference between +@code{fmod (@var{numerator}, @var{denominator})} and @code{drem +(@var{numerator}, @var{denominator})} is always either +@var{denominator}, minus @var{denominator}, or zero. + +If @var{denominator} is zero, @code{drem} signals a domain error. +@end deftypefun + +@comment math.h +@comment BSD +@deftypefun double remainder (double @var{numerator}, double @var{denominator}) +@comment math.h +@comment BSD +@deftypefunx float remainderf (float @var{numerator}, float @var{denominator}) +@comment math.h +@comment BSD +@deftypefunx {long double} remainderl (long double @var{numerator}, long double @var{denominator}) +@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} +This function is another name for @code{drem}. +@end deftypefun + +@node FP Bit Twiddling +@subsection Setting and modifying single bits of FP values +@cindex FP arithmetic + +There are some operations that are too complicated or expensive to +perform by hand on floating-point numbers. @w{ISO C99} defines +functions to do these operations, which mostly involve changing single +bits. + +@comment math.h +@comment ISO +@deftypefun double copysign (double @var{x}, double @var{y}) +@comment math.h +@comment ISO +@deftypefunx float copysignf (float @var{x}, float @var{y}) +@comment math.h +@comment ISO +@deftypefunx {long double} copysignl (long double @var{x}, long double @var{y}) +@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} +These functions return @var{x} but with the sign of @var{y}. They work +even if @var{x} or @var{y} are NaN or zero. Both of these can carry a +sign (although not all implementations support it) and this is one of +the few operations that can tell the difference. + +@code{copysign} never raises an exception. +@c except signalling NaNs + +This function is defined in @w{IEC 559} (and the appendix with +recommended functions in @w{IEEE 754}/@w{IEEE 854}). 
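A minimal sketch of one common use of @code{copysign}: recovering the sign of a value even when it is a zero. The helper name @code{sign_of} is hypothetical, and the behavior shown for negative zero assumes the platform supports signed zeros, as IEEE 754 systems do.

```c
#include <math.h>

/* Return -1.0 or +1.0 according to the sign bit of x.  Unlike the
   test x < 0.0, this also distinguishes -0.0 from +0.0.  */
double
sign_of (double x)
{
  return copysign (1.0, x);
}
```

Here @code{sign_of (-0.0)} returns @code{-1.0}, although the comparison @code{-0.0 < 0.0} is false.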
+@end deftypefun + +@comment math.h +@comment ISO +@deftypefun int signbit (@emph{float-type} @var{x}) +@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} +@code{signbit} is a generic macro which can work on all floating-point +types. It returns a nonzero value if the value of @var{x} has its sign +bit set. + +This is not the same as @code{x < 0.0}, because @w{IEEE 754} floating +point allows zero to be signed. The comparison @code{-0.0 < 0.0} is +false, but @code{signbit (-0.0)} will return a nonzero value. +@end deftypefun + +@comment math.h +@comment ISO +@deftypefun double nextafter (double @var{x}, double @var{y}) +@comment math.h +@comment ISO +@deftypefunx float nextafterf (float @var{x}, float @var{y}) +@comment math.h +@comment ISO +@deftypefunx {long double} nextafterl (long double @var{x}, long double @var{y}) +@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} +The @code{nextafter} function returns the next representable neighbor of +@var{x} in the direction towards @var{y}. The size of the step between +@var{x} and the result depends on the type of the result. If +@math{@var{x} = @var{y}} the function simply returns @var{y}. If either +value is @code{NaN}, @code{NaN} is returned. Otherwise +a value corresponding to the value of the least significant bit in the +mantissa is added or subtracted, depending on the direction. +@code{nextafter} will signal overflow or underflow if the result goes +outside of the range of normalized numbers. + +This function is defined in @w{IEC 559} (and the appendix with +recommended functions in @w{IEEE 754}/@w{IEEE 854}). 
+@end deftypefun + +@comment math.h +@comment ISO +@deftypefun double nexttoward (double @var{x}, long double @var{y}) +@comment math.h +@comment ISO +@deftypefunx float nexttowardf (float @var{x}, long double @var{y}) +@comment math.h +@comment ISO +@deftypefunx {long double} nexttowardl (long double @var{x}, long double @var{y}) +@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} +These functions are identical to the corresponding versions of +@code{nextafter} except that their second argument is a @code{long +double}. +@end deftypefun + +@comment math.h +@comment ISO +@deftypefun double nextup (double @var{x}) +@comment math.h +@comment ISO +@deftypefunx float nextupf (float @var{x}) +@comment math.h +@comment ISO +@deftypefunx {long double} nextupl (long double @var{x}) +@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} +The @code{nextup} function returns the next representable neighbor of @var{x} +in the direction of positive infinity. If @var{x} is the smallest negative +subnormal number in the type of @var{x} the function returns @code{-0}. If +@math{@var{x} = @code{0}} the function returns the smallest positive subnormal +number in the type of @var{x}. If @var{x} is NaN, NaN is returned. +If @var{x} is @math{+@infinity{}}, @math{+@infinity{}} is returned. +@code{nextup} is from TS 18661-1:2014. +@code{nextup} never raises an exception except for signaling NaNs. +@end deftypefun + +@comment math.h +@comment ISO +@deftypefun double nextdown (double @var{x}) +@comment math.h +@comment ISO +@deftypefunx float nextdownf (float @var{x}) +@comment math.h +@comment ISO +@deftypefunx {long double} nextdownl (long double @var{x}) +@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} +The @code{nextdown} function returns the next representable neighbor of @var{x} +in the direction of negative infinity. If @var{x} is the smallest positive +subnormal number in the type of @var{x} the function returns @code{+0}. 
If +@math{@var{x} = @code{0}} the function returns the smallest negative subnormal +number in the type of @var{x}. If @var{x} is NaN, NaN is returned. +If @var{x} is @math{-@infinity{}}, @math{-@infinity{}} is returned. +@code{nextdown} is from TS 18661-1:2014. +@code{nextdown} never raises an exception except for signaling NaNs. +@end deftypefun + +@cindex NaN +@comment math.h +@comment ISO +@deftypefun double nan (const char *@var{tagp}) +@comment math.h +@comment ISO +@deftypefunx float nanf (const char *@var{tagp}) +@comment math.h +@comment ISO +@deftypefunx {long double} nanl (const char *@var{tagp}) +@safety{@prelim{}@mtsafe{@mtslocale{}}@assafe{}@acsafe{}} +@c The unsafe-but-ruled-safe locale use comes from strtod. +The @code{nan} function returns a representation of NaN, provided that +NaN is supported by the target platform. +@code{nan ("@var{n-char-sequence}")} is equivalent to +@code{strtod ("NAN(@var{n-char-sequence})")}. + +The argument @var{tagp} is used in an unspecified manner. On @w{IEEE +754} systems, there are many representations of NaN, and @var{tagp} +selects one. On other systems it may do nothing. +@end deftypefun + +@comment math.h +@comment ISO +@deftypefun int canonicalize (double *@var{cx}, const double *@var{x}) +@comment math.h +@comment ISO +@deftypefunx int canonicalizef (float *@var{cx}, const float *@var{x}) +@comment math.h +@comment ISO +@deftypefunx int canonicalizel (long double *@var{cx}, const long double *@var{x}) +@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} +In some floating-point formats, some values have canonical (preferred) +and noncanonical encodings (for IEEE interchange binary formats, all +encodings are canonical). These functions, defined by TS +18661-1:2014, attempt to produce a canonical version of the +floating-point value pointed to by @var{x}; if that value is a +signaling NaN, they raise the invalid exception and produce a quiet +NaN. 
If a canonical value is produced, it is stored in the object +pointed to by @var{cx}, and these functions return zero. Otherwise +(if a canonical value could not be produced because the object pointed +to by @var{x} is not a valid representation of any floating-point +value), the object pointed to by @var{cx} is unchanged and a nonzero +value is returned. + +Note that some formats have multiple encodings of a value which are +all equally canonical; when such an encoding is used as an input to +this function, any such encoding of the same value (or of the +corresponding quiet NaN, if that value is a signaling NaN) may be +produced as output. +@end deftypefun + +@comment math.h +@comment ISO +@deftypefun double getpayload (const double *@var{x}) +@comment math.h +@comment ISO +@deftypefunx float getpayloadf (const float *@var{x}) +@comment math.h +@comment ISO +@deftypefunx {long double} getpayloadl (const long double *@var{x}) +@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} +IEEE 754 defines the @dfn{payload} of a NaN to be an integer value +encoded in the representation of the NaN. Payloads are typically +propagated from NaN inputs to the result of a floating-point +operation. These functions, defined by TS 18661-1:2014, return the +payload of the NaN pointed to by @var{x} (returned as a positive +integer, or positive zero, represented as a floating-point number); if +@var{x} is not a NaN, they return an unspecified value. They raise no +floating-point exceptions even for signaling NaNs. 
+@end deftypefun + +@comment math.h +@comment ISO +@deftypefun int setpayload (double *@var{x}, double @var{payload}) +@comment math.h +@comment ISO +@deftypefunx int setpayloadf (float *@var{x}, float @var{payload}) +@comment math.h +@comment ISO +@deftypefunx int setpayloadl (long double *@var{x}, long double @var{payload}) +@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} +These functions, defined by TS 18661-1:2014, set the object pointed to +by @var{x} to a quiet NaN with payload @var{payload} and a zero sign +bit and return zero. If @var{payload} is not a positive-signed +integer that is a valid payload for a quiet NaN of the given type, the +object pointed to by @var{x} is set to positive zero and a nonzero +value is returned. They raise no floating-point exceptions. +@end deftypefun + +@comment math.h +@comment ISO +@deftypefun int setpayloadsig (double *@var{x}, double @var{payload}) +@comment math.h +@comment ISO +@deftypefunx int setpayloadsigf (float *@var{x}, float @var{payload}) +@comment math.h +@comment ISO +@deftypefunx int setpayloadsigl (long double *@var{x}, long double @var{payload}) +@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} +These functions, defined by TS 18661-1:2014, set the object pointed to +by @var{x} to a signaling NaN with payload @var{payload} and a zero +sign bit and return zero. If @var{payload} is not a positive-signed +integer that is a valid payload for a signaling NaN of the given type, +the object pointed to by @var{x} is set to positive zero and a nonzero +value is returned. They raise no floating-point exceptions. +@end deftypefun + +@node FP Comparison Functions +@subsection Floating-Point Comparison Functions +@cindex unordered comparison + +The standard C comparison operators provoke exceptions when one or other +of the operands is NaN. For example, + +@smallexample +int v = a < 1.0; +@end smallexample + +@noindent +will raise an exception if @var{a} is NaN. 
(This does @emph{not} +happen with @code{==} and @code{!=}; those merely return false and true, +respectively, when NaN is examined.) Frequently this exception is +undesirable. @w{ISO C99} therefore defines comparison functions that +do not raise exceptions when NaN is examined. All of the functions are +implemented as macros which allow their arguments to be of any +floating-point type. The macros are guaranteed to evaluate their +arguments only once. TS 18661-1:2014 adds such a macro for an +equality comparison that @emph{does} raise an exception for a NaN +argument; it also adds functions that provide a total ordering on all +floating-point values, including NaNs, without raising any exceptions +even for signaling NaNs. + +@comment math.h +@comment ISO +@deftypefn Macro int isgreater (@emph{real-floating} @var{x}, @emph{real-floating} @var{y}) +@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} +This macro determines whether the argument @var{x} is greater than +@var{y}. It is equivalent to @code{(@var{x}) > (@var{y})}, but no +exception is raised if @var{x} or @var{y} are NaN. +@end deftypefn + +@comment math.h +@comment ISO +@deftypefn Macro int isgreaterequal (@emph{real-floating} @var{x}, @emph{real-floating} @var{y}) +@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} +This macro determines whether the argument @var{x} is greater than or +equal to @var{y}. It is equivalent to @code{(@var{x}) >= (@var{y})}, but no +exception is raised if @var{x} or @var{y} are NaN. +@end deftypefn + +@comment math.h +@comment ISO +@deftypefn Macro int isless (@emph{real-floating} @var{x}, @emph{real-floating} @var{y}) +@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} +This macro determines whether the argument @var{x} is less than @var{y}. +It is equivalent to @code{(@var{x}) < (@var{y})}, but no exception is +raised if @var{x} or @var{y} are NaN. 
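As an illustration, the quiet comparison macros can be used to write a range clamp that lets NaN pass through without raising an exception. This is only a sketch; the function name @code{clamp_quiet} is hypothetical.

```c
#include <math.h>

/* Clamp x to the interval [lo, hi].  If x is NaN, both comparisons
   are false and x is returned unchanged.  */
double
clamp_quiet (double x, double lo, double hi)
{
  if (isless (x, lo))
    return lo;
  if (isgreater (x, hi))
    return hi;
  return x;
}
```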
+@end deftypefn + +@comment math.h +@comment ISO +@deftypefn Macro int islessequal (@emph{real-floating} @var{x}, @emph{real-floating} @var{y}) +@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} +This macro determines whether the argument @var{x} is less than or equal +to @var{y}. It is equivalent to @code{(@var{x}) <= (@var{y})}, but no +exception is raised if @var{x} or @var{y} are NaN. +@end deftypefn + +@comment math.h +@comment ISO +@deftypefn Macro int islessgreater (@emph{real-floating} @var{x}, @emph{real-floating} @var{y}) +@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} +This macro determines whether the argument @var{x} is less or greater +than @var{y}. It is equivalent to @code{(@var{x}) < (@var{y}) || +(@var{x}) > (@var{y})} (although it only evaluates @var{x} and @var{y} +once), but no exception is raised if @var{x} or @var{y} are NaN. + +This macro is not equivalent to @code{@var{x} != @var{y}}, because that +expression is true if @var{x} or @var{y} are NaN. +@end deftypefn + +@comment math.h +@comment ISO +@deftypefn Macro int isunordered (@emph{real-floating} @var{x}, @emph{real-floating} @var{y}) +@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} +This macro determines whether its arguments are unordered. In other +words, it is true if @var{x} or @var{y} are NaN, and false otherwise. +@end deftypefn + +@comment math.h +@comment ISO +@deftypefn Macro int iseqsig (@emph{real-floating} @var{x}, @emph{real-floating} @var{y}) +@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} +This macro determines whether its arguments are equal. It is +equivalent to @code{(@var{x}) == (@var{y})}, but it raises the invalid +exception and sets @code{errno} to @code{EDOM} if either argument is a +NaN. 
+@end deftypefn + +@comment math.h +@comment ISO +@deftypefun int totalorder (double @var{x}, double @var{y}) +@comment ISO +@deftypefunx int totalorderf (float @var{x}, float @var{y}) +@comment ISO +@deftypefunx int totalorderl (long double @var{x}, long double @var{y}) +@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} +These functions determine whether the total order relationship, +defined in IEEE 754-2008, is true for @var{x} and @var{y}, returning +nonzero if it is true and zero if it is false. No exceptions are +raised even for signaling NaNs. The relationship is true if they are +the same floating-point value (including sign for zero and NaNs, and +payload for NaNs), or if @var{x} comes before @var{y} in the following +order: negative quiet NaNs, in order of decreasing payload; negative +signaling NaNs, in order of decreasing payload; negative infinity; +finite numbers, in ascending order, with negative zero before positive +zero; positive infinity; positive signaling NaNs, in order of +increasing payload; positive quiet NaNs, in order of increasing +payload. +@end deftypefun + +@comment math.h +@comment ISO +@deftypefun int totalordermag (double @var{x}, double @var{y}) +@comment ISO +@deftypefunx int totalordermagf (float @var{x}, float @var{y}) +@comment ISO +@deftypefunx int totalordermagl (long double @var{x}, long double @var{y}) +@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} +These functions determine whether the total order relationship, +defined in IEEE 754-2008, is true for the absolute values of @var{x} +and @var{y}, returning nonzero if it is true and zero if it is false. +No exceptions are raised even for signaling NaNs. +@end deftypefun + +Not all machines provide hardware support for these operations. On +machines that don't, the macros can be very slow. Therefore, you should +not use these functions when NaN is not a concern. + +@strong{NB:} There are no macros @code{isequal} or @code{isunequal}. 
+They are unnecessary, because the @code{==} and @code{!=} operators do +@emph{not} throw an exception if one or both of the operands are NaN. + +@node Misc FP Arithmetic +@subsection Miscellaneous FP arithmetic functions +@cindex minimum +@cindex maximum +@cindex positive difference +@cindex multiply-add + +The functions in this section perform miscellaneous but common +operations that are awkward to express with C operators. On some +processors these functions can use special machine instructions to +perform these operations faster than the equivalent C code. + +@comment math.h +@comment ISO +@deftypefun double fmin (double @var{x}, double @var{y}) +@comment math.h +@comment ISO +@deftypefunx float fminf (float @var{x}, float @var{y}) +@comment math.h +@comment ISO +@deftypefunx {long double} fminl (long double @var{x}, long double @var{y}) +@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} +The @code{fmin} function returns the lesser of the two values @var{x} +and @var{y}. It is similar to the expression +@smallexample +((x) < (y) ? (x) : (y)) +@end smallexample +except that @var{x} and @var{y} are only evaluated once. + +If an argument is NaN, the other argument is returned. If both arguments +are NaN, NaN is returned. +@end deftypefun + +@comment math.h +@comment ISO +@deftypefun double fmax (double @var{x}, double @var{y}) +@comment math.h +@comment ISO +@deftypefunx float fmaxf (float @var{x}, float @var{y}) +@comment math.h +@comment ISO +@deftypefunx {long double} fmaxl (long double @var{x}, long double @var{y}) +@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} +The @code{fmax} function returns the greater of the two values @var{x} +and @var{y}. + +If an argument is NaN, the other argument is returned. If both arguments +are NaN, NaN is returned. 
+@end deftypefun + +@comment math.h +@comment ISO +@deftypefun double fminmag (double @var{x}, double @var{y}) +@comment math.h +@comment ISO +@deftypefunx float fminmagf (float @var{x}, float @var{y}) +@comment math.h +@comment ISO +@deftypefunx {long double} fminmagl (long double @var{x}, long double @var{y}) +@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} +These functions, from TS 18661-1:2014, return whichever of the two +values @var{x} and @var{y} has the smaller absolute value. If both +have the same absolute value, or either is NaN, they behave the same +as the @code{fmin} functions. +@end deftypefun + +@comment math.h +@comment ISO +@deftypefun double fmaxmag (double @var{x}, double @var{y}) +@comment math.h +@comment ISO +@deftypefunx float fmaxmagf (float @var{x}, float @var{y}) +@comment math.h +@comment ISO +@deftypefunx {long double} fmaxmagl (long double @var{x}, long double @var{y}) +@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} +These functions, from TS 18661-1:2014, return whichever of the two +values @var{x} and @var{y} has the greater absolute value. If both +have the same absolute value, or either is NaN, they behave the same +as the @code{fmax} functions. +@end deftypefun + +@comment math.h +@comment ISO +@deftypefun double fdim (double @var{x}, double @var{y}) +@comment math.h +@comment ISO +@deftypefunx float fdimf (float @var{x}, float @var{y}) +@comment math.h +@comment ISO +@deftypefunx {long double} fdiml (long double @var{x}, long double @var{y}) +@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} +The @code{fdim} function returns the positive difference between +@var{x} and @var{y}. The positive difference is @math{@var{x} - +@var{y}} if @var{x} is greater than @var{y}, and @math{0} otherwise. + +If @var{x}, @var{y}, or both are NaN, NaN is returned. 
+@end deftypefun + +@comment math.h +@comment ISO +@deftypefun double fma (double @var{x}, double @var{y}, double @var{z}) +@comment math.h +@comment ISO +@deftypefunx float fmaf (float @var{x}, float @var{y}, float @var{z}) +@comment math.h +@comment ISO +@deftypefunx {long double} fmal (long double @var{x}, long double @var{y}, long double @var{z}) +@cindex butterfly +@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} +The @code{fma} function performs floating-point multiply-add. This is +the operation @math{(@var{x} @mul{} @var{y}) + @var{z}}, but the +intermediate result is not rounded to the destination type. This can +sometimes improve the precision of a calculation. + +This function was introduced because some processors have a special +instruction to perform multiply-add. The C compiler cannot use it +directly, because the expression @samp{x*y + z} is defined to round the +intermediate result. @code{fma} lets you choose when you want to round +only once. + +@vindex FP_FAST_FMA +On processors which do not implement multiply-add in hardware, +@code{fma} can be very slow since it must avoid intermediate rounding. +@file{math.h} defines the symbols @code{FP_FAST_FMA}, +@code{FP_FAST_FMAF}, and @code{FP_FAST_FMAL} when the corresponding +version of @code{fma} is no slower than the expression @samp{x*y + z}. +In @theglibc{}, this always means the operation is implemented in +hardware. +@end deftypefun + +@node Complex Numbers +@section Complex Numbers +@pindex complex.h +@cindex complex numbers + +@w{ISO C99} introduces support for complex numbers in C. This is done +with a new type qualifier, @code{complex}. It is a keyword if and only +if @file{complex.h} has been included. There are three complex types, +corresponding to the three real types: @code{float complex}, +@code{double complex}, and @code{long double complex}. + +To construct complex numbers you need a way to indicate the imaginary +part of a number. 
There is no standard notation for an imaginary +floating point constant. Instead, @file{complex.h} defines two macros +that can be used to create complex numbers. + +@deftypevr Macro {const float complex} _Complex_I +This macro is a representation of the complex number ``@math{0+1i}''. +Multiplying a real floating-point value by @code{_Complex_I} gives a +complex number whose value is purely imaginary. You can use this to +construct complex constants: + +@smallexample +@math{3.0 + 4.0i} = @code{3.0 + 4.0 * _Complex_I} +@end smallexample + +Note that @code{_Complex_I * _Complex_I} has the value @code{-1}, but +the type of that value is @code{complex}. +@end deftypevr + +@c Put this back in when gcc supports _Imaginary_I. It's too confusing. +@ignore +@noindent +Without an optimizing compiler this is more expensive than the use of +@code{_Imaginary_I} but with is better than nothing. You can avoid all +the hassles if you use the @code{I} macro below if the name is not +problem. + +@deftypevr Macro {const float imaginary} _Imaginary_I +This macro is a representation of the value ``@math{1i}''. I.e., it is +the value for which + +@smallexample +_Imaginary_I * _Imaginary_I = -1 +@end smallexample + +@noindent +The result is not of type @code{float imaginary} but instead @code{float}. +One can use it to easily construct complex number like in + +@smallexample +3.0 - _Imaginary_I * 4.0 +@end smallexample + +@noindent +which results in the complex number with a real part of 3.0 and a +imaginary part -4.0. +@end deftypevr +@end ignore + +@noindent +@code{_Complex_I} is a bit of a mouthful. @file{complex.h} also defines +a shorter name for the same constant. + +@deftypevr Macro {const float complex} I +This macro has exactly the same value as @code{_Complex_I}. Most of the +time it is preferable. However, it causes problems if you want to use +the identifier @code{I} for something else. 
You can safely write + +@smallexample +#include <complex.h> +#undef I +@end smallexample + +@noindent +if you need @code{I} for your own purposes. (In that case we recommend +you also define some other short name for @code{_Complex_I}, such as +@code{J}.) + +@ignore +If the implementation does not support the @code{imaginary} types +@code{I} is defined as @code{_Complex_I} which is the second best +solution. It still can be used in the same way but requires a most +clever compiler to get the same results. +@end ignore +@end deftypevr + +@node Operations on Complex +@section Projections, Conjugates, and Decomposing of Complex Numbers +@cindex project complex numbers +@cindex conjugate complex numbers +@cindex decompose complex numbers +@pindex complex.h + +@w{ISO C99} also defines functions that perform basic operations on +complex numbers, such as decomposition and conjugation. The prototypes +for all these functions are in @file{complex.h}. All functions are +available in three variants, one for each of the three complex types. + +@comment complex.h +@comment ISO +@deftypefun double creal (complex double @var{z}) +@comment complex.h +@comment ISO +@deftypefunx float crealf (complex float @var{z}) +@comment complex.h +@comment ISO +@deftypefunx {long double} creall (complex long double @var{z}) +@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} +These functions return the real part of the complex number @var{z}. +@end deftypefun + +@comment complex.h +@comment ISO +@deftypefun double cimag (complex double @var{z}) +@comment complex.h +@comment ISO +@deftypefunx float cimagf (complex float @var{z}) +@comment complex.h +@comment ISO +@deftypefunx {long double} cimagl (complex long double @var{z}) +@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} +These functions return the imaginary part of the complex number @var{z}. 
+@end deftypefun + +@comment complex.h +@comment ISO +@deftypefun {complex double} conj (complex double @var{z}) +@comment complex.h +@comment ISO +@deftypefunx {complex float} conjf (complex float @var{z}) +@comment complex.h +@comment ISO +@deftypefunx {complex long double} conjl (complex long double @var{z}) +@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} +These functions return the conjugate value of the complex number +@var{z}. The conjugate of a complex number has the same real part and a +negated imaginary part. In other words, @samp{conj(a + bi) = a + -bi}. +@end deftypefun + +@comment complex.h +@comment ISO +@deftypefun double carg (complex double @var{z}) +@comment complex.h +@comment ISO +@deftypefunx float cargf (complex float @var{z}) +@comment complex.h +@comment ISO +@deftypefunx {long double} cargl (complex long double @var{z}) +@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} +These functions return the argument of the complex number @var{z}. +The argument of a complex number is the angle in the complex plane +between the positive real axis and a line passing through zero and the +number. This angle is measured in the usual fashion and ranges from +@math{-@pi{}} to @math{@pi{}}. + +@code{carg} has a branch cut along the negative real axis. +@end deftypefun + +@comment complex.h +@comment ISO +@deftypefun {complex double} cproj (complex double @var{z}) +@comment complex.h +@comment ISO +@deftypefunx {complex float} cprojf (complex float @var{z}) +@comment complex.h +@comment ISO +@deftypefunx {complex long double} cprojl (complex long double @var{z}) +@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} +These functions return the projection of the complex value @var{z} onto +the Riemann sphere. Values with an infinite imaginary part are projected +to positive infinity on the real axis, even if the real part is NaN. 
If +the real part is infinite, the result is equivalent to + +@smallexample +INFINITY + I * copysign (0.0, cimag (z)) +@end smallexample +@end deftypefun + +@node Parsing of Numbers +@section Parsing of Numbers +@cindex parsing numbers (in formatted input) +@cindex converting strings to numbers +@cindex number syntax, parsing +@cindex syntax, for reading numbers + +This section describes functions for ``reading'' integer and +floating-point numbers from a string. It may be more convenient in some +cases to use @code{sscanf} or one of the related functions; see +@ref{Formatted Input}. But often you can make a program more robust by +finding the tokens in the string by hand, then converting the numbers +one by one. + +@menu +* Parsing of Integers:: Functions for conversion of integer values. +* Parsing of Floats:: Functions for conversion of floating-point + values. +@end menu + +@node Parsing of Integers +@subsection Parsing of Integers + +@pindex stdlib.h +@pindex wchar.h +The @samp{str} functions are declared in @file{stdlib.h} and those +beginning with @samp{wcs} are declared in @file{wchar.h}. One might +wonder about the use of @code{restrict} in the prototypes of the +functions in this section. It is seemingly useless but the @w{ISO C} +standard uses it (for the functions defined there) so we have to do it +as well. + +@comment stdlib.h +@comment ISO +@deftypefun {long int} strtol (const char *restrict @var{string}, char **restrict @var{tailptr}, int @var{base}) +@safety{@prelim{}@mtsafe{@mtslocale{}}@assafe{}@acsafe{}} +@c strtol uses the thread-local pointer to the locale in effect, and +@c strtol_l loads the LC_NUMERIC locale data from it early on and once, +@c but if the locale is the global locale, and another thread calls +@c setlocale in a way that modifies the pointer to the LC_CTYPE locale +@c category, the behavior of e.g. 
IS*, TOUPPER will vary throughout the +@c execution of the function, because they re-read the locale data from +@c the given locale pointer. We solved this by documenting setlocale as +@c MT-Unsafe. +The @code{strtol} (``string-to-long'') function converts the initial +part of @var{string} to a signed integer, which is returned as a value +of type @code{long int}. + +This function attempts to decompose @var{string} as follows: + +@itemize @bullet +@item +A (possibly empty) sequence of whitespace characters. Which characters +are whitespace is determined by the @code{isspace} function +(@pxref{Classification of Characters}). These are discarded. + +@item +An optional plus or minus sign (@samp{+} or @samp{-}). + +@item +A nonempty sequence of digits in the radix specified by @var{base}. + +If @var{base} is zero, decimal radix is assumed unless the series of +digits begins with @samp{0} (specifying octal radix), or @samp{0x} or +@samp{0X} (specifying hexadecimal radix); in other words, the same +syntax used for integer constants in C. + +Otherwise @var{base} must have a value between @code{2} and @code{36}. +If @var{base} is @code{16}, the digits may optionally be preceded by +@samp{0x} or @samp{0X}. If base has no legal value the value returned +is @code{0l} and the global variable @code{errno} is set to @code{EINVAL}. + +@item +Any remaining characters in the string. If @var{tailptr} is not a null +pointer, @code{strtol} stores a pointer to this tail in +@code{*@var{tailptr}}. +@end itemize + +If the string is empty, contains only whitespace, or does not contain an +initial substring that has the expected syntax for an integer in the +specified @var{base}, no conversion is performed. In this case, +@code{strtol} returns a value of zero and the value stored in +@code{*@var{tailptr}} is the value of @var{string}. + +In a locale other than the standard @code{"C"} locale, this function +may recognize additional implementation-dependent syntax. 
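The decomposition above can be exercised with a small wrapper that accepts a string only if the whole of it is a number. This is a sketch rather than a definitive implementation; the name @code{parse_long} and its return convention are hypothetical.

```c
#include <errno.h>
#include <stdlib.h>

/* Parse all of s as an integer in the given base.  Return 1 and
   store the result in *out on success; return 0 otherwise.  */
int
parse_long (const char *s, int base, long *out)
{
  /* Initialized in case strtol rejects the base before storing
     anything through the tail pointer.  */
  char *tail = (char *) s;
  long value;

  errno = 0;
  value = strtol (s, &tail, base);
  if (tail == s)          /* no digits were converted */
    return 0;
  if (*tail != '\0')      /* unexpected characters after the number */
    return 0;
  if (errno != 0)         /* ERANGE on overflow, EINVAL for a bad base */
    return 0;
  *out = value;
  return 1;
}
```

With a @var{base} of zero, @code{parse_long ("  -0x1F", 0, &v)} succeeds and stores @code{-31}, whereas @code{parse_long ("12abc", 10, &v)} fails because of the trailing characters.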
+ +If the string has valid syntax for an integer but the value is not +representable because of overflow, @code{strtol} returns either +@code{LONG_MAX} or @code{LONG_MIN} (@pxref{Range of Type}), as +appropriate for the sign of the value. It also sets @code{errno} +to @code{ERANGE} to indicate there was overflow. + +You should not check for errors by examining the return value of +@code{strtol}, because the string might be a valid representation of +@code{0l}, @code{LONG_MAX}, or @code{LONG_MIN}. Instead, check whether +@code{*@var{tailptr}} points to what you expect after the number +(e.g. @code{'\0'} if the string should end after the number). You also +need to clear @code{errno} before the call and check it afterward, in +case there was overflow. + +There is an example at the end of this section. +@end deftypefun + +@comment wchar.h +@comment ISO +@deftypefun {long int} wcstol (const wchar_t *restrict @var{string}, wchar_t **restrict @var{tailptr}, int @var{base}) +@safety{@prelim{}@mtsafe{@mtslocale{}}@assafe{}@acsafe{}} +The @code{wcstol} function is equivalent to the @code{strtol} function +in nearly all aspects but handles wide character strings. + +The @code{wcstol} function was introduced in @w{Amendment 1} of @w{ISO C90}. +@end deftypefun + +@comment stdlib.h +@comment ISO +@deftypefun {unsigned long int} strtoul (const char *restrict @var{string}, char **restrict @var{tailptr}, int @var{base}) +@safety{@prelim{}@mtsafe{@mtslocale{}}@assafe{}@acsafe{}} +The @code{strtoul} (``string-to-unsigned-long'') function is like +@code{strtol} except it converts to an @code{unsigned long int} value. +The syntax is the same as described above for @code{strtol}. The value +returned on overflow is @code{ULONG_MAX} (@pxref{Range of Type}). + +If @var{string} depicts a negative number, @code{strtoul} acts the same +as @code{strtol} but casts the result to an unsigned integer.
That means, +for example, that @code{strtoul} on @code{"-1"} returns @code{ULONG_MAX} +and an input more negative than @code{LONG_MIN} returns +(@code{ULONG_MAX} + 1) / 2. + +@code{strtoul} sets @code{errno} to @code{EINVAL} if @var{base} is out of +range, or @code{ERANGE} on overflow. +@end deftypefun + +@comment wchar.h +@comment ISO +@deftypefun {unsigned long int} wcstoul (const wchar_t *restrict @var{string}, wchar_t **restrict @var{tailptr}, int @var{base}) +@safety{@prelim{}@mtsafe{@mtslocale{}}@assafe{}@acsafe{}} +The @code{wcstoul} function is equivalent to the @code{strtoul} function +in nearly all aspects but handles wide character strings. + +The @code{wcstoul} function was introduced in @w{Amendment 1} of @w{ISO C90}. +@end deftypefun + +@comment stdlib.h +@comment ISO +@deftypefun {long long int} strtoll (const char *restrict @var{string}, char **restrict @var{tailptr}, int @var{base}) +@safety{@prelim{}@mtsafe{@mtslocale{}}@assafe{}@acsafe{}} +The @code{strtoll} function is like @code{strtol} except that it returns +a @code{long long int} value, and accepts numbers with a correspondingly +larger range. + +If the string has valid syntax for an integer but the value is not +representable because of overflow, @code{strtoll} returns either +@code{LLONG_MAX} or @code{LLONG_MIN} (@pxref{Range of Type}), as +appropriate for the sign of the value. It also sets @code{errno} to +@code{ERANGE} to indicate there was overflow. + +The @code{strtoll} function was introduced in @w{ISO C99}. +@end deftypefun + +@comment wchar.h +@comment ISO +@deftypefun {long long int} wcstoll (const wchar_t *restrict @var{string}, wchar_t **restrict @var{tailptr}, int @var{base}) +@safety{@prelim{}@mtsafe{@mtslocale{}}@assafe{}@acsafe{}} +The @code{wcstoll} function is equivalent to the @code{strtoll} function +in nearly all aspects but handles wide character strings. + +The @code{wcstoll} function was introduced in @w{ISO C99}.
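The wide-character variants are used in the same way as their narrow counterparts. The following is a minimal sketch, assuming the caller wants the whole string to be one decimal number; the helper name `parse_wide_ll` and its return codes are hypothetical and chosen only for this illustration.

```c
#include <errno.h>
#include <wchar.h>

/* Hypothetical helper (not a library function): parse all of TEXT as a
   decimal long long.  Returns 0 and stores the value in *OUT on
   success, -1 if nothing was converted or extra characters follow the
   number, and -2 if the value is out of range.  */
static int
parse_wide_ll (const wchar_t *text, long long *out)
{
  wchar_t *tail;

  errno = 0;                    /* Clear errno so ERANGE can be detected.  */
  long long value = wcstoll (text, &tail, 10);

  if (tail == text || *tail != L'\0')
    return -1;
  if (errno == ERANGE)
    return -2;

  *out = value;
  return 0;
}
```

The return value alone cannot signal overflow, because `LLONG_MAX` and `LLONG_MIN` are also valid results; that is why the sketch checks `errno` rather than comparing the value.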
+@end deftypefun + +@comment stdlib.h +@comment BSD +@deftypefun {long long int} strtoq (const char *restrict @var{string}, char **restrict @var{tailptr}, int @var{base}) +@safety{@prelim{}@mtsafe{@mtslocale{}}@assafe{}@acsafe{}} +@code{strtoq} (``string-to-quad-word'') is the BSD name for @code{strtoll}. +@end deftypefun + +@comment wchar.h +@comment GNU +@deftypefun {long long int} wcstoq (const wchar_t *restrict @var{string}, wchar_t **restrict @var{tailptr}, int @var{base}) +@safety{@prelim{}@mtsafe{@mtslocale{}}@assafe{}@acsafe{}} +The @code{wcstoq} function is equivalent to the @code{strtoq} function +in nearly all aspects but handles wide character strings. + +The @code{wcstoq} function is a GNU extension. +@end deftypefun + +@comment stdlib.h +@comment ISO +@deftypefun {unsigned long long int} strtoull (const char *restrict @var{string}, char **restrict @var{tailptr}, int @var{base}) +@safety{@prelim{}@mtsafe{@mtslocale{}}@assafe{}@acsafe{}} +The @code{strtoull} function is related to @code{strtoll} the same way +@code{strtoul} is related to @code{strtol}. + +The @code{strtoull} function was introduced in @w{ISO C99}. +@end deftypefun + +@comment wchar.h +@comment ISO +@deftypefun {unsigned long long int} wcstoull (const wchar_t *restrict @var{string}, wchar_t **restrict @var{tailptr}, int @var{base}) +@safety{@prelim{}@mtsafe{@mtslocale{}}@assafe{}@acsafe{}} +The @code{wcstoull} function is equivalent to the @code{strtoull} function +in nearly all aspects but handles wide character strings. + +The @code{wcstoull} function was introduced in @w{ISO C99}. +@end deftypefun + +@comment stdlib.h +@comment BSD +@deftypefun {unsigned long long int} strtouq (const char *restrict @var{string}, char **restrict @var{tailptr}, int @var{base}) +@safety{@prelim{}@mtsafe{@mtslocale{}}@assafe{}@acsafe{}} +@code{strtouq} is the BSD name for @code{strtoull}.
+@end deftypefun + +@comment wchar.h +@comment GNU +@deftypefun {unsigned long long int} wcstouq (const wchar_t *restrict @var{string}, wchar_t **restrict @var{tailptr}, int @var{base}) +@safety{@prelim{}@mtsafe{@mtslocale{}}@assafe{}@acsafe{}} +The @code{wcstouq} function is equivalent to the @code{strtouq} function +in nearly all aspects but handles wide character strings. + +The @code{wcstouq} function is a GNU extension. +@end deftypefun + +@comment inttypes.h +@comment ISO +@deftypefun intmax_t strtoimax (const char *restrict @var{string}, char **restrict @var{tailptr}, int @var{base}) +@safety{@prelim{}@mtsafe{@mtslocale{}}@assafe{}@acsafe{}} +The @code{strtoimax} function is like @code{strtol} except that it returns +an @code{intmax_t} value, and accepts numbers of a corresponding range. + +If the string has valid syntax for an integer but the value is not +representable because of overflow, @code{strtoimax} returns either +@code{INTMAX_MAX} or @code{INTMAX_MIN} (@pxref{Integers}), as +appropriate for the sign of the value. It also sets @code{errno} to +@code{ERANGE} to indicate there was overflow. + +See @ref{Integers} for a description of the @code{intmax_t} type. The +@code{strtoimax} function was introduced in @w{ISO C99}. +@end deftypefun + +@comment wchar.h +@comment ISO +@deftypefun intmax_t wcstoimax (const wchar_t *restrict @var{string}, wchar_t **restrict @var{tailptr}, int @var{base}) +@safety{@prelim{}@mtsafe{@mtslocale{}}@assafe{}@acsafe{}} +The @code{wcstoimax} function is equivalent to the @code{strtoimax} function +in nearly all aspects but handles wide character strings. + +The @code{wcstoimax} function was introduced in @w{ISO C99}.
+@end deftypefun + +@comment inttypes.h +@comment ISO +@deftypefun uintmax_t strtoumax (const char *restrict @var{string}, char **restrict @var{tailptr}, int @var{base}) +@safety{@prelim{}@mtsafe{@mtslocale{}}@assafe{}@acsafe{}} +The @code{strtoumax} function is related to @code{strtoimax} +the same way that @code{strtoul} is related to @code{strtol}. + +See @ref{Integers} for a description of the @code{uintmax_t} type. The +@code{strtoumax} function was introduced in @w{ISO C99}. +@end deftypefun + +@comment wchar.h +@comment ISO +@deftypefun uintmax_t wcstoumax (const wchar_t *restrict @var{string}, wchar_t **restrict @var{tailptr}, int @var{base}) +@safety{@prelim{}@mtsafe{@mtslocale{}}@assafe{}@acsafe{}} +The @code{wcstoumax} function is equivalent to the @code{strtoumax} function +in nearly all aspects but handles wide character strings. + +The @code{wcstoumax} function was introduced in @w{ISO C99}. +@end deftypefun + +@comment stdlib.h +@comment ISO +@deftypefun {long int} atol (const char *@var{string}) +@safety{@prelim{}@mtsafe{@mtslocale{}}@assafe{}@acsafe{}} +This function is similar to the @code{strtol} function with a @var{base} +argument of @code{10}, except that it need not detect overflow errors. +The @code{atol} function is provided mostly for compatibility with +existing code; using @code{strtol} is more robust. +@end deftypefun + +@comment stdlib.h +@comment ISO +@deftypefun int atoi (const char *@var{string}) +@safety{@prelim{}@mtsafe{@mtslocale{}}@assafe{}@acsafe{}} +This function is like @code{atol}, except that it returns an @code{int}. +The @code{atoi} function is also considered obsolete; use @code{strtol} +instead. +@end deftypefun + +@comment stdlib.h +@comment ISO +@deftypefun {long long int} atoll (const char *@var{string}) +@safety{@prelim{}@mtsafe{@mtslocale{}}@assafe{}@acsafe{}} +This function is similar to @code{atol}, except it returns a @code{long +long int}. + +The @code{atoll} function was introduced in @w{ISO C99}.
It too is +obsolete (despite having just been added); use @code{strtoll} instead. +@end deftypefun + +None of the functions mentioned so far in this section handle +alternative representations of characters as described in the locale +data. Some locales specify a thousands separator and the way it is to +be used, which can help to make large numbers more readable. To read +such numbers one has to use the @code{scanf} functions with the @samp{'} +flag. + +Here is a function which parses a string as a sequence of integers and +returns the sum of them: + +@smallexample +int +sum_ints_from_string (char *string) +@{ + int sum = 0; + + while (1) @{ + char *tail; + int next; + + /* @r{Skip whitespace by hand, to detect the end.} */ + while (isspace ((unsigned char) *string)) string++; + if (*string == 0) + break; + + /* @r{There is more nonwhitespace,} */ + /* @r{so it ought to be another number.} */ + errno = 0; + /* @r{Parse it.} */ + next = strtol (string, &tail, 0); + /* @r{Stop if nothing could be converted, to avoid looping forever.} */ + if (tail == string) + break; + /* @r{Add it in, if not overflow.} */ + if (errno) + printf ("Overflow\n"); + else + sum += next; + /* @r{Advance past it.} */ + string = tail; + @} + + return sum; +@} +@end smallexample + +@node Parsing of Floats +@subsection Parsing of Floats + +@pindex stdlib.h +The @samp{str} functions are declared in @file{stdlib.h} and those +beginning with @samp{wcs} are declared in @file{wchar.h}. One might +wonder about the use of @code{restrict} in the prototypes of the +functions in this section. It is seemingly useless but the @w{ISO C} +standard uses it (for the functions defined there) so we have to do it +as well. + +@comment stdlib.h +@comment ISO +@deftypefun double strtod (const char *restrict @var{string}, char **restrict @var{tailptr}) +@safety{@prelim{}@mtsafe{@mtslocale{}}@assafe{}@acsafe{}} +@c Besides the unsafe-but-ruled-safe locale uses, this uses a lot of +@c mpn, but it's all safe.
+@c +@c round_and_return +@c get_rounding_mode ok +@c mpn_add_1 ok +@c mpn_rshift ok +@c MPN_ZERO ok +@c MPN2FLOAT -> mpn_construct_(float|double|long_double) ok +@c str_to_mpn +@c mpn_mul_1 -> umul_ppmm ok +@c mpn_add_1 ok +@c mpn_lshift_1 -> mpn_lshift ok +@c STRTOF_INTERNAL +@c MPN_VAR ok +@c SET_MANTISSA ok +@c STRNCASECMP ok, wide and narrow +@c round_and_return ok +@c mpn_mul ok +@c mpn_addmul_1 ok +@c ... mpn_sub +@c mpn_lshift ok +@c udiv_qrnnd ok +@c count_leading_zeros ok +@c add_ssaaaa ok +@c sub_ddmmss ok +@c umul_ppmm ok +@c mpn_submul_1 ok +The @code{strtod} (``string-to-double'') function converts the initial +part of @var{string} to a floating-point number, which is returned as a +value of type @code{double}. + +This function attempts to decompose @var{string} as follows: + +@itemize @bullet +@item +A (possibly empty) sequence of whitespace characters. Which characters +are whitespace is determined by the @code{isspace} function +(@pxref{Classification of Characters}). These are discarded. + +@item +An optional plus or minus sign (@samp{+} or @samp{-}). + +@item A floating point number in decimal or hexadecimal format. The +decimal format is: +@itemize @minus + +@item +A nonempty sequence of digits optionally containing a decimal-point +character---normally @samp{.}, but it depends on the locale +(@pxref{General Numeric}). + +@item +An optional exponent part, consisting of a character @samp{e} or +@samp{E}, an optional sign, and a sequence of digits. + +@end itemize + +The hexadecimal format is as follows: +@itemize @minus + +@item +A 0x or 0X followed by a nonempty sequence of hexadecimal digits +optionally containing a decimal-point character---normally @samp{.}, but +it depends on the locale (@pxref{General Numeric}). + +@item +An optional binary-exponent part, consisting of a character @samp{p} or +@samp{P}, an optional sign, and a sequence of digits. + +@end itemize + +@item +Any remaining characters in the string. 
If @var{tailptr} is not a null +pointer, a pointer to this tail of the string is stored in +@code{*@var{tailptr}}. +@end itemize + +If the string is empty, contains only whitespace, or does not contain an +initial substring that has the expected syntax for a floating-point +number, no conversion is performed. In this case, @code{strtod} returns +a value of zero and the value returned in @code{*@var{tailptr}} is the +value of @var{string}. + +In a locale other than the standard @code{"C"} or @code{"POSIX"} locales, +this function may recognize additional locale-dependent syntax. + +If the string has valid syntax for a floating-point number but the value +is outside the range of a @code{double}, @code{strtod} will signal +overflow or underflow as described in @ref{Math Error Reporting}. + +@code{strtod} recognizes four special input strings. The strings +@code{"inf"} and @code{"infinity"} are converted to @math{@infinity{}}, +or to the largest representable value if the floating-point format +doesn't support infinities. You can prepend a @code{"+"} or @code{"-"} +to specify the sign. Case is ignored when scanning these strings. + +The strings @code{"nan"} and @code{"nan(@var{chars@dots{}})"} are converted +to NaN. Again, case is ignored. If @var{chars@dots{}} are provided, they +are used in some unspecified fashion to select a particular +representation of NaN (there can be several). + +Since zero is a valid result as well as the value returned on error, you +should check for errors in the same way as for @code{strtol}, by +examining @var{errno} and @var{tailptr}. 
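The error check described above might look like the following sketch. The helper name `parse_double` and its return codes are hypothetical, and the sketch assumes the whole string must be one number. Overflow and underflow are both reported through `ERANGE` here and are not distinguished.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical helper (not a library function): parse all of TEXT as a
   double.  Returns 0 and stores the value in *OUT on success, -1 if
   nothing was converted or extra characters follow the number, and -2
   if strtod reported a range error.  */
static int
parse_double (const char *text, double *out)
{
  char *tail;

  errno = 0;                    /* Clear errno before the call.  */
  double value = strtod (text, &tail);

  if (tail == text || *tail != '\0')
    return -1;
  if (errno == ERANGE)
    return -2;

  *out = value;
  return 0;
}
```

Both the decimal form `"1.5e2"` and the hexadecimal form `"0x1p4"` are accepted by this sketch, as are the special strings such as `"-inf"`.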
+@end deftypefun + +@comment stdlib.h +@comment ISO +@deftypefun float strtof (const char *@var{string}, char **@var{tailptr}) +@comment stdlib.h +@comment ISO +@deftypefunx {long double} strtold (const char *@var{string}, char **@var{tailptr}) +@safety{@prelim{}@mtsafe{@mtslocale{}}@assafe{}@acsafe{}} +These functions are analogous to @code{strtod}, but return @code{float} +and @code{long double} values respectively. They report errors in the +same way as @code{strtod}. @code{strtof} can be substantially faster +than @code{strtod}, but has less precision; conversely, @code{strtold} +can be much slower but has more precision (on systems where @code{long +double} is a separate type). + +These functions were originally GNU extensions and were added to the +standard in @w{ISO C99}. +@end deftypefun + +@comment wchar.h +@comment ISO +@deftypefun double wcstod (const wchar_t *restrict @var{string}, wchar_t **restrict @var{tailptr}) +@comment wchar.h +@comment ISO +@deftypefunx float wcstof (const wchar_t *@var{string}, wchar_t **@var{tailptr}) +@comment wchar.h +@comment ISO +@deftypefunx {long double} wcstold (const wchar_t *@var{string}, wchar_t **@var{tailptr}) +@safety{@prelim{}@mtsafe{@mtslocale{}}@assafe{}@acsafe{}} +The @code{wcstod}, @code{wcstof}, and @code{wcstold} functions are +equivalent in nearly all aspects to the @code{strtod}, @code{strtof}, and +@code{strtold} functions but they handle wide character strings. + +The @code{wcstod} function was introduced in @w{Amendment 1} of @w{ISO +C90}. The @code{wcstof} and @code{wcstold} functions were introduced in +@w{ISO C99}. +@end deftypefun + +@comment stdlib.h +@comment ISO +@deftypefun double atof (const char *@var{string}) +@safety{@prelim{}@mtsafe{@mtslocale{}}@assafe{}@acsafe{}} +This function is similar to the @code{strtod} function, except that it +need not detect overflow and underflow errors. The @code{atof} function +is provided mostly for compatibility with existing code; using +@code{strtod} is more robust.
+@end deftypefun + +@Theglibc{} also provides @samp{_l} versions of these functions, +which take an additional argument, the locale to use in conversion. + +See also @ref{Parsing of Integers}. + +@node Printing of Floats +@section Printing of Floats + +@pindex stdlib.h +The @samp{strfrom} functions are declared in @file{stdlib.h}. + +@comment stdlib.h +@comment ISO/IEC TS 18661-1 +@deftypefun int strfromd (char *restrict @var{string}, size_t @var{size}, const char *restrict @var{format}, double @var{value}) +@deftypefunx int strfromf (char *restrict @var{string}, size_t @var{size}, const char *restrict @var{format}, float @var{value}) +@deftypefunx int strfroml (char *restrict @var{string}, size_t @var{size}, const char *restrict @var{format}, long double @var{value}) +@safety{@prelim{}@mtsafe{@mtslocale{}}@asunsafe{@ascuheap{}}@acunsafe{@acsmem{}}} +@comment these functions depend on __printf_fp and __printf_fphex, which are +@comment AS-unsafe (ascuheap) and AC-unsafe (acsmem). +The functions @code{strfromd} (``string-from-double''), @code{strfromf} +(``string-from-float''), and @code{strfroml} (``string-from-long-double'') +convert the floating-point number @var{value} to a string of characters and +store it in the area pointed to by @var{string}. The conversion +writes at most @var{size} characters and respects the format specified by +@var{format}. + +The format string must start with the character @samp{%}. An optional +precision follows, which starts with a period, @samp{.}, and may be +followed by a decimal integer, representing the precision. If a decimal +integer is not specified after the period, the precision is taken to be +zero. The character @samp{*} is not allowed. Finally, the format string +ends with one of the following conversion specifiers: @samp{a}, @samp{A}, +@samp{e}, @samp{E}, @samp{f}, @samp{F}, @samp{g} or @samp{G} (@pxref{Table +of Output Conversions}). Invalid format strings result in undefined +behavior.
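As an illustration of the format rules above, here is a minimal sketch that formats a value with the fixed format `"%.2f"`. The helper name `format_fixed2` is hypothetical. The sketch assumes a C library that provides `strfromd` and that the declaration is enabled by defining `__STDC_WANT_IEC_60559_BFP_EXT__` before including @file{stdlib.h}; other systems may need a different feature macro.

```c
/* Assumption: this macro makes <stdlib.h> declare strfromd.  */
#define __STDC_WANT_IEC_60559_BFP_EXT__ 1
#include <stdlib.h>

/* Hypothetical helper (not a library function): format VALUE with two
   digits after the decimal point into BUF, which holds SIZE bytes.
   Returns 1 if the complete output fit and 0 if it was truncated.  */
static int
format_fixed2 (char *buf, size_t size, double value)
{
  /* The format is '%', then a precision, then one conversion
     specifier; no flags, field width, or '*' are permitted.  */
  int needed = strfromd (buf, size, "%.2f", value);

  /* NEEDED counts the characters of the full result, excluding the
     terminating null character.  */
  return needed >= 0 && (size_t) needed < size;
}
```

Comparing the result against the buffer size is the same check one would make after a call to `snprintf`.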
+ +These functions return the number of characters that would have been +written to @var{string} had @var{size} been sufficiently large, not +counting the terminating null character. Thus, the null-terminated output +has been completely written if and only if the returned value is less than +@var{size}. + +These functions were introduced by ISO/IEC TS 18661-1. +@end deftypefun + +@node System V Number Conversion +@section Old-fashioned System V number-to-string functions + +The old @w{System V} C library provided three functions to convert +numbers to strings, with unusual and hard-to-use semantics. @Theglibc{} +also provides these functions and some natural extensions. + +These functions are only available in @theglibc{} and on systems descended +from AT&T Unix. Therefore, unless these functions do precisely what you +need, it is better to use @code{sprintf}, which is standard. + +All these functions are defined in @file{stdlib.h}. + +@comment stdlib.h +@comment SVID, Unix98 +@deftypefun {char *} ecvt (double @var{value}, int @var{ndigit}, int *@var{decpt}, int *@var{neg}) +@safety{@prelim{}@mtunsafe{@mtasurace{:ecvt}}@asunsafe{}@acsafe{}} +The function @code{ecvt} converts the floating-point number @var{value} +to a string with at most @var{ndigit} decimal digits. The +returned string contains no decimal point or sign. The first digit of +the string is non-zero (unless @var{value} is actually zero) and the +last digit is rounded to nearest. @code{*@var{decpt}} is set to the +index in the string of the first digit after the decimal point. +@code{*@var{neg}} is set to a nonzero value if @var{value} is negative, +zero otherwise. + +If @var{ndigit} decimal digits would exceed the precision of a +@code{double} it is reduced to a system-specific value. + +The returned string is statically allocated and overwritten by each call +to @code{ecvt}. + +If @var{value} is zero, it is implementation defined whether +@code{*@var{decpt}} is @code{0} or @code{1}. 
+ +For example: @code{ecvt (12.3, 5, &d, &n)} returns @code{"12300"} +and sets @var{d} to @code{2} and @var{n} to @code{0}. +@end deftypefun + +@comment stdlib.h +@comment SVID, Unix98 +@deftypefun {char *} fcvt (double @var{value}, int @var{ndigit}, int *@var{decpt}, int *@var{neg}) +@safety{@prelim{}@mtunsafe{@mtasurace{:fcvt}}@asunsafe{@ascuheap{}}@acunsafe{@acsmem{}}} +The function @code{fcvt} is like @code{ecvt}, but @var{ndigit} specifies +the number of digits after the decimal point. If @var{ndigit} is less +than zero, @var{value} is rounded to the @math{@var{ndigit}+1}'th place to the +left of the decimal point. For example, if @var{ndigit} is @code{-1}, +@var{value} will be rounded to the nearest 10. If @var{ndigit} is +negative and larger than the number of digits to the left of the decimal +point in @var{value}, @var{value} will be rounded to one significant digit. + +If @var{ndigit} decimal digits would exceed the precision of a +@code{double} it is reduced to a system-specific value. + +The returned string is statically allocated and overwritten by each call +to @code{fcvt}. +@end deftypefun + +@comment stdlib.h +@comment SVID, Unix98 +@deftypefun {char *} gcvt (double @var{value}, int @var{ndigit}, char *@var{buf}) +@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} +@c gcvt calls sprintf, that ultimately calls vfprintf, which malloc()s +@c args_value if it's too large, but gcvt never exercises this path. +@code{gcvt} is functionally equivalent to @samp{sprintf(buf, "%.*g", +ndigit, value)}. It is provided only for compatibility's sake. It +returns @var{buf}. + +If @var{ndigit} decimal digits would exceed the precision of a +@code{double} it is reduced to a system-specific value. +@end deftypefun + +As extensions, @theglibc{} provides versions of these three +functions that take @code{long double} arguments.
+ +@comment stdlib.h +@comment GNU +@deftypefun {char *} qecvt (long double @var{value}, int @var{ndigit}, int *@var{decpt}, int *@var{neg}) +@safety{@prelim{}@mtunsafe{@mtasurace{:qecvt}}@asunsafe{}@acsafe{}} +This function is equivalent to @code{ecvt} except that it takes a +@code{long double} for the first parameter and that @var{ndigit} is +restricted by the precision of a @code{long double}. +@end deftypefun + +@comment stdlib.h +@comment GNU +@deftypefun {char *} qfcvt (long double @var{value}, int @var{ndigit}, int *@var{decpt}, int *@var{neg}) +@safety{@prelim{}@mtunsafe{@mtasurace{:qfcvt}}@asunsafe{@ascuheap{}}@acunsafe{@acsmem{}}} +This function is equivalent to @code{fcvt} except that it +takes a @code{long double} for the first parameter and that @var{ndigit} is +restricted by the precision of a @code{long double}. +@end deftypefun + +@comment stdlib.h +@comment GNU +@deftypefun {char *} qgcvt (long double @var{value}, int @var{ndigit}, char *@var{buf}) +@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} +This function is equivalent to @code{gcvt} except that it takes a +@code{long double} for the first parameter and that @var{ndigit} is +restricted by the precision of a @code{long double}. +@end deftypefun + + +@cindex gcvt_r +The @code{ecvt} and @code{fcvt} functions, and their @code{long double} +equivalents, all return a string located in a static buffer which is +overwritten by the next call to the function. @Theglibc{} +provides another set of extended functions which write the converted +string into a user-supplied buffer. These have the conventional +@code{_r} suffix. + +@code{gcvt_r} is not necessary, because @code{gcvt} already uses a +user-supplied buffer. 
+ +@comment stdlib.h +@comment GNU +@deftypefun int ecvt_r (double @var{value}, int @var{ndigit}, int *@var{decpt}, int *@var{neg}, char *@var{buf}, size_t @var{len}) +@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} +The @code{ecvt_r} function is the same as @code{ecvt}, except +that it places its result into the user-specified buffer pointed to by +@var{buf}, with length @var{len}. The return value is @code{-1} in +case of an error and zero otherwise. + +This function is a GNU extension. +@end deftypefun + +@comment stdlib.h +@comment SVID, Unix98 +@deftypefun int fcvt_r (double @var{value}, int @var{ndigit}, int *@var{decpt}, int *@var{neg}, char *@var{buf}, size_t @var{len}) +@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} +The @code{fcvt_r} function is the same as @code{fcvt}, except that it +places its result into the user-specified buffer pointed to by +@var{buf}, with length @var{len}. The return value is @code{-1} in +case of an error and zero otherwise. + +This function is a GNU extension. +@end deftypefun + +@comment stdlib.h +@comment GNU +@deftypefun int qecvt_r (long double @var{value}, int @var{ndigit}, int *@var{decpt}, int *@var{neg}, char *@var{buf}, size_t @var{len}) +@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} +The @code{qecvt_r} function is the same as @code{qecvt}, except +that it places its result into the user-specified buffer pointed to by +@var{buf}, with length @var{len}. The return value is @code{-1} in +case of an error and zero otherwise. + +This function is a GNU extension. +@end deftypefun + +@comment stdlib.h +@comment GNU +@deftypefun int qfcvt_r (long double @var{value}, int @var{ndigit}, int *@var{decpt}, int *@var{neg}, char *@var{buf}, size_t @var{len}) +@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} +The @code{qfcvt_r} function is the same as @code{qfcvt}, except +that it places its result into the user-specified buffer pointed to by +@var{buf}, with length @var{len}. 
The return value is @code{-1} in +case of an error and zero otherwise. + +This function is a GNU extension. +@end deftypefun