Diffstat (limited to 'manual/=limits.texinfo')
-rw-r--r-- | manual/=limits.texinfo | 593 |
1 files changed, 0 insertions, 593 deletions
diff --git a/manual/=limits.texinfo b/manual/=limits.texinfo deleted file mode 100644 index 7b55d70465..0000000000 --- a/manual/=limits.texinfo +++ /dev/null @@ -1,593 +0,0 @@ -@node Representation Limits, System Configuration Limits, System Information, Top -@chapter Representation Limits - -This chapter contains information about constants and parameters that -characterize the representation of the various integer and -floating-point types supported by the GNU C library. - -@menu -* Integer Representation Limits:: Determining maximum and minimum - representation values of - various integer subtypes. -* Floating-Point Limits :: Parameters which characterize - supported floating-point - representations on a particular - system. -@end menu - -@node Integer Representation Limits, Floating-Point Limits , , Representation Limits -@section Integer Representation Limits -@cindex integer representation limits -@cindex representation limits, integer -@cindex limits, integer representation - -Sometimes it is necessary for programs to know about the internal -representation of various integer subtypes. For example, if you want -your program to be careful not to overflow an @code{int} counter -variable, you need to know what the largest representable value that -fits in an @code{int} is. These kinds of parameters can vary from -compiler to compiler and machine to machine. Another typical use of -this kind of parameter is in conditionalizing data structure definitions -with @samp{#ifdef} to select the most appropriate integer subtype that -can represent the required range of values. - -Macros representing the minimum and maximum limits of the integer types -are defined in the header file @file{limits.h}. The values of these -macros are all integer constant expressions. -@pindex limits.h - -@comment limits.h -@comment ISO -@deftypevr Macro int CHAR_BIT -This is the number of bits in a @code{char}, usually eight. 
-@end deftypevr - -@comment limits.h -@comment ISO -@deftypevr Macro int SCHAR_MIN -This is the minimum value that can be represented by a @code{signed char}. -@end deftypevr - -@comment limits.h -@comment ISO -@deftypevr Macro int SCHAR_MAX -This is the maximum value that can be represented by a @code{signed char}. -@end deftypevr - -@comment limits.h -@comment ISO -@deftypevr Macro int UCHAR_MAX -This is the maximum value that can be represented by a @code{unsigned char}. -(The minimum value of an @code{unsigned char} is zero.) -@end deftypevr - -@comment limits.h -@comment ISO -@deftypevr Macro int CHAR_MIN -This is the minimum value that can be represented by a @code{char}. -It's equal to @code{SCHAR_MIN} if @code{char} is signed, or zero -otherwise. -@end deftypevr - -@comment limits.h -@comment ISO -@deftypevr Macro int CHAR_MAX -This is the maximum value that can be represented by a @code{char}. -It's equal to @code{SCHAR_MAX} if @code{char} is signed, or -@code{UCHAR_MAX} otherwise. -@end deftypevr - -@comment limits.h -@comment ISO -@deftypevr Macro int SHRT_MIN -This is the minimum value that can be represented by a @code{signed -short int}. On most machines that the GNU C library runs on, -@code{short} integers are 16-bit quantities. -@end deftypevr - -@comment limits.h -@comment ISO -@deftypevr Macro int SHRT_MAX -This is the maximum value that can be represented by a @code{signed -short int}. -@end deftypevr - -@comment limits.h -@comment ISO -@deftypevr Macro int USHRT_MAX -This is the maximum value that can be represented by an @code{unsigned -short int}. (The minimum value of an @code{unsigned short int} is zero.) -@end deftypevr - -@comment limits.h -@comment ISO -@deftypevr Macro int INT_MIN -This is the minimum value that can be represented by a @code{signed -int}. On most machines that the GNU C system runs on, an @code{int} is -a 32-bit quantity. 
-@end deftypevr - -@comment limits.h -@comment ISO -@deftypevr Macro int INT_MAX -This is the maximum value that can be represented by a @code{signed -int}. -@end deftypevr - -@comment limits.h -@comment ISO -@deftypevr Macro {unsigned int} UINT_MAX -This is the maximum value that can be represented by an @code{unsigned -int}. (The minimum value of an @code{unsigned int} is zero.) -@end deftypevr - -@comment limits.h -@comment ISO -@deftypevr Macro {long int} LONG_MIN -This is the minimum value that can be represented by a @code{signed long -int}. On most machines that the GNU C system runs on, @code{long} -integers are 32-bit quantities, the same size as @code{int}. -@end deftypevr - -@comment limits.h -@comment ISO -@deftypevr Macro {long int} LONG_MAX -This is the maximum value that can be represented by a @code{signed long -int}. -@end deftypevr - -@comment limits.h -@comment ISO -@deftypevr Macro {unsigned long int} ULONG_MAX -This is the maximum value that can be represented by an @code{unsigned -long int}. (The minimum value of an @code{unsigned long int} is zero.) -@end deftypevr - -@strong{Incomplete:} There should be corresponding limits for the GNU -C Compiler's @code{long long} type, too. (But they are not now present -in the header file.) - -The header file @file{limits.h} also defines some additional constants -that parameterize various operating system and file system limits. These -constants are described in @ref{System Parameters} and @ref{File System -Parameters}. 
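The two uses of @file{limits.h} mentioned at the start of this section (guarding an @code{int} counter against overflow, and choosing an integer type at preprocessing time) can be sketched as follows. This is only a minimal illustration: the names @code{increment_checked}, @code{add_fits_in_int} and @code{wide_count_t}, and the threshold of 100000, are hypothetical and are not part of the library.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical type selection: pick the narrowest standard type that
   is known to hold values up to 100000 (an assumed requirement).  */
#if INT_MAX >= 100000
typedef int wide_count_t;
#else
typedef long int wide_count_t;
#endif

/* Hypothetical helper: increment *counter only if the result still
   fits in an int.  Returns false and leaves *counter unchanged when
   the increment would overflow.  */
bool
increment_checked (int *counter)
{
  if (*counter == INT_MAX)
    return false;
  ++*counter;
  return true;
}

/* Hypothetical helper: true when a + b is representable as an int.
   The test is arranged so that it never overflows itself.  */
bool
add_fits_in_int (int a, int b)
{
  if (b > 0)
    return a <= INT_MAX - b;
  return a >= INT_MIN - b;
}
```

A caller would test the return value of @code{increment_checked} and report an error, or switch to a wider type, instead of letting the counter wrap.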
-@pindex limits.h - - -@node Floating-Point Limits , , Integer Representation Limits, Representation Limits -@section Floating-Point Limits -@cindex floating-point number representation -@cindex representation, floating-point number -@cindex limits, floating-point representation - -Because floating-point numbers are represented internally as approximate -quantities, algorithms for manipulating floating-point data often need -to be parameterized in terms of the accuracy of the representation. -Some of the functions in the C library itself need this information; for -example, the algorithms for printing and reading floating-point numbers -(@pxref{I/O on Streams}) and for calculating trigonometric and -irrational functions (@pxref{Mathematics}) use information about the -underlying floating-point representation to avoid round-off error and -loss of accuracy. User programs that implement numerical analysis -techniques also often need to be parameterized in this way in order to -minimize or compute error bounds. - -The specific representation of floating-point numbers varies from -machine to machine. The GNU C library defines a set of parameters which -characterize each of the supported floating-point representations on a -particular system. - -@menu -* Floating-Point Representation:: Definitions of terminology. -* Floating-Point Parameters:: Descriptions of the library - facilities. -* IEEE Floating Point:: An example of a common - representation. -@end menu - -@node Floating-Point Representation, Floating-Point Parameters, , Floating-Point Limits -@subsection Floating-Point Representation - -This section introduces the terminology used to characterize the -representation of floating-point numbers. - -You are probably already familiar with most of these concepts in terms -of scientific or exponential notation for floating-point numbers. 
For -example, the number @code{123456.0} could be expressed in exponential -notation as @code{1.23456e+05}, a shorthand notation indicating that the -mantissa @code{1.23456} is multiplied by the base @code{10} raised to -power @code{5}. - -More formally, the internal representation of a floating-point number -can be characterized in terms of the following parameters: - -@itemize @bullet -@item -The @dfn{sign} is either @code{-1} or @code{1}. -@cindex sign (of floating-point number) - -@item -The @dfn{base} or @dfn{radix} for exponentiation; an integer greater -than @code{1}. This is a constant for the particular representation. -@cindex base (of floating-point number) -@cindex radix (of floating-point number) - -@item -The @dfn{exponent} to which the base is raised. The upper and lower -bounds of the exponent value are constants for the particular -representation. -@cindex exponent (of floating-point number) - -Sometimes, in the actual bits representing the floating-point number, -the exponent is @dfn{biased} by adding a constant to it, to make it -always be represented as an unsigned quantity. This is only important -if you have some reason to pick apart the bit fields making up the -floating-point number by hand, which is something for which the GNU -library provides no support. So this is ignored in the discussion that -follows. -@cindex bias (of floating-point number exponent) - -@item -The value of the @dfn{mantissa} or @dfn{significand}, which is an -unsigned integer. -@cindex mantissa (of floating-point number) -@cindex significand (of floating-point number) - -@item -The @dfn{precision} of the mantissa. If the base of the representation -is @var{b}, then the precision is the number of base-@var{b} digits in -the mantissa. This is a constant for the particular representation. - -Many floating-point representations have an implicit @dfn{hidden bit} in -the mantissa. Any such hidden bits are counted in the precision. 
-Again, the GNU library provides no facilities for dealing with such low-level -aspects of the representation. -@cindex precision (of floating-point number) -@cindex hidden bit (of floating-point number mantissa) -@end itemize - -The mantissa of a floating-point number actually represents an implicit -fraction whose denominator is the base raised to the power of the -precision. Since the largest representable mantissa is one less than -this denominator, the value of the fraction is always strictly less than -@code{1}. The mathematical value of a floating-point number is then the -product of this fraction; the sign; and the base raised to the exponent. - -If the floating-point number is @dfn{normalized}, the mantissa is also -greater than or equal to the base raised to the power of one less -than the precision (unless the number represents a floating-point zero, -in which case the mantissa is zero). The fractional quantity is -therefore greater than or equal to @code{1/@var{b}}, where @var{b} is -the base. -@cindex normalized floating-point number - -@node Floating-Point Parameters, IEEE Floating Point, Floating-Point Representation, Floating-Point Limits -@subsection Floating-Point Parameters - -@strong{Incomplete:} This section needs some more concrete examples -of what these parameters mean and how to use them in a program. - -These macro definitions can be accessed by including the header file -@file{float.h} in your program. -@pindex float.h - -Macro names starting with @samp{FLT_} refer to the @code{float} type, -while names beginning with @samp{DBL_} refer to the @code{double} type -and names beginning with @samp{LDBL_} refer to the @code{long double} -type. 
(In implementations that do not support @code{long double} as -a distinct data type, the values for those constants are the same -as the corresponding constants for the @code{double} type.)@refill -@cindex @code{float} representation limits -@cindex @code{double} representation limits -@cindex @code{long double} representation limits - -Of these macros, only @code{FLT_RADIX} is guaranteed to be a constant -expression. The other macros listed here cannot be reliably used in -places that require constant expressions, such as @samp{#if} -preprocessing directives or array size specifications. - -Although the @w{ISO C} standard specifies minimum and maximum values for -most of these parameters, the GNU C implementation uses whatever -floating-point representations are supported by the underlying hardware. -So whether GNU C actually satisfies the @w{ISO C} requirements depends on -what machine it is running on. - -@comment float.h -@comment ISO -@deftypevr Macro int FLT_ROUNDS -This value characterizes the rounding mode for floating-point addition. -The following values indicate standard rounding modes: - -@table @code -@item -1 -The mode is indeterminable. -@item 0 -Rounding is towards zero. -@item 1 -Rounding is to the nearest number. -@item 2 -Rounding is towards positive infinity. -@item 3 -Rounding is towards negative infinity. -@end table - -@noindent -Any other value represents a machine-dependent nonstandard rounding -mode. -@end deftypevr - -@comment float.h -@comment ISO -@deftypevr Macro int FLT_RADIX -This is the value of the base, or radix, of exponent representation. -This is guaranteed to be a constant expression, unlike the other macros -described in this section. -@end deftypevr - -@comment float.h -@comment ISO -@deftypevr Macro int FLT_MANT_DIG -This is the number of base-@code{FLT_RADIX} digits in the floating-point -mantissa for the @code{float} data type. 
-@end deftypevr - -@comment float.h -@comment ISO -@deftypevr Macro int DBL_MANT_DIG -This is the number of base-@code{FLT_RADIX} digits in the floating-point -mantissa for the @code{double} data type. -@end deftypevr - -@comment float.h -@comment ISO -@deftypevr Macro int LDBL_MANT_DIG -This is the number of base-@code{FLT_RADIX} digits in the floating-point -mantissa for the @code{long double} data type. -@end deftypevr - -@comment float.h -@comment ISO -@deftypevr Macro int FLT_DIG -This is the number of decimal digits of precision for the @code{float} -data type. Technically, if @var{p} and @var{b} are the precision and -base (respectively) for the representation, then the decimal precision -@var{q} is the maximum number of decimal digits such that any floating -point number with @var{q} base 10 digits can be rounded to a floating -point number with @var{p} base @var{b} digits and back again, without -change to the @var{q} decimal digits. - -The value of this macro is guaranteed to be at least @code{6}. -@end deftypevr - -@comment float.h -@comment ISO -@deftypevr Macro int DBL_DIG -This is similar to @code{FLT_DIG}, but is for the @code{double} data -type. The value of this macro is guaranteed to be at least @code{10}. -@end deftypevr - -@comment float.h -@comment ISO -@deftypevr Macro int LDBL_DIG -This is similar to @code{FLT_DIG}, but is for the @code{long double} -data type. The value of this macro is guaranteed to be at least -@code{10}. -@end deftypevr - -@comment float.h -@comment ISO -@deftypevr Macro int FLT_MIN_EXP -This is the minimum negative integer such that the mathematical value -@code{FLT_RADIX} raised to this power minus 1 can be represented as a -normalized floating-point number of type @code{float}. In terms of the -actual implementation, this is just the smallest value that can be -represented in the exponent field of the number. 
-@end deftypevr - -@comment float.h -@comment ISO -@deftypevr Macro int DBL_MIN_EXP -This is similar to @code{FLT_MIN_EXP}, but is for the @code{double} data -type. -@end deftypevr - -@comment float.h -@comment ISO -@deftypevr Macro int LDBL_MIN_EXP -This is similar to @code{FLT_MIN_EXP}, but is for the @code{long double} -data type. -@end deftypevr - -@comment float.h -@comment ISO -@deftypevr Macro int FLT_MIN_10_EXP -This is the minimum negative integer such that the mathematical value -@code{10} raised to this power can be represented as a -normalized floating-point number of type @code{float}. This is -guaranteed to be no greater than @code{-37}. -@end deftypevr - -@comment float.h -@comment ISO -@deftypevr Macro int DBL_MIN_10_EXP -This is similar to @code{FLT_MIN_10_EXP}, but is for the @code{double} -data type. -@end deftypevr - -@comment float.h -@comment ISO -@deftypevr Macro int LDBL_MIN_10_EXP -This is similar to @code{FLT_MIN_10_EXP}, but is for the @code{long -double} data type. -@end deftypevr - - - -@comment float.h -@comment ISO -@deftypevr Macro int FLT_MAX_EXP -This is the maximum integer such that the mathematical value -@code{FLT_RADIX} raised to this power minus 1 can be represented as a -floating-point number of type @code{float}. In terms of the actual -implementation, this is just the largest value that can be represented -in the exponent field of the number. -@end deftypevr - -@comment float.h -@comment ISO -@deftypevr Macro int DBL_MAX_EXP -This is similar to @code{FLT_MAX_EXP}, but is for the @code{double} data -type. -@end deftypevr - -@comment float.h -@comment ISO -@deftypevr Macro int LDBL_MAX_EXP -This is similar to @code{FLT_MAX_EXP}, but is for the @code{long double} -data type. 
-@end deftypevr - -@comment float.h -@comment ISO -@deftypevr Macro int FLT_MAX_10_EXP -This is the maximum integer such that the mathematical value -@code{10} raised to this power can be represented as a -normalized floating-point number of type @code{float}. This is -guaranteed to be at least @code{37}. -@end deftypevr - -@comment float.h -@comment ISO -@deftypevr Macro int DBL_MAX_10_EXP -This is similar to @code{FLT_MAX_10_EXP}, but is for the @code{double} -data type. -@end deftypevr - -@comment float.h -@comment ISO -@deftypevr Macro int LDBL_MAX_10_EXP -This is similar to @code{FLT_MAX_10_EXP}, but is for the @code{long -double} data type. -@end deftypevr - - -@comment float.h -@comment ISO -@deftypevr Macro double FLT_MAX -The value of this macro is the maximum representable floating-point -number of type @code{float}, and is guaranteed to be at least -@code{1E+37}. -@end deftypevr - -@comment float.h -@comment ISO -@deftypevr Macro double DBL_MAX -The value of this macro is the maximum representable floating-point -number of type @code{double}, and is guaranteed to be at least -@code{1E+37}. -@end deftypevr - -@comment float.h -@comment ISO -@deftypevr Macro {long double} LDBL_MAX -The value of this macro is the maximum representable floating-point -number of type @code{long double}, and is guaranteed to be at least -@code{1E+37}. -@end deftypevr - - -@comment float.h -@comment ISO -@deftypevr Macro double FLT_MIN -The value of this macro is the minimum normalized positive -floating-point number that is representable by type @code{float}, and is -guaranteed to be no more than @code{1E-37}. -@end deftypevr - -@comment float.h -@comment ISO -@deftypevr Macro double DBL_MIN -The value of this macro is the minimum normalized positive -floating-point number that is representable by type @code{double}, and -is guaranteed to be no more than @code{1E-37}. 
-@end deftypevr - -@comment float.h -@comment ISO -@deftypevr Macro {long double} LDBL_MIN -The value of this macro is the minimum normalized positive -floating-point number that is representable by type @code{long double}, -and is guaranteed to be no more than @code{1E-37}. -@end deftypevr - - -@comment float.h -@comment ISO -@deftypevr Macro double FLT_EPSILON -This is the minimum positive floating-point number of type @code{float} -such that @code{1.0 + FLT_EPSILON != 1.0} is true. It's guaranteed to -be no greater than @code{1E-5}. -@end deftypevr - -@comment float.h -@comment ISO -@deftypevr Macro double DBL_EPSILON -This is similar to @code{FLT_EPSILON}, but is for the @code{double} -type. The maximum value is @code{1E-9}. -@end deftypevr - -@comment float.h -@comment ISO -@deftypevr Macro {long double} LDBL_EPSILON -This is similar to @code{FLT_EPSILON}, but is for the @code{long double} -type. The maximum value is @code{1E-9}. -@end deftypevr - - -@node IEEE Floating Point, , Floating-Point Parameters, Floating-Point Limits -@subsection IEEE Floating Point -@cindex IEEE floating-point representation -@cindex floating-point, IEEE -@cindex IEEE Std 754 - - -Here is an example showing how these parameters work for a common -floating point representation, specified by the @cite{IEEE Standard for -Binary Floating-Point Arithmetic (ANSI/IEEE Std 754-1985 or ANSI/IEEE -Std 854-1987)}. Nearly all computers today use this format. - -The IEEE single-precision float representation uses a base of 2. There -is a sign bit, a mantissa with 23 bits plus one hidden bit (so the total -precision is 24 base-2 digits), and an 8-bit exponent that can represent -values in the range -125 to 128, inclusive. 
- -So, for an implementation that uses this representation for the -@code{float} data type, appropriate values for the corresponding -parameters are: - -@example -FLT_RADIX 2 -FLT_MANT_DIG 24 -FLT_DIG 6 -FLT_MIN_EXP -125 -FLT_MIN_10_EXP -37 -FLT_MAX_EXP 128 -FLT_MAX_10_EXP +38 -FLT_MIN 1.17549435E-38F -FLT_MAX 3.40282347E+38F -FLT_EPSILON 1.19209290E-07F -@end example - -Here are the values for the @code{double} data type: - -@example -DBL_MANT_DIG 53 -DBL_DIG 15 -DBL_MIN_EXP -1021 -DBL_MIN_10_EXP -307 -DBL_MAX_EXP 1024 -DBL_MAX_10_EXP 308 -DBL_MAX 1.7976931348623157E+308 -DBL_MIN 2.2250738585072014E-308 -DBL_EPSILON 2.2204460492503131E-016 -@end example
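One way to see whether a given system matches the IEEE values listed above is to inspect the @file{float.h} macros at run time. The following is a minimal sketch under that assumption. The function names @code{float_looks_ieee_single}, @code{double_looks_ieee_double}, @code{epsilon_from_parameters} and @code{print_float_parameters} are hypothetical and are not library facilities. The third function rebuilds the epsilon value from the radix and precision alone, relying on the ISO C definition of @code{FLT_EPSILON} as the radix raised to the power one minus the precision.

```c
#include <float.h>
#include <stdio.h>

/* Hypothetical check: nonzero when `float' has the radix, precision
   and exponent range of IEEE single precision.  */
int
float_looks_ieee_single (void)
{
  return FLT_RADIX == 2 && FLT_MANT_DIG == 24
         && FLT_MIN_EXP == -125 && FLT_MAX_EXP == 128;
}

/* Hypothetical check: the same test for `double' against IEEE
   double precision.  */
int
double_looks_ieee_double (void)
{
  return FLT_RADIX == 2 && DBL_MANT_DIG == 53
         && DBL_MIN_EXP == -1021 && DBL_MAX_EXP == 1024;
}

/* Compute FLT_RADIX raised to the power (1 - FLT_MANT_DIG) by
   repeated division.  Every intermediate value is an exact power of
   the radix, so no rounding takes place.  */
float
epsilon_from_parameters (void)
{
  float eps = 1.0f;
  int i;

  for (i = 1; i < FLT_MANT_DIG; i++)
    eps /= FLT_RADIX;
  return eps;
}

/* Print the `float' parameters in the same order as the table.  */
void
print_float_parameters (void)
{
  printf ("FLT_RADIX      %d\n", FLT_RADIX);
  printf ("FLT_MANT_DIG   %d\n", FLT_MANT_DIG);
  printf ("FLT_DIG        %d\n", FLT_DIG);
  printf ("FLT_MIN_EXP    %d\n", FLT_MIN_EXP);
  printf ("FLT_MIN_10_EXP %d\n", FLT_MIN_10_EXP);
  printf ("FLT_MAX_EXP    %d\n", FLT_MAX_EXP);
  printf ("FLT_MAX_10_EXP %d\n", FLT_MAX_10_EXP);
  printf ("FLT_MIN        %.9g\n", (double) FLT_MIN);
  printf ("FLT_MAX        %.9g\n", (double) FLT_MAX);
  printf ("FLT_EPSILON    %.9g\n", (double) FLT_EPSILON);
}
```

Calling @code{print_float_parameters} from @code{main} lists the values for the machine at hand. The two checks are written as ordinary run-time comparisons rather than @samp{#if} tests because, as noted earlier, only @code{FLT_RADIX} is guaranteed to be usable in a constant expression.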