@node Representation Limits
@chapter Representation Limits

This chapter contains information about constants and parameters that
characterize the representation of the various integer and
floating-point types supported by the GNU C library.

@menu
* Integer Representation Limits::
* Floating-Point Limits::
@end menu

@node Integer Representation Limits
@section Integer Representation Limits
@cindex integer representation limits
@cindex representation limits, integer
@cindex limits, integer representation

Sometimes it is necessary for programs to know about the internal
representation of various integer subtypes. For example, if you want
your program to be careful not to overflow an @code{int} counter
variable, you need to know the largest value that an @code{int} can
represent. These kinds of parameters can vary from compiler to compiler
and machine to machine. Another typical use of this kind of parameter
is in conditionalizing data structure definitions with @samp{#if} to
select the most appropriate integer subtype that can represent the
required range of values.

Macros representing the minimum and maximum limits of the integer types
are defined in the header file @file{limits.h}. The values of these
macros are all integer constant expressions.
@pindex limits.h

@comment limits.h
@comment ANSI
@deftypevr Macro int CHAR_BIT
This is the number of bits in a @code{char}, usually eight.
@end deftypevr

@comment limits.h
@comment ANSI
@deftypevr Macro int SCHAR_MIN
This is the minimum value that can be represented by a @code{signed char}.
@end deftypevr

@comment limits.h
@comment ANSI
@deftypevr Macro int SCHAR_MAX
This is the maximum value that can be represented by a @code{signed char}.
@end deftypevr

@comment limits.h
@comment ANSI
@deftypevr Macro int UCHAR_MAX
This is the maximum value that can be represented by an @code{unsigned char}.
(The minimum value of an @code{unsigned char} is zero.)
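
As an illustration of how @code{UCHAR_MAX} relates to @code{CHAR_BIT},
the following sketch builds the largest value that fits in
@code{CHAR_BIT} bits without looking at @code{UCHAR_MAX}. It assumes,
as ISO C requires, that an @code{unsigned char} uses all of its bits for
the value. The function name @code{max_from_char_bit} is hypothetical
and is not part of the library.

```c
#include <limits.h>

/* Return the largest value that fits in CHAR_BIT bits, built up
   one bit at a time.  (Hypothetical helper for illustration.)  */
unsigned int
max_from_char_bit (void)
{
  unsigned int max = 0;
  int i;

  for (i = 0; i < CHAR_BIT; i++)
    max = (max << 1) | 1u;
  return max;
}
```

Under that assumption, the result should compare equal to
@code{UCHAR_MAX}.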
@end deftypevr

@comment limits.h
@comment ANSI
@deftypevr Macro int CHAR_MIN
This is the minimum value that can be represented by a @code{char}.
It's equal to @code{SCHAR_MIN} if @code{char} is signed, or zero
otherwise.
@end deftypevr

@comment limits.h
@comment ANSI
@deftypevr Macro int CHAR_MAX
This is the maximum value that can be represented by a @code{char}.
It's equal to @code{SCHAR_MAX} if @code{char} is signed, or
@code{UCHAR_MAX} otherwise.
@end deftypevr

@comment limits.h
@comment ANSI
@deftypevr Macro int SHRT_MIN
This is the minimum value that can be represented by a @code{signed
short int}. On most machines that the GNU C library runs on,
@code{short} integers are 16-bit quantities.
@end deftypevr

@comment limits.h
@comment ANSI
@deftypevr Macro int SHRT_MAX
This is the maximum value that can be represented by a @code{signed
short int}.
@end deftypevr

@comment limits.h
@comment ANSI
@deftypevr Macro int USHRT_MAX
This is the maximum value that can be represented by an @code{unsigned
short int}. (The minimum value of an @code{unsigned short int} is zero.)
@end deftypevr

@comment limits.h
@comment ANSI
@deftypevr Macro int INT_MIN
This is the minimum value that can be represented by a @code{signed
int}. On most machines that the GNU C system runs on, an @code{int} is
a 32-bit quantity.
@end deftypevr

@comment limits.h
@comment ANSI
@deftypevr Macro int INT_MAX
This is the maximum value that can be represented by a @code{signed
int}.
@end deftypevr

@comment limits.h
@comment ANSI
@deftypevr Macro {unsigned int} UINT_MAX
This is the maximum value that can be represented by an @code{unsigned
int}. (The minimum value of an @code{unsigned int} is zero.)
@end deftypevr

@comment limits.h
@comment ANSI
@deftypevr Macro {long int} LONG_MIN
This is the minimum value that can be represented by a @code{signed long
int}. On 32-bit machines that the GNU C system runs on, @code{long}
integers are 32-bit quantities, the same size as @code{int}; on most
64-bit machines they are 64-bit quantities.
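
The overflow concern raised at the start of this section can be
sketched with the @code{long int} limits as follows. This is only one
possible approach; the function name @code{checked_add_long} is
hypothetical and is not provided by the library. The checks are written
so that no intermediate expression can itself overflow.

```c
#include <limits.h>

/* Store A + B in *RESULT and return 1 if the sum is representable
   as a long int; otherwise return 0 and leave *RESULT untouched.
   (Hypothetical helper for illustration.)  */
int
checked_add_long (long int a, long int b, long int *result)
{
  /* Adding a positive B overflows when A exceeds LONG_MAX - B.  */
  if (b > 0 && a > LONG_MAX - b)
    return 0;
  /* Adding a negative B overflows when A is below LONG_MIN - B.  */
  if (b < 0 && a < LONG_MIN - b)
    return 0;
  *result = a + b;
  return 1;
}
```

The same pattern applies to @code{int} with @code{INT_MIN} and
@code{INT_MAX}.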
@end deftypevr

@comment limits.h
@comment ANSI
@deftypevr Macro {long int} LONG_MAX
This is the maximum value that can be represented by a @code{signed long
int}.
@end deftypevr

@comment limits.h
@comment ANSI
@deftypevr Macro {unsigned long int} ULONG_MAX
This is the maximum value that can be represented by an @code{unsigned
long int}. (The minimum value of an @code{unsigned long int} is zero.)
@end deftypevr

@strong{Incomplete:} There should be corresponding limits for the GNU
C Compiler's @code{long long} type, too. (But they are not now present
in the header file.)

The header file @file{limits.h} also defines some additional constants
that parameterize various operating system and file system limits. These
constants are described in @ref{System Parameters} and @ref{File System
Parameters}.
@pindex limits.h

@node Floating-Point Limits
@section Floating-Point Limits
@cindex floating-point number representation
@cindex representation, floating-point number
@cindex limits, floating-point representation

Because floating-point numbers are represented internally as approximate
quantities, algorithms for manipulating floating-point data often need
to be parameterized in terms of the accuracy of the representation.
Some of the functions in the C library itself need this information; for
example, the algorithms for printing and reading floating-point numbers
(@pxref{Input/Output on Streams}) and for calculating trigonometric and
irrational functions (@pxref{Mathematics}) use information about the
underlying floating-point representation to avoid round-off error and
loss of accuracy. User programs that implement numerical analysis
techniques also often need to be parameterized in this way in order to
minimize or compute error bounds.

The specific representation of floating-point numbers varies from
machine to machine. The GNU C library defines a set of parameters which
characterize each of the supported floating-point representations on a
particular system.

@menu
* Floating-Point Representation::  Definitions of terminology.
* Floating-Point Parameters::      Descriptions of the library facilities.
* IEEE Floating Point::            An example of a common representation.
@end menu

@node Floating-Point Representation
@subsection Floating-Point Representation

This section introduces the terminology used to characterize the
representation of floating-point numbers.

You are probably already familiar with most of these concepts in terms
of scientific or exponential notation for floating-point numbers. For
example, the number @code{123456.0} could be expressed in exponential
notation as @code{1.23456e+05}, a shorthand notation indicating that the
mantissa @code{1.23456} is multiplied by the base @code{10} raised to
the power @code{5}.

More formally, the internal representation of a floating-point number
can be characterized in terms of the following parameters:

@itemize @bullet
@item
The @dfn{sign} is either @code{-1} or @code{1}.
@cindex sign (of floating-point number)

@item
The @dfn{base} or @dfn{radix} for exponentiation; an integer greater
than @code{1}. This is a constant for the particular representation.
@cindex base (of floating-point number)
@cindex radix (of floating-point number)

@item
The @dfn{exponent} to which the base is raised. The upper and lower
bounds of the exponent value are constants for the particular
representation.
@cindex exponent (of floating-point number)

Sometimes, in the actual bits representing the floating-point number,
the exponent is @dfn{biased} by adding a constant to it, to make it
always be represented as an unsigned quantity. This is only important
if you have some reason to pick apart the bit fields making up the
floating-point number by hand, which is something for which the GNU
library provides no support. So this is ignored in the discussion that
follows.
@cindex bias (of floating-point number exponent)

@item
The value of the @dfn{mantissa} or @dfn{significand}, which is an
unsigned integer.
@cindex mantissa (of floating-point number)
@cindex significand (of floating-point number)

@item
The @dfn{precision} of the mantissa. If the base of the representation
is @var{b}, then the precision is the number of base-@var{b} digits in
the mantissa. This is a constant for the particular representation.

Many floating-point representations have an implicit @dfn{hidden bit} in
the mantissa. Any such hidden bits are counted in the precision.
Again, the GNU library provides no facilities for dealing with such
low-level aspects of the representation.
@cindex precision (of floating-point number)
@cindex hidden bit (of floating-point number mantissa)
@end itemize

The mantissa of a floating-point number actually represents an implicit
fraction whose denominator is the base raised to the power of the
precision. Since the largest representable mantissa is one less than
this denominator, the value of the fraction is always strictly less than
@code{1}. The mathematical value of a floating-point number is then the
product of this fraction, the sign, and the base raised to the exponent.

If the floating-point number is @dfn{normalized}, the mantissa is also
greater than or equal to the base raised to the power of one less
than the precision (unless the number represents a floating-point zero,
in which case the mantissa is zero). The fractional quantity is
therefore greater than or equal to @code{1/@var{b}}, where @var{b} is
the base.
@cindex normalized floating-point number

@node Floating-Point Parameters
@subsection Floating-Point Parameters

@strong{Incomplete:} This section needs some more concrete examples
of what these parameters mean and how to use them in a program.

These macro definitions can be accessed by including the header file
@file{float.h} in your program.
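
To make the model from the previous subsection concrete, here is a
minimal sketch that splits a positive, finite value into a normalized
fraction and an exponent, using only @code{FLT_RADIX} from
@file{float.h}. The function name @code{split_positive} is hypothetical
and is not a library facility. The sketch assumes that scaling by the
radix is exact, which holds for base 2 as long as the value stays within
the range of @code{double}.

```c
#include <float.h>

/* Split the positive, finite value X into a fraction F with
   1/FLT_RADIX <= F < 1 and an exponent E such that
   X == F * FLT_RADIX^E.  Store E in *EXPONENT and return F.
   Zero, negative, and infinite arguments are not handled.
   (Hypothetical helper for illustration.)  */
double
split_positive (double x, int *exponent)
{
  int e = 0;

  /* Scale down until the fraction is strictly less than 1.  */
  while (x >= 1.0)
    {
      x /= FLT_RADIX;
      e++;
    }
  /* Scale up until the fraction is at least 1/FLT_RADIX.  */
  while (x < 1.0 / FLT_RADIX)
    {
      x *= FLT_RADIX;
      e--;
    }
  *exponent = e;
  return x;
}
```

On a machine where @code{FLT_RADIX} is @code{2}, calling
@code{split_positive (8.0, &e)} returns @code{0.5} and sets @code{e} to
@code{4}, since 8 equals 0.5 times 2 raised to the power 4.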
@pindex float.h

Macro names starting with @samp{FLT_} refer to the @code{float} type,
while names beginning with @samp{DBL_} refer to the @code{double} type
and names beginning with @samp{LDBL_} refer to the @code{long double}
type. (In implementations that do not support @code{long double} as
a distinct data type, the values for those constants are the same
as the corresponding constants for the @code{double} type.)@refill
@cindex @code{float} representation limits
@cindex @code{double} representation limits
@cindex @code{long double} representation limits

Of these macros, only @code{FLT_RADIX} is guaranteed to be a constant
expression. The other macros listed here cannot be reliably used in
places that require constant expressions, such as @samp{#if}
preprocessing directives or array size specifications.

Although the ANSI C standard specifies minimum and maximum values for
most of these parameters, the GNU C implementation uses whatever
floating-point representations are supported by the underlying hardware.
So whether GNU C actually satisfies the ANSI C requirements depends on
what machine it is running on.

@comment float.h
@comment ANSI
@deftypevr Macro int FLT_ROUNDS
This value characterizes the rounding mode for floating-point addition.
The following values indicate standard rounding modes:

@table @code
@item -1
The mode is indeterminable.
@item 0
Rounding is towards zero.
@item 1
Rounding is to the nearest number.
@item 2
Rounding is towards positive infinity.
@item 3
Rounding is towards negative infinity.
@end table

@noindent
Any other value represents a machine-dependent nonstandard rounding
mode.
@end deftypevr

@comment float.h
@comment ANSI
@deftypevr Macro int FLT_RADIX
This is the value of the base, or radix, of exponent representation.
This is guaranteed to be a constant expression, unlike the other macros
described in this section.
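
Because @code{FLT_RADIX} is a constant expression, it may appear in a
@samp{#if} directive. The following sketch shows one way to do so; the
names @code{RADIX_IS_BINARY} and @code{radix_is_binary} are hypothetical
and exist only for this illustration.

```c
#include <float.h>

/* FLT_RADIX is a constant expression, so the preprocessor can
   test it.  RADIX_IS_BINARY is a hypothetical name used only
   for this illustration.  */
#if FLT_RADIX == 2
#define RADIX_IS_BINARY 1
#else
#define RADIX_IS_BINARY 0
#endif

/* Return nonzero if floating-point numbers use base 2.  */
int
radix_is_binary (void)
{
  return RADIX_IS_BINARY;
}
```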
@end deftypevr

@comment float.h
@comment ANSI
@deftypevr Macro int FLT_MANT_DIG
This is the number of base-@code{FLT_RADIX} digits in the floating-point
mantissa for the @code{float} data type.
@end deftypevr

@comment float.h
@comment ANSI
@deftypevr Macro int DBL_MANT_DIG
This is the number of base-@code{FLT_RADIX} digits in the floating-point
mantissa for the @code{double} data type.
@end deftypevr

@comment float.h
@comment ANSI
@deftypevr Macro int LDBL_MANT_DIG
This is the number of base-@code{FLT_RADIX} digits in the floating-point
mantissa for the @code{long double} data type.
@end deftypevr

@comment float.h
@comment ANSI
@deftypevr Macro int FLT_DIG
This is the number of decimal digits of precision for the @code{float}
data type. Technically, if @var{p} and @var{b} are the precision and
base (respectively) for the representation, then the decimal precision
@var{q} is the maximum number of decimal digits such that any floating
point number with @var{q} base 10 digits can be rounded to a floating
point number with @var{p} base @var{b} digits and back again, without
change to the @var{q} decimal digits.

The value of this macro is guaranteed to be at least @code{6}.
@end deftypevr

@comment float.h
@comment ANSI
@deftypevr Macro int DBL_DIG
This is similar to @code{FLT_DIG}, but is for the @code{double} data
type. The value of this macro is guaranteed to be at least @code{10}.
@end deftypevr

@comment float.h
@comment ANSI
@deftypevr Macro int LDBL_DIG
This is similar to @code{FLT_DIG}, but is for the @code{long double}
data type. The value of this macro is guaranteed to be at least
@code{10}.
@end deftypevr

@comment float.h
@comment ANSI
@deftypevr Macro int FLT_MIN_EXP
This is the minimum negative integer such that the mathematical value
@code{FLT_RADIX} raised to one less than this power can be represented
as a normalized floating-point number of type @code{float}.
In terms of the actual implementation, this is just the smallest value
that can be represented in the exponent field of the number.
@end deftypevr

@comment float.h
@comment ANSI
@deftypevr Macro int DBL_MIN_EXP
This is similar to @code{FLT_MIN_EXP}, but is for the @code{double} data
type.
@end deftypevr

@comment float.h
@comment ANSI
@deftypevr Macro int LDBL_MIN_EXP
This is similar to @code{FLT_MIN_EXP}, but is for the @code{long double}
data type.
@end deftypevr

@comment float.h
@comment ANSI
@deftypevr Macro int FLT_MIN_10_EXP
This is the minimum negative integer such that the mathematical value
@code{10} raised to this power can be represented as a normalized
floating-point number of type @code{float}. This is guaranteed to be
no greater than @code{-37}.
@end deftypevr

@comment float.h
@comment ANSI
@deftypevr Macro int DBL_MIN_10_EXP
This is similar to @code{FLT_MIN_10_EXP}, but is for the @code{double}
data type.
@end deftypevr

@comment float.h
@comment ANSI
@deftypevr Macro int LDBL_MIN_10_EXP
This is similar to @code{FLT_MIN_10_EXP}, but is for the @code{long
double} data type.
@end deftypevr

@comment float.h
@comment ANSI
@deftypevr Macro int FLT_MAX_EXP
This is the maximum integer such that the mathematical value
@code{FLT_RADIX} raised to one less than this power can be represented
as a floating-point number of type @code{float}. In terms of the actual
implementation, this is just the largest value that can be represented
in the exponent field of the number.
@end deftypevr

@comment float.h
@comment ANSI
@deftypevr Macro int DBL_MAX_EXP
This is similar to @code{FLT_MAX_EXP}, but is for the @code{double} data
type.
@end deftypevr

@comment float.h
@comment ANSI
@deftypevr Macro int LDBL_MAX_EXP
This is similar to @code{FLT_MAX_EXP}, but is for the @code{long double}
data type.
@end deftypevr

@comment float.h
@comment ANSI
@deftypevr Macro int FLT_MAX_10_EXP
This is the maximum integer such that the mathematical value @code{10}
raised to this power can be represented as a normalized floating-point
number of type @code{float}. This is guaranteed to be at least
@code{37}.
@end deftypevr

@comment float.h
@comment ANSI
@deftypevr Macro int DBL_MAX_10_EXP
This is similar to @code{FLT_MAX_10_EXP}, but is for the @code{double}
data type.
@end deftypevr

@comment float.h
@comment ANSI
@deftypevr Macro int LDBL_MAX_10_EXP
This is similar to @code{FLT_MAX_10_EXP}, but is for the @code{long
double} data type.
@end deftypevr

@comment float.h
@comment ANSI
@deftypevr Macro float FLT_MAX
The value of this macro is the maximum representable floating-point
number of type @code{float}, and is guaranteed to be at least
@code{1E+37}.
@end deftypevr

@comment float.h
@comment ANSI
@deftypevr Macro double DBL_MAX
The value of this macro is the maximum representable floating-point
number of type @code{double}, and is guaranteed to be at least
@code{1E+37}.
@end deftypevr

@comment float.h
@comment ANSI
@deftypevr Macro {long double} LDBL_MAX
The value of this macro is the maximum representable floating-point
number of type @code{long double}, and is guaranteed to be at least
@code{1E+37}.
@end deftypevr

@comment float.h
@comment ANSI
@deftypevr Macro float FLT_MIN
The value of this macro is the minimum normalized positive
floating-point number that is representable by type @code{float}, and is
guaranteed to be no more than @code{1E-37}.
@end deftypevr

@comment float.h
@comment ANSI
@deftypevr Macro double DBL_MIN
The value of this macro is the minimum normalized positive
floating-point number that is representable by type @code{double}, and
is guaranteed to be no more than @code{1E-37}.
@end deftypevr

@comment float.h
@comment ANSI
@deftypevr Macro {long double} LDBL_MIN
The value of this macro is the minimum normalized positive
floating-point number that is representable by type @code{long double},
and is guaranteed to be no more than @code{1E-37}.
@end deftypevr

@comment float.h
@comment ANSI
@deftypevr Macro float FLT_EPSILON
This is the difference between 1 and the smallest floating-point number
of type @code{float} that is greater than 1. It's guaranteed to be no
greater than @code{1E-5}.
@end deftypevr

@comment float.h
@comment ANSI
@deftypevr Macro double DBL_EPSILON
This is similar to @code{FLT_EPSILON}, but is for the @code{double}
type. It's guaranteed to be no greater than @code{1E-9}.
@end deftypevr

@comment float.h
@comment ANSI
@deftypevr Macro {long double} LDBL_EPSILON
This is similar to @code{FLT_EPSILON}, but is for the @code{long double}
type. It's guaranteed to be no greater than @code{1E-9}.
@end deftypevr

@node IEEE Floating Point
@subsection IEEE Floating Point
@cindex IEEE floating-point representation
@cindex floating-point, IEEE
@cindex IEEE Std 754

Here is an example showing how these parameters work for a common
floating point representation, specified by the @cite{IEEE Standard for
Binary Floating-Point Arithmetic (ANSI/IEEE Std 754-1985)}. Nearly
all computers today use this format.

The IEEE single-precision float representation uses a base of 2. There
is a sign bit, a mantissa with 23 bits plus one hidden bit (so the total
precision is 24 base-2 digits), and an 8-bit exponent that can represent
values in the range -125 to 128, inclusive.
So, for an implementation that uses this representation for the
@code{float} data type, appropriate values for the corresponding
parameters are:

@example
FLT_RADIX                             2
FLT_MANT_DIG                         24
FLT_DIG                               6
FLT_MIN_EXP                        -125
FLT_MIN_10_EXP                      -37
FLT_MAX_EXP                         128
FLT_MAX_10_EXP                      +38
FLT_MIN                 1.17549435E-38F
FLT_MAX                 3.40282347E+38F
FLT_EPSILON             1.19209290E-07F
@end example

Here are the values for the @code{double} data type:

@example
DBL_MANT_DIG                         53
DBL_DIG                              15
DBL_MIN_EXP                       -1021
DBL_MIN_10_EXP                     -307
DBL_MAX_EXP                        1024
DBL_MAX_10_EXP                      308
DBL_MAX         1.7976931348623157E+308
DBL_MIN         2.2250738585072014E-308
DBL_EPSILON     2.2204460492503131E-016
@end example
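
The relationship between the integer parameters and the floating-point
values in the tables above can be sketched in code. The following
functions rebuild @code{FLT_EPSILON}, @code{FLT_MIN} and @code{FLT_MAX}
from @code{FLT_RADIX}, @code{FLT_MANT_DIG}, @code{FLT_MIN_EXP} and
@code{FLT_MAX_EXP}. All function names are hypothetical and are not part
of the library. The sketch assumes that @code{double} has a wider
exponent range and more precision than @code{float}, so that the
intermediate results are exact; this holds for the IEEE formats
described here but is not guaranteed everywhere.

```c
#include <float.h>

/* Return FLT_RADIX raised to the power N, computed in double by
   repeated multiplication or division.  (Hypothetical helper.)  */
static double
radix_power (int n)
{
  double result = 1.0;

  for (; n > 0; n--)
    result *= FLT_RADIX;
  for (; n < 0; n++)
    result /= FLT_RADIX;
  return result;
}

/* b^(1-p): the gap between 1 and the next larger float.  */
double
derived_flt_epsilon (void)
{
  return radix_power (1 - FLT_MANT_DIG);
}

/* b^(emin-1): the smallest normalized positive float.  */
double
derived_flt_min (void)
{
  return radix_power (FLT_MIN_EXP - 1);
}

/* (1 - b^(-p)) * b^emax: the largest finite float.  */
double
derived_flt_max (void)
{
  return (1.0 - radix_power (-FLT_MANT_DIG)) * radix_power (FLT_MAX_EXP);
}
```

With the IEEE single-precision parameters listed above, these
expressions work out to 2 raised to the power -23, 2 raised to the power
-126, and (1 - 2 raised to the power -24) times 2 raised to the power
128, respectively.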