about summary refs log tree commit diff
path: root/manual/io.texi
diff options
context:
space:
mode:
Diffstat (limited to 'manual/io.texi')
-rw-r--r--manual/io.texi396
1 files changed, 396 insertions, 0 deletions
diff --git a/manual/io.texi b/manual/io.texi
new file mode 100644
index 0000000000..84fd0a9e44
--- /dev/null
+++ b/manual/io.texi
@@ -0,0 +1,396 @@
+@node I/O Overview, I/O on Streams, Pattern Matching, Top
+@chapter Input/Output Overview
+
+Most programs need to do either input (reading data) or output (writing
+data), or most frequently both, in order to do anything useful.  The GNU
+C library provides such a large selection of input and output functions
+that the hardest part is often deciding which function is most
+appropriate!
+
+This chapter introduces concepts and terminology relating to input
+and output.  Other chapters relating to the GNU I/O facilities are:
+
+@itemize @bullet
+@item
+@ref{I/O on Streams}, which covers the high-level functions
+that operate on streams, including formatted input and output.
+
+@item
+@ref{Low-Level I/O}, which covers the basic I/O and control
+functions on file descriptors.
+
+@item
+@ref{File System Interface}, which covers functions for operating on
+directories and for manipulating file attributes such as access modes
+and ownership.
+
+@item
+@ref{Pipes and FIFOs}, which includes information on the basic interprocess
+communication facilities.
+
+@item
+@ref{Sockets}, which covers a more complicated interprocess communication
+facility with support for networking.
+
+@item
+@ref{Low-Level Terminal Interface}, which covers functions for changing
+how input and output to terminal or other serial devices are processed.
+@end itemize
+
+
+@menu
+* I/O Concepts::       Some basic information and terminology.
+* File Names::         How to refer to a file.
+@end menu
+
+@node I/O Concepts, File Names,  , I/O Overview
+@section Input/Output Concepts
+
+Before you can read or write the contents of a file, you must establish
+a connection or communications channel to the file.  This process is
+called @dfn{opening} the file.  You can open a file for reading, writing,
+or both.
+@cindex opening a file
+
+The connection to an open file is represented either as a stream or as a
+file descriptor.  You pass this as an argument to the functions that do
+the actual read or write operations, to tell them which file to operate
+on.  Certain functions expect streams, and others are designed to
+operate on file descriptors.
+
+When you have finished reading to or writing from the file, you can
+terminate the connection by @dfn{closing} the file.  Once you have
+closed a stream or file descriptor, you cannot do any more input or
+output operations on it.
+
+@menu
+* Streams and File Descriptors::    The GNU Library provides two ways
+			             to access the contents of files.
+* File Position::                   The number of bytes from the
+                                     beginning of the file.
+@end menu
+
+@node Streams and File Descriptors, File Position,  , I/O Concepts
+@subsection Streams and File Descriptors
+
+When you want to do input or output to a file, you have a choice of two
+basic mechanisms for representing the connection between your program
+and the file: file descriptors and streams.  File descriptors are
+represented as objects of type @code{int}, while streams are represented
+as @code{FILE *} objects.
+
+File descriptors provide a primitive, low-level interface to input and
+output operations.  Both file descriptors and streams can represent a
+connection to a device (such as a terminal), or a pipe or socket for
+communicating with another process, as well as a normal file.  But, if
+you want to do control operations that are specific to a particular kind
+of device, you must use a file descriptor; there are no facilities to
+use streams in this way.  You must also use file descriptors if your
+program needs to do input or output in special modes, such as
+nonblocking (or polled) input (@pxref{File Status Flags}).
+
+Streams provide a higher-level interface, layered on top of the
+primitive file descriptor facilities.  The stream interface treats all
+kinds of files pretty much alike---the sole exception being the three
+styles of buffering that you can choose (@pxref{Stream Buffering}).
+
+The main advantage of using the stream interface is that the set of
+functions for performing actual input and output operations (as opposed
+to control operations) on streams is much richer and more powerful than
+the corresponding facilities for file descriptors.  The file descriptor
+interface provides only simple functions for transferring blocks of
+characters, but the stream interface also provides powerful formatted
+input and output functions (@code{printf} and @code{scanf}) as well as
+functions for character- and line-oriented input and output.
+@c !!! glibc has dprintf, which lets you do printf on an fd.
+
+Since streams are implemented in terms of file descriptors, you can
+extract the file descriptor from a stream and perform low-level
+operations directly on the file descriptor.  You can also initially open
+a connection as a file descriptor and then make a stream associated with
+that file descriptor.
+
+In general, you should stick with using streams rather than file
+descriptors, unless there is some specific operation you want to do that
+can only be done on a file descriptor.  If you are a beginning
+programmer and aren't sure what functions to use, we suggest that you
+concentrate on the formatted input functions (@pxref{Formatted Input})
+and formatted output functions (@pxref{Formatted Output}).
+
+If you are concerned about portability of your programs to systems other
+than GNU, you should also be aware that file descriptors are not as
+portable as streams.  You can expect any system running ANSI C to
+support streams, but non-GNU systems may not support file descriptors at
+all, or may only implement a subset of the GNU functions that operate on
+file descriptors.  Most of the file descriptor functions in the GNU
+library are included in the POSIX.1 standard, however.
+
+@node File Position,  , Streams and File Descriptors, I/O Concepts
+@subsection File Position 
+
+One of the attributes of an open file is its @dfn{file position} that
+keeps track of where in the file the next character is to be read or
+written.  In the GNU system, and all POSIX.1 systems, the file position
+is simply an integer representing the number of bytes from the beginning
+of the file.
+
+The file position is normally set to the beginning of the file when it
+is opened, and each time a character is read or written, the file
+position is incremented.  In other words, access to the file is normally
+@dfn{sequential}.
+@cindex file position
+@cindex sequential-access files
+
+Ordinary files permit read or write operations at any position within
+the file.  Some other kinds of files may also permit this.  Files which
+do permit this are sometimes referred to as @dfn{random-access} files.
+You can change the file position using the @code{fseek} function on a
+stream (@pxref{File Positioning}) or the @code{lseek} function on a file
+descriptor (@pxref{I/O Primitives}).  If you try to change the file
+position on a file that doesn't support random access, you get the
+@code{ESPIPE} error.
+@cindex random-access files
+
+Streams and descriptors that are opened for @dfn{append access} are
+treated specially for output: output to such files is @emph{always}
+appended sequentially to the @emph{end} of the file, regardless of the
+file position.  However, the file position is still used to control where in
+the file reading is done.
+@cindex append-access files
+
+If you think about it, you'll realize that several programs can read a
+given file at the same time.  In order for each program to be able to
+read the file at its own pace, each program must have its own file
+pointer, which is not affected by anything the other programs do.
+
+In fact, each opening of a file creates a separate file position.  
+Thus, if you open a file twice even in the same program, you get two
+streams or descriptors with independent file positions.
+
+By contrast, if you open a descriptor and then duplicate it to get 
+another descriptor, these two descriptors share the same file position:
+changing the file position of one descriptor will affect the other.
+
+@node File Names,  , I/O Concepts, I/O Overview
+@section File Names
+
+In order to open a connection to a file, or to perform other operations
+such as deleting a file, you need some way to refer to the file.  Nearly
+all files have names that are strings---even files which are actually
+devices such as tape drives or terminals.  These strings are called
+@dfn{file names}.  You specify the file name to say which file you want
+to open or operate on.
+
+This section describes the conventions for file names and how the
+operating system works with them.
+@cindex file name
+
+@menu
+* Directories::                 Directories contain entries for files.
+* File Name Resolution::        A file name specifies how to look up a file.
+* File Name Errors::            Error conditions relating to file names.
+* File Name Portability::       File name portability and syntax issues.
+@end menu
+
+
+@node Directories, File Name Resolution,  , File Names
+@subsection Directories
+
+In order to understand the syntax of file names, you need to understand
+how the file system is organized into a hierarchy of directories.
+
+@cindex directory
+@cindex link
+@cindex directory entry
+A @dfn{directory} is a file that contains information to associate other
+files with names; these associations are called @dfn{links} or
+@dfn{directory entries}.  Sometimes, people speak of ``files in a
+directory'', but in reality, a directory only contains pointers to
+files, not the files themselves.
+
+@cindex file name component
+The name of a file contained in a directory entry is called a @dfn{file
+name component}.  In general, a file name consists of a sequence of one
+or more such components, separated by the slash character (@samp{/}).  A
+file name which is just one component names a file with respect to its
+directory.  A file name with multiple components names a directory, and
+then a file in that directory, and so on.
+
+Some other documents, such as the POSIX standard, use the term
+@dfn{pathname} for what we call a file name, and either @dfn{filename}
+or @dfn{pathname component} for what this manual calls a file name
+component.  We don't use this terminology because a ``path'' is
+something completely different (a list of directories to search), and we
+think that ``pathname'' used for something else will confuse users.  We
+always use ``file name'' and ``file name component'' (or sometimes just
+``component'', where the context is obvious) in GNU documentation.  Some
+macros use the POSIX terminology in their names, such as
+@code{PATH_MAX}.  These macros are defined by the POSIX standard, so we
+cannot change their names.
+
+You can find more detailed information about operations on directories
+in @ref{File System Interface}.
+
+@node File Name Resolution, File Name Errors, Directories, File Names
+@subsection File Name Resolution
+
+A file name consists of file name components separated by slash
+(@samp{/}) characters.  On the systems that the GNU C library supports,
+multiple successive @samp{/} characters are equivalent to a single
+@samp{/} character.
+
+@cindex file name resolution
+The process of determining what file a file name refers to is called
+@dfn{file name resolution}.  This is performed by examining the
+components that make up a file name in left-to-right order, and locating
+each successive component in the directory named by the previous
+component.  Of course, each of the files that are referenced as
+directories must actually exist, be directories instead of regular
+files, and have the appropriate permissions to be accessible by the
+process; otherwise the file name resolution fails.
+
+@cindex root directory
+@cindex absolute file name
+If a file name begins with a @samp{/}, the first component in the file
+name is located in the @dfn{root directory} of the process (usually all
+processes on the system have the same root directory).  Such a file name
+is called an @dfn{absolute file name}.
+@c !!! xref here to chroot, if we ever document chroot. -rm
+
+@cindex relative file name
+Otherwise, the first component in the file name is located in the
+current working directory (@pxref{Working Directory}).  This kind of
+file name is called a @dfn{relative file name}.
+
+@cindex parent directory
+The file name components @file{.} (``dot'') and @file{..} (``dot-dot'')
+have special meanings.  Every directory has entries for these file name
+components.  The file name component @file{.} refers to the directory
+itself, while the file name component @file{..} refers to its
+@dfn{parent directory} (the directory that contains the link for the
+directory in question).  As a special case, @file{..} in the root
+directory refers to the root directory itself, since it has no parent;
+thus @file{/..} is the same as @file{/}.
+
+Here are some examples of file names:
+
+@table @file
+@item /a
+The file named @file{a}, in the root directory.
+
+@item /a/b
+The file named @file{b}, in the directory named @file{a} in the root directory.
+
+@item a
+The file named @file{a}, in the current working directory.
+
+@item /a/./b
+This is the same as @file{/a/b}.  
+
+@item ./a
+The file named @file{a}, in the current working directory.
+
+@item ../a
+The file named @file{a}, in the parent directory of the current working
+directory.
+@end table
+
+@c An empty string may ``work'', but I think it's confusing to 
+@c try to describe it.  It's not a useful thing for users to use--rms.
+A file name that names a directory may optionally end in a @samp{/}.
+You can specify a file name of @file{/} to refer to the root directory,
+but the empty string is not a meaningful file name.  If you want to
+refer to the current working directory, use a file name of @file{.} or
+@file{./}.
+
+Unlike some other operating systems, the GNU system doesn't have any
+built-in support for file types (or extensions) or file versions as part
+of its file name syntax.  Many programs and utilities use conventions
+for file names---for example, files containing C source code usually
+have names suffixed with @samp{.c}---but there is nothing in the file
+system itself that enforces this kind of convention.
+
+@node File Name Errors, File Name Portability, File Name Resolution, File Names
+@subsection File Name Errors
+
+@cindex file name errors
+@cindex usual file name errors
+
+Functions that accept file name arguments usually detect these
+@code{errno} error conditions relating to the file name syntax or
+trouble finding the named file.  These errors are referred to throughout
+this manual as the @dfn{usual file name errors}.
+
+@table @code
+@item EACCES
+The process does not have search permission for a directory component 
+of the file name.
+
+@item ENAMETOOLONG
+This error is used when either the the total length of a file name is
+greater than @code{PATH_MAX}, or when an individual file name component
+has a length greater than @code{NAME_MAX}.  @xref{Limits for Files}.
+
+In the GNU system, there is no imposed limit on overall file name
+length, but some file systems may place limits on the length of a
+component.
+
+@item ENOENT
+This error is reported when a file referenced as a directory component
+in the file name doesn't exist, or when a component is a symbolic link
+whose target file does not exist.  @xref{Symbolic Links}.
+
+@item ENOTDIR
+A file that is referenced as a directory component in the file name
+exists, but it isn't a directory.
+
+@item ELOOP
+Too many symbolic links were resolved while trying to look up the file
+name.  The system has an arbitrary limit on the number of symbolic links
+that may be resolved in looking up a single file name, as a primitive
+way to detect loops.  @xref{Symbolic Links}.
+@end table
+
+
+@node File Name Portability,  , File Name Errors, File Names
+@subsection Portability of File Names
+
+The rules for the syntax of file names discussed in @ref{File Names},
+are the rules normally used by the GNU system and by other POSIX
+systems.  However, other operating systems may use other conventions.
+
+There are two reasons why it can be important for you to be aware of
+file name portability issues:
+
+@itemize @bullet
+@item 
+If your program makes assumptions about file name syntax, or contains
+embedded literal file name strings, it is more difficult to get it to
+run under other operating systems that use different syntax conventions.
+
+@item
+Even if you are not concerned about running your program on machines
+that run other operating systems, it may still be possible to access
+files that use different naming conventions.  For example, you may be
+able to access file systems on another computer running a different
+operating system over a network, or read and write disks in formats used
+by other operating systems.
+@end itemize
+
+The ANSI C standard says very little about file name syntax, only that
+file names are strings.  In addition to varying restrictions on the
+length of file names and what characters can validly appear in a file
+name, different operating systems use different conventions and syntax
+for concepts such as structured directories and file types or
+extensions.  Some concepts such as file versions might be supported in
+some operating systems and not by others.
+
+The POSIX.1 standard allows implementations to put additional
+restrictions on file name syntax, concerning what characters are
+permitted in file names and on the length of file name and file name
+component strings.  However, in the GNU system, you do not need to worry
+about these restrictions; any character except the null character is
+permitted in a file name string, and there are no limits on the length
+of file name strings.
+
+