@node Input/Output Overview
@chapter Input/Output Overview

Most programs need to do either input (reading data) or output (writing
data), or most frequently both, in order to do anything useful.  The GNU
C library provides such a large selection of input and output functions
that the hardest part is often deciding which function is most
appropriate!

This chapter introduces concepts and terminology relating to input
and output.  Other chapters relating to the GNU I/O facilities are:

@itemize @bullet
@item
@ref{Input/Output on Streams}, which covers the high-level functions
that operate on streams, including formatted input and output.

@item
@ref{Low-Level Input/Output}, which covers the basic I/O and control
functions on file descriptors.

@item
@ref{File System Interface}, which covers functions for operating on
directories and for manipulating file attributes such as access modes
and ownership.

@item
@ref{Pipes and FIFOs}, which includes information on the basic interprocess
communication facilities.

@item
@ref{Sockets}, covering a more complicated interprocess communication
facility with support for networking.

@item
@ref{Low-Level Terminal Interface}, which covers functions for changing
how input and output to terminal or other serial devices are processed.
@end itemize


@menu
* Input/Output Concepts::		Some basic information and terminology.
* File Names::				How to refer to a file.
@end menu

@node Input/Output Concepts
@section Input/Output Concepts

Before you can read or write the contents of a file, you must establish
a connection or communications channel to the file.  This process is
called @dfn{opening} the file.  You can open a file for reading, writing,
or both.
@cindex opening a file

The connection to an open file is represented either as a stream or as a
file descriptor.  You pass this as an argument to the functions that do
the actual read or write operations, to tell them which file to operate
on.  Certain functions expect streams, and others are designed to
operate on file descriptors.

When you have finished reading from or writing to the file, you can
terminate the connection by @dfn{closing} the file.  Once you have
closed a stream or file descriptor, you cannot do any more input or
output operations on it.
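
As a rough illustration of this open, operate, close sequence, the
following sketch uses the stream functions @code{fopen}, @code{fgetc}
and @code{fclose}.  The function name @code{count_bytes} is made up for
this example, and the sketch makes no attempt to distinguish a read
error from end of file.

@smallexample
#include <stdio.h>

/* Count the bytes in the file named NAME, stopping at end of file
   or at a read error.  Return -1 if the file cannot be opened.  */
long
count_bytes (const char *name)
@{
  long count = 0;
  FILE *stream = fopen (name, "r");   /* Open the file for reading.  */

  if (stream == NULL)
    return -1;

  while (fgetc (stream) != EOF)       /* Read through the connection.  */
    count++;

  fclose (stream);                    /* After this, STREAM is unusable.  */
  return count;
@}
@end smallexample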

@menu
* Streams and File Descriptors::	The GNU Library provides two ways
					 to access the contents of files.
* File Position::			Where in a file the next character is read or written.
@end menu

@node Streams and File Descriptors
@subsection Streams and File Descriptors

When you want to do input or output to a file, you have a choice of two
basic mechanisms for representing the connection between your program
and the file: file descriptors and streams.  File descriptors are
represented as objects of type @code{int}, while streams are represented
as @code{FILE *} objects.

File descriptors provide a primitive, low-level interface to input and
output operations.  Both file descriptors and streams can represent a
connection to a device (such as a terminal), or a pipe or socket for
communicating with another process, as well as a normal file.  However, if
you want to do control operations that are specific to a particular kind
of device, you must use a file descriptor; there are no facilities to
use streams in this way.  You must also use file descriptors if your
program needs to do input or output in special modes, such as
nonblocking (or polled) input (@pxref{File Status Flags}).
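
For instance, nonblocking mode is usually requested with the
@code{fcntl} function, which operates on a file descriptor.  The
following is only a sketch; the function name @code{set_nonblocking} is
invented for this example.

@smallexample
#include <fcntl.h>

/* Turn on nonblocking mode for descriptor FD, leaving the other
   file status flags unchanged.  Return 0 on success, -1 on error.  */
int
set_nonblocking (int fd)
@{
  int flags = fcntl (fd, F_GETFL, 0);

  if (flags == -1)
    return -1;
  return fcntl (fd, F_SETFL, flags | O_NONBLOCK) == -1 ? -1 : 0;
@}
@end smallexample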

Streams provide a higher-level interface, layered on top of the
primitive file descriptor facilities.  The stream interface treats all
kinds of files pretty much alike---the sole exception being the three
styles of buffering that you can choose (@pxref{Stream Buffering}).

The main advantage of using the stream interface is that the set of
functions for performing actual input and output operations (as opposed
to control operations) on streams is much richer and more powerful than
the corresponding facilities for file descriptors.  The file descriptor
interface provides only simple functions for transferring blocks of
characters, but the stream interface also provides powerful formatted
input and output functions (@code{printf} and @code{scanf}) as well as
functions for character- and line-oriented input and output.
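
The difference can be seen by writing the same line of text through
both interfaces.  In this sketch (the function name
@code{report_count} is hypothetical), the stream version formats and
writes in one call, while the descriptor version must prepare the
block of characters itself before handing it to @code{write}.

@smallexample
#include <stdio.h>
#include <unistd.h>

/* Write "count = N" to standard output twice, once through each
   interface.  Return 0 on success, -1 on error.  */
int
report_count (int count)
@{
  char buffer[64];
  int length;

  /* Stream interface: formatting and output happen in one call.  */
  if (fprintf (stdout, "count = %d\n", count) < 0
      || fflush (stdout) == EOF)
    return -1;

  /* Descriptor interface: format the text yourself, then transfer
     the resulting block of characters.  */
  length = snprintf (buffer, sizeof buffer, "count = %d\n", count);
  if (write (STDOUT_FILENO, buffer, length) != length)
    return -1;
  return 0;
@}
@end smallexample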

Since streams are implemented in terms of file descriptors, you can
extract the file descriptor from a stream and perform low-level
operations directly on the file descriptor.  You can also initially open
a connection as a file descriptor and then make a stream associated with
that file descriptor.
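
On POSIX systems, the usual functions for moving between the two
representations are @code{fileno} and @code{fdopen}.  The sketch below
shows one way they might be used; the names @code{open_as_stream} and
@code{descriptor_of} are made up for this example.

@smallexample
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Open NAME as a file descriptor, then make a stream for it.
   Return a null pointer on failure.  */
FILE *
open_as_stream (const char *name)
@{
  int fd = open (name, O_RDONLY);
  FILE *stream;

  if (fd < 0)
    return NULL;

  stream = fdopen (fd, "r");
  if (stream == NULL)
    close (fd);     /* No stream was made; release the descriptor.  */
  return stream;
@}

/* Going the other way: the descriptor underlying STREAM.  */
int
descriptor_of (FILE *stream)
@{
  return fileno (stream);
@}
@end smallexample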

In general, you should stick with using streams rather than file
descriptors, unless there is some specific operation you want to do that
can only be done on a file descriptor.  If you are a beginning
programmer and aren't sure what functions to use, we suggest that you
concentrate on the formatted input functions (@pxref{Formatted Input})
and formatted output functions (@pxref{Formatted Output}).

If you are concerned about portability of your programs to systems other
than GNU, you should also be aware that file descriptors are not as
portable as streams.  You can expect any system with an ANSI C library to
support streams, but non-GNU systems may not support file descriptors at
all, or may only implement a subset of the GNU functions that operate on
file descriptors.  Most of the file descriptor functions in the GNU
library are included in the POSIX.1 standard, however.

@node File Position
@subsection File Position 

One of the attributes of an open file is its @dfn{file position},
which keeps track of where in the file the next character is to be read
or written.  In the GNU system, the file position is simply an integer
representing the number of bytes from the beginning of the file.

The file position is normally set to the beginning of the file when it
is opened, and each time a character is read or written, the file
position is incremented.  In other words, access to the file is normally
@dfn{sequential}.
@cindex file position
@cindex sequential-access files

Ordinary files permit read or write operations at any position within
the file.  Some other kinds of files may also permit this.  Files which
do permit this are sometimes referred to as @dfn{random-access} files.
You can change the file position using the @code{fseek} function on a
stream (@pxref{File Positioning}) or the @code{lseek} function on a file
descriptor (@pxref{I/O Primitives}).  If you try to change the file
position on a file that doesn't support random access, you get an error.
@cindex random-access files
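
The following sketch shows both functions.  The helper names
@code{is_seekable} and @code{skip_header} are invented for this
example.  Asking @code{lseek} for the current position is a common way
to find out whether a descriptor supports random access at all.

@smallexample
#include <stdio.h>
#include <unistd.h>

/* Return nonzero if descriptor FD supports random access.  */
int
is_seekable (int fd)
@{
  /* Asking for the current position changes nothing, but the call
     still fails (typically with ESPIPE) on pipes, FIFOs and
     sockets.  */
  return lseek (fd, 0, SEEK_CUR) != (off_t) -1;
@}

/* Set the file position of STREAM to OFFSET bytes from the
   beginning of the file.  Return 0 on success.  */
int
skip_header (FILE *stream, long offset)
@{
  return fseek (stream, offset, SEEK_SET);
@}
@end smallexample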

Streams and descriptors that are opened for @dfn{append access} are
treated specially for output: output to such files is @emph{always}
appended sequentially to the @emph{end} of the file, regardless of the
file position.  However, the file position still controls where in
the file reading is done.
@cindex append-access files
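
As a small sketch of this behavior, the function below (whose name,
@code{append_line}, is made up for this example) opens a stream with
the @code{"a"} mode of @code{fopen} and deliberately moves the file
position before writing.

@smallexample
#include <stdio.h>

/* Append TEXT as one line to the file NAME.  Return 0 on success,
   -1 on error.  */
int
append_line (const char *name, const char *text)
@{
  FILE *log = fopen (name, "a");   /* "a" requests append access.  */

  if (log == NULL)
    return -1;

  /* Moving the file position does not redirect the output; the
     line below is still written at the end of the file.  */
  fseek (log, 0L, SEEK_SET);
  fprintf (log, "%s\n", text);

  return fclose (log) == EOF ? -1 : 0;
@}
@end smallexample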

If you think about it, you'll realize that several programs can read
a given file at the same time.  In order for each program to be able to
read the file at its own pace, each program must have its own file
position, which is not affected by anything the other programs do.

In fact, each opening of a file creates a separate file position.
Thus, if you open a file twice, even in the same program, you get two
streams or descriptors with independent file positions.

By contrast, if you open a descriptor and then duplicate it to get 
another descriptor, these two descriptors share the same file position:
changing the file position of one descriptor will affect the other.
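
The sketch below contrasts the two cases using @code{open},
@code{dup}, @code{read} and @code{lseek}.  The function names are
invented for this example, and it assumes that @var{name} is an
ordinary file at least two bytes long.  Under that assumption,
@code{*shared} should come out as 1 and @code{*separate} as 0.

@smallexample
#include <fcntl.h>
#include <unistd.h>

/* Read one byte from FIRST, then report the file position seen
   through SECOND.  */
static off_t
position_after_read (int first, int second)
@{
  char c;

  if (read (first, &c, 1) != 1)
    return (off_t) -1;
  return lseek (second, 0, SEEK_CUR);
@}

/* Return 0 on success, -1 if a descriptor could not be obtained.  */
int
compare_positions (const char *name, off_t *shared, off_t *separate)
@{
  int a = open (name, O_RDONLY);
  int b = open (name, O_RDONLY);   /* A second, separate opening.  */
  int c = a < 0 ? -1 : dup (a);    /* A duplicate of the first.  */
  int ok = a >= 0 && b >= 0 && c >= 0;

  if (ok)
    @{
      /* A and C share one position, so reading from A moves C.  */
      *shared = position_after_read (a, c);
      /* B has its own position, which is still at the beginning.  */
      *separate = position_after_read (a, b);
    @}

  if (a >= 0)
    close (a);
  if (b >= 0)
    close (b);
  if (c >= 0)
    close (c);
  return ok ? 0 : -1;
@}
@end smallexample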

@node File Names
@section File Names

In order to open a connection to a file, or to perform other operations
such as deleting a file, you need some way to refer to the file.  Nearly
all files have names that are strings---even files which are actually
devices such as tape drives or terminals.  These strings are called
@dfn{file names}.  You specify the file name to say which file you want
to open or operate on.

This section describes the conventions for file names and how the
operating system works with them.
@cindex file name

@menu
* Directories::			Directories contain entries for files.
* File Name Resolution::	A file name specifies how to look up a file.
* File Name Errors::		Error conditions relating to file names.
* Portability of File Names::	File name conventions differ between systems.
@end menu


@node Directories
@subsection Directories

In order to understand the syntax of file names, you need to understand
how the file system is organized into a hierarchy of directories.

@cindex directory
@cindex link
@cindex directory entry
A @dfn{directory} is a file that contains information to associate other
files with names; these associations are called @dfn{links} or
@dfn{directory entries}.  Sometimes, people speak of ``files in a
directory'', but in reality, a directory only contains pointers to
files, not the files themselves.

@cindex file name component
The name of a file contained in a directory entry is called a @dfn{file
name component}.  In general, a file name consists of a sequence of one
or more such components, separated by the slash character (@samp{/}).  A
file name which is just one component names a file with respect to its
directory.  A file name with multiple components names a directory, and
then a file in that directory, and so on.

Some other documents, such as the POSIX standard, use the term
@dfn{pathname} for what we call a file name, and either
@dfn{filename} or @dfn{pathname component} for what this manual calls a
file name component.  We don't use this terminology because a ``path''
is something completely different (a list of directories to search), and
we think that ``pathname'' used for something else will confuse users.
We always use ``file name'' and ``file name component'' (or sometimes
just ``component'', where the context is obvious) in GNU documentation.

You can find more detailed information about operations on directories
in @ref{File System Interface}.

@node File Name Resolution
@subsection File Name Resolution

A file name consists of file name components separated by slash
(@samp{/}) characters.  Multiple successive @samp{/} characters are
equivalent to a single @samp{/} character.

@cindex file name resolution
The process of determining what file a file name refers to is called
@dfn{file name resolution}.  This is performed by examining the
components that make up a file name in left-to-right order, and locating
each successive component in the directory named by the previous
component.  Of course, each of the files that are referenced as
directories must actually exist, be directories instead of regular
files, and have the appropriate permissions to be accessible by the
process; otherwise the file name resolution fails.

@cindex root directory
@cindex absolute file name
If a file name begins with a @samp{/}, the first component in the file
name is located in the @dfn{root directory} of the process.  Such a file
name is called an @dfn{absolute file name}.

@cindex relative file name
Otherwise, the first component in the file name is located in the
current working directory (@pxref{Working Directory}).  This kind of
file name is called a @dfn{relative file name}.

@cindex parent directory
The file name components @file{.} (``dot'') and @file{..} (``dot-dot'')
have special meanings.  Every directory has entries for these file name
components.  The file name component @file{.} refers to the directory
itself, while the file name component @file{..} refers to its
@dfn{parent directory} (the directory that contains the link for the
directory in question).

Here are some examples of file names:

@table @file
@item /a
The file named @file{a}, in the root directory.

@item /a/b
The file named @file{b}, in the directory named @file{a} in the root directory.

@item a
The file named @file{a}, in the current working directory.

@item /a/./b
This is the same as @file{/a/b}.  

@item ./a
The file named @file{a}, in the current working directory.

@item ../a
The file named @file{a}, in the parent directory of the current working
directory.
@end table

A file name that names a directory may optionally end in a @samp{/}.  You
can specify a file name of @file{/} to refer to the root directory, but
you can't have an empty file name.  If you want to refer to the current
working directory, use a file name of @file{.} or @file{./}.
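
One way to check whether two file names resolve to the same file is to
compare the device and file serial numbers that @code{stat} reports
for each of them.  The following is a minimal sketch of that idea; the
function name @code{same_file} is made up for this example.

@smallexample
#include <sys/stat.h>

/* Return 1 if NAME1 and NAME2 resolve to the same file, 0 if they
   resolve to different files, and -1 if either name cannot be
   resolved.  */
int
same_file (const char *name1, const char *name2)
@{
  struct stat s1, s2;

  if (stat (name1, &s1) != 0 || stat (name2, &s2) != 0)
    return -1;
  return s1.st_dev == s2.st_dev && s1.st_ino == s2.st_ino;
@}
@end smallexample

Following the rules above, @code{same_file (".", "./")} would be
expected to return 1, as would @code{same_file ("/a/b", "/a/./b")} if
such a file exists.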

Unlike some other operating systems, the GNU system doesn't have any
built-in support for file types (or extensions) or file versions as part
of its file name syntax.  Many programs and utilities use conventions
for file names---for example, files containing C source code usually
have names suffixed with @samp{.c}---but there is nothing in the file
system itself that enforces this kind of convention.

@node File Name Errors
@subsection File Name Errors

@cindex file name syntax errors
@cindex usual file name syntax errors

Functions that accept file name arguments usually detect these
@code{errno} error conditions relating to file name syntax.  These
errors are referred to throughout this manual as the @dfn{usual file
name syntax errors}.

@table @code
@item EACCES
The process does not have search permission for a directory component 
of the file name.

@item ENAMETOOLONG
This error is used either when the total length of a file name is
greater than @code{PATH_MAX}, or when an individual file name component
has a length greater than @code{NAME_MAX}.  @xref{File System Parameters}.
@c ??? Do we really have these limits?

@item ENOENT
This error is reported when a file referenced as a directory component
in the file name doesn't exist.  It is also used when an empty file name
string is supplied.

@item ENOTDIR
A file that is referenced as a directory component in the file name
exists, but it isn't a directory.
@end table
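
A program might check for these conditions after a failed call, as in
the following sketch.  The function name @code{report_open} and the
message texts are made up for this example.  Note that @code{open} can
also report @code{EACCES} for reasons other than search permission,
such as the access modes of the file itself.

@smallexample
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Try to open NAME for reading and describe the outcome.  */
void
report_open (const char *name)
@{
  int fd = open (name, O_RDONLY);

  if (fd >= 0)
    @{
      printf ("%s: opened\n", name);
      close (fd);
      return;
    @}

  switch (errno)
    @{
    case EACCES:
      printf ("%s: permission denied\n", name);
      break;
    case ENAMETOOLONG:
      printf ("%s: file name too long\n", name);
      break;
    case ENOENT:
      printf ("%s: no such file or directory\n", name);
      break;
    case ENOTDIR:
      printf ("%s: a component is not a directory\n", name);
      break;
    default:
      perror (name);      /* Some error unrelated to the file name.  */
      break;
    @}
@}
@end smallexample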


@node Portability of File Names
@subsection Portability of File Names

The rules for the syntax of file names discussed in @ref{File Names},
are the rules normally used by the GNU system and by other POSIX
systems.  However, other operating systems may use other conventions.

There are two reasons why it can be important for you to be aware of
file name portability issues:

@itemize @bullet
@item 
If your program makes assumptions about file name syntax, or contains
embedded literal file name strings, it is more difficult to get it to
run under other operating systems that use different syntax conventions.

@item
Even if you are not concerned about running your program on machines
that run other operating systems, it may still be possible to access
files that use different naming conventions.  For example, you may be
able to access file systems on another computer running a different
operating system over a network, or read and write disks in formats used
by other operating systems.
@end itemize

The ANSI C standard says very little about file name syntax, only that
file names are strings.  In addition to varying restrictions on the
length of file names and what characters can validly appear in a file
name, different operating systems use different conventions and syntax
for concepts such as structured directories and file types or
extensions.  Some concepts such as file versions might be supported in
some operating systems and not by others.

The POSIX.1 standard allows implementations to put additional
restrictions on file name syntax, concerning both the characters
permitted in file names and the length of file name and file name
component strings.  However, in the GNU system, you do not need to worry
about these restrictions; any character except the null character is
permitted in a file name string, and there are no limits on the length
of file name strings.