about summary refs log tree commit diff
path: root/doc/tipideed.html
blob: 5d9a6003d9965b594cf492c5e192530155c1b6c4 (plain) (blame)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
<html>
  <head>
    <meta name="viewport" content="width=device-width, initial-scale=1.0" />
    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
    <meta http-equiv="Content-Language" content="en" />
    <title>tipidee: the tipideed program</title>
    <meta name="Description" content="tipidee: the tipideed program" />
    <meta name="Keywords" content="tipidee tipideed web server serving http skarnet.org skarnet software httpd" />
    <!-- <link rel="stylesheet" type="text/css" href="//skarnet.org/default.css" /> -->
  </head>
<body>

<p>
<a href="index.html">tipidee</a><br />
<a href="//skarnet.org/software/">Software</a><br />
<a href="//skarnet.org/">skarnet.org</a>
</p>

<h1> The <tt>tipideed</tt> program </h1>

<p>
 <tt>tipideed</tt> is the binary that actually does what you want from
a web server package: it serves files over HTTP.
</p>

<div id="interface">
<h2> Interface </h2>
</div>

<pre>
     tipideed [ -f <em>cdbfile</em> ] [ -d <em>basedir</em> ] [ -R ] [ -U ]
</pre>

<ul>
 <li> tipideed reads a stream of HTTP (1.0 or 1.1) requests on its stdin, and tries
to fulfill them, sending answers to stdout and logs to stderr. </li>
 <li> tipideed only speaks plaintext HTTP. It supports HTTPS, but the TLS layer
must be handled upstream by a program such as
<a href="//skarnet.org/software/s6-networking/s6-tlsd.html">s6-tlsd</a>. </li>
 <li> tipideed stays alive until the client closes the connection, or times out,
or (in HTTP 1.1) sends a request with a <tt>Connection: close</tt> header; or an error
occurs that makes it nonsensical to keep the connection open. </li>
 <li> The documents it serves must be in subdirectories of its
current working directory, one subdirectory for every domain it hosts. </li>
</ul>

<div id="commonusage">
<h2> Common usage </h2>
</div>

<p>
 tipideed is intended to be run under a TCP super-server such as
<a href="//skarnet.org/software/s6-networking/s6-tcpserver.html">s6-tcpserver</a>,
for plain text HTTP, or
<a href="//skarnet.org/software/s6-networking/s6-tlsserver.html">s6-tlsserver</a>,
for HTTPS. It delegates to the super-server the job of binding and listening to
the socket, accepting connections, spawning a separate process to handle a
given connection, and potentially establishing a TLS tunnel with the client for
secure communication.
</p>

<p>
 As such, a command line for tipideed, running as user <tt>www</tt>, listening
on address <tt>${ip}</tt>, would typically look like this, for HTTP:
</p>

<pre>
     s6-envuidgid www s6-tcpserver -U -- ${ip} 80 s6-tcpserver-access -- tipideed
</pre>

<p>
 or, for HTTPS:
</p>

<pre>
     s6-envuidgid www env KEYFILE=/path/to/private/key CERTFILE=/path/to/certificate s6-tlsserver -U -- ${ip} 443 tipideed
</pre>

<p>
 Most users will want to run these command lines as <em>services</em>, i.e. daemons
run in the background when the machine starts. The <tt>examples/</tt> subdirectory
of the tipidee package provides service templates to help you run tipideed under
<a href="https://wiki.gentoo.org/wiki/OpenRC">OpenRC</a>,
<a href="//skarnet.org/software/s6/">s6</a> and
<a href="//skarnet.org/software/s6-rc/">s6-rc</a>.
</p>

<div id="exitcodes">
<h2> Exit codes </h2>
</div>

<dl>
 <dt> 0 </dt> <dd> Clean exit. There was a successful series of HTTP exchanges,
that either tipideed or the client decided to end in a way that is
permitted by HTTP. </dd>
 <dt> 1 </dt> <dd> Illicit client behaviour. tipideed exited because it could
not serve the client in good faith. </dd>
 <dt> 2 </dt> <dd> Illicit CGI script behaviour. tipideed exited because the invoked
CGI script made it impossible to continue. Before exiting, tipideed likely has
sent a 502 (Bad Gateway) response to the client. </dd>
 <dt> 100 </dt> <dd> Bad usage. tipideed was run in an incorrect way: bad command
line options, or missing environment variables, etc. </dd>
 <dt> 101 </dt> <dd> Cannot happen. This signals a bug in tipideed, and comes with an
error message asking you to report the bug. Please do so, on the
<a href="//skarnet.org/lists/#skaware">skaware mailing-list</a>. </dd>
 <dt> 102 </dt> <dd> Misconfiguration. tipideed found something in its configuration
data or in the document layout that it does not like. This can happen, for
instance, when a document is a symbolic link pointing outside of the server's
root. </dd>
 <dt> 111 </dt> <dd> System call failed. This usually signals an issue with the
underlying operating system. Before exiting, if in the middle of processing
a request, tipideed likely has sent a 500 (Internal Server Error) response to
the client. </dd>
</dl>

<div id="environment">
<h2> Environment variables </h2>
</div>

<h3> Reading - mandatory </h3>

<p>
 tipideed expects the following variables in its environment, and will exit
with an error message if they are undefined. When tipideed is run under
<a href="//skarnet.org/software/s6-networking/s6-tcpserver.html">s6-tcpserver</a> or
<a href="//skarnet.org/software/s6-networking/s6-tlsserver.html">s6-tlsserver</a>,
these variables are automatically set by the super-server. This is the way
tipidee gets its network information without having to perform network
operations itself.
</p>

<dl>
 <dt> PROTO </dt>
 <dd> The network protocol, normally <tt>TCP</tt>. </dd>

 <dt> TCPLOCALIP </dt>
 <dd> The IP address the server is bound to. It will be passed as <tt>SERVER_ADDR</tt>
to CGI scripts. </dd>

 <dt> TCPLOCALPORT </dt>
 <dd> The port the server is bound to. It will be passed as <tt>SERVER_PORT</tt>
to CGI scripts unless the requested URI explicitly mentions a different port. </dd>

 <dt> TCPREMOTEIP </dt>
 <dd> The IP address of the client. It will be passed as <tt>REMOTE_ADDR</tt>
to CGI scripts. </dd>

 <dt> TCPREMOTEPORT </dt>
 <dd> The port of the client socket. It will be passed as <tt>REMOTE_PORT</tt>
to CGI scripts. </dd>
</dl>

<h3> Reading - optional </h3>

<p>
 tipideed can function without these variables, but if they're present, it
uses them to get more information. They're typically obtained by calling
<a href="//skarnet.org/software/s6-networking/s6-tcpserver-access.html">s6-tcpserver-access</a>
before tipideed in the
<a href="//skarnet.org/software/s6-networking/s6-tcpserver.html">s6-tcpserver</a>
command line.
(For HTTPS, <a href="//skarnet.org/software/s6-networking/s6-tlsserver.html">s6-tlsserver</a>
calls it implicitly.)
</p>

<dl>
 <dt> TCPLOCALHOST </dt>
 <dd> The default domain name associated to the local IP address. It will be
passed as <tt>SERVER_NAME</tt> to CGI scripts when the requested URI does
not mention a Host, i.e. in HTTP/1.0 requests without a full request URL.
If this variable is absent, the default will be set to the local IP address
itself (between square brackets if IPv6). </dd>

 <dt> TCPREMOTEHOST </dt>
 <dd> The domain name associated to the IP address of the client. It will
be passed as <tt>REMOTE_HOST</tt> to CGI scripts; if absent, the value of
<tt>TCPREMOTEIP</tt> will be used instead. </dd>

 <dt> TCPREMOTEINFO </dt>
 <dd> The name provided by an IDENT server running on the client, if any.
This is obsolete and not expected to be present; but if present, it will
be passed as <tt>REMOTE_IDENT</tt> to CGI scripts. </dd>

 <dt> SSL_PROTOCOL </dt>
 <dd> The version of the TLS protocol used to cipher communications between
the client and the server. If present, tipideed will assume that the client
connection is secure, and will pass <tt>HTTPS=on</tt> to CGI scripts;
otherwise, it will assume it is running plaintext HTTP. </dd>
</dl>

<h3> Writing </h3>

<p>
 When spawning a CGI or NPH script, tipideed clears all the previous variables,
so the passed environment is as close as possible to the environment of the
super-server; and it adds all the variables that are required by the
<a href="https://datatracker.ietf.org/doc/html/rfc3875#section-4.1">CGI 1.1
specification</a>. As an exception, it does not add PATH_TRANSLATED, which
cannot be used by CGI scripts in a portable way.
</p>

<div id="options">
<h2> Options </h2>
</div>

<dl>
 <dt> -f <em>file</em> </dt>
 <dd> Use <em>file</em> as the compiled configuration database, typically obtained
by running <tt><a href="tipidee-config.html">tipidee-config</a> -o <em>file</em></tt>.
The default is <tt>/etc/tipidee.conf.cdb</tt>; <tt>/etc</tt> may be something
else if the <tt>--sysconfdir</tt> option has been given to configure at
build time. </dd>

 <dt> -d <em>docroot</em> </dt>
 <dd> Change the working directory to <em>docroot</em> before serving. Default
is serving from the current working directory. Note that documents need to
be located in <strong>subdirectories</strong> of <em>docroot</em>, one subdirectory
per virtual domain tipideed is serving. </dd>

 <dt> -R </dt>
 <dd> chroot. If the underlying operating system has the
<a href="https://man7.org/linux/man-pages/man2/chroot.2.html">chroot()</a>
system call, use it before serving. This always happens <em>after</em> opening
the configuration database, <em>after</em> changing the working directory,
and <em>before</em> dropping privileges. The idea is that chrooting helps
with security, but the configuration database should be located outside of the
document space. </dd>

 <dt> -U </dt>
 <dd> Drop root privileges. If this option is given, tipideed expects two
additional environment variables, UID and GID, containing the uid and gid
it should run as; it will drop its privileges to $UID:$GID before serving.
This option is mainly useful when paired with <tt>-R</tt>, because chrooting
can only be performed as root, so root privileges need to be kept all the
way to tipideed then dropped after tipideed has chrooted. In a non-chrooted
setup, it is simpler and more secure to run the <em>super-server</em> with
the <tt>-U</tt> option instead: root privileges will be dropped as soon as
the super-server has bound to its socket, and all the subsequent operations,
including the spawning of tipideed processes, are performed as a normal user. </dd>
</dl>

<div id="docroot">
<h2> Document root </h2>
</div>

<p>
 The way to organize your documents so they can be served by tipideed
may look a little weird, but there's a logic to it.
</p>

<p>
 tipideed serves documents from subdirectories of its working directory,
and these subdirectories are named according to the host <em>and</em>
the port of the request.
</p>

<ul>
 <li> A request for <tt>https://example.com:1234/doc/u/ment</tt>
will result in a lookup in the filesystem for
<tt>./example.com:1234/doc/u/ment</tt>. </li>
 <li> A request for <tt>https://example.com/doc/u/ment</tt>
will result in a lookup in the filesystem for
<tt>./example.com:443/doc/u/ment</tt>. </li>
</ul>

<p>
The fact that the port is always specified allows you to have
different document sets for the same host on different ports:
more flexibility.
</p>

<p>
 However, most of the time, you <em>don't</em> want different
document sets for different ports. You want the same document
sets for ports 80 and 443, and that's it. And you don't want
to have both a <tt>domain example.com:80</tt> section and a
<tt>domain example.com:443</tt> section in your
<a href="tipidee.conf.html">/etc/tipidee.conf</a>, with
duplicate information.
</p>

<p>
 That is why you are allowed to make your document roots
<em>symbolic links</em>, and resource attributes declared in
the configuration file are always looked up with the
<em>canonical path</em>. In other words, the common case
would be:
</p>

<ul>
 <li> Have your document root in <tt>./example.com</tt>, a
real directory. </li>
 <li> Declare your resource attributes under a
<tt>domain example.com</tt> section in your configuration file. </li>
 <li> Have a <tt>./example.com:80</tt> symlink pointing to
<tt>example.com</tt>, if you want to serve <tt>example.com</tt>
under plaintext HTTP. </li>
 <li> Have a <tt>./example.com:443</tt> symlink pointing to
<tt>example.com</tt>, if you want to serve <tt>example.com</tt>
under HTTPS. </li>
</ul>

<p>
 This system allows you to share documents across virtual hosts
without fear of misconfiguration. You can symlink any document
under <tt>example.com</tt> to any name under <tt>example.org</tt>;
if the path via <tt>example.com</tt> is the canonical path, then
your resource will still get the correct attributes, defined in a
<tt>domain example.com</tt> section, even if it is accessed via an
<tt>example.org</tt> URL. You will not inadvertently expose source
code for CGI scripts, for instance.
</p>

<p>
 You can do wild things with symbolic links. However, anything
that does not resolve to a file in a document root under tipideed's
current working directory will be rejected. If an attacker symlinks
your <tt>/etc/passwd</tt> file, tipideed will keep it safe.
</p>

<p>
 HTTP/1.0 does not have the concepts of virtual hosts. For HTTP/1.0
requests that do not provide a full URL, tipideed will use the value
it reads from the TCPLOCALHOST variable, which is normally the result
of a reverse DNS lookup on the server's address. You can override the
lookup and provide your own value by giving the <tt>-l</tt> option to
<a href="//skarnet.org/software/s6-networking/s6-tcpserver-access.html">s6-tcpserver-access</a> or
<a href="//skarnet.org/software/s6-networking/s6-tlsserver.html">s6-tlsserver</a>.
If TCPLOCALHOST does not exist or is empty, a fallback value of
<tt>@</tt> (at), will be used. So if you aren't calling
<a href="//skarnet.org/software/s6-networking/s6-tcpserver-access.html">s6-tcpserver-access</a>
at all, your documents will most likely be accessible for HTTP/1.0 clients under
<tt>@:80</tt> or <tt>@:443</tt>.
</p>

<div id="logging">
<h2> Logging </h2>
</div>

<ul>
 <li> tipideed uses stderr for all its logging. All its log lines are prefixed
with "<tt>tipideed: pid <em>pid</em>: </tt>". </li>
 <li> The log lines continue with "<tt>fatal: </tt>" for fatal error messages (meaning
that tipideed exits right after writing the message), or "<tt>warning: </tt>" for
warnings (meaning that tipideed continues operating after writing the message).
In normal operation, you should not see any fatal or warning line. </li>
 <li> In normal operation, tipideed can log informational lines, and the continuing
prefix is "<tt>info: </tt>". It can potentially log:
  <ul>
   <li> One line when it starts (i.e. a client has connected) </li>
   <li> Up to three lines for every request:
    <ul>
     <li> One when the request is received </li>
     <li> One when a suitable resource is found. In rare cases
(namely: the resource is a CGI script answering with a local redirection),
there may be more than one resource line. </li>
     <li> One when an answer is sent </li>
    </ul> </li>
   <li> One line when it exits normally </li>
  </ul> </li>
 <li> What to log is configured via the
<a href="tipidee.conf.html#log"><tt>log</tt></a> directive in the
<a href="tipidee.conf.html">configuration file</a>. By default, only
a minimal request line and an answer line are printed. </li>
 <li> The log format is designed to be readable by a human, but still
easily processable by automation. For instance, the regular prefix structure
makes it easy for <a href="//skarnet.org/software/s6/s6-log.html">s6-log</a>
to select different lines to send them to various backends for archiving or
processing. </li>
</ul>

<div id="details">
<h2> Detailed operation </h2>
</div>

<ul>
 <li> tipideed reads its <a href="tipidee-config.html">compiled</a>
configuration file. Then:
 <ul>
  <li> If the <tt>-d</tt> option has been given, it changes its working directory. </li>
  <li> If the <tt>-R</tt> option has been given, it chroots to its current directory. </li>
  <li> If the <tt>-U</tt> option has been given, it drops root privileges. </li>
 </ul> </li>
 <li> It checks that its environment is valid, and that its configuration has
some minimal defaults it can use. </li>
 <li> tipideed listens to a stream of HTTP requests on its standard input. For every
HTTP request:
  <ul>
   <li> It parses the request line and checks it's HTTP/1.0 or 1.1 </li>
   <li> It parses the headers into a quick access structure </li>
   <li> It checks header consistency with the request </li>
   <li> If the method is <tt>OPTIONS *</tt> or <tt>TRACE</tt>, it answers here
and continues the loop </li>
   <li> It reads the request body, if any </li>
   <li> It checks in its configuration if a redirection has been defined for
the wanted resource or a prefix (by directory) of the wanted resource. If it's
the case, it answers with that redirection and continues the loop. </li>
   <li> It looks for a suitable resource in the filesystem, completing the
request with index files if necessary, or extracting CGI PATH_INFO if
necessary </li>
   <li> It uses the canonical path of the resource in the filesystem to look
for resource attributes in its configuration. (Is this a CGI script? a NPH
script? Does it have a customized Content-Type? etc.) </li>
   <li> If the method is a targeted <tt>OPTIONS</tt>, it answers here and
continues the loop </li>
   <li> If the resource is a CGI script:
    <ul>
     <li> If it is an NPH script, tipideed execs into the script (possibly
after spawning a helper child if there is a request body to feed to the script)
with the appropriate environment;
and the connection will close when the script exits. </li>
     <li> Else, tipideed spawns the CGI script as a child with the appropriate
environment, feeds it the request body if any, reads its output, and answers
the client. </li>
     <li> If a problem occurs server-side, the client will receive a 502
answer ("Bad Gateway"), <em>and</em> tipideed will write an error message to
its stderr, so that administrators can see what went wrong with their setup.
tipideed trusts its CGI scripts more than its clients, but it does not give
them its full trust either &mdash; lots of sites are running third-party
backends. </li>
    </ul> </li>
   <li> Else, the resource is a regular ("static") file, and tipideed serves
it on its stdout, to the client. </li>
  </ul> </li>
 <li> tipideed exits on EOF (when the client closes the connection), or when
the client times out before sending a request, or after tipideed receives a
single HTTP/1.0 request, or when it has executed into an NPH script, or when
it has answered a request with a <tt>Connection: close</tt> header. It also
exits when it encounters an error making it likely that the client will have
no use for the connection anymore anyway and exiting is simpler and cheaper;
in which case tipideed adds <tt>Connection: close</tt> to its last answer. </li>
</ul>

<div id="performance">
<h2> Performance considerations </h2>
</div>

<p>
 On systems that implement
<a href="https://pubs.opengroup.org/onlinepubs/9699919799/functions/posix_spawn.html">posix_spawn()</a>,
the <a href="//skarnet.org/software/s6-networking/s6-tcpserver.html">s6-tcpserver</a>
super-server (and the
<a href="//skarnet.org/software/s6-networking/s6-tlsserver.html">s6-tlsserver</a> one
as well, since both use the same underlying program) uses it instead of
<a href="https://pubs.opengroup.org/onlinepubs/9699919799/functions/fork.html">fork()</a>,
and that partly alleviates the performance penalty usually associated with servers
that spawn one process per connection.
</p>

<p>
 One of tipidee's stated goals is to explore what kind of performance is achievable for
a fully compliant Web server within the limits of that model. To that effect, tipideed
is meant to be <em>fast</em>. It should serve static files as fast as any server out
there, especially on Linux (or other systems supporting
<a href="https://man7.org/linux/man-pages/man2/sendfile.2.html">sendfile()</a> or
<a href="https://man7.org/linux/man-pages/man2/splice.2.html">splice()</a>) where it
uses zero-copy transfer. CGI performance should be limited by the performance of the
CGI script itself, never by tipideed.
</p>

<p>
 tipideed itself does not use
<a href="https://pubs.opengroup.org/onlinepubs/9699919799/functions/fork.html">fork()</a>
if the system supports
<a href="https://pubs.opengroup.org/onlinepubs/9699919799/functions/posix_spawn.html">posix_spawn()</a>
&mdash; with one exception, that you will not hit, and if you do, fork() will not
be the bottleneck. (Can you guess which case it is, without looking at the code?)
tipideed does not parse its configuration file itself, delegating the task to the
offline <a href="tipidee-config.html">tipidee-config</a> program and directly mapping
a binary file instead. To parse a client request, it uses a deterministic finite
automaton, only reading the request once, and only backtracking in pathological cases.
This should streamline request processing as much as possible.
</p>

<p>
 If you have benchmarks, results of comparative testing of tipideed against
other Web servers, please share them on the
<a href="//skarnet.org/lists/#skaware">skaware mailing-list</a>.
</p>

<div id="notes">
<h2> Notes </h2>
</div>

<ul>
 <li> tipideed sometimes answers 400, or even does not answer at all
(it just exits), when receiving some malformed or weirdly paced
client requests, despite what the
<a href="https://datatracker.ietf.org/doc/html/rfc9112">HTTP RFC</a> says.
This is on purpose. HTTP servers are very heavily solicited, they can run
very hot, the Web is a cesspool of badly written bots and bad actors, and every
legitimate browser knows how to speak HTTP properly and without abusing
corner cases in the protocol.
It makes no sense to try to follow the book to the letter, expending
precious resources, when the client can't even be bothered to pretend
it's legit. Knowing when to exit early is crucial for good resource
management. </li>
 <li> tipideed does not have any configuration defaults baked in; it
reads all its configuration values, including the defaults, from the cdb
file created by <a href="tipidee-config.html">tipidee-config</a>. That
is why having such a cdb file is mandatory for tipideed to run. </li>
 <li> <tt>tipideed</tt> is pronounced <em>tipi-deed</em>. You can say
<em>tipi-dee-dee</em>, but only if you're the type of person who also says
<em>PC computer</em>, <em>NIC card</em> or <em>ATM machine</em>. </li>
 <li> <tt>tipidee</tt> is the name of the <em>package</em>, the software suite
implementing a Web server. <tt>tipideed</tt> is the name of the <em>program</em>
doing the HTTP serving part. </li>
</ul>

</body>
</html>