about summary refs log tree commit diff
path: root/doc/overview.html
blob: bb7c7635618ea103a74ba5b656b5d39e95ebd562 (plain) (blame)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
<html>
  <head>
    <meta name="viewport" content="width=device-width, initial-scale=1.0" />
    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
    <meta http-equiv="Content-Language" content="en" />
    <title>s6: an overview</title>
    <meta name="Description" content="s6: an overview" />
    <meta name="Keywords" content="s6 overview supervision init process unix" />
    <!-- <link rel="stylesheet" type="text/css" href="//skarnet.org/default.css" /> -->
  </head>
<body>

<p>
<a href="index.html">s6</a><br />
<a href="//skarnet.org/software/">Software</a><br />
<a href="//skarnet.org/">skarnet.org</a>
</p>

<h1> An overview of s6 </h1>

<p>
 s6 is a collection of utilities revolving around process supervision and
management, logging, and system initialization. This page is a high-level
description of the different parts of s6.
</p>

<h2> Process supervision </h2>

<p>
 At its core, s6 is a <em>process supervision suite</em>, like its ancestor
<a href="https://cr.yp.to/daemontools.html">daemontools</a> and its
close cousin
<a href="http://smarden.org/runit/">runit</a>.
</p>

<h3> Concept </h3>

<p>
 The concept of process supervision comes from several observations:
</p>

<ul>
 <li> Unix systems, even minimalistic ones, need to run
<em>long-lived processes</em>, aka <em>daemons</em>. That is one of the
core design principles of Unix: one service &rarr; one daemon. </li>
 <li> Daemons can die unexpectedly. Maybe they are missing a vital
resource and cannot handle a certain failure; maybe they tripped on a bug;
maybe a misconfigured administration program killed them; maybe the
kernel killed them. Processes are fragile, but daemons are vital to a
Unix system: a fundamental discrepancy that needs to be solved. </li>
 <li> Automatically restarting daemons when they die is generally a good
thing. In any case, sysadmin intervention is necessary, but at least the
daemon is providing service, or trying to, until the sysadmin can log in
and investigate the underlying problem. </li>
 <li> Ad-hoc shell scripts that restart daemons <strong>suck</strong>, for
several reasons that would each justify their own page. The difficulty of
keeping track of the PID, explained below, is one of those reasons. </li>
 <li> It is sometimes necessary to send signals to a daemon. To kill it,
of course, but also to make it read its config file again, for instance;
signalling a daemon is a natural and very common way of sending it
simple commands. </li>
 <li> Generally, to send a signal to a daemon, you need to know its PID.
Without a supervision suite, knowing the proper PID is hard. Most
non-supervision systems use a hack known as <em>.pid files</em>, i.e.
the script that starts the daemon stores its PID into a file, and other
scripts read that file. This is a bad mechanism for several reasons, and
the case against .pid files would also justify its own page; the most
important drawback of .pid files is that they create race conditions
and management scripts may kill the wrong process. </li>
 <li> Non-supervision systems provide scripts to start and stop daemons,
but those scripts may fail at boot time even though they work when run
manually,
and vice versa. If a sysadmin logs in and runs the script to restart a
daemon that has died, the result might not be the same as if the whole
system had been rebooted, and the daemon may exhibit strange behaviours!
This is because the boot-time environment and the restart-time environment
are not the same when the script is run; and a non-supervision system
just cannot ensure reproducibility of the environment. This is a core
problem of non-supervision systems: countless bugs have been falsely
reported because of simple environment differences or configuration errors,
countless man-hours have been wasted to try and understand what was
going on. </li>
</ul>

<p>
 A process supervision system organizes the process hierarchy in a
radically different way.
</p>

<ul>
 <li> A process supervision system starts an independent hierarchy of
processes at boot time, called a <em>supervision tree</em>. This
supervision tree never dies: when one of its components dies, it is
restarted automatically. To ensure availability of the supervision
tree at all times, it should be rooted in process 1, which cannot die. </li>
 <li> A daemon is never started, either manually or in a script, as a
scion of the script that starts it.
 Instead, to start a daemon, you configure a
specific directory which contains all the information about your daemon;
then you send a command to the supervision tree. The supervision tree
will start the daemon as a leaf. <strong>In a process supervision
system, daemons are always spawned by the supervision tree, and
never by an admin's shell.</strong> </li>
 <li> The parent of your daemon is a <em>supervisor</em>. Since your
daemon is its direct child, <strong>the supervisor always knows the
correct PID of your daemon</strong>. </li>
 <li> The supervisor watches your daemon and can restart it when it
dies, automatically. </li>
 <li> The supervision tree always has the same environment, so starting
conditions are reproducible. Your daemon will always be started with the
same environment, whether it is at boot time via init scripts or for the
100th automatic - or manual - restart. </li>
 <li> To send signals to your daemon, you send a command to its
supervisor, which will then send a signal to the daemon on your behalf.
Your daemon is identified by the directory containing its information,
which is stable, instead of by its PID, which is not stable; the supervisor
maintains the correct association without a race condition or the other
problems of .pid files. </li>
</ul>

<h3> Implementation </h3>

<p>
 s6 is a straightforward implementation of those concepts.
</p>

<ul>
 <li> The <a href="s6-svscan.html">s6-svscan</a> and
<a href="s6-supervise.html">s6-supervise</a> programs are the components
of the <em>supervision tree</em>. They are long-lived programs.
 <ul>
  <li> <a href="s6-supervise.html">s6-supervise</a> is a daemon's
<em>supervisor</em>, its direct parent. For every long-lived process on a
system, there is a corresponding <a href="s6-supervise.html">s6-supervise</a>
process watching it. This is okay, because every instance of
<a href="s6-supervise.html">s6-supervise</a> uses very few resources. </li>
  <li> <a href="s6-svscan.html">s6-svscan</a> is, in a manner of speaking,
a supervisor for the supervisors. It watches and maintains a collection of
<a href="s6-supervise.html">s6-supervise</a> processes: it is the branch
of the supervision tree that all supervisors are stemming from. It can be
run and
<a href="//skarnet.org/software/s6/s6-svscan-not-1.html">supervised
by your regular init process</a>, or it can
<a href="//skarnet.org/software/s6/s6-svscan-1.html">run as
process 1 itself</a>. Running s6-svscan as process 1 requires
some effort from the user, because of the inherent non-portability of
init processes; the
<a href="//skarnet.org/software/s6-linux-init/">s6-linux-init</a>
package automates that effort and allows users to run s6 as an init
replacement. </li>
  <li> The configuration of a daemon to be supervised by
<a href="s6-supervise.html">s6-supervise</a> is done via a
<a href="servicedir.html">service directory</a>. </li>
  <li> The place to gather all service directories to be watched by a
<a href="s6-svscan.html">s6-svscan</a> instance is called a
<a href="scandir.html">scan directory</a>. </li>
 </ul>
 <li> The command that controls a single supervisor, and allows you to
send signals to a daemon, is
<a href="s6-svc.html">s6-svc</a>. It is a short-lived program. </li>
 <li> The command that controls a set of supervisors, and allows you to
start and stop supervision trees, is
<a href="s6-svscanctl.html">s6-svscanctl</a>. It is a short-lived
program. </li>
</ul>

<p>
 These four programs,
<a href="s6-svscan.html">s6-svscan</a>,
<a href="s6-supervise.html">s6-supervise</a>,
<a href="s6-svscanctl.html">s6-svscanctl</a> and
<a href="s6-svc.html">s6-svc</a>,
are the very core of s6. Technically, once you have them, you have a
functional s6 installation, and the other utilities are just a bonus.
</p>

<h3> Practical usage </h3>

<p>
 To use s6's supervision features, you need to perform the following steps:
</p>

<ul>
 <li> For every daemon you potentially want supervised, write a
<a href="servicedir.html">service directory</a>. Make sure that
your daemon does not background itself when started in the
<tt>./run</tt> script! Auto-backgrounding is a historical hack
that was implemented when supervision suites did not exist; since
you're using a supervision suite, auto-backgrounding is unnecessary
and in this case detrimental. </li>
 <li> Write a single <a href="scandir.html">scan directory</a> for
the set of daemons you want to actually run. This set can be modified
at run time. </li>
 <li> At some point in your initialization scripts, run
<a href="s6-svscan.html">s6-svscan</a> on the scan directory. This will
start the supervision tree, including your set of daemons. The exact
way of running s6-svscan depends on your system: it is not quite the same
when you want to run it as process 1 on a real machine, or under another
init on a real machine, or as process 1 in a
<a href="https://www.docker.com/">Docker</a> container, or in another
context entirely. </li>
 <li> Alternatively, you can start <a href="s6-svscan.html">s6-svscan</a>
on an empty scan directory, then populate it step by step and send an
update command to s6-svscan via
<a href="s6-svscanctl.html">s6-svscanctl</a> whenever the supervision
tree should pick up the differences and start the services you added. </li>
 <li> That's it, your services are running. To control them manually,
you can use the <a href="s6-svc.html">s6-svc</a> command. </li>
 <li> At the end of the system's lifetime, you can use
<a href="s6-svscanctl.html">s6-svscanctl</a> to bring down the supervision
tree. </li>
</ul>

<h2> Service-specific logging </h2>

<p>
<a href="s6-svscan.html">s6-svscan</a> can monitor a supervision tree,
but it can also do one more thing. It can ensure that a daemon's log,
i.e. what the daemon outputs to its stdout (or stderr if you redirect it),
gets processed by another, supervised, long-lived process, called a
<em>logger</em>; and it can make sure that the logs are never lost
between the daemon and the logger - even if the daemon dies, even if the
logger dies.
</p>

<p>
 If your daemon is outputting messages, you have a decision to make
about where to send them.
</p>

<ul>
 <li> You can do as non-supervision systems do, and send the messages
to syslog. It's entirely possible with a supervision system too.
However, like auto-backgrounding, syslog is a historical mechanism that
predates supervision suites, and is technically inferior; it is
recommended that you do not use it whenever you can avoid it. </li>
 <li> You can send them to the daemon's stdout/stderr and do nothing special
about it. The logs will then be sent to s6-svscan's stdout/stderr;
what mechanism will read them depends on how you started s6-svscan. </li>
 <li> You can use s6-svscan's service-specific logging mechanism and
dedicate a logger process to your daemon's messages. </li>
</ul>

<p>
 s6 provides you with a long-lived process to use as a logger:
<a href="s6-log.html">s6-log</a>. It will store your logs in one (or
more) specific directory of your choice, and rotate them automatically.
</p>

<h2> Helpers for run scripts </h2>

<p>
 Creating a working
<a href="servicedir.html">service directory</a>, and especially a good
<em>run script</em>, is the most important part of the work when
adapting a daemon to a supervision framework.
</p>

<p>
 If you can find your daemon's invocation script on a non-supervision system,
for instance a System V-style init script, you can see the exact
options that the daemon is being run with: environment variables,
uid and gid, open descriptors, etc. This is what you
need to replicate in your run script.
</p>

<p>
 (Do not replicate the auto-backgrounding, or things like
<a href="http://man.he.net/man8/start-stop-daemon">start-stop-daemon</a>
invocation: start-stop-daemon and its friends are hideous and kludgy
attempts to work around the lack of proper supervision mechanisms. Now
that you have s6, you should remove them from your system, throw them
into a bonfire, and dance and laugh while they burn. Generally speaking,
as a system administrator you want daemons that have been designed
following the principles described
<a href="https://jdebp.uk/FGA/unix-daemon-design-mistakes-to-avoid.html">here</a>,
or at least you want to use the command-line options that make them
behave in such a way.) 
</p>

<p>
 The vast majority of the tools provided by s6 are meant to be used in
run scripts: they help you control the process state and
environment in your script before it executes into your daemon. Or,
sometimes, they are daemons themselves, designed to be supervised.
</p>

<p>
 s6, like other <a href="//skarnet.org/software/">skarnet.org
software</a>, makes heavy use of
<a href="https://en.wikipedia.org/wiki/Chain_loading#Chain_loading_in_Unix">chain
loading</a>, also known as "Bernstein chaining": a lot of s6 tools will
perform some action that changes the process state, then execute into the
rest of their command line. This allows the user to change the process state
in a very flexible way, by combining the right components in the right
order. Very often, a run script can be reduced to a single command line -
likely a long one, but still a single one. (That is the main reason why
using the
<a href="//skarnet.org/software/execline/">execline</a> language
to write run scripts is recommended: execline makes it natural to handle
long command lines made of massive amounts of chain loading. This is by no
means mandatory, though: a run script can be any executable file you want,
provided that running it eventually results in a long-lived process with
the same PID.)
</p>

<p>
 Some examples of s6 programs meant to be used in run scripts:
</p>

<ul>
 <li> The <a href="s6-log.html">s6-log</a> program is a long-lived
process. It is meant to be executed into by a <tt>./log/run</tt>
script: it will be supervised, and will process what it reads on
its stdin (i.e. the output of the <tt>./run</tt> daemon). </li>
 <li> The <a href="s6-envdir.html">s6-envdir</a> program is a
short-lived process that will update its current environment according
to what it reads in a given directory, then execute into the rest of its
command line. It is meant to be used in a run script to adjust the
environment with which the final daemon will be executed into. </li>
 <li> Similarly, the <a href="s6-softlimit.html">s6-softlimit</a> program
adjusts its resource limits, then executes into the rest of its command
line: it is meant to set the resources the final daemon will have
access to. </li>
 <li> The <a href="s6-applyuidgid.html">s6-applyuidgid</a> program,
part of the <tt>s6-*uidgid</tt> family, drops root privileges before
executing into the rest of its command line: it is meant to be used
in run scripts that need root privileges when starting but do not
need it for the execution of the long-lived process. </li>
 <li> <a href="s6-ipcserverd.html">s6-ipcserverd</a> is a daemon that
listens to a Unix socket and spawns a program for every connection.
It is meant to be supervised, so it should be used in a run script,
and it's also meant to be a flexible super-server that you can use
for different applications: so it is a building block that may appear in
several of your run scripts defining
<a href="localservice.html">local services</a>. </li>
</ul>

<h2> Readiness notification and dependency management </h2>

<p>
 Now that you have a supervision tree, and long-lived processes running
supervised, you may want to introduce dependencies between them: do not
perform an action (e.g. start (with <a href="s6-svc.html">s6-svc -u</a>)
the Web server connecting to a database)
before a given daemon is up and running (e.g. the database server).
s6 provides tools to do that:
</p>

<ul>
 <li> The <a href="s6-svwait.html">s6-svwait</a>,
<a href="s6-svlisten1.html">s6-svlisten1</a> and
<a href="s6-svlisten.html">s6-svlisten</a> programs will wait until a set of
daemons is up, ready, down (as soon as the <tt>./run</tt> process dies) or
really down (when the <tt>./finish</tt> process has also died). </li>
 <li> Unfortunately, a daemon being <em>up</em> does not mean that it is
<em>ready</em>:
<a href="notifywhenup.html">this page</a> goes into the details. s6
supports a simple mechanism: when a daemon wants to signal that it is
<em>ready</em>, it simply writes a newline to a file descriptor of its
choice, and <a href="s6-supervise.html">s6-supervise</a> will pick that
notification up and broadcast the information to processes waiting for
it. </li>
 <li> s6 also has a legacy mechanism for daemons that do not
notify their own readiness but provide a way for an external program
to check whether they're ready or not:
<a href="s6-notifyoncheck.html">s6-notifyoncheck</a>.
 This is polling, which is bad, but unfortunately necessary for
many daemons as of 2019. </li>
</ul>

<p>
 s6 does not provide a complete dependency management framework,
i.e. a program to automatically start (or stop) a set of services in a
specific order - that order being automatically computed from a graph of
dependencies between services.
 That functionality belongs to a <em>service manager</em>, and is
implemented for instance in the
<a href="//skarnet.org/software/s6-rc/">s6-rc</a> package.
</p>

<h2> Fine-grained control over services </h2>

<p>
 s6 provides you with a few more tools to control and monitor your
services. For instance:
</p>

<ul>
 <li> <a href="s6-svstat.html">s6-svstat</a> gives you access to
the detailed state of a service </li>
 <li> <a href="s6-svperms.html">s6-svperms</a> allows you to configure
what users can read that state, what users can send control
commands to your service, and what users can be notified of
service start/stop events </li>
 <li> <a href="s6-svdt.html">s6-svdt</a>
allows you to see what caused the latest deaths of a supervised
process </li>
</ul>

<p>
 These tools make s6 the most powerful and flexible of the existing
process supervision suites.
</p>

<h2> Additional utilities </h2>

<p>
 The other programs in the s6 package are various utilities that may be
useful in designing servers, and more generally multi-process software.
They can be used with or without a supervision environment, although
it is of course recommended to have one; but they are not part of the core s6
functionality, and you may safely ignore them for now if you are just getting
into the supervision world.
</p>

<h3> Generic inter-process notification </h3>

<p>
 The <tt>s6-ftrig*</tt> family of programs allows notifications between
unrelated processes: a set of processes can subscribe to a certain
channel - identified by a directory in the filesystem - and ask to be
notified of certain events on that channel; another set of processes can
send events to the channel.
</p>

<p>
 The underlying mechanism is the same as the one used by the supervision
tree for readiness notification, but the <tt>s6-ftrig*</tt> tools provide
a more generic access to that mechanism.
</p>

<h3> Helpers for designing local services </h3>

<p>
 Local services, i.e. daemons listening to a Unix domain socket, are a
powerful and flexible mechanism, especially with modern Unix systems
that allow client authentication. s6 includes tools to take advantage
of that mechanism.
</p>

<ul>
 <li> The <tt>s6-ipc*</tt> family of programs is about designing clients
or servers that communicate over Unix domain sockets. </li>
 <li> The <tt>s6-*access*</tt> and <a href="s6-connlimit.html">s6-connlimit</a>
family of programs is about client access control. </li>
 <li> The <tt>s6-sudo*</tt> family of programs is about using a local
service in order to give selected
clients the ability to run a command line with the privileges of the
server, without using suid programs. </li>
</ul>

<h3> Keeping file descriptors open </h3>

<p>
 Sometimes you want to keep a file descriptor open, even if the program
normally using it dies - so the program can restart and use the same
file descriptor without losing any data. To do that, you need to
<em>hold</em> the descriptor in another process, i.e. that process
should have it open but do nothing with it.
</p>

<p>
<a href="s6-svscan.html">s6-svscan</a>, for instance, holds the pipe
existing between a supervised daemon and its logger, so even if the
daemon or the logger dies while there are logs in the pipe, the pipe
remains open and the logs are not lost.
</p>

<p>
 s6 provides a mechanism to store and retrieve open file descriptors
in a totally generic way: the <tt>s6-fdholder*</tt> family of programs.
</p>

<ul>
 <li> The <a href="s6-fdholder-daemon.html">s6-fdholder-daemon</a> program
is a daemon (or, rather, executes into the
<a href="s6-fdholderd.html">s6-fdholderd</a> daemon), meant to be
supervised, that will hold file descriptors on its clients' behalf. </li>
 <li> Other programs in the family, such as
<a href="s6-fdholder-store.html">s6-fdholder-store</a>, are client
programs that interact with this daemon to store and retrieve file
descriptors. </li>
</ul>

<p>
 Note that "socket activation", one of the main advertised benefits of the
<a href="https://www.freedesktop.org/wiki/Software/systemd/">systemd</a>
init system, sounds similar to fd-holding.
The reality is that socket activation is a mixture of several different
mechanisms, one of which is fd-holding; s6 allows you to implement the
<a href="socket-activation.html">healthy parts</a> of socket activation.
</p>

<h3> Other miscellaneous utilities </h3>

<p>
 This page does not list or classify every s6 tool. Please
explore the "Reference" section of the
<a href="index.html">main s6 page</a> for details on a specific program.
</p>

</body>
</html>