about summary refs log tree commit diff
path: root/doc/servicedir.html
blob: f08c3a43398c7098e7bbf5a27d9b5da815370b16 (plain) (blame)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
<html>
  <head>
    <meta name="viewport" content="width=device-width, initial-scale=1.0" />
    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
    <meta http-equiv="Content-Language" content="en" />
    <title>s6: service directories</title>
    <meta name="Description" content="s6: service directory" />
    <meta name="Keywords" content="s6 supervision supervise service directory run finish servicedir" />
    <!-- <link rel="stylesheet" type="text/css" href="//skarnet.org/default.css" /> -->
  </head>
<body>

<p>
<a href="index.html">s6</a><br />
<a href="//skarnet.org/software/">Software</a><br />
<a href="//skarnet.org/">skarnet.org</a>
</p>

<h1> Service directories </h1>

<p>
 A <em>service directory</em> is a directory containing all the information
related to a <em>service</em>, i.e. a long-running process maintained and
supervised by <a href="s6-supervise.html">s6-supervise</a>.
</p>

<p>
 (Strictly speaking, a <em>service</em> is not always equivalent to a
long-running process. Things like Ethernet interfaces fit the definition
of <em>services</em> one may want to supervise; however, s6 does not
provide <em>service supervision</em>; it provides <em>process supervision</em>,
and it is impractical to use the s6 architecture as is to supervise
services that are not equivalent to one long-running process. However,
we still use the terms <em>service</em> and <em>service directory</em>
for historical and compatibility reasons.)
</p>

<h2> Contents </h2>

 A service directory <em>foo</em> may contain the following elements:

<ul>
 <li style="margin-bottom:1em"> An executable file named <tt>run</tt>. It can be any executable
file (such as a binary file or a link to any other executable file),
but most of the time it will be a script, called <em>run script</em>.
This file is the most important one in your service directory: it
contains the commands that will setup and run your <em>foo</em> service.
 <ul>
  <li> It is spawned by <a href="s6-supervise.html">s6-supervise</a>
every time the service must be started, i.e. normally when
<a href="s6-supervise.html">s6-supervise</a> starts, and whenever
the service goes down when it is supposed to be up. </li>
  <li> It is given one argument, which is the same argument that the
<a href="s6-supervise.html">s6-supervise</a> process is running with,
i.e. the name of the service directory &mdash; or, if
<a href="s6-supervise.html">s6-supervise</a> is run under
<a href="s6-svscan.html">s6-svscan</a>, the name of the service directory
as seen by <a href="s6-svscan.html">s6-svscan</a> in its
<a href="scandir.html">scan directory</a>. That is, <tt><em>foo</em></tt>
or <tt><em>foo</em>/log</tt>, if <em>foo</em> is the name of the
<em>symbolic link</em> in the scan directory. </li> </ul>

<p> A run script should normally: </p>
 <ul>
  <li> adjust redirections for stdin, stdout and stderr. When a run
script starts, it inherits its standard file descriptors from
<a href="s6-supervise.html">s6-supervise</a>, which itself inherits them from
<a href="s6-svscan.html">s6-svscan</a>. stdin is normally <tt>/dev/null</tt>.
If s6-svscan was launched by another init system, stdout and stderr likely
point to that init system's default log (or <tt>/dev/null</tt> in the case
of sysvinit). If s6-svscan is running as pid 1 via the help of software like
<a href="//skarnet.org/software/s6-linux-init/">s6-linux-init</a>, then its
stdout and stderr point to a <em>catch-all logger</em>, which catches and
logs any output of the supervision tree that has not been caught by a
dedicated logger. If the defaults provided by your installation are not
suitable for your run script, then your run script should perform the proper
redirections before executing into the final daemon. For instance, dedicated
logging mechanisms, such as the <tt>log</tt> subdirectory (see below) or the
<a href="//skarnet.org/software/s6-rc/">s6-rc</a> pipeline feature, pipe your
run script's <em>stdout</em> to the logging service, but chances are you want
to log <em>stderr</em> as well, so the run script should make sure that its
stderr goes into the log pipe. This
is achieved by <tt><a href="//skarnet.org/software/execline/fdmove.html">fdmove</a>
-c 2 1</tt> in <a href="//skarnet.org/software/execline/">execline</a>,
and <tt>exec 2&gt;&amp;1</tt> in <a href="https://pubs.opengroup.org/onlinepubs/9699919799/utilities/sh.html">shell</a>.
 </li>
<li> adjust the environment for your <em>foo</em> daemon. Normally the run script
inherits its environment from <a href="s6-supervise.html">s6-supervise</a>,
which normally inherits its environment from <a href="s6-svscan.html">s6-svscan</a>,
which normally inherits a minimal environment from the boot scripts.
Service-specific environment variables should be set in the run script. </li>
 <li> adjust other parameters for the <em>foo</em> daemon, such as its
uid and gid. Normally the supervision tree, i.e.
<a href="s6-svscan.html">s6-svscan</a> and the various
<a href="s6-supervise.html">s6-supervise</a> processes, is run as root, so
run scripts are also run as root; however, for security purposes, services
should not run as root if they don't need to. You can use the
<a href="s6-setuidgid.html">s6-setuidgid</a> utility in <em>foo</em><tt>/run</tt>
to lose privileges before executing into <em>foo</em>'s long-lived
process; or the <a href="s6-envuidgid.html">s6-envuidgid</a> utility if
your long-lived process needs root privileges at start time but can drop
them afterwards. </li>
 <li> execute into the long-lived process that is to be supervised by
<a href="s6-supervise.html">s6-supervise</a>, i.e. the real <em>foo</em>
daemon. That process must not "background itself": being run by a supervision
tree already makes it a "background" task. </li>
 </ul> </li>

 <li style="margin-bottom:1em"> An optional executable file named <tt>finish</tt>. Like <tt>run</tt>,
it can be any executable file. This <em>finish script</em>, if present,
is executed everytime the <tt>run</tt> script dies. Generally, its main
purpose is to clean up non-volatile data such as the filesystem after the supervised
process has been killed. If the <em>foo</em> service is supposed to be up,
<em>foo</em><tt>/run</tt> is restarted after <em>foo</em><tt>/finish</tt> dies.
 <ul>
  <li> By default, a finish script must do its work and exit in less than                              
5 seconds; if it takes more than that, it is killed. (The point is that the run
script, not the finish script, should be running; the finish script should really
be short-lived.) The maximum duration of a <tt>finish</tt> execution can be
configured via the <tt>timeout-finish</tt> file, see below. </li>
  <li> The finish script is executed with four arguments:
   <ol>
    <li> the exit code from the run script (resp. 256 if the run script was killed by a signal) </li>
    <li> an undefined number (resp. the number of the signal that killed the run script) </li>
    <li> the name of the service directory, the same that has been given to <tt>./run</tt> </li>
    <li> the process group id of the defunct run script. This is useful to clean up
services that leave children behind: for instance, <tt>if test "$1" -gt 255 ; then kill -9 -- -"$4" ; fi</tt>
in the finish script will SIGKILL all children processes if the service crashed.
This is not an entirely reliable mechanism, because an annoying service could spawn
children processes in a different process group, but it should catch most offenders. </li>
   </ol>
  <li> If the finish script exits 125, then <a href="s6-supervise.html">s6-supervise</a>
interprets this as a permanent failure for the service, and does not restart it,
as if an <a href="s6-svc.html">s6-svc -O</a> command had been sent. </li>
  <li> If <a href="s6-supervise.html">s6-supervise</a> has been instructed to exit after
the service dies, via a <tt>s6-svc -x</tt> command or a SIGHUP, then the next
invocation of <tt>finish</tt> will (obviously) be the last, and it will run with
stdin and stdout pointing to <tt>/dev/null</tt>. </li>
 </ul> </li>

 <li style="margin-bottom:1em"> A directory named <tt>supervise</tt>. It is automatically created by
<a href="s6-supervise.html">s6-supervise</a> if it does not exist. This is where
<a href="s6-supervise.html">s6-supervise</a> stores its internal information.
The directory must be writable. </li>

 <li style="margin-bottom:1em"> An optional, empty, regular file named <tt>down</tt>. If such a file exists,
the default state of the service is considered down, not up: s6-supervise will not
automatically start it until it receives a <tt>s6-svc -u</tt> command. If no
<tt>down</tt> file exists, the default state of the service is up. </li>

 <li style="margin-bottom:1em"> An optional regular file named <tt>notification-fd</tt>. If such a file
exists, it means that the service supports
<a href="notifywhenup.html">readiness notification</a>. The file must only
 contain an unsigned integer, which is the number of the file descriptor that
the service writes its readiness notification to. (For instance, it should
be 1 if the daemon is <a href="s6-ipcserverd.html">s6-ipcserverd</a> run with the
<tt>-1</tt> option.)
  When a service is started, or restarted, by s6-supervise, if this file
exists and contains a valid descriptor number, s6-supervise will wait for the
notification from the service and broadcast readiness, i.e. any
<a href="s6-svwait.html">s6-svwait -U</a>,
<a href="s6-svlisten1.html">s6-svlisten1 -U</a> or
<a href="s6-svlisten.html">s6-svlisten -U</a> processes will be
triggered. </li>

 <li style="margin-bottom:1em"> An optional regular file named <tt>lock-fd</tt>. If such a file
exists, it must contain an unsigned integer, representing a file descriptor that
will be open in the service. The service <em>should not write to that descriptor</em>
and <em>should not close it</em>. In other words, it should totally ignore it. That
file descriptor holds a lock, that will naturally be released when the service dies.
The point of this feature is to prevent s6-supervise from accidentally spawning several
copies of the service in case something goes wrong: for instance, the service
backgrounds itself (which it shouldn't do when running under a supervision suite), or
s6-supervise is killed, restarted by s6-svscan, and attempts to start another copy of
the service while the first copy is still alive. If s6-supervise detects that the lock
is held when it tries to start the service, it will print a warning message; the new
service instance will block until the lock is released, then proceed as usual. </li>

 <li style="margin-bottom:1em"> An optional regular file named <tt>timeout-kill</tt>. If such a file
exists, it must only contain an unsigned integer <em>t</em>. If <em>t</em>
is nonzero, then on receipt of an <a href="s6-svc.html">s6-svc -d</a> command,
which sends a SIGTERM (by default, see <tt>down-signal</tt> below) and a
SIGCONT to the service, a timeout of <em>t</em>
milliseconds is set; and if the service is still not dead after <em>t</em>
milliseconds, then it is sent a SIGKILL. If <tt>timeout-kill</tt> does not
exist, or contains 0 or an invalid value, then the service is never
forcibly killed (unless, of course, an <a href="s6-svc.html">s6-svc -k</a>
command is sent). </li>

 <li style="margin-bottom:1em"> An optional regular file named <tt>timeout-finish</tt>. If such a file
exists, it must only contain an unsigned integer, which is the number of
milliseconds after which the <tt>./finish</tt> script, if it exists, will
be killed with a SIGKILL. The default is 5000: finish scripts are killed
if they're still alive after 5 seconds. A value of 0 allows finish scripts
to run forever. </li>

 <li style="margin-bottom:1em"> An optional regular file named <tt>max-death-tally</tt>. If such a file
exists, it must only contain an unsigned integer, which is the maximum number of
service death events that s6-supervise will keep track of. If the service dies
more than this number of times, the oldest events will be forgotten. Tracking
death events is useful, for instance, when throttling service restarts. The
value cannot be greater than 4096. If the file does not exist, a default of 100
is used. </li>

 <li style="margin-bottom:1em"> An optional regular file named <tt>down-signal</tt>. If such a file
exists, it must only contain the name or number of a signal, followed by a
newline. This signal will be used to kill the supervised process when a
<a href="s6-svc.html">s6-svc -d</a> or <a href="s6-svc.html">s6-svc -r</a>
command is used. If the file does not exist, SIGTERM will be used by default. </li>

 <li style="margin-bottom:1em"> A <a href="fifodir.html">fifodir</a> named <tt>event</tt>. It is automatically
created by <a href="s6-supervise.html">s6-supervise</a> if it does not exist.
<em>foo</em><tt>/event</tt>
is the rendez-vous point for listeners, where <a href="s6-supervise.html">s6-supervise</a>
will send notifications when the service goes up or down. </li>

 <li style="margin-bottom:1em"> Optional directories named <tt>instance</tt>
and <tt>instances</tt>. Those are internal subdirectories created by
<a href="s6-instance-maker.html">s6-instance maker</a> in a templated service
directory. Outside of instanced services, these directories should never
appear, and you should never create them manually. </li>


 <li style="margin-bottom:1em"> An optional service directory named <tt>log</tt>. If it exists and <em>foo</em>
is in a <a href="scandir.html">scandir</a>, and <a href="s6-svscan.html">s6-svscan</a>
runs on that scandir, then <em>two</em> services are monitored: <em>foo</em> and
<em>foo</em><tt>/log</tt>. A pipe is open and maintained between <em>foo</em> and
<em>foo</em><tt>/log</tt>, i.e. everything that <em>foo</em><tt>/run</tt>
writes to its stdout will appear on <em>foo</em><tt>/log/run</tt>'s stdin. The <em>foo</em>
service is said to be <em>logged</em>; the <em>foo</em><tt>/log</tt> service is called
<em>foo</em>'s <em>logger</em>. A logger service cannot be logged: if
<em>foo</em><tt>/log/log</tt> exists, nothing special happens. </li>
</ul>

 <h3> Stability </h3>

<p>
 With the evolution of s6, it is possible that 
 <a href="s6-supervise.html">s6-supervise</a> configuration uses more and more
files in the service directory. The
<tt>notification-fd</tt> and <tt>timeout-finish</tt> files, for
instance, have appeared in 2015; users who previously had files
with the same name had to change them. There is no guarantee that
<a href="s6-supervise.html">s6-supervise</a> will not use additional
names in the service directory in the same fashion in the future.
</p>

<p>
 There <em>is</em>, however, a guarantee that
<a href="s6-supervise.html">s6-supervise</a> will never touch
subdirectories named <tt>data</tt> or <tt>env</tt>. So if you
need to store user information in the service directory with
the guarantee that it will never be mistaken for a configuration
file, no matter the version of s6, you should store that information in
the <tt>data</tt> or <tt>env</tt> subdirectories of the service
directory.
</p>

<a name="where">
 <h2> Where should I store my service directories? </h2>
</a>

<p>
 Service directories describe the way services are launched. Once they are
designed, they have little reason to change on a given machine. They can
theoretically reside on a read-only filesystem - for instance, the root
filesystem, to avoid problems with mounting failures.
</p>

<p>
 However, two subdirectories - namely <tt>supervise</tt> and <tt>event</tt> -
of every service directory need to be writable. So it has to be a bit more
complex. Here are a few possibilities.
</p>

<ul>
 <li> The laziest option: you're not using <a href="s6-svscan.html">s6-svscan</a>
as process 1, you're only using it to start a collection of services, and
your booting process is already handled by another init system. Then you can
just store your service directories and your <a href="scandir.html">scan
directory</a> on some read-write filesystem such as <tt>/var</tt>; and you
tell your init system to launch (and, if possible, maintain) s6-svscan on
the scan directory after that filesystem is mounted. </li>
 <li> The almost-as-lazy option: just have the service directories on the
root filesystem. Then your service directory collection is for instance in
<tt>/etc/services</tt> and you have a <tt>/service</tt>
<a href="scandir.html">scan directory</a> containing symlinks to that
collection. This is the easy setup, not requiring an external init system
to mount your filesystems - however, it requires your root filesystem to be
read-write, which is unacceptable if you are concerned with reliability - if
you are, for instance, designing an embedded platform. </li>
 <li> <a href="https://code.dogmap.org/">Some people</a> like to have
their service directories in a read-only filesystem, with <tt>supervise</tt>
symlinks pointing to various places in writable filesystems. This setup looks
a bit complex to me: it requires careful handling of the writable
filesystems, with not much room for error if the directory structure does not
match the symlinks (which are then dangling). But it works. </li>
 <li> Service directories are usually small; most daemons store their
information elsewhere. Even a complete set of service directories often
amounts to less than a megabyte of data - sometimes much less. Knowing this,
it makes sense to have an image of your service directories in the
(possibly read-only) root filesystem, and <em>copy it all</em>
to a scan directory located on a RAM filesystem that is mounted at boot time.
This is the setup I recommend, and the one used by the
<a href="//skarnet.org/software/s6-rc/">s6-rc</a> service manager.
 It has several advantages:
 <ul>
  <li> Your service directories reside on the root filesystem and are not
modified during the lifetime of the system. If your root filesystem is
read-only and you have a working set of service directories, you have the
guarantee that a reboot will set your system in a working state. </li>
 <li> Every boot system requires an early writeable filesystem, and many
create it in RAM. You can take advantage of this to copy your service
directories early and run s6-svscan early. </li>
 <li> No dangling symlinks or potential problems with unmounted
filesystems: this setup is robust. A simple <tt>/bin/cp -a</tt> or
<tt>tar -x</tt> is all it takes to get a working service infrastructure. </li>
 <li> You can make temporary modifications to your service directories
without affecting the main ones, safely stored on the disk. Conversely,
every boot ensures clean service directories - including freshly created
<tt>supervise</tt> and <tt>event</tt> subdirectories. No stale files can
make your system unstable. </li>
 </ul> </li>
</ul>

</body>
</html>