summary refs log tree commit diff
path: root/doc/s6-supervise.html
blob: fde1da7b838a0c9740a06f5595467a9e7de72aa7 (plain) (blame)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
<html>
  <head>
    <meta name="viewport" content="width=device-width, initial-scale=1.0" />
    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
    <meta http-equiv="Content-Language" content="en" />
    <title>s6: the s6-supervise program</title>
    <meta name="Description" content="s6: the s6-supervise program" />
    <meta name="Keywords" content="s6 command s6-supervise servicedir supervision supervise" />
    <!-- <link rel="stylesheet" type="text/css" href="//skarnet.org/default.css" /> -->
  </head>
<body>

<p>
<a href="index.html">s6</a><br />
<a href="//skarnet.org/software/">Software</a><br />
<a href="//skarnet.org/">skarnet.org</a>
</p>

<h1> The s6-supervise program </h1>

<p>
s6-supervise monitors a long-lived process (or <em>service</em>), making sure it
stays alive, sending notifications to registered processes when it dies, and
providing an interface to control its state. s6-supervise is designed to be the
last non-leaf branch of a <em>supervision tree</em>, the supervised process
being a leaf.
</p>

<h2> Interface </h2>

<pre>
     s6-supervise <em>servicedir</em>
</pre>

<p>
 s6-supervise's behaviour is approximately the following:
</p>

<ul>
 <li> s6-supervise changes its current directory to <em>servicedir</em>. </li>
 <li> It exits 100 if another s6-supervise process is already monitoring this service. </li>
 <li> It forks and executes the <tt>./run</tt> file in the service directory.
 <li> <tt>./run</tt> should be a long-lived process: it can chain load (i.e. exec into
other binaries), but should not die. It's the daemon that s6-supervise monitors
and manages. </li>
 <li> When <tt>./run</tt> dies, s6-supervise spawns <tt>./finish</tt>, if it exists.
This script should be short-lived: it's meant to clean up application state, if
necessary, that has not been cleaned up by <tt>./run</tt> itself before dying. </li>
 <li> When <tt>./finish</tt> dies, s6-supervise spawns <tt>./run</tt> again. </li>
 <li> s6-supervise operation can be controlled by the <a href="s6-svc.html">s6-svc</a>
program. It can be sent commands like "restart the service", "bring the service down", etc. </li>
 <li> s6-supervise normally runs forever. If told to exit by <a href="s6-svc.html">s6-svc</a>,
it waits for the service to go down one last time, then exits 0. </li>
</ul>

<p>
 For a precise description of s6-supervise's behaviour, check the
<a href="#detailed">Detailed operation</a> section below, as well as
the <a href="servicedir.html">service directory</a> page:
s6-supervise operation can be extensively configured by the presence
of certain files in the service directory.
</p>

<h2> Options </h2>

<p>
 s6-supervise does not support options, because it is normally not run
manually via a command line; it is usually launched by its own
supervisor, <a href="s6-svscan.html">s6-svscan</a>. The way to
tune s6-supervise's behaviour is via files in the
<a href="servicedir.html">service directory</a>.
</p>

<h2> Readiness notification support </h2>

<p>
 If the <a href="servicedir.html">service directory</a> contains a valid
<tt>notification-fd</tt> file when the service is started, or restarted,
s6-supervise creates and listens to an additional pipe from the service
for <a href="notifywhenup.html">readiness notification</a>. When the
notification occurs, s6-supervise updates the <tt>./supervise/status</tt>
file accordingly, then sends
a <tt>'U'</tt> event to <tt>./event</tt>.
</p>

<p>
 If the service is logged, i.e. if the service directory has a
<tt>log</tt> subdirectory that is also a service directory, and the
s6-supervise process has been launched by
that is also <a href="s6-svscan.html">s6-svscan</a>, then by default
the service's stdout goes into the logging pipe. If you set
<tt>notification-fd</tt> to 1, the logging pipe will be overwritten
by the notification pipe, which is probably not what you want. Instead,
if your daemon writes a notification message to its stdout, you should
set <tt>notification-fd</tt> to (for instance) 3, and redirect outputs
in your run script. For instance, to redirect stderr to the logger and
stdout to a <tt>notification-fd</tt> set to 3, you would start your
daemon as <tt>fdmove -c 2 1 fdmove 1 3 prog...</tt> (in execline), or
<tt>exec 2&gt;&amp;1 1&gt;&amp;3 3&lt;&amp;- prog...</tt> (in shell).
</p>

<h2> Signals </h2>

<p>
 s6-supervise reacts to the following signals:
</p>

<ul>
 <li> SIGTERM: bring down the service and exit, as if a
<a href="s6-svc.html">s6-svc -xd</a> command had been received </li>
 <li> SIGHUP: close its own stdin and stdout, and exit as soon as the
service stops, as if an <a href="s6-svc.html">s6-svc -x</a> command
had been received </li>
 <li> SIGQUIT: exit immediately without touching the service in any
way. </li>
 <li> SIGINT: send a SIGINT to the process group of the service, then
exit immediately. (The point here is to correctly forward SIGINT
in the case where s6-supervise is running in a terminal and the user
sent ^C to interrupt it.) </li>
</ul>

<a name="#detailed">
<h2> Detailed operation </h2>
</a>

<ul>
 <li> s6-supervise switches to the <em>servicedir</em>
<a href="servicedir.html">service directory</a>. </li>
 <li> It creates a <tt>supervise/</tt> subdirectory (if it doesn't exist yet) to
store its internal data. </li>
 <li> It exits 100 if another s6-supervise process is already monitoring this service. </li>
 <li> If the <tt>./event</tt> <a href="fifodir.html">fifodir</a> does not exist,
s6-supervise creates it and allows subscriptions to it from processes having the same
effective group id as the s6-supervise process.
If it already exists, it uses it as is, without modifying the subscription rights. </li>
 <li> It <a href="libs6/ftrigw.html">sends</a> a <tt>'s'</tt> event to <tt>./event</tt>. </li>
 <li> If the default service state is up (i.e. there is no <tt>./down</tt> file),
s6-supervise spawns <tt>./run</tt>. One argument is given to the <tt>./run</tt>
program: <em>servicedir</em>, the name of the directory s6-supervise is being
run on. It is given exactly as given to s6-supervise, without recanonicalization.
In particular, if s6-supervise is being managed by <a href="s6-svscan.html">s6-svscan</a>,
<em>servicedir</em> is always of the form <tt><em>foo</em></tt> or <tt><em>foo</em>/log</tt>,
and <em>foo</em> contains no slashes. </li>
 <li> s6-supervise sends a <tt>'u'</tt> event to <tt>./event</tt> whenever it
successfully spawns <tt>./run</tt>. </li>
 <li> If there is a <tt>./notification-fd</tt> file in the service directory and,
at some point after the service has been spawned, s6-supervise is told that the
service is ready, it sends a <tt>'U'</tt> event to <tt>./event</tt>. There are
several ways to tell s6-supervise that the service is ready:
  <ul>
   <li> the daemon may <a href="notifywhenup.html">do so itself</a>. </li>
   <li> the run script may have forked a
<a href="s6-notifyoncheck.html">s6-notifyoncheck</a> process that polls the
service for readiness. </li>
  </ul> </li>
 <li> When <tt>./run</tt> dies, s6-supervise sends a <tt>'d'</tt> event to <tt>./event</tt>.
It then spawns <tt>./finish</tt> if it exists.
<tt>./finish</tt> will have <tt>./run</tt>'s exit code as first argument, or 256 if
<tt>./run</tt> was signaled; it will have the number of the signal that killed <tt>./run</tt>
as second argument, or an undefined number if <tt>./run</tt> was not signaled;
and it will have <em>servicedir</em> as third argument. </li>
 <li> By default, <tt>./finish</tt> must exit in less than 5 seconds. If it takes more than that,
s6-supervise kills it with a SIGKILL. This can be configured via the
<tt>./timeout-finish</tt> file, see the description in the
<a href="servicedir.html">service directory page</a>. </li>
 <li> When <tt>./finish</tt> dies (or is killed),
s6-supervise sends a <tt>'D'</tt> event to <tt>./event</tt>. Then
it restarts <tt>./run</tt> unless it has been told not to. </li>
 <li> If <tt>./finish</tt> exits 125, then s6-supervise sends a <tt>'O'</tt> event
to <tt>./event</tt> <em>before</em> the <tt>'D'</tt>, and it
<strong>does not restart the service</strong>, as if <tt>s6-svc -O</tt> had
been called. This can be used to signify permanent failure to start the service. </li>
 <li> There is a minimum 1-second delay between two <tt>./run</tt> spawns, to avoid busylooping
if <tt>./run</tt> exits too quickly. If the service has been <em>ready</em> for more
than one second, it will restart immediately, but if it not <em>ready</em> when
it dies, s6-supervise will always pause for 1 second before spawning it again. </li>
 <li> When killed or asked to exit, it waits for the service to go down one last time, then
sends a <tt>'x'</tt> event to <tt>./event</tt> before exiting 0. </li>
</ul>

<p>
 Make sure to also check the <a href="servicedir.html">service directory</a>
documentation page, for the full list of files that can be present in a service
directory and impact s6-supervise's behaviour in any way.
</p>

<h2> Usage notes </h2>

<ul>
 <li> s6-supervise is a long-lived process. It normally runs forever, from the system's
boot scripts, until shutdown time; it should not be killed or told to exit. If you have
no use for a service, just turn it off; the s6-supervise process does not hurt. </li>
 <li> Even in boot scripts, s6-supervise should normally not be run directly. It's
better to have a collection of <a href="servicedir.html">service directories</a> in a
single <a href="scandir.html">scan directory</a>, and just run
<a href="s6-svscan.html">s6-svscan</a> on that scan directory. s6-svscan will spawn
the necessary s6-supervise processes, and will also take care of logged services. </li>
 <li> s6-supervise always spawns its child in a new session, as a session leader.
The goal is to protect the supervision tree from misbehaved services that would
send signals to their whole process group. Nevertheless, s6-supervise's handling of
SIGINT ensures that its service is killed if you happen to run it in a terminal and
send it a ^C. </li>
 <li> You can use <a href="s6-svc.html">s6-svc</a> to send commands to the s6-supervise
process; mostly to change the service state and send signals to the monitored
process. </li>
 <li> You can use <a href="s6-svok.html">s6-svok</a> to check whether s6-supervise
is successfully running. </li>
 <li> You can use <a href="s6-svstat.html">s6-svstat</a> to check the status of a
service. </li>
 <li> s6-supervise maintains internal information inside the <tt>./supervise</tt>
subdirectory of <em>servicedir</em>. <em>servicedir</em> itself can be read-only,
but both <em>servicedir</em><tt>/supervise</tt> and <em>servicedir</em><tt>/event</tt>
need to be read-write. </li>
 <li> If <em>servicedir</em> is read-only, then the <a href="s6-svc.html">s6-svc</a>
<tt>-D</tt> and <tt>-U</tt> commands will not work properly since s6-supervise will
be unable to create or delete a <em>servicedir<em><tt>/down</tt> file; in this case
s6-supervise will print a warning on stderr, and perform the equivalent of <tt>-d</tt>
and <tt>-u</tt> instead, but will be unable to change the permanent service
configuration. </li>
</ul>

<h2> Implementation notes </h2>

<ul>
 <li> s6-supervise tries its best to stay alive and running despite possible
system call failures. It will write to its standard error everytime it encounters a
problem. However, unlike <a href="s6-svscan.html">s6-svscan</a>, it will not go out
of its way to stay alive; if it encounters an unsolvable situation, it will just
die. </li>
 <li> Unlike other "supervise" implementations, s6-supervise is a fully asynchronous
state machine. That means that it can read and process commands at any time, even
when the machine is in trouble (full process table, for instance). </li>
 <li> s6-supervise <em>does not use malloc()</em>. That means it will <em>never leak
memory</em>. <small>However, s6-supervise uses opendir(), and most opendir()
implementations internally use heap memory - so unfortunately, it's impossible to
guarantee that s6-supervise does not use heap memory at all.</small> </li>
 <li> s6-supervise has been carefully designed so every instance maintains as little
data as possible, so it uses a very small
amount of non-sharable memory. It is not a problem to have several
dozens of s6-supervise processes, even on constrained systems: resource consumption
will be negligible. </li>
</ul>

</body>
</html>