about summary refs log tree commit diff
path: root/doc/s6-supervise.html
blob: b2a83d55a1afd1fcdfc0b114f4b2cec7c370c4ad (plain) (blame)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
<html>
  <head>
    <meta name="viewport" content="width=device-width, initial-scale=1.0" />
    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
    <meta http-equiv="Content-Language" content="en" />
    <title>s6: the s6-supervise program</title>
    <meta name="Description" content="s6: the s6-supervise program" />
    <meta name="Keywords" content="s6 command s6-supervise servicedir supervision supervise" />
    <!-- <link rel="stylesheet" type="text/css" href="//skarnet.org/default.css" /> -->
  </head>
<body>

<p>
<a href="index.html">s6</a><br />
<a href="//skarnet.org/software/">Software</a><br />
<a href="//skarnet.org/">skarnet.org</a>
</p>

<h1> The s6-supervise program </h1>

<p>
s6-supervise monitors a long-lived process (or <em>service</em>), making sure it
stays alive, sending notifications to registered processes when it dies, and
providing an interface to control its state. s6-supervise is designed to be the
last non-leaf branch of a <em>supervision tree</em>, the supervised process
being a leaf.
</p>

<h2> Interface </h2>

<pre>
     s6-supervise <em>servicedir</em>
</pre>

<p>
 s6-supervise's behaviour is approximately the following:
</p>

<ul>
 <li> s6-supervise changes its current directory to <em>servicedir</em>. </li>
 <li> It exits 100 if another s6-supervise process is already monitoring this service. </li>
 <li> It forks and executes the <tt>./run</tt> file in the service directory.
 <li> <tt>./run</tt> should be a long-lived process: it can chain load (i.e. exec into
other binaries), but should not die. It's the daemon that s6-supervise monitors
and manages. </li>
 <li> When <tt>./run</tt> dies, s6-supervise spawns <tt>./finish</tt>, if it exists.
This script should be short-lived: it's meant to clean up application state, if
necessary, that has not been cleaned up by <tt>./run</tt> itself before dying. </li>
 <li> When <tt>./finish</tt> dies, s6-supervise spawns <tt>./run</tt> again. </li>
 <li> s6-supervise operation can be controlled by the <a href="s6-svc.html">s6-svc</a>
program. It can be sent commands like "restart the service", "bring the service down", etc. </li>
 <li> s6-supervise normally runs forever. If told to exit by <a href="s6-svc.html">s6-svc</a>,
it waits for the service to go down one last time, then exits 0. </li>
</ul>

<p>
 For a precise description of s6-supervise's behaviour, check the
<a href="#detailed">Detailed operation</a> section below, as well as
the <a href="servicedir.html">service directory</a> page:
s6-supervise operation can be extensively configured by the presence
of certain files in the service directory.
</p>

<h2> Options </h2>

<p>
 s6-supervise does not support options, because it is normally not run
manually via a command line; it is usually launched by its own
supervisor, <a href="s6-svscan.html">s6-svscan</a>. The way to
tune s6-supervise's behaviour is via files in the
<a href="servicedir.html">service directory</a>.
</p>

<h2> Readiness notification support </h2>

<p>
 If the <a href="servicedir.html">service directory</a> contains a valid
<tt>notification-fd</tt> file when the service is started, or restarted,
s6-supervise creates and listens to an additional pipe from the service
for <a href="notifywhenup.html">readiness notification</a>. When the
notification occurs, s6-supervise updates the <tt>./supervise/status</tt>
file accordingly, then sends
a <tt>'U'</tt> event to <tt>./event</tt>.
</p>

<p>
 If the service is logged, i.e. if the service directory has a
<tt>log</tt> subdirectory that is also a service directory, and the
s6-supervise process has been launched by
that is also <a href="s6-svscan.html">s6-svscan</a>, then by default
the service's stdout goes into the logging pipe. If you set
<tt>notification-fd</tt> to 1, the logging pipe will be overwritten
by the notification pipe, which is probably not what you want. Instead,
if your daemon writes a notification message to its stdout, you should
set <tt>notification-fd</tt> to (for instance) 3, and redirect outputs
in your run script. For instance, to redirect stderr to the logger and
stdout to a <tt>notification-fd</tt> set to 3, you would start your
daemon as <tt>fdmove -c 2 1 fdmove 1 3 prog...</tt> (in execline), or
<tt>exec 2&gt;&amp;1 1&gt;&amp;3 3&lt;&amp;- prog...</tt> (in shell).
</p>

<h2> Signals </h2>

<p>
 s6-supervise reacts to the following signals:
</p>

<ul>
 <li> SIGTERM: bring down the service and exit, as if a
<a href="s6-svc.html">s6-svc -xd</a> command had been received </li>
 <li> SIGHUP: exit as soon as the service stops, as if a
<a href="s6-svc.html">s6-svc -x</a> command had been received </li>
 <li> SIGQUIT: close stdin, stdout and stderr and exit as soon as
the service stops, as if a
<a href="s6-svc.html">s6-svc -X</a> command had been received </li>
</ul>

<a name="#detailed">
<h2> Detailed operation </h2>
</a>

<ul>
 <li> s6-supervise switches to the <em>servicedir</em>
<a href="servicedir.html">service directory</a>. </li>
 <li> It exits 100 if another s6-supervise process is already monitoring this service. </li>
 <li> If the <tt>./event</tt> <a href="fifodir.html">fifodir</a> does not exist,
s6-supervise creates it and allows subscriptions to it from processes having the same
effective group id as the s6-supervise process.
If it already exists, it uses it as is, without modifying the subscription rights. </li>
 <li> It <a href="libs6/ftrigw.html">sends</a> a <tt>'s'</tt> event to <tt>./event</tt>. </li>
 <li> If the default service state is up (i.e. there is no <tt>./down</tt> file),
s6-supervise spawns <tt>./run</tt>. </li>
 <li> s6-supervise sends a <tt>'u'</tt> event to <tt>./event</tt> whenever it
successfully spawns <tt>./run</tt>. </li>
 <li> When <tt>./run</tt> dies, s6-supervise sends a <tt>'d'</tt> event to <tt>./event</tt>.
It then spawns <tt>./finish</tt> if it exists.
<tt>./finish</tt> will have <tt>./run</tt>'s exit code as first argument, or 256 if
<tt>./run</tt> was signaled; it will have the number of the signal that killed <tt>./run</tt>
as second argument, or an undefined number if <tt>./run</tt> was not signaled. </li>
 <li> By default, <tt>./finish</tt> must exit in less than 5 seconds. If it takes more than that,
s6-supervise kills it with a SIGKILL. This can be configured via the
<tt>./timeout-finish</tt> file, see the description in the
<a href="servicedir.html">service directory page</a>. </li>
 <li> When <tt>./finish</tt> dies (or is killed),
s6-supervise sends a <tt>'D'</tt> event to <tt>./event</tt>. Then
it restarts <tt>./run</tt> unless it has been told not to. </li>
 <li> If <tt>./finish</tt> exits 125, then s6-supervise sends a <tt>'O'</tt> event
to <tt>./event</tt> <em>before</em> the <tt>'D'</tt>, and it
<strong>does not restart the service</strong>, as if <tt>s6-svc -O</tt> had
been called. This can be used to signify permanent failure to start the service. </li>
 <li> There is a minimum 1-second delay between two <tt>./run</tt> spawns, to avoid busylooping
if <tt>./run</tt> exits too quickly. </li>
 <li> When killed or asked to exit, it waits for the service to go down one last time, then
sends a <tt>'x'</tt> event to <tt>./event</tt> before exiting 0. </li>
</ul>

<p>
 Make sure to also check the <a href="servicedir.html">service directory</a>
documentation page, for the full list of files that can be present in a service
directory and impact s6-supervise's behaviour in any way.
</p>

<h2> Usage notes </h2>

<ul>
 <li> s6-supervise is a long-lived process. It normally runs forever, from the system's
boot scripts, until shutdown time; it should not be killed or told to exit. If you have
no use for a service, just turn it off; the s6-supervise process does not hurt. </li>
 <li> Even in boot scripts, s6-supervise should normally not be run directly. It's
better to have a collection of <a href="servicedir.html">service directories</a> in a
single <a href="scandir.html">scan directory</a>, and just run
<a href="s6-svscan.html">s6-svscan</a> on that scan directory. s6-svscan will spawn
the necessary s6-supervise processes, and will also take care of logged services. </li>
 <li> You can use <a href="s6-svc.html">s6-svc</a> to send commands to the s6-supervise
process; mostly to change the service state and send signals to the monitored
process. </li>
 <li> You can use <a href="s6-svok.html">s6-svok</a> to check whether s6-supervise
is successfully running. </li>
 <li> You can use <a href="s6-svstat.html">s6-svstat</a> to check the status of a
service. </li>
 <li> s6-supervise maintains internal information inside the <tt>./supervise</tt>
subdirectory of <em>servicedir</em>. <em>servicedir</em> itself can be read-only,
but both <em>servicedir</em><tt>/supervise</tt> and <em>servicedir</em><tt>/event</tt>
need to be read-write. </li>
</ul>

<h2> Implementation notes </h2>

<ul>
 <li> s6-supervise tries its best to stay alive and running despite possible
system call failures. It will write to its standard error everytime it encounters a
problem. However, unlike <a href="s6-svscan.html">s6-svscan</a>, it will not go out
of its way to stay alive; if it encounters an unsolvable situation, it will just
die. </li>
 <li> Unlike other "supervise" implementations, s6-supervise is a fully asynchronous
state machine. That means that it can read and process commands at any time, even
when the machine is in trouble (full process table, for instance). </li>
 <li> s6-supervise <em>does not use malloc()</em>. That means it will <em>never leak
memory</em>. <small>However, s6-supervise uses opendir(), and most opendir()
implementations internally use heap memory - so unfortunately, it's impossible to
guarantee that s6-supervise does not use heap memory at all.</small> </li>
 <li> s6-supervise has been carefully designed so every instance maintains as little
data as possible, so it uses a very small
amount of non-sharable memory. It is not a problem to have several
dozens of s6-supervise processes, even on constrained systems: resource consumption
will be negligible. </li>
</ul>

</body>
</html>