about summary refs log tree commit diff
path: root/doc/notifywhenup.html
blob: 762a62be47761900095e51b721c39d989cf22650 (plain) (blame)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
<html>
  <head>
    <meta name="viewport" content="width=device-width, initial-scale=1.0" />
    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
    <meta http-equiv="Content-Language" content="en" />
    <title>s6: service startup notifications</title>
    <meta name="Description" content="s6: service startup notifications" />
    <meta name="Keywords" content="s6 ftrig notification notifier writer libftrigw ftrigw startup U up svwait s6-svwait" />
    <!-- <link rel="stylesheet" type="text/css" href="//skarnet.org/default.css" /> -->
  </head>
<body>

<p>
<a href="index.html">s6</a><br />
<a href="//skarnet.org/software/">Software</a><br />
<a href="//skarnet.org/">skarnet.org</a>
</p>

<h1> Service startup notifications </h1>

<p>
 It is easy for a process supervision suite to know when a service that was <em>up</em>
is now <em>down</em>: the long-lived process implementing the service is dead. The
supervisor, running as the daemon's parent, is instantly notified via a SIGCHLD.
When it happens, <a href="s6-supervise.html">s6-supervise</a> sends a 'd' event
to its <tt>./event</tt> <a href="fifodir.html">fifodir</a>, so every subscriber
knows that the service is down. All is well.
</p>

<p>
 It is much trickier for a process supervision suite to know when a service
that was <em>down</em> is now <em>up</em>. The supervisor forks and execs the
daemon, and knows when the exec has succeeded; but after that point, it's all
up to the daemon itself. Some daemons do a lot of initialization work before
they're actually ready to serve, and it is impossible for the supervisor to
know exactly <em>when</em> the service is really ready.
<a href="s6-supervise.html">s6-supervise</a> sends a 'u' event to its
<tt>./event</tt> <a href="fifodir.html">fifodir</a> when it successfully
spawns the daemon, but any subscriber
reacting to 'u' is subject to a race condition - the service provided by the
daemon may not be ready yet.
</p>

<p>
 Reliable startup notifications need support from the daemons themselves.
Daemons should do two things to signal the outside world that they are
ready:
</p>

<ol>
 <li> Update a state file, so other processes can get a snapshot
of the daemon's state </li>
 <li> Send an event to processes waiting for a state change. </li>
</ol>

<p>
 This is complex to implement in every single daemon, so s6 provides
tools to make it easier for daemon authors, without any need to link
against the s6 library or use any s6-specific construct:
 daemons can simply write a line to a file descriptor of their choice,
then close that file descriptor, when they're ready to serve. This is
a generic mechanism that some daemons already implement.
</p>

<p>
 s6 supports that mechanism natively: when the
<a href="servicedir.html">service directory</a> for the daemon contains
a valid <tt>notification-fd</tt> file, the daemon's supervisor, i.e. the
<a href="s6-supervise.html">s6-supervise</a> program, will properly catch
the daemon's message, update the status file (<tt>supervise/status</tt>),
then notify all the subscribers
with a <tt>'U'</tt> event, meaning that the service is now up and ready.
</p>

<p>
 This method should really be implemented in every long-running
program providing a service. When it is not the case, it's impossible
to provide reliable startup notifications, and subscribers should then
be content with the unreliable <tt>'u'</tt> events provided by s6-supervise.
</p>

<p>
Unfortunately, a lot of long-running programs do not offer that
functionality; instead, they provide a way to poll them, an external
program that runs and checks whether the service is ready. This is a
<a href="//skarnet.org/software/s6/ftrig.html">bad</a> mechanism, for
<a href="//skarnet.org/lists/supervision/1606.html">several</a>
reasons. Nevertheless, until all daemons are patched to notify their
own readiness, s6 provides a way to run such a check program to poll
for readiness, and route its result into the s6 notification system:
<a href="s6-notifyoncheck.html">s6-notifyoncheck</a>.
</p>

<h2> How to use a check program with s6 (i.e. readiness checking via polling) </h2>

<ul>
 <li> Let's say you have a daemon <em>foo</em>, started under s6 via a
<tt>/run/service/foo</tt> service directory, and that comes with a
<tt>foo-check</tt> program that exhibits different behaviours when
<em>foo</em> is ready and when it is not. </li>
 <li> Create an executable script <tt>/run/service/foo/data/check</tt>
that calls <tt>foo-check</tt>. Make sure this script exits 0 when
<em>foo</em> is ready and nonzero when it's not. </li>
 <li> In your <tt>/run/service/foo/run</tt> script that starts <em>foo</em>,
instead of executing into <tt>foo</tt>, execute into
<tt>s6-notifyoncheck foo</tt>. Read the
<a href="s6-notifyoncheck.html">s6-notifyoncheck</a> page if you need to
give it options to tune the polling. </li>
 <li> <tt>echo 3 &gt; /run/service/foo/notification-fd</tt>. If file descriptor
3 is already open when your run script executes <em>foo</em>, replace 3 with
a file descriptor you <em>know</em> is not already open. </li> 
 <li> That's it.
  <ul>
   <li> Your check script will be automatically invoked by
<a href="s6-notifyoncheck.html">s6-notifyoncheck</a>, until it succeeds. </li>
   <li> <a href="s6-notifyoncheck.html">s6-notifyoncheck</a> will send the
readiness notification to the file descriptor given in the <tt>notification-fd</tt>
file. </li>
   <li> <a href="s6-supervise.html">s6-supervise</a> will receive it and will
mark <em>foo</em> as ready. </li>
  </ul> </li>
</ul>

<h2> How to design a daemon so it uses the s6 mechanism <em>without</em> resorting to polling (i.e. readiness notification) </h2>

<p>
 The <a href="s6-notifyoncheck.html">s6-notifyoncheck</a> mechanism was
made to accommodate daemons that provide a check program but do not notify
readiness themselves; it works, but is suboptimal.
 If you are writing the <em>foo</em> daemon, here is how you can make things better:
</p>

<ul>
 <li> Readiness notification should be optional, so you should guard all
the following with a run-time option to <em>foo</em>. </li>
 <li> Assume a file descriptor other than 0, 1 or 2 is going to be open.
You can hardcode 3 (or 4); or you can make it configurable via a command line
option. See for instance the <tt>-D <em>notif</em></tt> option to the
<a href="//skarnet.org/software/mdevd/mdevd.html">mdevd</a> program. It
really doesn't matter what this number is; the important thing is that your
daemon knows that this fd is already open, and is not using it for another
purpose. </li>
 <li> Do nothing with this file descriptor until your daemon is ready. </li>
 <li> When your daemon is ready, write a newline to this file descriptor.
  <ul>
   <li> If you like, you may write other data before the newline, just in
case it is printed to the terminal. It is not necessary, and it is best to
keep that data short. If the line is read by
<a href="s6-supervise.html">s6-supervise</a>, it will be entirely ignored;
only the newline is important. </li>
  </ul>
 <li> Then close that file descriptor. </li>
</ul>

<p>
 The user who then makes <em>foo</em> run under s6 just has to do the
following:
</p>

<ul>
 <li> Write 3, or the file descriptor the <em>foo</em> daemon uses
to notify readiness, to the <tt>/run/service/foo/notification-fd</tt> file. </li>
 <li> In the <tt>/run/service/foo/run</tt> script, invoke <tt>foo</tt>
with the option that activates the readiness notification. If <em>foo</em>
makes the notification fd configurable, the user needs to make sure that
the number that is given to this option is the same as the number that is
written in the <tt>notification-fd</tt> file. </li>
 <li> And that is all. <strong>Do not</strong> use <tt>s6-notifyoncheck</tt>
in this case, because you do not need to poll to know whether <em>foo</em>
is ready; instead, <em>foo</em> will directly communicate its readiness to
<a href="s6-supervise.html">s6-supervise</a>, and that is a much more efficient
mechanism. </li>
</ul>

 <h2> What does <a href="s6-supervise.html">s6-supervise</a> do with this
readiness information? </h2>

<ul>
 <li> <a href="s6-supervise.html">s6-supervise</a> maintains a readiness
state for other programs to read. You can check for it, for instance, via
the <a href="s6-svstat.html">s6-svstat</a> program. </li>
 <li> <a href="s6-supervise.html">s6-supervise</a> also broadcasts the
readiness event to programs that are waiting for it - for instance the
<a href="s6-svwait.html">s6-svwait</a> program. This can be used to
make sure that other programs only start when the daemon is ready. For
instance, the
<a href="//skarnet.org/software/s6-rc/">s6-rc</a> service manager uses
that mechanism to bring sets of services up or down: a service starts as
soon as all its dependencies are ready, but never earlier. </li>
</ul>

</body>
</html>