about summary refs log tree commit diff
path: root/doc/dieshdiedie.html
blob: 3bac1601da2126bca0a41f768811886f4af83b73 (plain) (blame)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
<html>
  <head>
    <meta name="viewport" content="width=device-width, initial-scale=1.0" />
    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
    <meta http-equiv="Content-Language" content="en" />
    <title>execline: why execline and not sh</title>
    <meta name="Description" content="execline: why execline and not sh" />
    <meta name="Keywords" content="execline sh shell script language" />
    <!-- <link rel="stylesheet" type="text/css" href="http://skarnet.org/default.css" /> -->
  </head>
<body>

<a href="index.html">execline</a><br />
<a href="http://skarnet.org/software/">Software</a><br />
<a href="http://skarnet.org/">skarnet.org</a><p />

<h1> Why not just use <tt>/bin/sh</tt>&nbsp;? </h1>


<a name="security">
<h2> Security </h2></a>

<p>
 One of the most frequent sources of security problems in programs
is <em>parsing</em>. Parsing is a complex operation, and it is easy to
make mistakes while designing and implementing a parser. (See
<a href="http://cr.yp.to/qmail/guarantee.html">what Dan Bernstein says
on the subject</a>, section 5.)
</p>

<p>
 But shells parse all the time. Worse, the <em>essence</em>
of the shell is parsing: the parser and the runner are intimately
interleaved and cannot be clearly separated, thanks to the
<a href="http://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html">specification</a>.
The shell performs several kinds of expansions, automatic filename
globbing, and automatic word splitting, in an unintuitive order,
requiring users to memorize numerous arbitrary quoting rules in
order to achieve what they want. Pages
<a href="http://www.google.com/search?q=shell+script+pitfalls">abound</a>
where common mistakes are listed, more often than not leading to
security holes. Did you know that <tt>"$@"</tt> is a special case
of double quoting, because it will split the arguments into
several words, whereas every other use of double quotes in a shell is
meant to <em>prevent</em> splitting?
</p>

<p>
<tt>execlineb</tt> parses the script only once: when
reading it. The parser has been designed to be simple and systematic,
to reduce the potential for bugs - which you just cannot do
with a shell. After <tt>execlineb</tt> has split up the script into
words, no other parsing phase will happen, unless the user explicitly
requires it. Positional parameters, when
used, are never split, even if they contain spaces or newlines, unless
the user explicitly requires it. Users control exactly what
is split, what is done, and how.
</p>

<a name="portability">
<h2> Portability </h2></a>

<p>
 The shell language was designed to make scripts portable across various
versions of Unix. But it is actually really hard to write a portable shell
script. There are dozens of distinct
<tt>sh</tt> flavours, not even counting the openly incompatible
<tt>csh</tt> approach and its various <tt>tcsh</tt>-like followers.
The <tt>ash</tt>, <tt>bash</tt>, <tt>ksh</tt> and <tt>zsh</tt> shells
all exhibit a different behaviour, <em>even when they are
run with the so-called compatibility mode</em>. From what I have
seen on various experiments, only <tt>zsh</tt> is able to follow the
specification to the letter, at the expense of being big and complex to
configure. This is a source of endless problems for shell script writers,
who <em>should</em> be able to assume that a script will run everywhere,
but <em>cannot</em> in practice. Even a simple utility like <tt>test</tt>
cannot be used safely with the normalized options, because most shells
come with a builtin <tt>test</tt> that does <em>not</em> respect the
specification to the letter. And let's not get started about <tt>echo</tt>,
which has its own set of problems. Rich Felker has
<a href="http://www.etalabs.net/sh_tricks.html">a page</a> listing tricks
to use to write portable shell scripts. Writing a portable script should
not be that hard.
</p>

<p>
execline scripts are portable. There is no
complex syntax with opportunity to have an undefined or nonportable
behaviour. The execline package is portable across platforms:
there is no reason for vendors or distributors to fork their own
incompatible version.
 Scripts will
not break from one machine to another; if they do,
it's not a "portability problem", it's a bug. You are then encouraged
to find the program that is responsible for the different behaviour,
and send a bug-report to the program author - including me, if the
relevant program is part of the execline distribution.
</p>

<p>
 A long-standing problem with Unix scripts is the shebang line, which
requires an absolute path to the interpreter. Scripts are only portable
as is if the interpreter can be found at the same absolute path on every
system. With <tt>/bin/sh</tt>, it is <em>almost</em> the case (Solaris
manages to get it wrong by having a non-POSIX shell as <tt>/bin/sh</tt>
and requiring something like <tt>#!/usr/xpg4/bin/sh</tt> to get a POSIX
shell to interpret your script). Other scripting languages are not so
lucky: perl can be <tt>/bin/perl</tt>, <tt>/usr/bin/perl</tt>,
<tt>/usr/local/bin/perl</tt> or something else entirely. For those cases,
some people advocate the use of <tt>env</tt>: <tt>#!/usr/bin/env perl</tt>.
But first, <tt>env</tt> can only find interpreters that can be found via the
user's PATH environment variable, which defeats the purpose of having an
absolute path in the shebang line in the first place; and second, this only
displaces the problem: the <tt>env</tt> utility does not
have a guaranteed absolute path. <tt>/usr/bin/env</tt> is the usual
convention, but not a strong guarantee: it is valid for systems to have
<tt>/bin/env</tt> instead, for instance.
</p>

<p>
<tt>execline</tt> suffers from the same issues. <tt>#!/bin/execlineb</tt>&nbsp;?
<tt>#!/usr/bin/execlineb</tt>&nbsp;? This is the only portability problem that
you will find with execline, and it is common to every script language. 
</p>

<p>
 The real solution to this portability problem is a convention that
guarantees fixed absolute paths for executables, which the FHS does not do.
The <a href="http://cr.yp.to/slashpackage.html">slashpackage</a> convention is
such an initiative, and is well-designed; but as with every
convention, it only works if everyone follows it, and unfortunately,
slashpackage has not
found many followers. Nevertheless, like every skarnet.org package, execline
can be configured to follow the slashpackage convention.
</p>

<a name="simplicity">
<h2> Simplicity </h2></a>

<p>
 I originally wanted a shell that could be used on an embedded system.
Even the <tt>ash</tt> shell seemed big, so I thought of writing my
own. Hence I had a look at the
<a href="http://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html">sh
specification</a>... and ran away screaming.
This specification
is <em>insane</em>. It goes against every good programming
practice; it seems to have been designed only to give headaches
to wannabe <tt>sh</tt> implementors.
</p>

<p>
 POSIX cannot really be blamed for that: it only normalizes existing, historical
behaviour. One can argue whether it is a good idea to normalize atrocious
behaviour for historical reasons, as is the case with the infamous
<a href="http://pubs.opengroup.org/onlinepubs/9699919799/functions/gets.html">gets</a>
function, but this is the way it is.
</p>

<p>
 The fact remains that modern shells have to be compatible with that historical
nonsense and that makes them big and complex at best, or incompatible and ridden
with bugs at worst.
An OpenBSD developer said to me, when asked about the OpenBSD <tt>/bin/sh</tt>:
"It works, but it's far from not being a nightmare".
</p>

<p>
 Nobody should have
nightmare-like software on their system. Unix is simple. Unix
was designed to be simple. And if, as Dennis Ritchie said, "it takes a
genius to understand the simplicity", that's because incompetent people
took advantage of the huge Unix flexibility to write insanely crappy or
complex software. System administrators can only do a decent job when
they understand how the programs they run are supposed to work. People
are slowly starting to grasp this (or are they&nbsp;? We finally managed
to get rid of sendmail and BIND, but GNU/Linux users seem happy to
welcome the era of D-Bus and systemd. Will we ever learn&nbsp;?) - but even
<tt>sh</tt>, a seemingly simple and basic Unix program, is hard to
understand when you lift the cover.
</p>

<p>
 So I decided to forego sh entirely and take a new approach. So far it
has been working.
 The <a href="grammar.html">execline specification</a> is simple, and,
as I hope to have shown, easy to implement without too many bugs or
glitches.
</p>

<a name="performance">
<h2> Performance </h2></a>

<p>
 Since it was made to run on an embedded system, <tt>execline</tt> was
designed to be light in memory usage. And it is.
</p>

<ul>
 <li> No overhead due to interactive support. </li>
 <li> No overhead due to unneeded features. Since every command performs
its task then executes another command, all occupied resources are instantly
freed. By contrast, a shell stays in memory during the whole execution
time. </li>
 <li> Very limited use of the C library. Only the C interface to the
kernel's system calls, and some very basic functions like <tt>malloc()</tt>,
are used in the C library. In addition to avoiding the horrible interfaces
like <tt>stdio</tt> and the legacy libc bugs, this approach makes it easy
to statically compile execline - you will want to do that on an embedded
system, or just to gain performance. </li>
</ul>

<p>
 You can have hundreds of execline scripts running simultaneously on an
embedded box. Not exactly possible with a shell.
</p>

<p>
 For scripts that do not require many computations that a shell can do
without calling external programs,
 <tt>execline</tt> is <em>faster</em> than the shell.
Unlike <tt>sh</tt>'s
one, the <tt>execline</tt> parser is simple and
straightforward; actually, it's more of a lexer than a parser, because
the execline language has been designed to be LL(1) - keep it simple,
stupid.
execline scripts get analysed and launched practically without a delay.
</p>

<a name="comparison" />
<ul>
 <li>
 The best use case of execline is a linear, straightforward script, a
simple command line that does not require the shell's processing power.
In that case, execline will skip the shell's overhead and win big time
on resource usage and execution speed. </li>
 <li> For longer scripts that fork a few commands, with a bit of
control flow, on average, an execline script will run at roughly the
same speed as the equivalent shell script, while using less resources. </li>
 <li> The worst use case of execline is when the shell is used as a
programming language, and the script loops over complex internal constructs
that execline is unable to replicate without forking. In that case,
execline will waste a lot of time in fork/exec system calls that the
shell does not have to perform, and be noticeably slower. execline has
been designed as a <em>scripting</em> language, not as a <em>programming</em>
language: it is efficient at being the glue that ties together programs
doing a job, not at implementing a program's logic. </li>
</ul>

<a name="limitations">
<h2> execline limitations </h2></a>

<ul>
 <li> <tt>execline</tt> can only handle scripts that fit in one <em>argv</em>.
Unix systems have a limit on the <em>argv</em>+<em>envp</em> size;
<tt>execline</tt> cannot execute scripts that are bigger than this limit.</li>
 <li> <tt>execline</tt> commands do not perform signal handling. It is not
possible to trap signals efficiently inside an execline script. The
<a href="trap.html">trap</a> binary, part of the execline suite, provides a
signal management primitive, but it is more limited and slower than its
equivalent shell construct. </li>
 <li> Due to the <tt>execline</tt> design, maintaining a state is
difficult. Information has to transit via environment variables or
temporary files, which makes commands like
<a href="loopwhilex.html">loopwhilex</a> a bit painful to handle. </li>
 <li> Despite all its problems, the main shell advantage (apart from
being available on every Unix platform, that is) is that it
is often <em>convenient</em>. Shell constructs can be terse and short,
where <tt>execline</tt> constructs will be verbose and lengthy. </li>
 <li> An execline script is generally heavier on <tt>execve()</tt> than
the average shell script - notably in programs where the shell can
use builtins. This can lead to a performance loss, especially when
executed programs make numerous calls to the dynamic linker: the system
ends up spending a lot of time resolving dynamic symbols. If it is a
concern to you, you should try and <em>statically compile</em> the
execline package, to eliminate the dynamic resolution costs. Unless
you're heavily looping around <tt>execve()</tt>,
the remaining costs will be negligible. </li>
</ul>

</body>
</html>