about summary refs log tree commit diff
path: root/doc/Netpbm.programming
blob: c4d38ed4cb9653eb66a97fb223525c358a0558ca (plain) (blame)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
This file is an attempt to give some basic guidelines for those who
wish to write new Netpbm programs.  The guidelines ensure:

  A) Your program functions consistently with the rest of the package.

  B) Your program works in myriad environments on which you can't test it.

  C) You don't miss an important detail of the Netpbm formats.

  D) Your program is immune to small changes in the Netpbm formats.

The easiest way to write your own Netpbm program is to take an
existing one, similar to the one you want to write, and to modify
it.  This saves a lot of time, and ensures conformity with these rules.
But pick a recent one (check the file modification date and the
doc/HISTORY file), because many things you see in old programs are
grandfathered in.  Pamtopnm is good example of a thoroughly modern
Netpbm program.  Pamcut is another good one.


COLLECTION PARAMETERS
---------------------

The philosophy that guides what goes into Netpbm and what doesn't is
one of maximum distribution of function.  Contributions to Netpbm are
refused very rarely, and usually because there is already some better
way to do what the contribution attempts to do, so that the
contribution would just make the package more confusing and thus
harder to use.  The second most common reason for declining a
contribution is that it falls well outside Netpbm's scope.  That too
would make the package more confusing and thus harder to use.

"Nobody will use that" is not a reason for leaving something out of
Netpbm.

"That's different from how other Netpbm programs work" is similarly 
not an excuse for refusing distribution of a feature.  (But it's a
good reason to accept a change to make a feature consistent!)

The standard for adding something to Netpbm isn't that it be perfect,
or even a net improvement.  The standard is that it not make Netpbm
worse.  Poor quality additions are made all the time, because a
program that doesn't work well is usually no worse than no program at
all.


DEVELOPMENT PROCESS
-------------------

Just email your code to the Netpbm maintainer.  Base it on the most recent
Netpbm Development release if possible.

The preferred form for changes to existing files is a unified diff patch --
the conventional Unix means of communicating code changes.  A Subversion
'svn diff' creates that easily.

You should update or create documentation too.  But if you don't, the Netpbm
maintainer will do the update before releasing your code.  The source files
for the documentation are the HTML files in the 'userguide' directory of the
Netpbm Subversion repository: https://svn.code.sf.net/p/netpbm/code/userguide.
The identical files are at http://netpbm.sourceforge.net/doc/ .

There are some automated tests in the package - shell scripts in files
named such as "pbmtog3.test".  You can use those to verify your
changes.  You should also add to these tests and create new ones.  But
most developers don't.

As you code and test, do 'make dep' to create the header file dependency files
(depend.mk).  Do it after 'configure' and every time you change the
dependencies.  If you don't, 'make' won't know to recompile things when you
change the header files.  For the convenience of users who are just building,
and not changing anything, the make files do work without a 'make dep', but do
so by automatically generating empty depend.mk files.  This is not what you
want as a developer.


CODING GUIDELINES
-----------------

The Netpbm maintainer accepts programs that do not meet these guidelines,
so don't feel you need to hold back a contribution because you wrote it
before you read these.

* See the Netpbm library documentation to see what library functions
  you should be using and how.  You can find it at:

  http://netpbm.sourceforge.net/doc/libnetpbm.html

* See the specifications of the Netpbm formats at:

  http://netpbm.sourceforge.net/doc/pbm.html
  http://netpbm.sourceforge.net/doc/pgm.html
  http://netpbm.sourceforge.net/doc/ppm.html
  http://netpbm.sourceforge.net/doc/pam.html

  but don't depend very much on these; use the library functions
  instead to read and write the image files.

* You should try to use the "pam" library functions, which are newer than the
  "pbm", "pgm", and "pnm" ones.  The "pam" functions read and write all four
  Netpbm formats and make code easier to write and to read.

* A new program should generate PAM output.  Note that any program that uses
  the modern Netpbm library can read PAM even if it was designed for the older
  formats.  For other programs, one can convert PAM to the older formats
  with 'pamtopnm'.

* If your program involves transparency (alpha masks), you have two
  alternatives.  The older method is to use a separate PGM file for the
  alpha mask, in the style of Pngtopnm/Pnmtopng and Giftopnm/Pnmtogif,
  and use command syntax like those programs.  See the PGM format spec
  for details.
  
  A newer method involves a PAM image with an alpha plane (tuple type
  RGB_ALPHA, etc.).  This is preferred because it's easier for users
  to pipe stuff around.  Pamtotga is an example of this.

* Declare all your symbols except 'main' as static so that they won't
  cause problems to other programs when you do a "merge build" of 
  Netpbm.

* Always start the code in main() with a call to pm_proginit().

* Use shhopt for option processing.  i.e. call optParseOptions3().
  This is really easy if you just copy the parseCommandLine() function
  and struct cmdlineInfo declaration from pamcut.c and adapt it to 
  your program. 

  When you do this, you get a command line syntax consistent with all the
  other Netpbm programs, you save coding time and debugging time, and it
  is trivial to add options later on.

  Do not use shhopt's short option alternative unless you need to be
  compatible with another program that has short options.  Short
  options are traditional one-character Unix options, which can be
  stacked up like "foo -cderx myfile", and they are far too unfriendly
  to be accepted by the Netpbm philosophy.  Note that long options in
  shhopt can always be abbreviated to the shortest unique prefix, even
  one character.

  In messages and examples in documentation, always refer to an option
  by its full name, not the abbreviation you usually use.  E.g. if you have
  a "-width" option which can be abbreviated "-w", don't use -w in 
  documentation.  -width is far clearer.
  
* Use pm_error() and pm_message() for error messages and other messages.
  Note that the universal -quiet option (processed by p?m_init()) causes
  messages issued via pm_message() to be suppressed.  And that's what
  Netpbm's architecture requires.

* The argument to pm_error() and pm_message() is a string of text, not
  a formatted message.  Don't put newlines or other formatting characters
  in it.  These subroutines are designed to be flexible in how they issue
  the messages.

* Use MALLOCARRAY() to allocate space for an array and MALLOCVAR to allocate
  space for a non-array variable.  In fact, you usually want to save some
  programming tedium and use the NOFAIL versions of these (they never fail
  because they abort the program if memory is not available).  These avoid
  array bounds violations.

* Use pm_tmpfile() to make a temporary file.  This avoids races that can
  be used to compromise security.

* Properly use maxvals.  As explained in the format specifications, every
  sample value in an image must be interpreted relative to the image's
  maxval.  For example, a pixel with value 24 in an image with maxval 24
  is the same brightness as a pixel with value 255 in an image with a
  maxval of 255.

  255 is a popular maxval (use the PPM_MAXMAXVAL etc. macros) because it
  makes samples fit in a single byte and at one time was the maximum 
  possible maxval in the format.

  Note that the Pamdepth program converts an image from one maxval to another.

* Don't include extra function.  If you can already do something by 
  piping the input or output of your program through another Netpbm
  program, don't make an option on your program to do it.  That's the
  Netpbm philosophy -- simple building blocks.

  Similarly, if your program does two things which would be useful
  separately, write two programs and advise users to pipe them
  together.  Or add a third program that simply runs the first two.

* Your program should, if appropriate, allow the user to use Standard
  Input and Output for images.  This is because Netpbm users commonly
  use pipes.  As an alternative to Standard Input, the user should be able
  to name the input file as a non-option program argument.  But the user
  should not be able to name the output file if Standard Output is
  feasible.  That's just the Netpbm tradition.

* Don't forget to write a proper html documentation page.  Get an
  example to use as a template from
  <http://netpbm.sourceforge.net/doc/directory.html>.

* No Netpbm source code may contain tab characters.  If you
  generate such a file, the Netpbm maintainer will convert it to spaces
  and possibly change all your indenting at the same time.  Tabs are not
  appropriate for code that multiple people must edit because they don't
  necessarily look the same in every editing environment, and in most
  editing environments, you can't even tell which spaces are actually in
  the file and which came from the editor, expanding tabs.  Spaces, on
  the other hand, look the same for everyone.  Modern editors let you
  compose code just as easily with spaces as with tabs.

* Limit your C code to original ANSI C.  Not C99.  Not C89.  Not C++.  Don't
  use // for comments or put declarations in the interior of a block.
  Interpreted languages are OK, with Perl being preferred.

* Make output invariant with input.  The program should produce the same output
  when fed the same input.  This helps with testing and also reduces the chance
  of bugs being encountered randomly.

  This is an issue when there are "don't care" bits in the output.  You should
  normally make those zero.

  The valgrind utility helps you identify instances of undefined bits and
  bytes getting to output.

  A common place to find "don't care" bits is padding at the end of a raster
  row.  You can use pbm_cleanrowend_packed() to clear padding bits when you
  use pbm_writepbmrow_packed().

  Some programs are designed to produce random output.  For those, include a
  -seed option to specify the seed of the random number generator so that a
  tester can make the output predictable.


* Never assume that input will be valid; make sure your program can
  handle corrupt input gracefully.

  Watch out for arithmetic overflows caused by large numbers in the input.

  Be careful when writing decoders.  Make sure that corruptions in the
  input image do not cause buffer overruns.




DISCONTINUED CODING GUIDELINES
------------------------------

Here are some things you will see in old Netpbm programs, but they are
obsolete and you shouldn't propagate them into a new program:

* Use of pbm_init(), pgm_init(), ppm_init(), and pnm_init().  We use
  pm_proginit() now.

* Tolerating non-standard C libraries.  You may assume all users have
  ANSI and POSIX compliant C libraries.  E.g. use strrchr() and forget
  about rindex().

* pm_keymatch() for option processing.  Use shhopt instead, as described
  above.

* optParseOptions() and optParseOptions2().  These are obsoleted
  by optParseOptions3(), which is easier to use and more powerful.

* K&R C function declarations.  Always use ANSI function declarations
  exclusively (e.g. use 

      void foo(int arg1){} 

  instead of 

       void foo(arg1)
           int arg1;
           {}

* for (col=0, xP = row; col < cols; col++, xP++)
      foo(*xP);

  This was done in the days before optimizing compilers because it ran faster
  than the more readable:

    for (col=0; col < cols; col++)
        foo(row[col]);

  Now, just use the latter.


CODING STYLE
------------

We do not generally mandate any basic coding style in these
guidelines.  Where you put your braces is a matter of personal style
and other people working with your code will just have to live with
it.  However, people working with your code might just recode it into
another style if it makes it easier for them to read the code and
have confidence in their changes.

But if you have no preference, the following is what the Netpbm
maintainer currently prefers.  Essentially, it is clean, elegant
computer science-type code rather than brute force engineering-type
code.  Modular and structured above all.

* No gotos.  This includes all variations of goto:  break, continue, leave,
  mid-function return.  An exception: aborting the entire program in the 
  middle of something is allowed.

* No functions with side effects.  This is a tough one, since
  functions with side effects is highly traditional C.  In fact, the
  creators of C didn't even have a concept of a subroutine.  However,
  the average human brain, especially one trained in math, cannot
  easily follow code where a function both computes a value and does
  other stuff.
  
  For the purpose of this discussion, we say that a C function whose return
  type is void is not a "function," but a "subroutine."

  So a function should never change anything in the program.  All it does
  is compute a value.

  Where you have to call an external function that has side effects (virtually
  anything in the standard C library, for example), put it in a simple
  assignment function that captures its return value and otherwise consider
  it a subroutine:

    rc = fopen(...)
    if (rc ...)

* No reuse of variables.  Most variables should be set at most once.
  Don't say "A is 5" and then later on, "no, A is 6."  A reader
  should be able to take that first "A is 5" as a statement of fact
  and not hunt for all the places it might be contradicted.  A
  variable that represents the state of execution of an algorithm is an
  obvious exception.

  Use "const" everywhere you possibly can.

  Especially never modify the argument of a function.  All function arguments
  should be "const".

  Don't use initializers except for with constants.  The first value a
  variable has is a property of the algorithm that uses the variable, not of
  the variable.  A reader expects to see the whole algorithm in one place, as
  opposed to having part of it hidden in the declaration of the algorithm
  variables.

* Avoid global variables.  Sometimes, a value is truly global and
  passing it as a parameter just muddies the code.  But most of the
  time, any external value to which a function refers should be one of
  its arguments.  

  Declare a variable in the most local scope possible.

  Global constants are OK.

* Do not use static variables, except for global constants.

* Keep subroutines small.  Generally under 50 lines.  But if the
  routine is a long sequence of simple, similar things, it's OK for it
  run on ad infinitem.

* Use the type "bool" for boolean variables.  "bool" is defined in Netpbm
  headers.  Use TRUE and FALSE as its values.

* Do not say "if (a)" when you mean "if (a != 0)".  Use "if (a)" only if a is
  a boolean variable.  Or where it's defined so that zero is a special value
  that means "doesn't exist".

* Do multiword variable names in camel case: "multiWordName".  Underscores
  waste valuable screen real estate.

* If a variable's value is a pointer to something, it's name should reflect
  that, as opposed to lying and saying the value is the thing pointed to.  A
  "P" on the end is the conventional way to say "pointer to."  E.g. if a
  variable's value is a pointer to a color value, name it "colorP", not
  "color".

* A variable that represents the number of widgets should be "widgetCt",
  ("widget count"), not "widgets".  The latter should represent the actual
  widgets instead.

* Put "const" as close as possible to the thing that is constant.
  "int const width", not "const int width".  When pointers to
  constants are involved, it makes it much easier to read.

* Free something in the same subroutine that allocates it.  Have exactly 
  one free per allocate (this comes naturally if you eliminate gotos).

* Dynamically allocate buffers, for example raster buffers, rather than use
  automatic variables.  Accidentally overrunning an automatic variable,
  typically stored on the stack, is much more dangerous than overrunning a
  variable stored in the heap.

* Use pm_asprintf() to compose strings, instead of sprintf(), strcat(), and
  strcpy().  pm_asprintf() is essentially the same as GNU asprintf(), i.e.
  sprintf(), except it dynamically allocates the result memory.  This
  effortlessly makes it impossible to overrun the result buffer.  Use
  pm_strfree() to free the result memory.  You usually need not worry about
  the pathological case that there is no memory available for the result,
  because in that case, pm_asprintf() returns a constant string "OUT OF MEMORY"
  and in most cases, that won't cause a disaster - just incorrect behavior that
  is reasonable in the face of such a pathological situation.

* Do not use the "register" qualifier of a variable.


MISCELLANEOUS
-------------

Object code immutability across compilations
--------------------------------------------

In a normal build process the same source code always produces the same object
code.  (Of course this depends on your compiler and its settings: if your
compiler writes time stamps into the objects, they will naturally be different
with each compilation.)

The file lib/compile.h has a time stamp which gets written into libnetpbm.a
and libnetpbm.so .  This will affect individual binary executables if they are
statically linked.

If compile.h does not exist it will be created by the program
buildtools/stamp-date.  Once created, compile.h will stay unchanged until you
do "make clean".

If you make alterations to source which are entirely cosmetic (for example,
changes to comments or amount of indentation) and need to confirm that the
actual program logic is not affected, compare the object files.  If you need
to compare library files (or for some reason, statically linked binaries) make
sure that the same compile.h is used throughout.