------------------------------
GUIDELINES FOR ZSH DEVELOPMENT
------------------------------

Zsh is currently developed and maintained by the Zsh Development Group.
This development takes place by mailing list.  Check the META-FAQ for the
various zsh mailing lists and how to subscribe to them.  The development
is very open and anyone is welcomed and encouraged to join and contribute.
Because zsh is a very large package whose development can sometimes
be very rapid, we kindly ask that people observe a few guidelines when
contributing patches and feedback to the mailing list.  These guidelines
are very simple and hopefully should make for a more orderly development
of zsh.

Tools
-----

To develop (as opposed to just build) zsh, you'll need a few specialised
tools:

* GNU autoconf, version 2.12 or later.  (Earlier versions mostly work,
  but part of the configuration system now relies on part of the format
  of the config.status files that get generated.  See the comments in
  configure.in and at the end of Src/mkmakemod.sh for an explanation.)
  Note that configure.in has not yet been modified to work with autoconf
  2.50.  Use version 2.13 until configure.in and associated files have
  been updated.

* GNU m4.  (Required by autoconf.)

* yodl.

* texi2html.

Patches
-------

* Send all patches to the mailing list rather than directly to me.

* Send only context diffs "diff -c oldfile newfile" or unified diffs
  "diff -u oldfile newfile".  They are much easier to read and
  understand while also allowing the patch program to patch more
  intelligently.  Please make sure the filenames in the diff header
  are relative to the top-level directory of the zsh distribution; for
  example, it should say "Src/init.c" rather than "init.c" or
  "zsh/Src/init.c".

* Please put only one bug fix or feature enhancement in a single patch and
  only one patch per mail message.  This helps me to multiplex the many
  (possibly conflicting) patches that I receive for zsh.
  You shouldn't needlessly split patches, but send them in the smallest
  LOGICAL unit.

* If a patch depends on other patches, then please say so.  Also please
  mention what version of zsh this patch is for.

* Please test your patch and make sure it applies cleanly.  It takes
  considerably more time to manually merge a patch into the baseline code.

* There is now a zsh patch archive.  To have your patches appear in the
  archive, send them to the mailing list with a Subject: line starting
  with "PATCH:".

Testing
-------

* Because zsh has a huge number of different options and interacts with
  a wide range of human and artificial life, it is very difficult to
  test the shell thoroughly.  For this purpose, the Test subdirectory
  exists.  It consists of a driver script (ztst.zsh) and various test
  files (*.ztst) in a format which is described in B01cd.ztst, which acts
  as a template.  It is designed to make it easy to provide input to
  chunks of shell code and to test the corresponding standard output,
  error output and exit status.

* There is not much there yet, but please don't let that put you off adding
  tests for basic syntactic features, builtins, options etc. which you
  know to be flakey or to have had difficulties in the past.  Better
  support for testing job control and interactive features is expected
  to follow eventually.

* The directory is not part of the usual process of building and
  installation.  To run the tests, go to Test and `make check'.  Please
  report any errors with all the usual information about the zsh version
  and the system you are using.

C coding style
--------------

* The primary language is ANSI C as defined by the 1989 standard, but the
  code should always be compatible with late K&R era compilers ("The C
  Programming Language" 1st edition, plus "void" and "enum").  There are
  many hacks to avoid the need to actually restrict the code to K&R C --
  check out the configure tests -- but always bear the compatibility
  requirements in mind.
  In particular, preprocessing directives must have the "#" unindented,
  and string pasting is not available.

* Conversely, there are preprocessor macros to provide safe access to some
  language features not present in pure ANSI C, such as variable-length
  arrays.  Always use the macros if you want to use these facilities.

* Avoid writing code that generates warnings under gcc with the default
  options set by the configure script.  For example, write
  "if ((foo = bar))" rather than "if (foo = bar)".

* Please try not to use lines longer than 79 characters.

* The indent/brace style is Kernighan and Ritchie with 4-character
  indentation (with leading tab characters replacing sequences of
  8 spaces).  This means that the opening brace is the last character
  in the line of the if/while/for/do statement and the closing brace
  has its own line:

      if (foo) {
          do that
      }

* Put only one simple statement on a line.  The body of an if/while/for/do
  statement has its own line with 4-character indentation even if there
  are no braces.

* Do not use space between the function name and the opening parenthesis.
  Use space after if/for/while.  Use space after type casts.

* Do not use (unsigned char) casts since some compilers do not handle
  them properly.  Use the provided STOUC(X) macro instead.

* If you use emacs 19.30 or newer you can put the following line in your
  ~/.emacs file to make these formatting rules the default:

    (add-hook 'c-mode-common-hook (function (lambda () (c-set-style "BSD"))))

* Function declarations must look like this:

  /**/
  int
  foo(char *s, char **p)
  {
      function body
  }

  There must be an empty line, a line with "/**/", a line with the
  type of the function, and finally the name of the function with typed
  arguments.  These lines must not be indented.  The script generating
  function prototypes and the ansi2knr program depend on this format.

* Variable declarations must similarly be preceded by a
  line containing only "/**/", for the prototype generation script.
  The declaration itself should be all on one line (except for multi-line
  initialisers).

* Preprocessor directives that affect the function/variable declarations
  must also be preceded by a "/**/" line, so that they get copied into the
  prototype lists.

* There are three levels of visibility for a function or variable.  It can
  be file-local, for which it must be marked with the keyword "static" at
  the front of the declaration.  It can be visible to other object files in
  the same module, for which it requires no extra keyword.  Or it can be
  made available to the entire program (including other dynamically loaded
  modules), for which it must be marked with the pseudo-keyword "mod_export"
  at the front of the declaration.  Symbols should have the least visibility
  possible.

* Leave a blank line between the declarations and statements in a compound
  statement, if both are present.  Use blank lines elsewhere to separate
  groups of statements in the interests of clarity.  There should never
  be two consecutive blank lines.

* Each .c file *must* #include the .mdh header for the module it is a
  part of and then its own .pro file (for local prototypes).  It may
  also #include other system headers.  It *must not* #include any other
  module's headers or any other .pro files.

Modules
-------

Modules have hierarchical names.  Name segments are separated by `/', and
each segment consists of alphanumerics plus `_'.  Relative names are never
used; the naming hierarchy is strictly for organisational convenience.

Each module is described by a file with a name ending in `.mdd' somewhere
under the Src directory.  This file is actually a shell script that will
be sourced when zsh is built.  To describe the module it can/should set
the following shell variables:

  - name            name of the module
  - link            `static', `dynamic' or `no', as described in INSTALL.
                    In addition, the value `either' is allowed in the .mdd
                    file, which will be converted by configure to `dynamic'
                    if that is available, else `static'.
                    May also be a command string, which will be run within
                    configure and whose output is used to set the value
                    of `link' in config.modules.  This allows a
                    system-specific choice of modules.  For example,
                    link='case $host in
                      *-hpux*) echo dynamic; ;;
                      *) echo no; ;;
                    esac'
  - load            `yes' or `no': whether the shell should include hooks
                    for loading the module automatically as necessary.
                    (This corresponds to an `L' in xmods.conf in the
                    old mechanism.)
  - moddeps         modules on which this module depends (default none)
  - nozshdep        non-empty indicates no dependence on the `zsh/main'
                    pseudo-module
  - alwayslink      if non-empty, always link the module into the executable
  - autobins        builtins defined by the module, for autoloading
  - autoinfixconds  infix condition codes defined by the module, for
                    autoloading (without the leading `-')
  - autoprefixconds like autoinfixconds, but for prefix condition codes
  - autoparams      parameters defined by the module, for autoloading
  - automathfuncs   math functions defined by the module, for autoloading
  - objects         .o files making up this module (*must* be defined)
  - proto           .syms files for this module (default generated from
                    $objects)
  - headers         extra headers for this module (default none)
  - hdrdeps         extra headers on which the .mdh depends (default none)
  - otherincs       extra headers that are included indirectly (default none)

Be sure to put the values in quotes.  For further enlightenment have a look
at the `mkmakemod.sh' script in the Src directory of the distribution.

Modules have to define four functions which will be called automatically
by the zsh core.  The first one, named `setup_', should set up any data
needed in the module, at least any data other modules may be interested
in.  The second one, named `boot_', should register all builtins,
conditional codes, and function wrappers (i.e. anything that will be
visible to the user) and will be called after the `setup_'-function.
The third one, named `cleanup_', is called when the user tries to unload
a module and should de-register the builtins etc.  The last function,
`finish_', is called when the module is actually unloaded and should
finalize all the data initialized in the `setup_'-function.

In short, the `cleanup_'-function should undo what the `boot_'-function
did, and the `finish_'-function should undo what the `setup_'-function
did.

All of these functions should return zero if they succeeded and
non-zero otherwise.

Builtins are described in a table, for example:

  static struct builtin bintab[] = {
    BUILTIN("example", 0, bin_example, 0, -1, 0, "flags", NULL),
  };

Here `BUILTIN(...)' is a macro that simplifies the description.  Its
arguments are:

  - the name of the builtin as a string
  - optional flags (see BINF_* in zsh.h)
  - the C-function implementing the builtin
  - the minimum number of arguments the builtin needs
  - the maximum number of arguments the builtin can handle or -1 if
    the builtin can get any number of arguments
  - an integer that is passed to the handler function and can be used
    to distinguish builtins if the same C-function is used to
    implement multiple builtins
  - the options the builtin accepts, given as a string containing the
    option characters (the above example makes the builtin accept the
    options `f', `l', `a', `g', and `s')
  - and finally an optional string containing option characters that
    will always be reported as set when calling the C-function (this,
    too, can be used when using one C-function to implement multiple
    builtins)

The definition of the handler function looks like:

  /**/
  static int
  bin_example(char *nam, char **args, char *ops, int func)
  {
    ...
  }

The special comment /**/ is used by the zsh Makefile to generate the
`*.pro' files.  The first argument of the function is the name under
which this function was invoked (the name of the builtin; for functions
that implement more than one builtin this information is needed).
The second argument is the array of arguments *excluding* the options
that were defined in the struct and which are handled by the calling
code.  These options are given as the third argument.  It is an array
of 256 characters in which the n'th element is non-zero if the option
with ASCII-value n was set (i.e. you can easily test if an option was
used by `if (ops['f'])' etc.).  The last argument is the integer value
from the table (the sixth argument to `BUILTIN(...)').  The integer
returned by the function is the value returned by the builtin at shell
level.

To register builtins in zsh and thereby make them visible to the user,
the function `addbuiltins()' is used:

  /**/
  int
  boot_example(Module m)
  {
    int ret;

    ret = addbuiltins(m->nam, bintab, sizeof(bintab)/sizeof(*bintab));
    ...
  }

The arguments are the name of the module (taken from the argument in
the example), the table of definitions and the number of entries in
this table.
The return value is 1 if everything went fine, 2 if at least one
builtin couldn't be defined, and 0 if none of the builtins could be
defined.

To de-register builtins use the function `deletebuiltins()':

  /**/
  int
  cleanup_example(Module m)
  {
    deletebuiltins(m->nam, bintab, sizeof(bintab)/sizeof(*bintab));
    ...
  }

The arguments and the return value are the same as for `addbuiltins()'.

The definition of condition codes in modules is equally simple.  First
we need a table with the descriptions:

  static struct conddef cotab[] = {
    CONDDEF("len", 0, cond_p_len, 1, 2, 0),
    CONDDEF("ex", CONDF_INFIX, cond_i_ex, 0, 0, 0),
  };

Again a macro is used, with the following arguments:

  - the name of the condition code without the leading hyphen
    (i.e. the example makes the condition codes `-len' and `-ex'
    usable in `[[...]]' constructs)
  - an optional flag which for now can only be CONDF_INFIX; if this is
    given, an infix operator is created (i.e.
    the above makes `[[ -len str ]]' and `[[ s1 -ex s2 ]]' available)
  - the C-function implementing the conditional
  - for non-infix condition codes the next two arguments give the
    minimum and maximum number of strings the conditional can handle
    (i.e. `-len' can get one or two strings); as with builtins giving
    -1 as the maximum number means that the conditional accepts any
    number of strings
  - finally as the last argument an integer that is passed to the
    handler function that can be used to distinguish different
    condition codes if the same C-function implements more than one of
    them

The definition for the function looks like:

  /**/
  static int
  cond_p_len(char **a, int id)
  {
    ...
  }

The first argument is an array containing the strings (NULL-terminated
like the array of arguments for builtins), the second argument is the
integer value stored in the table (the last argument to `CONDDEF(...)').
The value returned by the function should be non-zero if the condition
is true and zero otherwise.

Note that no preprocessing is done on the strings.  This means that
no substitutions are performed on them and that they will be
tokenized.  There are three helper functions available:

  - char *cond_str(args, num, raw)
    The first argument is the array of strings the handler function
    got as an argument and the second one is an index into this array.
    The return value is the num'th string from the array with
    substitutions performed.  If the last argument is zero, the string
    will also be untokenized.
  - long cond_val(args, num)
    The arguments are the same as for cond_str().  The return value is
    the result of the mathematical evaluation of the num'th string
    from the array.
  - int cond_match(args, num, str)
    Again, the first two arguments are the same as for the other
    functions.  The third argument is any string.  The result of the
    function is non-zero if the num'th string from the array taken
    as a glob pattern matches the given string.
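
As a rough illustration of how a prefix condition handler might use
these helpers, the sketch below fills in a body for the `-len' example
in a form that compiles outside the zsh tree.  The bodies of cond_str()
and cond_val() here are simplified stand-ins, not the zsh
implementations: they do no substitution or untokenization, and
strtol() replaces real arithmetic evaluation.  The meaning chosen for
`-len' (one string: true if it is empty; two strings: true if the first
has the length given by the second) is likewise an assumption made for
the example.  Only the calling pattern is meant to carry over.

```c
#include <stdlib.h>
#include <string.h>

/* STAND-IN for zsh's cond_str(): returns the num'th string unchanged.
 * The real helper performs substitutions (and untokenizes if raw == 0). */
static char *
cond_str(char **args, int num, int raw)
{
    (void)raw;
    return args[num];
}

/* STAND-IN for zsh's cond_val(): the real helper does a full
 * mathematical evaluation; this one only parses a decimal integer. */
static long
cond_val(char **args, int num)
{
    return strtol(args[num], NULL, 10);
}

/* Handler for `[[ -len str ]]' and `[[ -len str num ]]' (assumed
 * semantics, see the text above).  Returns non-zero if true. */
static int
cond_p_len(char **a, int id)
{
    char *s1 = cond_str(a, 0, 0);

    (void)id;                   /* only one condition uses this handler */
    if (a[1]) {
        long v = cond_val(a, 1);

        return (long)strlen(s1) == v;
    }
    return !s1[0];
}
```

Inside zsh the stand-ins would be dropped and the handler would be
preceded by a "/**/" line as described under "C coding style".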

Registering and de-registering condition codes with the shell is
almost exactly the same as for builtins, using the functions
`addconddefs()' and `deleteconddefs()' instead:

  /**/
  int
  boot_example(Module m)
  {
    int ret;

    ret = addconddefs(m->nam, cotab, sizeof(cotab)/sizeof(*cotab));
    ...
  }

  /**/
  int
  cleanup_example(Module m)
  {
    deleteconddefs(m->nam, cotab, sizeof(cotab)/sizeof(*cotab));
    ...
  }

Arguments and return values are the same as for the functions for
builtins.

For defining parameters, a module can call `createparam()' directly or
use a table to describe them, e.g.:

  static struct paramdef patab[] = {
    PARAMDEF("foo", PM_INTEGER, NULL, get_foo, set_foo, unset_foo),
    INTPARAMDEF("exint", &intparam),
    STRPARAMDEF("exstr", &strparam),
    ARRPARAMDEF("exarr", &arrparam),
  };

There are four macros used:

  - PARAMDEF() gets as arguments:
    - the name of the parameter
    - the parameter flags to set for it (from the PM_* flags defined
      in zsh.h)
    - optionally a pointer to a variable holding the value of the
      parameter
    - three functions that will be used to get the value of the
      parameter, store a value in the parameter, and unset the
      parameter
  - the other macros provide simple ways to define the most common
    types of parameters; they get the name of the parameter and a
    pointer to a variable holding the value as arguments; they are
    used to define integer-, scalar-, and array-parameters, so the
    variables whose addresses are given should be of type `long',
    `char *', and `char **', respectively

For a description of how to write functions for getting or setting the
value of parameters, or how to write a function to unset a parameter,
see the description of the following functions in the `params.c' file:

  - `intvargetfn()' and `intvarsetfn()' for integer parameters
  - `strvargetfn()' and `strvarsetfn()' for scalar parameters
  - `arrvargetfn()' and `arrvarsetfn()' for array parameters
  - `stdunsetfn()' for unsetting parameters

Note that if one defines parameters using the last two macros (for scalars
and arrays), the variable holding the value should be initialized to
either `NULL' or to a piece of memory created with `zalloc()'.  But
this memory should *not* be freed in the finish-function of the
module because that will be taken care of by the `deleteparamdefs()'
function described below.

To register the parameters in the zsh core, the function
`addparamdefs()' is called as in:

  /**/
  int
  boot_example(Module m)
  {
    int ret;

    ret = addparamdefs(m->nam, patab, sizeof(patab)/sizeof(*patab));
    ...
  }

The arguments and the return value are as for the functions used to
add builtins and condition codes and like these, it should be called
in the boot-function of the module.  To remove the parameters defined,
the function `deleteparamdefs()' should be called, again with the same
arguments and the same return value as for the functions to remove
builtins and condition codes:

  /**/
  int
  cleanup_example(Module m)
  {
    deleteparamdefs(m->nam, patab, sizeof(patab)/sizeof(*patab));
    ...
  }

Modules can also define math functions.  Again, they are described
using a table:

  static struct mathfunc mftab[] = {
    NUMMATHFUNC("sum", math_sum, 1, -1, 0),
    STRMATHFUNC("length", math_length, 0),
  };

The `NUMMATHFUNC()' macro defines a math function that gets an array
of mnumbers (the zsh type for representing values in arithmetic
expressions) taken from the string in parentheses at the function
call.  Its arguments are the name of the function, the C-function
implementing it, the minimum and maximum number of arguments (as
usual, the latter may be `-1' to specify that the function accepts any
number of arguments), and finally an integer that is given unchanged
to the C-function (to be able to implement multiple math functions in
one C-function).

The `STRMATHFUNC()' macro defines a math function that gets the string
in parentheses at the call as one string argument (without the
parentheses).
The arguments are the name of the function, the C-function,
and an integer used like the last argument of `NUMMATHFUNC()'.

The C-functions implementing the math functions look like this:

  /**/
  static mnumber
  math_sum(char *name, int argc, mnumber *argv, int id)
  {
    ...
  }

  /**/
  static mnumber
  math_length(char *name, char *arg, int id)
  {
    ...
  }

Functions defined with `NUMMATHFUNC' get the name of the function, the
number of numeric arguments, an array with these arguments, and the
last argument from the macro-call.  Functions defined with
`STRMATHFUNC' get the name of the function, the string between the
parentheses at the call, and the last argument from the macro-call.

Both types of functions return an mnumber which is a discriminated
union looking like:

  typedef struct {
    union {
      zlong l;
      double d;
    } u;
    int type;
  } mnumber;

The `type' field should be set to `MN_INTEGER' or `MN_FLOAT' and
depending on its value either `u.l' or `u.d' contains the value.

To register and de-register math functions, the functions
`addmathfuncs()' and `deletemathfuncs()' are used:

  /**/
  int
  boot_example(Module m)
  {
    int ret;

    ret = addmathfuncs(m->nam, mftab, sizeof(mftab)/sizeof(*mftab));
    ...
  }

  /**/
  int
  cleanup_example(Module m)
  {
    deletemathfuncs(m->nam, mftab, sizeof(mftab)/sizeof(*mftab));
    ...
  }

The arguments and return values are as for the functions used to
register and de-register parameters, conditions, etc.

Modules can also define function hooks.  Other modules can then add
functions to these hooks to make the first module call these functions
instead of the default.

Again, an array is used to define hooks:

  static struct hookdef foohooks[] = {
    HOOKDEF("foo", foofunc, 0),
  };

The first argument of the macro is the name of the hook.  This name
is used whenever the hook is used.  The second argument is the default
function for the hook or NULL if no default function exists.  The
last argument is used to define flags for the hook.  Currently only one
such flag is defined: `HOOKF_ALL'.
If this flag is given and more than
one function was added to the hook, all functions will be called
(including the default function).  Otherwise only the last function
added will be called.

The functions that can be used as default functions or that can be
added to a hook have to be defined like:

  /**/
  static int
  foofunc(Hookdef h, void *data)
  {
    ...
  }

The first argument is a pointer to the struct defining the hook.  The
second argument is an arbitrary pointer that is given to the function
used to invoke hooks (see below).

The functions to register and de-register hooks look like those for
the other things that can be defined by modules:

  /**/
  int
  boot_foo(Module m)
  {
    int ret;

    ret = addhookdefs(m->nam, foohooks, sizeof(foohooks)/sizeof(*foohooks));
    ...
  }
  ...
  /**/
  int
  cleanup_foo(Module m)
  {
    deletehookdefs(m->nam, foohooks, sizeof(foohooks)/sizeof(*foohooks));
    ...
  }

Modules that define hooks can invoke the function(s) registered for
them by calling the function `runhook(name, data)'.  The first
argument is the name of the hook and the second one is the pointer
given to the hook functions as their second argument.  Hooks that have
the `HOOKF_ALL' flag call all functions defined for them until one
returns non-zero.  The return value of `runhook()' is the return value
of the last hook function called or zero if none was called.

To add a function to a hook, the function `addhookfunc(name, func)' is
called with the name of the hook and a hook function as arguments.
Deleting them is done by calling `deletehookfunc(name, func)' with the
same arguments as for the corresponding call to `addhookfunc()'.

Alternative forms of the last three functions are provided for hooks
that are changed or called very often.  These functions,
`runhookdef(def, data)', `addhookdeffunc(def, func)', and
`deletehookdeffunc(def, func)' get a pointer to the `hookdef'
structure defining the hook instead of the name and otherwise behave
like their counterparts.

Finally, modules can define wrapper functions.
These functions are
called whenever a shell function is to be executed.

The definition is simple:

  static struct funcwrap wrapper[] = {
    WRAPDEF(ex_wrapper),
  };

The macro `WRAPDEF(...)' gets the C-function as its only argument.
This function should be defined like:

  /**/
  static int
  ex_wrapper(List list, FuncWrap w, char *name)
  {
    ...
    runshfunc(list, w, name);
    ...
    return 0;
  }

The first two arguments should only be used to pass them to
`runshfunc()' which will execute the shell function.  The last
argument is the name of the function to be executed.  The arguments
passed to the function can be accessed via the global variable
`pparams' (a NULL-terminated array of strings).

The return value of the wrapper function should be zero if it calls
`runshfunc()' itself and non-zero otherwise.  This can be used for
wrapper functions that only need to run under certain conditions or
that don't need to clean anything up after the shell function has
finished:

  /**/
  static int
  ex_wrapper(List list, FuncWrap w, char *name)
  {
    if (wrapper_need_to_run) {
      ...
      runshfunc(list, w, name);
      ...
      return 0;
    }
    return 1;
  }

Inside these wrapper functions the global variable `sfcontext' will be
set to a clue indicating the circumstances under which the shell
function was called.  It can have any of the following values:

  - SFC_DIRECT:   the function was invoked directly by the user
  - SFC_SIGNAL:   the function was invoked as a signal handler
  - SFC_HOOK:     the function was automatically invoked as one of the
                  special functions known by the shell (like `chpwd')
  - SFC_WIDGET:   the function was called from the zsh line editor as a
                  user-defined widget
  - SFC_COMPLETE: the function was called from the completion code
                  (e.g. with `compctl -K func')

If a module invokes a shell function (e.g. as a hook function), the
value of this variable should only be changed temporarily and restored
to its previous value after the shell function has finished.
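
The save-and-restore pattern for `sfcontext' might be sketched as
follows.  This is a standalone illustration, not zsh code: the enum,
the `sfcontext' variable and the numeric values are stand-ins for what
the zsh core provides, and `shfunc_runner', `run_with_sfcontext()' and
`record_context()' are hypothetical names invented for the example.

```c
/* STAND-INS so that the sketch compiles on its own; inside zsh these
 * come from the core, and the numeric values here are arbitrary. */
enum { SFC_DIRECT, SFC_SIGNAL, SFC_HOOK, SFC_WIDGET, SFC_COMPLETE };
static int sfcontext = SFC_DIRECT;

/* Hypothetical callback type standing in for "run a shell function". */
typedef int (*shfunc_runner)(void *data);

/* Set sfcontext to `context' for the duration of the call to `run',
 * then restore whatever value it had before. */
static int
run_with_sfcontext(int context, shfunc_runner run, void *data)
{
    int saved = sfcontext;
    int ret;

    sfcontext = context;
    ret = run(data);
    sfcontext = saved;

    return ret;
}

/* Example callback: stores the context it observed in *(int *)data. */
static int
record_context(void *data)
{
    *(int *)data = sfcontext;
    return 0;
}
```

A caller could pass `record_context' with the address of an int to
check that the callee sees the temporary value while the value seen by
the caller is unchanged afterwards.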

There is a problem when the user tries to unload a module that has
defined wrappers from a shell function.  In this case the module can't
be unloaded immediately since the wrapper function is still on the
call stack.  The zsh code delays unloading modules until all wrappers
from them have finished.  To hide this from the user, the module's
cleanup function is run immediately so that all builtins, condition
codes, and wrapper functions defined by the module are
de-registered.  But if there is some module-global state that has to
be finalized (e.g. some memory that has to be freed) and that is used
by the wrapper functions, finalizing this data in the cleanup function
won't work.

This is why there are two functions each for the initialization and
finalization of modules.  The `boot'- and `cleanup'-functions are run
whenever the user calls `zmodload' or `zmodload -u' and should only
register or de-register the module's interface that is visible to the
user.  Anything else should be done in the `setup'- and
`finish'-functions.  Otherwise modules that other modules depend upon
may destroy their state too early and wrapper functions in the latter
modules may stop working since the state they use is already destroyed.

Documentation
-------------

* Edit only the .yo files.  All other formats (man pages, TeXinfo, HTML,
  etc.) are automatically generated from the yodl source.

* Always use the correct markup.  em() is used for emphasis, and bf()
  for citations.  tt() marks text that is literal input to or output
  from the shell.  var() marks metasyntactic variables.

* In addition to appropriate markup, always use quotes (`') where
  appropriate.  Specifically, use quotes to mark text that is not a part
  of the actual text of the documentation (i.e., that it is being quoted).
  In principle, all combinations of quotes and markup are possible,
  because the purposes of the two devices are completely orthogonal.
  For example,

    Type `tt(xyzzy)' to let zsh know you have played tt(advent).
    Saying `plugh' aloud doesn't have much effect, however.

  In this case, "zsh" is normal text (a name), "advent" is a command name
  occurring in the main text, "plugh" is a normal word that is being quoted
  (it's the user that says `plugh', not the documentation), and "xyzzy"
  is some text to be typed literally that is being quoted.

* For multiple-line pieces of text that should not be filled, there are
  various models.
  - If the text is pure example, i.e. with no metasyntactic variables etc.,
    it should be included within `example(...)'.  The text will be
    indented, will not be filled and will be put into a fixed width font.
  - If the text includes mixed fonts, it should be included within
    `indent(...)'.  The text is now filled unless `nofill(...)' is also
    used, and explicit font-changing commands are required inside.
  - If the text appears inside some other format, such as for example the
    `item()' list structure, then the instruction `nofill(...)', which
    simply turns off filling should be used; as with `indent(...)',
    explicit font changing commands are required.  This can be used
    without `indent()' when no indentation is required, e.g. if the
    accumulated indentation would otherwise be too long.
  All the above should appear on their own, separated by newlines from the
  surrounding text.  No extra newlines after the opening or before the
  closing parenthesis are required.

Module names
------------

Modules have hierarchical names.  Name segments are separated by `/', and
each segment consists of alphanumerics plus `_'.  Relative names are never
used; the naming hierarchy is strictly for organisational convenience.

Top-level name segments should be organisational identifiers, assigned
by the Zsh Development Group and recorded here:

  top-level identifier  organisation
  --------------------  ------------
  x_*                   reserved for private experimental use
  zsh                   The Zsh Development Group (contact: )

Below the top level, naming authority is delegated.
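
The naming rule above can be checked mechanically.  The following is a
minimal standalone sketch, not part of zsh: `valid_module_name()' and
`is_segment_char()' are hypothetical names, "alphanumerics" is read as
ASCII letters and digits, and rejecting empty segments (a leading,
trailing or doubled `/') is an assumption that the text does not state
explicitly.

```c
/* Return 1 if c may appear in a name segment: an ASCII letter, an
 * ASCII digit or `_'.  (Assumes an ASCII-compatible character set.) */
static int
is_segment_char(int c)
{
    return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') ||
        (c >= '0' && c <= '9') || c == '_';
}

/* Return 1 if `name' consists of one or more non-empty segments
 * separated by single `/' characters, and 0 otherwise. */
static int
valid_module_name(const char *name)
{
    int seglen = 0;

    for (; *name; name++) {
        if (*name == '/') {
            if (!seglen)
                return 0;       /* empty segment before this `/' */
            seglen = 0;
        } else if (is_segment_char(*name)) {
            seglen++;
        } else {
            return 0;           /* character not allowed in a segment */
        }
    }
    return seglen > 0;          /* rejects "" and a trailing `/' */
}
```

With these assumptions, names such as "zsh/main" or "x_test/foo_1" are
accepted, while "/zsh", "zsh//main" and "zsh/ex-1" are rejected.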