Lionel's Perl Programming Guide

Here is my guide to Perl programming. It is not some indisputable truth, just some common-sense principles that I try to follow in my programs and modules. Your Mileage May Vary.

Version information: $Id: guide.html,v 1.26 2010/03/05 17:01:04 cons Exp $

Motivation

From Sun's "Code Conventions for the Java Programming Language":

- 80% of the lifetime cost of a piece of software goes to maintenance.
- Hardly any software is maintained for its whole life by the original author.
- Code conventions improve the readability of the software, allowing engineers to understand new code more quickly and thoroughly.

In addition to this, Perl is often seen as a "write-only" language where reading someone else's code is almost impossible. For fun, just try to understand entries from the Obfuscated Perl Contest, for instance Mark Dominus' japh.pl:

 @P=split//,".URRUU\c8R";@d=split//,"\nrekcah xinU / lreP rehtona tsuJ";sub p{
 @p{"r$p","u$p"}=(P,P);pipe"r$p","u$p";++$p;($q*=2)+=$f=!fork;map{$P=$P[$f|ord
 ($p{$_})&6];$p{$_}=/ ^$P/ix?$P:close$_}keys%p}p;p;p;p;p;map{$p{$_}=~/^[P.]/&&
 close$_}%p;wait until$?;map{/^r/&&<$_>}%p;$_=$d[$q];sleep rand(2)if/\S/;print

Philosophy

It is always good to start with a bit of wisdom ;-)
A computer program does what you tell it to do, not what you want it to do.
- Greer's Third Law
Plan to throw one away; you will, anyhow.
- Fred Brooks, "The Mythical Man-Month"
You know you've achieved perfection in design, not when you have nothing more to add, but when you have nothing more to take away.
- Antoine de Saint-Exupery
We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil.
- Donald Knuth
It is necessary for us to learn from others' mistakes. You will not live long enough to make them all yourself.
- Hyman G. Rickover


Basic

Always use a version control system such as CVS or Subversion and clearly indicate the version string in the usage message, help output, documentation...

Delete old code. Don't comment it out or wrap it in an "if (0) { ... }" statement. If you need the code later you can extract it from the version control system.

Program defensively: all return values must always be checked (see "Can't Happen or /*NOTREACHED*/ or Real Programs Dump Core"). Similarly, check the status of any external command that you launch (using $?) as well as the result of any code that you evaluate (using $@).
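
As an illustration, here is a minimal sketch of the $? and $@ checks. It assumes a Unix-like system where the "true" and "false" commands exist, and the helper names run_command() and safe_eval() are made up for this example:

```perl
use strict;
use warnings;

# Run an external command (list form, no shell) and decode $?.
# Returns the exit code, dies if the command could not be run
# or was killed by a signal.
sub run_command (@) {
    my(@command) = @_;
    my $status = system(@command);
    if ($status == -1) {
        die("cannot execute @command: $!\n");
    } elsif ($status & 127) {
        die(sprintf("%s killed by signal %d\n", $command[0], $status & 127));
    }
    return($status >> 8);
}

# Evaluate a code reference and return the error message ("" if none).
sub safe_eval ($) {
    my($code) = @_;
    my $ok = eval { $code->(); 1 };
    return($ok ? "" : ($@ || "unknown error"));
}

printf("true exits with %d\n", run_command("true"));
printf("false exits with %d\n", run_command("false"));
printf("eval error: %s", safe_eval(sub { die("boom\n") }));
```

Testing the value returned by eval (rather than $@ alone) also covers the rare cases where $@ gets reset before you can look at it.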

Always validate external input: command line options, user input, configuration files, environment variables, network data... Your program should handle all situations gracefully and give meaningful error messages.
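
For instance, a port number received from the command line or a configuration file could be checked as sketched below. The function name checked_port() and the "--port" label are assumptions made for this example only:

```perl
use strict;
use warnings;

# Validate a TCP port number coming from an untrusted source.
# $source is only used to build a meaningful error message.
sub checked_port ($$) {
    my($source, $value) = @_;
    die("$source: missing port number\n")
        unless defined($value) and length($value);
    # \z (unlike $) does not accept a trailing newline
    die("$source: invalid port number: $value\n")
        unless $value =~ /^[0-9]{1,5}\z/;
    die("$source: port number out of range (1-65535): $value\n")
        unless 1 <= $value and $value <= 65535;
    return($value + 0);
}

foreach my $input ("8080", "80x", "70000") {
    my $port = eval { checked_port("--port", $input) };
    print(defined($port) ? "ok: $port\n" : "rejected: $@");
}
```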

Always use else clauses (or equivalent) such as in:

    die("...") unless $action =~ /^[yn]$/i;
    ... some code ...
    if ($action eq "y") {
        ... whatever ...
    } elsif ($action eq "n") {
        ... whatever ...
    } else {
        die("unexpected action: $action");
    }
(here the else catches the case of $action being "Y", which we missed ;-)

Put relevant information in error messages, usually:

(see the "GNU Coding Standards" for more information).

If you support long option names, also have a look at the "GNU Coding Standards" for commonly used ones.

Code fragments should not be repeated: factorise them into reusable functions and modules. Each of these should do only one thing but it should do it well!

Take time to design (and document) good data structures: coding is always easier with good structures.

Smart data structures and dumb code works a lot better than the other way around.
- Eric S. Raymond (The Cathedral and the Bazaar)

Define names for all constants: this improves the code's readability and modifiability.

Logical blocks (mainly function blocks) should be small enough to fit on one page or screen (i.e. 50 lines, 80 columns). From the "Linux kernel coding style":

Functions should be short and sweet, and do just one thing. They should fit on one or two screenfuls of text (the ISO/ANSI screen size is 80x24, as we all know), and do one thing and do that well.
The maximum length of a function is inversely proportional to the complexity and indentation level of that function. So, if you have a conceptually simple function that is just one long (but simple) case-statement, where you have to do lots of small things for a lot of different cases, it is OK to have a longer function.
However, if you have a complex function, and you suspect that a less-than-gifted first-year high-school student might not even understand what the function is all about, you should adhere to the maximum limits all the more closely.
[...] Now, some people will claim that having 8-character indentations makes the code move too far to the right, and makes it hard to read on a 80-character terminal screen. The answer to that is that if you need more than 3 levels of indentation, you're screwed anyway, and should fix your program.

Do not clutter your code with ugly debugging stuff: learn how to use the Perl debugger instead! See the tools section for more information.

Minimise the operating system dependencies to allow your code to work on other systems. Especially watch out for external commands and consult the "Writing portable Perl" document for more information.

Whenever possible, avoid creating temporary files. This is seldom needed and very difficult to do securely.
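
When a temporary file really cannot be avoided, the standard File::Temp module takes care of most of the pitfalls (unpredictable name, atomic creation, cleanup). A minimal sketch, where the helper name write_scratch() and the file name template are made up:

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Store some data in a freshly created temporary file and return its path.
sub write_scratch ($) {
    my($data) = @_;
    # the file is created and opened in one step, in the system's
    # temporary directory; UNLINK => 1 removes it when the program exits
    my($fh, $path) = tempfile("scratch-XXXXXXXX", TMPDIR => 1, UNLINK => 1);
    print($fh $data) or die("cannot write to $path: $!\n");
    close($fh) or die("cannot close $path: $!\n");
    return($path);
}

my $path = write_scratch("hello\n");
printf("%d bytes written to a temporary file\n", -s $path);
```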


Style

Always use a consistent style for code, indentation, names, comments, parentheses, choice of equivalent solutions (e.g. "x if y" versus "y and x")...

Preferably use the K&R style (see the "Linux kernel coding style", chapter 2).

Use "and or not" for flow control and boolean tests (they are more readable and with a more natural precedence) but use "&& || !" when the value is used:

    if ($foo and ref($foo)) { ... }
    do_something() or die("...");
    $home = $ENV{'HOME'} || $ENV{'LOGDIR'} || (getpwuid($<))[7];
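
The difference comes from precedence relative to the assignment operator, as this small sketch tries to show (the lookup() function and its data are made up):

```perl
use strict;
use warnings;

# Return the configured value for a name, or undef if there is none.
sub lookup ($) {
    my($name) = @_;
    my %known = ("editor" => "vi");
    return($known{$name});
}

# "||" binds tighter than "=": the fallback is part of the assigned value
my $pager = lookup("pager") || "more";

# "or" binds looser than "=": it only decides what happens after the
# assignment, which is what we want for flow control
my $editor;
$editor = lookup("editor") or die("no editor configured\n");

print("pager=$pager editor=$editor\n");
```

Writing the first assignment with "or" instead of "||" would leave $pager undefined, because the assignment would be completed before the fallback is even considered.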

Use parentheses generously because:

Examples:
                                                instead of

    return(1);                                  return 1;
    foo(bar(1), 2);                             foo bar 1, 2;
    print($x, $y);                              print $x, $y;
    fork();                                     fork;

Always put a single space around keywords and operators and after commas (but not before!), and no space between function names and their arguments, for instance:

                                                instead of

    $x = $y + foo(1, 2);                        $x= $y +foo(1 ,2);
    if ($x > 1)                                 if($x>1)
    foo($x);                                    foo ($x);

Avoid useless space, in particular: multiple empty lines, multiple spaces not used for indentation or alignment and spaces at the end of lines.


Naming

Use a consistent naming style, for instance:

The verbosity of a variable or function name should be proportional to its scope: $i is fine for a loop counter, not for a global variable...

Use a consistent naming convention for nouns, verbs, order, abbreviations... to avoid having at the same time send_packet, pkt_recv, Packet::print...

Private things (functions, variables...) should start with an underscore to clearly mark the fact that they should not be used from the outside and that they may change in future versions. This convention is checked by the B::Lint module.

Avoid using the names of core functions to define new ones (methods are acceptable) unless they are designed to replace their core homonym. This is because:


Documentation

Document as much as possible, ideally before coding, but do not forget to keep the documentation up-to-date: often, incorrect documentation is worse than no documentation at all...

Use the POD (Plain Old Documentation) format (see the perlpod man page) for program and module documentation. You can put it either at the end (after an __END__) to reduce parsing time or interleaved with code to keep code and related documentation close.

Ideally:


Comments

Avoid useless comments, prefer obvious algorithms and naming.

Comments are more needed on data structures. Algorithms should be "self documented":

Basically, avoid comments. If your code needs a comment to be understood, it would be better to rewrite it so it is easier to understand.
- Rob Pike

Comments should normally describe what the code does, maybe why but rarely how...

Excessive comments tend to add noise and increase the number of lines of code, violating the principle of small blocks fitting on one page.

On the other hand, you should use comments to highlight important information (such as a file header or RCS keywords such as $Id$) and to delimit distinct sections of your code.

Always keep code and comments synchronised. Comments that are inaccurate or misleading are worse than no comments at all. If a comment is wrong then correct it or delete it.


Language Features

The "Perl style guide" contains:
Perl is designed to give you several ways to do anything, so consider picking the most readable one.

As a corollary, the better you know Perl and its features, the more readable the code you can write.

Assume a recent version of Perl (5.8 at least, released in July 2002):

The perltrap man page says:

The biggest trap of all is forgetting to use the -w switch; see the perlrun manpage. The second biggest trap is not making your entire program runnable under use strict.

Always use the strict pragma everywhere. Exceptions should occur only in isolated blocks such as in:

    ... some code ...
    {
        no strict "refs";
        ... whatever ...
    }
    ... more code ...
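
A typical case is the generation of similar functions through symbolic references. In this sketch the Point class is made up, and strict "refs" is relaxed only inside the loop body:

```perl
use strict;
use warnings;

package Point;    # hypothetical class used for illustration

sub new {
    my($class, %attr) = @_;
    return(bless({ %attr }, $class));
}

# generate one read-only accessor per attribute
foreach my $name (qw(x y)) {
    no strict "refs";    # needed for the symbolic glob assignment below
    *{"Point::$name"} = sub { return($_[0]->{$name}) };
}

package main;

my $point = Point->new("x" => 3, "y" => 4);
print($point->x(), ",", $point->y(), "\n");
```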

Always use the warnings pragma (roughly equivalent to the old -w switch) everywhere. In addition, you may want to transform Perl warnings into fatal errors with something like:

    $SIG{__WARN__} = sub { die($_[0]) };
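
The handler can also be installed with local() so that it applies to one part of the program only. A small sketch (total() and strict_total() are made-up names):

```perl
use strict;
use warnings;

# Sum the given values; Perl warns if one of them is not numeric.
sub total (@) {
    my(@values) = @_;
    my $sum = 0;
    foreach my $value (@values) {
        $sum += $value;
    }
    return($sum);
}

# Same as total() but any warning becomes a fatal error.
sub strict_total (@) {
    my(@values) = @_;
    # local() restores the previous handler when the function returns
    local $SIG{__WARN__} = sub { die($_[0]) };
    return(total(@values));
}

my $result = eval { strict_total(1, 2, "three") };
print(defined($result) ? "total: $result\n" : "caught: $@");
```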

Most programs and all modules should run fine with the -T switch. For modules, exceptions to this must be documented so that programs running under -T know what they can use or not. All code running under a privileged account (especially network daemons) should definitely use -T. For more information, see the perlsec man page.

Do not put code other than function calls at the top level. A program should look like this:

    use strict;
    use warnings;
    use Some::Module;
    ...
    our($Flags, $TopPath);
    ...
    sub foo ($) {
        ...
    }
    ...
    init();
    main();

Never omit default parameters such as $_ or @_. Omitting these may confuse the reader and the behaviour may be Perl version specific.

Declare global variables using the our declaration. The canonical way to do it with one-line comments per variable is:

    our(
        $Debug,          # numerical debugging flags
        %WordCount,      # number of occurrences of each word
    );

Declare constants using the constant pragma but note that a bare constant name used as a hash key is taken as a string, not as the constant:

    use constant FOO => "... whatever ...";
    print(FOO, "\n");          # ok
    $table{FOO}++;             # not ok, same as $table{"FOO"}++
    $table{FOO()}++;           # ok
    $table{+FOO}++;            # also ok
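
The following sketch shows which keys end up in the hash (the constant KEY and its value are made up):

```perl
use strict;
use warnings;
use constant KEY => "colour";

my %table;
$table{KEY}   = "bareword";      # the key is the string "KEY"
$table{KEY()} = "call";          # the key is "colour"
$table{+KEY}  = "unary plus";    # also "colour", replaces the previous value

foreach my $key (sort(keys(%table))) {
    print("$key => $table{$key}\n");
}
```

The same automatic quoting applies to a bare constant name on the left of "=>".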

Use the /x modifier for complex regular expressions (see the perlre man page for more information). This allows more clarity.
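
For example, a date check could be written as follows. This is only a sketch: the pattern validates the format, not the calendar (it accepts 2010-02-31):

```perl
use strict;
use warnings;

# matches dates such as 2010-03-05 and captures the three fields
my $date_re = qr/
    ^
    ([0-9]{4})                   # year
    -
    (0[1-9]|1[0-2])              # month, 01 to 12
    -
    (0[1-9]|[12][0-9]|3[01])     # day, 01 to 31
    \z
/x;

foreach my $string ("2010-03-05", "2010-13-05") {
    if (my($year, $month, $day) = $string =~ $date_re) {
        print("$string: year=$year month=$month day=$day\n");
    } else {
        print("$string: not a date\n");
    }
}
```

Keep in mind that the comments inside the pattern must not contain the regular expression delimiter.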


Subroutines

Use my() instead of local() whenever possible (common exceptions include filehandles, $_ and typeglobs). On the other hand, one interesting usage of local() is to temporarily alter a global variable such as in:

    {
        local $ENV{"PATH"} = "/bin:/usr/bin";
        system("... something ...");
    }
For more information on my() versus local(), see the perlsub man page.

Minimise the number of variables local to the function. From the "Linux kernel coding style":

Another measure of the function is the number of local variables. They shouldn't exceed 5-10, or you're doing something wrong. Re-think the function, and split it into smaller pieces. A human brain can generally easily keep track of about 7 different things, anything more and it gets confused. You know you're brilliant, but maybe you'd like to understand what you did 2 weeks from now.

Try to declare all the local variables just under the beginning of the function: first the passed arguments and then the remaining variables:

    sub foo ($) {
        my($path) = @_;
        my($x, $y, @result);
        ... code ...
    }

This way, you see all the local variables that can be used inside the function in one place. If there are too many of them, it is a good indication that the function is too complex (as explained above). Alternatively, for some variables with a very limited scope (for instance used only inside an "if" block or as a loop iterator), you could declare them where you need them. The canonical example is a simple loop such as:

    foreach my $i (1 .. 99) {
        ... code ...
    }

Always use prototypes (see the perlsub man page) except for methods (see below). To call functions, use "foo()" instead of "&foo()", which bypasses prototype checking.
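
The sketch below illustrates the difference between the two call styles. The pair() function is made up, and the faulty call is compiled with a string eval only so that the compilation error can be caught and printed:

```perl
use strict;
use warnings;

# Expects exactly two scalar arguments, as stated by its prototype.
sub pair ($$) {
    return("got " . scalar(@_) . " argument(s)");
}

# with the normal call syntax, the mistake is detected at compile time
my $value = eval q{ pair(1) };
print("pair(1): ", (defined($value) ? "$value\n" : "rejected: $@"));

# with the & syntax, the prototype is ignored and the call goes through
$value = &pair(1);
print("&pair(1): $value\n");
```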

Use "$subref->(@args)" to dereference function references. This does not check prototypes but is consistent with the method invocation recommended further down ("$obj->meth(@args)").
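
A dispatch table is a common place where this syntax shows up. A minimal sketch (all the names are made up):

```perl
use strict;
use warnings;

sub add ($$) { my($x, $y) = @_; return($x + $y); }
sub mul ($$) { my($x, $y) = @_; return($x * $y); }

# maps operation names to function references
my %operation = (
    "add" => \&add,
    "mul" => \&mul,
);

# Apply the named operation to the given arguments.
sub compute ($@) {
    my($name, @args) = @_;
    my $subref = $operation{$name}
        or die("unknown operation: $name\n");
    return($subref->(@args));
}

print(compute("add", 2, 3), " ", compute("mul", 2, 3), "\n");
```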

Group similar functions and order them so that prototype checking can work (this order is good for the human reader too!). Use forward declarations only if needed.

Use "%opt argument parsing" style (like in the CGI module) for flexibility. For instance:

    $query = CGI->new();
    $query->use_named_parameters(1);
    $field = $query->radio_group(
        "name"    => "OS",
        "values"  => ["Unix", "Windows", "Macintosh"],
        "default" => "Unix",
    );
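
The same style can be used for your own functions. Here is a sketch with a made-up format_line() function that accepts named options, supplies defaults and rejects unknown names:

```perl
use strict;
use warnings;

# Format a line of text. Options: text (mandatory), width, indent.
sub format_line (@) {
    my(%opt) = @_;
    my %default = ("width" => 72, "indent" => 0);
    foreach my $name (keys(%opt)) {
        die("format_line(): unknown option: $name\n")
            unless $name eq "text" or exists($default{$name});
    }
    die("format_line(): missing option: text\n")
        unless defined($opt{"text"});
    %opt = (%default, %opt);    # given options override the defaults
    my $line = (" " x $opt{"indent"}) . $opt{"text"};
    return(substr($line, 0, $opt{"width"}));
}

print(format_line("text" => "hello world", "indent" => 2), "\n");
print(format_line("text" => "hello world", "width" => 5), "\n");
```

New options can then be added later without breaking the existing callers.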

Use return consistently:


Packages and Modules

Never use .ph files. They're historical, very system dependent, sometimes buggy (because of bugs in h2ph), not available everywhere and their functionality can usually be found in standard modules such as Socket or POSIX.

Always prefer use to require because it enables compile-time checks. require is really only useful when conditionally loading an expensive module which is not always needed.
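
As an illustration of the conditional case, this sketch loads Data::Dumper only when a (made-up) debugging helper is called for the first time:

```perl
use strict;
use warnings;

# Return a one-line textual dump of a data structure.
sub dump_structure ($) {
    my($data) = @_;
    require Data::Dumper;    # loaded at run time, on first call only
    return(Data::Dumper->new([$data])->Indent(0)->Terse(1)->Sortkeys(1)->Dump());
}

sub loaded () { return(exists($INC{"Data/Dumper.pm"}) ? "yes" : "no"); }

print("loaded before: ", loaded(), "\n");
print(dump_structure({ "answer" => 42 }), "\n");
print("loaded after: ", loaded(), "\n");
```

Because nothing is imported at compile time, the functions of the module must then be called with their full name or, like here, through method calls.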

Always use the right file path and package name. A module named Foo::Bar is expected to be located in a file named Foo/Bar.pm. Also, small sub-packages that are not supposed to be used directly could be included in a parent package but in this case the sub-package names should match the parent package. For instance, the file Foo/Bar.pm could hold the top package Foo::Bar as well as sub-packages like Foo::Bar::Baz.

Always define the global variable $VERSION in your modules, for instance (with RCS):

    $VERSION = sprintf("%d.%02d", q$Revision: 1.26 $ =~ /(\d+)\.(\d+)/);
Note: a global "version" variable should rather be named $Version (or VERSION if we define it as a constant) but we use $VERSION to be compliant with use, see the perlfunc man page:
If the VERSION argument is present between Module and LIST, then the use will call the VERSION method in class Module with the given version as an argument. The default VERSION method, inherited from the UNIVERSAL class, croaks if the given version is larger than the value of the variable $Module::VERSION.

Avoid namespace pollution in your modules, i.e. export as little as possible by default, ideally nothing. See the Exporter man page for @EXPORT versus @EXPORT_OK discussion.
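
A minimal sketch of a module that exports nothing by default. The My::Util package and its functions are made up, and the package is kept in the same file only to make the example self-contained (this is also why import() is called by hand and prototypes are left out):

```perl
use strict;
use warnings;

package My::Util;    # hypothetical module, normally stored in My/Util.pm

use Exporter qw(import);
our @EXPORT    = ();                  # nothing is exported by default
our @EXPORT_OK = qw(trim squeeze);    # exported on request only

# Remove leading and trailing white space.
sub trim {
    my($string) = @_;
    $string =~ s/^\s+|\s+$//g;
    return($string);
}

# Replace each run of white space by a single space.
sub squeeze {
    my($string) = @_;
    $string =~ s/\s+/ /g;
    return($string);
}

package main;

# what "use My::Util qw(trim);" does for a module stored in its own file
My::Util->import(qw(trim));

print("[", trim("  some text  "), "]\n");
print("squeeze imported: ", (defined(&main::squeeze) ? "yes" : "no"), "\n");
```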

Use the minimum number of Perl external modules (e.g. from CPAN) because of bugs (that we do not control), inconsistent behaviour, performance penalty and inconsistent error handling. When you do use them, always specify the symbols you need to import to avoid unnecessary namespace pollution. For instance use:

    use Socket qw(inet_aton SOCK_STREAM);
instead of
    use Socket;

Understand the different types of special blocks (BEGIN, CHECK, INIT and END, see the perlmod man page) and use them wisely. Don't forget that BEGIN and CHECK blocks are executed at compilation time so they are executed even when simply checking the syntax using "perl -c"!


Object Oriented

Methods cannot take advantage of Perl's prototypes so they should explicitly check that they are called with the right number of arguments. For instance:

    sub bark : method {
        my($self, $count) = @_;
        die("Usage: \$obj->bark(<count>)\n") unless @_ == 2;
        ... bark $count times ...
    }

Always declare methods with the "method" attribute (see the attributes man page) to differentiate them from subroutines that lack prototype declaration.

When using a single method to get and set an object attribute, always use the length of @_ to detect what to do (i.e. get or set), e.g.:

    sub color : method {
        my($self, $color) = @_;
        $self->{"_color"} = $color if @_ > 1;
        return($self->{"_color"});
    }
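
Testing the length of @_ rather than the definedness of the argument is what makes it possible to reset an attribute to undef. A sketch with a made-up Pen class:

```perl
use strict;
use warnings;

package Pen;    # hypothetical class used for illustration

sub new : method {
    my($class) = @_;
    return(bless({ "_color" => "black" }, $class));
}

sub color : method {
    my($self, $color) = @_;
    $self->{"_color"} = $color if @_ > 1;
    return($self->{"_color"});
}

package main;

my $pen = Pen->new();
print("initial: ", $pen->color(), "\n");
$pen->color("red");
print("after set: ", $pen->color(), "\n");
$pen->color(undef);    # an explicit undef is still a "set"
print("after reset: ", (defined($pen->color()) ? "defined" : "undefined"), "\n");
```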

For consistency, always use "$obj->meth(@args)" instead of "meth $obj @args", including when creating a new object with "Pkg->new(@args)". This also ensures that the code will be called like a method, i.e. with the extra first parameter.


Optimisation

In most cases, do not optimise: speed or memory footprint is almost always less important than clarity and maintainability.

Rules of Optimisation:
   Rule 1: Don't do it.
   Rule 2 (for experts only): Don't do it yet.

- M.A. Jackson

Use a profiling tool such as Devel::DProf or a benchmarking tool such as Benchmark to find out where optimisation is really needed.

Never use $`, $& or $'. See the perlvar man page:

The use of this variable anywhere in a program imposes a considerable performance penalty on all regular expression matches. See the Devel::SawAmpersand module from CPAN for more information.
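
Explicit capturing parentheses give the same information without touching these variables. A small sketch (the function name is made up):

```perl
use strict;
use warnings;

# Split a string around its first number and return the three parts
# (what comes before the number, the number itself, what comes after)
# or an empty list if the string does not contain any digit.
sub split_around_number ($) {
    my($string) = @_;
    return($string =~ /^(.*?)([0-9]+)(.*)$/s);
}

my($before, $match, $after) = split_around_number("error at line 42 of foo.pl");
print("before=[$before] match=[$match] after=[$after]\n");
```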

Avoid code that pulls in heavy modules such as DynaLoader or Config.

Minimise the number of external commands called: many fork+exec calls may be expensive.

Delay the initialisation of large or heavy data structures until the time they are really needed. The usual approach is to have a global variable indicating whether the initialisation took place and to put, before the data is used, a line like:

    _init_my_data() unless $_MyDataReady;
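
Put together, this could look like the following sketch, where the table of letter scores stands for some expensive data and all the names are made up:

```perl
use strict;
use warnings;

our(
    %_Score,         # score of each letter, filled on first use
    $_ScoreReady,    # true once %_Score has been initialised
);

# Fill the score table: a=1, b=2... z=26.
sub _init_score () {
    my $value = 0;
    foreach my $letter ("a" .. "z") {
        $_Score{$letter} = ++$value;
    }
    $_ScoreReady = 1;
    return;
}

# Return the sum of the scores of the letters of a word.
sub word_score ($) {
    my($word) = @_;
    my $score = 0;
    _init_score() unless $_ScoreReady;
    foreach my $letter (split(//, lc($word))) {
        $score += $_Score{$letter} || 0;
    }
    return($score);
}

print("perl scores ", word_score("perl"), "\n");
```

A program that never calls word_score() never pays for the initialisation.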


Tools

Learn how to use the (very powerful and flexible) Perl debugger (see the perldebug man page). Try for instance:

    # setenv PERLDB_OPTS "NonStop frame=2 AutoTrace"
    # perl -d program.pl

For the Emacs fans, there is also integrated support for the Perl debugger via GUD (try M-x perldb).

If you're not afraid of heavy tools, ddd supports Perl.

For benchmarking (i.e. finding which algorithm is faster), use the standard Benchmark module.
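
For instance, the sketch below compares two ways of building the same string. The two candidate functions are made up and the figures printed will of course depend on your machine:

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

my @words = map("word$_", 1 .. 1000);

# candidate 1: the join() builtin
sub join_builtin () {
    return(join(",", @words));
}

# candidate 2: an explicit loop
sub join_loop () {
    my $result = "";
    foreach my $word (@words) {
        $result .= "," if length($result);
        $result .= $word;
    }
    return($result);
}

# run each candidate for at least one CPU second and print a comparison
cmpthese(-1, {
    "builtin" => \&join_builtin,
    "loop"    => \&join_loop,
});
```

Before comparing speeds, make sure that the candidates really return the same result.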

For profiling (i.e. finding bottlenecks in your code), use the standard Devel::DProf module. See also its related program to display profiling data: dprofpp.

There is also an experimental lint module: B::Lint. To use it:

    # perl -MO=Lint foo.pl


Security

Last but not least... Make sure your code does not contain security holes. This class of bugs can have disastrous consequences.

When it makes sense, use Perl's taint mode. For more information (and other security recommendations), see the "Perl security guide".

For complementary information, see:


References

Here are some documents really worth reading:

And also, for reference only (as the EDH guide is quite different):


Lionel Cons, 5-Mar-2010.