The principle of orthogonal program code indentation

2022-07-06 / 2022-07-19

Whenever I have had occasion to define a coding standard for a language used on a project, the indentation convention has complied with a simple principle. It needed a name, so I now call it the principle of orthogonal program code indentation.

Steven Pemberton, a designer of the ABC language, observed that a programming language should be designed as an interface between humans and computers. The same is true of a particular program text, and how it is indented. ABC, the forerunner to Python, uses semantic indentation. In languages such as C and its derivatives, freely chosen indentation is even more obviously part of the "user interface", albeit as an interface presented by the program authors, unconnected to the semantics of the language.

What constitutes a good user interface is not merely a matter of taste. In the case of an indentation scheme, success can be determined relative to its purpose. That purpose may be thought to be obvious, but it is worth stating. Indentation serves two distinct functions:

first, it shows the control structure by visually relating a statement to those around it;
second, it distinguishes separate statements - in particular, where statements may continue onto another line.

None of the popular standards for languages such as Go, Java, or Python properly accomplishes these functions. If this comes as a surprise, it is because we are not sufficiently introspective to be aware of what we are doing as we scan through code: cues are taken not only from indentation but also from the first words on a line or, e.g., from the presence of a colon ':', continuation character '\', or brace '{' at the end of the previous line.

There is a simple test to determine whether the indentation is fulfilling its responsibilities:

Can a program be written that, from the indentation alone, successfully identifies the separate statements in a program text?

Consider the following potential rules:

Nested statements are indented by four spaces relative to the enclosing statement.
Continuation lines are indented by eight spaces relative to the first line.

Eight spaces can be regarded merely as a minimum for intra-statement indentation, allowing for flexible indentation of nested data structures. Applying these rules to Python (an easier case: no braces to argue about), code could look something like this:

def long_function_name(var_one,
        var_two, var_three,
        var_four):
    my_list = [
            1, 2, 3,
            4, 5, 6]
    nest_dict = {
            a: [
                (1, 'a'),
                (2, 'b'),
            ],
            b: [
                (3, 'c'),
                (4, 'd'),
            ]}
    if (var_one > len(my_list) and
            nest_dict[a][0] == 1):
        do_something()

Piping the above through a short script:

perl -ne'{/^( *)(\S*.*)$/; if (!$2) {print} else {
    $l = length $1;
    if ($l <= $last + 4) {$last = $l; $c = ($c + 1) % 10};
    $i = $1; $m = $2; $m =~ s/./$c/g; print "$i$m\n"}}'

produces the following output:

1111111111111111111111111111111
        1111111111111111111
        1111111111
    22222222222
            22222222
            22222222
    3333333333333
            3333
                333333333
                333333333
            33
            3333
                333333333
                333333333
            33
    444444444444444444444444444444
            4444444444444444444444
        55555555555555

Note the distinction between the '4' and '5' lines: '5' is the line on which the next (nested in this case) statement after '4' begins.
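For readers who would rather not decode the Perl, the same test can be sketched in Python. This is only an illustration of the logic as I read it: a line begins a new statement when its indentation is at most one nesting level (four spaces) deeper than that of the line that began the current statement, and anything deeper is treated as a continuation. The names statement_starts, ORTHOGONAL and AMBIGUOUS are invented for this sketch, and the second sample is a hypothetical counter-example rather than a quotation from any particular standard.

```python
def statement_starts(text, nest=4):
    """Return 0-based indices of the lines that begin a statement.

    Only leading spaces are examined; the rest of each line is ignored.
    """
    starts = []
    last = 0  # indentation of the line that began the current statement
    for number, line in enumerate(text.splitlines()):
        content = line.lstrip(" ")
        if not content:
            continue  # blank lines say nothing about structure
        indent = len(line) - len(content)
        # At most one nesting level deeper than the current statement:
        # a new statement. Anything deeper is a continuation line.
        if indent <= last + nest:
            starts.append(number)
            last = indent
    return starts


# Continuation lines indented by eight spaces, as in the rules above.
ORTHOGONAL = """\
def long_function_name(var_one,
        var_two, var_three,
        var_four):
    if (var_one > len(my_list) and
            nest_dict[a][0] == 1):
        do_something()
"""

# Hypothetical counter-example: continuation indented by only four spaces.
AMBIGUOUS = """\
result = compute(alpha,
    beta)
finish()
"""

print(statement_starts(ORTHOGONAL))  # [0, 3, 5]
print(statement_starts(AMBIGUOUS))   # [0, 1, 2]
```

In the first sample only the "def", "if" and "do_something()" lines are reported. In the second, the continuation line "beta)" is reported as if it began a statement, because a four-space continuation is indistinguishable from a nested statement when judged by indentation alone.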

It is not enough that an indentation convention is unambiguous or capable of being mechanically enforced by some tool. Assess it instead by testing the result: if an indentation parser cannot distinguish the lines on which statements begin, then information that could have been visually conveyed by the program's indentation is missing from it.

Notes

While I suspect that other sources could now be cited for the notion of "programming language as user interface", Steven Pemberton's is the earliest I am aware of: "So big, so bad, so often", ACM Interactions 10(6), doi.org/10.1145/947226.947247. It is also available on his home page.

It may be argued that the popular conventions are good enough, i.e., that it is acceptable that the reader of code should need to read some of the text to distinguish statements. As a particular example, it can be conceded that, in a C-like language, a closing brace '}' on a separate line is just as visually salient as indentation. In general, however, remembering that minor impediments to cognition can exist without rising to the level of conscious awareness, the response must be: why not have indentation satisfy its full potential? Especially when that can be done with very simple rules.

There are limits to the extent to which the "indentation parser" can successfully ignore the content, e.g., where there are /* block comments */ or multi-line strings within which global indentation rules cannot apply. However, such exceptions are no reason to abandon the principle for the remainder of the code, any more than they are a reason to abandon indentation completely.

The example above does not include an "else" or "switch", etc. The Perl script given above would classify an "else" as a separate statement rather than as part of the "if" statement, so, to be precise, it identifies statement boundaries rather than statements. For a visual indication of the nesting and control flow (as distinct from language syntax), this is appropriate: the first of the two functions of indentation stated above takes priority over the second.

ABC does not have continuation lines: unlike Python, lines cannot be broken; instead, ABC has no line length limit.

The example above is pulled from an old critique of Python's unsimple PEP8 which can still be found here.