Human factors alphabets for higher base numbers

2022-03-30 2022-04-13

The alphabets for higher base numbers are generally chosen either arbitrarily or to satisfy certain constraints, such as avoiding characters that may cause a problem in the context of a URL or a filename. Where humans are concerned, constraints can include the need to avoid confusable characters and collisions with words, as in the case of the base 20 alphabet used for Open Location Codes.

What is not generally seen is a choice of alphabet that makes it easier for humans to convert the digits to decimal values. This is not surprising, given that

other constraints can conflict with this goal;
in many contexts, such as base 64 encoding, there is little or no value in being able to convert a number back to decimal;
where there could be value in making mental conversion easier, the operation would remain difficult due to the need to do division by successive powers of the operative radix.

An alphabet that facilitates mnemonic conversion can be useful, however. For example, if base 60 is used for a clock then the minutes and seconds are each a single digit (so no long division). In such a context, what would be a good alphabet? This question arose for a personal project (not a clock), and led to the definition of two alphabets described below.

One approach is to form an alphabet from a few simple rules, so that it is easy to remember how to reconstruct the alphabet. This criterion is fulfilled by many existing alphabets. For example, one could take as a base 60 alphabet the first 60 digits of the base 64 alphabet, for which one needs only to remember that the digits are the concatenation of

the uppercase Latin letters 'A'-'Z' (decimal values 0-25)
the lowercase letters 'a'-'z' (26-51)
the Arabic numerals '0'-'7' (52-59)
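
To make the cost of this approach concrete, here is a minimal Python sketch of that alphabet. The names `BASE64_PREFIX_60` and `value_of` are illustrative only and are not part of any standard.

```python
import string

# First 60 characters of the standard base 64 alphabet:
# 'A'-'Z' (0-25), 'a'-'z' (26-51), '0'-'7' (52-59).
BASE64_PREFIX_60 = (
    string.ascii_uppercase + string.ascii_lowercase + string.digits
)[:60]


def value_of(digit: str) -> int:
    """Return the decimal value of one digit by its position in the alphabet."""
    if len(digit) != 1:
        raise ValueError("expected a single character")
    return BASE64_PREFIX_60.index(digit)  # raises ValueError if absent
```

A program finds `value_of("q")` (42) by lookup, but a person has to recall that the lowercase letters start at 26, work out that 'q' is the 17th letter, and then add 26 + 16.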

The problem is that, while the alphabet is easy to reconstruct, one would not want to have to reconstruct it mentally just to convert a particular digit.

The Human Factors Decade-Congruent Alphabet

An improvement is the first of the two alphabets presented here: the Human Factors Decade-Congruent Alphabet (HFDCA). The following table presents the HFDCA in sequential decades - so 'a' is 10, 't' is 29, etc.

    The Human Factors Decade-Congruent Alphabet

    0  0123456789
    1  abcdefghij
    2  klmnopqrst
    3  ABCDEFGHIJ
    4  KLMNOPQRST
    5  vwxyzVWXYZ

This alphabet strikes a good compromise: the construction remains simple, yet it is considerably easier to apply without full reconstruction, since each uppercase letter 'A'-'T' has the same value as the corresponding lowercase letter plus 20 (in the last decade, 'V'-'Z' are the corresponding lowercase values plus 5). As a nod to reducing collisions with words, it is the vowel 'u' that is omitted (rather than 'z'), as it also occurs at a natural break in the sequence.
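
The rules that a person would apply can be written down directly. The following Python sketch decodes a single HFDCA digit by rule rather than by table lookup; the function name `hfdca_value` is an assumption made for illustration.

```python
import string

# The HFDCA exactly as in the table above, one decade per row.
HFDCA = (
    "0123456789"
    "abcdefghij"
    "klmnopqrst"
    "ABCDEFGHIJ"
    "KLMNOPQRST"
    "vwxyzVWXYZ"
)


def hfdca_value(digit: str) -> int:
    """Decode one HFDCA digit by rule instead of by table lookup."""
    if len(digit) != 1 or digit not in string.digits + string.ascii_letters:
        raise ValueError(f"not an HFDCA digit: {digit!r}")
    if digit in string.digits:
        return int(digit)
    lower = digit.lower()
    if lower <= "t":
        # 'a'-'t' are 10-29; 'A'-'T' are the same values plus 20.
        value = 10 + ord(lower) - ord("a")
        return value + 20 if digit.isupper() else value
    if lower == "u":
        raise ValueError("'u' and 'U' are not in the 60-digit HFDCA")
    # Last decade: 'v'-'z' are 50-54 and 'V'-'Z' are 55-59.
    value = 50 + ord(lower) - ord("v")
    return value + 5 if digit.isupper() else value
```

The rule-based decoder agrees with the table at every position, so `hfdca_value(ch)` equals `HFDCA.index(ch)` for all 60 digits.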


However, to interpret a given HFDCA digit, one is still likely to resort to counting from the start of the relevant decade, e.g., "K is 40, L is 41, ...".
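
As a rough illustration of the clock use case, the Python sketch below formats a time of day as three HFDCA digits and reads it back. The function names and the three-character layout without separators are assumptions; a real display might format the time differently.

```python
from datetime import time

HFDCA = (
    "0123456789"
    "abcdefghij"
    "klmnopqrst"
    "ABCDEFGHIJ"
    "KLMNOPQRST"
    "vwxyzVWXYZ"
)


def hfdca_clock(t: time) -> str:
    """Format a time of day as three HFDCA digits: hour, minute, second."""
    return HFDCA[t.hour] + HFDCA[t.minute] + HFDCA[t.second]


def read_hfdca_clock(text: str) -> time:
    """Convert a three-digit HFDCA clock reading back to a time of day."""
    if len(text) != 3:
        raise ValueError("expected exactly three digits")
    hour, minute, second = (HFDCA.index(ch) for ch in text)
    return time(hour, minute, second)  # rejects hour values above 23
```

Under these assumptions, 13:40:29 is displayed as `dKt`, and reading it back requires no division, only one lookup per digit.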

The Human Factors Decimal Morphology Alphabet

An alternative is to make the individual digits more memorable, which is the approach taken for the Human Factors Decimal Morphology Alphabet (HFDMA).

    The Human Factors Decimal Morphology Alphabet

    0  0123456789
    1  cjzwfsbvxq
    2  nltmhgdrkp
    3  CJZWFSBVXQ
    4  NLTMHGDRKP
    5  uiyeaUIYEA

Here, the points to observe for recall are:

The digits for decimal values 10-19 have the same shape as the corresponding numerals 0-9 - either directly ('2', '5', '6', '9'), or, in the case of '3', '4', and '7', after a quarter turn clockwise (based on an "open" 4).
Exceptions are 'c', which is '0' with the right side clipped, and 'x', which is a cropped '8'.
The letters for the next decade (20-29) are also chosen by shape, but the match is weaker: it involves a rotation other than 90 degrees clockwise, or a mirror reversal ('d' and 'p' for '6' and '9'); 'k' is a deformed 'x'; and 'l' is inferior to 'j' in that it is potentially confusable with '1'.
Digits 't' (22) and 'g' (25) may be learned as exceptions; alternatively, one can, e.g., imagine rotating '5' a half turn and closing the rounded part.
For multiples of 10, 'c' precedes 'n' in the Latin alphabet, which in turn precedes 'u'. All are clipped '0's; 'o' is omitted because capital 'O' is the character most likely to be confused with another digit (zero).
Finally, the vowels (together with 'y') are used in the last decade. Here 'i' is similar to '1' and 'E' is a mirror image of '3', but this decade might be best remembered by a mnemonic phrase (e.g., "Up in your eerie attic").
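
Since the HFDMA has no arithmetic rule within a decade, a program would simply use a lookup table. The Python sketch below builds one from the table above and checks a few of the structural properties just listed; the names `HFDMA_VALUE` and `hfdma_value` are illustrative.

```python
# The HFDMA exactly as in the table above, one decade per row.
HFDMA = (
    "0123456789"
    "cjzwfsbvxq"
    "nltmhgdrkp"
    "CJZWFSBVXQ"
    "NLTMHGDRKP"
    "uiyeaUIYEA"
)

# Reverse lookup from digit character to decimal value.
HFDMA_VALUE = {digit: value for value, digit in enumerate(HFDMA)}


def hfdma_value(digit: str) -> int:
    """Return the decimal value of one HFDMA digit (KeyError if unknown)."""
    return HFDMA_VALUE[digit]


# Every digit is distinct.
assert len(HFDMA_VALUE) == 60
# For 10-29, the uppercase letter is the lowercase value plus 20.
assert all(HFDMA_VALUE[ch.upper()] == HFDMA_VALUE[ch] + 20 for ch in HFDMA[10:30])
# The multiples of 10 are 'c', 'n', 'C', 'N', 'u'.
assert [HFDMA[v] for v in (10, 20, 30, 40, 50)] == ["c", "n", "C", "N", "u"]
# 'o' and 'O' do not appear among the 60 digits.
assert "o" not in HFDMA and "O" not in HFDMA
```

For example, `hfdma_value("z")` is 12 and `hfdma_value("t")` is 22, matching the shape-based reading of both letters as a '2'.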


Where it is necessary to exclude digits confusable with '1', a modified version of this alphabet replaces 'l' (lowercase 'L') with '~' and 'I' (capital 'i') with '_'.
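
The modified version can be derived mechanically from the base alphabet. In this Python sketch the constant name `HFDMA_NO_1_LOOKALIKES` is an assumption, and the variant is built from the 60-digit form only.

```python
HFDMA = (
    "0123456789"
    "cjzwfsbvxq"
    "nltmhgdrkp"
    "CJZWFSBVXQ"
    "NLTMHGDRKP"
    "uiyeaUIYEA"
)

# Swap the two digits that resemble '1': 'l' (21) -> '~' and 'I' (56) -> '_'.
HFDMA_NO_1_LOOKALIKES = HFDMA.translate(str.maketrans({"l": "~", "I": "_"}))
```

Only positions 21 and 56 change. Note that '_' and '~' are also the characters given below for the values 60 and 61 of the extended alphabets, so this variant would collide with that extension.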

Number bases other than 60

For base X numbers, X < 60, the first X digits of the alphabet are used (so, formally, hexadecimal uses the HFDCA). Preferred letters come first, particularly in the HFDMA, for which numbers in any base up to base 50 will avoid the vowels, and thus collisions with most words.
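
The "first X digits" rule is easy to express in code. The following Python sketch encodes and decodes non-negative integers in any base up to 60 using a prefix of either alphabet, by the usual repeated division; the function names `encode` and `decode` are illustrative.

```python
HFDCA = (
    "0123456789" "abcdefghij" "klmnopqrst"
    "ABCDEFGHIJ" "KLMNOPQRST" "vwxyzVWXYZ"
)
HFDMA = (
    "0123456789" "cjzwfsbvxq" "nltmhgdrkp"
    "CJZWFSBVXQ" "NLTMHGDRKP" "uiyeaUIYEA"
)


def encode(number: int, base: int, alphabet: str) -> str:
    """Write a non-negative integer using the first `base` digits of `alphabet`."""
    if not 2 <= base <= len(alphabet):
        raise ValueError("base out of range for this alphabet")
    if number < 0:
        raise ValueError("only non-negative integers are handled")
    digits = alphabet[:base]
    if number == 0:
        return digits[0]
    out = []
    while number:
        number, remainder = divmod(number, base)
        out.append(digits[remainder])
    return "".join(reversed(out))


def decode(text: str, base: int, alphabet: str) -> int:
    """Inverse of encode(); raises ValueError for digits outside the base."""
    digits = alphabet[:base]
    value = 0
    for ch in text:
        value = value * base + digits.index(ch)
    return value
```

With the HFDCA, `encode(255, 16, HFDCA)` gives the familiar hexadecimal `ff`. With the HFDMA, the same number in base 16 is `ss`, and no base up to 50 can produce a vowel.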

However, for some bases above 20, other constraints might make adjustment appropriate. For example, a base 30 alphabet could use the 3x-decade digits ("CJZ...") for the decimal values 20-29. This would avoid both the lowercase 'l' and the need to remember the weaker mnemonic for the digits "nlt...".
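
One possible reading of this base 30 adjustment, sketched in Python, is to splice the 3x decade directly after the 1x decade. The constant name `HFDMA_BASE30_ADJUSTED` is hypothetical.

```python
HFDMA = (
    "0123456789"
    "cjzwfsbvxq"
    "nltmhgdrkp"
    "CJZWFSBVXQ"
    "NLTMHGDRKP"
    "uiyeaUIYEA"
)

# Values 0-19 keep their usual digits; values 20-29 borrow "CJZWFSBVXQ".
HFDMA_BASE30_ADJUSTED = HFDMA[:20] + HFDMA[30:40]
```

In this adjusted alphabet 'Z' stands for 22 rather than 32, so it is not interchangeable with the unadjusted HFDMA and would need to be agreed on in advance.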

For X > 60, the definition of both alphabets is extended to 64 digits:

HFDCA values 60-63: _ ~ u U
HFDMA values 60-63: _ ~ o O

An HF base 64 alphabet seems unlikely to have much application, although a time nominally in base 60 might occasionally have an underscore '_' for a leap second.
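
The 64-digit extensions, and the leap second case, can be sketched in Python as follows. The names `HFDCA_64`, `HFDMA_64` and `seconds_digit` are assumptions made for illustration.

```python
HFDCA = (
    "0123456789" "abcdefghij" "klmnopqrst"
    "ABCDEFGHIJ" "KLMNOPQRST" "vwxyzVWXYZ"
)
HFDMA = (
    "0123456789" "cjzwfsbvxq" "nltmhgdrkp"
    "CJZWFSBVXQ" "NLTMHGDRKP" "uiyeaUIYEA"
)

# Values 60-63 as listed above for each alphabet.
HFDCA_64 = HFDCA + "_~uU"
HFDMA_64 = HFDMA + "_~oO"


def seconds_digit(second: int, alphabet: str = HFDMA_64) -> str:
    """Return the digit for a seconds value, allowing 60 for a leap second."""
    if not 0 <= second <= 60:
        raise ValueError("seconds must be in the range 0-60")
    return alphabet[second]
```

A leap second (seconds value 60) is shown as `_` in either alphabet, while every ordinary second keeps its usual base 60 digit.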