Yapps is a lightweight LL(1) parser generator that produces human-readable parsers written in Python. It's pretty neat and it generally does what you would expect (and want). Amit Patel has made it available under a free licence but seems to have stopped maintaining it. This is a quick reference; it also includes some details not in the manual.
Version info and URL from yapps2.py in yapps2.zip:
# Yapps 2.0 - yet another python parser system
# Amit J Patel, January 1999
# See http://theory.stanford.edu/~amitp/Yapps/ for documentation and updates
...
# v.2.0.4 changes (July 2003)
...
Use yapps2.py to generate a parser from a grammar file:
[michael yapps2 9]$ python yapps2.py examples/expr.g
Input Grammar: examples/expr.g
Output File: examples/expr.py
[michael yapps2 10]$
The generated file includes scanner and parser classes
derived from the base classes Scanner and Parser
in yappsrt.py (the “Yapps 2.0 Runtime”), which must be
available in the same directory.
yappsrt.py also defines the SyntaxError and NoMoreTokens
exceptions and functions for printing error messages.
An example based on calc.g included in the distribution is a calculator that supports interaction like:
>>> set x 2
x = 2
>>> x * 4
= 8
>>> let x = 1 in x * 4
= 4
>>> x * 4
= 8
>>> 3 * (6 + 4)
= 30
>>>
The grammar file for this is:
#!/usr/bin/env python
# ... any other code to be copied straight over –
# typically variables and functions invoked by code
# attached to the rules of the parser:
globalvars = {} # We will store the calculator's variables here
def lookup(map, name):
    "get variable value. map:local variables; name: variable id"
    for x,v in map:
        if x==name: return v
    if name not in globalvars:
        print 'Undefined:', name
    return globalvars.get(name, 0)
%%
# Parser section after the '%%' separator
# (comments in this section are not copied to the .py file)
parser Calculator:
# Without this option, Yapps produces a context-sensitive
# scanner: the parser tells the scanner what tokens it
# expects – so, e.g., a keyword could be read in as an
# identifier where the keyword token wasn't expected.
# However, if a context-sensitive scanner is not needed
# then it's probably better for debugging to have the
# simpler context-insensitive scanner.
option: "context-insensitive-scanner"
# 'ignore' really means 'treat as token separators'
# Note all these strings are regular expressions.
ignore: '[ \r\t\n]+'
ignore: '#.*?\r?\n' # line comment
token NUM: '[0-9]+'
token VAR: '[a-zA-Z_]+'
# Even if it doesn't appear in the rules,
# an END token is usually needed: otherwise, with most
# grammars, the scanner will keep trying to read beyond
# the end of the string.
token END: '$'
# The goal production is specified when the parser is
# invoked (i.e., it doesn't have to be named 'goal'
# or be the first one listed).
# The END token usually needs to be specified in the
# goal rule. (In fact, for reasons to do with the
# recursive nature of this grammar, it's sufficient
# for END to be defined as a token – but it does no
# harm to include it in the goal rule too.)
rule goal: goal2 END
# Rules of the form NonTerminal<<Parameters>>: ...
# allow one or more attributes to be passed in.
# In this case, the attribute is the list of calculator's
# local variables defined using the 'let' alternative of
# the 'term' production below; there are no locals to
# begin with so we pass an empty list to expr.
rule goal2: expr<<[]>>
# Only a single statement can be included in each
# {{ code fragment }} attached to the grammar.
# The return value of rule 'expr' is in 'expr'.
{{ print '=', expr }}
# This could be omitted – 'goal' doesn't use
# the return value.
{{ return expr }}
# 'set' becomes an anonymous token for the scanner;
# it is added at the beginning of the list of tokens
# and so takes precedence over VAR above
| "set" VAR expr<<[]>>
# The text of the terminal symbol VAR is in VAR
{{ globalvars[VAR] = expr }}
{{ print VAR, '=', expr }}
{{ return expr }}
# V holds the calculator's local variables (see comment above).
rule expr<<V>>: factor<<V>>            {{ n = factor }}
                ( "[+]" factor<<V>>    {{ n = n+factor }}
                | "-" factor<<V>>      {{ n = n-factor }}
                )*                     {{ return n }}
rule factor<<V>>: term<<V>>            {{ v = term }}
                  ( "[*]" term<<V>>    {{ v = v*term }}
                  | "/" term<<V>>      {{ v = v/term }}
                  )*                   {{ return v }}
rule term<<V>>:
    NUM                                {{ return atoi(NUM) }}
  | VAR                                {{ return lookup(V, VAR) }}
  | r"\(" expr<<V>> r"\)"              {{ return expr }}
  | "let" VAR "=" expr<<V>>            {{ V = [(VAR, expr)] + V }}
    "in" expr<<V>>                     {{ return expr }}
%%
# If the second '%%' separator is present then the first one
# must be too, even if there's no code before the parser.
# Anything here is copied straight to the .py file after
# the generated code.
# If this section (and the '%%') is omitted, Yapps inserts
# test code.
if __name__=='__main__':
    print 'Welcome to the calculator sample for Yapps 2.0.'
    print ' Enter either "<expression>" or "set <var> <expression>",'
    print ' or just press return to exit. An expression can have'
    print ' local variables: let x = expr in expr'
    # We could have put this loop into the parser, by making the
    # `goal' rule use (expr | set var expr)*, but by putting the
    # loop into Python code, we can make it interactive (i.e., enter
    # one expression, get the result, enter another expression, etc.)
    while 1:
        try: s = raw_input('>>> ')
        except EOFError: break
        if not strip(s): break
        parse('goal', s)
    print 'Bye.'
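To make the attribute passing in the grammar concrete, here is a hand-written recursive-descent evaluator in plain Python 3 that mirrors the expr, factor and term rules. It is only a sketch for illustration, not the code Yapps generates: the tokenizer, the class name MiniCalc and its method names are assumptions, and the "set" command and global variables are left out (undefined names simply evaluate to 0).

```python
import re

# Hypothetical tokenizer: numbers, names, and single-character operators.
TOKEN_RE = re.compile(r"\s*(?:(\d+)|([A-Za-z_]+)|(.))")

def tokenize(text):
    tokens = []
    for num, name, op in TOKEN_RE.findall(text):
        if num:
            tokens.append(("NUM", num))
        elif name in ("let", "in"):
            tokens.append((name, name))   # keywords take precedence over VAR
        elif name:
            tokens.append(("VAR", name))
        elif op.strip():
            tokens.append((op, op))       # operator or parenthesis
    tokens.append(("END", ""))
    return tokens

class MiniCalc:
    def __init__(self, text):
        self.tokens = tokenize(text)
        self.pos = 0

    def peek(self):
        # One token of lookahead is all an LL(1) parser needs.
        return self.tokens[self.pos][0]

    def scan(self, kind):
        tok_kind, value = self.tokens[self.pos]
        if tok_kind != kind:
            raise SyntaxError("expected %s, found %s" % (kind, tok_kind))
        self.pos += 1
        return value

    def goal(self):
        value = self.expr([])             # no local variables to begin with
        self.scan("END")
        return value

    def expr(self, V):
        n = self.factor(V)
        while self.peek() in ("+", "-"):  # the ( ... )* loop of the rule
            if self.scan(self.peek()) == "+":
                n = n + self.factor(V)
            else:
                n = n - self.factor(V)
        return n

    def factor(self, V):
        v = self.term(V)
        while self.peek() in ("*", "/"):
            if self.scan(self.peek()) == "*":
                v = v * self.term(V)
            else:
                v = v // self.term(V)     # integer division, like Python 2 '/' on ints
        return v

    def term(self, V):
        kind = self.peek()
        if kind == "NUM":
            return int(self.scan("NUM"))
        if kind == "VAR":
            name = self.scan("VAR")
            for x, v in V:                # innermost binding comes first
                if x == name:
                    return v
            return 0                      # sketch only: undefined names are 0
        if kind == "(":
            self.scan("(")
            value = self.expr(V)
            self.scan(")")
            return value
        if kind == "let":
            self.scan("let")
            name = self.scan("VAR")
            self.scan("=")
            bound = self.expr(V)
            V = [(name, bound)] + V       # extend the locals for the body only
            self.scan("in")
            return self.expr(V)
        raise SyntaxError("unexpected token %s" % kind)
```

With these definitions, MiniCalc("let x = 1 in x * 4").goal() evaluates to 4 and MiniCalc("3 * (6 + 4)").goal() to 30, matching the session shown earlier. Because V is rebound inside term rather than mutated, the binding disappears once the "let" expression has been evaluated.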
[ optional ]
( oneOrMore )+
( zeroOrMore )*
A null production is just a blank (normally after the last | in
a set of alternatives).
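As a rough illustration of how these constructs can be decided with one token of lookahead, the sketch below handles each of them over a plain list of token names. The function names are hypothetical and this is not the code Yapps emits; it only shows the control flow each construct implies.

```python
def parse_optional(tokens, pos, kind):
    """[ kind ]: consume one token of this kind if it is next."""
    if pos < len(tokens) and tokens[pos] == kind:
        return [tokens[pos]], pos + 1
    return [], pos                      # like a null production: nothing consumed

def parse_zero_or_more(tokens, pos, kind):
    """( kind )*: keep consuming while the lookahead matches."""
    matched = []
    while pos < len(tokens) and tokens[pos] == kind:
        matched.append(tokens[pos])
        pos += 1
    return matched, pos

def parse_one_or_more(tokens, pos, kind):
    """( kind )+: the first match is mandatory, the rest is ( kind )*."""
    if pos >= len(tokens) or tokens[pos] != kind:
        raise SyntaxError("expected %r at position %d" % (kind, pos))
    rest, end = parse_zero_or_more(tokens, pos + 1, kind)
    return [tokens[pos]] + rest, end
```

Each function returns the matched tokens together with the new position, so the caller can continue parsing from there.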
token LT: '<'
token EQ: '='
token LTEQ: '<='
When more than one token matches, the longest match wins. For matches of
equal length, the token listed first takes precedence.
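The precedence rule can be sketched in plain Python. The sketch assumes that the scanner tries every pattern at the current position and replaces its current best only when a strictly longer match appears, so that ties go to the token listed first. The function name next_token, the token table and the restrict parameter (a stand-in for the context-sensitive scanner, where the parser says which tokens it expects) are illustrative and not taken from yappsrt.py.

```python
import re

# Token table in declaration order; LTEQ is deliberately listed last.
PATTERNS = [
    ("LT", re.compile(r"<")),
    ("EQ", re.compile(r"=")),
    ("LTEQ", re.compile(r"<=")),
]

def next_token(text, pos, patterns=PATTERNS, restrict=None):
    """Return (name, matched_text) for the best token starting at pos.

    restrict, if given, is the set of token names the parser expects.
    """
    best_name, best_len = None, -1
    for name, regex in patterns:
        if restrict is not None and name not in restrict:
            continue
        m = regex.match(text, pos)
        # Strictly greater: an equally long later match does not win.
        if m and len(m.group(0)) > best_len:
            best_name, best_len = name, len(m.group(0))
    if best_name is None:
        raise SyntaxError("no token matches at position %d" % pos)
    return best_name, text[pos:pos + best_len]
```

Under these assumptions the input "<=" scans as LTEQ even though LT is listed first, because the longer match wins. If the parser restricts the scanner to LT and EQ, the same input yields LT.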
rule stmt: expr | "if" stmt
is fine but the following won't work (and Yapps issues no warning):
rule stmt: expr
rule stmt: "if" stmt
return_value even if it hasn't been assigned (i.e.,
when an exception has been caught). Fix:
163c163
< return_value = getattr(parser, rule)()
---
> return getattr(parser, rule)()
173d172
< return return_value
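The failure mode can be reproduced outside Yapps. The sketch below is not the yappsrt.py source: run_rule_buggy, run_rule_fixed and failing_rule are hypothetical stand-ins, written in Python 3, that only mimic the shape of the problem, where a variable assigned inside a try block is returned after the except clause has run.

```python
def failing_rule():
    # Stand-in for a parser rule that hits a syntax error.
    raise SyntaxError("bad input")

def run_rule_buggy(rule):
    try:
        return_value = rule()
    except SyntaxError as err:
        print("error reported:", err)
    # If the except branch ran, return_value was never assigned.
    return return_value

def run_rule_fixed(rule):
    try:
        return rule()
    except SyntaxError as err:
        print("error reported:", err)
    # Falls through and returns None after an error.
```

Calling run_rule_buggy(failing_rule) prints the error and then raises UnboundLocalError, while run_rule_fixed(failing_rule) prints the error and returns None.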
< | "\\(" expr "\\)" {{ return expr }}
---
> | "\\(" expr<<V>> "\\)" {{ return expr }}
s by the
code that searches for the DIVIDER).