Yapps is a lightweight LL(1) parser generator that produces human-readable parsers written in Python. It's pretty neat and it generally does what you would expect (and want). Amit Patel has made it available under a free licence but seems to have stopped maintaining it. This is a quick reference; it also includes some details not in the manual.
Version info and URL from yapps2.py in yapps2.zip:
# Yapps 2.0 – yet another python parser system # Amit J Patel, January 1999 # See http://theory.stanford.edu/~amitp/Yapps/ for documentation and updates ... # v.2.0.4 changes (July 2003) ...
Use yapps2.py to generate a parser from a grammar file:
[michael yapps2 9]$ python yapps2.py examples/expr.g Input Grammar: examples/expr.g Output File: examples/expr.py [michael yapps2 10]$
The generated file includes scanner and parser classes
derived from the base classes Scanner
and Parser
in yappsrt.py – the “Yapps 2.0 Runtime” which must be
available in the same directory;
it also defines SyntaxError
and NoMoreTokens
exceptions and functions for printing error messages.
An example based on calc.g included in the distribution is a calculator that supports interaction like:
>>> set x 2 x = 2 >>> x * 4 = 8 >>> let x = 1 in x * 4 = 4 >>> x * 4 = 8 >>> 3 * (6 + 4) = 30 >>>
The grammar file for this is:
#!/usr/bin/env python # ... any other code to be copied straight over – # typically variables and functions invoked by code # attached to the rules of the parser: globalvars = {} # We will store the calculator's variables here def lookup(map, name): "get variable value. map:local variables; name: variable id" for x,v in map: if x==name: return v if name not in globalvars.keys(): print 'Undefined:', name return globalvars.get(name, 0) %% # Parser section after the '%%' separator # (comments in this section are not copied to the .py file) parser Calculator: # Without this option, Yapps produces a context-sensitive # scanner: the parser tells the scanner what tokens it # expects – so, e.g., a keyword could be read in as an # identifier where the keyword token wasn't expected. # However, if a context-sensitive scanner is not needed # then it's probably better for debugging to have the # simpler context-insensitive scanner. option: "context-insensitive-scanner" # 'ignore' really means 'treat as token separators' # Note all these strings are regular expressions. ignore: '[ \r\t\n]+' ignore: '#.*?\r?\n' # line comment token NUM: '[0-9]+' token VAR: '[a-zA-Z_]+' # Even if it doesn't appear in the rules, # an END token is usually needed: otherwise, with most # grammars, the scanner will keep trying to read beyond # the end of the string. token END: '$' # The goal production is specified when the parser is # invoked (i.e., it doesn't have to be named 'goal' # or be the first one listed). # The END token usually needs to be specified in the # goal rule. (In fact, for reasons to do with the # recursive nature of this grammar, it's sufficient # for END to be defined as a token – but it does no # harm to include it in the goal rule too.) rule goal: goal2 END # Rules of the form NonTerminal<<Parameters>>: ... # allow one or more attributes to be passed in. # In this case, the attribute is the list of calculator's # local variables defined using the 'let' alternative of # the 'term' production below; there are no locals to # begin with so we pass an empty list to expr. rule goal2: expr<<[]>> # Only a single statement can be included in each # {{ code fragment }} attached to the grammar. # The return value of rule 'expr' is in 'expr'. {{ print '=', expr }} # This could be omitted – 'goal' doesn't use # the return value. {{ return expr }} # 'set' becomes an anonymous token for the scanner; # it is added at the beginning of the list of tokens # and so takes precedence over VAR above | "set" VAR expr<<[]>> # The text of the terminal symbol VAR is in VAR {{ globalvars[VAR] = expr }} {{ print VAR, '=', expr }} {{ return expr }} # V holds the calculator's local variables (see comment above). rule expr<<V>>: factor<<V>> {{ n = factor }} ( "[+]" factor<<V>> {{ n = n+factor }} | "-" factor<<V>> {{ n = n-factor }} )* {{ return n }} rule factor<<V>>: term<<V>> {{ v = term }} ( "[*]" term<<V>> {{ v = v*term }} | "/" term<<V>> {{ v = v/term }} )* {{ return v }} rule term<<V>>: NUM {{ return atoi(NUM) }} | VAR {{ return lookup(V, VAR) }} | r"\(" expr<<V>> r"\)" {{ return expr }} | "let" VAR "=" expr<<V>> {{ V = [(VAR, expr)] + V }} "in" expr<<V>> {{ return expr }} %% # If is second '%%' separator is present then the first one # must be too, even if there's no code before the parser. # Anything here is copied straight to the .py file after # the generated code. # If this section (and the '%%') is omitted, Yapps inserts # test code. if __name__=='__main__': print 'Welcome to the calculator sample for Yapps 2.0.' print ' Enter either "<expression>" or "set <var> <expression>",' print ' or just press return to exit. An expression can have' print ' local variables: let x = expr in expr' # We could have put this loop into the parser, by making the # `goal' rule use (expr | set var expr)*, but by putting the # loop into Python code, we can make it interactive (i.e., enter # one expression, get the result, enter another expression, etc.) while 1: try: s = raw_input('>>> ') except EOFError: break if not strip(s): break parse('goal', s) print 'Bye.'
[ optional ] ( oneOrMore )+ ( zeroOrMore )*A null production is just a blank (normally after the last | in a set of alternatives).
token LT: '<' token EQ: '=' token LTEQ: '<='For tokens of equal length, the first one matched takes precedence.
rule stmt: expr | "if" stmtis fine but the following won't work (and Yapps issues no warning):
rule stmt: expr rule stmt: "if" stmt
return_value
even if it hasn't been assigned (i.e.,
when an exception has been caught). Fix:
163c163 < return_value = getattr(parser, rule)() --- > return getattr(parser, rule)() 173d172 < return return_value
< | "\\(" expr "\\)" {{ return expr }} --- > | "\\(" expr<<V>> "\\)" {{ return expr }}
s
by the
code that searches for the DIVIDER
).