Notes on Yapps 2 Python Parser Generator

Michael Breen

Yapps is a lightweight LL(1) parser generator that produces human-readable parsers written in Python. It's pretty neat and it generally does what you would expect (and want). Amit Patel has made it available under a free licence but seems to have stopped maintaining it. This is a quick reference; it also includes some details not in the manual.

Version info and URL from in

  # Yapps 2.0 – yet another python parser system
  # Amit J Patel, January 1999
  # See for documentation and updates
  # v.2.0.4 changes (July 2003)

Use to generate a parser from a grammar file:

  [michael yapps2 9]$ python examples/expr.g
  Input Grammar: examples/expr.g
  Output File: examples/
  [michael yapps2 10]$

The generated file includes scanner and parser classes derived from the base classes Scanner and Parser in – the “Yapps 2.0 Runtime” which must be available in the same directory; it also defines SyntaxError and NoMoreTokens exceptions and functions for printing error messages.

Annotated Example

An example based on calc.g included in the distribution is a calculator that supports interaction like:

  >>> set x 2
  x = 2
  >>> x * 4
  = 8
  >>> let x = 1 in x * 4
  = 4
  >>> x * 4
  = 8
  >>> 3 * (6 + 4)
  = 30

The grammar file for this is:

  #!/usr/bin/env python
  # ... any other code to be copied straight over –
  # typically variables and functions invoked by code
  # attached to the rules of the parser:
  globalvars = {}       # We will store the calculator's variables here
  def lookup(map, name):
      "get variable value. map:local variables; name: variable id"
      for x,v in map:
          if x==name: return v
      if name not in globalvars.keys():
          print 'Undefined:', name
      return globalvars.get(name, 0)
  # Parser section after the '%%' separator
  # (comments in this section are not copied to the .py file)
  parser Calculator:
      # Without this option, Yapps produces a context-sensitive
      # scanner: the parser tells the scanner what tokens it
      # expects – so, e.g., a keyword could be read in as an
      # identifier where the keyword token wasn't expected.
      # However, if a context-sensitive scanner is not needed
      # then it's probably better for debugging to have the
      # simpler context-insensitive scanner.
      option:  "context-insensitive-scanner"
      # 'ignore' really means 'treat as token separators'
      # Note all these strings are regular expressions.
      ignore:    '[ \r\t\n]+'
      ignore:    '#.*?\r?\n'    # line comment
      token NUM: '[0-9]+'
      token VAR: '[a-zA-Z_]+'
      # Even if it doesn't appear in the rules,
      # an END token is usually needed: otherwise, with most
      # grammars, the scanner will keep trying to read beyond
      # the end of the string.
      token END: '$'
      # The goal production is specified when the parser is
      # invoked (i.e., it doesn't have to be named 'goal'
      # or be the first one listed).
      # The END token usually needs to be specified in the
      # goal rule. (In fact, for reasons to do with the
      # recursive nature of this grammar, it's sufficient
      # for END to be defined as a token – but it does no
      # harm to include it in the goal rule too.)
      rule goal: goal2 END
      # Rules of the form  NonTerminal<<Parameters>>: ...
      # allow one or more attributes to be passed in.
      # In this case, the attribute is the list of calculator's
      # local variables defined using the 'let' alternative of
      # the 'term' production below; there are no locals to
      # begin with so we pass an empty list to expr.
      rule goal2: expr<<[]>>
                    # Only a single statement can be included in each
                    # {{ code fragment }} attached to the grammar.
                    # The return value of rule 'expr' is in 'expr'.
                    {{ print '=', expr }}
                    # This could be omitted – 'goal' doesn't use
                    # the return value.
                    {{ return expr }}
                # 'set' becomes an anonymous token for the scanner;
                # it is added at the beginning of the list of tokens
                # and so takes precedence over VAR above
                | "set" VAR expr<<[]>>
                    # The text of the terminal symbol VAR is in VAR
                    {{ globalvars[VAR] = expr }}
                    {{ print VAR, '=', expr }}
                    {{ return expr }}
      # V holds the calculator's local variables (see comment above).
      rule expr<<V>>:   factor<<V>>         {{ n = factor }}
                       ( "[+]" factor<<V>>  {{ n = n+factor }}
                       |  "-"  factor<<V>>  {{ n = n-factor }}
                       )*                   {{ return n }}
      rule factor<<V>>: term<<V>>           {{ v = term }}
                       ( "[*]" term<<V>>    {{ v = v*term }}
                       |  "/"  term<<V>>    {{ v = v/term }}
                       )*                   {{ return v }}
      rule term<<V>>:
                   NUM                      {{ return atoi(NUM) }}
                 | VAR                      {{ return lookup(V, VAR) }}
                 | r"\(" expr<<V>> r"\)"    {{ return expr }}
                 | "let" VAR "=" expr<<V>>  {{ V = [(VAR, expr)] + V }}
                   "in" expr<<V>>           {{ return expr }}
  # If is second '%%' separator is present then the first one
  # must be too, even if there's no code before the parser.
  # Anything here is copied straight to the .py file after
  # the generated code.
  # If this section (and the '%%') is omitted, Yapps inserts
  # test code.
  if __name__=='__main__':
      print 'Welcome to the calculator sample for Yapps 2.0.'
      print '  Enter either "<expression>" or "set <var> <expression>",'
      print '  or just press return to exit.  An expression can have'
      print '  local variables:  let x = expr in expr'
      # We could have put this loop into the parser, by making the
      # `goal' rule use (expr | set var expr)*, but by putting the
      # loop into Python code, we can make it interactive (i.e., enter
      # one expression, get the result, enter another expression, etc.)
      while 1:
          try: s = raw_input('>>> ')
          except EOFError: break
          if not strip(s): break
          parse('goal', s)
      print 'Bye.'

Further Tips