A Quick-start Guide to Stratego

Signature

A signature file defines a module containing constructor definitions for the various ATerm nodes that your program(s) will manipulate. The signature file name ends in .str. It is generated automatically from the SDF grammar. Study the signature file to make sure you understand it.

A signature file starts with a few keywords. For instance, your PidginC.str signature might start as:

module PidginC

signature
  sorts
   ...
  constructors
   ...

The constructors section consists of a sequence of type definitions, one for each node, of the form:

<node_label> : <child_1_type> *  ... <child_n_type> -> <node_type>

“Node label” is an arbitrary string that you want to use as a label for that particular node type. The convention is to use a label starting with an upper case (Stratego will generate a warning if you depart from this convention). For example you might want to use the label Const for constant values or Expr for nodes representing expression subtrees, etc. These are the labels that you specify inside the cons constructors in your SDF file.

All symbols to the right of the “:” are grammar symbols. These symbols are merely place-holders for recursively defining the structure of the ATerm tree and will never occur in any ATerms. However, to ensure correctness of your programs, a node with the name node_label must take n arguments of the type listed above. The resulting node is of type node_type and can be used wherever node_type is allowed. The sorts section simply lists all the grammar symbols used in the signature file.

Notice that for the signature of a language to be complete all grammar symbols that occur on the right of the arrow (->) must eventually be defined in terms of Strings, the most basic type understood by Stratego. Stratego also understands the constructor List.

A Simple Stratego Program

Once a signature file has been created it can be used inside a Stratego program that manipulates the language whose ATerms are defined by the signature. A Stratego program consists of a list of strategies and a main strategy, with the default name main. At any point in the program Stratego has a notion of a current ATerm. To start with, the current ATerm is the root of the input tree. In other words, the current ATerm is the entire input tree.

Let's say that we want to write a Stratego program called duplicator. Here is one way we could start this program:

module duplicator
  imports libstratego-lib PidginC

strategies
  main =
    io-wrap(duplicate)

    duplicate =
 ...

The first line defines the name of the module, which would usually also define the name of the file—this program would reside in duplicator.str. The next line instructs Stratego to import the standard Stratego library and the signature for PidginC that your makefile created (assuming it is in the current directory, or in a directory where Stratego can find it). The main section of the file is strategies. Here the first strategy, main, uses an idiom to read data from stdin, apply the strategy duplicate to it, and emit the result to stdout. Recall that Stratego is a program transformation tool, so a strategy usually specifies a rewriting. A strategy may succeed or fail. If the top-level strategy (i.e., main) fails the Stratego program exits with an error. If it succeeds, the program outputs the transformed tree.

Each strategy uses strategy combinators to perform non-trivial rewriting. Examples of strategy combinators include:

;: A ; B means apply the strategy A, and if it succeeds, apply B, to the current ATerm.
<+: A <+ B means apply the strategy A, and it if fails, apply B, to the current ATerm.

There are several other combinators that you can find in the Stratego documentation. For this quick-start we will only use the above two.

Here is a complete Stratego program that accepts a program in PidginC with exactly one function and duplicates that function, by simply replicating it syntactically.

module duplicator
  imports libstratego-lib PidginC

strategies
  main =
    io-wrap(duplicate)

  duplicate =
    {rettype, fn, body, vardecls:
       ?Program([Function(rettype, fn, vardecls, body)])
     ; !Program([Function(rettype, fn, vardecls, body),
                 Function(rettype, fn, vardecls, body)])
    }

The above program can be compiled using the Stratego compiler strc:

$ strc -i duplicator.str -la stratego-lib

If the above command succeeds, it will create an executable called duplicator in the current directory. The “-la” flag specifies the standard Stratego library to be linked in. The program can now be executed on the command line:

$ cat example.c | ./parse_PidginC | ./duplicator | pp-aterm

Here pp-aterm is the ATerm pretty-printer, which is part of the Stratego toolkit and parse_PidginC is a script that invokes the PidginC parser.

In the above Stratego program, the duplicate strategy defines four local variables, rettype, fn, body, and vardecls. The scope of the variables is limited by enclosing the body of the strategy within curly braces, otherwise the default scope of the variables is global, which is usually not what you want. The strategy uses the sequencing strategy combinator (;). The first part, starting with “?” specifies a pattern. It matches an ATerm with the format specified by the pattern, matching the node names exactly and matching the variables with arbitrary nodes. If the match succeeds the variables are bound to the subtrees that correspond to the variable positions. The second part, starting with a “!”, specifies an instantiation. Usually, instantiation is the last part of a strategy and is, in a sense, its “output”. The instantiation uses the variables bound earlier to duplicate the single function in the program.

Can you see how you could write the duplicate strategy even more simply?

Extending the Program

Suppose we want to modify the above program to duplicate each individual statement in the body of the function, instead of simply replicating the function. In order to do that we need to traverse the body of the function, match each individual statement and create a copy of it. Recall that the body of a function consists of two lists, a list of variable declarations, and a list of statements. It is the second list that we want to duplicate. We could write a strategy that takes a list of statements and duplicates each statement in it. However, we need not restrict ourselves to lists of statements. In fact, we could write a strategy that operates on any arbitrary list and duplicates it.

dup-list =
  {head, tail:
     ?[head | tail]
   ; !<conc>([head, head], <dup-list>tail)
  }

The dup-list strategy matches a list that is broken into a head and tail. The vertical bar (|) within square-brackets is the Stratego syntax to split a list into its first element and the rest of the list. (This is similar to LISP car and cdr.) The conc strategy is a built-in Stratego strategy that concatenates a pair of lists. We see a new syntax here, that of a strategy name enclosed in angular brackets (< >). This syntax simply means applying the named strategy to the ATerm that immediately follows it. Here, the conc strategy is applied to the unnamed pair of lists that immediately follows it—this is different from passing arguments, which we will see shortly. Similarly, dup-list is invoked recursively on the rest of the list.

However, the way it is written, the strategy is not entirely correct. What is missing? Here is the complete Stratego program.

module duplicator
  imports libstratego-lib PidginC

strategies
  main =
    io-wrap(duplicate)

  duplicate =
    {rettype, fn, body, vardecls:
       ?Program([Function(rettype, fn, vardecls, body)])
     ; !Program([Function(rettype, fn, vardecls, <dup-body-stmts>body)])
    }

  dup-body-stmts =
    {decls, stmts:
       ?Block(decls, stmts)
     ; !Block(decls, <dup-list>stmts)
    }

  dup-list =
    {head, tail:
       ?[head | tail]
     ; !<conc>([head, head], <dup-list>tail)
    }
    <+ {?[]; ![]}

It is possible to achieve some syntactic cleanliness by replacing some of the strategies by rules. The two programs are entirely equivalent, bringing home the point that rules are simply syntactic sugar for certain kinds of strategies. The variables used in rules are automatically local.

module duplicator
  imports libstratego-lib PidginC

strategies
  main =
    io-wrap(Duplicate)

  dup-list =
    {head, tail:
       ?[head | tail]
     ; !<conc>([head, head], <dup-list>tail)
    }
    <+ {?[]; ![]}

rules
  Duplicate:
    Program([Function(rettype, fn, vardecls, body)])
    -> Program([Function(rettype, fn, vardecls, <DuplicateBodyStmts>body)])

  DuplicateBodyStmts:
    Block(decls, stmts) -> Block(decls, <dup-list>stmts)

As a final cleaning-up step, the rule DuplicateBodyStmts can, in fact, be folded into Duplicate and dup-list can be replaced by a library strategy called mapconcat that takes a strategy (or a rule) as an argument (see “Final Remarks” on argument passing syntax). The strategy mapconcat operates on a list and the strategy passed in as the argument is expected to produce a list from each element. The resulting lists are then merged into a single list. All we need to do is pass a strategy (or a rule) that duplicates an element. Moreover, Stratego allows writing anonymous rules or lambda rules, enclosed within back-slashes (\). With this the duplicator program reduces to just a few lines if Duplicate is also passed as a lambda rule to io-wrap. We do need to include an additional module to use mapconcat.

module duplicator
  imports libstratego-lib collection/list/common PidginC

strategies
  main =
    io-wrap(
      \ Program([Function(r, f, v, Block(d, b))])
        -> Program([Function(r, f, v, Block(d, <mapconcat(\s -> [s, s]\)>b))]) \
    )

How would you modify the program to duplicate all simple statements, including those inside compound statements (i.e., for loops and if-then-else statements) without duplicating the compound statements? How would you modify the program to handle an arbitrary number of functions?

Final Remarks

Strategies can also accept arguments. The arguments may be zero or more strategies (or rules) and zero or more ATerms. To distinguish between strategy arguments and ATerm arguments, the parameters are separated by a vertical bar (|). For example, my_strategy(s1, s2 | a1, a2) is a call to my_strategy with strategy arguments s1 and s2, and ATerm arguments a1 and a2. If there are no ATerm arguments the vertical bar may be omitted. If there is no argument at all the parentheses may also be omitted.
Pay attention to the formatting and resist the temptation to use the semicolon (;) operator as the C-like statement terminator. It is not a statement terminator in Stratego.
An Emacs major mode file for Stratego is available. Unfortunately, it does not implement automatic indentation.
You may find the Stratego library reference useful.
In the Stratego tutorial you may want to focus on Section III for this assignment.

B629, Arun Chauhan, Department of Computer Science, Indiana University