A Quick-start Guide to Stratego

Signature

A signature file defines a module containing constructor definitions for the various ATerm nodes that your program(s) will manipulate. The signature file name ends in .str.

A signature file starts with a few keywords. For instance, your nC.str might start as:

module nC

signature
  sorts
   ...
  constructors
   ...

The constructors section consists of a sequence of type definitions, one for each node, of the form:

<node_label> : <child_1_type> *  ... <child_n_type> -> <node_type>

“Node label” is an arbitrary string that you want to use as a label for that particular node type. The convention is to use a label starting with an upper case (Stratego will generate a warning if you depart from this convention). For example you might want to use the label Const for constant values or Expr for nodes representing expression subtrees, etc.

All symbols to the right of the “:” are grammar symbols. These symbols are merely place-holders for recursively defining the structure of the ATerm tree and will never occur in any ATerms. However, to ensure correctness of your programs, a node with the name node_label must take n arguments of the type listed above. The resulting node is of type node_type and can be used wherever node_type is allowed. The sorts section simply lists all the grammar symbols used in the signature file.

Notice that for the signature of a language to be complete all grammar symbols that occur on the right of the arrow (->) must eventually be defined in terms of Strings, the most basic type understood by Stratego. Stratego also understands the constructor List.

An example will make this clearer. Following is one possible way to start writing the signature for nC.

module nC

signature
  sorts
    nC_program nC_function nC_ident
    nC_formal_param nC_body nC_type
    nC_var_decl nC_stmt
  constructors
    Program   : List(nC_function) -> nC_program
    Function  : nC_type * nC_ident * List(nC_formal_param) * nC_body -> nC_function
              : String -> nC_ident
              : nC_type * nC_ident -> nC_formal_param
    Type      : String -> nC_type
    Body      : List(nC_var_decl) * List(nC_stmt) -> nC_body
       ...

This is by no means the only way to write the signature! Here, all the grammar symbols start with the prefix nC_ to emphasize that they form a name-space distinct from the labels on the left of “:”. (Stratego does not care if you mix the names, although that might create confusion.) Also, notice that not every type need be labeled. In the ATerm form, such nodes will be unnamed—the first unlabeled node in the above example will occur as a string and the second as a pair of nodes. Such unlabeled nodes are usually few and restricted to those that can be identified by their contexts so that patterns may be written succinctly. Unlabeled nodes may make the ATerm more compact, but can potentially lead to ambiguous trees. (Consider what would happen if all the nodes were unlabeled.)

A Simple Stratego Program

Once a signature file has been written it can be used inside a Stratego program that manipulates the language whose ATerms are defined by the signature. A Stratego program consists of a list of strategies and a main strategy, with the default name main. At any point in the program Stratego has a notion of a current ATerm. To start with, the current ATerm would be the root of the input tree.

Let's say that we want to write a Stratego program called duplicator. Here is one way we could start this program:

module duplicator
  imports lib nC

strategies
  main =
    io-wrap(duplicate)

    duplicate =
 ...

The first line defines the name of the module, which would usually also define the name of the file—this program would reside in duplicator.str. The next line instructs Stratego to import the standard Stratego library and the signature for nC that you just wrote (assuming it is in the current directory, or in a directory where Stratego can find it). The main section of the file is strategies. Here the first strategy, main, uses an idiom to read data from stdin and apply the strategy duplicate to it. Recall that Stratego is a program transformation tool, so a strategy usually specifies a rewriting. A strategy may succeed or fail. If the top-level strategy (i.e., main) fails the Stratego program exits with an error. If it succeeds, the program outputs the transformed tree.

Each strategy uses strategy combinators to perform non-trivial rewriting. Examples of strategy combinators include:

;
A ; B means apply the strategy A, and if it succeeds, apply B, to the current ATerm.
<+
A <+ B means apply the strategy A, and it if fails, apply B, to the current ATerm.

There are several other combinators that you can find in the Stratego documentation. For this quick-start we will only use the above two.

Here is a complete Stratego program that accepts a program in nC with exactly one function and duplicates that function, by simply replicating it syntactically.

module duplicator
  imports lib nC

strategies
  main =
    io-wrap(duplicate)

  duplicate =
    {rettype, fn, body, vardecls:
       ?Program([Function(rettype, fn, vardecls, body)])
     ; !Program([Function(rettype, fn, vardecls, body),
                 Function(rettype, fn, vardecls, body)])
    }

The above program can be compiled using the Stratego compiler strc:

$ strc -i duplicator.str -la stratego-lib

If the above command succeeds, it will create an executable called duplicator in the current directory. The “-la” flag specifies the standard Stratego library to be linked in. The program can now be executed on the command line:

$ cat example.txt | ./duplicator | pp-aterm

Here pp-aterm is the ATerm pretty-printer, which is part of the Stratego toolkit and example.txt is an nC program specified in ATerm format. Here is the complete example program that was also shown in nC and a quick introduction to ATerms.

Program (
 [ Function (
     Type("int")
   , "example"
   , [(Type("int"), "x")]
   , Body(
       [ Decl(Type("int"), ["x"])
       , Decl(Type("Float"), ["y", "z"])
       ]
     , [ Assign("x", Expr("10"))
       , Assign("y", Expr("20"))
       , Assign("z", Expr(Expr("x"), BinOp("+"), Expr("y")))
       ]
     )
   )
 ]
)

In the Stratego program, the duplicate strategy defines four local variables, retttype, fn, body, and vardecls. The scope of the variables is limited by enclosing the body of the strategy within curly braces, otherwise the default scope of the variables is global, which is usually not what you want. The strategy uses the sequencing strategy combinator (;). The first part, starting with “?” specifies a pattern. It matches an ATerm with the format specified by the pattern, matching the node names exactly and matching the variables with arbitrary nodes. If the match succeeds the variables are bound to the subtrees that correspond to the variable positions. The second part, starting with a “!”, specifies an instantiation. Usually, instantiation is the last part of a strategy and is, in a sense, its “output”. The instantiation uses the variables bound earlier to duplicate the single function in the program.

What happens if you change example.txt to include another function? Can you see how you could write the duplicate strategy even more simply?

Extending the Program

Suppose we want to modify the above program to duplicate each individual statement in the body of the function, instead of simply replicating the function. In order to do that we need to traverse the body of the function, match each individual statement and create a copy of it. Recall that the body of a function consists of two lists, a list of variable declarations, and a list of statements. It is the second list that we want to duplicate. We could write a strategy that takes a list of statements and duplicates each statement in it. However, we need not restrict ourselves to lists of statements. In fact, we could write a strategy that operates on any arbitrary list and duplicates it.

dup-list =
  {head, tail:
     ?[head | tail]
   ; !<conc>([head, head], <dup-list>tail)
  }

The dup-list strategy matches a list that is broken into a head and tail. The vertical bar (|) within square-brackets is the Stratego syntax to split a list into its first element and the rest of the list. (This is similar to LISP car and cdr.) The conc strategy is a built-in Stratego strategy that concatenates a pair of lists. We see a new syntax here, that of a strategy name enclosed in angular brackets (< >). This syntax simply means applying the named strategy to the ATerm that immediately follows it. Here, the conc strategy is applied to the unnamed pair of lists that immediately follows it—this is different from passing arguments, which we will see shortly. Similarly, dup-list is invoked recursively on the rest of the list.

However, the way it is written, the strategy is not entirely correct. What is missing? Here is the complete Stratego program.

module duplicator
  imports lib nC

strategies
  main =
    io-wrap(duplicate)

  duplicate =
    {rettype, fn, body, vardecls:
       ?Program([Function(rettype, fn, vardecls, body)])
     ; !Program([Function(rettype, fn, vardecls, <dup-body-stmts>body)])
    }

  dup-body-stmts =
    {decls, stmts:
       ?Body(decls, stmts)
     ; !Body(decls, <dup-list>stmts)
    }

  dup-list =
    {head, tail:
       ?[head | tail]
     ; !<conc>([head, head], <dup-list>tail)
    }
    <+ {?[]; ![]}

It is possible to achieve some syntactic cleanliness by replacing some of the strategies by rules. The two programs are entirely equivalent, bringing home the point that rules are simply syntactic sugar for certain kinds of strategies. The variables used in rules are automatically local.

module duplicator
  imports lib nC

strategies
  main =
    io-wrap(Duplicate)

  dup-list =
    {head, tail:
       ?[head | tail]
     ; !<conc>([head, head], <dup-list>tail)
    }
    <+ {?[]; ![]}

rules
  Duplicate:
    Program([Function(rettype, fn, vardecls, body)])
    -> Program([Function(rettype, fn, vardecls, <DuplicateBodyStmts>body)])

  DuplicateBodyStmts:
    Body(decls, stmts) -> Body(decls, <dup-list>stmts)

As a final cleaning-up step, the rule DuplicateBodyStmts can, in fact, be folded into Duplicate and dup-list can be replaced by a library strategy called mapconcat that takes a strategy (or a rule) as an argument (see “Final Remarks” on argument passing syntax). The strategy mapconcat operates on a list and the strategy passed in as the argument is expected to produce a list from each element. The resulting lists are then merged into a single list. All we need to do is pass a strategy (or a rule) that duplicates an element. Moreover, Stratego allows writing anonymous rules or lambda rules, enclosed within back-slashes (\). With this the duplicator program reduces to just a few lines if Duplicate is also passed as a lambda rule to io-wrap. We do need to include an additional module to use mapconcat.

module duplicator
  imports lib collection/list/common nC

strategies
  main =
    io-wrap(
      \ Program([Function(r, f, v, Body(d, b))])
        -> Program([Function(r, f, v, Body(d, <mapconcat(\s -> [s, s]\)>b))]) \
    )

How would you modify the program to duplicate all simple statements, including those inside compound statements (i.e., for loops and if-then-else statements) without duplicating the compound statements? How would you modify the program to handle an arbitrary number of functions?

Final Remarks

B629, Arun Chauhan, Department of Computer Science, Indiana University