For the purpose of this assignment we will use a small subset of C, called nC, or non-C. A program in nC is defined by the following grammar in BNF:
program | ::= | function_list |
function_list | ::= | function_list function |
| | function | |
function | ::= | type ident ( formal_params ) body |
formal_params | ::= | param_list |
| | ε | |
param_list | ::= | param_list , param |
| | param | |
param | ::= | type ident |
type | ::= | int |
| | float | |
body | ::= | { var_decls stmt_list } |
var_decls | ::= | var_decls var_decl |
| | ε | |
var_decl | ::= | type ident_list ; |
stmt_list | ::= | stmt_list stmt |
| | ε | |
stmt | ::= | assign |
| | for_loop | |
| | if_then_else | |
assign | ::= | ident = expr ; |
| | expr ; | |
for_loop | ::= | for ident = expr to expr body |
if_then_else | ::= | if ( expr ) body else body |
expr | ::= | ident |
| | const | |
| | Expr bop Expr | |
| | ( Expr ) | |
| | ident ( expr_list ) | |
ident_list | ::= | ident-list , ident |
| | ident | |
expr_list | ::= | expr_list , expr |
| | expr |
In the above specifications the symbols in red must occur literally in the source and the symbols in green represent classes of symbols—“ident” is a string of letters and digits and underscores starting with a non-digit, “const” is an integer or floating-point constant (could be a negative value), and bop is a binary operation (+, -, *, or /).
Following is an example of a valid program in nC:
int main () { int x; float y, z; x = 10; y = 2.5 / x; z = 2.5 + x; print(x, y, z); }
Even though nC is a very simple language it allows all the main constructs found in common imperative languages. nC also allows functions to have side effects. As a result, even though there is no explicit syntax for arrays, arrays can be supported by using functions with side-effects. For example, a function such as allocate_array may allocate an arbitrary array with an integer handle and functions put_array_element and get_array_element may be used to access and manipulate the array.
We can represent the above example program abstractly in the ATerm form. An ATerm, i.e., Annotated Term, is a textual representation of a tree. It is designed to represent parse trees or Abstract Syntax Trees (ASTs), which correspond to the source of a program in an arbitrary language. The above program in nC can be written in the ATerm form as follows:
Program ( [ Function( Type("int") , "main" , [] , Body ( [ Decl(Type("int"), ["x"]) , Decl(Type("float"), ["y", "z"]) ] , [ Assign("x", Const("10")) , Assign("y", Expr(Const("2.5"), BinOp("/"), Expr("x"))) , Assign("z", Expr(Const("2.5"), BinOp("+"), Expr("x"))) , Expr("print", [Expr("x"), Expr("y"), Expr("z")]) ] ) ) ] )
In the ATerm format a tree node is either a labeled node (i.e., a node with a string name) or a list enclosed in square brackets (“[” and “]”) or a constant value. The children of a node are written as a comma-separated list within parentheses. Each child is either an ATerm or a list of ATerms.
In the above example, the top-level node—the root node—is a labeled node called program that represents the entire program. Its only child is a list, of function nodes. Each function node has three child nodes, representing the name of the function as an identifier, a list of arguments, and the body of the function. Each labeled node is said to have a constructor, which is completely defined by the node's label and its arguments. Notice that a named node may have multiple constructors. More information on ATerms is available on the ATerm web site.
Notice that the ATerm representation follows the original BNF grammar very closely. Most non-terminals appear as labeled nodes, except for those representing lists. The details of how the ATerm nodes are created is beyond the scope of this assignment.
Finally, we will ignore the “annotation” aspect of the ATerm representation.
B629, Arun Chauhan, Department of Computer Science, Indiana University