The parser's principal role is to generate a parse tree. It does so by applying language-specific production rules to the stream of lexical tokens provided by a lexer.
Construction flags tell the lexer whether to accept, for example, 'class' as a keyword (C++) or as an identifier (C). Similarly, the parser can be configured to use particular rules.
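The effect of such a flag can be pictured as a keyword lookup that depends on the configured language. The sketch below is illustrative only: `LexerFlags`, its `cxx` member, `TokenKind` and `classify()` are hypothetical names, not the lexer's real interface, and the keyword sets are deliberately abbreviated.

```cpp
#include <set>
#include <string>

// Hypothetical token categories for this sketch.
enum class TokenKind { Keyword, Identifier };

// Hypothetical construction flags; the real lexer's flags differ.
struct LexerFlags {
  bool cxx = true;  // true: use the C++ keyword set, false: the C keyword set
};

// Keywords shared by C and C++ (abbreviated for the example).
const std::set<std::string> common_keywords = {"int", "char", "struct",
                                               "return", "if", "else"};

// Keywords that exist only in C++ (abbreviated for the example).
const std::set<std::string> cxx_keywords = {"class", "namespace", "template",
                                            "typename"};

// Classify a word according to the configured language.
TokenKind classify(const std::string& word, const LexerFlags& flags) {
  if (common_keywords.count(word)) return TokenKind::Keyword;
  if (flags.cxx && cxx_keywords.count(word)) return TokenKind::Keyword;
  return TokenKind::Identifier;  // e.g. 'class' when lexing C
}
```

With `cxx` set to false, `class` falls through to the identifier case, so C code such as `int class;` lexes as an ordinary declaration.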
The parse tree itself is a Lisp-like structure. All nodes subclass PTree::Atom (for terminals) or PTree::List (for non-terminals). A Visitor makes it possible to traverse the parse tree based on the actual run-time types of the individual nodes (there are about 120 different PTree::Node types).
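Dispatching on run-time node types is the classic double-dispatch Visitor pattern. The following sketch shows the idea on a two-class hierarchy; the `mini` namespace, `Node`, `Atom`, `List`, `Visitor` and `AtomCounter` are hypothetical stand-ins and not the actual PTree classes or their method signatures.

```cpp
#include <memory>
#include <string>
#include <utility>
#include <vector>

namespace mini {  // hypothetical stand-ins, not Synopsis's PTree classes

struct Atom;
struct List;

// Visitor interface: one overload per concrete node type.
struct Visitor {
  virtual ~Visitor() = default;
  virtual void visit(Atom&) = 0;
  virtual void visit(List&) = 0;
};

// Common base class for all nodes.
struct Node {
  virtual ~Node() = default;
  // accept() calls back the visit() overload matching the dynamic type.
  virtual void accept(Visitor& v) = 0;
};

// Terminal: holds the token text.
struct Atom : Node {
  explicit Atom(std::string t) : text(std::move(t)) {}
  void accept(Visitor& v) override { v.visit(*this); }
  std::string text;
};

// Non-terminal: an ordered list of child nodes.
struct List : Node {
  void accept(Visitor& v) override { v.visit(*this); }
  std::vector<std::unique_ptr<Node>> children;
};

// Example visitor: counts the terminals in a (sub-)tree.
struct AtomCounter : Visitor {
  int count = 0;
  void visit(Atom&) override { ++count; }
  void visit(List& l) override {
    for (auto& child : l.children) child->accept(*this);  // recurse
  }
};

}  // namespace mini
```

A real visitor over roughly 120 node types would have one overload per type, typically with defaults that forward to the Atom or List case so that subclasses only override what they care about.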
The C++ grammar makes it quite hard to recover certain semantic information from syntactic structure. In a simple declaration, for instance, individual declarators may carry part of the type information for the variables they declare. For example, the declaration
char *a, b, c[3];
contains three declarators, a, b, and c. The first has type char *, the second char, and the third char[3]. To avoid having to analyze the whole declaration to extract the type of a declarator, the parser attaches the type and name to each declarator.
A similar argument applies to other cases, where non-local information is encoded in a node's encoded_name and encoded_type members.
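The per-declarator type is the shared decl-specifier type combined with the declarator's own modifiers. Here is a minimal sketch, assuming a simplified, hypothetical `Declarator` record that holds only pointer levels and array bounds (not the parser's actual node layout):

```cpp
#include <string>
#include <vector>

// Hypothetical, simplified declarator: pointer levels and array bounds only.
// Real C++ declarators (function pointers, char (*p)[3], ...) need a proper
// inside-out composition, which this sketch deliberately omits.
struct Declarator {
  std::string name;
  int pointer_depth = 0;        // number of leading '*'
  std::vector<int> array_dims;  // trailing [N] suffixes, in source order
};

// Combine the shared decl-specifier type with one declarator's modifiers.
std::string declarator_type(const std::string& base, const Declarator& d) {
  std::string type = base + std::string(d.pointer_depth, '*');
  for (int n : d.array_dims) type += "[" + std::to_string(n) + "]";
  return type;
}
```

For `char *a, b, c[3];` the base type "char" is shared and the three declarators yield "char*", "char" and "char[3]". Storing that result on each declarator is what spares later passes from re-analyzing the whole declaration.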
The Encoding class needs to represent full type names, so it seems sensible to use a mangling scheme similar (or even identical!) to the one developed as part of the C++ ABI standard (see C++ ABI).
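To give a flavor of that scheme, the sketch below encodes a few type shapes following the Itanium C++ ABI productions for builtin types, `P` (pointer), `R` (lvalue reference), `K` (const) and `A<n>_` (array). It covers only a handful of builtins and omits names, function types and substitution compression. It is not the Encoding class, and `Type`, `builtin()`, `pointer_to()`, `reference_to()`, `array_of()` and `mangle()` are hypothetical helpers.

```cpp
#include <map>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical type representation for this sketch.
struct Type {
  enum Kind { Builtin, Pointer, Reference, Array };
  Kind kind = Builtin;
  std::string name;              // builtin name, e.g. "char" (Builtin only)
  bool is_const = false;         // const qualifier on this type
  int bound = 0;                 // array bound (Array only)
  std::shared_ptr<Type> target;  // pointee / referee / element type
};
using TypePtr = std::shared_ptr<Type>;

TypePtr make(Type::Kind k, TypePtr target, bool is_const) {
  auto t = std::make_shared<Type>();
  t->kind = k;
  t->target = std::move(target);
  t->is_const = is_const;
  return t;
}
TypePtr builtin(const std::string& n, bool is_const = false) {
  auto t = make(Type::Builtin, nullptr, is_const);
  t->name = n;
  return t;
}
TypePtr pointer_to(TypePtr t, bool is_const = false) {
  return make(Type::Pointer, std::move(t), is_const);
}
TypePtr reference_to(TypePtr t) { return make(Type::Reference, std::move(t), false); }
TypePtr array_of(TypePtr t, int n) {
  auto a = make(Type::Array, std::move(t), false);
  a->bound = n;
  return a;
}

// Encode a type using a small subset of the Itanium mangling grammar.
std::string mangle(const Type& t) {
  static const std::map<std::string, std::string> codes = {
      {"void", "v"}, {"bool", "b"}, {"char", "c"}, {"int", "i"},
      {"unsigned int", "j"}, {"long", "l"}, {"float", "f"}, {"double", "d"}};
  const std::string cv = t.is_const ? "K" : "";  // CV-qualifiers come first
  switch (t.kind) {
    case Type::Builtin:   return cv + codes.at(t.name);  // throws if unknown
    case Type::Pointer:   return cv + "P" + mangle(*t.target);
    case Type::Reference: return "R" + mangle(*t.target);
    case Type::Array:     return "A" + std::to_string(t.bound) + "_" + mangle(*t.target);
  }
  throw std::logic_error("unknown type kind");
}
```

With these helpers, char * encodes as "Pc", char[3] as "A3_c", const char * as "PKc" and const int & as "RKi". Because each encoding is a compact, self-delimiting string, it can be attached to a declarator node and compared or decoded later without revisiting the original declaration.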
Parse trees tend to grow quickly, and they soon become hard to debug by simply traversing the list. The PTree module therefore provides a simple means to print a (sub-)tree to an output stream:
PTree::display(node, std::cout, false, false);
will print the tree referred to by node to std::cout.
The third parameter is a flag indicating whether the encodings should be printed, too. The fourth parameter indicates whether the actual C++ type of each node being printed should be included in the output.
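A recursive printer with the same two flags might look like the following sketch. The `Node` struct, the bracketed list syntax, the `<type>` prefix and the ` #encoding` suffix are all assumptions made for illustration; the output format of the real PTree::display may differ.

```cpp
#include <cstddef>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical node: an atom (text, no children) or a list of children.
struct Node {
  std::string type_name;       // stands in for the node's run-time C++ type
  std::string text;            // token text, used by atoms only
  std::string encoding;        // encoded name/type, may be empty
  std::vector<Node> children;  // empty for atoms
  bool is_atom() const { return children.empty() && !text.empty(); }
};

// Print a (sub-)tree; 'encoding' and 'typeinfo' mirror the third and
// fourth parameters described above.
void display(const Node& n, std::ostream& os, bool encoding, bool typeinfo) {
  if (typeinfo) os << '<' << n.type_name << '>';
  if (n.is_atom()) {
    os << n.text;
  } else {
    os << '[';
    for (std::size_t i = 0; i < n.children.size(); ++i) {
      if (i) os << ' ';
      display(n.children[i], os, encoding, typeinfo);
    }
    os << ']';
  }
  if (encoding && !n.encoding.empty()) os << " #" << n.encoding;
}

// Convenience wrapper for debugging or tests: render into a string.
std::string display_to_string(const Node& n, bool encoding, bool typeinfo) {
  std::ostringstream os;
  display(n, os, encoding, typeinfo);
  return os.str();
}
```

For a declarator node holding the atoms `*` and `a` with encoding "Pc", the plain call yields "[* a]", enabling encodings yields "[* a] #Pc", and enabling type information yields "<Declarator>[<Atom>* <Atom>a]".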
Since this API turned out to be rather useful, there is a stand-alone tool that just generates a parse tree and then prints it using the above function.
display-ptree [-g <output>] [-d] [-r] [-e] <input>
The available options are:
-g <output>
    Generate a dot graph and write it to the given file.
-d
    Print debug information (in particular, traces) during parsing.
-r
    Print the C++ type of the parse tree nodes.
-e
    Print encoded names / types for nodes such as names, declarators, etc.
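The graph written with -g is in Graphviz's DOT language. The following is one possible way to emit such a graph from a tree, a sketch only: `Node`, `dot_escape()`, `emit()` and `to_dot()` are hypothetical, and the node labels and layout that display-ptree actually produces are not described here.

```cpp
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical tree node for this sketch (not PTree::Node).
struct Node {
  std::string label;  // token text or node kind
  std::vector<Node> children;
};

// Escape characters that are special inside a DOT double-quoted string.
std::string dot_escape(const std::string& s) {
  std::string out;
  for (char ch : s) {
    if (ch == '"' || ch == '\\') out += '\\';
    out += ch;
  }
  return out;
}

// Emit a node declaration plus parent->child edges (pre-order numbering);
// returns the id assigned to 'n'.
int emit(const Node& n, std::ostream& os, int& next_id) {
  int id = next_id++;
  os << "  n" << id << " [label=\"" << dot_escape(n.label) << "\"];\n";
  for (const Node& child : n.children) {
    int child_id = emit(child, os, next_id);
    os << "  n" << id << " -> n" << child_id << ";\n";
  }
  return id;
}

// Render a whole tree as a DOT digraph.
std::string to_dot(const Node& root) {
  std::ostringstream os;
  os << "digraph ptree {\n";
  int next_id = 0;
  emit(root, os, next_id);
  os << "}\n";
  return os.str();
}
```

Numbering nodes explicitly, rather than using their labels as DOT identifiers, keeps repeated tokens such as several `;` atoms as distinct vertices. A file written this way can be rendered with Graphviz, e.g. `dot -Tpng tree.dot -o tree.png`.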