QCodeEdit
2.2
Creating syntax files is rather straightforward once you know how they are structured. The purpose of this small example is to cover that topic as extensively as possible.
There are two fundamental notions in QCE syntax files: contexts and "regular" matches.
The concept of context is extremely powerful while remaining extremely simple. The syntax engine enters a context when a given start token is matched and leaves it when a stop token is matched. Within a context there can be any number of nested contexts and "regular" matches. Contexts are typically used to match comments, strings or other special blocks of code.
"Regular" matches are what one would expect them to be: simple tokens. They can be matched from either regular expressions or plain strings. The start and stop tokens of a context are "regular" matches in a way, except that they also trigger the context enter/leave event.
Now, on to the analysis of the structure of a syntax file.
The root element of the document is a <QNFA> tag. It provides various pieces of information in its attributes.
The QNFA tag represents the root context of the language. It can contain any number of the following tags:
context : defines a context. To be valid, a context requires child tags of type start and stop.
sequence : defines a "regular" match. The value of this element is always assumed to be a regexp (no internal optimization is attempted for plain strings).
word : defines a "regular" match. The value of this element is checked to determine whether it can be matched as a plain string (internal optimization). Additionally, this element will ONLY be matched at word boundaries. For instance, if the value of a word element is for, it will not be matched in "foreach", whereas it would have been if declared using a sequence tag.
list : this is a "meta-element" used to group regular matches and give them the same attributes, which are propagated from this element to its children. Subgrouping (nesting list elements) is NOT supported.
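As an illustration, the three kinds of regular-match tags might be combined as in the fragment below. This is only a minimal sketch: the attributes of the root tag are left out, the format names `numbers` and `keyword` are placeholders assumed to be defined elsewhere, and a real syntax file may need attributes that are not covered in this section.

```xml
<QNFA>
    <!-- sequence: the value is always treated as a regexp (one or more digits here) -->
    <sequence format="numbers">[0-9]+</sequence>

    <!-- word: matched only at word boundaries, so "for" is matched but not "foreach" -->
    <word format="keyword">for</word>

    <!-- list: its format attribute is propagated to every child; lists cannot be nested -->
    <list format="keyword">
        <word>if</word>
        <word>else</word>
        <word>while</word>
    </list>
</QNFA>
```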
Additionally, the following tags are valid inside a context block (and, again, their number isn't limited). Also note that, while the ordering of all the tags above within a context DOES matter, the ordering of the tags below DOES NOT matter.
start : defines a context start token as a "regular" match (remarks made about the word tag apply to this one as well).
stop : defines a context stop token as a "regular" match (remarks made about the word tag apply to this one as well).
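A context built from these tags could look like the following sketch. The comment delimiters, the backtick-delimited nested context and the format names (`comment`, `keyword`, `text`) are hypothetical choices made for illustration, and root attributes are again omitted.

```xml
<QNFA>
    <!-- entered when "/*" is matched, left when "*/" is matched -->
    <context format="comment">
        <start>/\*</start>
        <stop>\*/</stop>

        <!-- a regular match that only applies inside the comment -->
        <word format="keyword">TODO</word>

        <!-- a nested context: text between backticks inside the comment -->
        <context format="text">
            <start>`</start>
            <stop>`</stop>
        </context>
    </context>
</QNFA>
```

Since the position of start and stop among their siblings does not matter, they could equally be written after the word and the nested context; the relative order of the word and the nested context, on the other hand, is significant.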
All these tags, except embed, support the following attributes :
format : specifies the format to be applied to the matches (highlighting). This property is propagated.
Additionally all tags, except context and list, support the following extra attributes :
exclusive : indicates whether the token may be matched multiple times. For instance, some contexts share the same end token (a newline in many cases), and the innermost context must not prevent its parent from matching the newline and exiting. This attribute is reserved for the start and stop tags of a context. Valid values are "true", "false", "1" or "0".
parenthesis : specifies that the element is a parenthesis. The concept of parenthesis extends well beyond simple parentheses: parentheses are tokens that may be matched (brace matching), delimit foldable blocks or trigger indentation.
The value of this attribute is a string formatted as follows : "$id:$type[@nomatch]", where $id is the identifier of the parenthesis and $type is its type, which can be "open", "close" or "boundary". The "@nomatch" suffix, if present, indicates that the parenthesis should not be taken into account for brace matching. The square brackets only denote that the suffix is optional; they must not be written in a syntax file.
While the "open" and "close" types of parenthesis are quite easy to understand, the "boundary" type requires more detail. It indicates a parenthesis that acts as both "open" and "close". Typical uses of such parentheses occur in C++ for visibility keywords (public, protected, private) or in LaTeX for chapter tags, section tags and so on. There are of course many more cases where this type of parenthesis is the right choice, but there is no point in listing them all.
fold : element will delimit foldable block(s). Valid values are "true", "false", "1" or "0".
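The attributes above might be used as in the following sketch. The parenthesis identifiers (`curly`, `visibility`, `comment`) and the format names are hypothetical, the braces are escaped on the assumption that this is harmless for such characters, and `exclusive="false"` is assumed to be the value that lets the parent context match the same newline again.

```xml
<QNFA>
    <!-- braces: matched against each other and delimiting foldable blocks -->
    <sequence parenthesis="curly:open" fold="true">\{</sequence>
    <sequence parenthesis="curly:close" fold="true">\}</sequence>

    <!-- boundary: each keyword acts as both the close of one block and the open of the next -->
    <list format="keyword">
        <word parenthesis="visibility:boundary" fold="true">public</word>
        <word parenthesis="visibility:boundary" fold="true">protected</word>
        <word parenthesis="visibility:boundary" fold="true">private</word>
    </list>

    <!-- @nomatch: the comment can be folded but is ignored by brace matching -->
    <context format="comment">
        <start parenthesis="comment:open@nomatch" fold="true">/\*</start>
        <stop parenthesis="comment:close@nomatch" fold="true">\*/</stop>
    </context>

    <!-- both contexts end on a newline; the inner stop must not keep the outer one from exiting -->
    <context format="preprocessor">
        <start>#</start>
        <stop exclusive="false">\n</stop>

        <context format="comment">
            <start>//</start>
            <stop exclusive="false">\n</stop>
        </context>
    </context>
</QNFA>
```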
The context tag however supports the following extra attributes :
transparency : specifies whether the contexts and matches declared before this context should also be matched inside that context (with no need to declare them again). Valid values are "true", "false", "1" or "0".
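A possible use of transparency is sketched below, with hypothetical format names and an imaginary "#" directive syntax. Because the keyword list is declared before the context, a transparent context should also highlight those keywords inside the directive without declaring them again.

```xml
<QNFA>
    <list format="keyword">
        <word>if</word>
        <word>else</word>
    </list>

    <!-- declared after the list: with transparency enabled, "if" and "else" are also matched in here -->
    <context format="preprocessor" transparency="true">
        <start>#</start>
        <stop>\n</stop>
    </context>
</QNFA>
```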
The regexp format used by QCE is close to that used by QRegExp, with some slight variations.
First of all, some QRegExp features are not supported in syntax files, grouping and alternation among them.
Then, character classes (word, space, digit and their negations) use the same letters as QRegExp (respectively w, s, d, and their uppercase forms for the negations) but a different prefix character ($ instead of \).
C-style escaping is used: simple C escapes (such as those for newlines and tabs) are converted properly, and the same C-style escaping is used to escape control characters.
Sets and negated sets are supported, using the same syntax as QRegExp.
The usual regexp operators '?', '*' and '+' are supported.
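The patterns below sketch how these rules might translate into sequence values. They are illustrative only: the format names are placeholders, and the patterns deliberately stick to the features listed above.

```xml
<QNFA>
    <!-- an optional minus sign followed by one or more digits ($d instead of \d) -->
    <sequence format="numbers">-?$d+</sequence>

    <!-- sets use the QRegExp syntax: "0x" or "0X" followed by hexadecimal digits -->
    <sequence format="numbers">0[xX][0-9a-fA-F]+</sequence>

    <!-- a negated set: one character other than a quote, between single quotes -->
    <sequence format="text">'[^']'</sequence>

    <!-- an at sign followed by one or more word characters -->
    <sequence format="keyword">@$w+</sequence>

    <!-- C-style escapes: one or more tabs, then one or more literal asterisks -->
    <sequence format="comment">\t+</sequence>
    <sequence format="comment">\*+</sequence>
</QNFA>
```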
A revision of the syntax format may bring grouping and alternation support (and possibly other niceties) in a future version. However, this would break backward compatibility (due to escaping issues, among other things) and require a rewrite of the syntax engine, so a new (but very similar) syntax file format would be used.
Now that the fundamentals have been covered, let's use them to create a small syntax file for an imaginary language.
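The fundamentals above can be combined into a sketch such as the following, written for a made-up language with "#" line comments, double-quoted strings, integer literals, a few keywords and begin/end blocks. Everything specific here is an assumption: the language itself, the format names, the `block` parenthesis identifier, and in particular the `language` and `extensions` attributes of the root tag, which are modeled on what a shipped syntax file is expected to declare and should be checked against one of them.

```xml
<QNFA language="Toy" extensions="toy">
    <!-- line comment: from "#" to the end of the line -->
    <context format="comment">
        <start>#</start>
        <stop>\n</stop>
    </context>

    <!-- string: everything between two double quotes -->
    <context format="text">
        <start>"</start>
        <stop>"</stop>
    </context>

    <!-- integer literal: one or more digits -->
    <sequence format="numbers">$d+</sequence>

    <!-- keywords: plain strings matched at word boundaries, sharing one format -->
    <list format="keyword">
        <word>if</word>
        <word>else</word>
        <word>while</word>

        <!-- begin/end delimit foldable blocks and are matched against each other -->
        <word parenthesis="block:open" fold="true">begin</word>
        <word parenthesis="block:close" fold="true">end</word>
    </list>
</QNFA>
```

The contexts are declared first so that keywords and numbers appearing inside comments or strings are not highlighted, since neither context is marked as transparent.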
More examples are available in the qxs/ directory, where all syntax files reside.