Friedrich Wilhelm Schröer
Fraunhofer Institute for Computer Architecture and Software Technology
f.w.schroeer@first.fraunhofer.de
ACCENT allows the user to annotate grammars to resolve ambiguities. It also offers a default resolution strategy. If an ambiguity is not resolved in one of these ways, it is detected at parsing time whether a given text is ambiguous.
AMBER can check a grammar statically. If ambiguities are detected, AMBER gives hints on how to resolve them by annotations.
In general it is undecidable whether a given grammar is ambiguous. But if a given grammar is ambiguous, this can be detected by enumerating and checking the token strings of its language: if such an algorithm presents a text with two different parse trees, we know that the grammar is ambiguous. If the grammar is unambiguous, however, the algorithm may not terminate.
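This semi-decision procedure can be illustrated with a brute-force sketch that counts the parse trees of a given token string. The grammar, function names, and the cycle guard are illustrative assumptions, not AMBER's implementation; nonterminals are the keys of the rules table.

```python
# Count the distinct parse trees of a token string; more than one tree
# for some string proves that the grammar is ambiguous. Illustrative
# sketch only -- exponential, and not how AMBER works internally.

def count_trees(tokens, sym, i, j, rules, active=None):
    """Number of distinct parse trees deriving tokens[i:j] from `sym`."""
    if active is None:
        active = set()
    key = (sym, i, j)
    if key in active:   # guard against left-recursive cycles on the same span
        return 0
    active.add(key)
    total = sum(count_seq(tokens, rhs, i, j, rules, active)
                for rhs in rules[sym])
    active.discard(key)
    return total

def count_seq(tokens, rhs, i, j, rules, active):
    """Number of ways the symbol sequence `rhs` derives tokens[i:j]."""
    if not rhs:
        return 1 if i == j else 0
    first, rest = rhs[0], rhs[1:]
    if first not in rules:          # terminal: must match the next token
        if i < j and tokens[i] == first:
            return count_seq(tokens, rest, i + 1, j, rules, active)
        return 0
    total = 0
    for k in range(i, j + 1):       # nonterminal: try every split point
        n = count_trees(tokens, first, i, k, rules, active)
        if n:
            total += n * count_seq(tokens, rest, k, j, rules, active)
    return total

# The classic ambiguous grammar  E : E '+' E | 'a'
rules = {"E": [["E", "+", "E"], ["a"]]}
text = ["a", "+", "a", "+", "a"]
print(count_trees(text, "E", 0, len(text), rules))  # 2 trees -> ambiguous
```

The string "a + a + a" has the two trees (a+a)+a and a+(a+a), so two is reported and the grammar is proven ambiguous; an unambiguous grammar would yield one tree for every string, and no finite amount of enumeration could prove its unambiguity.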
AMBER is a tool that systematically generates example strings of a given grammar and checks them for ambiguity. Because this is done with a highly efficient algorithm, it is realistic to check millions of such examples in a short time. Whenever two examples have a common prefix, the prefix is inspected only once.
Hence one has a good chance of detecting a problem. Nevertheless, the user should be aware that the search space is in general infinite and that the number of examples grows exponentially with their length. AMBER has a number of options to influence the search. For example, the user can choose to inspect all examples up to a given length, or a randomly selected subset, which allows examples of greater length to be checked in reasonable time.
AMBER does not report ambiguities that are explicitly resolved by user annotations. The grammar specification for ACCENT should use %nodefault in order to switch off the default ambiguity resolution.
If spec.acc is a grammar in the ACCENT grammar language, then the command
Example
Progress information is displayed on stderr.
When a rule is processed, we use a "dot" (denoted by "*") to indicate the current position inside the rule. For example, in

N : M_1 ... * M_i ... M_n

the next symbol to be processed is M_i. Such a "dotted rule" is called an item.
An item also has a "back-pointer" used to find the items that triggered the current one (not discussed in detail here).
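An item can be sketched as a small record. The names here are illustrative; following Earley's formulation, the back-pointer is represented as the index of the item list in which processing of the rule started, rather than as a direct reference to the triggering item.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Item:
    lhs: str     # N
    rhs: tuple   # (M_1, ..., M_n)
    dot: int     # position of the "*" within rhs
    back: int    # back-pointer: index of the originating item list

    def next_symbol(self):
        # the symbol behind the dot (M_i), or None at the end of the rule
        return self.rhs[self.dot] if self.dot < len(self.rhs) else None

    def __str__(self):
        syms = list(self.rhs)
        syms.insert(self.dot, "*")
        return self.lhs + " : " + " ".join(syms)

item = Item("N", ("M_1", "M_2", "M_3"), 1, 0)
print(item)                # N : M_1 * M_2 M_3
print(item.next_symbol())  # M_2
```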
The algorithm constructs an item list for each input token.
The kernel of the item list for a particular input token is constructed by a step called the scanner.
If 't' is the current input token, then all items of the previous list that have the form

M : ... * 't' ...

are placed into the next item list with the dot advanced past the token:

M : ... 't' * ...

This indicates that 't' has been recognized and that the symbol following it has to be processed.
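The scanner step can be sketched as follows, with items kept as plain (lhs, rhs, dot, back) tuples; the names are illustrative.

```python
def scanner(prev_list, token):
    """Kernel of the next item list: every item of the previous list whose
    dot stands in front of `token`, with the dot advanced past it."""
    return [(lhs, rhs, dot + 1, back)
            for (lhs, rhs, dot, back) in prev_list
            if dot < len(rhs) and rhs[dot] == token]

prev = [("M", ("a", "b"), 0, 0),   # M : * 'a' 'b'
        ("N", ("c",), 0, 0)]       # N : * 'c'
print(scanner(prev, "a"))          # [('M', ('a', 'b'), 1, 0)], i.e. M : 'a' * 'b'
```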
If the dot appears in front of a nonterminal, the "predictor" is invoked. It inserts items that start the processing of that member.
If the item has the form

M : ... * N ...

and there is a rule

N : Alpha

then an item

N : * Alpha

is inserted.
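The predictor step can be sketched like this; items are (lhs, rhs, dot, back) tuples, nonterminals are the keys of the rules table, and the names are illustrative. The loop repeats until no new items appear, so that predictions chain through nested nonterminals.

```python
def predictor(item_list, pos, rules):
    """For every item M : ... * N ..., insert an item N : * Alpha for each
    rule N : Alpha, repeating until the item list is stable."""
    changed = True
    while changed:
        changed = False
        for (lhs, rhs, dot, back) in list(item_list):
            if dot < len(rhs) and rhs[dot] in rules:
                for alpha in rules[rhs[dot]]:
                    new = (rhs[dot], tuple(alpha), 0, pos)
                    if new not in item_list:
                        item_list.append(new)
                        changed = True

rules = {"P": [["N"]], "N": [["x"]]}
items = [("M", ("P",), 0, 0)]      # M : * P
predictor(items, 0, rules)
print(items)                       # prediction chains through P down to N
```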
If the dot appears at the end of a rule, the "completer" is invoked. It takes the item that caused the processing of this rule and puts the dot after the corresponding nonterminal.
If the item has the form

N : ... *

and this item was initially triggered by an item

M : ... * N ...

then an item

M : ... N * ...

is added, indicating that the member N has been processed.
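The completer step can be sketched as follows. As above, items are (lhs, rhs, dot, back) tuples, and the back-pointer is assumed to be the index of the item list where the rule's processing began (Earley's formulation); the names are illustrative.

```python
def completer(item_lists, i):
    """For every finished item N : ... * in list i, find the items in its
    back list with the dot in front of N and advance the dot past N."""
    current = item_lists[i]
    changed = True
    while changed:
        changed = False
        for (lhs, rhs, dot, back) in list(current):
            if dot == len(rhs):                             # N : ... *
                for (l2, r2, d2, b2) in list(item_lists[back]):
                    if d2 < len(r2) and r2[d2] == lhs:      # M : ... * N ...
                        new = (l2, r2, d2 + 1, b2)          # M : ... N * ...
                        if new not in current:
                            current.append(new)
                            changed = True

lists = [[("M", ("N", "y"), 0, 0)],   # list 0: M : * N 'y'
         [("N", ("x",), 1, 0)]]       # list 1: N : 'x' *
completer(lists, 1)
print(lists[1])                       # now also contains M : N * 'y'
```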
Processing starts with the item

YYSTART : * S YYEOF

where S is the start symbol of the grammar. The closure of this item determines the initial item list.
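Putting the three steps together, a minimal Earley-style recognizer can be sketched as follows. Items are (lhs, rhs, dot, back) tuples with the back-pointer as an item-list index; YYSTART and YYEOF are used as above, and the remaining names are illustrative, not AMBER's code.

```python
def closure(lst, pos, lists, rules):
    # repeatedly apply predictor and completer until the list is stable
    changed = True
    while changed:
        changed = False
        for (lhs, rhs, dot, back) in list(lst):
            if dot < len(rhs) and rhs[dot] in rules:        # predictor
                for alpha in rules[rhs[dot]]:
                    it = (rhs[dot], tuple(alpha), 0, pos)
                    if it not in lst:
                        lst.append(it); changed = True
            elif dot == len(rhs):                           # completer
                for (l2, r2, d2, b2) in list(lists[back]):
                    if d2 < len(r2) and r2[d2] == lhs:
                        it = (l2, r2, d2 + 1, b2)
                        if it not in lst:
                            lst.append(it); changed = True

def recognize(tokens, rules, start):
    lists = [[("YYSTART", (start, "YYEOF"), 0, 0)]]    # YYSTART : * S YYEOF
    closure(lists[0], 0, lists, rules)
    for i, t in enumerate(tokens):
        kernel = [(l, r, d + 1, b) for (l, r, d, b) in lists[i]
                  if d < len(r) and r[d] == t]         # scanner
        lists.append(kernel)
        closure(kernel, i + 1, lists, rules)
    # accepted iff YYSTART : S * YYEOF appears in the last item list
    return ("YYSTART", (start, "YYEOF"), 1, 0) in lists[-1]

rules = {"S": [["a", "S"], ["a"]]}
print(recognize(["a", "a", "a"], rules, "S"))   # True
print(recognize(["a", "b"], rules, "S"))        # False
```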
When the algorithm has processed i tokens, it has constructed i item lists that contain all the information needed to parse all continuations of the token string. The last item list has items of the form

M : ... * 't' ...

that will be processed by the scanner to construct the kernel of the next item list. All tokens 't' in those items are valid continuations of the current token string.
We collect these in a list of valid tokens and treat each one separately, as if it were the next source token, and construct the next item list. This is embedded into a recursive procedure that extends a given token string of length n:
extend (n) {
   if (search ends at n) return;
   l = list of valid tokens;
   for (each s in l) {
      let s be the next token;
      next_item_list();
      extend(n+1);
   }
}

Using this approach, only valid token sequences are considered. Instead of parsing each example separately from the beginning, examples with common prefixes are parsed together, and the prefix is processed only once.
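The recursive search can be sketched in the same style. This toy version (illustrative names, not AMBER's code) computes the item lists with a predictor/completer closure, takes the valid tokens from the last list, and backtracks by popping that list, so a common prefix is processed only once; a complete sentence is detected when YYEOF becomes a valid token.

```python
def closure(lst, pos, lists, rules):
    # predictor and completer, applied until the item list is stable
    changed = True
    while changed:
        changed = False
        for (lhs, rhs, dot, back) in list(lst):
            if dot < len(rhs) and rhs[dot] in rules:        # predictor
                for alpha in rules[rhs[dot]]:
                    it = (rhs[dot], tuple(alpha), 0, pos)
                    if it not in lst:
                        lst.append(it); changed = True
            elif dot == len(rhs):                           # completer
                for (l2, r2, d2, b2) in list(lists[back]):
                    if d2 < len(r2) and r2[d2] == lhs:
                        it = (l2, r2, d2 + 1, b2)
                        if it not in lst:
                            lst.append(it); changed = True

def extend(lists, prefix, limit, rules, found):
    # the valid tokens are the terminals expected by items of the last list
    valid = {r[d] for (l, r, d, b) in lists[-1]
             if d < len(r) and r[d] not in rules}
    for t in sorted(valid):
        if t == "YYEOF":                 # a complete sentence has been found
            found.append(" ".join(prefix))
            continue
        if len(prefix) >= limit:         # search ends at this length
            continue
        kernel = [(l, r, d + 1, b) for (l, r, d, b) in lists[-1]
                  if d < len(r) and r[d] == t]              # scanner
        lists.append(kernel)
        closure(kernel, len(lists) - 1, lists, rules)
        extend(lists, prefix + [t], limit, rules, found)
        lists.pop()                      # backtrack; the prefix lists remain

rules = {"S": [["a", "S"], ["a"]]}
init = [("YYSTART", ("S", "YYEOF"), 0, 0)]
lists = [init]
closure(init, 0, lists, rules)
found = []
extend(lists, [], 3, rules, found)
print(found)   # ['a', 'a a', 'a a a']
```

Only valid continuations are ever tried, so no time is spent on token strings that are not sentences of the grammar; an ambiguity check would additionally inspect the completed item lists for multiple derivations of the same span.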
References

[1] Earley, J.: An Efficient Context-Free Parsing Algorithm. Communications of the ACM, Volume 13, Number 2, February 1970, pp. 94-102
[2] Schröer, F.W.: ACCENT, A Compiler Compiler for the Entire Class of Context-free Grammars. compilertools.net, Technical Report, 2000
[3] Schröer, F.W.: ENTIRE, A Generic Parser for the Entire Class of Context-free Grammars. compilertools.net, Technical Report, 2000