Java parser symbols

The parse tree is built with every child pointing to its parent. As each node finishes and becomes well-formed, it attaches itself to the proper parent. Thus, there is a natural collapsing of useless tree symbols. When changing code in the parse tree, never change the parent field of any symbol other than the children of the symbol currently being considered; otherwise you'll likely destroy the parse tree. The infix and postfix operator precedence resolution is an example of what I mean: the order is rearranged only once all the infix and postfix operators have been created and control has returned to the parent of all these operators.
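
A minimal sketch of the attach-on-finish idea, to make the build order concrete. SketchSymbol and its fields are hypothetical stand-ins invented for this illustration, not the parser's real classes: a child knows its parent from the moment it is created, but the parent only sees the child once the child has finished.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a parse tree symbol; not one of the parser's real classes.
class SketchSymbol {
    final String name;
    final SketchSymbol parent;   // the child points at its parent from creation
    final List<SketchSymbol> children = new ArrayList<>();

    SketchSymbol(String name, SketchSymbol parent) {
        this.name = name;
        this.parent = parent;
    }

    // Called once the symbol is complete and well-formed:
    // only now does it attach itself to its parent.
    void finish() {
        if (parent != null) {
            parent.children.add(this);
        }
    }
}
```

Until finish() is called, the parent's child list does not contain the symbol, so a half-built symbol is never reachable from above.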

Memory leakage. There are a large number of leaked symbols. By that, I mean that many of the symbols that should be dropped after the final parse tree is assembled are still pointed to from somewhere in the tree. Because of the doubly-linked nature of the tree, if any client holds onto just one symbol of the tree, most of the parse tree will stick around. Hopefully we can clean that up in the future, but it's not a high priority right now.

Common root and error reporting

During parse tree generation, CommonRoot is the root of the entire tree because there needs to be a uniform way to report errors and detect the end of parsing. The real root that clients see will be one of RootSymbol, CodeBlockSymbol, or ExpressionSymbol (thus, each is an ErrorReporter).

Expression implementation

Collapsing. Basically, 80% of the expression symbols generated by the recognizer are useless. The collapsing mechanism is as follows: the recognizer tags useless expressions with a data value of zero, and all such expressions are collapsed as the parse tree is built.
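
One way collapsing could work, as a rough sketch. CollapseNode is a hypothetical class made up for this example, and the splice-children-into-the-parent behavior is an assumption about what "collapsed" means here, not the parser's confirmed implementation. A node with a data value of zero never attaches itself; it hands its children to its own parent instead.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical node type for illustration only.
class CollapseNode {
    final String label;
    final int data;              // 0 marks a useless expression
    CollapseNode parent;
    final List<CollapseNode> children = new ArrayList<>();

    CollapseNode(String label, int data, CollapseNode parent) {
        this.label = label;
        this.data = data;
        this.parent = parent;
    }

    // Called when the node is complete.
    void finish() {
        if (parent == null) {
            return;              // the root has nothing to attach to
        }
        if (data == 0) {
            // Assumed collapse rule: skip this node and re-parent its children.
            // Note that it only touches the parent field of its own children.
            for (CollapseNode child : children) {
                child.parent = parent;
                parent.children.add(child);
            }
            children.clear();
        } else {
            parent.children.add(this);
        }
    }
}
```

With a root, a useless wrapper under it, and a leaf under the wrapper, finishing the leaf and then the wrapper leaves the leaf as a direct child of the root.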

Infix operators. Infix precedence is unavailable at the recognition level (if it were, it would cause 97.5% of the generated expressions to be useless as opposed to just 80%). The recognizer handles this by creating a parent InfixExpressionSymbol and creating more InfixExpressionSymbols as it continues to encounter more infix operators. As these infix symbols are created, they're pushed onto an InfixExpressionSymbol.InfixPrecedenceStack object kept by the parent infix symbol. The stack object determines the correct order of execution and rearranges the tree accordingly.
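
For readers unfamiliar with stack-based precedence resolution, here is a small sketch of the general technique. It is the classic two-stack operator-precedence algorithm, not the actual InfixPrecedenceStack code, which may well differ (for instance, this sketch reduces eagerly as operators arrive rather than waiting for the whole sequence). The class InfixSketch, its precedence table, and the flat token input are all assumptions made for the example.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Map;

// Hypothetical illustration; not the parser's InfixPrecedenceStack.
class InfixSketch {
    // Assumed precedence table: a higher number binds tighter.
    static final Map<String, Integer> PRECEDENCE =
            Map.of("+", 1, "-", 1, "*", 2, "/", 2);

    static final class Node {
        final String text;
        final Node left;
        final Node right;

        Node(String text, Node left, Node right) {
            this.text = text;
            this.left = left;
            this.right = right;
        }

        @Override
        public String toString() {
            return left == null ? text : "(" + left + " " + text + " " + right + ")";
        }
    }

    // Tokens alternate operand, operator, operand, ... in source order.
    static Node resolve(String... tokens) {
        Deque<Node> operands = new ArrayDeque<>();
        Deque<String> operators = new ArrayDeque<>();
        operands.push(new Node(tokens[0], null, null));
        for (int i = 1; i + 1 < tokens.length; i += 2) {
            String op = tokens[i];
            // Reduce while the stacked operator binds at least as tightly
            // as the incoming one (this gives left associativity).
            while (!operators.isEmpty()
                    && PRECEDENCE.get(operators.peek()) >= PRECEDENCE.get(op)) {
                reduce(operands, operators);
            }
            operators.push(op);
            operands.push(new Node(tokens[i + 1], null, null));
        }
        while (!operators.isEmpty()) {
            reduce(operands, operators);
        }
        return operands.pop();
    }

    // Combine the top operator with the top two operands into one subtree.
    private static void reduce(Deque<Node> operands, Deque<String> operators) {
        Node right = operands.pop();
        Node left = operands.pop();
        operands.push(new Node(operators.pop(), left, right));
    }
}
```

Calling InfixSketch.resolve("a", "+", "b", "*", "c") yields a tree that prints as (a + (b * c)), even though the operators were encountered in flat left-to-right order.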

Postfix operators and primary selectors. Given a series of such operators, the operators are executed from left to right, which is opposite to the order in which they are added to the parse tree. As with infix operators, a sequence of such operators will have a parent symbol. The PostfixExpressionSymbol and PrimaryExpressionSymbol both maintain stacks, and after the sequence of operators has concluded, each rearranges the tree accordingly.
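
The reordering can be pictured with a plain stack, sketched below. PostfixSketch, its Node type, and the string operator names are hypothetical and only illustrate the idea; the real symbols keep more state. Operators are pushed in source order, so popping yields the last operator first. That operator becomes the outermost node, and the first operator ends up innermost, next to the primary, where it executes first.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical illustration; not the parser's PostfixExpressionSymbol.
class PostfixSketch {
    static final class Node {
        final String text;
        Node operand;            // null for the primary at the bottom

        Node(String text) {
            this.text = text;
        }

        @Override
        public String toString() {
            return operand == null ? text : text + "(" + operand + ")";
        }
    }

    // Operators are given in source (left-to-right) order.
    static Node rearrange(String primary, String... operatorsInSourceOrder) {
        Deque<String> stack = new ArrayDeque<>();
        for (String op : operatorsInSourceOrder) {
            stack.push(op);      // the last operator ends up on top
        }
        if (stack.isEmpty()) {
            return new Node(primary);
        }
        Node root = new Node(stack.pop());   // last operator is outermost
        Node current = root;
        while (!stack.isEmpty()) {
            Node inner = new Node(stack.pop());
            current.operand = inner;
            current = inner;
        }
        current.operand = new Node(primary); // first operator wraps the primary
        return root;
    }
}
```

For a source sequence like a field access, then an index, then a call, PostfixSketch.rearrange("a", "field", "index", "call") prints as call(index(field(a))).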

Typecasts. The way the parse tree handles typecasts isn't real pretty. There is a constructor TypeSymbol( ExpressionSymbol, ReadTextBuffer ) that takes care of converting the parsed expression into a type. A SYNTAX_EXP_INNER is considered to be a typecast if its data value is JavaTokens.TK_LPAREN.


Author: Andy Yu (acyu)