The Java parser package

Public API

The Java parser package provides two primary APIs. The first is the JavaLexer class, which provides Java lexical analysis (a scanner, for those unfamiliar with the terminology). The second is the JavaParser class, which provides Java parsing as well as a facility for selective parsing.

JavaLexer

The JavaLexer implements the oracle.javatools.parser.Lexer interface. Clients can find the enumerated token constants in the JavaTokens interface. The JavaLexer can also skip comments and can recognize (but not lex into) SQLJ constructs. Refer to the documentation found in the JavaLexer class for more details.

JavaParser

The JavaParser provides a host of public static methods. Every parse is therefore done with a new instance of the parser, which avoids any concurrency issues. The methods provide the ability to do a full parse (the default) of a Java source file as well as the ability to do selective parsing.
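The static-method design described above can be illustrated with a minimal sketch. SketchParser and its word-splitting "parse" are hypothetical stand-ins, not the real JavaParser; the only point shown is that the static entry point builds a fresh instance per call, so the mutable parse state is never shared between threads.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a parser; NOT the real JavaParser. */
final class SketchParser {
    // Mutable per-parse state. It is safe without synchronization only
    // because every call to parse() gets its own instance.
    private final String source;
    private int position;
    private final List<String> words = new ArrayList<>();

    // Private constructor: clients can only go through the static method.
    private SketchParser(String source) {
        this.source = source;
    }

    /** Static entry point: constructs a new parser instance for every call. */
    static List<String> parse(String source) {
        return new SketchParser(source).run();
    }

    // Toy "parse": splits the input into whitespace-separated words.
    private List<String> run() {
        while (position < source.length()) {
            while (position < source.length()
                    && Character.isWhitespace(source.charAt(position))) {
                position++;
            }
            int start = position;
            while (position < source.length()
                    && !Character.isWhitespace(source.charAt(position))) {
                position++;
            }
            if (position > start) {
                words.add(source.substring(start, position));
            }
        }
        return words;
    }
}
```

Two threads calling SketchParser.parse at the same time each work on their own position and words fields, which is the property the static API is meant to guarantee.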

Selective parsing allows the client to parse at any of the following three parse levels: root-level (e.g. only obtain class member names), statement-level, or expression-level (full Java parsing). Selective parsing also allows clients to parse arbitrary blocks or expressions.
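One way to picture the three parse levels is as an ordered enumeration. The sketch below is illustrative only: ParseLevelSketch and its includes method are hypothetical names, and it assumes the levels are cumulative (each level parses everything the shallower levels do), which the description suggests but does not state outright.

```java
/** Hypothetical model of the three parse levels; not part of the real API. */
enum ParseLevelSketch {
    ROOT,        // e.g. only class member names
    STATEMENT,   // statements, but not the expressions inside them
    EXPRESSION;  // full Java parsing

    /**
     * Returns true if a parse at this level would descend into constructs
     * of the given detail level. Assumes the levels are cumulative.
     */
    boolean includes(ParseLevelSketch detail) {
        return detail.compareTo(this) <= 0;
    }
}
```

Under that assumption, ParseLevelSketch.EXPRESSION.includes(ParseLevelSketch.ROOT) is true, while ParseLevelSketch.ROOT.includes(ParseLevelSketch.STATEMENT) is false.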

The JavaParser has two modes of error handling. The first mode is choke-on-any-error: the parse fails if any syntax error is found, so a successful parse always produces a well-formed parse tree. This is best for clients that have no facility for reporting syntax errors to users (or just don't care to). The second mode produces as much of a parse tree as possible and provides an array of error Strings that can be reported to the user. In this mode, if any errors are present, there is no guarantee of a well-formed parse tree.
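The difference between the two modes can be sketched with a toy brace checker. Everything here (ErrorModeSketch, parseStrict, parseLenient, Result) is a hypothetical illustration and not the JavaParser API; the real method names and result types are documented in the JavaParser class.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy illustration of the two error-handling modes; NOT the real API. */
final class ErrorModeSketch {

    /** Result of a lenient parse: partial output plus error strings. */
    static final class Result {
        final int blocks;       // number of completed { } blocks
        final String[] errors;  // empty when the input was well formed

        Result(int blocks, String[] errors) {
            this.blocks = blocks;
            this.errors = errors;
        }
    }

    /** Mode 1: choke on any error. A normal return implies no errors. */
    static int parseStrict(String source) {
        return scan(source, true).blocks;
    }

    /** Mode 2: keep going, collect errors; the result may be incomplete. */
    static Result parseLenient(String source) {
        return scan(source, false);
    }

    private static Result scan(String source, boolean failFast) {
        List<String> errors = new ArrayList<>();
        int depth = 0;
        int blocks = 0;
        for (int i = 0; i < source.length(); i++) {
            char c = source.charAt(i);
            if (c == '{') {
                depth++;
            } else if (c == '}') {
                if (depth == 0) {
                    report("unmatched '}' at offset " + i, failFast, errors);
                } else {
                    depth--;
                    blocks++;
                }
            }
        }
        if (depth > 0) {
            report(depth + " unclosed '{'", failFast, errors);
        }
        return new Result(blocks, errors.toArray(new String[0]));
    }

    // Strict mode throws immediately; lenient mode records and continues.
    private static void report(String message, boolean failFast, List<String> errors) {
        if (failFast) {
            throw new IllegalArgumentException(message);
        }
        errors.add(message);
    }
}
```

A client with no way to show errors would call parseStrict and treat any exception as "no tree available"; an editor-like client would call parseLenient, use whatever partial result came back, and display the error strings.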

Note that though the JavaParser will usually be called on oracle.javatools.buffer.TextBuffer, it does NOT obtain read locks because the JavaParser talks through the ReadTextBuffer which does NOT have any facility for read locking. Clients are expected to obtain all proper locks prior to parsing.
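The locking responsibility described above follows the usual acquire, parse, release-in-finally pattern. The sketch below uses java.util.concurrent.locks.ReentrantReadWriteLock and a StringBuilder as stand-ins; LockedBufferSketch and its methods are hypothetical, and the actual TextBuffer locking calls should be taken from the TextBuffer documentation.

```java
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Hypothetical buffer with client-side locking; NOT the real TextBuffer. */
final class LockedBufferSketch {
    private final StringBuilder text = new StringBuilder();
    private final ReadWriteLock lock = new ReentrantReadWriteLock();

    /** Writers take the write lock before modifying the buffer. */
    void append(String s) {
        lock.writeLock().lock();
        try {
            text.append(s);
        } finally {
            lock.writeLock().unlock();
        }
    }

    /**
     * Stand-in for a parser: it reads the characters it is given and
     * does no locking of its own, like the parser described above.
     */
    private static int countSemicolons(CharSequence chars) {
        int count = 0;
        for (int i = 0; i < chars.length(); i++) {
            if (chars.charAt(i) == ';') {
                count++;
            }
        }
        return count;
    }

    /** The client, not the parser, holds the read lock for the whole parse. */
    int parseUnderReadLock() {
        lock.readLock().lock();
        try {
            return countSemicolons(text);
        } finally {
            lock.readLock().unlock();
        }
    }
}
```

Releasing the lock in a finally block matters here because a parse in choke-on-any-error mode may end abnormally.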

Stress test

The oracle.javatools.test.parser directory contains three files of interest. JavaSyntaxRecognizerTester isn't useful except for seeing the entire syntax stream produced by the recognizer. JavaTreeGeneratorTester generates a compilable source file given a successful parse of a Java file. stresstest.pl is a Perl script that runs over the entire oracle.* and borland.jbuilder.* hierarchy in the JDev 5.0 product. The generated source files are then compiled and JDev 5.0 is run using those class files. So far as I could tell, JDev was running the way it was supposed to (Yes!). The one file that the parser cannot handle is oracle.jdevimpl.uieditor.menucanvas.EditMenuItem, because it has two typecasts in a row in a number of places. Otherwise, the non-selective parser functionality works.

Known bugs

Because of the algorithm used by the recognizer, it is very difficult to handle any prefix operator that follows a typecast, including further typecasts. The easiest way to put this is that typecasts are non-associative with other prefix operators. Cases where programmers actually rely on this obscure associativity of the typecast are very few.
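For reference, the following legal Java expressions have the shape described: a typecast immediately followed by another prefix operator or another typecast. The class and method names are made up for illustration. Only the consecutive-typecast form is confirmed above as failing in practice; which of the other forms trip the recognizer is not stated.

```java
/** Legal Java showing a typecast followed directly by a prefix operator. */
final class CastPrefixExamples {

    // Typecast followed by unary minus.
    static int castThenMinus(double d) {
        return (int) -d;
    }

    // Typecast followed by bitwise complement.
    static int castThenComplement(long bits) {
        return (int) ~bits;
    }

    // Two typecasts in a row on primitive types.
    static int castThenCast(double d) {
        return (int) (long) d;
    }

    // Two typecasts in a row on reference types.
    static String castThenCastRef(Object o) {
        return (String) (CharSequence) o;
    }
}
```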

Not-quite-so-public API

JavaSyntaxRecognizer

The JavaSyntaxRecognizer is the heart of the syntactic analysis, the first stage of parsing. It implements a fully "stop-and-go" parsing algorithm and also implements what we term "selective" parsing. In general, clients should refer to the JavaParser for the parser API. The JavaSyntaxRecognizer can be used by clients, but the only API that has been exposed is the undocumented list of enumerated constants found in the JavaSyntaxCodes interface.

The JavaSyntaxCodes interface was intentionally left undocumented to force clients to use JavaParser. Because the syntax recognizer is the guts of the parsing algorithm, any direct client would have to fully understand the parsing algorithm in order to use the contained functionality. In contrast, the JavaParser and parse tree APIs completely hide the parsing algorithms used.


Author: Andy Yu