Merge pull request #3496 from adangel:pmd7-antlr-doc

[doc] Improve Antlr documentation #3496
Andreas Dangel
2022-06-30 15:33:09 +02:00
2 changed files with 71 additions and 10 deletions

@@ -3,17 +3,54 @@ title: Adding PMD support for a new ANTLR grammar based language
short_title: Adding a new language with ANTLR
tags: [devdocs, extending]
summary: "How to add a new language to PMD using ANTLR grammar."
last_updated: July 21, 2019
last_updated: October 2021
sidebar: pmd_sidebar
permalink: pmd_devdocs_major_adding_new_language_antlr.html
folder: pmd/devdocs
# needs to be changed to branch master instead of pmd/7.0.x
#
# needs to be changed to branch master instead of pmd/7.0.x once pmd7 is released
# https://github.com/pmd/pmd/blob/pmd/7.0.x -> https://github.com/pmd/pmd/blob/master
#
---
{% include callout.html type="warning" content="
## 1. Start with a new sub-module.
**Before you start...**<br><br>
This is really a big contribution and can't be done as a drive-by contribution. It requires dedication
and a long-term commitment to implement support for a new language.<br><br>
This step-by-step guide is just a small intro to get the basics started. It is not necessarily up to date
or complete, so you have to be able to fill in the blanks.<br><br>
Currently the ANTLR integration has some basic limitations compared to JavaCC: The output of the
ANTLR parser generator is not an abstract syntax tree (AST) but a parse tree. A parse tree is
much more fine-grained than what a typical JavaCC grammar produces: it is much deeper and contains
nodes down to the individual token types.<br><br>
The ANTLR nodes themselves don't have any attributes because they are on the wrong abstraction level.
As a result, there are no attributes that can be used in XPath-based rules.<br><br>
In order to overcome these limitations, one would need to implement a post-processing step that transforms
a parse tree into an abstract syntax tree and introduces real nodes on a higher abstraction level. This
step is **not** described in this guide.<br><br>
Once basic support for a language is in place, many features are still missing. Typical features
that can greatly improve rule writing are: symbol table, type resolution, call/data flow analysis.<br><br>
A symbol table keeps track of variables and their usages. Type resolution tries to find the actual class type
of each used type, following along method calls (including overloaded and overridden methods), which allows
querying subtypes and the type hierarchy. This requires additional configuration of an auxiliary classpath.
Call and data flow analysis keeps track of data as it moves through the different execution paths
of a program.<br><br>
These features are out of scope of this guide. Type resolution and data flow are features that
definitely don't come for free. Implementing them takes a lot of effort and perseverance.<br><br>
" %}
## 1. Start with a new sub-module
* See pmd-swift for examples.
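
The warning at the top of this page says that an ANTLR parse tree is much deeper than an AST because it contains a node for every grammar rule and token. The following is a minimal, self-contained sketch of that difference. It assumes a toy `Node` class and a naive `simplify` step that collapses single-child chains. These names are hypothetical and are neither PMD nor ANTLR API. A real post-processing step would need language-specific node types.

```java
import java.util.ArrayList;
import java.util.List;

public class ParseTreeVsAst {

    /** Hypothetical toy node; not PMD's AntlrNode and not an ANTLR class. */
    static final class Node {
        final String name;
        final List<Node> children = new ArrayList<>();

        Node(String name, Node... kids) {
            this.name = name;
            for (Node kid : kids) {
                children.add(kid);
            }
        }
    }

    /** Number of nodes on the longest path from this node down to a leaf. */
    static int depth(Node node) {
        int deepest = 0;
        for (Node child : node.children) {
            deepest = Math.max(deepest, depth(child));
        }
        return 1 + deepest;
    }

    /** Collapses chains of single-child nodes, keeping only the innermost node. */
    static Node simplify(Node node) {
        if (node.children.size() == 1) {
            return simplify(node.children.get(0));
        }
        Node copy = new Node(node.name);
        for (Node child : node.children) {
            copy.children.add(simplify(child));
        }
        return copy;
    }

    /** Parse-tree-like shape for "1 + 2": every rule and token is a node. */
    static Node sampleParseTree() {
        return new Node("expression",
            new Node("additiveExpression",
                new Node("primaryExpression", new Node("literal", new Node("INT:1"))),
                new Node("PLUS:+"),
                new Node("primaryExpression", new Node("literal", new Node("INT:2")))));
    }

    public static void main(String[] args) {
        Node parseTree = sampleParseTree();
        Node simplified = simplify(parseTree);
        System.out.println("parse tree depth: " + depth(parseTree));
        System.out.println("simplified depth: " + depth(simplified));
        System.out.println("simplified root:  " + simplified.name);
    }
}
```

In this toy example the parse tree for `1 + 2` is five levels deep, while the simplified tree has only two levels. Collapsing chains alone does not add attributes to the nodes, so it does not by itself make XPath-based rules more expressive.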
## 2. Implement an AST parser for your language
@@ -24,7 +61,7 @@ folder: pmd/devdocs
## 3. Create AST node classes
* The individual AST nodes are generated, but you need to define the common interface for them.
* You need a need to define the supertype interface for all nodes of the language. For that, we provide
* You need to define the supertype interface for all nodes of the language. For that, we provide
[`AntlrNode`](https://github.com/pmd/pmd/blob/pmd/7.0.x/pmd-core/src/main/java/net/sourceforge/pmd/lang/ast/impl/antlr4/AntlrNode.java).
* See [`SwiftNode`](https://github.com/pmd/pmd/blob/pmd/7.0.x/pmd-swift/src/main/java/net/sourceforge/pmd/lang/swift/ast/SwiftNode.java)
as an example.
@@ -52,7 +89,7 @@ folder: pmd/devdocs
## 4. Generate your parser
* Make sure you have the property `<antlr4.visitor>true</antlr4.visitor>` in your `pom.xml` file.
* This is just a matter of building the language module. ANTLR is called via ant, and this step is added
to the phase `generate-sources`. So you can just call e.g. `./mvnw generate-source -pl pmd-swift` to
to the phase `generate-sources`. So you can just call e.g. `./mvnw generate-sources -pl pmd-swift` to
have the parser generated.
* The generated code will be placed under `target/generated-sources/antlr4` and will not be committed to
source control.

@@ -1,16 +1,40 @@
---
title: Adding PMD support for a new JAVACC grammar based language
short_title: Adding a new language with JAVACC
title: Adding PMD support for a new JavaCC grammar based language
short_title: Adding a new language with JavaCC
tags: [devdocs, extending]
summary: "How to add a new language to PMD using JAVACC grammar."
last_updated: October 5, 2019
summary: "How to add a new language to PMD using JavaCC grammar."
last_updated: October 2021
sidebar: pmd_sidebar
permalink: pmd_devdocs_major_adding_new_language_javacc.html
folder: pmd/devdocs
---
{% include callout.html type="warning" content="
## 1. Start with a new sub-module.
**Before you start...**<br><br>
This is really a big contribution and can't be done as a drive-by contribution. It requires dedication
and a long-term commitment to implement support for a new language.<br><br>
This step-by-step guide is just a small intro to get the basics started. It is not necessarily up to date
or complete, so you have to be able to fill in the blanks.<br><br>
Once basic support for a language is in place, many features are still missing. Typical features
that can greatly improve rule writing are: symbol table, type resolution, call/data flow analysis.<br><br>
A symbol table keeps track of variables and their usages. Type resolution tries to find the actual class type
of each used type, following along method calls (including overloaded and overridden methods), which allows
querying subtypes and the type hierarchy. This requires additional configuration of an auxiliary classpath.
Call and data flow analysis keeps track of data as it moves through the different execution paths
of a program.<br><br>
These features are out of scope of this guide. Type resolution and data flow are features that
definitely don't come for free. Implementing them takes a lot of effort and perseverance.<br><br>
" %}
## 1. Start with a new sub-module
* See pmd-java or pmd-vm for examples.
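
The warning at the top of this page describes a symbol table as something that keeps track of variables and their usages. To make that idea concrete, here is a minimal sketch. The class `SymbolTableSketch` and its methods are hypothetical and are not part of PMD. A real symbol table would also have to handle scopes, shadowing and types.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class SymbolTableSketch {

    // Variable name -> line numbers where the variable is used.
    private final Map<String, List<Integer>> usagesByName = new LinkedHashMap<>();

    /** Registers a declared variable with no usages yet. */
    void declare(String name) {
        usagesByName.putIfAbsent(name, new ArrayList<>());
    }

    /** Records a usage; returns false if the variable was never declared. */
    boolean recordUsage(String name, int line) {
        List<Integer> usages = usagesByName.get(name);
        if (usages == null) {
            return false;
        }
        usages.add(line);
        return true;
    }

    /** Declared variables without any recorded usage, in declaration order. */
    List<String> unusedVariables() {
        List<String> unused = new ArrayList<>();
        for (Map.Entry<String, List<Integer>> entry : usagesByName.entrySet()) {
            if (entry.getValue().isEmpty()) {
                unused.add(entry.getKey());
            }
        }
        return unused;
    }

    public static void main(String[] args) {
        SymbolTableSketch table = new SymbolTableSketch();
        table.declare("count");
        table.declare("unusedFlag");
        table.recordUsage("count", 12);
        System.out.println(table.unusedVariables()); // prints [unusedFlag]
    }
}
```

Even this flat version shows why such a table helps rule writing: a rule that reports unused variables becomes a simple query instead of a tree search.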
## 2. Implement an AST parser for your language