Merge pull request #3496 from adangel:pmd7-antlr-doc

[doc] Improve Antlr documentation #3496
2022-06-30 15:33:09 +02:00
parent 05bb0867e5 39807f325f
commit 6ce135e1cc
2 changed files with 71 additions and 10 deletions
--- a/docs/pages/pmd/devdocs/major_contributions/adding_a_new_antlr_based_language.md
+++ b/docs/pages/pmd/devdocs/major_contributions/adding_a_new_antlr_based_language.md
@@ -3,17 +3,54 @@ title: Adding PMD support for a new ANTLR grammar based language
 short_title: Adding a new language with ANTLR
 tags: [devdocs, extending]
 summary: "How to add a new language to PMD using ANTLR grammar."
-last_updated: July 21, 2019
+last_updated: October 2021
 sidebar: pmd_sidebar
 permalink: pmd_devdocs_major_adding_new_language_antlr.html
 folder: pmd/devdocs

-# needs to be changed to branch master instead of pmd/7.0.x
+#
+# needs to be changed to branch master instead of pmd/7.0.x once pmd7 is released
 # https://github.com/pmd/pmd/blob/pmd/7.0.x -> https://github.com/pmd/pmd/blob/master
+#
 ---

+{% include callout.html type="warning" content="

-## 1.  Start with a new sub-module.
+**Before you start...**<br><br>
+
+This is a big contribution and can't be done as a drive-by contribution. It requires dedication
+and a long-term commitment to implement support for a new language.<br><br>
+
+This step-by-step guide is just a short introduction to get the basics started. It is not necessarily up-to-date
+or complete, so you have to be able to fill in the blanks.<br><br>
+
+Currently the ANTLR integration has some basic limitations compared to JavaCC: The output of the
+ANTLR parser generator is not an abstract syntax tree (AST) but a parse tree. A parse tree is
+much more fine-grained than what a typical JavaCC grammar produces. This means that the
+parse tree is much deeper and contains nodes down to the different token types.<br><br>
+
+The ANTLR nodes themselves don't have any attributes because they are on the wrong abstraction level.
+As a result, there are no attributes that can be used in XPath-based rules.<br><br>
+
+In order to overcome these limitations, one would need to implement a post-processing step that transforms
+the parse tree into an abstract syntax tree and introduces real nodes on a higher abstraction level. This
+step is **not** described in this guide.<br><br>
+
+Once basic support for a language is in place, many features are still missing. Typical features
+that can greatly improve rule writing are: symbol table, type resolution, call/data flow analysis.<br><br>
+
+A symbol table keeps track of variables and their usages. Type resolution tries to find the actual class type
+of each used type, following along method calls (including overloaded and overridden methods), which allows
+querying subtypes and the type hierarchy. This requires additional configuration of an auxiliary classpath.
+Call and data flow analysis keeps track of data as it moves through the different execution paths
+of a program.<br><br>
+
+These features are out of scope of this guide. Type resolution and data flow are features that
+definitely don't come for free. Implementing them takes considerable effort and perseverance.<br><br>
+
+" %}
+
+## 1.  Start with a new sub-module
 *   See pmd-swift for examples.

 ## 2.  Implement an AST parser for your language
@@ -24,7 +61,7 @@ folder: pmd/devdocs

 ## 3.  Create AST node classes
 *   The individual AST nodes are generated, but you need to define the common interface for them.
-*   You need a need to define the supertype interface for all nodes of the language. For that, we provide
+*   You need to define the supertype interface for all nodes of the language. For that, we provide
    [`AntlrNode`](https://github.com/pmd/pmd/blob/pmd/7.0.x/pmd-core/src/main/java/net/sourceforge/pmd/lang/ast/impl/antlr4/AntlrNode.java).
 *   See [`SwiftNode`](https://github.com/pmd/pmd/blob/pmd/7.0.x/pmd-swift/src/main/java/net/sourceforge/pmd/lang/swift/ast/SwiftNode.java)
    as an example.
@@ -52,7 +89,7 @@ folder: pmd/devdocs
 ## 4.  Generate your parser
 *   Make sure, you have the property `<antlr4.visitor>true</antlr4.visitor>` in your `pom.xml` file.
 *   This is just a matter of building the language module. ANTLR is called via ant, and this step is added
-    to the phase `generate-sources`. So you can just call e.g. `./mvnw generate-source -pl pmd-swift` to
+    to the phase `generate-sources`. So you can just call e.g. `./mvnw generate-sources -pl pmd-swift` to
    have the parser generated.
 *   The generated code will be placed under `target/generated-sources/antlr4` and will not be committed to
    source control.
--- a/docs/pages/pmd/devdocs/major_contributions/adding_a_new_javacc_based_language.md
+++ b/docs/pages/pmd/devdocs/major_contributions/adding_a_new_javacc_based_language.md
@@ -1,16 +1,40 @@
 ---
-title: Adding PMD support for a new JAVACC grammar based language
-short_title: Adding a new language with JAVACC
+title: Adding PMD support for a new JavaCC grammar based language
+short_title: Adding a new language with JavaCC
 tags: [devdocs, extending]
-summary: "How to add a new language to PMD using JAVACC grammar."
-last_updated: October 5, 2019
+summary: "How to add a new language to PMD using JavaCC grammar."
+last_updated: October 2021
 sidebar: pmd_sidebar
 permalink: pmd_devdocs_major_adding_new_language_javacc.html
 folder: pmd/devdocs
 ---

+{% include callout.html type="warning" content="

-## 1.  Start with a new sub-module.
+**Before you start...**<br><br>
+
+This is a big contribution and can't be done as a drive-by contribution. It requires dedication
+and a long-term commitment to implement support for a new language.<br><br>
+
+This step-by-step guide is just a short introduction to get the basics started. It is not necessarily up-to-date
+or complete, so you have to be able to fill in the blanks.<br><br>
+
+Once basic support for a language is in place, many features are still missing. Typical features
+that can greatly improve rule writing are: symbol table, type resolution, call/data flow analysis.<br><br>
+
+A symbol table keeps track of variables and their usages. Type resolution tries to find the actual class type
+of each used type, following along method calls (including overloaded and overridden methods), which allows
+querying subtypes and the type hierarchy. This requires additional configuration of an auxiliary classpath.
+Call and data flow analysis keeps track of data as it moves through the different execution paths
+of a program.<br><br>
+
+These features are out of scope of this guide. Type resolution and data flow are features that
+definitely don't come for free. Implementing them takes considerable effort and perseverance.<br><br>
+
+" %}
+
+
+## 1.  Start with a new sub-module
 *    See pmd-java or pmd-vm for examples.

 ## 2.  Implement an AST parser for your language