This page covers the specifics of writing a rule in Java. The basic development process is very similar to the process for XPath rules, which is described in Your First Rule.
Basically, you open the designer, look at the structure of the AST, and refine your rule as you add test cases.
On this page we’ll talk about rules for the Java language, but the process is very similar for other languages.
Basics
To write a rule in Java you’ll have to:
- Write a Java class that implements the interface Rule. Each language implementation provides a base rule class to ease your pain, e.g. AbstractJavaRule.
- Compile this class, linking it to PMD APIs (e.g. using PMD as a Maven dependency)
- Bundle this into a JAR and add it to the execution classpath of PMD
- Declare the rule in your ruleset XML
Rule execution
Most base rule classes use a Visitor pattern to explore the AST.
Tree traversal
When a rule is applied to a file, it’s handed the root of the AST and told
to traverse the whole tree to look for violations. Each rule defines a specific
visit method for each type of node of the language, which
by default just visits the children.
So the following rule would traverse the whole tree and do nothing:
public class MyRule extends AbstractJavaRule {
    // all methods are default implementations!
}
Generally, a rule wants to check for only some node types. In our XPath example
in Your First Rule, we wanted to check for some VariableDeclaratorId nodes.
That’s the XPath name, but in Java, you’ll get access to the full API of
ASTVariableDeclaratorId.
If you want to check for some specific node types, you can override the
corresponding visit method:
public class MyRule extends AbstractJavaRule {
    @Override
    public Object visit(ASTVariableDeclaratorId node, Object data) {
        // This method is called on each node of type ASTVariableDeclaratorId
        // in the AST
        if (node.getType() == short.class) {
            // reports a violation at the position of the node
            // the "data" parameter is a context object handed to your rule
            // the message for the violation is the message defined in the rule declaration XML element
            asCtx(data).addViolation(node);
        }
        // this calls back to the default implementation, which recurses further down the subtree
        return super.visit(node, data);
    }
}
The super.visit(node, data) call is super common in rule implementations,
because it makes the traversal continue by visiting all the descendants of the
current node.
Stopping the traversal
Sometimes you have checked all you needed and you’re sure that the descendants
of a node cannot contain violations. In that case, you can avoid calling the
super implementation and the traversal will not continue further down. This
means that your callbacks (visit implementations) won’t be called on the rest
of the subtree. The siblings of the current node may still be visited
recursively.
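To make the effect of skipping the super call concrete, here is a minimal sketch that does not use PMD at all. ToyNode, ToyRule and PruningToyRule are hypothetical stand-ins for an AST node, a base rule class whose default visit recurses into the children, and a rule that stops the traversal at "Method" nodes.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified stand-in for an AST node; this is NOT a PMD class.
class ToyNode {
    final String kind;
    final List<ToyNode> children = new ArrayList<>();

    ToyNode(String kind, ToyNode... kids) {
        this.kind = kind;
        for (ToyNode kid : kids) {
            children.add(kid);
        }
    }
}

// Plays the role of a base rule class: the default visit just visits the children.
class ToyRule {
    final List<String> visited = new ArrayList<>();

    void visit(ToyNode node) {
        visited.add(node.kind);
        for (ToyNode child : node.children) {
            visit(child);
        }
    }
}

// A rule that does not call super.visit for "Method" nodes, so their subtrees are skipped.
class PruningToyRule extends ToyRule {
    @Override
    void visit(ToyNode node) {
        if (node.kind.equals("Method")) {
            visited.add(node.kind);
            return; // no super call: descendants of this node are not visited
        }
        super.visit(node);
    }
}

class TraversalDemo {
    static ToyNode sampleTree() {
        return new ToyNode("Class",
                new ToyNode("Method", new ToyNode("Block")),
                new ToyNode("Field"));
    }

    static List<String> run(ToyRule rule) {
        rule.visit(sampleTree());
        return rule.visited;
    }

    public static void main(String[] args) {
        System.out.println(run(new ToyRule()));        // [Class, Method, Block, Field]
        System.out.println(run(new PruningToyRule())); // [Class, Method, Field]
    }
}
```

Note that the "Field" sibling is still visited in the second run: only the subtree below the node where the super call was skipped is cut off.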
Economic traversal: the rulechain
If you don’t care about the order in which the nodes are traversed (e.g. your rule doesn’t maintain any state between visits), then you can significantly speed up your rule by using the rulechain.
That mechanism doesn’t recurse on the whole tree. Instead, your rule will only be passed the nodes it is interested in. To use the rulechain correctly:
- Your rule must override the method buildTargetSelector. This method should return a target selector that selects all the node types you are interested in. E.g. the factory method forTypes can be used to create such a selector.
- For the Java language, there is another base class to make it easier: AbstractJavaRulechainRule. You’ll need to call the super constructor and provide the node types you are interested in.
- Your visit methods must not recurse! In effect, you should never call super.visit in the methods.
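The idea behind the rulechain can be sketched without PMD as follows. This is a simplified model under stated assumptions, not PMD's actual implementation: ChainNode, ChainRule, ToyRuleChain and CountingRule are hypothetical names. The tree is walked once to group nodes by kind, and each rule then receives only the kinds it selected, so its visit method never needs to recurse.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Simplified stand-in for an AST node; this is NOT a PMD class.
class ChainNode {
    final String kind;
    final List<ChainNode> children = new ArrayList<>();

    ChainNode(String kind, ChainNode... kids) {
        this.kind = kind;
        children.addAll(List.of(kids));
    }
}

// Hypothetical rule contract: declare the node kinds of interest, then
// receive exactly those nodes. visit must not recurse into children.
interface ChainRule {
    Set<String> targetKinds();
    void visit(ChainNode node);
}

class ToyRuleChain {
    // Walk the tree once and group the nodes by kind.
    static Map<String, List<ChainNode>> index(ChainNode root) {
        Map<String, List<ChainNode>> byKind = new HashMap<>();
        collect(root, byKind);
        return byKind;
    }

    private static void collect(ChainNode node, Map<String, List<ChainNode>> byKind) {
        byKind.computeIfAbsent(node.kind, k -> new ArrayList<>()).add(node);
        for (ChainNode child : node.children) {
            collect(child, byKind);
        }
    }

    // Hand each rule only the nodes it selected.
    static void apply(ChainNode root, List<ChainRule> rules) {
        Map<String, List<ChainNode>> byKind = index(root);
        for (ChainRule rule : rules) {
            for (String kind : rule.targetKinds()) {
                for (ChainNode node : byKind.getOrDefault(kind, List.of())) {
                    rule.visit(node);
                }
            }
        }
    }
}

// Example rule: counts "Variable" nodes without ever touching other nodes.
class CountingRule implements ChainRule {
    int seen = 0;

    public Set<String> targetKinds() {
        return Set.of("Variable");
    }

    public void visit(ChainNode node) {
        seen++;
    }
}

class RuleChainDemo {
    public static void main(String[] args) {
        ChainNode root = new ChainNode("Class",
                new ChainNode("Method", new ChainNode("Variable")),
                new ChainNode("Variable"));
        CountingRule rule = new CountingRule();
        ToyRuleChain.apply(root, List.of(rule));
        System.out.println(rule.seen); // 2
    }
}
```

Because the single indexing pass is shared by all rules in this model, each additional rule only pays for the nodes it asked for. It also shows why such a rule cannot rely on visiting order.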
Manual AST navigation
In Java rule implementations, you often need to navigate the AST to find the interesting nodes.
In your visit implementation, you can start navigating the AST from the given node.
The Node interface provides a couple of useful methods that return a NodeStream
and can be used to query the AST, for example children, descendants and ancestors.
The returned NodeStream API provides easy-to-use methods that follow the Java Stream API (java.util.stream).
Example:
NodeStream.of(someNode)                           // the stream here is empty if the node is null
          .filterIs(ASTVariableDeclaratorId.class) // the stream here is empty if the node was not a variable declarator id
          .followingSiblings()                     // the stream here contains only the siblings, not the original node
          .filterIs(ASTVariableInitializer.class)
          .children(ASTExpression.class)
          .children(ASTPrimaryExpression.class)
          .children(ASTPrimaryPrefix.class)
          .children(ASTLiteral.class)
          .filterMatching(Node::getImage, "0")
          .filterNot(ASTLiteral::isStringLiteral)
          .nonEmpty();                             // If the stream is non empty here, then all the pipeline matched
The Node interface also provides an alternative way to navigate the AST for convenience:
- getParent
- getNumChildren
- getChild
- getFirstChild
- getLastChild
- getPreviousSibling
- getNextSibling
- firstChild
Depending on the AST of the language, there might also be more specific methods that can be used to
navigate. E.g. in Java there exists the method ASTIfStatement#getCondition to get the condition of an If-statement.
Reporting violations
In your visit method, you have access to the RuleContext, which is the entry point into
reporting back during the analysis.
- addViolation reports a rule violation at the position of the given node with the message defined in the rule declaration XML element.
- The message defined in the rule declaration XML element might contain placeholders, such as {0}. In that case, you need to call addViolation and provide the values for the placeholders. The message is actually processed as a java.text.MessageFormat.
- Sometimes a rule might want to differentiate between different cases of a violation and use different messages. This is possible by calling one of the overloads of addViolationWithMessage. Using these methods, the message defined in the rule declaration XML element is not used.
- Rules can be customized using properties and sometimes you want to include the actual value of a property in the message, e.g. if the rule enforces a specific limit. The syntax for such placeholders is: ${propertyName}.
- Some languages support additional placeholder variables. E.g. for Java, you can use ${methodName} to insert the name of the method in which the violation occurred. See Java-specific features and guidance.
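Since numbered placeholders follow java.text.MessageFormat syntax, you can try out a message template with the JDK alone. The sketch below only uses the standard library. MessageDemo and render are hypothetical names, and the templates are made-up examples rather than messages of real PMD rules.

```java
import java.text.MessageFormat;

class MessageDemo {
    // Fills the numbered placeholders of a message template.
    static String render(String template, Object... args) {
        return MessageFormat.format(template, args);
    }

    public static void main(String[] args) {
        // {0} and {1} are replaced by the extra arguments, in order.
        System.out.println(render("Avoid the {0} type for variable {1}", "short", "count"));
        // A literal apostrophe must be doubled, otherwise it starts a quoted section.
        System.out.println(render("Don''t use {0}", "short"));
    }
}
```

The second call is a reminder of a common MessageFormat pitfall: a single apostrophe in a template is treated as a quoting character, so a literal apostrophe has to be written as two.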
Execution across files, thread-safety and statefulness
When starting execution, PMD will instantiate a new instance of your rule. If PMD is executed in multiple threads, then each thread uses its own instance of the rule. This means that the rule implementation does not need to care about threading issues, as PMD makes sure that a single instance is not used concurrently by multiple threads.
However, for performance reasons, the rule instances are reused for multiple files.
This means that the constructor of the rule is only executed once (per thread)
and the rule instance is reused. If you rely on proper initialization of instance
fields, you can do the initialization in the start method of the rule
(you need to override this method).
The start method is called exactly once per file.
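The following sketch models why per-file state belongs in start rather than in the constructor. It does not use PMD: CountingToyRule and ReuseDemo are hypothetical, and the parameterless start and the end(String) signature are simplifications chosen for this example, not PMD's method signatures.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a rule with per-file state; NOT a PMD class.
class CountingToyRule {
    private int declarations; // per-file state
    final List<String> report = new ArrayList<>();

    // Called once per file before analysis: reset per-file state here.
    void start() {
        declarations = 0;
    }

    // Stands in for one visit callback.
    void visitDeclaration() {
        declarations++;
    }

    // Called when the file is done.
    void end(String fileName) {
        report.add(fileName + ": " + declarations);
    }
}

class ReuseDemo {
    // Processes several files with one rule instance, as a single thread would.
    static List<String> analyse(CountingToyRule rule, int... declarationsPerFile) {
        for (int i = 0; i < declarationsPerFile.length; i++) {
            rule.start();
            for (int d = 0; d < declarationsPerFile[i]; d++) {
                rule.visitDeclaration();
            }
            rule.end("file" + i);
        }
        return rule.report;
    }

    public static void main(String[] args) {
        System.out.println(analyse(new CountingToyRule(), 2, 3)); // [file0: 2, file1: 3]
    }
}
```

If the reset in start were removed, the counter would carry over from the first file and the second entry would read "file1: 5", which is the kind of bug that instance reuse can cause.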
Using metrics
Some languages might support metrics.
Using symbol table
Some languages might support a symbol table.
Using type resolution
Some languages might support type resolution.
Rule lifecycle reference
Construction
Exactly once (per thread):
- The rule’s no-arg constructor is called when loading the ruleset. The rule’s constructor must already define any property descriptors the rule wants to use.
- If the rule was included in the ruleset as a rule reference, some properties may be overridden. If an overridden property is unknown, an error is reported.
- Misconfigured rules are removed from the ruleset.
Execution
For each thread, a deep copy of the rule is created. Each thread is given a different set of files to analyse. Then, for each such file and for each rule copy:
- start is called once, before parsing
- apply is called with the root of the AST. That method performs the AST traversal that ultimately calls visit methods. It’s not called for RuleChain rules.
- end is called when the rule is done processing the file
Example projects
See https://github.com/pmd/pmd-examples for a couple of example projects that create custom PMD rules for different languages.