First of all, thanks for the contribution!
Happily for you, adding CPD support for a new language is now easier than ever!
All you need to do is follow these few steps:
- Create a new module for your language. You can take the Golang module as an example.
- Create a Tokenizer
  - For Antlr grammars, you can take the grammar from here and extend AntlrTokenizer, taking Go as an example:

    public class GoTokenizer extends AntlrTokenizer {

        @Override
        protected AntlrTokenManager getLexerForSource(SourceCode sourceCode) {
            CharStream charStream = AntlrTokenizer.getCharStreamFromSourceCode(sourceCode);
            return new AntlrTokenManager(new GolangLexer(charStream), sourceCode.getFileName());
        }
    }

  - For JavaCC grammars, subclass JavaCCTokenizer, which has many implementations you can follow; take the Python implementation as a reference.
  - For any other scenario, you can use AnyTokenizer.

  If you’re using Antlr or JavaCC, update the pom.xml of your submodule to use the appropriate ant wrapper. See pmd-go/pom.xml and pmd-python/pom.xml for examples.
- Create your Language class

    public class GoLanguage extends AbstractLanguage {

        public GoLanguage() {
            super("Go", "go", new GoTokenizer(), ".go");
        }
    }

  Pro Tip: Yes, keep looking at Go!

You are almost there!
- Update the list of supported languages
  - Write the fully-qualified name of your Language class to the file src/main/resources/META-INF/services/net.sourceforge.pmd.cpd.Language
  - Update the test that asserts the list of supported languages by updating the SUPPORTED_LANGUAGES constant in BinaryDistributionIT
- Please don’t forget to add some tests. Once again, you can look at the Go implementation ;)
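The services file from the "Update the list of supported languages" step follows the naming convention of Java's standard `java.util.ServiceLoader` provider-configuration files: the file is named after an interface and lists implementation class names, one per line. Assuming CPD discovers languages that way, here is a minimal, self-contained sketch of the mechanism. `DemoLanguage` and `DemoGoLanguage` are hypothetical stand-ins, not PMD types, and the file is written to a temporary directory only so the example can run on its own.

```java
import java.io.IOException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.ServiceLoader;

public class ServiceFileSketch {

    /** Hypothetical stand-in for a language interface; NOT PMD's actual type. */
    public interface DemoLanguage {
        String getName();
    }

    /** Hypothetical provider: must be public and have a public no-arg constructor. */
    public static class DemoGoLanguage implements DemoLanguage {
        public DemoGoLanguage() { }

        @Override
        public String getName() {
            return "Go";
        }
    }

    /** Writes a provider-configuration file to a temp directory and loads providers from it. */
    public static List<String> discoverLanguageNames() throws IOException {
        Path root = Files.createTempDirectory("services-sketch");
        Path services = Files.createDirectories(root.resolve("META-INF").resolve("services"));

        // File name = name of the service interface; content = name of the provider class.
        Files.write(services.resolve(DemoLanguage.class.getName()),
                DemoGoLanguage.class.getName().getBytes(StandardCharsets.UTF_8));

        List<String> names = new ArrayList<>();
        try (URLClassLoader loader = new URLClassLoader(
                new URL[] {root.toUri().toURL()}, ServiceFileSketch.class.getClassLoader())) {
            // ServiceLoader reads the file lazily, so iterate while the loader is still open.
            for (DemoLanguage language : ServiceLoader.load(DemoLanguage.class, loader)) {
                names.add(language.getName());
            }
        }
        return names;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(discoverLanguageNames()); // prints [Go]
    }
}
```

In a real module you would not write the file at runtime: it simply sits under src/main/resources and ends up on the classpath. If your language does not show up, a typo in the class name inside that file is a likely first suspect.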
If you read this far, I’m inclined to think you would also love to support some extra CPD configuration (ignoring imports or crazy things like that).
If that’s your case, you came to the right place!
- You can add your custom properties using a Token filter
  - For Antlr grammars, all you need to do is implement your own AntlrTokenFilter.
    And by now, I know where you are going to look…
    WRONG
    Why do you want Go to solve all your problems?
    You should take a look at the Kotlin token filter implementation.
  - For non-Antlr grammars, you can use BaseTokenFilter directly or take a peek at Java’s token filter.
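To give a feel for what an "ignore imports" token filter does, here is a deliberately simplified sketch. It does not use AntlrTokenFilter or BaseTokenFilter: tokens are modeled as plain strings, and the class and method names are hypothetical. Real filters work on the token objects produced by your lexer, but the idea is the same: keep a little state while walking the token stream and decide per token whether CPD should see it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ImportSkippingFilterSketch {

    /** Returns the tokens with every "import ... ;" run removed (toy model, not PMD's API). */
    public static List<String> dropImports(List<String> tokens) {
        List<String> kept = new ArrayList<>();
        boolean insideImport = false;
        for (String token : tokens) {
            if (!insideImport && "import".equals(token)) {
                insideImport = true;       // start discarding at the keyword
            } else if (insideImport) {
                if (";".equals(token)) {
                    insideImport = false;  // statement ends; the ';' itself is discarded too
                }
            } else {
                kept.add(token);           // everything outside an import is passed through
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList(
                "import", "java", ".", "util", ".", "List", ";",
                "class", "A", "{", "}");
        System.out.println(dropImports(tokens)); // prints [class, A, {, }]
    }
}
```

In an actual implementation the decision would typically be guarded by a user-facing property, so that imports are only skipped when the user asks for it.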
Testing your implementation

Add a Maven dependency on pmd-lang-test (scope test) in your pom.xml. This contains utilities to test your Tokenizer.

For simple tests, create a test class extending CpdTextComparisonTest. That class is written in Kotlin, but you can extend it in Java as well.

To add tests, write regular JUnit @Test-annotated methods that call the method doTest with the name of the test file.

For example, for the Dart language:
public class DartTokenizerTest extends CpdTextComparisonTest {

    /**********************************
      Implementation of the superclass
    ***********************************/

    public DartTokenizerTest() {
        super(".dart"); // the file extension for the dart language
    }

    @Override
    protected String getResourcePrefix() {
        // If your class is in                  src/test/java/some/package
        // you need to place the test files in  src/test/resources/some/package/cpdData
        return "cpdData";
    }

    @Override
    public Tokenizer newTokenizer() {
        // Override this abstract method to return the correct tokenizer
        return new DartTokenizer();
    }

    /**************
      Test methods
    ***************/

    @Test // don't forget the JUnit annotation
    public void testLiterals() {
        // This will look for a file named literals.dart
        // in the directory identified by getResourcePrefix,
        // tokenize it, then compare the result against a baseline
        // literals.txt file in the same directory

        // If the baseline file does not exist, it is created automatically
        doTest("literals");
    }
}
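The comments in testLiterals describe a baseline comparison: produce some text from the tokens, compare it with a stored file, and create that file when it is missing. The sketch below shows that general idea with plain java.nio and no PMD classes. The class name, the method name and the token dump format are all hypothetical, and returning false on the run that creates the baseline is a design choice of this sketch, not a statement about how doTest behaves.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class BaselineCheckSketch {

    /**
     * Compares the actual output with the baseline file. If the baseline is missing,
     * it is written and false is returned, so a first run is not silently a pass.
     */
    public static boolean matchesBaseline(Path baseline, String actual) throws IOException {
        if (!Files.exists(baseline)) {
            Files.write(baseline, actual.getBytes(StandardCharsets.UTF_8));
            return false;
        }
        String expected = new String(Files.readAllBytes(baseline), StandardCharsets.UTF_8);
        return expected.equals(actual);
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("cpdData");
        Path baseline = dir.resolve("literals.txt");
        String dump = "[var] 1 1\n[x] 1 5\n"; // made-up token dump format

        System.out.println(matchesBaseline(baseline, dump));              // false: baseline just created
        System.out.println(matchesBaseline(baseline, dump));              // true: output matches baseline
        System.out.println(matchesBaseline(baseline, dump + "[y] 2 1\n")); // false: output changed
    }
}
```

Whatever the exact behavior of the real utility, the practical consequence is the same: review a freshly generated baseline by hand before committing it, because from then on it defines what "correct" means for your tokenizer.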