@ -117,15 +117,15 @@ definitely don't come for free. It is much effort and requires perseverance to i
|
|||||||
|
|
||||||
## 5. Create a TokenManager
|
## 5. Create a TokenManager
|
||||||
* This is needed to support CPD (copy paste detection)
|
* This is needed to support CPD (copy paste detection)
|
||||||
* We provide a default implementation using [`AntlrTokenManager`](https://github.com/pmd/pmd/blob/master/pmd-core/src/main/java/net/sourceforge/pmd/cpd/impl/AntlrTokenizer.java).
|
* We provide a default implementation using [`AntlrTokenManager`](https://github.com/pmd/pmd/blob/master/pmd-core/src/main/java/net/sourceforge/pmd/lang/ast/impl/antlr4/AntlrTokenManager.java).
|
||||||
* You must create your own "AntlrTokenizer" such as we do with
|
* You must create your own "AntlrCpdLexer" such as we do with
|
||||||
[`SwiftTokenizer`](https://github.com/pmd/pmd/blob/master/pmd-swift/src/main/java/net/sourceforge/pmd/lang/swift/cpd/SwiftTokenizer.java).
|
[`SwiftCpdLexer`](https://github.com/pmd/pmd/blob/master/pmd-swift/src/main/java/net/sourceforge/pmd/lang/swift/cpd/SwiftCpdLexer.java).
|
||||||
* If you wish to filter specific tokens (e.g. comments to support CPD suppression via "CPD-OFF" and "CPD-ON")
|
* If you wish to filter specific tokens (e.g. comments to support CPD suppression via "CPD-OFF" and "CPD-ON")
|
||||||
you can create your own implementation of
|
you can create your own implementation of
|
||||||
[`AntlrTokenFilter`](https://github.com/pmd/pmd/blob/master/pmd-core/src/main/java/net/sourceforge/pmd/cpd/impl/AntlrTokenFilter.java).
|
[`AntlrTokenFilter`](https://github.com/pmd/pmd/blob/master/pmd-core/src/main/java/net/sourceforge/pmd/cpd/impl/AntlrTokenFilter.java).
|
||||||
You'll then need to override the protected method `getTokenFilter(AntlrTokenManager)`
|
You'll then need to override the protected method `getTokenFilter(AntlrTokenManager)`
|
||||||
and return your custom filter. See the tokenizer for C# as an example:
|
and return your custom filter. See the CpdLexer for C# as an example:
|
||||||
[`CsTokenizer`](https://github.com/pmd/pmd/blob/master/pmd-cs/src/main/java/net/sourceforge/pmd/lang/cs/cpd/CsTokenizer.java).
|
[`CsCpdLexer`](https://github.com/pmd/pmd/blob/master/pmd-cs/src/main/java/net/sourceforge/pmd/lang/cs/cpd/CsCpdLexer.java).
|
||||||
|
|
||||||
If you don't need a custom token filter, you don't need to override the method. It returns the default
|
If you don't need a custom token filter, you don't need to override the method. It returns the default
|
||||||
`AntlrTokenFilter` which doesn't filter anything.
|
`AntlrTokenFilter` which doesn't filter anything.
|
||||||
|
@ -11,7 +11,7 @@ author: Matías Fraga, Clément Fournier
|
|||||||
## Adding support for a CPD language
|
## Adding support for a CPD language
|
||||||
|
|
||||||
CPD works generically on the tokens produced by a {% jdoc core::cpd.Tokenizer %}.
|
CPD works generically on the tokens produced by a {% jdoc core::cpd.Tokenizer %}.
|
||||||
To add support for a new language, the crucial piece is writing a tokenizer that
|
To add support for a new language, the crucial piece is writing a CpdLexer that
|
||||||
splits the source file into the tokens specific to your language. Thankfully you
|
splits the source file into the tokens specific to your language. Thankfully you
|
||||||
can use a stock [Antlr grammar](https://github.com/antlr/grammars-v4) or JavaCC
|
can use a stock [Antlr grammar](https://github.com/antlr/grammars-v4) or JavaCC
|
||||||
grammar to generate a lexer for you. If you cannot use a lexer generator, for
|
grammar to generate a lexer for you. If you cannot use a lexer generator, for
|
||||||
@ -31,12 +31,12 @@ Use the following guide to set up a new language module that supports CPD.
|
|||||||
the lexer from the grammar. To do so, edit `pom.xml` (eg like [the Golang module](https://github.com/pmd/pmd/tree/master/pmd-go/pom.xml)).
|
the lexer from the grammar. To do so, edit `pom.xml` (eg like [the Golang module](https://github.com/pmd/pmd/tree/master/pmd-go/pom.xml)).
|
||||||
Once that is done, `mvn generate-sources` should generate the lexer sources for you.
|
Once that is done, `mvn generate-sources` should generate the lexer sources for you.
|
||||||
|
|
||||||
You can now implement a tokenizer, for instance by extending {% jdoc core::cpd.impl.AntlrTokenizer %}. The following reproduces the Go implementation:
|
You can now implement a CpdLexer, for instance by extending {% jdoc core::cpd.impl.AntlrCpdLexer %}. The following reproduces the Go implementation:
|
||||||
```java
|
```java
|
||||||
// mind the package convention if you are going to make a PR
|
// mind the package convention if you are going to make a PR
|
||||||
package net.sourceforge.pmd.lang.go.cpd;
|
package net.sourceforge.pmd.lang.go.cpd;
|
||||||
|
|
||||||
public class GoTokenizer extends AntlrTokenizer {
|
public class GoCpdLexer extends AntlrCpdLexer {
|
||||||
|
|
||||||
@Override
|
@Override
|
||||||
protected Lexer getLexerForSource(CharStream charStream) {
|
protected Lexer getLexerForSource(CharStream charStream) {
|
||||||
@ -64,9 +64,9 @@ If your language only supports CPD, then you can subclass {% jdoc core::lang.imp
|
|||||||
}
|
}
|
||||||
|
|
||||||
@Override
|
@Override
|
||||||
public Tokenizer createCpdTokenizer(LanguagePropertyBundle bundle) {
|
public CpdLexer createCpdLexer(LanguagePropertyBundle bundle) {
|
||||||
// This method should return an instance of the tokenizer you created.
|
// This method should return an instance of the CpdLexer you created.
|
||||||
return new GoTokenizer();
|
return new GoCpdLexer();
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
```
|
```
|
||||||
@ -77,7 +77,7 @@ If your language only supports CPD, then you can subclass {% jdoc core::lang.imp
|
|||||||
|
|
||||||
4. Update the test that asserts the list of supported languages by updating the `SUPPORTED_LANGUAGES` constant in [BinaryDistributionIT](https://github.com/pmd/pmd/blob/master/pmd-dist/src/test/java/net/sourceforge/pmd/it/BinaryDistributionIT.java).
|
4. Update the test that asserts the list of supported languages by updating the `SUPPORTED_LANGUAGES` constant in [BinaryDistributionIT](https://github.com/pmd/pmd/blob/master/pmd-dist/src/test/java/net/sourceforge/pmd/it/BinaryDistributionIT.java).
|
||||||
|
|
||||||
5. Add some tests for your tokenizer by following the [section below](#testing-your-implementation).
|
5. Add some tests for your CpdLexer by following the [section below](#testing-your-implementation).
|
||||||
|
|
||||||
6. Finish up your new language module by adding a page in the documentation. Create a new markdown file
|
6. Finish up your new language module by adding a page in the documentation. Create a new markdown file
|
||||||
`<langId>.md` in `docs/pages/pmd/languages/`. This file should have the following frontmatter:
|
`<langId>.md` in `docs/pages/pmd/languages/`. This file should have the following frontmatter:
|
||||||
@ -100,10 +100,10 @@ If your language only supports CPD, then you can subclass {% jdoc core::lang.imp
|
|||||||
{% endraw %}
|
{% endraw %}
|
||||||
```
|
```
|
||||||
|
|
||||||
### Declaring tokenizer options
|
### Declaring CpdLexer options
|
||||||
|
|
||||||
To make the tokenizer configurable, first define some property descriptors using
|
To make the CpdLexer configurable, first define some property descriptors using
|
||||||
{% jdoc core::properties.PropertyFactory %}. Look at {% jdoc core::cpd.Tokenizer %}
|
{% jdoc core::properties.PropertyFactory %}. Look at {% jdoc core::cpd.CpdLexer %}
|
||||||
for some predefined ones which you can reuse (prefer reusing property descriptors if you can).
|
for some predefined ones which you can reuse (prefer reusing property descriptors if you can).
|
||||||
You need to override {% jdoc core::Language#newPropertyBundle() %}
|
You need to override {% jdoc core::Language#newPropertyBundle() %}
|
||||||
and call `definePropertyDescriptor` to register the descriptors.
|
and call `definePropertyDescriptor` to register the descriptors.
|
||||||
@ -112,13 +112,13 @@ of {% jdoc core::cpd.CpdCapableLanguage#createCpdTokenizer(core::lang.LanguagePr
|
|||||||
|
|
||||||
To implement simple token filtering, you can use {% jdoc core::cpd.impl.BaseTokenFilter %}
|
To implement simple token filtering, you can use {% jdoc core::cpd.impl.BaseTokenFilter %}
|
||||||
as a base class, or another base class in {% jdoc_package core::cpd.impl %}.
|
as a base class, or another base class in {% jdoc_package core::cpd.impl %}.
|
||||||
Take a look at the [Kotlin token filter implementation](https://github.com/pmd/pmd/blob/master/pmd-kotlin/src/main/java/net/sourceforge/pmd/lang/kotlin/cpd/KotlinTokenizer.java), or the [Java one](https://github.com/pmd/pmd/blob/master/pmd-java/src/main/java/net/sourceforge/pmd/lang/java/cpd/JavaTokenizer.java).
|
Take a look at the [Kotlin token filter implementation](https://github.com/pmd/pmd/blob/master/pmd-kotlin/src/main/java/net/sourceforge/pmd/lang/kotlin/cpd/KotlinCpdLexer.java), or the [Java one](https://github.com/pmd/pmd/blob/master/pmd-java/src/main/java/net/sourceforge/pmd/lang/java/cpd/JavaCpdLexer.java).
|
||||||
|
|
||||||
|
|
||||||
### Testing your implementation
|
### Testing your implementation
|
||||||
|
|
||||||
Add a Maven dependency on `pmd-lang-test` (scope `test`) in your `pom.xml`.
|
Add a Maven dependency on `pmd-lang-test` (scope `test`) in your `pom.xml`.
|
||||||
This contains utilities to test your tokenizer.
|
This contains utilities to test your CpdLexer.
|
||||||
|
|
||||||
Create a test class extending from {% jdoc lang-test::cpd.test.CpdTextComparisonTest %}.
|
Create a test class extending from {% jdoc lang-test::cpd.test.CpdTextComparisonTest %}.
|
||||||
To add tests, you need to write regular JUnit `@Test`-annotated methods, and
|
To add tests, you need to write regular JUnit `@Test`-annotated methods, and
|
||||||
|
@ -24,7 +24,7 @@ PropertyName is the name of the property converted to SCREAMING_SNAKE_CASE, that
|
|||||||
|
|
||||||
As a convention, properties whose name start with an *x* are internal and may be removed or changed without notice.
|
As a convention, properties whose name start with an *x* are internal and may be removed or changed without notice.
|
||||||
|
|
||||||
Properties whose name start with **CPD** are used to configure CPD tokenizer options.
|
Properties whose name start with **CPD** are used to configure CPD CpdLexer options.
|
||||||
|
|
||||||
Programmatically, the language properties can be set on `PMDConfiguration` (or `CPDConfiguration`) before using the
|
Programmatically, the language properties can be set on `PMDConfiguration` (or `CPDConfiguration`) before using the
|
||||||
{%jdoc core::PmdAnalyzer %} (or {%jdoc core::cpd.CpdAnalyzer %}) instance
|
{%jdoc core::PmdAnalyzer %} (or {%jdoc core::cpd.CpdAnalyzer %}) instance
|
||||||
|
@ -159,10 +159,14 @@ The following previously deprecated classes have been removed:
|
|||||||
If the current version is needed, then `Node.getTextDocument().getLanguageVersion()` can be used. This
|
If the current version is needed, then `Node.getTextDocument().getLanguageVersion()` can be used. This
|
||||||
is the version that has been selected via CLI `--use-version` parameter.
|
is the version that has been selected via CLI `--use-version` parameter.
|
||||||
|
|
||||||
**Renamed classes**
|
**Renamed classes and methods**
|
||||||
|
|
||||||
* pmd-core
|
* pmd-core
|
||||||
* {%jdoc_old core::lang.ast.TokenMgrError %} has been renamed to {% jdoc core::lang.ast.LexException %}
|
* {%jdoc_old core::lang.ast.TokenMgrError %} has been renamed to {% jdoc core::lang.ast.LexException %}
|
||||||
|
* {%jdoc_old core::cpd.Tokenizer %} has been renamed to {% jdoc core::cpd.CpdLexer %}. Along with this rename,
|
||||||
|
all the implementations have been renamed as well (`Tokenizer` -> `CpdLexer`), e.g. "CppCpdLexer", "JavaCpdLexer".
|
||||||
|
This affects all language modules.
|
||||||
|
* {%jdoc_old core::cpd.AnyTokenizer %} has been renamed to {% jdoc core::cpd.AnyCpdLexer %}.
|
||||||
|
|
||||||
#### External Contributions
|
#### External Contributions
|
||||||
* [#4640](https://github.com/pmd/pmd/pull/4640): \[cli] Launch script fails if run via "bash pmd" - [Shai Bennathan](https://github.com/shai-bennathan) (@shai-bennathan)
|
* [#4640](https://github.com/pmd/pmd/pull/4640): \[cli] Launch script fails if run via "bash pmd" - [Shai Bennathan](https://github.com/shai-bennathan) (@shai-bennathan)
|
||||||
|
@ -5,12 +5,12 @@
|
|||||||
package net.sourceforge.pmd.lang.apex;
|
package net.sourceforge.pmd.lang.apex;
|
||||||
|
|
||||||
import net.sourceforge.pmd.cpd.CpdCapableLanguage;
|
import net.sourceforge.pmd.cpd.CpdCapableLanguage;
|
||||||
import net.sourceforge.pmd.cpd.Tokenizer;
|
import net.sourceforge.pmd.cpd.CpdLexer;
|
||||||
import net.sourceforge.pmd.lang.LanguageModuleBase;
|
import net.sourceforge.pmd.lang.LanguageModuleBase;
|
||||||
import net.sourceforge.pmd.lang.LanguageProcessor;
|
import net.sourceforge.pmd.lang.LanguageProcessor;
|
||||||
import net.sourceforge.pmd.lang.LanguagePropertyBundle;
|
import net.sourceforge.pmd.lang.LanguagePropertyBundle;
|
||||||
import net.sourceforge.pmd.lang.PmdCapableLanguage;
|
import net.sourceforge.pmd.lang.PmdCapableLanguage;
|
||||||
import net.sourceforge.pmd.lang.apex.cpd.ApexTokenizer;
|
import net.sourceforge.pmd.lang.apex.cpd.ApexCpdLexer;
|
||||||
|
|
||||||
public class ApexLanguageModule extends LanguageModuleBase implements PmdCapableLanguage, CpdCapableLanguage {
|
public class ApexLanguageModule extends LanguageModuleBase implements PmdCapableLanguage, CpdCapableLanguage {
|
||||||
private static final String ID = "apex";
|
private static final String ID = "apex";
|
||||||
@ -47,7 +47,7 @@ public class ApexLanguageModule extends LanguageModuleBase implements PmdCapable
|
|||||||
}
|
}
|
||||||
|
|
||||||
@Override
|
@Override
|
||||||
public Tokenizer createCpdTokenizer(LanguagePropertyBundle bundle) {
|
public CpdLexer createCpdLexer(LanguagePropertyBundle bundle) {
|
||||||
return new ApexTokenizer();
|
return new ApexCpdLexer();
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
@ -12,16 +12,16 @@ import org.antlr.runtime.ANTLRStringStream;
|
|||||||
import org.antlr.runtime.Lexer;
|
import org.antlr.runtime.Lexer;
|
||||||
import org.antlr.runtime.Token;
|
import org.antlr.runtime.Token;
|
||||||
|
|
||||||
|
import net.sourceforge.pmd.cpd.CpdLexer;
|
||||||
import net.sourceforge.pmd.cpd.TokenFactory;
|
import net.sourceforge.pmd.cpd.TokenFactory;
|
||||||
import net.sourceforge.pmd.cpd.Tokenizer;
|
|
||||||
import net.sourceforge.pmd.lang.apex.ApexJorjeLogging;
|
import net.sourceforge.pmd.lang.apex.ApexJorjeLogging;
|
||||||
import net.sourceforge.pmd.lang.document.TextDocument;
|
import net.sourceforge.pmd.lang.document.TextDocument;
|
||||||
|
|
||||||
import apex.jorje.parser.impl.ApexLexer;
|
import apex.jorje.parser.impl.ApexLexer;
|
||||||
|
|
||||||
public class ApexTokenizer implements Tokenizer {
|
public class ApexCpdLexer implements CpdLexer {
|
||||||
|
|
||||||
public ApexTokenizer() {
|
public ApexCpdLexer() {
|
||||||
ApexJorjeLogging.disableLogging();
|
ApexJorjeLogging.disableLogging();
|
||||||
}
|
}
|
||||||
|
|
@ -9,9 +9,9 @@ import org.junit.jupiter.api.Test;
|
|||||||
import net.sourceforge.pmd.cpd.test.CpdTextComparisonTest;
|
import net.sourceforge.pmd.cpd.test.CpdTextComparisonTest;
|
||||||
import net.sourceforge.pmd.lang.apex.ApexLanguageModule;
|
import net.sourceforge.pmd.lang.apex.ApexLanguageModule;
|
||||||
|
|
||||||
class ApexTokenizerTest extends CpdTextComparisonTest {
|
class ApexCpdLexerTest extends CpdTextComparisonTest {
|
||||||
|
|
||||||
ApexTokenizerTest() {
|
ApexCpdLexerTest() {
|
||||||
super(ApexLanguageModule.getInstance(), ".cls");
|
super(ApexLanguageModule.getInstance(), ".cls");
|
||||||
}
|
}
|
||||||
|
|
@ -4,10 +4,10 @@
|
|||||||
|
|
||||||
package net.sourceforge.pmd.lang.coco;
|
package net.sourceforge.pmd.lang.coco;
|
||||||
|
|
||||||
import net.sourceforge.pmd.cpd.Tokenizer;
|
import net.sourceforge.pmd.cpd.CpdLexer;
|
||||||
import net.sourceforge.pmd.lang.LanguagePropertyBundle;
|
import net.sourceforge.pmd.lang.LanguagePropertyBundle;
|
||||||
import net.sourceforge.pmd.lang.LanguageRegistry;
|
import net.sourceforge.pmd.lang.LanguageRegistry;
|
||||||
import net.sourceforge.pmd.lang.coco.cpd.CocoTokenizer;
|
import net.sourceforge.pmd.lang.coco.cpd.CocoCpdLexer;
|
||||||
import net.sourceforge.pmd.lang.impl.CpdOnlyLanguageModuleBase;
|
import net.sourceforge.pmd.lang.impl.CpdOnlyLanguageModuleBase;
|
||||||
|
|
||||||
/**
|
/**
|
||||||
@ -25,7 +25,7 @@ public class CocoLanguageModule extends CpdOnlyLanguageModuleBase {
|
|||||||
}
|
}
|
||||||
|
|
||||||
@Override
|
@Override
|
||||||
public Tokenizer createCpdTokenizer(LanguagePropertyBundle bundle) {
|
public CpdLexer createCpdLexer(LanguagePropertyBundle bundle) {
|
||||||
return new CocoTokenizer();
|
return new CocoCpdLexer();
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
@ -7,13 +7,13 @@ package net.sourceforge.pmd.lang.coco.cpd;
|
|||||||
import org.antlr.v4.runtime.CharStream;
|
import org.antlr.v4.runtime.CharStream;
|
||||||
import org.antlr.v4.runtime.Lexer;
|
import org.antlr.v4.runtime.Lexer;
|
||||||
|
|
||||||
import net.sourceforge.pmd.cpd.impl.AntlrTokenizer;
|
import net.sourceforge.pmd.cpd.impl.AntlrCpdLexer;
|
||||||
import net.sourceforge.pmd.lang.coco.ast.CocoLexer;
|
import net.sourceforge.pmd.lang.coco.ast.CocoLexer;
|
||||||
|
|
||||||
/**
|
/**
|
||||||
* The Coco Tokenizer.
|
* The Coco Tokenizer.
|
||||||
*/
|
*/
|
||||||
public class CocoTokenizer extends AntlrTokenizer {
|
public class CocoCpdLexer extends AntlrCpdLexer {
|
||||||
|
|
||||||
@Override
|
@Override
|
||||||
protected Lexer getLexerForSource(CharStream charStream) {
|
protected Lexer getLexerForSource(CharStream charStream) {
|
@ -9,8 +9,8 @@ import org.junit.jupiter.api.Test;
|
|||||||
import net.sourceforge.pmd.cpd.test.CpdTextComparisonTest;
|
import net.sourceforge.pmd.cpd.test.CpdTextComparisonTest;
|
||||||
import net.sourceforge.pmd.lang.coco.CocoLanguageModule;
|
import net.sourceforge.pmd.lang.coco.CocoLanguageModule;
|
||||||
|
|
||||||
class CocoTokenizerTest extends CpdTextComparisonTest {
|
class CocoCpdLexerTest extends CpdTextComparisonTest {
|
||||||
CocoTokenizerTest() {
|
CocoCpdLexerTest() {
|
||||||
super(CocoLanguageModule.getInstance(), ".coco");
|
super(CocoLanguageModule.getInstance(), ".coco");
|
||||||
}
|
}
|
||||||
|
|
@ -4,5 +4,5 @@
|
|||||||
|
|
||||||
package net.sourceforge.pmd.cpd;
|
package net.sourceforge.pmd.cpd;
|
||||||
|
|
||||||
public class EcmascriptTokenizer extends net.sourceforge.pmd.lang.ecmascript.cpd.EcmascriptTokenizer {
|
public class EcmascriptTokenizer extends net.sourceforge.pmd.lang.ecmascript.cpd.EcmascriptCpdLexer implements Tokenizer {
|
||||||
}
|
}
|
||||||
|
@ -4,5 +4,5 @@
|
|||||||
|
|
||||||
package net.sourceforge.pmd.cpd;
|
package net.sourceforge.pmd.cpd;
|
||||||
|
|
||||||
public class JSPTokenizer extends net.sourceforge.pmd.lang.jsp.cpd.JSPTokenizer {
|
public class JSPTokenizer extends net.sourceforge.pmd.lang.jsp.cpd.JspCpdLexer implements Tokenizer {
|
||||||
}
|
}
|
||||||
|
@ -9,7 +9,7 @@ import java.util.Properties;
|
|||||||
import net.sourceforge.pmd.lang.java.JavaLanguageModule;
|
import net.sourceforge.pmd.lang.java.JavaLanguageModule;
|
||||||
import net.sourceforge.pmd.lang.java.internal.JavaLanguageProperties;
|
import net.sourceforge.pmd.lang.java.internal.JavaLanguageProperties;
|
||||||
|
|
||||||
public class JavaTokenizer extends net.sourceforge.pmd.lang.java.cpd.JavaTokenizer {
|
public class JavaTokenizer extends net.sourceforge.pmd.lang.java.cpd.JavaCpdLexer implements Tokenizer {
|
||||||
public JavaTokenizer(Properties properties) {
|
public JavaTokenizer(Properties properties) {
|
||||||
super(convertLanguageProperties(properties));
|
super(convertLanguageProperties(properties));
|
||||||
}
|
}
|
||||||
|
@ -0,0 +1,8 @@
|
|||||||
|
/**
|
||||||
|
* BSD-style license; for more info see http://pmd.sourceforge.net/license.html
|
||||||
|
*/
|
||||||
|
|
||||||
|
package net.sourceforge.pmd.cpd;
|
||||||
|
|
||||||
|
public interface Tokenizer extends CpdLexer {
|
||||||
|
}
|
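Note on the compatibility shims above: the old names survive only as thin wrappers. The old interface extends the new one, and each old class extends the renamed implementation while also implementing the old interface. The following is a minimal sketch of that pattern using hypothetical stand-in types (`NewLexer`, `OldTokenizer`, `WordLexer`, `WordTokenizer`); none of them are PMD classes, and the whitespace splitting is only there to give the example some behaviour.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for illustration; these are NOT the PMD types.
interface NewLexer {
    List<String> lex(String source);
}

/** Old interface name kept for source compatibility; it adds no members. */
interface OldTokenizer extends NewLexer {
}

/** The implementation under its new name. */
class WordLexer implements NewLexer {
    @Override
    public List<String> lex(String source) {
        List<String> out = new ArrayList<>();
        // Split on runs of whitespace and drop empty fragments.
        for (String part : source.trim().split("\\s+")) {
            if (!part.isEmpty()) {
                out.add(part);
            }
        }
        return out;
    }
}

/** Shim under the old class name: inherits the new behaviour and still satisfies the old interface. */
class WordTokenizer extends WordLexer implements OldTokenizer {
}
```

Code written against the old names keeps compiling, and an instance of the shim can be passed wherever the new interface is expected, because the old interface is a subtype of the new one.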
@ -20,9 +20,10 @@ import net.sourceforge.pmd.util.StringUtil;
|
|||||||
* Higher-quality lexers should be implemented with a lexer generator.
|
* Higher-quality lexers should be implemented with a lexer generator.
|
||||||
*
|
*
|
||||||
* <p>In PMD 7, this replaces AbstractTokenizer, which provided nearly
|
* <p>In PMD 7, this replaces AbstractTokenizer, which provided nearly
|
||||||
* no more functionality.
|
* no more functionality.</p>
|
||||||
|
* <p>Note: This class was called AnyTokenizer in PMD 6.</p>
|
||||||
*/
|
*/
|
||||||
public class AnyTokenizer implements Tokenizer {
|
public class AnyCpdLexer implements CpdLexer {
|
||||||
|
|
||||||
private static final Pattern DEFAULT_PATTERN = makePattern("");
|
private static final Pattern DEFAULT_PATTERN = makePattern("");
|
||||||
|
|
||||||
@ -40,15 +41,15 @@ public class AnyTokenizer implements Tokenizer {
|
|||||||
private final Pattern pattern;
|
private final Pattern pattern;
|
||||||
private final String commentStart;
|
private final String commentStart;
|
||||||
|
|
||||||
public AnyTokenizer() {
|
public AnyCpdLexer() {
|
||||||
this(DEFAULT_PATTERN, "");
|
this(DEFAULT_PATTERN, "");
|
||||||
}
|
}
|
||||||
|
|
||||||
public AnyTokenizer(String eolCommentStart) {
|
public AnyCpdLexer(String eolCommentStart) {
|
||||||
this(makePattern(eolCommentStart), eolCommentStart);
|
this(makePattern(eolCommentStart), eolCommentStart);
|
||||||
}
|
}
|
||||||
|
|
||||||
private AnyTokenizer(Pattern pattern, String commentStart) {
|
private AnyCpdLexer(Pattern pattern, String commentStart) {
|
||||||
this.pattern = pattern;
|
this.pattern = pattern;
|
||||||
this.commentStart = commentStart;
|
this.commentStart = commentStart;
|
||||||
}
|
}
|
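Note on the renamed fallback lexer: the hunk above only shows the constructors, which take an optional end-of-line comment marker. As a rough illustration of what a generic, grammar-free lexer of this kind can do, the sketch below drops end-of-line comments and splits the rest on whitespace. It is an assumption-laden stand-in, not the actual `AnyCpdLexer` logic: the class name `FallbackLexerSketch` and its `tokenize` method are hypothetical, and a comment marker inside a string literal would be cut as well.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; this is NOT PMD's AnyCpdLexer.
final class FallbackLexerSketch {
    private FallbackLexerSketch() {
    }

    /**
     * Splits the source into whitespace-separated tokens. If eolCommentStart
     * is non-empty, everything from that marker to the end of the line is ignored.
     */
    static List<String> tokenize(String source, String eolCommentStart) {
        List<String> tokens = new ArrayList<>();
        for (String line : source.split("\\R")) {
            String code = line;
            if (!eolCommentStart.isEmpty()) {
                int idx = line.indexOf(eolCommentStart);
                if (idx >= 0) {
                    code = line.substring(0, idx); // cut the comment off
                }
            }
            for (String word : code.trim().split("\\s+")) {
                if (!word.isEmpty()) {
                    tokens.add(word);
                }
            }
        }
        return tokens;
    }
}
```

Passing an empty marker mirrors the no-argument constructor in the hunk: nothing is treated as a comment.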
@ -137,10 +137,10 @@ public final class CpdAnalysis implements AutoCloseable {
|
|||||||
this.listener = cpdListener;
|
this.listener = cpdListener;
|
||||||
}
|
}
|
||||||
|
|
||||||
private int doTokenize(TextDocument document, Tokenizer tokenizer, Tokens tokens) throws IOException, LexException {
|
private int doTokenize(TextDocument document, CpdLexer cpdLexer, Tokens tokens) throws IOException, LexException {
|
||||||
LOGGER.trace("Tokenizing {}", document.getFileId().getAbsolutePath());
|
LOGGER.trace("Tokenizing {}", document.getFileId().getAbsolutePath());
|
||||||
int lastTokenSize = tokens.size();
|
int lastTokenSize = tokens.size();
|
||||||
Tokenizer.tokenize(tokenizer, document, tokens);
|
CpdLexer.tokenize(cpdLexer, document, tokens);
|
||||||
return tokens.size() - lastTokenSize - 1; /* EOF */
|
return tokens.size() - lastTokenSize - 1; /* EOF */
|
||||||
}
|
}
|
||||||
|
|
||||||
@ -152,12 +152,12 @@ public final class CpdAnalysis implements AutoCloseable {
|
|||||||
public void performAnalysis(Consumer<CPDReport> consumer) {
|
public void performAnalysis(Consumer<CPDReport> consumer) {
|
||||||
|
|
||||||
try (SourceManager sourceManager = new SourceManager(files.getCollectedFiles())) {
|
try (SourceManager sourceManager = new SourceManager(files.getCollectedFiles())) {
|
||||||
Map<Language, Tokenizer> tokenizers =
|
Map<Language, CpdLexer> tokenizers =
|
||||||
sourceManager.getTextFiles().stream()
|
sourceManager.getTextFiles().stream()
|
||||||
.map(it -> it.getLanguageVersion().getLanguage())
|
.map(it -> it.getLanguageVersion().getLanguage())
|
||||||
.distinct()
|
.distinct()
|
||||||
.filter(it -> it instanceof CpdCapableLanguage)
|
.filter(it -> it instanceof CpdCapableLanguage)
|
||||||
.collect(Collectors.toMap(lang -> lang, lang -> ((CpdCapableLanguage) lang).createCpdTokenizer(configuration.getLanguageProperties(lang))));
|
.collect(Collectors.toMap(lang -> lang, lang -> ((CpdCapableLanguage) lang).createCpdLexer(configuration.getLanguageProperties(lang))));
|
||||||
|
|
||||||
Map<FileId, Integer> numberOfTokensPerFile = new HashMap<>();
|
Map<FileId, Integer> numberOfTokensPerFile = new HashMap<>();
|
||||||
|
|
||||||
|
@ -16,7 +16,7 @@ public interface CpdCapableLanguage extends Language {
|
|||||||
|
|
||||||
|
|
||||||
/**
|
/**
|
||||||
* Create a new {@link Tokenizer} for this language, given
|
* Create a new {@link CpdLexer} for this language, given
|
||||||
* a property bundle with configuration. The bundle was created by
|
* a property bundle with configuration. The bundle was created by
|
||||||
* this instance using {@link #newPropertyBundle()}. It can be assumed
|
* this instance using {@link #newPropertyBundle()}. It can be assumed
|
||||||
* that the bundle will never be mutated anymore, and this method
|
* that the bundle will never be mutated anymore, and this method
|
||||||
@ -26,7 +26,7 @@ public interface CpdCapableLanguage extends Language {
|
|||||||
*
|
*
|
||||||
* @return A new language processor
|
* @return A new language processor
|
||||||
*/
|
*/
|
||||||
default Tokenizer createCpdTokenizer(LanguagePropertyBundle bundle) {
|
default CpdLexer createCpdLexer(LanguagePropertyBundle bundle) {
|
||||||
return new AnyTokenizer();
|
return new AnyCpdLexer();
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
@ -10,8 +10,10 @@ import net.sourceforge.pmd.lang.document.TextDocument;
|
|||||||
|
|
||||||
/**
|
/**
|
||||||
* Tokenizes a source file into tokens consumable by CPD.
|
* Tokenizes a source file into tokens consumable by CPD.
|
||||||
|
*
|
||||||
|
* <p>Note: This interface was called Tokenizer in PMD 6.</p>
|
||||||
*/
|
*/
|
||||||
public interface Tokenizer {
|
public interface CpdLexer {
|
||||||
|
|
||||||
/**
|
/**
|
||||||
* Tokenize the source code and record tokens using the provided token factory.
|
* Tokenize the source code and record tokens using the provided token factory.
|
||||||
@ -22,9 +24,9 @@ public interface Tokenizer {
|
|||||||
* Wraps a call to {@link #tokenize(TextDocument, TokenFactory)} to properly
|
* Wraps a call to {@link #tokenize(TextDocument, TokenFactory)} to properly
|
||||||
* create and close the token factory.
|
* create and close the token factory.
|
||||||
*/
|
*/
|
||||||
static void tokenize(Tokenizer tokenizer, TextDocument textDocument, Tokens tokens) throws IOException {
|
static void tokenize(CpdLexer cpdLexer, TextDocument textDocument, Tokens tokens) throws IOException {
|
||||||
try (TokenFactory tf = Tokens.factoryForFile(textDocument, tokens)) {
|
try (TokenFactory tf = Tokens.factoryForFile(textDocument, tokens)) {
|
||||||
tokenizer.tokenize(textDocument, tf);
|
cpdLexer.tokenize(textDocument, tf);
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
}
|
}
|
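Note on the static `tokenize` helper above: it wraps the instance method in a try-with-resources block so the token factory is always closed, and per the `TokenFactory` javadoc in this diff, closing is what adds the EOF token. That is also why `doTokenize` subtracts one from the token count. The sketch below reproduces the shape of that contract with hypothetical stand-ins (`TokenSink`, `SimpleLexer`, plain `String` tokens); it is not the PMD API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for illustration; these are NOT the PMD classes.
final class TokenSink implements AutoCloseable {
    static final String EOF = "<EOF>";
    private final List<String> tokens;

    TokenSink(List<String> tokens) {
        this.tokens = tokens;
    }

    void record(String image) {
        tokens.add(image);
    }

    /** Closing the sink appends the end-of-file marker. */
    @Override
    public void close() {
        tokens.add(EOF);
    }
}

interface SimpleLexer {
    void tokenize(String source, TokenSink sink);

    /** Creates the sink, runs the lexer, and closes the sink even if the lexer throws. */
    static int tokenize(SimpleLexer lexer, String source, List<String> tokens) {
        int before = tokens.size();
        try (TokenSink sink = new TokenSink(tokens)) {
            lexer.tokenize(source, sink);
        }
        return tokens.size() - before - 1; // do not count the EOF marker
    }
}
```

Keeping the open/close logic in one static helper means individual lexer implementations only record tokens and never have to remember the EOF step themselves.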
@ -142,8 +142,8 @@ public class GUI implements CPDListener {
|
|||||||
.extensions(extension)
|
.extensions(extension)
|
||||||
.name("By extension...")) {
|
.name("By extension...")) {
|
||||||
@Override
|
@Override
|
||||||
public Tokenizer createCpdTokenizer(LanguagePropertyBundle bundle) {
|
public CpdLexer createCpdLexer(LanguagePropertyBundle bundle) {
|
||||||
return new AnyTokenizer();
|
return new AnyCpdLexer();
|
||||||
}
|
}
|
||||||
};
|
};
|
||||||
}
|
}
|
||||||
|
@ -12,7 +12,7 @@ import net.sourceforge.pmd.lang.document.FileLocation;
|
|||||||
import net.sourceforge.pmd.lang.document.TextDocument;
|
import net.sourceforge.pmd.lang.document.TextDocument;
|
||||||
|
|
||||||
/**
|
/**
|
||||||
* Proxy to record tokens from within {@link Tokenizer#tokenize(TextDocument, TokenFactory)}.
|
* Proxy to record tokens from within {@link CpdLexer#tokenize(TextDocument, TokenFactory)}.
|
||||||
*/
|
*/
|
||||||
public interface TokenFactory extends AutoCloseable {
|
public interface TokenFactory extends AutoCloseable {
|
||||||
|
|
||||||
@ -57,7 +57,7 @@ public interface TokenFactory extends AutoCloseable {
|
|||||||
|
|
||||||
/**
|
/**
|
||||||
* This adds the EOF token, it must be called when
|
* This adds the EOF token, it must be called when
|
||||||
* {@link Tokenizer#tokenize(TextDocument, TokenFactory)} is done.
|
* {@link CpdLexer#tokenize(TextDocument, TokenFactory)} is done.
|
||||||
*/
|
*/
|
||||||
@Override
|
@Override
|
||||||
void close();
|
void close();
|
||||||
|
@ -93,7 +93,7 @@ public class Tokens {
|
|||||||
|
|
||||||
/**
|
/**
|
||||||
* Creates a token factory to process the given file with
|
* Creates a token factory to process the given file with
|
||||||
* {@link Tokenizer#tokenize(TextDocument, TokenFactory)}.
|
* {@link CpdLexer#tokenize(TextDocument, TokenFactory)}.
|
||||||
* Tokens are accumulated in the {@link Tokens} parameter.
|
* Tokens are accumulated in the {@link Tokens} parameter.
|
||||||
*
|
*
|
||||||
* @param file Document for the file to process
|
* @param file Document for the file to process
|
||||||
|
@ -10,16 +10,16 @@ import org.antlr.v4.runtime.CharStream;
|
|||||||
import org.antlr.v4.runtime.CharStreams;
|
import org.antlr.v4.runtime.CharStreams;
|
||||||
import org.antlr.v4.runtime.Lexer;
|
import org.antlr.v4.runtime.Lexer;
|
||||||
|
|
||||||
import net.sourceforge.pmd.cpd.Tokenizer;
|
import net.sourceforge.pmd.cpd.CpdLexer;
|
||||||
import net.sourceforge.pmd.lang.TokenManager;
|
import net.sourceforge.pmd.lang.TokenManager;
|
||||||
import net.sourceforge.pmd.lang.ast.impl.antlr4.AntlrToken;
|
import net.sourceforge.pmd.lang.ast.impl.antlr4.AntlrToken;
|
||||||
import net.sourceforge.pmd.lang.ast.impl.antlr4.AntlrTokenManager;
|
import net.sourceforge.pmd.lang.ast.impl.antlr4.AntlrTokenManager;
|
||||||
import net.sourceforge.pmd.lang.document.TextDocument;
|
import net.sourceforge.pmd.lang.document.TextDocument;
|
||||||
|
|
||||||
/**
|
/**
|
||||||
* Generic implementation of a {@link Tokenizer} useful to any Antlr grammar.
|
* Generic implementation of a {@link CpdLexer} useful to any Antlr grammar.
|
||||||
*/
|
*/
|
||||||
public abstract class AntlrTokenizer extends TokenizerBase<AntlrToken> {
|
public abstract class AntlrCpdLexer extends CpdLexerBase<AntlrToken> {
|
||||||
@Override
|
@Override
|
||||||
protected final TokenManager<AntlrToken> makeLexerImpl(TextDocument doc) throws IOException {
|
protected final TokenManager<AntlrToken> makeLexerImpl(TextDocument doc) throws IOException {
|
||||||
CharStream charStream = CharStreams.fromReader(doc.newReader(), doc.getFileId().getAbsolutePath());
|
CharStream charStream = CharStreams.fromReader(doc.newReader(), doc.getFileId().getAbsolutePath());
|
@ -6,16 +6,16 @@ package net.sourceforge.pmd.cpd.impl;
|
|||||||
|
|
||||||
import java.io.IOException;
|
import java.io.IOException;
|
||||||
|
|
||||||
|
import net.sourceforge.pmd.cpd.CpdLexer;
|
||||||
import net.sourceforge.pmd.cpd.TokenFactory;
|
import net.sourceforge.pmd.cpd.TokenFactory;
|
||||||
import net.sourceforge.pmd.cpd.Tokenizer;
|
|
||||||
import net.sourceforge.pmd.lang.TokenManager;
|
import net.sourceforge.pmd.lang.TokenManager;
|
||||||
import net.sourceforge.pmd.lang.ast.GenericToken;
|
import net.sourceforge.pmd.lang.ast.GenericToken;
|
||||||
import net.sourceforge.pmd.lang.document.TextDocument;
|
import net.sourceforge.pmd.lang.document.TextDocument;
|
||||||
|
|
||||||
/**
|
/**
|
||||||
* Generic base class for a {@link Tokenizer}.
|
* Generic base class for a {@link CpdLexer}.
|
||||||
*/
|
*/
|
||||||
public abstract class TokenizerBase<T extends GenericToken<T>> implements Tokenizer {
|
public abstract class CpdLexerBase<T extends GenericToken<T>> implements CpdLexer {
|
||||||
|
|
||||||
protected abstract TokenManager<T> makeLexerImpl(TextDocument doc) throws IOException;
|
protected abstract TokenManager<T> makeLexerImpl(TextDocument doc) throws IOException;
|
||||||
|
|
@@ -1,15 +0,0 @@
-/**
- * BSD-style license; for more info see http://pmd.sourceforge.net/license.html
- */
-
-package net.sourceforge.pmd.cpd.impl;
-
-import net.sourceforge.pmd.cpd.Tokenizer;
-import net.sourceforge.pmd.lang.ast.impl.javacc.JavaccToken;
-
-/**
- * Base class for a {@link Tokenizer} for a language implemented by a JavaCC tokenizer.
- */
-public abstract class JavaCCTokenizer extends TokenizerBase<JavaccToken> {
-
-}
@@ -0,0 +1,15 @@
+/**
+ * BSD-style license; for more info see http://pmd.sourceforge.net/license.html
+ */
+
+package net.sourceforge.pmd.cpd.impl;
+
+import net.sourceforge.pmd.cpd.CpdLexer;
+import net.sourceforge.pmd.lang.ast.impl.javacc.JavaccToken;
+
+/**
+ * Base class for a {@link CpdLexer} for a language implemented by a JavaCC tokenizer.
+ */
+public abstract class JavaccCpdLexer extends CpdLexerBase<JavaccToken> {
+
+}
@@ -3,6 +3,6 @@
  */
 
 /**
- * Utilities to implement a CPD {@link net.sourceforge.pmd.cpd.Tokenizer}.
+ * Utilities to implement a CPD {@link net.sourceforge.pmd.cpd.CpdLexer}.
  */
 package net.sourceforge.pmd.cpd.impl;
@@ -6,6 +6,6 @@
  * Token-based copy-paste detection.
  *
  * @see net.sourceforge.pmd.cpd.CpdAnalysis
- * @see net.sourceforge.pmd.cpd.Tokenizer
+ * @see net.sourceforge.pmd.cpd.CpdLexer
  */
 package net.sourceforge.pmd.cpd;
@@ -5,9 +5,9 @@
 package net.sourceforge.pmd.lang;
 
 import net.sourceforge.pmd.annotation.Experimental;
-import net.sourceforge.pmd.cpd.AnyTokenizer;
+import net.sourceforge.pmd.cpd.AnyCpdLexer;
 import net.sourceforge.pmd.cpd.CpdCapableLanguage;
-import net.sourceforge.pmd.cpd.Tokenizer;
+import net.sourceforge.pmd.cpd.CpdLexer;
 import net.sourceforge.pmd.lang.ast.AstInfo;
 import net.sourceforge.pmd.lang.ast.Parser;
 import net.sourceforge.pmd.lang.ast.Parser.ParserTask;
@@ -47,8 +47,8 @@ public final class PlainTextLanguage extends SimpleLanguageModuleBase implements
     }
 
     @Override
-    public Tokenizer createCpdTokenizer(LanguagePropertyBundle bundle) {
-        return new AnyTokenizer();
+    public CpdLexer createCpdLexer(LanguagePropertyBundle bundle) {
+        return new AnyCpdLexer();
     }
 
     private static final class TextLvh implements LanguageVersionHandler {
@@ -10,7 +10,7 @@ import java.util.List;
 import org.checkerframework.checker.nullness.qual.NonNull;
 import org.checkerframework.checker.nullness.qual.Nullable;
 
-import net.sourceforge.pmd.cpd.impl.JavaCCTokenizer;
+import net.sourceforge.pmd.cpd.impl.JavaccCpdLexer;
 import net.sourceforge.pmd.lang.ast.impl.TokenDocument;
 import net.sourceforge.pmd.lang.document.TextDocument;
 
@@ -18,7 +18,7 @@ import net.sourceforge.pmd.lang.document.TextDocument;
  * Token document for Javacc implementations. This is a helper object
  * for generated token managers. Note: the extension point is a custom
  * implementation of {@link TokenDocumentBehavior}, see {@link JjtreeParserAdapter#tokenBehavior()},
- * {@link JavaCCTokenizer#tokenBehavior()}
+ * {@link JavaccCpdLexer#tokenBehavior()}
  */
 public final class JavaccTokenDocument extends TokenDocument<JavaccToken> {
 
@@ -5,7 +5,7 @@
 package net.sourceforge.pmd.lang.impl;
 
 import net.sourceforge.pmd.cpd.CpdCapableLanguage;
-import net.sourceforge.pmd.cpd.Tokenizer;
+import net.sourceforge.pmd.cpd.CpdLexer;
 import net.sourceforge.pmd.lang.LanguageModuleBase;
 import net.sourceforge.pmd.lang.LanguagePropertyBundle;
 
@@ -27,5 +27,5 @@ public abstract class CpdOnlyLanguageModuleBase extends LanguageModuleBase imple
     }
 
     @Override
-    public abstract Tokenizer createCpdTokenizer(LanguagePropertyBundle bundle);
+    public abstract CpdLexer createCpdLexer(LanguagePropertyBundle bundle);
 }
Some files were not shown because too many files have changed in this diff.