[core] Rename Tokenizer to CpdLexer

See #4065
Andreas Dangel
2024-01-11 17:04:48 +01:00
parent 55d91791c3
commit 6163f67b06
119 changed files with 423 additions and 352 deletions

@ -117,15 +117,15 @@ definitely don't come for free. It is much effort and requires perseverance to i
## 5. Create a TokenManager ## 5. Create a TokenManager
* This is needed to support CPD (copy paste detection) * This is needed to support CPD (copy paste detection)
* We provide a default implementation using [`AntlrTokenManager`](https://github.com/pmd/pmd/blob/master/pmd-core/src/main/java/net/sourceforge/pmd/cpd/impl/AntlrTokenizer.java). * We provide a default implementation using [`AntlrTokenManager`](https://github.com/pmd/pmd/blob/master/pmd-core/src/main/java/net/sourceforge/pmd/lang/ast/impl/antlr4/AntlrTokenManager.java).
* You must create your own "AntlrTokenizer" such as we do with * You must create your own "AntlrCpdLexer" such as we do with
[`SwiftTokenizer`](https://github.com/pmd/pmd/blob/master/pmd-swift/src/main/java/net/sourceforge/pmd/lang/swift/cpd/SwiftTokenizer.java). [`SwiftCpdLexer`](https://github.com/pmd/pmd/blob/master/pmd-swift/src/main/java/net/sourceforge/pmd/lang/swift/cpd/SwiftCpdLexer.java).
* If you wish to filter specific tokens (e.g. comments to support CPD suppression via "CPD-OFF" and "CPD-ON") * If you wish to filter specific tokens (e.g. comments to support CPD suppression via "CPD-OFF" and "CPD-ON")
you can create your own implementation of you can create your own implementation of
[`AntlrTokenFilter`](https://github.com/pmd/pmd/blob/master/pmd-core/src/main/java/net/sourceforge/pmd/cpd/impl/AntlrTokenFilter.java). [`AntlrTokenFilter`](https://github.com/pmd/pmd/blob/master/pmd-core/src/main/java/net/sourceforge/pmd/cpd/impl/AntlrTokenFilter.java).
You'll then need to override the protected method `getTokenFilter(AntlrTokenManager)` You'll then need to override the protected method `getTokenFilter(AntlrTokenManager)`
and return your custom filter. See the tokenizer for C# as an example: and return your custom filter. See the CpdLexer for C# as an example:
[`CsTokenizer`](https://github.com/pmd/pmd/blob/master/pmd-cs/src/main/java/net/sourceforge/pmd/lang/cs/cpd/CsTokenizer.java). [`CsCpdLexer`](https://github.com/pmd/pmd/blob/master/pmd-cs/src/main/java/net/sourceforge/pmd/lang/cs/cpd/CsCpdLexer.java).
If you don't need a custom token filter, you don't need to override the method. It returns the default If you don't need a custom token filter, you don't need to override the method. It returns the default
`AntlrTokenFilter` which doesn't filter anything. `AntlrTokenFilter` which doesn't filter anything.
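
The suppression idea described in the bullets above can be sketched without any PMD dependency. The following is a minimal illustration only, not PMD's `AntlrTokenFilter`: `SketchToken` and `SuppressionFilterSketch` are hypothetical names, and dropping every comment token is an assumption made to keep the example short.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a lexer token; this is not PMD's AntlrToken.
final class SketchToken {
    final String image;
    final boolean comment;

    SketchToken(String image, boolean comment) {
        this.image = image;
        this.comment = comment;
    }
}

// Simplified illustration of comment-driven suppression; not PMD's AntlrTokenFilter.
final class SuppressionFilterSketch {

    // Returns the token images that would reach duplicate detection: comments are
    // dropped (an assumption of this sketch), and so is everything between a
    // "CPD-OFF" comment and the next "CPD-ON" comment.
    static List<String> filter(List<SketchToken> tokens) {
        List<String> kept = new ArrayList<>();
        boolean suppressed = false;
        for (SketchToken token : tokens) {
            if (token.comment) {
                if (token.image.contains("CPD-OFF")) {
                    suppressed = true;
                } else if (token.image.contains("CPD-ON")) {
                    suppressed = false;
                }
                continue; // comment tokens themselves are never kept
            }
            if (!suppressed) {
                kept.add(token.image);
            }
        }
        return kept;
    }
}
```

A real filter would be returned from `getTokenFilter(AntlrTokenManager)` as described above; the sketch only shows the state machine such a filter might contain.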

@ -11,7 +11,7 @@ author: Matías Fraga, Clément Fournier
## Adding support for a CPD language ## Adding support for a CPD language
CPD works generically on the tokens produced by a {% jdoc core::cpd.Tokenizer %}. CPD works generically on the tokens produced by a {% jdoc core::cpd.Tokenizer %}.
To add support for a new language, the crucial piece is writing a tokenizer that To add support for a new language, the crucial piece is writing a CpdLexer that
splits the source file into the tokens specific to your language. Thankfully you splits the source file into the tokens specific to your language. Thankfully you
can use a stock [Antlr grammar](https://github.com/antlr/grammars-v4) or JavaCC can use a stock [Antlr grammar](https://github.com/antlr/grammars-v4) or JavaCC
grammar to generate a lexer for you. If you cannot use a lexer generator, for grammar to generate a lexer for you. If you cannot use a lexer generator, for
@ -31,12 +31,12 @@ Use the following guide to set up a new language module that supports CPD.
the lexer from the grammar. To do so, edit `pom.xml` (eg like [the Golang module](https://github.com/pmd/pmd/tree/master/pmd-go/pom.xml)). the lexer from the grammar. To do so, edit `pom.xml` (eg like [the Golang module](https://github.com/pmd/pmd/tree/master/pmd-go/pom.xml)).
Once that is done, `mvn generate-sources` should generate the lexer sources for you. Once that is done, `mvn generate-sources` should generate the lexer sources for you.
You can now implement a tokenizer, for instance by extending {% jdoc core::cpd.impl.AntlrTokenizer %}. The following reproduces the Go implementation: You can now implement a CpdLexer, for instance by extending {% jdoc core::cpd.impl.AntlrCpdLexer %}. The following reproduces the Go implementation:
```java ```java
// mind the package convention if you are going to make a PR // mind the package convention if you are going to make a PR
package net.sourceforge.pmd.lang.go.cpd; package net.sourceforge.pmd.lang.go.cpd;
public class GoTokenizer extends AntlrTokenizer { public class GoCpdLexer extends AntlrCpdLexer {
@Override @Override
protected Lexer getLexerForSource(CharStream charStream) { protected Lexer getLexerForSource(CharStream charStream) {
@ -64,9 +64,9 @@ If your language only supports CPD, then you can subclass {% jdoc core::lang.imp
} }
@Override @Override
public Tokenizer createCpdTokenizer(LanguagePropertyBundle bundle) { public CpdLexer createCpdLexer(LanguagePropertyBundle bundle) {
// This method should return an instance of the tokenizer you created. // This method should return an instance of the CpdLexer you created.
return new GoTokenizer(); return new GoCpdLexer();
} }
} }
``` ```
@ -77,7 +77,7 @@ If your language only supports CPD, then you can subclass {% jdoc core::lang.imp
4. Update the test that asserts the list of supported languages by updating the `SUPPORTED_LANGUAGES` constant in [BinaryDistributionIT](https://github.com/pmd/pmd/blob/master/pmd-dist/src/test/java/net/sourceforge/pmd/it/BinaryDistributionIT.java). 4. Update the test that asserts the list of supported languages by updating the `SUPPORTED_LANGUAGES` constant in [BinaryDistributionIT](https://github.com/pmd/pmd/blob/master/pmd-dist/src/test/java/net/sourceforge/pmd/it/BinaryDistributionIT.java).
5. Add some tests for your tokenizer by following the [section below](#testing-your-implementation). 5. Add some tests for your CpdLexer by following the [section below](#testing-your-implementation).
6. Finish up your new language module by adding a page in the documentation. Create a new markdown file 6. Finish up your new language module by adding a page in the documentation. Create a new markdown file
`<langId>.md` in `docs/pages/pmd/languages/`. This file should have the following frontmatter: `<langId>.md` in `docs/pages/pmd/languages/`. This file should have the following frontmatter:
@ -100,10 +100,10 @@ If your language only supports CPD, then you can subclass {% jdoc core::lang.imp
{% endraw %} {% endraw %}
``` ```
### Declaring tokenizer options ### Declaring CpdLexer options
To make the tokenizer configurable, first define some property descriptors using To make the CpdLexer configurable, first define some property descriptors using
{% jdoc core::properties.PropertyFactory %}. Look at {% jdoc core::cpd.Tokenizer %} {% jdoc core::properties.PropertyFactory %}. Look at {% jdoc core::cpd.CpdLexer %}
for some predefined ones which you can reuse (prefer reusing property descriptors if you can). for some predefined ones which you can reuse (prefer reusing property descriptors if you can).
You need to override {% jdoc core::Language#newPropertyBundle() %} You need to override {% jdoc core::Language#newPropertyBundle() %}
and call `definePropertyDescriptor` to register the descriptors. and call `definePropertyDescriptor` to register the descriptors.
@ -112,13 +112,13 @@ of {% jdoc core::cpd.CpdCapableLanguage#createCpdTokenizer(core::lang.LanguagePr
To implement simple token filtering, you can use {% jdoc core::cpd.impl.BaseTokenFilter %} To implement simple token filtering, you can use {% jdoc core::cpd.impl.BaseTokenFilter %}
as a base class, or another base class in {% jdoc_package core::cpd.impl %}. as a base class, or another base class in {% jdoc_package core::cpd.impl %}.
Take a look at the [Kotlin token filter implementation](https://github.com/pmd/pmd/blob/master/pmd-kotlin/src/main/java/net/sourceforge/pmd/lang/kotlin/cpd/KotlinTokenizer.java), or the [Java one](https://github.com/pmd/pmd/blob/master/pmd-java/src/main/java/net/sourceforge/pmd/lang/java/cpd/JavaTokenizer.java). Take a look at the [Kotlin token filter implementation](https://github.com/pmd/pmd/blob/master/pmd-kotlin/src/main/java/net/sourceforge/pmd/lang/kotlin/cpd/KotlinCpdLexer.java), or the [Java one](https://github.com/pmd/pmd/blob/master/pmd-java/src/main/java/net/sourceforge/pmd/lang/java/cpd/JavaCpdLexer.java).
### Testing your implementation ### Testing your implementation
Add a Maven dependency on `pmd-lang-test` (scope `test`) in your `pom.xml`. Add a Maven dependency on `pmd-lang-test` (scope `test`) in your `pom.xml`.
This contains utilities to test your tokenizer. This contains utilities to test your CpdLexer.
Create a test class extending from {% jdoc lang-test::cpd.test.CpdTextComparisonTest %}. Create a test class extending from {% jdoc lang-test::cpd.test.CpdTextComparisonTest %}.
To add tests, you need to write regular JUnit `@Test`-annotated methods, and To add tests, you need to write regular JUnit `@Test`-annotated methods, and

@ -24,7 +24,7 @@ PropertyName is the name of the property converted to SCREAMING_SNAKE_CASE, that
As a convention, properties whose name start with an *x* are internal and may be removed or changed without notice. As a convention, properties whose name start with an *x* are internal and may be removed or changed without notice.
Properties whose name start with **CPD** are used to configure CPD tokenizer options. Properties whose name start with **CPD** are used to configure CPD CpdLexer options.
Programmatically, the language properties can be set on `PMDConfiguration` (or `CPDConfiguration`) before using the Programmatically, the language properties can be set on `PMDConfiguration` (or `CPDConfiguration`) before using the
{%jdoc core::PmdAnalyzer %} (or {%jdoc core::cpd.CpdAnalyzer %}) instance {%jdoc core::PmdAnalyzer %} (or {%jdoc core::cpd.CpdAnalyzer %}) instance

@ -159,10 +159,14 @@ The following previously deprecated classes have been removed:
If the current version is needed, then `Node.getTextDocument().getLanguageVersion()` can be used. This If the current version is needed, then `Node.getTextDocument().getLanguageVersion()` can be used. This
is the version that has been selected via CLI `--use-version` parameter. is the version that has been selected via CLI `--use-version` parameter.
**Renamed classes** **Renamed classes and methods**
* pmd-core * pmd-core
* {%jdoc_old core::lang.ast.TokenMgrError %} has been renamed to {% jdoc core::lang.ast.LexException %} * {%jdoc_old core::lang.ast.TokenMgrError %} has been renamed to {% jdoc core::lang.ast.LexException %}
* {%jdoc_old core::cpd.Tokenizer %} has been renamed to {% jdoc core::cpd.CpdLexer %}. Along with this rename,
all the implementations have been renamed as well (`Tokenizer` -> `CpdLexer`), e.g. "CppCpdLexer", "JavaCpdLexer".
This affects all language modules.
* {%jdoc_old core::cpd.AnyTokenizer %} has been renamed to {% jdoc core::cpd.AnyCpdLexer %}.
#### External Contributions #### External Contributions
* [#4640](https://github.com/pmd/pmd/pull/4640): \[cli] Launch script fails if run via "bash pmd" - [Shai Bennathan](https://github.com/shai-bennathan) (@shai-bennathan) * [#4640](https://github.com/pmd/pmd/pull/4640): \[cli] Launch script fails if run via "bash pmd" - [Shai Bennathan](https://github.com/shai-bennathan) (@shai-bennathan)

@ -5,12 +5,12 @@
package net.sourceforge.pmd.lang.apex; package net.sourceforge.pmd.lang.apex;
import net.sourceforge.pmd.cpd.CpdCapableLanguage; import net.sourceforge.pmd.cpd.CpdCapableLanguage;
import net.sourceforge.pmd.cpd.Tokenizer; import net.sourceforge.pmd.cpd.CpdLexer;
import net.sourceforge.pmd.lang.LanguageModuleBase; import net.sourceforge.pmd.lang.LanguageModuleBase;
import net.sourceforge.pmd.lang.LanguageProcessor; import net.sourceforge.pmd.lang.LanguageProcessor;
import net.sourceforge.pmd.lang.LanguagePropertyBundle; import net.sourceforge.pmd.lang.LanguagePropertyBundle;
import net.sourceforge.pmd.lang.PmdCapableLanguage; import net.sourceforge.pmd.lang.PmdCapableLanguage;
import net.sourceforge.pmd.lang.apex.cpd.ApexTokenizer; import net.sourceforge.pmd.lang.apex.cpd.ApexCpdLexer;
public class ApexLanguageModule extends LanguageModuleBase implements PmdCapableLanguage, CpdCapableLanguage { public class ApexLanguageModule extends LanguageModuleBase implements PmdCapableLanguage, CpdCapableLanguage {
private static final String ID = "apex"; private static final String ID = "apex";
@ -47,7 +47,7 @@ public class ApexLanguageModule extends LanguageModuleBase implements PmdCapable
} }
@Override @Override
public Tokenizer createCpdTokenizer(LanguagePropertyBundle bundle) { public CpdLexer createCpdLexer(LanguagePropertyBundle bundle) {
return new ApexTokenizer(); return new ApexCpdLexer();
} }
} }

@ -12,16 +12,16 @@ import org.antlr.runtime.ANTLRStringStream;
import org.antlr.runtime.Lexer; import org.antlr.runtime.Lexer;
import org.antlr.runtime.Token; import org.antlr.runtime.Token;
import net.sourceforge.pmd.cpd.CpdLexer;
import net.sourceforge.pmd.cpd.TokenFactory; import net.sourceforge.pmd.cpd.TokenFactory;
import net.sourceforge.pmd.cpd.Tokenizer;
import net.sourceforge.pmd.lang.apex.ApexJorjeLogging; import net.sourceforge.pmd.lang.apex.ApexJorjeLogging;
import net.sourceforge.pmd.lang.document.TextDocument; import net.sourceforge.pmd.lang.document.TextDocument;
import apex.jorje.parser.impl.ApexLexer; import apex.jorje.parser.impl.ApexLexer;
public class ApexTokenizer implements Tokenizer { public class ApexCpdLexer implements CpdLexer {
public ApexTokenizer() { public ApexCpdLexer() {
ApexJorjeLogging.disableLogging(); ApexJorjeLogging.disableLogging();
} }

@ -9,9 +9,9 @@ import org.junit.jupiter.api.Test;
import net.sourceforge.pmd.cpd.test.CpdTextComparisonTest; import net.sourceforge.pmd.cpd.test.CpdTextComparisonTest;
import net.sourceforge.pmd.lang.apex.ApexLanguageModule; import net.sourceforge.pmd.lang.apex.ApexLanguageModule;
class ApexTokenizerTest extends CpdTextComparisonTest { class ApexCpdLexerTest extends CpdTextComparisonTest {
ApexTokenizerTest() { ApexCpdLexerTest() {
super(ApexLanguageModule.getInstance(), ".cls"); super(ApexLanguageModule.getInstance(), ".cls");
} }

@ -4,10 +4,10 @@
package net.sourceforge.pmd.lang.coco; package net.sourceforge.pmd.lang.coco;
import net.sourceforge.pmd.cpd.Tokenizer; import net.sourceforge.pmd.cpd.CpdLexer;
import net.sourceforge.pmd.lang.LanguagePropertyBundle; import net.sourceforge.pmd.lang.LanguagePropertyBundle;
import net.sourceforge.pmd.lang.LanguageRegistry; import net.sourceforge.pmd.lang.LanguageRegistry;
import net.sourceforge.pmd.lang.coco.cpd.CocoTokenizer; import net.sourceforge.pmd.lang.coco.cpd.CocoCpdLexer;
import net.sourceforge.pmd.lang.impl.CpdOnlyLanguageModuleBase; import net.sourceforge.pmd.lang.impl.CpdOnlyLanguageModuleBase;
/** /**
@ -25,7 +25,7 @@ public class CocoLanguageModule extends CpdOnlyLanguageModuleBase {
} }
@Override @Override
public Tokenizer createCpdTokenizer(LanguagePropertyBundle bundle) { public CpdLexer createCpdLexer(LanguagePropertyBundle bundle) {
return new CocoTokenizer(); return new CocoCpdLexer();
} }
} }

@ -7,13 +7,13 @@ package net.sourceforge.pmd.lang.coco.cpd;
import org.antlr.v4.runtime.CharStream; import org.antlr.v4.runtime.CharStream;
import org.antlr.v4.runtime.Lexer; import org.antlr.v4.runtime.Lexer;
import net.sourceforge.pmd.cpd.impl.AntlrTokenizer; import net.sourceforge.pmd.cpd.impl.AntlrCpdLexer;
import net.sourceforge.pmd.lang.coco.ast.CocoLexer; import net.sourceforge.pmd.lang.coco.ast.CocoLexer;
/** /**
* The Coco Tokenizer. * The Coco Tokenizer.
*/ */
public class CocoTokenizer extends AntlrTokenizer { public class CocoCpdLexer extends AntlrCpdLexer {
@Override @Override
protected Lexer getLexerForSource(CharStream charStream) { protected Lexer getLexerForSource(CharStream charStream) {

@ -9,8 +9,8 @@ import org.junit.jupiter.api.Test;
import net.sourceforge.pmd.cpd.test.CpdTextComparisonTest; import net.sourceforge.pmd.cpd.test.CpdTextComparisonTest;
import net.sourceforge.pmd.lang.coco.CocoLanguageModule; import net.sourceforge.pmd.lang.coco.CocoLanguageModule;
class CocoTokenizerTest extends CpdTextComparisonTest { class CocoCpdLexerTest extends CpdTextComparisonTest {
CocoTokenizerTest() { CocoCpdLexerTest() {
super(CocoLanguageModule.getInstance(), ".coco"); super(CocoLanguageModule.getInstance(), ".coco");
} }

@ -4,5 +4,5 @@
package net.sourceforge.pmd.cpd; package net.sourceforge.pmd.cpd;
public class EcmascriptTokenizer extends net.sourceforge.pmd.lang.ecmascript.cpd.EcmascriptTokenizer { public class EcmascriptTokenizer extends net.sourceforge.pmd.lang.ecmascript.cpd.EcmascriptCpdLexer implements Tokenizer {
} }

@ -4,5 +4,5 @@
package net.sourceforge.pmd.cpd; package net.sourceforge.pmd.cpd;
public class JSPTokenizer extends net.sourceforge.pmd.lang.jsp.cpd.JSPTokenizer { public class JSPTokenizer extends net.sourceforge.pmd.lang.jsp.cpd.JspCpdLexer implements Tokenizer {
} }

@ -9,7 +9,7 @@ import java.util.Properties;
import net.sourceforge.pmd.lang.java.JavaLanguageModule; import net.sourceforge.pmd.lang.java.JavaLanguageModule;
import net.sourceforge.pmd.lang.java.internal.JavaLanguageProperties; import net.sourceforge.pmd.lang.java.internal.JavaLanguageProperties;
public class JavaTokenizer extends net.sourceforge.pmd.lang.java.cpd.JavaTokenizer { public class JavaTokenizer extends net.sourceforge.pmd.lang.java.cpd.JavaCpdLexer implements Tokenizer {
public JavaTokenizer(Properties properties) { public JavaTokenizer(Properties properties) {
super(convertLanguageProperties(properties)); super(convertLanguageProperties(properties));
} }

@ -0,0 +1,8 @@
/**
* BSD-style license; for more info see http://pmd.sourceforge.net/license.html
*/
package net.sourceforge.pmd.cpd;
public interface Tokenizer extends CpdLexer {
}
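
The empty `Tokenizer` sub-interface added above appears to exist so that code written against the old name keeps compiling. A minimal sketch of that pattern follows. All names are hypothetical stand-ins, and the method signature is simplified to plain strings (the real method takes a `TextDocument` and a `TokenFactory`).

```java
// Stand-in for the renamed interface; signature simplified for illustration.
interface SketchCpdLexer {
    String tokenize(String source);
}

// Stand-in for the compatibility alias: it adds no members, only the old name.
interface SketchTokenizer extends SketchCpdLexer {
}

// Hypothetical older implementation that still names the old interface.
final class LegacyTokenizer implements SketchTokenizer {
    @Override
    public String tokenize(String source) {
        return source.trim();
    }
}

final class CompatSketch {
    // Newer API that only knows about the renamed interface.
    static String run(SketchCpdLexer lexer, String source) {
        return lexer.tokenize(source);
    }
}
```

Because `SketchTokenizer` is a subtype of `SketchCpdLexer`, a `LegacyTokenizer` can be passed to `CompatSketch.run` unchanged. The reverse does not hold: a class that implements only the new interface is not a `SketchTokenizer`.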

@ -20,9 +20,10 @@ import net.sourceforge.pmd.util.StringUtil;
* Higher-quality lexers should be implemented with a lexer generator. * Higher-quality lexers should be implemented with a lexer generator.
* *
* <p>In PMD 7, this replaces AbstractTokenizer, which provided nearly * <p>In PMD 7, this replaces AbstractTokenizer, which provided nearly
* no more functionality. * no more functionality.</p>
* <p>Note: This class has been called AnyTokenizer in PMD 6.</p>
*/ */
public class AnyTokenizer implements Tokenizer { public class AnyCpdLexer implements CpdLexer {
private static final Pattern DEFAULT_PATTERN = makePattern(""); private static final Pattern DEFAULT_PATTERN = makePattern("");
@ -40,15 +41,15 @@ public class AnyTokenizer implements Tokenizer {
private final Pattern pattern; private final Pattern pattern;
private final String commentStart; private final String commentStart;
public AnyTokenizer() { public AnyCpdLexer() {
this(DEFAULT_PATTERN, ""); this(DEFAULT_PATTERN, "");
} }
public AnyTokenizer(String eolCommentStart) { public AnyCpdLexer(String eolCommentStart) {
this(makePattern(eolCommentStart), eolCommentStart); this(makePattern(eolCommentStart), eolCommentStart);
} }
private AnyTokenizer(Pattern pattern, String commentStart) { private AnyCpdLexer(Pattern pattern, String commentStart) {
this.pattern = pattern; this.pattern = pattern;
this.commentStart = commentStart; this.commentStart = commentStart;
} }

@ -137,10 +137,10 @@ public final class CpdAnalysis implements AutoCloseable {
this.listener = cpdListener; this.listener = cpdListener;
} }
private int doTokenize(TextDocument document, Tokenizer tokenizer, Tokens tokens) throws IOException, LexException { private int doTokenize(TextDocument document, CpdLexer cpdLexer, Tokens tokens) throws IOException, LexException {
LOGGER.trace("Tokenizing {}", document.getFileId().getAbsolutePath()); LOGGER.trace("Tokenizing {}", document.getFileId().getAbsolutePath());
int lastTokenSize = tokens.size(); int lastTokenSize = tokens.size();
Tokenizer.tokenize(tokenizer, document, tokens); CpdLexer.tokenize(cpdLexer, document, tokens);
return tokens.size() - lastTokenSize - 1; /* EOF */ return tokens.size() - lastTokenSize - 1; /* EOF */
} }
@ -152,12 +152,12 @@ public final class CpdAnalysis implements AutoCloseable {
public void performAnalysis(Consumer<CPDReport> consumer) { public void performAnalysis(Consumer<CPDReport> consumer) {
try (SourceManager sourceManager = new SourceManager(files.getCollectedFiles())) { try (SourceManager sourceManager = new SourceManager(files.getCollectedFiles())) {
Map<Language, Tokenizer> tokenizers = Map<Language, CpdLexer> tokenizers =
sourceManager.getTextFiles().stream() sourceManager.getTextFiles().stream()
.map(it -> it.getLanguageVersion().getLanguage()) .map(it -> it.getLanguageVersion().getLanguage())
.distinct() .distinct()
.filter(it -> it instanceof CpdCapableLanguage) .filter(it -> it instanceof CpdCapableLanguage)
.collect(Collectors.toMap(lang -> lang, lang -> ((CpdCapableLanguage) lang).createCpdTokenizer(configuration.getLanguageProperties(lang)))); .collect(Collectors.toMap(lang -> lang, lang -> ((CpdCapableLanguage) lang).createCpdLexer(configuration.getLanguageProperties(lang))));
Map<FileId, Integer> numberOfTokensPerFile = new HashMap<>(); Map<FileId, Integer> numberOfTokensPerFile = new HashMap<>();

@ -16,7 +16,7 @@ public interface CpdCapableLanguage extends Language {
/** /**
* Create a new {@link Tokenizer} for this language, given * Create a new {@link CpdLexer} for this language, given
* a property bundle with configuration. The bundle was created by * a property bundle with configuration. The bundle was created by
* this instance using {@link #newPropertyBundle()}. It can be assumed * this instance using {@link #newPropertyBundle()}. It can be assumed
* that the bundle will never be mutated anymore, and this method * that the bundle will never be mutated anymore, and this method
@ -26,7 +26,7 @@ public interface CpdCapableLanguage extends Language {
* *
* @return A new language processor * @return A new language processor
*/ */
default Tokenizer createCpdTokenizer(LanguagePropertyBundle bundle) { default CpdLexer createCpdLexer(LanguagePropertyBundle bundle) {
return new AnyTokenizer(); return new AnyCpdLexer();
} }
} }

@ -10,8 +10,10 @@ import net.sourceforge.pmd.lang.document.TextDocument;
/** /**
* Tokenizes a source file into tokens consumable by CPD. * Tokenizes a source file into tokens consumable by CPD.
*
* <p>Note: This interface has been called Tokenizer in PMD 6.</p>
*/ */
public interface Tokenizer { public interface CpdLexer {
/** /**
* Tokenize the source code and record tokens using the provided token factory. * Tokenize the source code and record tokens using the provided token factory.
@ -22,9 +24,9 @@ public interface Tokenizer {
* Wraps a call to {@link #tokenize(TextDocument, TokenFactory)} to properly * Wraps a call to {@link #tokenize(TextDocument, TokenFactory)} to properly
* create and close the token factory. * create and close the token factory.
*/ */
static void tokenize(Tokenizer tokenizer, TextDocument textDocument, Tokens tokens) throws IOException { static void tokenize(CpdLexer cpdLexer, TextDocument textDocument, Tokens tokens) throws IOException {
try (TokenFactory tf = Tokens.factoryForFile(textDocument, tokens)) { try (TokenFactory tf = Tokens.factoryForFile(textDocument, tokens)) {
tokenizer.tokenize(textDocument, tf); cpdLexer.tokenize(textDocument, tf);
} }
} }
} }
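
The static `tokenize` helper above wraps the instance method in a try-with-resources block so the token factory is always closed, and closing is what adds the EOF token. A self-contained sketch of that shape, using hypothetical stand-in types (`SketchTokenFactory`, `SketchLexer`, `WhitespaceLexerSketch`) and plain strings instead of `TextDocument` and `Tokens`:

```java
import java.util.ArrayList;
import java.util.List;

// Simplified stand-in for a token factory: records images, appends EOF on close.
final class SketchTokenFactory implements AutoCloseable {
    private final List<String> sink;

    SketchTokenFactory(List<String> sink) {
        this.sink = sink;
    }

    void record(String image) {
        sink.add(image);
    }

    @Override
    public void close() {
        sink.add("<EOF>"); // end-of-file marker, added once per processed source
    }
}

// Simplified stand-in for the lexer interface.
interface SketchLexer {
    void tokenize(String source, SketchTokenFactory factory);

    // Static wrapper: creates the factory, runs the lexer, always closes the factory.
    static void tokenize(SketchLexer lexer, String source, List<String> tokens) {
        try (SketchTokenFactory factory = new SketchTokenFactory(tokens)) {
            lexer.tokenize(source, factory);
        }
    }
}

// Toy implementation that splits on whitespace; real lexers are language-specific.
final class WhitespaceLexerSketch implements SketchLexer {
    @Override
    public void tokenize(String source, SketchTokenFactory factory) {
        for (String word : source.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                factory.record(word);
            }
        }
    }
}
```

Under these assumptions the list grows by the number of real tokens plus one EOF entry, which would explain why `doTokenize` in `CpdAnalysis` subtracts 1 when it computes the per-file token count.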

@ -142,8 +142,8 @@ public class GUI implements CPDListener {
.extensions(extension) .extensions(extension)
.name("By extension...")) { .name("By extension...")) {
@Override @Override
public Tokenizer createCpdTokenizer(LanguagePropertyBundle bundle) { public CpdLexer createCpdLexer(LanguagePropertyBundle bundle) {
return new AnyTokenizer(); return new AnyCpdLexer();
} }
}; };
} }

@ -12,7 +12,7 @@ import net.sourceforge.pmd.lang.document.FileLocation;
import net.sourceforge.pmd.lang.document.TextDocument; import net.sourceforge.pmd.lang.document.TextDocument;
/** /**
* Proxy to record tokens from within {@link Tokenizer#tokenize(TextDocument, TokenFactory)}. * Proxy to record tokens from within {@link CpdLexer#tokenize(TextDocument, TokenFactory)}.
*/ */
public interface TokenFactory extends AutoCloseable { public interface TokenFactory extends AutoCloseable {
@ -57,7 +57,7 @@ public interface TokenFactory extends AutoCloseable {
/** /**
* This adds the EOF token, it must be called when * This adds the EOF token, it must be called when
* {@link Tokenizer#tokenize(TextDocument, TokenFactory)} is done. * {@link CpdLexer#tokenize(TextDocument, TokenFactory)} is done.
*/ */
@Override @Override
void close(); void close();

@ -93,7 +93,7 @@ public class Tokens {
/** /**
* Creates a token factory to process the given file with * Creates a token factory to process the given file with
* {@link Tokenizer#tokenize(TextDocument, TokenFactory)}. * {@link CpdLexer#tokenize(TextDocument, TokenFactory)}.
* Tokens are accumulated in the {@link Tokens} parameter. * Tokens are accumulated in the {@link Tokens} parameter.
* *
* @param file Document for the file to process * @param file Document for the file to process

@ -10,16 +10,16 @@ import org.antlr.v4.runtime.CharStream;
import org.antlr.v4.runtime.CharStreams; import org.antlr.v4.runtime.CharStreams;
import org.antlr.v4.runtime.Lexer; import org.antlr.v4.runtime.Lexer;
import net.sourceforge.pmd.cpd.Tokenizer; import net.sourceforge.pmd.cpd.CpdLexer;
import net.sourceforge.pmd.lang.TokenManager; import net.sourceforge.pmd.lang.TokenManager;
import net.sourceforge.pmd.lang.ast.impl.antlr4.AntlrToken; import net.sourceforge.pmd.lang.ast.impl.antlr4.AntlrToken;
import net.sourceforge.pmd.lang.ast.impl.antlr4.AntlrTokenManager; import net.sourceforge.pmd.lang.ast.impl.antlr4.AntlrTokenManager;
import net.sourceforge.pmd.lang.document.TextDocument; import net.sourceforge.pmd.lang.document.TextDocument;
/** /**
* Generic implementation of a {@link Tokenizer} useful to any Antlr grammar. * Generic implementation of a {@link CpdLexer} useful to any Antlr grammar.
*/ */
public abstract class AntlrTokenizer extends TokenizerBase<AntlrToken> { public abstract class AntlrCpdLexer extends CpdLexerBase<AntlrToken> {
@Override @Override
protected final TokenManager<AntlrToken> makeLexerImpl(TextDocument doc) throws IOException { protected final TokenManager<AntlrToken> makeLexerImpl(TextDocument doc) throws IOException {
CharStream charStream = CharStreams.fromReader(doc.newReader(), doc.getFileId().getAbsolutePath()); CharStream charStream = CharStreams.fromReader(doc.newReader(), doc.getFileId().getAbsolutePath());

@ -6,16 +6,16 @@ package net.sourceforge.pmd.cpd.impl;
import java.io.IOException; import java.io.IOException;
import net.sourceforge.pmd.cpd.CpdLexer;
import net.sourceforge.pmd.cpd.TokenFactory; import net.sourceforge.pmd.cpd.TokenFactory;
import net.sourceforge.pmd.cpd.Tokenizer;
import net.sourceforge.pmd.lang.TokenManager; import net.sourceforge.pmd.lang.TokenManager;
import net.sourceforge.pmd.lang.ast.GenericToken; import net.sourceforge.pmd.lang.ast.GenericToken;
import net.sourceforge.pmd.lang.document.TextDocument; import net.sourceforge.pmd.lang.document.TextDocument;
/** /**
* Generic base class for a {@link Tokenizer}. * Generic base class for a {@link CpdLexer}.
*/ */
public abstract class TokenizerBase<T extends GenericToken<T>> implements Tokenizer { public abstract class CpdLexerBase<T extends GenericToken<T>> implements CpdLexer {
protected abstract TokenManager<T> makeLexerImpl(TextDocument doc) throws IOException; protected abstract TokenManager<T> makeLexerImpl(TextDocument doc) throws IOException;

@ -1,15 +0,0 @@
/**
* BSD-style license; for more info see http://pmd.sourceforge.net/license.html
*/
package net.sourceforge.pmd.cpd.impl;
import net.sourceforge.pmd.cpd.Tokenizer;
import net.sourceforge.pmd.lang.ast.impl.javacc.JavaccToken;
/**
* Base class for a {@link Tokenizer} for a language implemented by a JavaCC tokenizer.
*/
public abstract class JavaCCTokenizer extends TokenizerBase<JavaccToken> {
}

@ -0,0 +1,15 @@
/**
* BSD-style license; for more info see http://pmd.sourceforge.net/license.html
*/
package net.sourceforge.pmd.cpd.impl;
import net.sourceforge.pmd.cpd.CpdLexer;
import net.sourceforge.pmd.lang.ast.impl.javacc.JavaccToken;
/**
* Base class for a {@link CpdLexer} for a language implemented by a JavaCC tokenizer.
*/
public abstract class JavaccCpdLexer extends CpdLexerBase<JavaccToken> {
}

@ -3,6 +3,6 @@
*/ */
/** /**
* Utilities to implement a CPD {@link net.sourceforge.pmd.cpd.Tokenizer}. * Utilities to implement a CPD {@link net.sourceforge.pmd.cpd.CpdLexer}.
*/ */
package net.sourceforge.pmd.cpd.impl; package net.sourceforge.pmd.cpd.impl;

@ -6,6 +6,6 @@
* Token-based copy-paste detection. * Token-based copy-paste detection.
* *
* @see net.sourceforge.pmd.cpd.CpdAnalysis * @see net.sourceforge.pmd.cpd.CpdAnalysis
* @see net.sourceforge.pmd.cpd.Tokenizer * @see net.sourceforge.pmd.cpd.CpdLexer
*/ */
package net.sourceforge.pmd.cpd; package net.sourceforge.pmd.cpd;

@ -5,9 +5,9 @@
package net.sourceforge.pmd.lang; package net.sourceforge.pmd.lang;
import net.sourceforge.pmd.annotation.Experimental; import net.sourceforge.pmd.annotation.Experimental;
import net.sourceforge.pmd.cpd.AnyTokenizer; import net.sourceforge.pmd.cpd.AnyCpdLexer;
import net.sourceforge.pmd.cpd.CpdCapableLanguage; import net.sourceforge.pmd.cpd.CpdCapableLanguage;
import net.sourceforge.pmd.cpd.Tokenizer; import net.sourceforge.pmd.cpd.CpdLexer;
import net.sourceforge.pmd.lang.ast.AstInfo; import net.sourceforge.pmd.lang.ast.AstInfo;
import net.sourceforge.pmd.lang.ast.Parser; import net.sourceforge.pmd.lang.ast.Parser;
import net.sourceforge.pmd.lang.ast.Parser.ParserTask; import net.sourceforge.pmd.lang.ast.Parser.ParserTask;
@ -47,8 +47,8 @@ public final class PlainTextLanguage extends SimpleLanguageModuleBase implements
} }
@Override @Override
public Tokenizer createCpdTokenizer(LanguagePropertyBundle bundle) { public CpdLexer createCpdLexer(LanguagePropertyBundle bundle) {
return new AnyTokenizer(); return new AnyCpdLexer();
} }
private static final class TextLvh implements LanguageVersionHandler { private static final class TextLvh implements LanguageVersionHandler {

@ -10,7 +10,7 @@ import java.util.List;
import org.checkerframework.checker.nullness.qual.NonNull; import org.checkerframework.checker.nullness.qual.NonNull;
import org.checkerframework.checker.nullness.qual.Nullable; import org.checkerframework.checker.nullness.qual.Nullable;
import net.sourceforge.pmd.cpd.impl.JavaCCTokenizer; import net.sourceforge.pmd.cpd.impl.JavaccCpdLexer;
import net.sourceforge.pmd.lang.ast.impl.TokenDocument; import net.sourceforge.pmd.lang.ast.impl.TokenDocument;
import net.sourceforge.pmd.lang.document.TextDocument; import net.sourceforge.pmd.lang.document.TextDocument;
@ -18,7 +18,7 @@ import net.sourceforge.pmd.lang.document.TextDocument;
* Token document for Javacc implementations. This is a helper object * Token document for Javacc implementations. This is a helper object
* for generated token managers. Note: the extension point is a custom * for generated token managers. Note: the extension point is a custom
* implementation of {@link TokenDocumentBehavior}, see {@link JjtreeParserAdapter#tokenBehavior()}, * implementation of {@link TokenDocumentBehavior}, see {@link JjtreeParserAdapter#tokenBehavior()},
* {@link JavaCCTokenizer#tokenBehavior()} * {@link JavaccCpdLexer#tokenBehavior()}
*/ */
public final class JavaccTokenDocument extends TokenDocument<JavaccToken> { public final class JavaccTokenDocument extends TokenDocument<JavaccToken> {

@ -5,7 +5,7 @@
package net.sourceforge.pmd.lang.impl; package net.sourceforge.pmd.lang.impl;
import net.sourceforge.pmd.cpd.CpdCapableLanguage; import net.sourceforge.pmd.cpd.CpdCapableLanguage;
import net.sourceforge.pmd.cpd.Tokenizer; import net.sourceforge.pmd.cpd.CpdLexer;
import net.sourceforge.pmd.lang.LanguageModuleBase; import net.sourceforge.pmd.lang.LanguageModuleBase;
import net.sourceforge.pmd.lang.LanguagePropertyBundle; import net.sourceforge.pmd.lang.LanguagePropertyBundle;
@ -27,5 +27,5 @@ public abstract class CpdOnlyLanguageModuleBase extends LanguageModuleBase imple
} }
@Override @Override
public abstract Tokenizer createCpdTokenizer(LanguagePropertyBundle bundle); public abstract CpdLexer createCpdLexer(LanguagePropertyBundle bundle);
} }

Some files were not shown because too many files have changed in this diff