Merge pull request #4797 from adangel:lexexception-cpdlexer
[core] Rename TokenMgrError to LexException, Tokenizer to CpdLexer #4797
This commit is contained in: commit fa97cff7ff
@@ -119,15 +119,15 @@ definitely don't come for free. It is much effort and requires perseverance to i

 ### 5. Create a TokenManager

 * This is needed to support CPD (copy paste detection)
-* We provide a default implementation using [`AntlrTokenManager`](https://github.com/pmd/pmd/blob/master/pmd-core/src/main/java/net/sourceforge/pmd/cpd/impl/AntlrTokenizer.java).
-* You must create your own "AntlrTokenizer" such as we do with
-  [`SwiftTokenizer`](https://github.com/pmd/pmd/blob/master/pmd-swift/src/main/java/net/sourceforge/pmd/lang/swift/cpd/SwiftTokenizer.java).
+* We provide a default implementation using [`AntlrTokenManager`](https://github.com/pmd/pmd/blob/master/pmd-core/src/main/java/net/sourceforge/pmd/lang/ast/impl/antlr4/AntlrTokenManager.java).
+* You must create your own "AntlrCpdLexer" such as we do with
+  [`SwiftCpdLexer`](https://github.com/pmd/pmd/blob/master/pmd-swift/src/main/java/net/sourceforge/pmd/lang/swift/cpd/SwiftCpdLexer.java).
 * If you wish to filter specific tokens (e.g. comments to support CPD suppression via "CPD-OFF" and "CPD-ON")
   you can create your own implementation of
   [`AntlrTokenFilter`](https://github.com/pmd/pmd/blob/master/pmd-core/src/main/java/net/sourceforge/pmd/cpd/impl/AntlrTokenFilter.java).
   You'll then need to override the protected method `getTokenFilter(AntlrTokenManager)`
-  and return your custom filter. See the tokenizer for C# as an example:
-  [`CsTokenizer`](https://github.com/pmd/pmd/blob/master/pmd-cs/src/main/java/net/sourceforge/pmd/lang/cs/cpd/CsTokenizer.java).
+  and return your custom filter. See the CpdLexer for C# as an example:
+  [`CsCpdLexer`](https://github.com/pmd/pmd/blob/master/pmd-cs/src/main/java/net/sourceforge/pmd/lang/cs/cpd/CsCpdLexer.java).

   If you don't need a custom token filter, you don't need to override the method. It returns the default
   `AntlrTokenFilter` which doesn't filter anything.
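As an aside: the real `AntlrTokenFilter` lives in pmd-core, but the suppression idea it implements can be illustrated standalone. The class and method names below are hypothetical and deliberately simplified, not PMD's API; tokens between a comment containing "CPD-OFF" and one containing "CPD-ON" are discarded.

```java
import java.util.ArrayList;
import java.util.List;

public class CpdSuppressionSketch {

    /**
     * Keeps only tokens outside CPD-OFF/CPD-ON regions.
     * Comment tokens themselves are never emitted, mirroring how
     * CPD filters out comments before duplicate detection.
     */
    static List<String> filter(List<String> tokens) {
        List<String> result = new ArrayList<>();
        boolean discarding = false;
        for (String token : tokens) {
            boolean isComment = token.startsWith("//");
            if (isComment && token.contains("CPD-OFF")) {
                discarding = true;          // start of suppressed region
            } else if (isComment && token.contains("CPD-ON")) {
                discarding = false;         // end of suppressed region
            } else if (!discarding && !isComment) {
                result.add(token);          // regular token, kept
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> tokens = List.of("a", "// CPD-OFF", "b", "// CPD-ON", "c");
        System.out.println(filter(tokens)); // [a, c]
    }
}
```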
@@ -11,7 +11,7 @@ author: Matías Fraga, Clément Fournier

 ## Adding support for a CPD language

 CPD works generically on the tokens produced by a {% jdoc core::cpd.Tokenizer %}.
-To add support for a new language, the crucial piece is writing a tokenizer that
+To add support for a new language, the crucial piece is writing a CpdLexer that
 splits the source file into the tokens specific to your language. Thankfully you
 can use a stock [Antlr grammar](https://github.com/antlr/grammars-v4) or JavaCC
 grammar to generate a lexer for you. If you cannot use a lexer generator, for
@@ -31,12 +31,12 @@ Use the following guide to set up a new language module that supports CPD.

 the lexer from the grammar. To do so, edit `pom.xml` (eg like [the Golang module](https://github.com/pmd/pmd/tree/master/pmd-go/pom.xml)).
 Once that is done, `mvn generate-sources` should generate the lexer sources for you.

-You can now implement a tokenizer, for instance by extending {% jdoc core::cpd.impl.AntlrTokenizer %}. The following reproduces the Go implementation:
+You can now implement a CpdLexer, for instance by extending {% jdoc core::cpd.impl.AntlrCpdLexer %}. The following reproduces the Go implementation:
 ```java
 // mind the package convention if you are going to make a PR
 package net.sourceforge.pmd.lang.go.cpd;

-public class GoTokenizer extends AntlrTokenizer {
+public class GoCpdLexer extends AntlrCpdLexer {

     @Override
     protected Lexer getLexerForSource(CharStream charStream) {
@@ -64,9 +64,9 @@ If your language only supports CPD, then you can subclass {% jdoc core::lang.imp
     }

     @Override
-    public Tokenizer createCpdTokenizer(LanguagePropertyBundle bundle) {
-        // This method should return an instance of the tokenizer you created.
-        return new GoTokenizer();
+    public Tokenizer createCpdLexer(LanguagePropertyBundle bundle) {
+        // This method should return an instance of the CpdLexer you created.
+        return new GoCpdLexer();
     }
 }
 ```
@@ -77,7 +77,7 @@ If your language only supports CPD, then you can subclass {% jdoc core::lang.imp

 4. Update the test that asserts the list of supported languages by updating the `SUPPORTED_LANGUAGES` constant in [BinaryDistributionIT](https://github.com/pmd/pmd/blob/master/pmd-dist/src/test/java/net/sourceforge/pmd/it/BinaryDistributionIT.java).

-5. Add some tests for your tokenizer by following the [section below](#testing-your-implementation).
+5. Add some tests for your CpdLexer by following the [section below](#testing-your-implementation).

 6. Finish up your new language module by adding a page in the documentation. Create a new markdown file
    `<langId>.md` in `docs/pages/pmd/languages/`. This file should have the following frontmatter:
@@ -100,10 +100,10 @@ If your language only supports CPD, then you can subclass {% jdoc core::lang.imp
 {% endraw %}
 ```

-### Declaring tokenizer options
+### Declaring CpdLexer options

-To make the tokenizer configurable, first define some property descriptors using
-{% jdoc core::properties.PropertyFactory %}. Look at {% jdoc core::cpd.Tokenizer %}
+To make the CpdLexer configurable, first define some property descriptors using
+{% jdoc core::properties.PropertyFactory %}. Look at {% jdoc core::cpd.CpdLexer %}
 for some predefined ones which you can reuse (prefer reusing property descriptors if you can).
 You need to override {% jdoc core::Language#newPropertyBundle() %}
 and call `definePropertyDescriptor` to register the descriptors.
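A minimal sketch of this registration could look as follows. It assumes PMD 7's builder-style `PropertyFactory` API; the property name `cpdIgnoreFoo` and the module class are made up for illustration, and the constructor boilerplate of a real language module is omitted.

```java
import net.sourceforge.pmd.lang.LanguagePropertyBundle;
import net.sourceforge.pmd.lang.impl.CpdOnlyLanguageModuleBase;
import net.sourceforge.pmd.properties.PropertyDescriptor;
import net.sourceforge.pmd.properties.PropertyFactory;

public class MyLanguageModule extends CpdOnlyLanguageModuleBase {

    // A made-up CPD option, defined once as a constant so the
    // CpdLexer can read it back from the bundle later.
    static final PropertyDescriptor<Boolean> CPD_IGNORE_FOO =
        PropertyFactory.booleanProperty("cpdIgnoreFoo")
                       .defaultValue(false)
                       .desc("Ignore foo tokens when finding duplicates")
                       .build();

    // Constructor with LanguageMetadata omitted for brevity.

    @Override
    public LanguagePropertyBundle newPropertyBundle() {
        LanguagePropertyBundle bundle = super.newPropertyBundle();
        bundle.definePropertyDescriptor(CPD_IGNORE_FOO);
        return bundle;
    }
}
```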
@@ -112,13 +112,13 @@ of {% jdoc core::cpd.CpdCapableLanguage#createCpdTokenizer(core::lang.LanguagePr

 To implement simple token filtering, you can use {% jdoc core::cpd.impl.BaseTokenFilter %}
 as a base class, or another base class in {% jdoc_package core::cpd.impl %}.
-Take a look at the [Kotlin token filter implementation](https://github.com/pmd/pmd/blob/master/pmd-kotlin/src/main/java/net/sourceforge/pmd/lang/kotlin/cpd/KotlinTokenizer.java), or the [Java one](https://github.com/pmd/pmd/blob/master/pmd-java/src/main/java/net/sourceforge/pmd/lang/java/cpd/JavaTokenizer.java).
+Take a look at the [Kotlin token filter implementation](https://github.com/pmd/pmd/blob/master/pmd-kotlin/src/main/java/net/sourceforge/pmd/lang/kotlin/cpd/KotlinCpdLexer.java), or the [Java one](https://github.com/pmd/pmd/blob/master/pmd-java/src/main/java/net/sourceforge/pmd/lang/java/cpd/JavaCpdLexer.java).


 ### Testing your implementation

 Add a Maven dependency on `pmd-lang-test` (scope `test`) in your `pom.xml`.
-This contains utilities to test your tokenizer.
+This contains utilities to test your CpdLexer.

 Create a test class extending from {% jdoc lang-test::cpd.test.CpdTextComparisonTest %}.
 To add tests, you need to write regular JUnit `@Test`-annotated methods, and
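Such a test class is small. The shape below mirrors the Apex test renamed elsewhere in this diff; the module name, file extension, and the `testSimple`/`simple` test-data names are illustrative stand-ins.

```java
import org.junit.jupiter.api.Test;

import net.sourceforge.pmd.cpd.test.CpdTextComparisonTest;

class MyCpdLexerTest extends CpdTextComparisonTest {

    MyCpdLexerTest() {
        // Language module under test and the extension of its test sources.
        super(MyLanguageModule.getInstance(), ".mylang");
    }

    @Test
    void testSimple() {
        // Tokenizes a sample file from the test resources and compares
        // the token dump against a checked-in expected file.
        doTest("simple");
    }
}
```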
@@ -24,7 +24,7 @@ PropertyName is the name of the property converted to SCREAMING_SNAKE_CASE, that

 As a convention, properties whose name start with an *x* are internal and may be removed or changed without notice.

-Properties whose name start with **CPD** are used to configure CPD tokenizer options.
+Properties whose name start with **CPD** are used to configure CPD CpdLexer options.

 Programmatically, the language properties can be set on `PMDConfiguration` (or `CPDConfiguration`) before using the
 {%jdoc core::PmdAnalyzer %} (or {%jdoc core::cpd.CpdAnalyzer %}) instance
@@ -166,6 +166,7 @@ The rules have been moved into categories with PMD 6.
     * [#4723](https://github.com/pmd/pmd/issues/4723): \[cli] Launch fails for "bash pmd"
 * core
     * [#1027](https://github.com/pmd/pmd/issues/1027): \[core] Apply the new PropertyDescriptor<Pattern> type where applicable
+    * [#4065](https://github.com/pmd/pmd/issues/4065): \[core] Rename TokenMgrError to LexException, Tokenizer to CpdLexer
     * [#4313](https://github.com/pmd/pmd/issues/4313): \[core] Remove support for <lang>-<ruleset> hyphen notation for ruleset references
     * [#4314](https://github.com/pmd/pmd/issues/4314): \[core] Remove ruleset compatibility filter (RuleSetFactoryCompatibility) and CLI option `--no-ruleset-compatibility`
     * [#4378](https://github.com/pmd/pmd/issues/4378): \[core] Ruleset loading processes commented rules
@@ -271,6 +272,15 @@ The following previously deprecated classes have been removed:
 * The node `ASTClassOrInterfaceBody` has been renamed to {% jdoc java::lang.ast.ASTClassBody %}. XPath rules
   need to be adjusted.

+**Renamed classes and methods**
+
+* pmd-core
+    * {%jdoc_old core::lang.ast.TokenMgrError %} has been renamed to {% jdoc core::lang.ast.LexException %}
+    * {%jdoc_old core::cpd.Tokenizer %} has been renamed to {% jdoc core::cpd.CpdLexer %}. Along with this rename,
+      all the implementations have been renamed as well (`Tokenizer` -> `CpdLexer`), e.g. "CppCpdLexer", "JavaCpdLexer".
+      This affects all language modules.
+    * {%jdoc_old core::cpd.AnyTokenizer %} has been renamed to {% jdoc core::cpd.AnyCpdLexer %}.
+
 **Removed functionality**

 * The CLI parameter `--no-ruleset-compatibility` has been removed. It was only used to allow loading
@@ -684,6 +694,7 @@ See also [Detailed Release Notes for PMD 7]({{ baseurl }}pmd_release_notes_pmd7.
 * [#3919](https://github.com/pmd/pmd/issues/3919): \[core] Merge CPD and PMD language
 * [#3922](https://github.com/pmd/pmd/pull/3922): \[core] Better error reporting for the ruleset parser
 * [#4035](https://github.com/pmd/pmd/issues/4035): \[core] ConcurrentModificationException in DefaultRuleViolationFactory
+* [#4065](https://github.com/pmd/pmd/issues/4065): \[core] Rename TokenMgrError to LexException, Tokenizer to CpdLexer
 * [#4120](https://github.com/pmd/pmd/issues/4120): \[core] Explicitly name all language versions
 * [#4204](https://github.com/pmd/pmd/issues/4204): \[core] Provide a CpdAnalysis class as a programmatic entry point into CPD
 * [#4301](https://github.com/pmd/pmd/issues/4301): \[core] Remove deprecated property concrete classes
@@ -280,6 +280,13 @@
       <file name="${tokenmgr-file}" />
     </replaceregexp>

+    <!-- Use own LexException instead of JavaCC's TokenMgrError -->
+    <replaceregexp>
+      <regexp pattern='throw new TokenMgrError\(EOFSeen' />
+      <substitution expression='throw new net.sourceforge.pmd.lang.ast.LexException(EOFSeen' />
+      <file name="${tokenmgr-file}" />
+    </replaceregexp>
+
     <!-- Useless argument, also replace lex state ID with its name -->
     <replaceregexp>
       <regexp pattern='curLexState, error_line, error_column, error_after, curChar, TokenMgrError.LEXICAL_ERROR\)' />
@@ -5,12 +5,12 @@
 package net.sourceforge.pmd.lang.apex;

 import net.sourceforge.pmd.cpd.CpdCapableLanguage;
-import net.sourceforge.pmd.cpd.Tokenizer;
+import net.sourceforge.pmd.cpd.CpdLexer;
 import net.sourceforge.pmd.lang.LanguageModuleBase;
 import net.sourceforge.pmd.lang.LanguageProcessor;
 import net.sourceforge.pmd.lang.LanguagePropertyBundle;
 import net.sourceforge.pmd.lang.PmdCapableLanguage;
-import net.sourceforge.pmd.lang.apex.cpd.ApexTokenizer;
+import net.sourceforge.pmd.lang.apex.cpd.ApexCpdLexer;

 public class ApexLanguageModule extends LanguageModuleBase implements PmdCapableLanguage, CpdCapableLanguage {
     private static final String ID = "apex";
@@ -47,7 +47,7 @@ public class ApexLanguageModule extends LanguageModuleBase implements PmdCapable
     }

     @Override
-    public Tokenizer createCpdTokenizer(LanguagePropertyBundle bundle) {
-        return new ApexTokenizer();
+    public CpdLexer createCpdLexer(LanguagePropertyBundle bundle) {
+        return new ApexCpdLexer();
     }
 }
@@ -12,16 +12,16 @@ import org.antlr.runtime.ANTLRStringStream;
 import org.antlr.runtime.Lexer;
 import org.antlr.runtime.Token;

+import net.sourceforge.pmd.cpd.CpdLexer;
 import net.sourceforge.pmd.cpd.TokenFactory;
-import net.sourceforge.pmd.cpd.Tokenizer;
 import net.sourceforge.pmd.lang.apex.ApexJorjeLogging;
 import net.sourceforge.pmd.lang.document.TextDocument;

 import apex.jorje.parser.impl.ApexLexer;

-public class ApexTokenizer implements Tokenizer {
+public class ApexCpdLexer implements CpdLexer {

-    public ApexTokenizer() {
+    public ApexCpdLexer() {
         ApexJorjeLogging.disableLogging();
     }
@@ -9,9 +9,9 @@ import org.junit.jupiter.api.Test;
 import net.sourceforge.pmd.cpd.test.CpdTextComparisonTest;
 import net.sourceforge.pmd.lang.apex.ApexLanguageModule;

-class ApexTokenizerTest extends CpdTextComparisonTest {
+class ApexCpdLexerTest extends CpdTextComparisonTest {

-    ApexTokenizerTest() {
+    ApexCpdLexerTest() {
         super(ApexLanguageModule.getInstance(), ".cls");
     }
@@ -4,10 +4,10 @@

 package net.sourceforge.pmd.lang.coco;

-import net.sourceforge.pmd.cpd.Tokenizer;
+import net.sourceforge.pmd.cpd.CpdLexer;
 import net.sourceforge.pmd.lang.LanguagePropertyBundle;
 import net.sourceforge.pmd.lang.LanguageRegistry;
-import net.sourceforge.pmd.lang.coco.cpd.CocoTokenizer;
+import net.sourceforge.pmd.lang.coco.cpd.CocoCpdLexer;
 import net.sourceforge.pmd.lang.impl.CpdOnlyLanguageModuleBase;

 /**
@@ -25,7 +25,7 @@ public class CocoLanguageModule extends CpdOnlyLanguageModuleBase {
     }

     @Override
-    public Tokenizer createCpdTokenizer(LanguagePropertyBundle bundle) {
-        return new CocoTokenizer();
+    public CpdLexer createCpdLexer(LanguagePropertyBundle bundle) {
+        return new CocoCpdLexer();
     }
 }
@@ -7,13 +7,13 @@ package net.sourceforge.pmd.lang.coco.cpd;
 import org.antlr.v4.runtime.CharStream;
 import org.antlr.v4.runtime.Lexer;

-import net.sourceforge.pmd.cpd.impl.AntlrTokenizer;
+import net.sourceforge.pmd.cpd.impl.AntlrCpdLexer;
 import net.sourceforge.pmd.lang.coco.ast.CocoLexer;

 /**
  * The Coco Tokenizer.
  */
-public class CocoTokenizer extends AntlrTokenizer {
+public class CocoCpdLexer extends AntlrCpdLexer {

     @Override
     protected Lexer getLexerForSource(CharStream charStream) {
@@ -9,8 +9,8 @@ import org.junit.jupiter.api.Test;
 import net.sourceforge.pmd.cpd.test.CpdTextComparisonTest;
 import net.sourceforge.pmd.lang.coco.CocoLanguageModule;

-class CocoTokenizerTest extends CpdTextComparisonTest {
-    CocoTokenizerTest() {
+class CocoCpdLexerTest extends CpdTextComparisonTest {
+    CocoCpdLexerTest() {
         super(CocoLanguageModule.getInstance(), ".coco");
     }
@@ -4,5 +4,5 @@

 package net.sourceforge.pmd.cpd;

-public class EcmascriptTokenizer extends net.sourceforge.pmd.lang.ecmascript.cpd.EcmascriptTokenizer {
+public class EcmascriptTokenizer extends net.sourceforge.pmd.lang.ecmascript.cpd.EcmascriptCpdLexer implements Tokenizer {
 }
@@ -4,5 +4,5 @@

 package net.sourceforge.pmd.cpd;

-public class JSPTokenizer extends net.sourceforge.pmd.lang.jsp.cpd.JSPTokenizer {
+public class JSPTokenizer extends net.sourceforge.pmd.lang.jsp.cpd.JspCpdLexer implements Tokenizer {
 }
@@ -9,7 +9,7 @@ import java.util.Properties;
 import net.sourceforge.pmd.lang.java.JavaLanguageModule;
 import net.sourceforge.pmd.lang.java.internal.JavaLanguageProperties;

-public class JavaTokenizer extends net.sourceforge.pmd.lang.java.cpd.JavaTokenizer {
+public class JavaTokenizer extends net.sourceforge.pmd.lang.java.cpd.JavaCpdLexer implements Tokenizer {
     public JavaTokenizer(Properties properties) {
         super(convertLanguageProperties(properties));
     }
@@ -0,0 +1,8 @@
+/**
+ * BSD-style license; for more info see http://pmd.sourceforge.net/license.html
+ */
+
+package net.sourceforge.pmd.cpd;
+
+public interface Tokenizer extends CpdLexer {
+}
@@ -20,9 +20,10 @@ import net.sourceforge.pmd.util.StringUtil;
 * Higher-quality lexers should be implemented with a lexer generator.
 *
 * <p>In PMD 7, this replaces AbstractTokenizer, which provided nearly
-* no more functionality.
+* no more functionality.</p>
+* <p>Note: This class has been called AnyTokenizer in PMD 6.</p>
 */
-public class AnyTokenizer implements Tokenizer {
+public class AnyCpdLexer implements CpdLexer {

     private static final Pattern DEFAULT_PATTERN = makePattern("");
@@ -40,15 +41,15 @@ public class AnyTokenizer implements Tokenizer {
     private final Pattern pattern;
     private final String commentStart;

-    public AnyTokenizer() {
+    public AnyCpdLexer() {
         this(DEFAULT_PATTERN, "");
     }

-    public AnyTokenizer(String eolCommentStart) {
+    public AnyCpdLexer(String eolCommentStart) {
         this(makePattern(eolCommentStart), eolCommentStart);
     }

-    private AnyTokenizer(Pattern pattern, String commentStart) {
+    private AnyCpdLexer(Pattern pattern, String commentStart) {
         this.pattern = pattern;
         this.commentStart = commentStart;
     }
@@ -23,7 +23,7 @@ import net.sourceforge.pmd.internal.util.IOUtil;
 import net.sourceforge.pmd.lang.Language;
 import net.sourceforge.pmd.lang.LanguagePropertyBundle;
 import net.sourceforge.pmd.lang.ast.FileAnalysisException;
-import net.sourceforge.pmd.lang.ast.TokenMgrError;
+import net.sourceforge.pmd.lang.ast.LexException;
 import net.sourceforge.pmd.lang.document.FileCollector;
 import net.sourceforge.pmd.lang.document.FileId;
 import net.sourceforge.pmd.lang.document.TextDocument;
@@ -137,10 +137,10 @@ public final class CpdAnalysis implements AutoCloseable {
         this.listener = cpdListener;
     }

-    private int doTokenize(TextDocument document, Tokenizer tokenizer, Tokens tokens) throws IOException, TokenMgrError {
+    private int doTokenize(TextDocument document, CpdLexer cpdLexer, Tokens tokens) throws IOException, LexException {
         LOGGER.trace("Tokenizing {}", document.getFileId().getAbsolutePath());
         int lastTokenSize = tokens.size();
-        Tokenizer.tokenize(tokenizer, document, tokens);
+        CpdLexer.tokenize(cpdLexer, document, tokens);
         return tokens.size() - lastTokenSize - 1; /* EOF */
     }
@@ -152,12 +152,12 @@ public final class CpdAnalysis implements AutoCloseable {
     public void performAnalysis(Consumer<CPDReport> consumer) {

         try (SourceManager sourceManager = new SourceManager(files.getCollectedFiles())) {
-            Map<Language, Tokenizer> tokenizers =
+            Map<Language, CpdLexer> tokenizers =
                 sourceManager.getTextFiles().stream()
                              .map(it -> it.getLanguageVersion().getLanguage())
                              .distinct()
                              .filter(it -> it instanceof CpdCapableLanguage)
-                             .collect(Collectors.toMap(lang -> lang, lang -> ((CpdCapableLanguage) lang).createCpdTokenizer(configuration.getLanguageProperties(lang))));
+                             .collect(Collectors.toMap(lang -> lang, lang -> ((CpdCapableLanguage) lang).createCpdLexer(configuration.getLanguageProperties(lang))));

             Map<FileId, Integer> numberOfTokensPerFile = new HashMap<>();
@@ -170,7 +170,7 @@ public final class CpdAnalysis implements AutoCloseable {
                 int newTokens = doTokenize(textDocument, tokenizers.get(textFile.getLanguageVersion().getLanguage()), tokens);
                 numberOfTokensPerFile.put(textDocument.getFileId(), newTokens);
                 listener.addedFile(1);
-            } catch (TokenMgrError | IOException e) {
+            } catch (LexException | IOException e) {
                 if (e instanceof FileAnalysisException) { // NOPMD
                     ((FileAnalysisException) e).setFileId(textFile.getFileId());
                 }
@@ -16,7 +16,7 @@ public interface CpdCapableLanguage extends Language {


    /**
-    * Create a new {@link Tokenizer} for this language, given
+    * Create a new {@link CpdLexer} for this language, given
     * a property bundle with configuration. The bundle was created by
     * this instance using {@link #newPropertyBundle()}. It can be assumed
     * that the bundle will never be mutated anymore, and this method
@@ -26,7 +26,7 @@ public interface CpdCapableLanguage extends Language {
     *
     * @return A new language processor
     */
-    default Tokenizer createCpdTokenizer(LanguagePropertyBundle bundle) {
-        return new AnyTokenizer();
+    default CpdLexer createCpdLexer(LanguagePropertyBundle bundle) {
+        return new AnyCpdLexer();
     }
 }
@@ -10,8 +10,10 @@ import net.sourceforge.pmd.lang.document.TextDocument;

 /**
  * Tokenizes a source file into tokens consumable by CPD.
+ *
+ * <p>Note: This interface has been called Tokenizer in PMD 6.</p>
  */
-public interface Tokenizer {
+public interface CpdLexer {

     /**
      * Tokenize the source code and record tokens using the provided token factory.
@@ -22,9 +24,9 @@ public interface Tokenizer {
     * Wraps a call to {@link #tokenize(TextDocument, TokenFactory)} to properly
     * create and close the token factory.
     */
-    static void tokenize(Tokenizer tokenizer, TextDocument textDocument, Tokens tokens) throws IOException {
+    static void tokenize(CpdLexer cpdLexer, TextDocument textDocument, Tokens tokens) throws IOException {
        try (TokenFactory tf = Tokens.factoryForFile(textDocument, tokens)) {
-            tokenizer.tokenize(textDocument, tf);
+            cpdLexer.tokenize(textDocument, tf);
        }
    }
 }
@@ -142,8 +142,8 @@ public class GUI implements CPDListener {
                 .extensions(extension)
                 .name("By extension...")) {
             @Override
-            public Tokenizer createCpdTokenizer(LanguagePropertyBundle bundle) {
-                return new AnyTokenizer();
+            public CpdLexer createCpdLexer(LanguagePropertyBundle bundle) {
+                return new AnyCpdLexer();
             }
         };
     }
@@ -7,12 +7,12 @@ package net.sourceforge.pmd.cpd;
 import org.checkerframework.checker.nullness.qual.NonNull;
 import org.checkerframework.checker.nullness.qual.Nullable;

-import net.sourceforge.pmd.lang.ast.TokenMgrError;
+import net.sourceforge.pmd.lang.ast.LexException;
 import net.sourceforge.pmd.lang.document.FileLocation;
 import net.sourceforge.pmd.lang.document.TextDocument;

 /**
- * Proxy to record tokens from within {@link Tokenizer#tokenize(TextDocument, TokenFactory)}.
+ * Proxy to record tokens from within {@link CpdLexer#tokenize(TextDocument, TokenFactory)}.
 */
 public interface TokenFactory extends AutoCloseable {
@@ -43,7 +43,7 @@ public interface TokenFactory extends AutoCloseable {
         recordToken(image, location.getStartLine(), location.getStartColumn(), location.getEndLine(), location.getEndColumn());
     }

-    TokenMgrError makeLexException(int line, int column, String message, @Nullable Throwable cause);
+    LexException makeLexException(int line, int column, String message, @Nullable Throwable cause);

    /**
     * Sets the image of an existing token entry.
@@ -57,7 +57,7 @@ public interface TokenFactory extends AutoCloseable {

    /**
     * This adds the EOF token, it must be called when
-    * {@link Tokenizer#tokenize(TextDocument, TokenFactory)} is done.
+    * {@link CpdLexer#tokenize(TextDocument, TokenFactory)} is done.
     */
    @Override
    void close();
@@ -14,7 +14,7 @@ import org.checkerframework.checker.nullness.qual.NonNull;
 import org.checkerframework.checker.nullness.qual.Nullable;

 import net.sourceforge.pmd.annotation.InternalApi;
-import net.sourceforge.pmd.lang.ast.TokenMgrError;
+import net.sourceforge.pmd.lang.ast.LexException;
 import net.sourceforge.pmd.lang.document.FileId;
 import net.sourceforge.pmd.lang.document.TextDocument;
@@ -93,7 +93,7 @@ public class Tokens {

    /**
     * Creates a token factory to process the given file with
-    * {@link Tokenizer#tokenize(TextDocument, TokenFactory)}.
+    * {@link CpdLexer#tokenize(TextDocument, TokenFactory)}.
     * Tokens are accumulated in the {@link Tokens} parameter.
     *
     * @param file Document for the file to process
@@ -117,8 +117,8 @@ public class Tokens {
    }

    @Override
-    public TokenMgrError makeLexException(int line, int column, String message, @Nullable Throwable cause) {
-        return new TokenMgrError(line, column, fileId, message, cause);
+    public LexException makeLexException(int line, int column, String message, @Nullable Throwable cause) {
+        return new LexException(line, column, fileId, message, cause);
    }

    @Override
@@ -10,16 +10,16 @@ import org.antlr.v4.runtime.CharStream;
 import org.antlr.v4.runtime.CharStreams;
 import org.antlr.v4.runtime.Lexer;

-import net.sourceforge.pmd.cpd.Tokenizer;
+import net.sourceforge.pmd.cpd.CpdLexer;
 import net.sourceforge.pmd.lang.TokenManager;
 import net.sourceforge.pmd.lang.ast.impl.antlr4.AntlrToken;
 import net.sourceforge.pmd.lang.ast.impl.antlr4.AntlrTokenManager;
 import net.sourceforge.pmd.lang.document.TextDocument;

 /**
- * Generic implementation of a {@link Tokenizer} useful to any Antlr grammar.
+ * Generic implementation of a {@link CpdLexer} useful to any Antlr grammar.
 */
-public abstract class AntlrTokenizer extends TokenizerBase<AntlrToken> {
+public abstract class AntlrCpdLexer extends CpdLexerBase<AntlrToken> {
     @Override
     protected final TokenManager<AntlrToken> makeLexerImpl(TextDocument doc) throws IOException {
         CharStream charStream = CharStreams.fromReader(doc.newReader(), doc.getFileId().getAbsolutePath());
@@ -6,16 +6,16 @@ package net.sourceforge.pmd.cpd.impl;

 import java.io.IOException;

+import net.sourceforge.pmd.cpd.CpdLexer;
 import net.sourceforge.pmd.cpd.TokenFactory;
-import net.sourceforge.pmd.cpd.Tokenizer;
 import net.sourceforge.pmd.lang.TokenManager;
 import net.sourceforge.pmd.lang.ast.GenericToken;
 import net.sourceforge.pmd.lang.document.TextDocument;

 /**
- * Generic base class for a {@link Tokenizer}.
+ * Generic base class for a {@link CpdLexer}.
 */
-public abstract class TokenizerBase<T extends GenericToken<T>> implements Tokenizer {
+public abstract class CpdLexerBase<T extends GenericToken<T>> implements CpdLexer {

     protected abstract TokenManager<T> makeLexerImpl(TextDocument doc) throws IOException;
@@ -1,15 +0,0 @@
-/**
- * BSD-style license; for more info see http://pmd.sourceforge.net/license.html
- */
-
-package net.sourceforge.pmd.cpd.impl;
-
-import net.sourceforge.pmd.cpd.Tokenizer;
-import net.sourceforge.pmd.lang.ast.impl.javacc.JavaccToken;
-
-/**
- * Base class for a {@link Tokenizer} for a language implemented by a JavaCC tokenizer.
- */
-public abstract class JavaCCTokenizer extends TokenizerBase<JavaccToken> {
-
-}
@@ -0,0 +1,15 @@
+/**
+ * BSD-style license; for more info see http://pmd.sourceforge.net/license.html
+ */
+
+package net.sourceforge.pmd.cpd.impl;
+
+import net.sourceforge.pmd.cpd.CpdLexer;
+import net.sourceforge.pmd.lang.ast.impl.javacc.JavaccToken;
+
+/**
+ * Base class for a {@link CpdLexer} for a language implemented by a JavaCC tokenizer.
+ */
+public abstract class JavaccCpdLexer extends CpdLexerBase<JavaccToken> {
+
+}
@@ -3,6 +3,6 @@
 */

/**
- * Utilities to implement a CPD {@link net.sourceforge.pmd.cpd.Tokenizer}.
+ * Utilities to implement a CPD {@link net.sourceforge.pmd.cpd.CpdLexer}.
 */
package net.sourceforge.pmd.cpd.impl;
@@ -6,6 +6,6 @@
 * Token-based copy-paste detection.
 *
 * @see net.sourceforge.pmd.cpd.CpdAnalysis
- * @see net.sourceforge.pmd.cpd.Tokenizer
+ * @see net.sourceforge.pmd.cpd.CpdLexer
 */
package net.sourceforge.pmd.cpd;
@@ -5,9 +5,9 @@
 package net.sourceforge.pmd.lang;

 import net.sourceforge.pmd.annotation.Experimental;
-import net.sourceforge.pmd.cpd.AnyTokenizer;
+import net.sourceforge.pmd.cpd.AnyCpdLexer;
 import net.sourceforge.pmd.cpd.CpdCapableLanguage;
-import net.sourceforge.pmd.cpd.Tokenizer;
+import net.sourceforge.pmd.cpd.CpdLexer;
 import net.sourceforge.pmd.lang.ast.AstInfo;
 import net.sourceforge.pmd.lang.ast.Parser;
 import net.sourceforge.pmd.lang.ast.Parser.ParserTask;
@@ -47,8 +47,8 @@ public final class PlainTextLanguage extends SimpleLanguageModuleBase implements
    }

    @Override
-    public Tokenizer createCpdTokenizer(LanguagePropertyBundle bundle) {
-        return new AnyTokenizer();
+    public CpdLexer createCpdLexer(LanguagePropertyBundle bundle) {
+        return new AnyCpdLexer();
    }

    private static final class TextLvh implements LanguageVersionHandler {
@@ -16,7 +16,7 @@ import net.sourceforge.pmd.lang.document.FileLocation;
/**
 * An exception that occurs while processing a file. Subtypes include
 * <ul>
- *   <li>{@link TokenMgrError}: lexical syntax errors
+ *   <li>{@link LexException}: lexical syntax errors
 *   <li>{@link ParseException}: syntax errors
 *   <li>{@link SemanticException}: exceptions occurring after the parsing
 *     phase, because the source code is semantically invalid
Some files were not shown because too many files have changed in this diff.