Merge pull request #4797 from adangel:lexexception-cpdlexer
[core] Rename TokenMgrError to LexException, Tokenizer to CpdLexer #4797
This commit is contained in: commit fa97cff7ff
@@ -119,15 +119,15 @@ definitely don't come for free. It is much effort and requires perseverance to i

 ### 5. Create a TokenManager

 * This is needed to support CPD (copy paste detection)
-* We provide a default implementation using [`AntlrTokenManager`](https://github.com/pmd/pmd/blob/master/pmd-core/src/main/java/net/sourceforge/pmd/cpd/impl/AntlrTokenizer.java).
-* You must create your own "AntlrTokenizer" such as we do with
-  [`SwiftTokenizer`](https://github.com/pmd/pmd/blob/master/pmd-swift/src/main/java/net/sourceforge/pmd/lang/swift/cpd/SwiftTokenizer.java).
+* We provide a default implementation using [`AntlrTokenManager`](https://github.com/pmd/pmd/blob/master/pmd-core/src/main/java/net/sourceforge/pmd/lang/ast/impl/antlr4/AntlrTokenManager.java).
+* You must create your own "AntlrCpdLexer" such as we do with
+  [`SwiftCpdLexer`](https://github.com/pmd/pmd/blob/master/pmd-swift/src/main/java/net/sourceforge/pmd/lang/swift/cpd/SwiftCpdLexer.java).
 * If you wish to filter specific tokens (e.g. comments to support CPD suppression via "CPD-OFF" and "CPD-ON")
   you can create your own implementation of
   [`AntlrTokenFilter`](https://github.com/pmd/pmd/blob/master/pmd-core/src/main/java/net/sourceforge/pmd/cpd/impl/AntlrTokenFilter.java).
   You'll then need to override the protected method `getTokenFilter(AntlrTokenManager)`
-  and return your custom filter. See the tokenizer for C# as an example:
-  [`CsTokenizer`](https://github.com/pmd/pmd/blob/master/pmd-cs/src/main/java/net/sourceforge/pmd/lang/cs/cpd/CsTokenizer.java).
+  and return your custom filter. See the CpdLexer for C# as an example:
+  [`CsCpdLexer`](https://github.com/pmd/pmd/blob/master/pmd-cs/src/main/java/net/sourceforge/pmd/lang/cs/cpd/CsCpdLexer.java).

   If you don't need a custom token filter, you don't need to override the method. It returns the default
   `AntlrTokenFilter` which doesn't filter anything.
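As an aside: the real `AntlrTokenFilter` lives in pmd-core, but the suppression idea it implements can be illustrated standalone. The class and method names below are hypothetical and deliberately simplified, not PMD's API; tokens between a comment containing "CPD-OFF" and one containing "CPD-ON" are discarded.

```java
import java.util.ArrayList;
import java.util.List;

public class CpdSuppressionSketch {

    /**
     * Keeps only tokens outside CPD-OFF/CPD-ON regions.
     * Comment tokens themselves are never emitted, mirroring how
     * CPD filters out comments before duplicate detection.
     */
    static List<String> filter(List<String> tokens) {
        List<String> result = new ArrayList<>();
        boolean discarding = false;
        for (String token : tokens) {
            boolean isComment = token.startsWith("//");
            if (isComment && token.contains("CPD-OFF")) {
                discarding = true;          // start of suppressed region
            } else if (isComment && token.contains("CPD-ON")) {
                discarding = false;         // end of suppressed region
            } else if (!discarding && !isComment) {
                result.add(token);          // regular token, kept
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> tokens = List.of("a", "// CPD-OFF", "b", "// CPD-ON", "c");
        System.out.println(filter(tokens)); // [a, c]
    }
}
```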
@@ -11,7 +11,7 @@ author: Matías Fraga, Clément Fournier

 ## Adding support for a CPD language

 CPD works generically on the tokens produced by a {% jdoc core::cpd.Tokenizer %}.
-To add support for a new language, the crucial piece is writing a tokenizer that
+To add support for a new language, the crucial piece is writing a CpdLexer that
 splits the source file into the tokens specific to your language. Thankfully you
 can use a stock [Antlr grammar](https://github.com/antlr/grammars-v4) or JavaCC
 grammar to generate a lexer for you. If you cannot use a lexer generator, for
@@ -31,12 +31,12 @@ Use the following guide to set up a new language module that supports CPD.

 the lexer from the grammar. To do so, edit `pom.xml` (eg like [the Golang module](https://github.com/pmd/pmd/tree/master/pmd-go/pom.xml)).
 Once that is done, `mvn generate-sources` should generate the lexer sources for you.

-You can now implement a tokenizer, for instance by extending {% jdoc core::cpd.impl.AntlrTokenizer %}. The following reproduces the Go implementation:
+You can now implement a CpdLexer, for instance by extending {% jdoc core::cpd.impl.AntlrCpdLexer %}. The following reproduces the Go implementation:
 ```java
 // mind the package convention if you are going to make a PR
 package net.sourceforge.pmd.lang.go.cpd;

-public class GoTokenizer extends AntlrTokenizer {
+public class GoCpdLexer extends AntlrCpdLexer {

     @Override
     protected Lexer getLexerForSource(CharStream charStream) {
@@ -64,9 +64,9 @@ If your language only supports CPD, then you can subclass {% jdoc core::lang.imp
     }

     @Override
-    public Tokenizer createCpdTokenizer(LanguagePropertyBundle bundle) {
-        // This method should return an instance of the tokenizer you created.
-        return new GoTokenizer();
+    public Tokenizer createCpdLexer(LanguagePropertyBundle bundle) {
+        // This method should return an instance of the CpdLexer you created.
+        return new GoCpdLexer();
     }
 }
 ```
@@ -77,7 +77,7 @@ If your language only supports CPD, then you can subclass {% jdoc core::lang.imp

 4. Update the test that asserts the list of supported languages by updating the `SUPPORTED_LANGUAGES` constant in [BinaryDistributionIT](https://github.com/pmd/pmd/blob/master/pmd-dist/src/test/java/net/sourceforge/pmd/it/BinaryDistributionIT.java).

-5. Add some tests for your tokenizer by following the [section below](#testing-your-implementation).
+5. Add some tests for your CpdLexer by following the [section below](#testing-your-implementation).

 6. Finish up your new language module by adding a page in the documentation. Create a new markdown file
    `<langId>.md` in `docs/pages/pmd/languages/`. This file should have the following frontmatter:
@@ -100,10 +100,10 @@ If your language only supports CPD, then you can subclass {% jdoc core::lang.imp
 {% endraw %}
 ```

-### Declaring tokenizer options
+### Declaring CpdLexer options

-To make the tokenizer configurable, first define some property descriptors using
-{% jdoc core::properties.PropertyFactory %}. Look at {% jdoc core::cpd.Tokenizer %}
+To make the CpdLexer configurable, first define some property descriptors using
+{% jdoc core::properties.PropertyFactory %}. Look at {% jdoc core::cpd.CpdLexer %}
 for some predefined ones which you can reuse (prefer reusing property descriptors if you can).
 You need to override {% jdoc core::Language#newPropertyBundle() %}
 and call `definePropertyDescriptor` to register the descriptors.
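A minimal sketch of this registration could look as follows. It assumes PMD 7's builder-style `PropertyFactory` API; the property name `cpdIgnoreFoo` and the module class are made up for illustration, and the constructor boilerplate of a real language module is omitted.

```java
import net.sourceforge.pmd.lang.LanguagePropertyBundle;
import net.sourceforge.pmd.lang.impl.CpdOnlyLanguageModuleBase;
import net.sourceforge.pmd.properties.PropertyDescriptor;
import net.sourceforge.pmd.properties.PropertyFactory;

public class MyLanguageModule extends CpdOnlyLanguageModuleBase {

    // A made-up CPD option, defined once as a constant so the
    // CpdLexer can read it back from the bundle later.
    static final PropertyDescriptor<Boolean> CPD_IGNORE_FOO =
        PropertyFactory.booleanProperty("cpdIgnoreFoo")
                       .defaultValue(false)
                       .desc("Ignore foo tokens when finding duplicates")
                       .build();

    // Constructor with LanguageMetadata omitted for brevity.

    @Override
    public LanguagePropertyBundle newPropertyBundle() {
        LanguagePropertyBundle bundle = super.newPropertyBundle();
        bundle.definePropertyDescriptor(CPD_IGNORE_FOO);
        return bundle;
    }
}
```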
@@ -112,13 +112,13 @@ of {% jdoc core::cpd.CpdCapableLanguage#createCpdTokenizer(core::lang.LanguagePr

 To implement simple token filtering, you can use {% jdoc core::cpd.impl.BaseTokenFilter %}
 as a base class, or another base class in {% jdoc_package core::cpd.impl %}.
-Take a look at the [Kotlin token filter implementation](https://github.com/pmd/pmd/blob/master/pmd-kotlin/src/main/java/net/sourceforge/pmd/lang/kotlin/cpd/KotlinTokenizer.java), or the [Java one](https://github.com/pmd/pmd/blob/master/pmd-java/src/main/java/net/sourceforge/pmd/lang/java/cpd/JavaTokenizer.java).
+Take a look at the [Kotlin token filter implementation](https://github.com/pmd/pmd/blob/master/pmd-kotlin/src/main/java/net/sourceforge/pmd/lang/kotlin/cpd/KotlinCpdLexer.java), or the [Java one](https://github.com/pmd/pmd/blob/master/pmd-java/src/main/java/net/sourceforge/pmd/lang/java/cpd/JavaCpdLexer.java).


 ### Testing your implementation

 Add a Maven dependency on `pmd-lang-test` (scope `test`) in your `pom.xml`.
-This contains utilities to test your tokenizer.
+This contains utilities to test your CpdLexer.

 Create a test class extending from {% jdoc lang-test::cpd.test.CpdTextComparisonTest %}.
 To add tests, you need to write regular JUnit `@Test`-annotated methods, and
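Such a test class is small. The shape below mirrors the Apex test renamed elsewhere in this diff; the module name, file extension, and the `testSimple`/`simple` test-data names are illustrative stand-ins.

```java
import org.junit.jupiter.api.Test;

import net.sourceforge.pmd.cpd.test.CpdTextComparisonTest;

class MyCpdLexerTest extends CpdTextComparisonTest {

    MyCpdLexerTest() {
        // Language module under test and the extension of its test sources.
        super(MyLanguageModule.getInstance(), ".mylang");
    }

    @Test
    void testSimple() {
        // Tokenizes a sample file from the test resources and compares
        // the token dump against a checked-in expected file.
        doTest("simple");
    }
}
```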
@@ -24,7 +24,7 @@ PropertyName is the name of the property converted to SCREAMING_SNAKE_CASE, that

 As a convention, properties whose name start with an *x* are internal and may be removed or changed without notice.

-Properties whose name start with **CPD** are used to configure CPD tokenizer options.
+Properties whose name start with **CPD** are used to configure CPD CpdLexer options.

 Programmatically, the language properties can be set on `PMDConfiguration` (or `CPDConfiguration`) before using the
 {%jdoc core::PmdAnalyzer %} (or {%jdoc core::cpd.CpdAnalyzer %}) instance
@@ -166,6 +166,7 @@ The rules have been moved into categories with PMD 6.
     * [#4723](https://github.com/pmd/pmd/issues/4723): \[cli] Launch fails for "bash pmd"
 * core
     * [#1027](https://github.com/pmd/pmd/issues/1027): \[core] Apply the new PropertyDescriptor<Pattern> type where applicable
+    * [#4065](https://github.com/pmd/pmd/issues/4065): \[core] Rename TokenMgrError to LexException, Tokenizer to CpdLexer
     * [#4313](https://github.com/pmd/pmd/issues/4313): \[core] Remove support for <lang>-<ruleset> hyphen notation for ruleset references
     * [#4314](https://github.com/pmd/pmd/issues/4314): \[core] Remove ruleset compatibility filter (RuleSetFactoryCompatibility) and CLI option `--no-ruleset-compatibility`
     * [#4378](https://github.com/pmd/pmd/issues/4378): \[core] Ruleset loading processes commented rules
@@ -271,6 +272,15 @@ The following previously deprecated classes have been removed:
 * The node `ASTClassOrInterfaceBody` has been renamed to {% jdoc java::lang.ast.ASTClassBody %}. XPath rules
   need to be adjusted.

+**Renamed classes and methods**
+
+* pmd-core
+    * {%jdoc_old core::lang.ast.TokenMgrError %} has been renamed to {% jdoc core::lang.ast.LexException %}
+    * {%jdoc_old core::cpd.Tokenizer %} has been renamed to {% jdoc core::cpd.CpdLexer %}. Along with this rename,
+      all the implementations have been renamed as well (`Tokenizer` -> `CpdLexer`), e.g. "CppCpdLexer", "JavaCpdLexer".
+      This affects all language modules.
+    * {%jdoc_old core::cpd.AnyTokenizer %} has been renamed to {% jdoc core::cpd.AnyCpdLexer %}.
+
 **Removed functionality**

 * The CLI parameter `--no-ruleset-compatibility` has been removed. It was only used to allow loading
@@ -684,6 +694,7 @@ See also [Detailed Release Notes for PMD 7]({{ baseurl }}pmd_release_notes_pmd7.
 * [#3919](https://github.com/pmd/pmd/issues/3919): \[core] Merge CPD and PMD language
 * [#3922](https://github.com/pmd/pmd/pull/3922): \[core] Better error reporting for the ruleset parser
 * [#4035](https://github.com/pmd/pmd/issues/4035): \[core] ConcurrentModificationException in DefaultRuleViolationFactory
+* [#4065](https://github.com/pmd/pmd/issues/4065): \[core] Rename TokenMgrError to LexException, Tokenizer to CpdLexer
 * [#4120](https://github.com/pmd/pmd/issues/4120): \[core] Explicitly name all language versions
 * [#4204](https://github.com/pmd/pmd/issues/4204): \[core] Provide a CpdAnalysis class as a programmatic entry point into CPD
 * [#4301](https://github.com/pmd/pmd/issues/4301): \[core] Remove deprecated property concrete classes
@@ -280,6 +280,13 @@
       <file name="${tokenmgr-file}" />
     </replaceregexp>

+    <!-- Use own LexException instead of JavaCC's TokenMgrError -->
+    <replaceregexp>
+      <regexp pattern='throw new TokenMgrError\(EOFSeen' />
+      <substitution expression='throw new net.sourceforge.pmd.lang.ast.LexException(EOFSeen' />
+      <file name="${tokenmgr-file}" />
+    </replaceregexp>
+
     <!-- Useless argument, also replace lex state ID with its name -->
     <replaceregexp>
       <regexp pattern='curLexState, error_line, error_column, error_after, curChar, TokenMgrError.LEXICAL_ERROR\)' />
@@ -5,12 +5,12 @@
 package net.sourceforge.pmd.lang.apex;

 import net.sourceforge.pmd.cpd.CpdCapableLanguage;
-import net.sourceforge.pmd.cpd.Tokenizer;
+import net.sourceforge.pmd.cpd.CpdLexer;
 import net.sourceforge.pmd.lang.LanguageModuleBase;
 import net.sourceforge.pmd.lang.LanguageProcessor;
 import net.sourceforge.pmd.lang.LanguagePropertyBundle;
 import net.sourceforge.pmd.lang.PmdCapableLanguage;
-import net.sourceforge.pmd.lang.apex.cpd.ApexTokenizer;
+import net.sourceforge.pmd.lang.apex.cpd.ApexCpdLexer;

 public class ApexLanguageModule extends LanguageModuleBase implements PmdCapableLanguage, CpdCapableLanguage {
     private static final String ID = "apex";
@@ -47,7 +47,7 @@ public class ApexLanguageModule extends LanguageModuleBase implements PmdCapable
     }

     @Override
-    public Tokenizer createCpdTokenizer(LanguagePropertyBundle bundle) {
-        return new ApexTokenizer();
+    public CpdLexer createCpdLexer(LanguagePropertyBundle bundle) {
+        return new ApexCpdLexer();
     }
 }
@@ -12,16 +12,16 @@ import org.antlr.runtime.ANTLRStringStream;
 import org.antlr.runtime.Lexer;
 import org.antlr.runtime.Token;

+import net.sourceforge.pmd.cpd.CpdLexer;
 import net.sourceforge.pmd.cpd.TokenFactory;
-import net.sourceforge.pmd.cpd.Tokenizer;
 import net.sourceforge.pmd.lang.apex.ApexJorjeLogging;
 import net.sourceforge.pmd.lang.document.TextDocument;

 import apex.jorje.parser.impl.ApexLexer;

-public class ApexTokenizer implements Tokenizer {
+public class ApexCpdLexer implements CpdLexer {

-    public ApexTokenizer() {
+    public ApexCpdLexer() {
         ApexJorjeLogging.disableLogging();
     }
@@ -9,9 +9,9 @@ import org.junit.jupiter.api.Test;
 import net.sourceforge.pmd.cpd.test.CpdTextComparisonTest;
 import net.sourceforge.pmd.lang.apex.ApexLanguageModule;

-class ApexTokenizerTest extends CpdTextComparisonTest {
+class ApexCpdLexerTest extends CpdTextComparisonTest {

-    ApexTokenizerTest() {
+    ApexCpdLexerTest() {
         super(ApexLanguageModule.getInstance(), ".cls");
     }
@@ -4,10 +4,10 @@

 package net.sourceforge.pmd.lang.coco;

-import net.sourceforge.pmd.cpd.Tokenizer;
+import net.sourceforge.pmd.cpd.CpdLexer;
 import net.sourceforge.pmd.lang.LanguagePropertyBundle;
 import net.sourceforge.pmd.lang.LanguageRegistry;
-import net.sourceforge.pmd.lang.coco.cpd.CocoTokenizer;
+import net.sourceforge.pmd.lang.coco.cpd.CocoCpdLexer;
 import net.sourceforge.pmd.lang.impl.CpdOnlyLanguageModuleBase;

 /**
@@ -25,7 +25,7 @@ public class CocoLanguageModule extends CpdOnlyLanguageModuleBase {
     }

     @Override
-    public Tokenizer createCpdTokenizer(LanguagePropertyBundle bundle) {
-        return new CocoTokenizer();
+    public CpdLexer createCpdLexer(LanguagePropertyBundle bundle) {
+        return new CocoCpdLexer();
     }
 }
@@ -7,13 +7,13 @@ package net.sourceforge.pmd.lang.coco.cpd;
 import org.antlr.v4.runtime.CharStream;
 import org.antlr.v4.runtime.Lexer;

-import net.sourceforge.pmd.cpd.impl.AntlrTokenizer;
+import net.sourceforge.pmd.cpd.impl.AntlrCpdLexer;
 import net.sourceforge.pmd.lang.coco.ast.CocoLexer;

 /**
  * The Coco Tokenizer.
  */
-public class CocoTokenizer extends AntlrTokenizer {
+public class CocoCpdLexer extends AntlrCpdLexer {

     @Override
     protected Lexer getLexerForSource(CharStream charStream) {
@@ -9,8 +9,8 @@ import org.junit.jupiter.api.Test;
 import net.sourceforge.pmd.cpd.test.CpdTextComparisonTest;
 import net.sourceforge.pmd.lang.coco.CocoLanguageModule;

-class CocoTokenizerTest extends CpdTextComparisonTest {
-    CocoTokenizerTest() {
+class CocoCpdLexerTest extends CpdTextComparisonTest {
+    CocoCpdLexerTest() {
         super(CocoLanguageModule.getInstance(), ".coco");
     }
@@ -4,5 +4,5 @@

 package net.sourceforge.pmd.cpd;

-public class EcmascriptTokenizer extends net.sourceforge.pmd.lang.ecmascript.cpd.EcmascriptTokenizer {
+public class EcmascriptTokenizer extends net.sourceforge.pmd.lang.ecmascript.cpd.EcmascriptCpdLexer implements Tokenizer {
 }
@@ -4,5 +4,5 @@

 package net.sourceforge.pmd.cpd;

-public class JSPTokenizer extends net.sourceforge.pmd.lang.jsp.cpd.JSPTokenizer {
+public class JSPTokenizer extends net.sourceforge.pmd.lang.jsp.cpd.JspCpdLexer implements Tokenizer {
 }
@@ -9,7 +9,7 @@ import java.util.Properties;
 import net.sourceforge.pmd.lang.java.JavaLanguageModule;
 import net.sourceforge.pmd.lang.java.internal.JavaLanguageProperties;

-public class JavaTokenizer extends net.sourceforge.pmd.lang.java.cpd.JavaTokenizer {
+public class JavaTokenizer extends net.sourceforge.pmd.lang.java.cpd.JavaCpdLexer implements Tokenizer {
     public JavaTokenizer(Properties properties) {
         super(convertLanguageProperties(properties));
     }
@@ -0,0 +1,8 @@
+/**
+ * BSD-style license; for more info see http://pmd.sourceforge.net/license.html
+ */
+
+package net.sourceforge.pmd.cpd;
+
+public interface Tokenizer extends CpdLexer {
+}
@@ -20,9 +20,10 @@ import net.sourceforge.pmd.util.StringUtil;
 * Higher-quality lexers should be implemented with a lexer generator.
 *
 * <p>In PMD 7, this replaces AbstractTokenizer, which provided nearly
-* no more functionality.
+* no more functionality.</p>
+* <p>Note: This class has been called AnyTokenizer in PMD 6.</p>
 */
-public class AnyTokenizer implements Tokenizer {
+public class AnyCpdLexer implements CpdLexer {

     private static final Pattern DEFAULT_PATTERN = makePattern("");
@@ -40,15 +41,15 @@ public class AnyTokenizer implements Tokenizer {
     private final Pattern pattern;
     private final String commentStart;

-    public AnyTokenizer() {
+    public AnyCpdLexer() {
         this(DEFAULT_PATTERN, "");
     }

-    public AnyTokenizer(String eolCommentStart) {
+    public AnyCpdLexer(String eolCommentStart) {
         this(makePattern(eolCommentStart), eolCommentStart);
     }

-    private AnyTokenizer(Pattern pattern, String commentStart) {
+    private AnyCpdLexer(Pattern pattern, String commentStart) {
         this.pattern = pattern;
         this.commentStart = commentStart;
     }
@@ -23,7 +23,7 @@ import net.sourceforge.pmd.internal.util.IOUtil;
 import net.sourceforge.pmd.lang.Language;
 import net.sourceforge.pmd.lang.LanguagePropertyBundle;
 import net.sourceforge.pmd.lang.ast.FileAnalysisException;
-import net.sourceforge.pmd.lang.ast.TokenMgrError;
+import net.sourceforge.pmd.lang.ast.LexException;
 import net.sourceforge.pmd.lang.document.FileCollector;
 import net.sourceforge.pmd.lang.document.FileId;
 import net.sourceforge.pmd.lang.document.TextDocument;
@@ -137,10 +137,10 @@ public final class CpdAnalysis implements AutoCloseable {
         this.listener = cpdListener;
     }

-    private int doTokenize(TextDocument document, Tokenizer tokenizer, Tokens tokens) throws IOException, TokenMgrError {
+    private int doTokenize(TextDocument document, CpdLexer cpdLexer, Tokens tokens) throws IOException, LexException {
         LOGGER.trace("Tokenizing {}", document.getFileId().getAbsolutePath());
         int lastTokenSize = tokens.size();
-        Tokenizer.tokenize(tokenizer, document, tokens);
+        CpdLexer.tokenize(cpdLexer, document, tokens);
         return tokens.size() - lastTokenSize - 1; /* EOF */
     }
@@ -152,12 +152,12 @@ public final class CpdAnalysis implements AutoCloseable {
     public void performAnalysis(Consumer<CPDReport> consumer) {

         try (SourceManager sourceManager = new SourceManager(files.getCollectedFiles())) {
-            Map<Language, Tokenizer> tokenizers =
+            Map<Language, CpdLexer> tokenizers =
                 sourceManager.getTextFiles().stream()
                              .map(it -> it.getLanguageVersion().getLanguage())
                              .distinct()
                              .filter(it -> it instanceof CpdCapableLanguage)
-                             .collect(Collectors.toMap(lang -> lang, lang -> ((CpdCapableLanguage) lang).createCpdTokenizer(configuration.getLanguageProperties(lang))));
+                             .collect(Collectors.toMap(lang -> lang, lang -> ((CpdCapableLanguage) lang).createCpdLexer(configuration.getLanguageProperties(lang))));

             Map<FileId, Integer> numberOfTokensPerFile = new HashMap<>();
@@ -170,7 +170,7 @@ public final class CpdAnalysis implements AutoCloseable {
                 int newTokens = doTokenize(textDocument, tokenizers.get(textFile.getLanguageVersion().getLanguage()), tokens);
                 numberOfTokensPerFile.put(textDocument.getFileId(), newTokens);
                 listener.addedFile(1);
-            } catch (TokenMgrError | IOException e) {
+            } catch (LexException | IOException e) {
                 if (e instanceof FileAnalysisException) { // NOPMD
                     ((FileAnalysisException) e).setFileId(textFile.getFileId());
                 }
@@ -16,7 +16,7 @@ public interface CpdCapableLanguage extends Language {


    /**
-    * Create a new {@link Tokenizer} for this language, given
+    * Create a new {@link CpdLexer} for this language, given
     * a property bundle with configuration. The bundle was created by
     * this instance using {@link #newPropertyBundle()}. It can be assumed
     * that the bundle will never be mutated anymore, and this method
@@ -26,7 +26,7 @@ public interface CpdCapableLanguage extends Language {
     *
     * @return A new language processor
     */
-    default Tokenizer createCpdTokenizer(LanguagePropertyBundle bundle) {
-        return new AnyTokenizer();
+    default CpdLexer createCpdLexer(LanguagePropertyBundle bundle) {
+        return new AnyCpdLexer();
     }
 }
@@ -10,8 +10,10 @@ import net.sourceforge.pmd.lang.document.TextDocument;

 /**
  * Tokenizes a source file into tokens consumable by CPD.
+ *
+ * <p>Note: This interface has been called Tokenizer in PMD 6.</p>
  */
-public interface Tokenizer {
+public interface CpdLexer {

     /**
      * Tokenize the source code and record tokens using the provided token factory.
@@ -22,9 +24,9 @@ public interface Tokenizer {
     * Wraps a call to {@link #tokenize(TextDocument, TokenFactory)} to properly
     * create and close the token factory.
     */
-    static void tokenize(Tokenizer tokenizer, TextDocument textDocument, Tokens tokens) throws IOException {
+    static void tokenize(CpdLexer cpdLexer, TextDocument textDocument, Tokens tokens) throws IOException {
        try (TokenFactory tf = Tokens.factoryForFile(textDocument, tokens)) {
-            tokenizer.tokenize(textDocument, tf);
+            cpdLexer.tokenize(textDocument, tf);
        }
    }
 }
@@ -142,8 +142,8 @@ public class GUI implements CPDListener {
                 .extensions(extension)
                 .name("By extension...")) {
             @Override
-            public Tokenizer createCpdTokenizer(LanguagePropertyBundle bundle) {
-                return new AnyTokenizer();
+            public CpdLexer createCpdLexer(LanguagePropertyBundle bundle) {
+                return new AnyCpdLexer();
             }
         };
     }
@@ -7,12 +7,12 @@ package net.sourceforge.pmd.cpd;
 import org.checkerframework.checker.nullness.qual.NonNull;
 import org.checkerframework.checker.nullness.qual.Nullable;

-import net.sourceforge.pmd.lang.ast.TokenMgrError;
+import net.sourceforge.pmd.lang.ast.LexException;
 import net.sourceforge.pmd.lang.document.FileLocation;
 import net.sourceforge.pmd.lang.document.TextDocument;

 /**
- * Proxy to record tokens from within {@link Tokenizer#tokenize(TextDocument, TokenFactory)}.
+ * Proxy to record tokens from within {@link CpdLexer#tokenize(TextDocument, TokenFactory)}.
 */
 public interface TokenFactory extends AutoCloseable {
@@ -43,7 +43,7 @@ public interface TokenFactory extends AutoCloseable {
         recordToken(image, location.getStartLine(), location.getStartColumn(), location.getEndLine(), location.getEndColumn());
     }

-    TokenMgrError makeLexException(int line, int column, String message, @Nullable Throwable cause);
+    LexException makeLexException(int line, int column, String message, @Nullable Throwable cause);

    /**
     * Sets the image of an existing token entry.
@@ -57,7 +57,7 @@ public interface TokenFactory extends AutoCloseable {

    /**
     * This adds the EOF token, it must be called when
-    * {@link Tokenizer#tokenize(TextDocument, TokenFactory)} is done.
+    * {@link CpdLexer#tokenize(TextDocument, TokenFactory)} is done.
     */
    @Override
    void close();
@@ -14,7 +14,7 @@ import org.checkerframework.checker.nullness.qual.NonNull;
 import org.checkerframework.checker.nullness.qual.Nullable;

 import net.sourceforge.pmd.annotation.InternalApi;
-import net.sourceforge.pmd.lang.ast.TokenMgrError;
+import net.sourceforge.pmd.lang.ast.LexException;
 import net.sourceforge.pmd.lang.document.FileId;
 import net.sourceforge.pmd.lang.document.TextDocument;
@@ -93,7 +93,7 @@ public class Tokens {

    /**
     * Creates a token factory to process the given file with
-    * {@link Tokenizer#tokenize(TextDocument, TokenFactory)}.
+    * {@link CpdLexer#tokenize(TextDocument, TokenFactory)}.
     * Tokens are accumulated in the {@link Tokens} parameter.
     *
     * @param file Document for the file to process
@@ -117,8 +117,8 @@ public class Tokens {
    }

    @Override
-    public TokenMgrError makeLexException(int line, int column, String message, @Nullable Throwable cause) {
-        return new TokenMgrError(line, column, fileId, message, cause);
+    public LexException makeLexException(int line, int column, String message, @Nullable Throwable cause) {
+        return new LexException(line, column, fileId, message, cause);
    }

    @Override
@@ -10,16 +10,16 @@ import org.antlr.v4.runtime.CharStream;
 import org.antlr.v4.runtime.CharStreams;
 import org.antlr.v4.runtime.Lexer;

-import net.sourceforge.pmd.cpd.Tokenizer;
+import net.sourceforge.pmd.cpd.CpdLexer;
 import net.sourceforge.pmd.lang.TokenManager;
 import net.sourceforge.pmd.lang.ast.impl.antlr4.AntlrToken;
 import net.sourceforge.pmd.lang.ast.impl.antlr4.AntlrTokenManager;
 import net.sourceforge.pmd.lang.document.TextDocument;

 /**
- * Generic implementation of a {@link Tokenizer} useful to any Antlr grammar.
+ * Generic implementation of a {@link CpdLexer} useful to any Antlr grammar.
 */
-public abstract class AntlrTokenizer extends TokenizerBase<AntlrToken> {
+public abstract class AntlrCpdLexer extends CpdLexerBase<AntlrToken> {
     @Override
     protected final TokenManager<AntlrToken> makeLexerImpl(TextDocument doc) throws IOException {
         CharStream charStream = CharStreams.fromReader(doc.newReader(), doc.getFileId().getAbsolutePath());
@@ -6,16 +6,16 @@ package net.sourceforge.pmd.cpd.impl;

 import java.io.IOException;

+import net.sourceforge.pmd.cpd.CpdLexer;
 import net.sourceforge.pmd.cpd.TokenFactory;
-import net.sourceforge.pmd.cpd.Tokenizer;
 import net.sourceforge.pmd.lang.TokenManager;
 import net.sourceforge.pmd.lang.ast.GenericToken;
 import net.sourceforge.pmd.lang.document.TextDocument;

 /**
- * Generic base class for a {@link Tokenizer}.
+ * Generic base class for a {@link CpdLexer}.
 */
-public abstract class TokenizerBase<T extends GenericToken<T>> implements Tokenizer {
+public abstract class CpdLexerBase<T extends GenericToken<T>> implements CpdLexer {

     protected abstract TokenManager<T> makeLexerImpl(TextDocument doc) throws IOException;
@@ -1,15 +0,0 @@
-/**
- * BSD-style license; for more info see http://pmd.sourceforge.net/license.html
- */
-
-package net.sourceforge.pmd.cpd.impl;
-
-import net.sourceforge.pmd.cpd.Tokenizer;
-import net.sourceforge.pmd.lang.ast.impl.javacc.JavaccToken;
-
-/**
- * Base class for a {@link Tokenizer} for a language implemented by a JavaCC tokenizer.
- */
-public abstract class JavaCCTokenizer extends TokenizerBase<JavaccToken> {
-
-}
@@ -0,0 +1,15 @@
+/**
+ * BSD-style license; for more info see http://pmd.sourceforge.net/license.html
+ */
+
+package net.sourceforge.pmd.cpd.impl;
+
+import net.sourceforge.pmd.cpd.CpdLexer;
+import net.sourceforge.pmd.lang.ast.impl.javacc.JavaccToken;
+
+/**
+ * Base class for a {@link CpdLexer} for a language implemented by a JavaCC tokenizer.
+ */
+public abstract class JavaccCpdLexer extends CpdLexerBase<JavaccToken> {
+
+}
@@ -3,6 +3,6 @@
 */

/**
- * Utilities to implement a CPD {@link net.sourceforge.pmd.cpd.Tokenizer}.
+ * Utilities to implement a CPD {@link net.sourceforge.pmd.cpd.CpdLexer}.
 */
package net.sourceforge.pmd.cpd.impl;
@@ -6,6 +6,6 @@
 * Token-based copy-paste detection.
 *
 * @see net.sourceforge.pmd.cpd.CpdAnalysis
- * @see net.sourceforge.pmd.cpd.Tokenizer
+ * @see net.sourceforge.pmd.cpd.CpdLexer
 */
package net.sourceforge.pmd.cpd;
@@ -5,9 +5,9 @@
 package net.sourceforge.pmd.lang;

 import net.sourceforge.pmd.annotation.Experimental;
-import net.sourceforge.pmd.cpd.AnyTokenizer;
+import net.sourceforge.pmd.cpd.AnyCpdLexer;
 import net.sourceforge.pmd.cpd.CpdCapableLanguage;
-import net.sourceforge.pmd.cpd.Tokenizer;
+import net.sourceforge.pmd.cpd.CpdLexer;
 import net.sourceforge.pmd.lang.ast.AstInfo;
 import net.sourceforge.pmd.lang.ast.Parser;
 import net.sourceforge.pmd.lang.ast.Parser.ParserTask;
@@ -47,8 +47,8 @@ public final class PlainTextLanguage extends SimpleLanguageModuleBase implements
    }

    @Override
-    public Tokenizer createCpdTokenizer(LanguagePropertyBundle bundle) {
-        return new AnyTokenizer();
+    public CpdLexer createCpdLexer(LanguagePropertyBundle bundle) {
+        return new AnyCpdLexer();
    }

    private static final class TextLvh implements LanguageVersionHandler {
@@ -16,7 +16,7 @@ import net.sourceforge.pmd.lang.document.FileLocation;
/**
 * An exception that occurs while processing a file. Subtypes include
 * <ul>
- *   <li>{@link TokenMgrError}: lexical syntax errors
+ *   <li>{@link LexException}: lexical syntax errors
 *   <li>{@link ParseException}: syntax errors
 *   <li>{@link SemanticException}: exceptions occurring after the parsing
 *     phase, because the source code is semantically invalid
Some files were not shown because too many files have changed in this diff.