When an error occurred, the exit status of PMD was 0. This happened
because the logic of 'NO_EXIT_AFTER_RUN' was inverted: a
'System.exit()' was performed when 'NO_EXIT_AFTER_RUN' was set, while it
should have been skipped. I copied the fix from the CPDCommandLineInterface
class.
Furthermore, I made sure all error messages are printed to System.err
instead of System.out, so they can easily be extracted when PMD is
invoked by external tools.
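A minimal sketch of the corrected control flow, under stated assumptions: the class name, the method names, and the string value of 'NO_EXIT_AFTER_RUN' below are hypothetical and may differ from PMD's actual code. It exits only when the switch is absent, and error messages go to a stream that callers set to System.err.

```java
import java.io.PrintStream;

// Hypothetical sketch; identifiers and the property value are illustrative, not PMD's API.
final class ExitHandling {
    // Assumed name of the system property / environment variable that suppresses System.exit().
    static final String NO_EXIT_AFTER_RUN = "pmd.noExitAfterRun";

    private ExitHandling() {}

    /** True only when NO_EXIT_AFTER_RUN is NOT set (the buggy version had this inverted). */
    static boolean isExitAfterRunSet() {
        return System.getenv(NO_EXIT_AFTER_RUN) == null
                && System.getProperty(NO_EXIT_AFTER_RUN) == null;
    }

    /** Writes the message to the given stream (callers pass System.err) and returns a non-zero status. */
    static int reportError(PrintStream err, String message) {
        err.println("Error: " + message);
        return 1;
    }

    /** Terminates the JVM with the given status unless exiting has been suppressed. */
    static void finish(int status) {
        if (isExitAfterRunSet()) {
            System.exit(status);
        }
    }
}
```

A caller would use `ExitHandling.finish(ExitHandling.reportError(System.err, "..."))`, so external tools see both a non-zero exit status and the message on stderr, while embedders that set the switch keep their JVM alive.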
* Upgrade ASM so that it understands default methods and static methods in interfaces in JDK 8 class files
* See commit 9b91690f218408ba1c65e633a824660e18080b00 for ASM4 support
Some of the tokenizers ignore comments, and therefore the line count of a
duplication can differ per file. Take, for example, the following files:
FileA.java:
1: public class FileA {
2: public String Foo() {
3: return "Foo";
4: }
5: }
FileB.java:
1: public class FileB {
2: public String Foo() {
3: // This is a comment
4: return "Foo";
5: }
6: }
When comments are ignored and not tokenized, the duplication consists of
the following tokens:
'{', 'public', 'String', 'Foo', '(', ')', '{', 'return', 'Foo', ';',
'}', '}'
For 'FileA.java' the duplication is 5 lines long: it starts at line 1
and ends at line 5. For 'FileB.java' the duplication is 6 lines long: it
starts at line 1 and ends at line 6.
Note that this is just one example; for most tokenizers, comments
and whitespace are not significant. For example, the following file
contains the same duplication on a single line:
FileC.java:
1: public class FileC { public String Foo() { return "Foo"; } }
For us, the correct line count per file is important, because we
highlight the duplications in an annotated source view and show the
percentage of duplicated code each file contains. The current output
formats contain only one line count per duplication, shared by all files
in the set. For the above example, CPD would output the following:
Found a 4 line (12 tokens) duplication in the following files:
Starting at line 1 of FileA.java
Starting at line 1 of FileB.java
For FileB.java this is not correct and leads to an incorrect
percentage of duplicated code: 66% (4 of 6 lines) instead of the
correct 83% (5 of 6 lines).
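The percentage arithmetic above can be sketched with a small helper. This is a hypothetical illustration, not part of CPD; it simply divides the duplicated line count by the file's total line count.

```java
// Hypothetical helper, not part of CPD: share of a file's lines covered by a duplication.
final class DuplicationPercentage {
    private DuplicationPercentage() {}

    /** Returns the percentage (0..100) of totalLines covered by duplicatedLines. */
    static double percent(int duplicatedLines, int totalLines) {
        if (totalLines <= 0) {
            throw new IllegalArgumentException("totalLines must be positive");
        }
        return 100.0 * duplicatedLines / totalLines;
    }
}
```

With the single shared count, FileB.java gets `percent(4, 6)`, about 66.7%; with its own count it gets `percent(5, 6)`, about 83.3%.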
To fix the problem, I created an extra output format,
'csv_with_linecount_per_file', which outputs the correct line count per
file. The format contains the following:
tokens,occurrences
<nr of tokens>,<nr of occurrences>(,<begin line>,<line count>,<file name>)+
For the above example, the output would be:
tokens,occurrences
12,2,1,4,FileA.java,1,5,FileB.java
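The row layout above can be sketched as a small writer plus a parser that a consuming tool might use to recover the per-file counts. The class, record, and method names are hypothetical, not CPD's renderer API; the sketch needs Java 16+ for records and does no CSV quoting, so it assumes file names contain no commas.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringJoiner;

// Hypothetical sketch of the 'csv_with_linecount_per_file' layout:
// tokens,occurrences(,beginLine,lineCount,fileName)+
final class CsvWithLineCountPerFile {
    static final String HEADER = "tokens,occurrences";

    /** One occurrence of a duplication in a single file, with that file's own line count. */
    record Occurrence(int beginLine, int lineCount, String fileName) {}

    private CsvWithLineCountPerFile() {}

    /** Builds one data row; assumes file names contain no commas (no quoting is done). */
    static String row(int tokenCount, List<Occurrence> occurrences) {
        StringJoiner row = new StringJoiner(",");
        row.add(Integer.toString(tokenCount));
        row.add(Integer.toString(occurrences.size()));
        for (Occurrence o : occurrences) {
            row.add(Integer.toString(o.beginLine()));
            row.add(Integer.toString(o.lineCount()));
            row.add(o.fileName());
        }
        return row.toString();
    }

    /** Parses the per-file triples back out of a data row produced by row(). */
    static List<Occurrence> parseOccurrences(String row) {
        String[] fields = row.split(",");
        int count = Integer.parseInt(fields[1]);
        List<Occurrence> result = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            int base = 2 + 3 * i; // each occurrence occupies three fields after the two leading counts
            result.add(new Occurrence(
                    Integer.parseInt(fields[base]),
                    Integer.parseInt(fields[base + 1]),
                    fields[base + 2]));
        }
        return result;
    }
}
```

For the FileA/FileB example, `row(12, List.of(new Occurrence(1, 4, "FileA.java"), new Occurrence(1, 5, "FileB.java")))` yields the row shown above.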
The '--files' command line option of CPD lets you specify which
directories and files should be scanned for duplicated code.
Unfortunately, it didn't work when you specified files instead of
directories, for example '--files foo.c bar.c'. In this example, CPD
executed successfully, but the files 'foo.c' and 'bar.c' were completely
ignored.
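One way to handle both kinds of argument is to walk each '--files' entry that is a directory and add each entry that is a regular file directly. This is a hypothetical sketch using java.nio.file, not CPD's actual fix; the class and method names are illustrative, and it omits CPD's language and extension filtering.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Stream;

// Hypothetical sketch, not CPD's code: collect source files from mixed file/directory arguments.
final class SourceCollector {
    private SourceCollector() {}

    static List<Path> collect(List<Path> arguments) throws IOException {
        List<Path> sources = new ArrayList<>();
        for (Path arg : arguments) {
            if (Files.isDirectory(arg)) {
                // Directories: include every regular file below them, in a stable order.
                try (Stream<Path> walk = Files.walk(arg)) {
                    walk.filter(Files::isRegularFile).sorted().forEach(sources::add);
                }
            } else if (Files.isRegularFile(arg)) {
                // Plain files such as foo.c: the case that was silently ignored.
                sources.add(arg);
            } else {
                System.err.println("Skipping non-existent path: " + arg);
            }
        }
        return sources;
    }
}
```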