Ensure CPD uses tab width of 1 for tabs consistently

Previously, the columns reported by CPD were inconsistent across
languages. A language like Java (using a JavaCC-based tokenizer) would
use a width of 8 for tabs, whereas a language like C# (using an
Antlr-based tokenizer) would use 1 instead.

This includes unit tests for most languages to ensure that a tab
character is counted as a single column. The JavaCC configuration has
been adjusted to use this width as well.
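To make the 8-versus-1 difference concrete, here is a minimal Java sketch of tab-stop column arithmetic. It assumes the rule used by JavaCC-generated character streams (a tab advances to the next multiple of the tab size); `TabColumns`, `columns` and `beginColumn` are hypothetical names for illustration, not PMD or JavaCC API.

```java
public class TabColumns {

    /**
     * Returns the 1-based column of every character in {@code line}. A tab moves
     * to the next multiple of {@code tabSize}; this mirrors, as an assumption,
     * the tab-stop rule in JavaCC-generated character streams.
     */
    static int[] columns(String line, int tabSize) {
        int[] cols = new int[line.length()];
        int column = 0; // column of the previous character (0 before the line starts)
        for (int i = 0; i < line.length(); i++) {
            column++; // an ordinary character occupies the next column
            if (line.charAt(i) == '\t') {
                // step back, then advance to the next tab stop
                column--;
                column += tabSize - (column % tabSize);
            }
            cols[i] = column;
        }
        return cols;
    }

    /** Begin column of the first occurrence of {@code token} in {@code line}. */
    static int beginColumn(String line, String token, int tabSize) {
        int index = line.indexOf(token);
        if (index < 0) {
            throw new IllegalArgumentException("token not found: " + token);
        }
        return columns(line, tabSize)[index];
    }

    public static void main(String[] args) {
        String line = "\tdef main(args: Array[String]): Unit = {";
        System.out.println(beginColumn(line, "def", 8)); // 9 (tab width 8)
        System.out.println(beginColumn(line, "def", 1)); // 2 (tab width 1)
    }
}
```

With `tabSize = 1` every character, tab or not, advances exactly one column, so a token after a single leading tab starts at column 2, which is what the expected output in this commit records for `def`.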
Maikel Steneker
2020-07-20 10:42:21 +02:00
parent 25405eb870
commit 6fb5ac59b9
45 changed files with 724 additions and 62 deletions


@@ -41,4 +41,9 @@ public class ScalaTokenizerTest extends CpdTextComparisonTest {
        ex.expect(TokenMgrError.class);
        doTest("unlexable_sample");
    }

    @Test
    public void testTabWidth() {
        doTest("tabWidth");
    }
}


@@ -0,0 +1,5 @@
object Main {
	def main(args: Array[String]): Unit = {
		println("Hello, World!")
	}
}


@@ -0,0 +1,30 @@
 [Image] or [Truncated image[       Bcol Ecol
L1
    [object]                           1    7
    [Main]                             8   12
    [{]                               13   14
L2
    [def]                              2    5
    [main]                             6   10
    [(]                               10   11
    [args]                            11   15
    [:]                               15   16
    [Array]                           17   22
    [\[]                              22   23
    [String]                          23   29
    [\]]                              29   30
    [)]                               30   31
    [:]                               31   32
    [Unit]                            33   37
    [=]                               38   39
    [{]                               40   41
L3
    [println]                          3   10
    [(]                               10   11
    ["Hello, World!"]                 11   26
    [)]                               26   27
L4
    [}]                                2    3
L5
    [}]                                1    2
EOF
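The Bcol/Ecol values in the expected output follow directly from a tab width of 1: a character's column is its index in the line plus one, and in this file Ecol appears to be one past the token's last character (e.g. `object` spans 1 to 7). The minimal Java sketch below re-derives the line 2 rows by searching for each token image in order. `ExpectedColumns` and `rows` are hypothetical names; this is not the Scala tokenizer or PMD's test harness, and the `[` / `]` escaping is inferred from the `[\[]` and `[\]]` rows.

```java
import java.util.ArrayList;
import java.util.List;

public class ExpectedColumns {

    /**
     * Finds each token image, in order, within one source line and formats it as
     * "[image] bcol ecol". Assumes a tab counts as one column, so the 1-based begin
     * column is the character index + 1 and the end column is exclusive.
     */
    static List<String> rows(String line, List<String> images) {
        List<String> out = new ArrayList<>();
        int cursor = 0; // continue after the previous token so repeated images such as ":" resolve in order
        for (String image : images) {
            int index = line.indexOf(image, cursor);
            if (index < 0) {
                throw new IllegalArgumentException("token not found: " + image);
            }
            int bcol = index + 1;             // tab width 1: every character, tabs included, is one column
            int ecol = bcol + image.length(); // exclusive end column
            // escape brackets inside the image, matching the [\[] and [\]] rows of the expected file
            String shown = image.replace("[", "\\[").replace("]", "\\]");
            out.add("[" + shown + "] " + bcol + " " + ecol);
            cursor = index + image.length();
        }
        return out;
    }

    public static void main(String[] args) {
        String line2 = "\tdef main(args: Array[String]): Unit = {";
        List<String> images = List.of("def", "main", "(", "args", ":", "Array",
                "[", "String", "]", ")", ":", "Unit", "=", "{");
        rows(line2, images).forEach(System.out::println);
    }
}
```

Running `main` prints the fourteen rows listed under `L2`, from `[def] 2 5` through `[{] 40 41`. Searching from a cursor rather than from the start of the line is what keeps the two `:` tokens at columns 15 and 31 apart.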