6fb5ac59b9
The columns that are reported by CPD were inconsistent across languages before. A language like Java (using a JavaCC-based tokenizer) would use a width of 8 for tabs, whereas a language like C# (using an Antlr-based tokenizer) would use 1 instead. This includes unit tests for most languages to ensure a tab character is counted as 1. The configuration for JavaCC has been adjusted to respect this as well.