Previously, the columns reported by CPD were inconsistent across
languages. A language like Java (using a JavaCC-based tokenizer) would
use a width of 8 for tabs, whereas a language like C# (using an
Antlr-based tokenizer) would use a width of 1.
This change adds unit tests for most languages to ensure that a tab
character is counted as 1 column. The JavaCC configuration has been
adjusted to use a tab width of 1 as well.
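
To illustrate the difference, here is a minimal sketch of how a
tokenizer that expands tabs to tab stops arrives at a column number.
It is not CPD's actual code: the class TabColumnSketch and the method
columnOf are hypothetical names used only for this example, and the
tab-stop rule is an assumption about how a tab width is typically
applied.

```java
final class TabColumnSketch {

    /**
     * Returns the 1-based column of the character at {@code index}
     * within a single line, advancing to the next tab stop for every
     * tab character that precedes it.
     */
    static int columnOf(String line, int index, int tabSize) {
        int column = 1;
        for (int i = 0; i < index; i++) {
            if (line.charAt(i) == '\t') {
                // Jump to the next tab stop. With tabSize == 1 this
                // is the same as advancing by exactly one column.
                column += tabSize - ((column - 1) % tabSize);
            } else {
                column++;
            }
        }
        return column;
    }

    public static void main(String[] args) {
        String line = "\tint x;";
        // Column of 'i' with a tab width of 8 (the old JavaCC behavior)
        System.out.println(columnOf(line, 1, 8)); // prints 9
        // Column of 'i' with a tab width of 1 (the behavior after this change)
        System.out.println(columnOf(line, 1, 1)); // prints 2
    }
}
```

With a tab width of 1, the reported column always equals the character
offset within the line plus one, regardless of which tokenizer produced
the token.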