Our approach of using the underlying allocator inside thrust was a bad approach,
for some reason it fails to properly allocate uint8's or int8's on the correct
boundaries. I expect that this logic is somewhere else in the code and
instead we should use thrust::system::cuda::vector which does this properly.
The namespaces need to be different for each test, or else only the first
implementation of the function will be used for all tests that call that
function.
Also updated the test to verify that we can count starting from a non zero
number.
Porting the dax device adapter over to vtkm. Unlike the dax version, doesn't
use the thrust::device_vector, but instead uses thrust::system calls so that
we can support multiple thrust based backends.
Also this has Texture Memory support for input array handles. Some more work
will need to be done to ArrayHandle so that everything works when using an
ArrayHandle inplace with texture memory bindings.