Change the VTKM_CONT_EXPORT to VTKM_CONT. (Likewise for EXEC and
EXEC_CONT.) Remove the inline from these macros so that they can be
applied to everything, including implementations in a library.
Because inline is not declared in these modifies, you have to add the
keyword to functions and methods where the implementation is not inlined
in the class.
When the intel compiler has vectorization enabled ( -O2/-O3 ) it converts the
`adjacent/hypotenuse` divide operation into reciprocal (rcpps) and
multiply (mulps) operations. This causes a change in the expected result that
is larger than the default tolerance of test_equal. So to resolve the problem
we increase the tolerance for acos when using icc.
When the intel compiler has vectorization enabled ( -O2/-O3 ) it converts the
`adjacent/hypotenuse` divide operation into reciprocal (rcpps) and
multiply (mulps) operations. This causes a change in the expected result that
is larger than the tolerance of test_equal.
There were many tests that created code paths for every base and Vec
type that VTK-m supports (up to 4 components). Although this is
admirable, it is also excessive, and our compile times for the tests are
very long.
To shorten compile times, remove the TryAllTypes method. Replace it with
a version of TryTypes that uses a default list of "exemplar" set of
integers, floats, and Vecs.
The PGI compiler appearently saw the condition (var != var) and optimized
it to always be false. However, in this particular case var was holding
a NaN value that we were testing to make sure it does not equal itself.
Get around the problem by comparing the variable to the result of a
function call.
Previously, the templated Vec classes had overloaded multiply operators
to allow it to be multiplied by any basic type (float, double, int,
long, char, etc.). This was there to make it easier to multiply a vector
by a scalar without having to jump through introspection hoops for the
component types. However, using them almost always resulted in a
conversion warning on some type of compiler, so I think it is easier
just to remove them.
Also fix some issues that caused the compile to fail when trying to
run some of the math functions on a CUDA device. In particular, CUDA
is picky about using a global const on a device when the const type is
not one of the basic C types.