Sandia National Laboratories recently changed management from the
Sandia Corporation to the National Technology & Engineering Solutions
of Sandia, LLC (NTESS). The copyright statements need to be updated
accordingly.
The implementation was calling PrepareForOutput on the delegate arrays
rather than PrepareForInPlace, do when used with CUDA you did not get
the data on the device.
Also added a regression test to check this.
Class that need to be passed across dynamic library boundaries such as
DynamicArrayHandle need to be properly export. One of 'tricks' of this
is that templated classes such as PolymorphicArrayHandleContainer need
the Type and Storage types to have public visibility.
This makes sure that all vtkm storage tags have public visibility so
in the future we can transfer dynamic array handles across libraries.
Change the VTKM_CONT_EXPORT to VTKM_CONT. (Likewise for EXEC and
EXEC_CONT.) Remove the inline from these macros so that they can be
applied to everything, including implementations in a library.
Because inline is not declared in these modifies, you have to add the
keyword to functions and methods where the implementation is not inlined
in the class.
These asserts are consolidated into the unified Assert.h. Also made some
minor edits to add asserts where appropriate and a little bit of
reconfiguring as found.
The ArrayHandle classes all exclusively work in the control environment.
However, CUDA likes to add __device__ to constructors, destructors, and
assignment operators it automatically adds. This in turn causes warnings
about the __device__ function using host-only classes (like
boost::shared_ptr). Solve this problem by adding explicit methods for
all of these.
Implemented this by wrapping up all these default objects in a macro.
This also solved the problem of other constructors that are necessary
for array handles such as a constructor that takes the base array
handle.
Under CUDA, the default constructors and destructors created are exported
as __host__ and __device__, which causes problems because they used a boost
pointer that only works on the host. The explicit copy constructors and
destructors do the same thing as the default ones except declared to only
work on the host.
Instead of having a single specialization for sort and zip handles,
we know handle any fancy handles being passed to the cuda device adapter.
This was done by reworking how we represent fancy iterators inside thrust,
and instead of using a transform iterator + counting iterator we just use
a iterator_facade.