1. Use cudaStreamPerThread instead of the default stream.
2. Because the ArrayHandle code has changed, the API for creating an
ArrayHandle from a device pointer has changed as well.
When copying small arrays from CPU memory to memory on Pascal GPUs, we
would see subsequent kernels fail because the memory transfer hadn't
finished. This is a bug, as each stream should act like a FIFO queue. So
for now, when encountering this use case, we explicitly synchronize
after the memcpy.
Sandia National Laboratories recently changed management from the
Sandia Corporation to the National Technology & Engineering Solutions
of Sandia, LLC (NTESS). The copyright statements need to be updated
accordingly.
Previously, TransferToOpenGL relied on every array handle implementing
the CopyInto method for the transfer to work properly. This was problematic,
as most implicit arrays don't implement CopyInto.
Now we use the device's built-in Copy infrastructure to move data from
an implicit array to concrete memory, which can then be passed to
OpenGL. As an additional optimization, the temporary memory for this
interop is cached in the bufferstate.
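
The idea of copying an implicit array into cached concrete memory can be
sketched in plain C++. This is only an illustration, not the VTK-m
implementation: ImplicitCounting, InteropState, and CopyToConcrete are
hypothetical names, and a simple loop stands in for the device Copy.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for an implicit array: values are computed on
// demand, so there is no backing memory to hand to a graphics API.
struct ImplicitCounting
{
  std::size_t Size;
  float Start;
  float Step;

  std::size_t GetNumberOfValues() const { return this->Size; }

  float Get(std::size_t index) const
  {
    return this->Start + this->Step * static_cast<float>(index);
  }
};

// Hypothetical stand-in for the state object that owns the temporary
// buffer and keeps it alive between transfers.
struct InteropState
{
  std::vector<float> Scratch;
};

// Materialize the computed values into the cached buffer and return a
// pointer to contiguous memory that could then be uploaded.
template <typename ArrayType>
const float* CopyToConcrete(const ArrayType& array, InteropState& state)
{
  const std::size_t n = array.GetNumberOfValues();
  // resize() never reduces capacity, so a later transfer of equal or
  // smaller size reuses the existing allocation.
  state.Scratch.resize(n);
  for (std::size_t i = 0; i < n; ++i)
  {
    state.Scratch[i] = array.Get(i);
  }
  assert(state.Scratch.size() == n);
  return state.Scratch.data();
}
```

Because the scratch buffer lives in the state object rather than in the
transfer call, repeated transfers only pay for an allocation when the
array grows.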
Change VTKM_CONT_EXPORT to VTKM_CONT. (Likewise for EXEC and
EXEC_CONT.) Remove the inline from these macros so that they can be
applied to everything, including implementations in a library.
Because inline is no longer declared in these modifiers, you have to add
the keyword to functions and methods whose implementation is defined
outside the class body.
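
A minimal sketch of what this means for header code. EXAMPLE_CONT is a
hypothetical stand-in for a modifier such as VTKM_CONT and is defined
empty here so the snippet compiles on its own; the real macro definitions
are not reproduced. The Counter class is likewise made up.

```cpp
#include <cassert>

// Hypothetical stand-in for an annotation macro. It deliberately does
// not expand to anything containing `inline`.
#define EXAMPLE_CONT

class Counter
{
public:
  // Defined inside the class body: implicitly inline, no keyword needed.
  EXAMPLE_CONT int Get() const { return this->Value; }

  // Declared here, defined below outside the class body.
  EXAMPLE_CONT void Increment();

private:
  int Value = 0;
};

// Defined outside the class body in what would be a header. The macro
// does not supply `inline`, so it is written explicitly; without it,
// including the header in several translation units would produce
// multiple-definition link errors.
EXAMPLE_CONT inline void Counter::Increment()
{
  ++this->Value;
}
```

A method whose definition lives in a library source file would use the
same macro without the inline keyword, which is what removing inline from
the macros allows.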
These asserts are consolidated into the unified Assert.h. Also made some
minor edits to add asserts where appropriate, along with a little
reorganization where needed.