For some reason, `UnitTestContourTreeUniformAugmentedFilterCUDA` was
deadlocking on some of the dashboards when `VTKM_ASSERT` was changed to
be empty when compiling CUDA kernels (for compiler performance reasons).
Fix this by calling `assert` directly in this case.
For the life of me, I cannot figure out why this would be an issue.
Clearly the `assert` is never actually called or else the test would
error out (unless a special condition in CUDA is causing it to be
hidden). But if you take out the code, something changes to lock up the
kernel.
This merge request is Phase 1 of several to implement the distributed parallel
contour tree in VTKm. This merge requests adds the base outline for the
algorithm. The implementation of the details of the algorithm in the
BoundaryRestrictedAugmentedContourTree.h is currently still missing.
However, these will require a substantial (~3000) lines of additional code.
The goal is to stage the integration process across merge requests to make
the review process simpler.
Marked the old versions of PrepareFor* that do not use tokens as
deprecated and moved all of the code to use the new versions that
require a token. This makes the scope of the execution object more
explicit so that it will be kept while in use and can potentially be
reclaimed afterward.
Replace code such as `myArray.GetPortalControl().Get(0)` with
`vtkm::cont::ArrayGetValue(0, myArray)`, which will just retrieve
the single value without transferring the entire contents of
`myArray` to the host. This should prevent memory thrashing.
Also did some general style clean-ups and fixed loops that called
`GetPortalControl` in the loop body.
The Invoker is a control side object that handles the construction
of the relevant worklet dispatcher. Moving it to control makes it
obvious that it isn't an algorithm itself but a way to launch
worklets.
We previously had to mark the make_FunctionInterface function as
host/device as it could be used in either place. The ramifications
of this is that each time we launched a worklet any input parameter
had to either suppress cuda exceptions, or had to rely on the
DispatcherBase to suppress the warnings.
By making make_FunctionInterface only host callable ( as it is ),
we can remove all of our unneccesary suppression logic and better
expose real issues with code that is marked host/device but can't
be due to calling things such as std::abort