880d8a989 Add `vtkm/Geometry.h` and test it.
Acked-by: Kitware Robot <kwrobot@kitware.com>
Acked-by: Robert Maynard <robert.maynard@kitware.com>
Merge-request: !1262
This commit adds several geometric constructs to vtk-m
in the `vtkm/Geometry.h` header. They may be used from
both the execution and control environments.
We also add methods to perform projection and Gram-Schmidt
orthonormalization to `vtkm/VectorAnalysis.h`.
See `docs/changelog/geometry.md` included in this commit
for more information.
When copying small arrays from cpu memory to pascal memory we would
see subsequent kernels fail as the memory transfer hadn't finished.
This is a bug as each stream should act like a FIFO queue. So
for now when encountering this use case we explicitly synchronize
after the memcpy.
This means that we not only setup the runtime device tracker
to force the intended device, it also means making sure
the default device is the error device.
The previous implementation of DeviceAdapterRuntimeDetector caused
multiple differing definitions of the same class to exist and
was causing the runtime device tracker to report CUDA as disabled
when it actually was enabled.
The ODR was caused by having a default implementation for
DeviceAdapterRuntimeDetector and a specific specialization for
CUDA. If a library had both CUDA and C++ sources it would pick up
both implementations and would have undefined behavior. In general
it would think the CUDA backend was disabled.
To avoid this kind of situation in the future I have reworked VTK-m
so that each device adapter must implement DeviceAdapterRuntimeDetector
for that device.
This is done to not only reduce the amount of code that users need
to generate but to reduce the amount of errors when using
the RuntimeDeviceTracker. If the runtime device tracker is initially
used in a library by a c++ file it will never properly detect the
cuda backend. By moving the code into vtkm_cont we can make sure
this problem doesn't occur.
The usage of uint was causing problems with CUDA + MSVC2015 as
type was not defined. Instead we use vtkm::Id as that was the expect
type to be passed to the task
This allows for easier host side logic when determining grid and block
sizes, and allows for a smaller library side by moving some logic
into compiled in functions.
Previously memory that was allocated outside of VTK-m was impossible to transfer to
VTK-m as we didn't know how to free it. By extending the ArrayHandle constructors
to support a Storage object that is being moved, we can clearly express that
the ArrayHandle now owns memory it didn't allocate.
Here is an example of how this is done:
```cpp
T* buffer = new T[100];
auto user_free_function = [](void* ptr) { delete[] static_cast<T*>(ptr); };
vtkm::cont::internal::Storage<T, vtkm::cont::StorageTagBasic>
storage(buffer, 100, user_free_function);
vtkm::cont::ArrayHandle<T> arrayHandle(std::move(storage));
```
This fixes the three following issues with StorageBasic.
1. Memory that was allocated by VTK-m and Stolen by the user needed the
proper free function called which is generally StorageBasicAllocator::deallocate.
But that was hard for the user to hold onto. So now we provide a function
pointer to the correct free function.
2. Memory that was allocated outside of VTK-m was impossible to transfer to
VTK-m as we didn't know how to free it. This is now resolved by allowing the
user to specify a free function to be called on release.
3. When the CUDA backend allocates memory for an ArrayHandle that has no
control representation, and the location we are running on supports concurrent
managed access we want to specify that cuda managed memory as also the host memory.
This requires that StorageBasic be able to call an arbitrary new delete function
which is chosen at runtime.
The new and improved vtkm::cont::ColorTable provides a more feature complete
color table implementation that is modeled after
vtkDiscretizableColorTransferFunction. This class therefore supports different
color spaces ( rgb, lab, hsv, diverging ) and supports execution across all
device adapters.
By hard coding the PrepareForDevice to know about all the different VTK-m
devices, we can have a single base class do the execution allocation, and not
have that logic repeated in each child class.