Commit Graph

277 Commits

Author SHA1 Message Date
Sujin Philip
1212081de1 Support deferred freeing for CUDA memory
Calls to 'cudaFree' block execution on all cuda devices. Reduce the number of
times this happens by having a deferred free mechanism that frees a pool
of pointers together when a threshold is reached.

Especially helpful during virtual object transfers that requires a few small
allocations and frees.
2018-08-16 12:05:36 -04:00
Allison Vacanti
f6da092146 Use CUDA_ARCH instead of CUDACC to guard device-only code.
CUDACC is defined when compiling host code under nvcc, while
CUDA_ARCH is only defined for host code.
2018-08-09 11:57:05 -04:00
Allison Vacanti
2c079b96dd Make AtomicArrays work on CUDA 8.
CUDA 8.0 is erroring out in the cuda AtomicArray implementation:

https://open.cdash.org/viewBuildError.php?buildid=5489156

This patch fixes the error. See comments in source for more info.
2018-08-08 15:26:32 -04:00
Haocheng LIU
ce9cd8072a Use std::call_once to construct singeltons
By using `call_once` from C++11, we can simplify the logic in code
where we are querying same value variables from multiple threads.
2018-08-06 16:36:03 -04:00
Sujin Philip
259d670ab5 Merge topic 'cuda-per-thread-streams-2'
06dee259f Minimize cuda synchronizations

Acked-by: Kitware Robot <kwrobot@kitware.com>
Acked-by: Kenneth Moreland <kmorel@sandia.gov>
Merge-request: !1288
2018-07-25 15:07:39 -04:00
Robert Maynard
e031e64967 ExecutionArrayInterfaceBasic<T> explicitly construct DeviceAdapterId objects
Rather than implicitly presume the `VTKM_DEVICE_ADAPTER_` macros can
convert to DeviceAdapterId.
2018-07-25 12:04:30 -04:00
Robert Maynard
86b9ab9969 Refactor ExecutionArrayInterfaceBasic to use inheriting constructors 2018-07-25 12:03:48 -04:00
Robert Maynard
4240111dd8 Make sure VirtualObjectHandle tests include RuntimeDeviceTracker 2018-07-18 10:37:46 -04:00
Robert Maynard
8077b031a8 Merge topic 'uncomment_cuda_range_test'
1e478bbe6 Re-enable UnitTestCudaComputeRange

Acked-by: Kitware Robot <kwrobot@kitware.com>
Merge-request: !1321
2018-07-17 13:28:05 -04:00
Robert Maynard
1e478bbe63 Re-enable UnitTestCudaComputeRange 2018-07-17 11:43:19 -04:00
Robert Maynard
bf49575e00 Remove unneeded typeinfo includes 2018-07-17 11:41:53 -04:00
Robert Maynard
64958b014b VTK-m now supports passing pointers when invoking worklets.
The original design of invoke and the transport infrastructure
relied on the implementation behavior of vtkm::cont types
such as ArrayHandle that used an internal shared_ptr to managed
state. This allowed passing by value instead of passing by
non-const ref when needing to transfer information to the device.

As VTK-m adds support for classes that use virtuals the ability
to pass by base pointer type allows for us to invoke worklets
using a base type without the risk of type slicing.

Additional by moving over to a non-const ref Invocation we
can update all transports that have 'output' to now be
by ref and therefore support types that can't be copied while
being 'more' correct.
2018-07-06 14:27:36 -04:00
Robert Maynard
9238cedcab Merge topic 'ice_nvcc_on_renar'
5ced0da8f Try to ice the ubuntu 17.10 + cuda 9.1 compiler

Acked-by: Kitware Robot <kwrobot@kitware.com>
Acked-by: Kenneth Moreland <kmorel@sandia.gov>
Merge-request: !1305
2018-07-05 11:36:16 -04:00
Robert Maynard
5ced0da8f5 Try to ice the ubuntu 17.10 + cuda 9.1 compiler 2018-07-05 09:14:52 -04:00
Robert Maynard
e5090e1289 Make sure the PointLocatorUniform uses the correct runtime device 2018-07-03 17:42:57 -04:00
Sujin Philip
06dee259f7 Minimize cuda synchronizations
1. Have a per-thread pinned array for cuda errors
2. Check for errors before scheduling new tasks and at explicit sync points
3. Remove explicit synchronizations from most places

Addresses part 2 of #168
2018-07-03 14:19:06 -04:00
ayenpure
e2dccee099 Merge branch 'master' of https://gitlab.kitware.com/vtk/vtk-m into spatialsearch 2018-06-30 11:56:33 -06:00
Kenneth Moreland
4459ab9174 Merge branch 'master' into 'pointlocator-general-interface'
# Conflicts:
#   vtkm/cont/PointLocatorUniformGrid.h
2018-06-28 12:51:08 -04:00
Kenneth Moreland
439beaaed9 Make point locator tests have consistent devices 2018-06-27 10:37:59 +02:00
Allison Vacanti
a8d8b3670d Suppress host/device warnings on CUDA atomics. 2018-06-25 14:53:53 -04:00
David Thompson
d8cf1f7b51 Merge topic 'geometry-squashed'
880d8a989 Add `vtkm/Geometry.h` and test it.

Acked-by: Kitware Robot <kwrobot@kitware.com>
Acked-by: Robert Maynard <robert.maynard@kitware.com>
Merge-request: !1262
2018-06-20 14:15:50 -04:00
David Thompson
880d8a989e Add vtkm/Geometry.h and test it.
This commit adds several geometric constructs to vtk-m
in the `vtkm/Geometry.h` header. They may be used from
both the execution and control environments.

We also add methods to perform projection and Gram-Schmidt
orthonormalization to `vtkm/VectorAnalysis.h`.

See `docs/changelog/geometry.md` included in this commit
for more information.
2018-06-20 11:58:14 -04:00
ayenpure
d8e8078099 Fixing the typos with ScanExclusiveByKey
- Fixed the typo
- Moved the test to vtkm/worklet/testing as vtkm/cont/testing does not execute with CUDA
2018-06-15 16:39:00 -07:00
Robert Maynard
8276e35cf4 Mark classes that should not be derived from as final. 2018-06-15 10:49:59 -04:00
Robert Maynard
82cdae0025 VTK-m waits for cuda streams to finish before host access
Previously it was possible for VTK-m to access memory from
the host before the computations in a stream finished.
2018-06-01 10:28:55 -04:00
Robert Maynard
9c3547bc7c VTK-m cuda runtime now handles no cuda runtime properly
Previously it would throw an uncaught exception and crash.
2018-05-31 10:07:37 -04:00
Allison Vacanti
1f6a662c0a Merge DevAdaptAlgoThrust --> DevAdaptAlgoCuda. 2018-05-29 14:07:29 -04:00
Allison Vacanti
be0c6a17a9 Move DevAdaptAtomicArrayImplementation to its own file. 2018-05-29 14:07:29 -04:00
Allison Vacanti
3af9f66083 Merge ArrayManagerExecutionThrustDevice into AMECuda. 2018-05-29 14:07:29 -04:00
Robert Maynard
4a520b7bdd Merge topic 'pascal_managed_memory_copy_non_blocking'
e0b6e698 copying cpu memory to pascal managed memory now works consistently.

Acked-by: Kitware Robot <kwrobot@kitware.com>
Merge-request: !1211
2018-05-18 15:15:17 -04:00
Robert Maynard
e0b6e69878 copying cpu memory to pascal managed memory now works consistently.
When copying small arrays from cpu memory to pascal memory we would
see subsequent kernels fail as the memory transfer hadn't finished.
This is a bug as each stream should act like a FIFO queue. So
for now when encountering this use case we explicitly synchronize
after the memcpy.
2018-05-16 17:56:50 -04:00
Robert Maynard
1c5feeb185 Make sure all device specific tests use the intended device.
This means that we not only setup the runtime device tracker
to force the intended device, it also means making sure
the default device is the error device.
2018-05-16 08:21:16 -04:00
Robert Maynard
e28244f345 Re-implement DeviceAdapterRuntimeDetector to avoid ODR violations.
The previous implementation of DeviceAdapterRuntimeDetector caused
multiple differing definitions of the same class to exist and
was causing the runtime device tracker to report CUDA as disabled
when it actually was enabled.

The ODR was caused by having a default implementation for
DeviceAdapterRuntimeDetector and a specific specialization for
CUDA. If a library had both CUDA and C++ sources it would pick up
both implementations and would have undefined behavior. In general
it would think the CUDA backend was disabled.

To avoid this kind of situation in the future I have reworked VTK-m
so that each device adapter must implement DeviceAdapterRuntimeDetector
for that device.
2018-05-15 13:08:34 -04:00
Robert Maynard
571556d984 CUDA's RuntimeDeviceTracker and Timer are now built as part of vtkm_cont
This is done to not only reduce the amount of code that users need
to generate but to reduce the amount of errors when using
the RuntimeDeviceTracker. If the runtime device tracker is initially
used in a library by a c++ file it will never properly detect the
cuda backend. By moving the code into vtkm_cont we can make sure
this problem doesn't occur.
2018-05-10 10:57:06 -04:00
Robert Maynard
364b366ab3 Correct signed/unsigned cast warnings from DeviceAdapterAlgorithmThrust
Found with CUDA 7.5
2018-05-08 15:29:11 -04:00
Robert Maynard
c9ba80ad93 Replace uint with vtkm::Id in DeviceAdapterAlgorithmThrust
The usage of uint was causing problems with CUDA + MSVC2015 as
type was not defined. Instead we use vtkm::Id as that was the expect
type to be passed to the task
2018-05-02 09:55:56 -04:00
Robert Maynard
b56894dd09 Move VTK-m Cuda backend over to a grid-stride iteration pattern.
This allows for easier host side logic when determining grid and block
sizes, and allows for a smaller library side by moving some logic
into compiled in functions.
2018-04-30 17:29:26 -04:00
Robert Maynard
b7e6371842 Correct issues found be enabling more CUDA warnings. 2018-04-23 14:27:53 -04:00
Matt Larsen
715141737f Merge topic 'typos'
efdf8543 Misc. Typos

Acked-by: Kitware Robot <kwrobot@kitware.com>
Acked-by: Matt Larsen <mlarsen@cs.uoregon.edu>
Merge-request: !1113
2018-04-06 18:04:46 -04:00
Robert Maynard
84311a2453 Merge branch 'master' into cmake_refactor 2018-04-05 10:18:36 -04:00
Robert Maynard
c123796949 VTK-m ArrayHandle can now take ownership of a user allocated memory location
Previously memory that was allocated outside of VTK-m was impossible to transfer to
VTK-m as we didn't know how to free it. By extending the ArrayHandle constructors
to support a Storage object that is being moved, we can clearly express that
the ArrayHandle now owns memory it didn't allocate.

Here is an example of how this is done:
```cpp
  T* buffer = new T[100];
  auto user_free_function = [](void* ptr) { delete[] static_cast<T*>(ptr); };

  vtkm::cont::internal::Storage<T, vtkm::cont::StorageTagBasic>
      storage(buffer, 100, user_free_function);
  vtkm::cont::ArrayHandle<T> arrayHandle(std::move(storage));
```
2018-04-04 11:28:25 -04:00
Robert Maynard
707970f492 VTK-m StorageBasic is now able to give/take ownership of user allocated memory.
This fixes the three following issues with StorageBasic.

1. Memory that was allocated by VTK-m and Stolen by the user needed the
proper free function called which is generally StorageBasicAllocator::deallocate.
But that was hard for the user to hold onto. So now we provide a function
pointer to the correct free function.

2. Memory that was allocated outside of VTK-m was impossible to transfer to
VTK-m as we didn't know how to free it. This is now resolved by allowing the
user to specify a free function to be called on release.

3. When the CUDA backend allocates memory for an ArrayHandle that has no
control representation, and the location we are running on supports concurrent
managed access we want to specify that cuda managed memory as also the host memory.
This requires that StorageBasic be able to call an arbitrary new delete function
which is chosen at runtime.
2018-04-04 11:27:57 -04:00
Robert Maynard
8808b41fbd Merge branch 'master' into vtk-m-cmake_refactor 2018-03-29 22:51:26 -04:00
Robert Maynard
944bc3c0d6 Introduce vtkm::cont::ColorTable replacing vtkm::rendering::ColorTable
The new and improved vtkm::cont::ColorTable provides a more feature complete
color table implementation that is modeled after
vtkDiscretizableColorTransferFunction. This class therefore supports different
color spaces ( rgb, lab, hsv, diverging ) and supports execution across all
device adapters.
2018-03-28 16:11:23 -04:00
luz.paz
efdf854306 Misc. Typos
Found via `codespell` and `grep`
2018-03-28 09:45:07 -04:00
Robert Maynard
2bfbf0a902 Transfer of virtuals to the CUDA device now properly uses streams
This way when multiple threads are using VTK-m they all won't block while
one transfer a class with virtuals to the device.
2018-03-20 17:04:41 -04:00
Robert Maynard
6202d8d22d CudaAllocator guards all CUDA 8.0+ calls behind ifdef's. 2018-02-26 16:37:57 -05:00
Robert Maynard
e630ac5aa4 Merge branch 'master' into vtk-m-cmake_refactor 2018-02-23 14:52:00 -05:00
Robert Maynard
705528bf17 vtk-m ArrayHandle + basic storage has an optimized PrepareForDevice method
By hard coding the PrepareForDevice to know about all the different VTK-m
devices, we can have a single base class do the execution allocation, and not
have that logic repeated in each child class.
2018-02-16 10:00:28 -05:00
Robert Maynard
22f9ae3d24 vtk-m ArrayHandle + basic holds control data by StorageBasicBase
By making the array handle hold the control side data by the parent storage
class we remove significant code generation.
2018-02-16 09:59:20 -05:00