232 Commits

Author SHA1 Message Date
Kenneth Moreland
4d9ce24888 Synchronize CUDA timer when stopping it
Previously, when Stop was called on a Cuda timer, it would record a stop
event but it would not synchronize it at that time. Instead, the
synchronize was only called when GetElapsedTime was called. The problem
is that the time of the event is only marked when synchronize is called.
Thus, if the event completed before GetElapsedTime was called, it would
record the time from when the event actually happened to the time when
GetElapsedTime was called as part of the elapsed time, which is
incorrect.

Fix the problem by synchronizing when Stop is called. Although this
makes the Timer more invasive, generally using the Timer can cause
synchronization to happen. This behavior is consistent with the Timer
implementation for other devices.
2019-02-28 15:08:32 -07:00
Kenneth Moreland
85265a9c84 Add const correctness to Timer
It should be possible to query a vtkm::cont::Timer without modifying it.
As such, its query functions (such as Stopped and GetElapsedTime) should
be const.
2019-02-28 15:08:16 -07:00
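The const-correctness change in 85265a9c84 can be illustrated with a small host-only sketch. SketchTimer and ReportElapsed are hypothetical names used for illustration; this is not the vtkm::cont::Timer implementation, only a model of query functions that read state without changing it.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical stand-in for a host-side timer (not vtkm::cont::Timer).
class SketchTimer
{
public:
  using Clock = std::chrono::steady_clock;

  void Start()
  {
    this->StartTime = Clock::now();
    this->StopTime = this->StartTime;
    this->IsStopped = false;
  }

  void Stop()
  {
    this->StopTime = Clock::now();
    this->IsStopped = true;
  }

  // Query functions are const: they read state but never change it.
  bool Stopped() const { return this->IsStopped; }

  double GetElapsedTime() const
  {
    // If the timer is still running, measure up to "now" without
    // recording a stop, so the object is left unmodified.
    const Clock::time_point end = this->IsStopped ? this->StopTime : Clock::now();
    return std::chrono::duration<double>(end - this->StartTime).count();
  }

private:
  Clock::time_point StartTime = Clock::now();
  Clock::time_point StopTime = StartTime;
  bool IsStopped = true;
};

// Taking a const reference only compiles because the queries are const.
inline double ReportElapsed(const SketchTimer& timer)
{
  return timer.GetElapsedTime();
}
```

With this shape, a function that only reports timings can accept the timer by const reference and cannot accidentally restart or stop it.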
Haocheng LIU
0696ae135e Merge topic 'asynchronize-timer'
415252c66 Introduce asynchronous and device independent timer

Acked-by: Kitware Robot <kwrobot@kitware.com>
Acked-by: Haocheng LIU <haocheng.liu@kitware.com>
Acked-by: Robert Maynard <robert.maynard@kitware.com>
Merge-request: !1530
2019-02-05 12:02:59 -05:00
Haocheng LIU
415252c662 Introduce asynchronous and device independent timer
The timer class is now asynchronous and device independent. It uses an API
similar to vtkOpenGLRenderTimer, with Start(), Stop(), Reset(), Ready(), and
GetElapsedTime() functions. For convenience and backward compatibility, each
Start() call invokes Reset() internally, and each GetElapsedTime() call invokes
Stop() if it has not been called yet.

Basically it can be used in two modes:

* Create a Timer without any device info. vtkm::cont::Timer time;

  * It would enable timers for all enabled devices on the machine. Users can get a
specific elapsed time by passing a device id into the GetElapsedTime function.
If no device is provided, it would pick the maximum of all timer results. The
logic behind this decision is that if cuda is disabled, openmp, serial and tbb
give roughly the same results; if cuda is enabled, it's safe to return the
maximum elapsed time since users are more interested in the device execution
time than in the kernel launch time. The Ready function can be handy here
to query the status of the timer.

* Create a Timer with a device id. vtkm::cont::Timer time((vtkm::cont::DeviceAdapterTagCuda()));

  * It works as the old timer that times for a specific device id.
2019-02-05 12:01:56 -05:00
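The selection rule described in 415252c662 (return one device's time when a device is given, otherwise the maximum over all devices) can be sketched as follows. SelectElapsedTime and the string device names are hypothetical; the real timer identifies devices by device adapter id, and the per-device times here are supplied directly rather than measured.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <string>
#include <utility>

// Hypothetical helper: perDevice maps a device name to its elapsed seconds.
// An empty device name means "no device requested".
inline double SelectElapsedTime(const std::map<std::string, double>& perDevice,
                                const std::string& device = "")
{
  assert(!perDevice.empty());
  if (!device.empty())
  {
    // A specific device was requested; at() throws std::out_of_range
    // if that device has no timer.
    return perDevice.at(device);
  }

  // No device requested: report the longest time over all devices.
  using Entry = std::pair<const std::string, double>;
  auto longest = std::max_element(
    perDevice.begin(), perDevice.end(),
    [](const Entry& a, const Entry& b) { return a.second < b.second; });
  return longest->second;
}
```

For times of 0.010 s (serial), 0.011 s (tbb), and 0.250 s (cuda), the default query returns the cuda value, while passing "tbb" returns 0.011.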
Robert Maynard
d0a70946b8 Simplify the DeviceAdapterRuntimeDetectorCuda to not do a kernel launch.
The kernel launch component of the runtime device adapter is fairly
pointless. If the hardware supports CUDA we should expect that
VTK-m has the correct kernel versions.

Plus in the original version if the CUDA device was being used
and the kernel launch returns cudaErrorDevicesUnavailable it
was never possible to restore CUDA support. Now what happens
is that the runtime tracker is marked as failed, but the
calling code can always go back and try the device again.
2019-02-04 13:27:20 -05:00
Robert Maynard
5508d17c31 Merge topic 'correct_broken_install'
24e71d251 VTK-m yet again has properly installed headers.

Acked-by: Kitware Robot <kwrobot@kitware.com>
Acked-by: Kenneth Moreland <kmorel@sandia.gov>
Merge-request: !1525
2019-01-24 14:59:41 -05:00
Robert Maynard
24e71d251b VTK-m yet again has properly installed headers.
Fixes the install issues mentioned in #342
2019-01-24 14:26:40 -05:00
Allison Vacanti
03fc7b66d0 Add VTKM_CUDA_DEVICE_PASS preprocessing definition.
This is only set while compiling device code, and is useful
for code that needs different implementations on devices (e.g.
they call CUDA device intrinsics, etc).
2019-01-24 11:23:45 -05:00
Robert Maynard
4ec5bae02d Remove VTK-m TestBuild infrastructure
The purpose of the TestBuild infrastructure was to confirm that
VTK-m didn't have any lexical issues when it was a pure header
only project. As we now move to have more compiled components
the need for this form of testing is mitigated. Combined
with the issue of TestBuilds causing MSVC issues, we should
just remove this infrastructure.
2019-01-16 10:04:33 -06:00
Robert Maynard
f1e1a524e9 Require CMake 3.8 to build VTK-m. 2019-01-09 16:01:22 -05:00
Robert Maynard
718caaaeac CudaAllocator allows managed memory to be explicitly disabled 2018-12-28 11:30:29 -05:00
Robert Maynard
90bb23de6b CudaAllocator::Initialize correctly uses managed memory when possible
Previously the logic would always think managed memory wasn't supported
2018-12-20 17:21:55 -05:00
Allison Vacanti
16c4dde2ee Merge topic 'cuda10_warning'
0e105eae6 cudaPointerAttributes::isManaged deprecated in CUDA 10.

Acked-by: Kitware Robot <kwrobot@kitware.com>
Acked-by: Robert Maynard <robert.maynard@kitware.com>
Merge-request: !1430
2018-10-10 15:05:57 -04:00
Allison Vacanti
0e105eae6d cudaPointerAttributes::isManaged deprecated in CUDA 10.
Update code to support both the old and new way of checking this.
2018-10-10 13:51:56 -04:00
Allison Vacanti
bd337854ec Initial implementation of general logging.
Addresses #291.
2018-10-02 11:37:55 -04:00
Kenneth Moreland
98a0a20feb Allow ArrayHandleTransform to work with ExecObject
This change allows you to set a subclass of
vtkm::cont::ExecutionObjectBase as a functor
used in ArrayHandleTransform. This latter class will then detect that
the functor is an ExecObject and will call PrepareForExecution with the
appropriate device to get the actual Functor object.

This change allows you to use virtual objects and other device dependent
objects as functors for ArrayHandleTransform without knowing a priori
what device the portal will be used on.
2018-09-05 13:11:04 -06:00
Kenneth Moreland
d879188de0 Make DispatcherBase invoke using a TryExecute
Rather than force all dispatchers to be templated on a device adapter,
instead use a TryExecute internally within the invoke to select a device
adapter.

Because this removes the need to declare a device when invoking a
worklet, this commit also removes the need to declare a device in
several other areas of the code.
2018-08-29 19:18:54 -07:00
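The idea behind d879188de0, selecting a device inside the invoke instead of through a template parameter, can be modeled roughly as below. TryExecuteSketch and FailOnCuda are hypothetical, devices are plain strings, and the fall-through-on-failure policy is an assumption made for illustration rather than a description of VTK-m's TryExecute.

```cpp
#include <cassert>
#include <exception>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for a TryExecute-style helper. Devices are tried
// in the given order until the functor reports success on one of them.
template <typename Functor>
bool TryExecuteSketch(const std::vector<std::string>& devices, Functor&& functor)
{
  for (const std::string& device : devices)
  {
    try
    {
      if (functor(device))
      {
        return true; // the first device that succeeds wins
      }
    }
    catch (const std::exception&)
    {
      // Assumed policy: a failure on one device falls through to the next.
    }
  }
  return false;
}

// Example functor: pretends CUDA is unavailable and everything else works.
struct FailOnCuda
{
  std::string* UsedDevice;

  bool operator()(const std::string& device) const
  {
    if (device == "cuda")
    {
      throw std::runtime_error("device unavailable");
    }
    *this->UsedDevice = device;
    return true;
  }
};
```

The caller never names a device at compile time; it only supplies work that can run on whichever device is picked.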
Allison Vacanti
024a75821d Make DeviceAdapterId constructor protected.
This forces users to use a defined tag, since they shouldn't need
to create their own.
2018-08-24 16:38:08 -04:00
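A minimal sketch of the pattern in 024a75821d: a protected constructor means an id can only be created through a predefined tag type. DeviceIdSketch, SerialTagSketch, CudaTagSketch, and the numeric values are all hypothetical and do not reflect VTK-m's actual ids.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical id type whose constructor is reachable only from subclasses.
class DeviceIdSketch
{
public:
  std::int8_t GetValue() const { return this->Value; }

  bool operator==(const DeviceIdSketch& other) const
  {
    return this->Value == other.Value;
  }

protected:
  explicit DeviceIdSketch(std::int8_t id)
    : Value(id)
  {
  }

private:
  std::int8_t Value;
};

// Predefined tags are the only way to obtain an id (values are made up).
struct SerialTagSketch final : DeviceIdSketch
{
  SerialTagSketch()
    : DeviceIdSketch(1)
  {
  }
};

struct CudaTagSketch final : DeviceIdSketch
{
  CudaTagSketch()
    : DeviceIdSketch(2)
  {
  }
};

// DeviceIdSketch custom(42); // would not compile: the constructor is protected

inline bool IsCuda(const DeviceIdSketch& id)
{
  return id == CudaTagSketch();
}
```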
Haocheng LIU
7d22132253 Merge topic 'allow-disabling/enabling-cuda-managed-memory'
e34301eca Allow disabling/enabling of CUDA managed memory via an env variable

Acked-by: Kitware Robot <kwrobot@kitware.com>
Acked-by: Robert Maynard <robert.maynard@kitware.com>
Merge-request: !1359
2018-08-17 13:14:02 -04:00
Haocheng LIU
e34301eca8 Allow disabling/enabling of CUDA managed memory via an env variable
By setting the environment variable "VTKM_MANAGEDMEMO_DISABLED" to be 1,
users are able to disable CUDA managed memory even when the hardware
supports it.
2018-08-17 11:10:15 -04:00
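Reading such a switch from the environment might look like the sketch below. Only the variable name and the value 1 come from the commit message of e34301eca8; the helper names and the exact-match parsing rule are assumptions, and the real parsing in VTK-m may differ.

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>

// Hypothetical parsing rule: only the exact string "1" disables the feature.
inline bool IsDisabledValue(const char* value)
{
  return value != nullptr && std::strcmp(value, "1") == 0;
}

// Checks the environment variable named in the commit message.
// std::getenv returns nullptr when the variable is not set.
inline bool ManagedMemoryDisabledByEnv()
{
  return IsDisabledValue(std::getenv("VTKM_MANAGEDMEMO_DISABLED"));
}
```

Splitting the parsing from the lookup keeps the rule testable without touching the process environment.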
Sujin Philip
1212081de1 Support deferred freeing for CUDA memory
Calls to 'cudaFree' block execution on all cuda devices. Reduce the number of
times this happens by having a deferred free mechanism that frees a pool
of pointers together when a threshold is reached.

Especially helpful during virtual object transfers that require a few small
allocations and frees.
2018-08-16 12:05:36 -04:00
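The deferred-free mechanism from 1212081de1 can be sketched on the host with std::free standing in for cudaFree. DeferredFreePool and its threshold parameter are hypothetical; this shows only the batching idea, not VTK-m's implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <mutex>
#include <vector>

// Hypothetical pool that collects pointers and frees them in batches.
class DeferredFreePool
{
public:
  explicit DeferredFreePool(std::size_t threshold)
    : Threshold(threshold)
  {
    assert(threshold > 0);
  }

  ~DeferredFreePool() { this->Flush(); }

  DeferredFreePool(const DeferredFreePool&) = delete;
  DeferredFreePool& operator=(const DeferredFreePool&) = delete;

  // Queue a pointer; the batch is released once the threshold is reached.
  void Free(void* ptr)
  {
    std::lock_guard<std::mutex> lock(this->Mutex);
    this->Pending.push_back(ptr);
    if (this->Pending.size() >= this->Threshold)
    {
      this->FlushLocked();
    }
  }

  // Release everything that is currently queued.
  void Flush()
  {
    std::lock_guard<std::mutex> lock(this->Mutex);
    this->FlushLocked();
  }

  std::size_t PendingCount() const
  {
    std::lock_guard<std::mutex> lock(this->Mutex);
    return this->Pending.size();
  }

  std::size_t FlushCount() const
  {
    std::lock_guard<std::mutex> lock(this->Mutex);
    return this->Flushes;
  }

private:
  // Caller must hold Mutex.
  void FlushLocked()
  {
    if (this->Pending.empty())
    {
      return;
    }
    for (void* ptr : this->Pending)
    {
      std::free(ptr); // stand-in for cudaFree in this host-only sketch
    }
    this->Pending.clear();
    ++this->Flushes;
  }

  mutable std::mutex Mutex;
  std::vector<void*> Pending;
  std::size_t Threshold;
  std::size_t Flushes = 0;
};
```

With a threshold of 3, three small frees cost one blocking batch instead of three separate blocking calls.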
Allison Vacanti
f6da092146 Use __CUDA_ARCH__ instead of __CUDACC__ to guard device-only code.
__CUDACC__ is defined when compiling host code under nvcc, while
__CUDA_ARCH__ is only defined for device code.
2018-08-09 11:57:05 -04:00
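The distinction behind f6da092146 can be seen in a short sketch. The two function names are hypothetical; the macros `__CUDA_ARCH__` and `__CUDACC__` are the standard ones defined by nvcc, with `__CUDA_ARCH__` present only during the device compilation pass.

```cpp
#include <cassert>

// True only in code compiled during nvcc's device pass.
inline bool CompiledForDevice()
{
#ifdef __CUDA_ARCH__
  return true; // device compilation pass
#else
  return false; // host pass of nvcc, or a plain C++ compiler
#endif
}

// True whenever nvcc is driving the compilation, including host code.
inline bool CompiledByNvcc()
{
#ifdef __CUDACC__
  return true;
#else
  return false;
#endif
}
```

Called from host code, CompiledForDevice() returns false even when nvcc compiled the file, which is why `__CUDACC__` alone is not a safe guard for device-only intrinsics.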
Allison Vacanti
2c079b96dd Make AtomicArrays work on CUDA 8.
CUDA 8.0 is erroring out in the cuda AtomicArray implementation:

https://open.cdash.org/viewBuildError.php?buildid=5489156

This patch fixes the error. See comments in source for more info.
2018-08-08 15:26:32 -04:00
Haocheng LIU
ce9cd8072a Use std::call_once to construct singletons
By using `call_once` from C++11, we can simplify the logic in code
where we are querying same value variables from multiple threads.
2018-08-06 16:36:03 -04:00
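A minimal sketch of the std::call_once pattern named in ce9cd8072a. RuntimeConfigSketch, InitCallCount, and the stored value are hypothetical; on some older toolchains std::call_once also needs the program to be linked with thread support.

```cpp
#include <cassert>
#include <mutex>

// Hypothetical singleton payload.
struct RuntimeConfigSketch
{
  int DeviceCount;
};

// Counts how often the initializer ran (for illustration only).
inline int& InitCallCount()
{
  static int count = 0;
  return count;
}

inline const RuntimeConfigSketch& GetRuntimeConfig()
{
  static std::once_flag flag;
  static RuntimeConfigSketch instance{ 0 };

  // The lambda runs exactly once, even if several threads arrive here
  // at the same time; the others wait until it has finished.
  std::call_once(flag, []() {
    ++InitCallCount();
    instance.DeviceCount = 1; // placeholder for an expensive query
  });
  return instance;
}
```

Every caller gets the same object, and the initialization logic needs no hand-written locking or double-checked flag.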
Sujin Philip
259d670ab5 Merge topic 'cuda-per-thread-streams-2'
06dee259f Minimize cuda synchronizations

Acked-by: Kitware Robot <kwrobot@kitware.com>
Acked-by: Kenneth Moreland <kmorel@sandia.gov>
Merge-request: !1288
2018-07-25 15:07:39 -04:00
Robert Maynard
e031e64967 ExecutionArrayInterfaceBasic<T> explicitly construct DeviceAdapterId objects
Rather than implicitly presume the `VTKM_DEVICE_ADAPTER_` macros can
convert to DeviceAdapterId.
2018-07-25 12:04:30 -04:00
Robert Maynard
86b9ab9969 Refactor ExecutionArrayInterfaceBasic to use inheriting constructors 2018-07-25 12:03:48 -04:00
Robert Maynard
bf49575e00 Remove unneeded typeinfo includes 2018-07-17 11:41:53 -04:00
Sujin Philip
06dee259f7 Minimize cuda synchronizations
1. Have a per-thread pinned array for cuda errors
2. Check for errors before scheduling new tasks and at explicit sync points
3. Remove explicit synchronizations from most places

Addresses part 2 of #168
2018-07-03 14:19:06 -04:00
ayenpure
e2dccee099 Merge branch 'master' of https://gitlab.kitware.com/vtk/vtk-m into spatialsearch 2018-06-30 11:56:33 -06:00
Allison Vacanti
a8d8b3670d Suppress host/device warnings on CUDA atomics. 2018-06-25 14:53:53 -04:00
ayenpure
d8e8078099 Fixing the typos with ScanExclusiveByKey
- Fixed the typo
- Moved the test to vtkm/worklet/testing as vtkm/cont/testing does not execute with CUDA
2018-06-15 16:39:00 -07:00
Robert Maynard
8276e35cf4 Mark classes that should not be derived from as final. 2018-06-15 10:49:59 -04:00
Robert Maynard
82cdae0025 VTK-m waits for cuda streams to finish before host access
Previously it was possible for VTK-m to access memory from
the host before the computations in a stream finished.
2018-06-01 10:28:55 -04:00
Robert Maynard
9c3547bc7c VTK-m cuda runtime now handles no cuda runtime properly
Previously it would throw an uncaught exception and crash.
2018-05-31 10:07:37 -04:00
Allison Vacanti
1f6a662c0a Merge DevAdaptAlgoThrust --> DevAdaptAlgoCuda. 2018-05-29 14:07:29 -04:00
Allison Vacanti
be0c6a17a9 Move DevAdaptAtomicArrayImplementation to its own file. 2018-05-29 14:07:29 -04:00
Allison Vacanti
3af9f66083 Merge ArrayManagerExecutionThrustDevice into AMECuda. 2018-05-29 14:07:29 -04:00
Robert Maynard
e0b6e69878 copying cpu memory to pascal managed memory now works consistently.
When copying small arrays from cpu memory to pascal memory we would
see subsequent kernels fail as the memory transfer hadn't finished.
This is a bug as each stream should act like a FIFO queue. So
for now when encountering this use case we explicitly synchronize
after the memcpy.
2018-05-16 17:56:50 -04:00
Robert Maynard
e28244f345 Re-implement DeviceAdapterRuntimeDetector to avoid ODR violations.
The previous implementation of DeviceAdapterRuntimeDetector caused
multiple differing definitions of the same class to exist and
was causing the runtime device tracker to report CUDA as disabled
when it actually was enabled.

The ODR was caused by having a default implementation for
DeviceAdapterRuntimeDetector and a specific specialization for
CUDA. If a library had both CUDA and C++ sources it would pick up
both implementations and would have undefined behavior. In general
it would think the CUDA backend was disabled.

To avoid this kind of situation in the future I have reworked VTK-m
so that each device adapter must implement DeviceAdapterRuntimeDetector
for that device.
2018-05-15 13:08:34 -04:00
Robert Maynard
571556d984 CUDA's RuntimeDeviceTracker and Timer are now built as part of vtkm_cont
This is done not only to reduce the amount of code that users need
to generate but also to reduce the number of errors when using
the RuntimeDeviceTracker. If the runtime device tracker is initially
used in a library by a c++ file it will never properly detect the
cuda backend. By moving the code into vtkm_cont we can make sure
this problem doesn't occur.
2018-05-10 10:57:06 -04:00
Robert Maynard
364b366ab3 Correct signed/unsigned cast warnings from DeviceAdapterAlgorithmThrust
Found with CUDA 7.5
2018-05-08 15:29:11 -04:00
Robert Maynard
c9ba80ad93 Replace uint with vtkm::Id in DeviceAdapterAlgorithmThrust
The usage of uint was causing problems with CUDA + MSVC2015 as the
type was not defined. Instead we use vtkm::Id, as that is the expected
type to be passed to the task.
2018-05-02 09:55:56 -04:00
Robert Maynard
b56894dd09 Move VTK-m Cuda backend over to a grid-stride iteration pattern.
This allows for easier host-side logic when determining grid and block
sizes, and allows for a smaller library size by moving some logic
into compiled-in functions.
2018-04-30 17:29:26 -04:00
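The grid-stride pattern adopted in b56894dd09 can be emulated on the host to show the indexing. GridStrideThread and RunGridStride are hypothetical names; in a real CUDA kernel the starting index would be blockIdx.x * blockDim.x + threadIdx.x and the stride would be gridDim.x * blockDim.x.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Work done by one emulated thread: start at its global index and
// advance by the total number of threads until the data is exhausted.
inline void GridStrideThread(std::size_t globalThreadIndex,
                             std::size_t totalThreads,
                             std::vector<int>& visits)
{
  for (std::size_t i = globalThreadIndex; i < visits.size(); i += totalThreads)
  {
    ++visits[i];
  }
}

// Runs every emulated thread of a (blocks x threadsPerBlock) launch in turn
// and returns how many times each element was visited.
inline std::vector<int> RunGridStride(std::size_t numValues,
                                      std::size_t blocks,
                                      std::size_t threadsPerBlock)
{
  std::vector<int> visits(numValues, 0);
  const std::size_t totalThreads = blocks * threadsPerBlock;
  for (std::size_t block = 0; block < blocks; ++block)
  {
    for (std::size_t thread = 0; thread < threadsPerBlock; ++thread)
    {
      GridStrideThread(block * threadsPerBlock + thread, totalThreads, visits);
    }
  }
  return visits;
}
```

Each element is visited exactly once whether the launch has more threads than elements or fewer, so the grid and block sizes no longer have to be derived from the problem size.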
Robert Maynard
b7e6371842 Correct issues found by enabling more CUDA warnings. 2018-04-23 14:27:53 -04:00
Robert Maynard
84311a2453 Merge branch 'master' into cmake_refactor 2018-04-05 10:18:36 -04:00
Robert Maynard
707970f492 VTK-m StorageBasic is now able to give/take ownership of user allocated memory.
This fixes the three following issues with StorageBasic.

1. Memory that was allocated by VTK-m and Stolen by the user needed the
proper free function called which is generally StorageBasicAllocator::deallocate.
But that was hard for the user to hold onto. So now we provide a function
pointer to the correct free function.

2. Memory that was allocated outside of VTK-m was impossible to transfer to
VTK-m as we didn't know how to free it. This is now resolved by allowing the
user to specify a free function to be called on release.

3. When the CUDA backend allocates memory for an ArrayHandle that has no
control representation, and the location we are running on supports concurrent
managed access, we want to use that CUDA managed memory as the host memory as well.
This requires that StorageBasic be able to call an arbitrary delete function
that is chosen at runtime.
2018-04-04 11:27:57 -04:00
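The ownership model described in 707970f492 (memory travels together with the function that frees it) can be sketched as below. OwnedBufferSketch, Steal, and CountingFree are hypothetical names; std::malloc and std::free stand in for whatever allocator the memory actually came from.

```cpp
#include <cassert>
#include <cstdlib>
#include <utility>

// Hypothetical buffer that stores its memory alongside a free function
// chosen at runtime.
class OwnedBufferSketch
{
public:
  using DeleteFunction = void (*)(void*);

  // Take ownership of memory allocated elsewhere, plus the way to free it.
  OwnedBufferSketch(void* data, DeleteFunction deleter)
    : Data(data)
    , Deleter(deleter)
  {
  }

  ~OwnedBufferSketch() { this->Release(); }

  OwnedBufferSketch(const OwnedBufferSketch&) = delete;
  OwnedBufferSketch& operator=(const OwnedBufferSketch&) = delete;

  bool HasData() const { return this->Data != nullptr; }

  // Give the memory back to the caller together with the matching free
  // function, so the caller does not need to know which allocator was used.
  std::pair<void*, DeleteFunction> Steal()
  {
    std::pair<void*, DeleteFunction> result(this->Data, this->Deleter);
    this->Data = nullptr;
    return result;
  }

private:
  void Release()
  {
    if (this->Data != nullptr && this->Deleter != nullptr)
    {
      this->Deleter(this->Data);
    }
    this->Data = nullptr;
  }

  void* Data;
  DeleteFunction Deleter;
};

// Example user-supplied free function that also counts its calls.
inline int& FreeCallCount()
{
  static int count = 0;
  return count;
}

inline void CountingFree(void* ptr)
{
  ++FreeCallCount();
  std::free(ptr);
}
```

The same interface covers all three cases in the commit message: the user stealing memory, the user handing memory in, and a backend substituting a different allocator at runtime.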
Robert Maynard
8808b41fbd Merge branch 'master' into vtk-m-cmake_refactor 2018-03-29 22:51:26 -04:00
Robert Maynard
2bfbf0a902 Transfer of virtuals to the CUDA device now properly uses streams
This way, when multiple threads are using VTK-m, they won't all block while
one transfers a class with virtuals to the device.
2018-03-20 17:04:41 -04:00
Robert Maynard
6202d8d22d CudaAllocator guards all CUDA 8.0+ calls behind ifdef's. 2018-02-26 16:37:57 -05:00