vtk-m2

Author	SHA1	Message	Date
Kenneth Moreland	4d9ce24888	Synchronize CUDA timer when stopping it Previously, when Stop was called on a Cuda timer, it would record a stop event but it would not synchronize it at that time. Instead, the synchronize was only called when GetElapsedTime was called. The problem is that the time of the event is only marked when synchronize is called. Thus, if the event completed before GetElapsedTime was called, it would record the time from when the event actually happened to the time when GetElapsedTime was called as part of the elapsed time, which is incorrect. Fix the problem by synchronizing when Stop is called. Although this makes the Timer more invasive, generally using the Timer can cause synchronization to happen. This behavior is consistent with the Timer implementation for other devices.	2019-02-28 15:08:32 -07:00
Kenneth Moreland	85265a9c84	Add const correctness to Timer It should be possible to query a vtkm::cont::Timer without modifying it. As such, its query functions (such as Stopped and GetElapsedTime) should be const.	2019-02-28 15:08:16 -07:00
Haocheng LIU	0696ae135e	Merge topic 'asynchronize-timer' 415252c66 Introduce asynchronous and device independent timer Acked-by: Kitware Robot <kwrobot@kitware.com> Acked-by: Haocheng LIU <haocheng.liu@kitware.com> Acked-by: Robert Maynard <robert.maynard@kitware.com> Merge-request: !1530	2019-02-05 12:02:59 -05:00
Haocheng LIU	415252c662	Introduce asynchronous and device independent timer The timer class is now asynchronous and device independent. It uses an API similar to vtkOpenGLRenderTimer, with Start(), Stop(), Reset(), Ready(), and GetElapsedTime() functions. For convenience and backward compatibility, each Start() call invokes Reset() internally, and each GetElapsedTime() call invokes Stop() if it has not been called yet. Basically it can be used in two modes: * Create a Timer without any device info. vtkm::cont::Timer time; * It would enable timers for all enabled devices on the machine. Users can get a specific elapsed time by passing a device id into the GetElapsedTime function. If no device is provided, it would pick the maximum of all timer results - the logic behind this decision is that if cuda is disabled, openmp, serial and tbb roughly give the same results; if cuda is enabled it's safe to return the maximum elapsed time since users are more interested in the device execution time rather than the kernel launch time. The Ready function can be handy here to query the status of the timer. * Create a Timer with a device id. vtkm::cont::Timer time((vtkm::cont::DeviceAdapterTagCuda())); * It works as the old timer that times for a specific device id.	2019-02-05 12:01:56 -05:00
Robert Maynard	d0a70946b8	Simplify the DeviceAdapterRuntimeDetectorCuda to not do a kernel launch. The kernel launch component of the runtime device adapter is fairly pointless. If the hardware supports CUDA we should expect that VTK-m has the correct kernel versions. Plus in the original version if the CUDA device was being used and the kernel launch returned cudaErrorDevicesUnavailable it was never possible to restore CUDA support. Now what happens is that the runtime tracker is marked as failed, but the calling code can always go back and try the device again.	2019-02-04 13:27:20 -05:00
Robert Maynard	5508d17c31	Merge topic 'correct_broken_install' 24e71d251 VTK-m yet again has properly installed headers. Acked-by: Kitware Robot <kwrobot@kitware.com> Acked-by: Kenneth Moreland <kmorel@sandia.gov> Merge-request: !1525	2019-01-24 14:59:41 -05:00
Robert Maynard	24e71d251b	VTK-m yet again has properly installed headers. Fixes the install issues mentioned in #342	2019-01-24 14:26:40 -05:00
Allison Vacanti	03fc7b66d0	Add VTKM_CUDA_DEVICE_PASS preprocessing definition. This is only set while compiling device code, and is useful for code that needs different implementations on devices (e.g. they call CUDA device intrinsics, etc).	2019-01-24 11:23:45 -05:00
Robert Maynard	4ec5bae02d	Remove VTK-m TestBuild infrastructure The purpose of the TestBuild infrastructure was to confirm that VTK-m didn't have any lexical issues when it was a pure header only project. As we now move to have more compiled components the need for this form of testing is mitigated. Combined with the issue of TestBuilds causing MSVC issues, we should just remove this infrastructure.	2019-01-16 10:04:33 -06:00
Robert Maynard	f1e1a524e9	Require CMake 3.8 to build VTK-m.	2019-01-09 16:01:22 -05:00
Robert Maynard	718caaaeac	CudaAllocator allows managed memory to be explicitly disabled	2018-12-28 11:30:29 -05:00
Robert Maynard	90bb23de6b	CudaAllocator::Initialize correctly uses managed memory when possible Previously the logic would always think managed memory wasn't supported	2018-12-20 17:21:55 -05:00
Allison Vacanti	16c4dde2ee	Merge topic 'cuda10_warning' 0e105eae6 cudaPointerAttributes::isManaged deprecated in CUDA 10. Acked-by: Kitware Robot <kwrobot@kitware.com> Acked-by: Robert Maynard <robert.maynard@kitware.com> Merge-request: !1430	2018-10-10 15:05:57 -04:00
Allison Vacanti	0e105eae6d	cudaPointerAttributes::isManaged deprecated in CUDA 10. Update code to support both the old and new way of checking this.	2018-10-10 13:51:56 -04:00
Allison Vacanti	bd337854ec	Initial implementation of general logging. Addresses #291.	2018-10-02 11:37:55 -04:00
Kenneth Moreland	98a0a20feb	Allow ArrayHandleTransform to work with ExecObject This change allows you to set a subclass of vtkm::cont::ExecutionObjectBase as a functor used in ArrayHandleTransform. This latter class will then detect that the functor is an ExecObject and will call PrepareForExecution with the appropriate device to get the actual Functor object. This change allows you to use virtual objects and other device dependent objects as functors for ArrayHandleTransform without knowing a priori what device the portal will be used on.	2018-09-05 13:11:04 -06:00
Kenneth Moreland	d879188de0	Make DispatcherBase invoke using a TryExecute Rather than force all dispatchers to be templated on a device adapter, instead use a TryExecute internally within the invoke to select a device adapter. Because this removes the need to declare a device when invoking a worklet, this commit also removes the need to declare a device in several other areas of the code.	2018-08-29 19:18:54 -07:00
Allison Vacanti	024a75821d	Make DeviceAdapterId constructor protected. This forces users to use a defined tag, since they shouldn't need to create their own.	2018-08-24 16:38:08 -04:00
Haocheng LIU	7d22132253	Merge topic 'allow-disabling/enabling-cuda-managed-memory' e34301eca Allow disabling/enabling of CUDA managed memory via an env variable Acked-by: Kitware Robot <kwrobot@kitware.com> Acked-by: Robert Maynard <robert.maynard@kitware.com> Merge-request: !1359	2018-08-17 13:14:02 -04:00
Haocheng LIU	e34301eca8	Allow disabling/enabling of CUDA managed memory via an env variable By setting the environment variable "VTKM_MANAGEDMEMO_DISABLED" to be 1, users are able to disable CUDA managed memory even though the hardware is capable of doing so.	2018-08-17 11:10:15 -04:00
Sujin Philip	1212081de1	Support deferred freeing for CUDA memory Calls to 'cudaFree' block execution on all cuda devices. Reduce the number of times this happens by having a deferred free mechanism that frees a pool of pointers together when a threshold is reached. Especially helpful during virtual object transfers that require a few small allocations and frees.	2018-08-16 12:05:36 -04:00
Allison Vacanti	f6da092146	Use __CUDA_ARCH__ instead of __CUDACC__ to guard device-only code. __CUDACC__ is defined when compiling host code under nvcc, while __CUDA_ARCH__ is only defined for device code.	2018-08-09 11:57:05 -04:00
Allison Vacanti	2c079b96dd	Make AtomicArrays work on CUDA 8. CUDA 8.0 is erroring out in the cuda AtomicArray implementation: https://open.cdash.org/viewBuildError.php?buildid=5489156 This patch fixes the error. See comments in source for more info.	2018-08-08 15:26:32 -04:00
Haocheng LIU	ce9cd8072a	Use std::call_once to construct singletons By using `call_once` from C++11, we can simplify the logic in code where we are querying same value variables from multiple threads.	2018-08-06 16:36:03 -04:00
Sujin Philip	259d670ab5	Merge topic 'cuda-per-thread-streams-2' 06dee259f Minimize cuda synchronizations Acked-by: Kitware Robot <kwrobot@kitware.com> Acked-by: Kenneth Moreland <kmorel@sandia.gov> Merge-request: !1288	2018-07-25 15:07:39 -04:00
Robert Maynard	e031e64967	ExecutionArrayInterfaceBasic<T> explicitly construct DeviceAdapterId objects Rather than implicitly presume the `VTKM_DEVICE_ADAPTER_` macros can convert to DeviceAdapterId.	2018-07-25 12:04:30 -04:00
Robert Maynard	86b9ab9969	Refactor ExecutionArrayInterfaceBasic to use inheriting constructors	2018-07-25 12:03:48 -04:00
Robert Maynard	bf49575e00	Remove unneeded typeinfo includes	2018-07-17 11:41:53 -04:00
Sujin Philip	06dee259f7	Minimize cuda synchronizations 1. Have a per-thread pinned array for cuda errors 2. Check for errors before scheduling new tasks and at explicit sync points 3. Remove explicit synchronizations from most places Addresses part 2 of #168	2018-07-03 14:19:06 -04:00
ayenpure	e2dccee099	Merge branch 'master' of https://gitlab.kitware.com/vtk/vtk-m into spatialsearch	2018-06-30 11:56:33 -06:00
Allison Vacanti	a8d8b3670d	Suppress host/device warnings on CUDA atomics.	2018-06-25 14:53:53 -04:00
ayenpure	d8e8078099	Fixing the typos with ScanExclusiveByKey - Fixed the typo - Moved the test to vtkm/worklet/testing as vtkm/cont/testing does not execute with CUDA	2018-06-15 16:39:00 -07:00
Robert Maynard	8276e35cf4	Mark classes that should not be derived from as final.	2018-06-15 10:49:59 -04:00
Robert Maynard	82cdae0025	VTK-m waits for cuda streams to finish before host access Previously it was possible for VTK-m to access memory from the host before the computations in a stream finished.	2018-06-01 10:28:55 -04:00
Robert Maynard	9c3547bc7c	VTK-m cuda runtime now handles no cuda runtime properly Previously it would throw an uncaught exception and crash.	2018-05-31 10:07:37 -04:00
Allison Vacanti	1f6a662c0a	Merge DevAdaptAlgoThrust --> DevAdaptAlgoCuda.	2018-05-29 14:07:29 -04:00
Allison Vacanti	be0c6a17a9	Move DevAdaptAtomicArrayImplementation to its own file.	2018-05-29 14:07:29 -04:00
Allison Vacanti	3af9f66083	Merge ArrayManagerExecutionThrustDevice into AMECuda.	2018-05-29 14:07:29 -04:00
Robert Maynard	e0b6e69878	copying cpu memory to pascal managed memory now works consistently. When copying small arrays from cpu memory to pascal memory we would see subsequent kernels fail as the memory transfer hadn't finished. This is a bug as each stream should act like a FIFO queue. So for now when encountering this use case we explicitly synchronize after the memcpy.	2018-05-16 17:56:50 -04:00
Robert Maynard	e28244f345	Re-implement DeviceAdapterRuntimeDetector to avoid ODR violations. The previous implementation of DeviceAdapterRuntimeDetector caused multiple differing definitions of the same class to exist and was causing the runtime device tracker to report CUDA as disabled when it actually was enabled. The ODR was caused by having a default implementation for DeviceAdapterRuntimeDetector and a specific specialization for CUDA. If a library had both CUDA and C++ sources it would pick up both implementations and would have undefined behavior. In general it would think the CUDA backend was disabled. To avoid this kind of situation in the future I have reworked VTK-m so that each device adapter must implement DeviceAdapterRuntimeDetector for that device.	2018-05-15 13:08:34 -04:00
Robert Maynard	571556d984	CUDA's RuntimeDeviceTracker and Timer are now built as part of vtkm_cont This is done to not only reduce the amount of code that users need to generate but to reduce the amount of errors when using the RuntimeDeviceTracker. If the runtime device tracker is initially used in a library by a c++ file it will never properly detect the cuda backend. By moving the code into vtkm_cont we can make sure this problem doesn't occur.	2018-05-10 10:57:06 -04:00
Robert Maynard	364b366ab3	Correct signed/unsigned cast warnings from DeviceAdapterAlgorithmThrust Found with CUDA 7.5	2018-05-08 15:29:11 -04:00
Robert Maynard	c9ba80ad93	Replace uint with vtkm::Id in DeviceAdapterAlgorithmThrust The usage of uint was causing problems with CUDA + MSVC2015 as the type was not defined. Instead we use vtkm::Id as that was the expected type to be passed to the task	2018-05-02 09:55:56 -04:00
Robert Maynard	b56894dd09	Move VTK-m Cuda backend over to a grid-stride iteration pattern. This allows for easier host side logic when determining grid and block sizes, and allows for a smaller library size by moving some logic into compiled-in functions.	2018-04-30 17:29:26 -04:00
Robert Maynard	b7e6371842	Correct issues found by enabling more CUDA warnings.	2018-04-23 14:27:53 -04:00
Robert Maynard	84311a2453	Merge branch 'master' into cmake_refactor	2018-04-05 10:18:36 -04:00
Robert Maynard	707970f492	VTK-m StorageBasic is now able to give/take ownership of user allocated memory. This fixes the three following issues with StorageBasic. 1. Memory that was allocated by VTK-m and Stolen by the user needed the proper free function called which is generally StorageBasicAllocator::deallocate. But that was hard for the user to hold onto. So now we provide a function pointer to the correct free function. 2. Memory that was allocated outside of VTK-m was impossible to transfer to VTK-m as we didn't know how to free it. This is now resolved by allowing the user to specify a free function to be called on release. 3. When the CUDA backend allocates memory for an ArrayHandle that has no control representation, and the location we are running on supports concurrent managed access we want to specify that cuda managed memory as also the host memory. This requires that StorageBasic be able to call an arbitrary new delete function which is chosen at runtime.	2018-04-04 11:27:57 -04:00
Robert Maynard	8808b41fbd	Merge branch 'master' into vtk-m-cmake_refactor	2018-03-29 22:51:26 -04:00
Robert Maynard	2bfbf0a902	Transfer of virtuals to the CUDA device now properly uses streams This way when multiple threads are using VTK-m they all won't block while one transfers a class with virtuals to the device.	2018-03-20 17:04:41 -04:00
Robert Maynard	6202d8d22d	CudaAllocator guards all CUDA 8.0+ calls behind ifdef's.	2018-02-26 16:37:57 -05:00
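
The deferred free mechanism described in commit 1212081de1 (collect pointers and release them together once a threshold is reached) can be sketched in plain C++ as follows. This is only an illustration under stated assumptions, not VTK-m's actual CudaAllocator code: the class name DeferredFreePool, its member names, and the injected free function are all hypothetical, and std::free stands in for cudaFree so the sketch builds without CUDA.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <functional>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical illustration of a deferred-free pool; NOT VTK-m's CudaAllocator.
// Pointers are queued instead of being released right away, and the whole
// queue is released in one batch once it holds `threshold` entries.
class DeferredFreePool
{
public:
  using FreeFunction = std::function<void(void*)>;

  // freeFn is the expensive release call (cudaFree in the CUDA case;
  // std::free is a stand-in for this sketch).
  DeferredFreePool(std::size_t threshold, FreeFunction freeFn)
    : Threshold(threshold)
    , Free(std::move(freeFn))
  {
  }

  // Release anything still queued when the pool goes away.
  ~DeferredFreePool() { this->Flush(); }

  // Queue a pointer; release the whole batch when the threshold is reached.
  void DeferredFree(void* ptr)
  {
    if (ptr == nullptr)
    {
      return;
    }
    std::vector<void*> batch;
    {
      std::lock_guard<std::mutex> lock(this->Mutex);
      this->Pending.push_back(ptr);
      if (this->Pending.size() < this->Threshold)
      {
        return; // below the threshold: keep deferring
      }
      batch.swap(this->Pending);
    }
    this->Release(batch); // free outside the lock
  }

  // Release every queued pointer immediately.
  void Flush()
  {
    std::vector<void*> batch;
    {
      std::lock_guard<std::mutex> lock(this->Mutex);
      batch.swap(this->Pending);
    }
    this->Release(batch);
  }

  std::size_t PendingCount() const
  {
    std::lock_guard<std::mutex> lock(this->Mutex);
    return this->Pending.size();
  }

private:
  void Release(const std::vector<void*>& batch)
  {
    for (void* ptr : batch)
    {
      this->Free(ptr);
    }
  }

  std::size_t Threshold;
  FreeFunction Free;
  std::vector<void*> Pending;
  mutable std::mutex Mutex;
};
```

With a threshold of 3, the first two calls to DeferredFree only queue their pointers; the third call releases all three in one batch, so the blocking release cost is paid once per batch rather than once per pointer.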


232 Commits