vtk-m

mirror of https://gitlab.kitware.com/vtk/vtk-m synced 2024-09-08 21:33:55 +00:00

Author	SHA1	Message	Date
Kenneth Moreland	f2b73fe2e4	Merge topic 'safe-cuda-runtime-config' f42f59d68 Fix crash when CUDA device is disabled Acked-by: Kitware Robot <kwrobot@kitware.com> Acked-by: Gunther Weber <ghweber@lbl.gov> Merge-request: !3216	2024-05-17 08:20:39 -04:00
Kenneth Moreland	f42f59d685	Fix crash when CUDA device is disabled There was an issue where if VTK-m was compiled with CUDA support but then run on a computer where no CUDA device was available, an inappropriate exception was thrown (instead of just disabling the device). The initialization code should now properly check for the existance of a CUDA device.	2024-05-03 17:03:20 -04:00
Kenneth Moreland	c44f686496	Add hints to device adapter scheduler The `DeviceAdapter` provides an abstract interface to the accelerator devices worklets and other algorithms run on. As such, the programmer has less control about how the device launches each worklet. Each device adapter has its own configuration parameters and other ways to attempt to optimize how things are run, but these are always a universal set of options that are applied to everything run on the device. There is no way to specify launch parameters for a particular worklet. To provide this information, VTK-m now supports `Hint`s to the device adapter. The `DeviceAdapterAlgorithm::Schedule` method takes a templated argument that is of the type `HintList`. This object contains a template list of `Hint` types that provide suggestions on how to launch the parallel execution. The device adapter will pick out hints that pertain to it and adjust its launching accordingly. These are called hints rather than, say, directives, because they don't force the device adapter to do anything. The device adapter is free to ignore any (and all) hints. The point is that the device adapter can take into account the information to try to optimize for itself. A provided hint can be tied to specific device adapters. In this way, an worklet can further optimize itself. If multiple hints match a device adapter, the last one in the list will be selected. The `Worklet` base now has an internal type named `Hints` that points to a `HintList` that is applied when the worklet is scheduled. Derived worklet classes can provide hints by simply defining their own `Hints` type.	2024-02-09 10:42:23 -05:00
Kenneth Moreland	5b8c282e9f	Continue transfer of Users' Guide text These changes complete the using part of the guide.	2023-11-21 10:33:45 -07:00
Vicente Adolfo Bolea Sanchez	1e27495809	diy,mpi: Enable GPU AWARE MPI buffers This commit adds the flag VTKm_ENABLE_GPU_MPI which when enable it will use GPU AWARE MPI. - This will only work with GPUs and MPI implementation that supports GPU AWARE MPI calls. - Enabling VTKm_ENABLE_GPU_MPI without MPI/GPU support might results in errors when running VTK-m with DIY/MPI. - Only the following tests can run with this feature if enabled: - UnitTestSerializationDataSet - UnitTestSerializationArrayHandle	2023-05-30 12:13:03 -04:00
Vicente Adolfo Bolea Sanchez	049d0cca8c	cmake: namespace vtkm export targets	2022-12-09 18:46:56 -05:00
Kenneth Moreland	ea560e9486	Remove deprecated virtual methods Several revisions ago, the ability to use virtual methods in the execution environment was deprecated. Completely remove this functionality for the VTK-m 2.0 release.	2022-10-28 10:56:52 -06:00
Dave Pugmire	115db2175e	fix compile error for cuda.	2022-08-16 10:57:01 -04:00
Dave Pugmire	baac7e1fb9	Put sync memory into RuntimeDeviceTracker	2022-08-15 14:30:13 -04:00
Dave Pugmire	3920806a66	Address reviewer comments and suggestions.	2022-08-11 13:43:49 -04:00
Dave Pugmire	b441c9e23d	Add Sync/Async memory allocator functions.	2022-08-10 15:09:53 -04:00
Dave Pugmire	0d30520bb3	Use async memory allocation for cuda when managed memory is off.	2022-08-10 10:21:48 -04:00
Kenneth Moreland	9855db0961	Add test for array and datas that are cleaned up after finalize It is the case that arrays might be deallocated from a device after the device is closed. This can happen, for example, when an `ArrayHandle` is declared globally. It gets constructed before VTK-m is initialized. This is OK as long as you do not otherwise use it until VTK-m is initialized. However, if you use that `ArrayHandle` to move data to a device and that data is left on the device when the object closes, then the `ArrayHandle` will be left holding a reference to invalid device memory once the device is shut down. This can cause problems when the `ArrayHandle` destructs itself and attempts to release this memory. The VTK-m devices should gracefully handle deallocations that happen after device shutdown.	2022-07-01 12:36:55 -06:00
Nickolas Davis	9730de8074	remove cudaGetDevice calls, favor runtime device config	2021-09-02 09:17:36 -06:00
Nickolas Davis	adac415f15	implement cuda runtime device configuraton	2021-09-02 09:12:21 -06:00
nadavi	cd2a6c1385	implement kokkos runtime device configuration	2021-08-18 13:23:30 -06:00
nadavi	fe3b82b99c	implement return codes and protected virtual parsing of arguments	2021-06-23 17:58:38 +00:00
nadavi	408beefc0a	Implement RuntimeDeviceConfiguration	2021-06-17 17:56:38 +00:00
Kenneth Moreland	8eed21d085	Do not declare headers for virtual classes that are removed	2021-04-28 15:28:06 -06:00
Kenneth Moreland	cb3bb43ff9	Completely deprecate virtual methods Deprecate `VirtualObjectHandle` and all other classes that are used to implement objects with virtual methods in the execution environment. Additionally, the code is updated so that if the `VTKm_NO_DEPRECATED_VIRTUAL` flag is set none of the code is compiled at all. This opens us up to opportunities that do not work with virtual methods such as backends that do not support virtual methods and dynamic libraries for CUDA.	2021-04-28 07:28:32 -06:00
Kenneth Moreland	a7100c845a	Do not assume CUDA reduce operator is unary The `Reduce` algorithm is sometimes used to convert an input type to a different output type. For example, you can compute the min and max at the same time by making the output of the binary functor a pair of the input type. However, for this to work with the CUDA algorithm, you have to be able to also convert the input type to the output type. This was previously done by treating the binary operator as also a unary operator. That's fine for custom operators, but if you are using something like `thrust::plus`, it has no unary operation. (Why would it?) So, detect whether the operator has a unary operation. If it does, use it to cast from the input portal to the output type. If it does not, just use `static_cast`. Thus, the operator only has to have the unary operation if `static_cast` does not work.	2021-03-03 09:39:51 -07:00
Kenneth Moreland	34412ff298	Deprecate ArrayHandle::Shrink This method has been subsumed by Allocate with vtkm::CopyFlag::On.	2021-02-01 08:07:40 -07:00
Kenneth Moreland	90050b96e4	Remove `ArrayManagerExecution` This class was used indirectly by the old `ArrayHandle`, through `ArrayHandleTransfer`, to move data to and from a device. This functionality has been replaced in the new `ArrayHandle`s through the `Buffer` class (which can be compiled into libraries rather than make every translation unit compile their own template). This commit removes `ArrayManagerExecution` and all the implementations that the device adapters were required to make. None of this code was in any use anymore.	2020-12-08 13:18:44 -07:00
Kenneth Moreland	b33c54bf61	Add ScheduleTask to performance log When `DeviceAdapterAlgorithm::ScheduleTask` was called directly (i.e. not through `Schedule`), nothing was added to the log. Adding `VTKM_LOG_SCOPE` to these methods so that all scheduling is added to the performance log.	2020-10-25 17:22:38 -06:00
Kitware Robot	cf0cdcf7d1	clang-format: reformat the repository with clang-format-9	2020-08-24 14:01:08 -04:00
Robert Maynard	b48c19f251	Correct warnings found by using clang as the host compiler for cuda	2020-08-24 08:57:38 -04:00
Kenneth Moreland	13056b3af5	Deprecate AtomicInterfaceControl and AtomicInterfaceExecution Now that we have the functions in `vtkm/Atomic.h`, we can deprecate (and eventually remove) the more cumbersome classes `AtomicInterfaceControl` and `AtomicInterfaceExecution`. Also reversed the order of the `expected` and `desired` parameters of `vtkm::AtomicCompareAndSwap`. I think the former order makes more sense and matches more other implementations (such as `std::atomic` and the GCC `__atomic` built ins). However, there are still some non-deprecated classes with similar methods that cannot easily be switched. Thus, it's better to be inconsistent with most other libraries and consistent with ourself than to be inconsitent with ourself.	2020-08-20 13:40:44 -06:00
Kenneth Moreland	d3503bfaba	Implement AtomicInterfaceControl/Execution with free functions Now that we have atomic free functions (e.g. `vtkm::AtomicAdd()`), we no longer need special implementations for control and each execution device. (Well, technically we do have special implementations for each, but they are handled with compiler directives in the free functions.) Convert the old atomic interface classes (`AtomicInterfaceControl` and `AtomicInterfaceExecution`) to use the new atomic free functions. This will allow us to test the new atomic functions everywhere that atomics are used in VTK-m. Once verified, we can deprecate the old atomic interface classes.	2020-08-20 13:40:44 -06:00
Sujin Philip	452f61e290	Add Kokkos backend	2020-08-12 13:55:24 -04:00
Kenneth Moreland	18b5be92d6	Fix issue with CUDA and ArrayHandleMultiplexer When you try to call the `Reduce` operation in the CUDA device adapter with a sufficently complex interator type, you get a compile error that says `error: cannot pass an argument with a user-provided copy-constructor to a device-side kernel launch`. This appears to be a bug in either nvcc or Thrust. I believe it is related to the following reported issues: * https://github.com/thrust/thrust/issues/928 * https://github.com/thrust/thrust/issues/1044 Work around this problem by making a special condition for calling `Reduce` with an `ArrayHandleMultiplexer` that calls the generic algorithm in `DeviceAdapterAlgorithmGeneral` instead of the algorithm in Thrust.	2020-07-06 13:51:36 -06:00
Kenneth Moreland	a47fd42bc1	Pin user provided memory in ArrayHandle Often when a user gives memory to an `ArrayHandle`, she wants data to be written into the memory given to be used elsewhere. Previously, the `Buffer` objects would delete the given buffer as soon as a write buffer was created elsewhere. That was a problem if a user wants VTK-m to write results right into a given buffer. Instead, when a user provides memory, "pin" that memory so that the `ArrayHandle` never deletes it.	2020-06-25 14:02:46 -06:00
Kenneth Moreland	56bec1dd7b	Replace basic ArrayHandle implementation to use Buffers This encapsulates a lot of the required memory management into the Buffer object and related code. Many now unneeded classes were deleted.	2020-06-25 14:02:26 -06:00
Kenneth Moreland	8f7b0d18be	Add Buffer class The buffer class encapsulates the movement of raw C arrays between host and devices. The `Buffer` class itself is not associated with any device. Instead, `Buffer` is used in conjunction with a new templated class named `DeviceAdapterMemoryManager` that can allocate data on a given device and transfer data as necessary. `DeviceAdapterMemoryManager` will eventually replace the more complicated device adapter classes that manage data on a device. The code in `DeviceAdapterMemoryManager` is actually enclosed in virtual methods. This allows us to limit the number of classes that need to be compiled for a device. Rather, the implementation of `DeviceAdapterMemoryManager` is compiled once with whatever compiler is necessary, and then the `RuntimeDeviceInformation` is used to get the correct object instance.	2020-06-25 14:01:39 -06:00
Kenneth Moreland	4f9fa08fa1	Remove ArrayHandleStreaming capabilities The `ArrayHandleStreaming` class stems from an old research project experimenting with bringing data from an `ArrayHandle` in parts and overlapping device transfer and execution. It works, but only in very limited contexts. Thus, it is not actually used today. Plus, the feature requires global indexing to be permutated throughout the worklet dispatching classes of VTK-m for no further reason. Because it is not really used, there are other more promising approaches on the horizon, and it makes further scheduling improvements difficult, we are removing this functionality.	2020-03-24 15:01:56 -06:00
Robert Maynard	8377806778	Merge topic 'introduce_mapfield_3d_scheduling' 1f1688483 Initial infrastructure to allow WorkletMapField to have 3D scheduling Acked-by: Kitware Robot <kwrobot@kitware.com> Acked-by: Kenneth Moreland <kmorel@sandia.gov> Merge-request: !1938	2020-02-27 08:02:52 -05:00
Robert Maynard	1f1688483e	Initial infrastructure to allow WorkletMapField to have 3D scheduling	2020-02-25 15:23:41 -05:00
Kenneth Moreland	3671cbe168	Fix token issues with CUDA	2020-02-25 09:39:30 -07:00
Kenneth Moreland	ad0a53af71	Convert execution preparation to use tokens Marked the old versions of PrepareFor* that do not use tokens as deprecated and moved all of the code to use the new versions that require a token. This makes the scope of the execution object more explicit so that it will be kept while in use and can potentially be reclaimed afterward.	2020-02-25 09:39:19 -07:00
Kenneth Moreland	76ce9c87f0	Support using Token calling PrepareForExecution in ExecutionObject The old version of ExecutionObject (that only takes a device) is still supported, but you will get a deprecated warning if that is what is defined. Supporing this also included sending vtkm::cont::Token through the vtkm::cont::arg::Transport mechanism, which was a change that propogated through a lot of code.	2020-02-25 07:41:39 -07:00
Allison Vacanti	46b7155bdb	Add 64-bit CUDA atomic store.	2020-01-08 10:58:51 -05:00
Allison Vacanti	539f6e5ad7	Port benchmarking framework to Google Benchmark.	2020-01-08 10:58:51 -05:00
Kenneth Moreland	92db376236	Convert uses of ListTagBase to List	2019-12-06 15:37:46 -07:00
Allison Vacanti	b9affb7edc	Disable copy for RAII helper.	2019-09-09 17:59:38 -04:00
Allison Vacanti	ea0bbfeefc	Increase CUDA stack size for ParticleAdvection worklets. Sometimes the CUDA runtime would not allocate sufficient stack space for the particle advection code to run. This issue was exposed by !1737 -- for some reason, once those changes to unrelated filters/worklets are added to VTK, CUDA allocates less stack and the following tests would fail: UnitTestLagrangianFilterCUDA UnitTestLagrangianStructuresFilterCUDA UnitTestStreamlineFilterCUDA UnitTestStreamSurfaceFilterCUDA These were fixed by increasing the stack size in the particle advection worklet Run(...) methods. An RAII helper has been added that will restore the previous stack size in case an exception is thrown, and the KDTree code has been updated to use this helper when it adjusts the CUDA stack allocation.	2019-09-09 16:06:23 -04:00
Allison Vacanti	884616788a	Simplify and extend AtomicArray implementation. - Use AtomicInterface to implement device-specific atomic operations. - Remove DeviceAdapterAtomicArrayImplementations. - Extend supported atomic types to include unsigned 32/64-bit ints. - Add a static_assert to check that AtomicArray type is supported. - Add documentation for AtomicArrayExecutionObject, including a CAS example. - Add a `T Get(idx)` method to AtomicArrayExecutionObject that does an atomic load, and update existing CAS usage to use this instead of `Add(idx, 0)`.	2019-08-23 15:40:37 -04:00
Allison Vacanti	0e728c8000	Update atomic interfaces to support Add/CAS for UInt32/64. These will be used for the AtomicArray implementation.	2019-08-23 15:40:37 -04:00
Allison Vacanti	112024dae2	Fix CUDA shfl usage. There was a bug in the implementations of CountSetBits and BitFieldToUnorderedSet.	2019-08-01 10:57:57 -04:00
Allison Vacanti	920ef9b3b9	Merge topic 'bit_algorithms' f370857c1 Add CountSetBits and Fill device algorithms. Acked-by: Kitware Robot <kwrobot@kitware.com> Acked-by: Robert Maynard <robert.maynard@kitware.com> Merge-request: !1696	2019-06-25 15:42:16 -04:00
Allison Vacanti	f370857c15	Add CountSetBits and Fill device algorithms.	2019-06-25 11:30:39 -04:00
Robert Maynard	1ea386222e	cuda copy functions don't launch on length zero arrays	2019-06-20 16:54:23 -04:00

1 2 3 4 5 ...

299 Commits