vtk-m2

Author	SHA1	Message	Date
Robert Maynard	8377806778	Merge topic 'introduce_mapfield_3d_scheduling' 1f1688483 Initial infrastructure to allow WorkletMapField to have 3D scheduling Acked-by: Kitware Robot <kwrobot@kitware.com> Acked-by: Kenneth Moreland <kmorel@sandia.gov> Merge-request: !1938	2020-02-27 08:02:52 -05:00
Robert Maynard	1f1688483e	Initial infrastructure to allow WorkletMapField to have 3D scheduling	2020-02-25 15:23:41 -05:00
Kenneth Moreland	3671cbe168	Fix token issues with CUDA	2020-02-25 09:39:30 -07:00
Kenneth Moreland	ad0a53af71	Convert execution preparation to use tokens Marked the old versions of PrepareFor* that do not use tokens as deprecated and moved all of the code to use the new versions that require a token. This makes the scope of the execution object more explicit so that it will be kept while in use and can potentially be reclaimed afterward.	2020-02-25 09:39:19 -07:00
Allison Vacanti	b9affb7edc	Disable copy for RAII helper.	2019-09-09 17:59:38 -04:00
Allison Vacanti	ea0bbfeefc	Increase CUDA stack size for ParticleAdvection worklets. Sometimes the CUDA runtime would not allocate sufficient stack space for the particle advection code to run. This issue was exposed by !1737 -- for some reason, once those changes to unrelated filters/worklets are added to VTK, CUDA allocates less stack and the following tests would fail: UnitTestLagrangianFilterCUDA UnitTestLagrangianStructuresFilterCUDA UnitTestStreamlineFilterCUDA UnitTestStreamSurfaceFilterCUDA These were fixed by increasing the stack size in the particle advection worklet Run(...) methods. An RAII helper has been added that will restore the previous stack size in case an exception is thrown, and the KDTree code has been updated to use this helper when it adjusts the CUDA stack allocation.	2019-09-09 16:06:23 -04:00
Allison Vacanti	884616788a	Simplify and extend AtomicArray implementation. - Use AtomicInterface to implement device-specific atomic operations. - Remove DeviceAdapterAtomicArrayImplementations. - Extend supported atomic types to include unsigned 32/64-bit ints. - Add a static_assert to check that AtomicArray type is supported. - Add documentation for AtomicArrayExecutionObject, including a CAS example. - Add a `T Get(idx)` method to AtomicArrayExecutionObject that does an atomic load, and update existing CAS usage to use this instead of `Add(idx, 0)`.	2019-08-23 15:40:37 -04:00
Allison Vacanti	112024dae2	Fix CUDA shfl usage. There was a bug in the implementations of CountSetBits and BitFieldToUnorderedSet.	2019-08-01 10:57:57 -04:00
Allison Vacanti	920ef9b3b9	Merge topic 'bit_algorithms' f370857c1 Add CountSetBits and Fill device algorithms. Acked-by: Kitware Robot <kwrobot@kitware.com> Acked-by: Robert Maynard <robert.maynard@kitware.com> Merge-request: !1696	2019-06-25 15:42:16 -04:00
Allison Vacanti	f370857c15	Add CountSetBits and Fill device algorithms.	2019-06-25 11:30:39 -04:00
Robert Maynard	1ea386222e	cuda copy functions don't launch on length zero arrays	2019-06-20 16:54:23 -04:00
Robert Maynard	8aaf922aa4	Introduce a log level that details kernel launch parameters	2019-06-18 15:01:07 -04:00
Robert Maynard	770912f991	Correct compiler issues found with GCC 4.8.5 + CUDA 9.2 on summit	2019-05-02 10:27:48 -04:00
Robert Maynard	63c931e639	Correct location of ThrustPatches which clang formatter moved	2019-04-23 15:02:58 -04:00
nadavi	fbcea82e78	consolidate the license statement	2019-04-17 10:57:13 -06:00
Robert Maynard	047b646517	VTK-m now provides better scheduling parameters controls VTK-m now offers a more GPU aware set of defaults for kernel scheduling. When VTK-m first launches a kernel we do system introspection and determine what GPUs are on the machine and then match this information to a preset table of values. The implementation is designed in a way that allows for VTK-m to offer both specific presets for a given GPU ( V100 ) or for an entire generation of cards ( Pascal ). Currently VTK-m offers preset tables for the following GPUs: - Tesla V100 - Tesla P100 If the hardware doesn't match a specific GPU card we then try to find the nearest known hardware generation and use those defaults. Currently we offer defaults for - Older than Pascal Hardware - Pascal Hardware - Volta+ Hardware Some users have workloads that don't align with the defaults provided by VTK-m. When that is the case, it is possible to override the defaults by binding a custom function to `vtkm::cont::cuda::InitScheduleParameters`. As shown below: ```cpp ScheduleParameters CustomScheduleValues(char const* name, int major, int minor, int multiProcessorCount, int maxThreadsPerMultiProcessor, int maxThreadsPerBlock) { ScheduleParameters params { 64 * multiProcessorCount, /* 1d blocks */ 64, /* 1d threads per block */ 64 * multiProcessorCount, /* 2d blocks */ { 8, 8, 1 }, /* 2d threads per block */ 64 * multiProcessorCount, /* 3d blocks */ { 4, 4, 4 } /* 3d threads per block */ }; return params; } vtkm::cont::cuda::InitScheduleParameters(&CustomScheduleValues); ```	2019-04-17 08:32:16 -04:00
Robert Maynard	a5dbe1ece3	Merge topic 'bitfields' 661fb64de AtomicInterfaceControl functions are marked with VTKM_SUPPRESS_EXEC_WARNINGS 0c70f9b9a Add BitFieldIn/Out/InOut worklet signature tags. a66510e81 Add ArrayHandleBitField, a boolean-valued AH backed by a BitField. 56cc5c3d3 Add support for BitFields. d01b97382 Allow VTKM_SUPPRESS_EXEC_WARNINGS to be used inside macros. 2f2ca9370 Add bit operations FindFirstSetBit and CountSetBits to Math.h. Acked-by: Kitware Robot <kwrobot@kitware.com> Merge-request: !1629	2019-04-11 12:32:03 -04:00
Allison Vacanti	56cc5c3d3a	Add support for BitFields. BitFields are: - Stored in memory using a contiguous buffer of bits. - Accessible via portals, a la ArrayHandle. - Portals operate on individual bits or words. - Operations may be atomic for safe use from concurrent kernels. The new BitFieldToUnorderedSet device algorithm produces an ArrayHandle containing the indices of all set bits, in no particular order. The new AtomicInterface classes provide an abstraction into bitwise atomic operations across control and execution environments and are used to implement the BitPortals.	2019-04-11 08:27:17 -04:00
Robert Maynard	89ec4aae2f	Reduction on CUDA handles different input and output types better When reducing an input type that differs from the output type you need to write a custom binary operator that also implements how to do the unary transformation.	2019-04-10 14:44:44 -04:00
Robert Maynard	f4840618cf	Make sure ThrustPatches is included before thrust.	2019-04-03 08:51:05 -04:00
Allison Vacanti	bd337854ec	Initial implementation of general logging. Addresses #291.	2018-10-02 11:37:55 -04:00
Kenneth Moreland	98a0a20feb	Allow ArrayHandleTransform to work with ExecObject This change allows you to set a subclass of vtkm::cont::ExecutionObjectBase as a functor used in ArrayHandleTransform. This latter class will then detect that the functor is an ExecObject and will call PrepareForExecution with the appropriate device to get the actual Functor object. This change allows you to use virtual objects and other device dependent objects as functors for ArrayHandleTransform without knowing a priori what device the portal will be used on.	2018-09-05 13:11:04 -06:00
Sujin Philip	259d670ab5	Merge topic 'cuda-per-thread-streams-2' 06dee259f Minimize cuda synchronizations Acked-by: Kitware Robot <kwrobot@kitware.com> Acked-by: Kenneth Moreland <kmorel@sandia.gov> Merge-request: !1288	2018-07-25 15:07:39 -04:00
Sujin Philip	06dee259f7	Minimize cuda synchronizations 1. Have a per-thread pinned array for cuda errors 2. Check for errors before scheduling new tasks and at explicit sync points 3. Remove explicit synchronizations from most places Addresses part 2 of #168	2018-07-03 14:19:06 -04:00
ayenpure	e2dccee099	Merge branch 'master' of https://gitlab.kitware.com/vtk/vtk-m into spatialsearch	2018-06-30 11:56:33 -06:00
ayenpure	d8e8078099	Fixing the typos with ScanExclusiveByKey - Fixed the typo - Moved the test to vtkm/worklet/testing as vtkm/cont/testing does not execute with CUDA	2018-06-15 16:39:00 -07:00
Robert Maynard	82cdae0025	VTK-m waits for cuda streams to finish before host access Previously it was possible for VTK-m to access memory from the host before the computations in a stream finished.	2018-06-01 10:28:55 -04:00
Allison Vacanti	1f6a662c0a	Merge DevAdaptAlgoThrust --> DevAdaptAlgoCuda.	2018-05-29 14:07:29 -04:00
Allison Vacanti	be0c6a17a9	Move DevAdaptAtomicArrayImplementation to its own file.	2018-05-29 14:07:29 -04:00
Robert Maynard	571556d984	CUDA's RuntimeDeviceTracker and Timer are now built as part of vtkm_cont This is done to not only reduce the amount of code that users need to generate but to reduce the number of errors when using the RuntimeDeviceTracker. If the runtime device tracker is initially used in a library by a C++ file it will never properly detect the CUDA backend. By moving the code into vtkm_cont we can make sure this problem doesn't occur.	2018-05-10 10:57:06 -04:00
Robert Maynard	b56894dd09	Move VTK-m Cuda backend over to a grid-stride iteration pattern. This allows for easier host side logic when determining grid and block sizes, and allows for a smaller library size by moving some logic into compiled-in functions.	2018-04-30 17:29:26 -04:00
Robert Maynard	2bfbf0a902	Transfer of virtuals to the CUDA device now properly uses streams This way when multiple threads are using VTK-m they won't all block while one transfers a class with virtuals to the device.	2018-03-20 17:04:41 -04:00
Robert Maynard	ef611239f6	Don't allow DeviceTaskTypes to construct tasks from rvalues.	2018-01-18 13:55:37 -05:00
Kenneth Moreland	c3a3184d51	Update copyright for Sandia Sandia National Laboratories recently changed management from the Sandia Corporation to the National Technology & Engineering Solutions of Sandia, LLC (NTESS). The copyright statements need to be updated accordingly.	2017-09-20 15:33:44 -06:00
Allison Vacanti	28ab480a40	Fix warnings on renar.	2017-09-18 15:33:02 -04:00
Robert Maynard	b9e69217ae	Merge topic 'typedef_to_using_round_4' f6863594 Convert VTK-m over to use 'using' instead of 'typedef' Acked-by: Kitware Robot <kwrobot@kitware.com> Merge-request: !885	2017-08-17 16:38:49 -04:00
Sujin Philip	72a6cf4a21	Change cuda calls to use the per-thread stream.	2017-08-17 11:03:02 -04:00
Robert Maynard	f68635941e	Convert VTK-m over to use 'using' instead of 'typedef'	2017-08-17 10:47:25 -04:00
Robert Maynard	5dd346007b	Respect VTK-m convention of parameters all or nothing on a line clang-format BinPack settings have been disabled to make sure that the VTK-m style guideline is obeyed.	2017-05-26 13:53:28 -04:00
Robert Maynard	60a405ef65	Add TaskTiling1D/3D which use faux virtuals to reduce binary size. Redesigns the TBB and Serial backends and the vtkm::exec::Task concept so that we can re-use the same launching logic for all Worklets, instead of generating per-worklet code. To keep the performance the same the TilingTask is now passed a range of indices to work on, rather than a single index. Binary size reduction: WorkletTests_SERIAL old - 19MB WorkletTests_SERIAL new - 18MB WorkletTests_TBB old - 39MB WorkletTests_TBB new - 18MB libvtkAcceleratorsVTKm old - 48MB libvtkAcceleratorsVTKm new - 19MB	2017-05-25 11:00:01 -04:00
Kitware Robot	4ade5f5770	clang-format: apply to the entire tree	2017-05-25 07:51:37 -04:00
Kitware Robot	efbde1d54b	clang-format: sort include directives	2017-05-18 12:59:33 -04:00
Sujin Philip	8c4bbc39ad	Use C++11 =delete keyword	2017-02-24 09:39:22 -05:00
David C. Lonie	f601e38ba8	Simplify exception hierarchy. Remove the ErrorControl class such that all subclasses now inherit from error. Renamed all exception classes via s/ErrorControl/Error/. See issue #57.	2017-02-07 15:42:38 -05:00
Kenneth Moreland	55c159d6f0	Check error codes from CUDA functions Most functions in the CUDA runtime API return an error code that must be checked to determine whether the operation completed successfully. Most operations in VTK-m just called the function and assumed it completed correctly, which could lead to further errors. This change wraps most CUDA calls in a VTKM_CUDA_CALL macro that checks the error code and throws an exception if the call fails.	2016-12-14 10:43:44 -07:00
Kenneth Moreland	fdaccc22db	Remove exports for header-only functions/methods Change the VTKM_CONT_EXPORT to VTKM_CONT. (Likewise for EXEC and EXEC_CONT.) Remove the inline from these macros so that they can be applied to everything, including implementations in a library. Because inline is not declared in these modifiers, you have to add the keyword to functions and methods where the implementation is not inlined in the class.	2016-11-15 22:22:13 -07:00
Matt Larsen	e5c4aa3f78	Fixing cuda index error	2016-03-08 12:41:11 -08:00
Matt Larsen	249cce352b	Adding type restrictions to serial atomics	2016-03-08 10:39:23 -08:00
Matt Larsen	40b6db7eee	Inserted missing ,	2016-03-08 09:51:50 -08:00
Matt Larsen	3b46706e1f	Adding compare and swap and removing unsigned atomics	2016-03-08 09:41:02 -08:00
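
Commits ea0bbfeefc and b9affb7edc above describe an RAII helper that restores the previous CUDA stack size if an exception is thrown, with copying disabled. The following is a minimal sketch of that pattern only, not the helper from the repository. `ScopedStackSize`, `GetStackSize`, `SetStackSize`, and `RunWithLargerStack` are hypothetical names, and a plain global stands in for the CUDA runtime limit so the example runs without a GPU.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a device-wide setting such as the CUDA stack-size
// limit. A real helper would query and set the limit through the CUDA runtime
// instead of this global.
static std::size_t g_stackSize = 1024;

std::size_t GetStackSize() { return g_stackSize; }
void SetStackSize(std::size_t bytes) { g_stackSize = bytes; }

// RAII helper: applies a new limit on construction and restores the previous
// value on destruction, including when an exception unwinds the scope.
class ScopedStackSize
{
public:
  explicit ScopedStackSize(std::size_t newSize)
    : OldSize(GetStackSize())
  {
    assert(newSize > 0);
    SetStackSize(newSize);
  }

  ~ScopedStackSize() { SetStackSize(this->OldSize); }

  // Copying is disabled so that a copy can never restore the old value a
  // second time, after some other scope has changed the limit.
  ScopedStackSize(const ScopedStackSize&) = delete;
  ScopedStackSize& operator=(const ScopedStackSize&) = delete;

private:
  std::size_t OldSize;
};

// Example use: the limit is changed only while this function runs.
std::size_t RunWithLargerStack(std::size_t bytes)
{
  ScopedStackSize guard(bytes);
  return GetStackSize(); // work that needs the larger stack would go here
}
```

Because the restore happens in the destructor, callers do not need a matching "reset" call on every exit path.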
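
Commit 55c159d6f0 above describes wrapping CUDA calls in a macro that checks the returned error code and throws if the call fails. The sketch below illustrates that idea under stated assumptions and is not the definition of `VTKM_CUDA_CALL`. `CHECKED_CALL`, `FakeError`, `FakeAllocate`, `ThrowOnError`, and `AllocateOrReport` are hypothetical, and a fake API replaces the CUDA runtime so the example compiles with any C++11 compiler.

```cpp
#include <cassert>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical stand-ins for a runtime API whose calls return an error code.
enum FakeError { FAKE_SUCCESS = 0, FAKE_INVALID_VALUE = 1 };

inline const char* FakeErrorString(FakeError e)
{
  return e == FAKE_SUCCESS ? "no error" : "invalid value";
}

inline FakeError FakeAllocate(int bytes)
{
  return bytes >= 0 ? FAKE_SUCCESS : FAKE_INVALID_VALUE;
}

// Builds a message naming the failing expression and its location, then throws.
inline void ThrowOnError(FakeError err, const char* file, int line, const char* expr)
{
  assert(err != FAKE_SUCCESS);
  std::ostringstream msg;
  msg << "Error: " << FakeErrorString(err) << " in " << expr << " @ " << file << ":" << line;
  throw std::runtime_error(msg.str());
}

// Evaluates the call exactly once and throws if it did not succeed.
#define CHECKED_CALL(command)                                   \
  do                                                            \
  {                                                             \
    const FakeError err_ = (command);                           \
    if (err_ != FAKE_SUCCESS)                                   \
    {                                                           \
      ThrowOnError(err_, __FILE__, __LINE__, #command);         \
    }                                                           \
  } while (false)

// Example use: returns true on success, otherwise stores the error text.
inline bool AllocateOrReport(int bytes, std::string& message)
{
  try
  {
    CHECKED_CALL(FakeAllocate(bytes));
    return true;
  }
  catch (const std::runtime_error& e)
  {
    message = e.what();
    return false;
  }
}
```

The `do { ... } while (false)` wrapper lets the macro behave like a single statement, so it can be used safely inside unbraced `if`/`else` branches.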

58 Commits