vtk-m2

Author	SHA1	Message	Date
Kenneth Moreland	18b5be92d6	Fix issue with CUDA and ArrayHandleMultiplexer When you try to call the `Reduce` operation in the CUDA device adapter with a sufficiently complex iterator type, you get a compile error that says `error: cannot pass an argument with a user-provided copy-constructor to a device-side kernel launch`. This appears to be a bug in either nvcc or Thrust. I believe it is related to the following reported issues: * https://github.com/thrust/thrust/issues/928 * https://github.com/thrust/thrust/issues/1044 Work around this problem by making a special condition for calling `Reduce` with an `ArrayHandleMultiplexer` that calls the generic algorithm in `DeviceAdapterAlgorithmGeneral` instead of the algorithm in Thrust.	2020-07-06 13:51:36 -06:00
Kenneth Moreland	a47fd42bc1	Pin user provided memory in ArrayHandle Often when a user gives memory to an `ArrayHandle`, she wants data to be written into that memory so it can be used elsewhere. Previously, the `Buffer` objects would delete the given buffer as soon as a write buffer was created elsewhere. That was a problem if a user wanted VTK-m to write results directly into a given buffer. Instead, when a user provides memory, "pin" that memory so that the `ArrayHandle` never deletes it.	2020-06-25 14:02:46 -06:00
Kenneth Moreland	56bec1dd7b	Replace basic ArrayHandle implementation to use Buffers This encapsulates a lot of the required memory management into the Buffer object and related code. Many now unneeded classes were deleted.	2020-06-25 14:02:26 -06:00
Kenneth Moreland	8f7b0d18be	Add Buffer class The `Buffer` class encapsulates the movement of raw C arrays between host and devices. The `Buffer` class itself is not associated with any device. Instead, `Buffer` is used in conjunction with a new templated class named `DeviceAdapterMemoryManager` that can allocate data on a given device and transfer data as necessary. `DeviceAdapterMemoryManager` will eventually replace the more complicated device adapter classes that manage data on a device. The code in `DeviceAdapterMemoryManager` is actually enclosed in virtual methods. This allows us to limit the number of classes that need to be compiled for a device: the implementation of `DeviceAdapterMemoryManager` is compiled once with whatever compiler is necessary, and then `RuntimeDeviceInformation` is used to get the correct object instance.	2020-06-25 14:01:39 -06:00
Kenneth Moreland	4f9fa08fa1	Remove ArrayHandleStreaming capabilities The `ArrayHandleStreaming` class stems from an old research project experimenting with bringing data from an `ArrayHandle` in parts and overlapping device transfer and execution. It works, but only in very limited contexts. Thus, it is not actually used today. Plus, the feature requires global indexing to be propagated throughout the worklet dispatching classes of VTK-m for no other benefit. Because it is not really used, because there are other more promising approaches on the horizon, and because it makes further scheduling improvements difficult, we are removing this functionality.	2020-03-24 15:01:56 -06:00
Robert Maynard	8377806778	Merge topic 'introduce_mapfield_3d_scheduling' 1f1688483 Initial infrastructure to allow WorkletMapField to have 3D scheduling Acked-by: Kitware Robot <kwrobot@kitware.com> Acked-by: Kenneth Moreland <kmorel@sandia.gov> Merge-request: !1938	2020-02-27 08:02:52 -05:00
Kenneth Moreland	ec34cb56c4	Use new ways to get array portal in control environment Also fix deadlocks that occur when portals are not destroyed in time.	2020-02-26 13:10:46 -07:00
Robert Maynard	1f1688483e	Initial infrastructure to allow WorkletMapField to have 3D scheduling	2020-02-25 15:23:41 -05:00
Kenneth Moreland	3671cbe168	Fix token issues with CUDA	2020-02-25 09:39:30 -07:00
Kenneth Moreland	ad0a53af71	Convert execution preparation to use tokens Marked the old versions of PrepareFor* that do not use tokens as deprecated and moved all of the code to use the new versions that require a token. This makes the scope of the execution object more explicit so that it will be kept while in use and can potentially be reclaimed afterward.	2020-02-25 09:39:19 -07:00
Kenneth Moreland	76ce9c87f0	Support using Token calling PrepareForExecution in ExecutionObject The old version of ExecutionObject (that only takes a device) is still supported, but you will get a deprecation warning if that is what is defined. Supporting this also included sending vtkm::cont::Token through the vtkm::cont::arg::Transport mechanism, which was a change that propagated through a lot of code.	2020-02-25 07:41:39 -07:00
Allison Vacanti	46b7155bdb	Add 64-bit CUDA atomic store.	2020-01-08 10:58:51 -05:00
Allison Vacanti	539f6e5ad7	Port benchmarking framework to Google Benchmark.	2020-01-08 10:58:51 -05:00
Allison Vacanti	44c4f0838f	Add vtkm/Algorithms.h header with device-friendly binary search algorithms.	2019-12-20 12:35:10 -05:00
Allison Vacanti	813f5a422f	Fixup custom portal iterator logic. The convenience functions `ArrayPortalToIteratorBegin()` and `ArrayPortalToIteratorEnd()` wouldn't detect specializations of `ArrayPortalToIterators<PortalType>` since the specializations aren't visible when the `Begin`/`End` functions are declared. Since the CUDA iterators rely on a specialization, the convenience functions would not compile on CUDA. Now, instead of specializing `ArrayPortalToIterators` to provide custom iterators for a particular portal, the portal may advertise custom iterators by defining `IteratorType`, `GetIteratorBegin()`, and `GetIteratorEnd()`. `ArrayPortalToIterators` will detect such portals and automatically switch to using the custom iterators. This eliminates the need for the specializations to be visible to the convenience functions and allows them to be usable on CUDA.	2019-12-17 15:39:51 -05:00
Kenneth Moreland	92db376236	Convert uses of ListTagBase to List	2019-12-06 15:37:46 -07:00
Kenneth Moreland	5ab0b5bb1d	Access ArrayHandle internals in a critical section Repeat the changes of the previous commit with the specialized ArrayHandle for basic storage.	2019-11-20 14:42:58 -07:00
Robert Maynard	5c56ff945f	Label tests which exercise a given Device Adapter This gives developers an easy way to run all tests for one device, for example all OpenMP tests.	2019-09-13 15:52:40 -04:00
Allison Vacanti	b9affb7edc	Disable copy for RAII helper.	2019-09-09 17:59:38 -04:00
Allison Vacanti	ea0bbfeefc	Increase CUDA stack size for ParticleAdvection worklets. Sometimes the CUDA runtime would not allocate sufficient stack space for the particle advection code to run. This issue was exposed by !1737 -- for some reason, once those changes to unrelated filters/worklets are added to VTK, CUDA allocates less stack and the following tests would fail: UnitTestLagrangianFilterCUDA UnitTestLagrangianStructuresFilterCUDA UnitTestStreamlineFilterCUDA UnitTestStreamSurfaceFilterCUDA These were fixed by increasing the stack size in the particle advection worklet Run(...) methods. An RAII helper has been added that will restore the previous stack size in case an exception is thrown, and the KDTree code has been updated to use this helper when it adjusts the CUDA stack allocation.	2019-09-09 16:06:23 -04:00
Allison Vacanti	884616788a	Simplify and extend AtomicArray implementation. - Use AtomicInterface to implement device-specific atomic operations. - Remove DeviceAdapterAtomicArrayImplementations. - Extend supported atomic types to include unsigned 32/64-bit ints. - Add a static_assert to check that AtomicArray type is supported. - Add documentation for AtomicArrayExecutionObject, including a CAS example. - Add a `T Get(idx)` method to AtomicArrayExecutionObject that does an atomic load, and update existing CAS usage to use this instead of `Add(idx, 0)`.	2019-08-23 15:40:37 -04:00
Allison Vacanti	0e728c8000	Update atomic interfaces to support Add/CAS for UInt32/64. These will be used for the AtomicArray implementation.	2019-08-23 15:40:37 -04:00
Allison Vacanti	112024dae2	Fix CUDA shfl usage. There was a bug in the implementations of CountSetBits and BitFieldToUnorderedSet.	2019-08-01 10:57:57 -04:00
Kenneth Moreland	5e23853521	Create ArrayHandleMultiplexer	2019-07-22 08:36:28 -06:00
Mark Kim	8dbb1c4de3	Merge branch 'master' of gitlab.kitware.com:m-kim/vtk-m into advdatamodel	2019-06-26 19:37:47 -04:00
Allison Vacanti	920ef9b3b9	Merge topic 'bit_algorithms' f370857c1 Add CountSetBits and Fill device algorithms. Acked-by: Kitware Robot <kwrobot@kitware.com> Acked-by: Robert Maynard <robert.maynard@kitware.com> Merge-request: !1696	2019-06-25 15:42:16 -04:00
Allison Vacanti	f370857c15	Add CountSetBits and Fill device algorithms.	2019-06-25 11:30:39 -04:00
Mark Kim	699b57191f	Merge branch 'master' of gitlab.kitware.com:m-kim/vtk-m into advdatamodel	2019-06-25 10:36:47 -04:00
Mark Kim	cffd3873fc	Merge branch 'advdatamodel'	2019-06-20 22:20:44 -04:00
Mark Kim	6e1d3a84f0	First Extrude commit. how did any of this work? match other CellSet file layouts. ??? compile in CUDA. unit tests. also only serial. make error message accurate Well, this compiles and works now. Did it ever? use CellShapeTagGeneric UnitTest matches previous changes. whoops Fix linking problems. Need the same interface as other ThreadIndices. add filter test okay, let's try duplicating CellSetStructure. okay inching... change to wedge in CellSetListTag Means changing these to support it. switch back to wedge from generic compiles and runs remove ExtrudedType need vtkm_worklet vtkm_worklet needs to be included fix segment count for wedge specialization need to actually save the index for the other constructor. specialize on Explicit clean up warning angled brackets not quotes. formatting	2019-06-20 22:17:24 -04:00
Robert Maynard	1ea386222e	CUDA copy functions don't launch on zero-length arrays	2019-06-20 16:54:23 -04:00
Robert Maynard	8aaf922aa4	Introduce a log level that details kernel launch parameters	2019-06-18 15:01:07 -04:00
Kenneth Moreland	f11702ae92	Fix for rogue definition of PASCAL macro	2019-06-05 10:09:49 -06:00
Robert Maynard	4020f51988	RuntimeDeviceTracker can't be copied and is only accessible via reference. As the RuntimeDeviceTracker is a per thread construct we now make it explicit that you can only get a reference to the per-thread version and can't copy it.	2019-05-20 11:43:05 -04:00
Robert Maynard	d1ce4a0bca	Fix the default launch sizes for Tesla hardware. The 8x8x8 launch is a better strategy for most VTK-m kernels. The current problem is that a couple of VTK-m kernels use a high number of registers, and combined with this number of threads they require too many registers. What we should do in the longer run is have more control over kernel launches on a per-kernel basis. This will require VTK-m to extract the number of registers being used by each kernel.	2019-05-06 16:12:15 -04:00
Robert Maynard	770912f991	Correct compiler issues found with GCC 4.8.5 + CUDA 9.2 on summit	2019-05-02 10:27:48 -04:00
Robert Maynard	065d117838	Testing Device Adapter now uses ArrayHandle for all device transfers The consistent API for control to execution memory transfers is the ArrayHandle class. Previously the tests would verify memory transfer by calling the ArrayManagerExecution class directly. This is problematic as the class isn't used by ArrayHandle<T, StorageBasic>.	2019-04-30 13:50:08 -04:00
Robert Maynard	63c931e639	Correct location of ThrustPatches which clang formatter moved	2019-04-23 15:02:58 -04:00
Robert Maynard	ff687016ee	For VTK-m libs all includes of DeviceAdapterTagCuda happen from cuda files It is very easy to cause ODR violations with DeviceAdapterTagCuda. If you include that header from a C++ file and a CUDA file inside the same program, we get an ODR violation. The reason is that the C++ version will say the tag is invalid, and the CUDA version will say the tag is valid. The solution to this is that any compilation unit that includes DeviceAdapterTagCuda from a version of VTK-m that has CUDA enabled must be compiled by the CUDA compiler.	2019-04-22 10:39:54 -04:00
nadavi	fbcea82e78	consolidate the license statement	2019-04-17 10:57:13 -06:00
Robert Maynard	6c5c197a37	Merge topic 'support_cuda_scheduling_parameters_via_runtime' 047b64651 VTK-m now provides better scheduling parameters controls Acked-by: Kitware Robot <kwrobot@kitware.com> Acked-by: Kenneth Moreland <kmorel@sandia.gov> Merge-request: !1643	2019-04-17 10:04:19 -04:00
Robert Maynard	047b646517	VTK-m now provides better scheduling parameters controls VTK-m now offers a more GPU-aware set of defaults for kernel scheduling. When VTK-m first launches a kernel, we do system introspection to determine which GPUs are on the machine and then match this information to a preset table of values. The implementation is designed in a way that allows VTK-m to offer both specific presets for a given GPU (V100) and presets for an entire generation of cards (Pascal). Currently VTK-m offers preset tables for the following GPUs: - Tesla V100 - Tesla P100 If the hardware doesn't match a specific GPU card, we then try to find the nearest known hardware generation and use those defaults. Currently we offer defaults for: - Older than Pascal Hardware - Pascal Hardware - Volta+ Hardware Some users have workloads that don't align with the defaults provided by VTK-m. When that is the case, it is possible to override the defaults by binding a custom function to `vtkm::cont::cuda::InitScheduleParameters`, as shown below: ```cpp ScheduleParameters CustomScheduleValues(char const* name, int major, int minor, int multiProcessorCount, int maxThreadsPerMultiProcessor, int maxThreadsPerBlock) { ScheduleParameters params { 64 * multiProcessorCount, /* 1d blocks */ 64, /* 1d threads per block */ 64 * multiProcessorCount, /* 2d blocks */ { 8, 8, 1 }, /* 2d threads per block */ 64 * multiProcessorCount, /* 3d blocks */ { 4, 4, 4 } /* 3d threads per block */ }; return params; } vtkm::cont::cuda::InitScheduleParameters(&CustomScheduleValues); ```	2019-04-17 08:32:16 -04:00
Robert Maynard	ff30684c8e	Removes the default device macros from VTK-m Fixes #116	2019-04-15 08:15:36 -04:00
Robert Maynard	a5dbe1ece3	Merge topic 'bitfields' 661fb64de AtomicInterfaceControl functions are marked with VTKM_SUPPRESS_EXEC_WARNINGS 0c70f9b9a Add BitFieldIn/Out/InOut worklet signature tags. a66510e81 Add ArrayHandleBitField, a boolean-valued AH backed by a BitField. 56cc5c3d3 Add support for BitFields. d01b97382 Allow VTKM_SUPPRESS_EXEC_WARNINGS to be used inside macros. 2f2ca9370 Add bit operations FindFirstSetBit and CountSetBits to Math.h. Acked-by: Kitware Robot <kwrobot@kitware.com> Merge-request: !1629	2019-04-11 12:32:03 -04:00
Allison Vacanti	56cc5c3d3a	Add support for BitFields. BitFields are: - Stored in memory using a contiguous buffer of bits. - Accessible via portals, a la ArrayHandle. - Portals operate on individual bits or words. - Operations may be atomic for safe use from concurrent kernels. The new BitFieldToUnorderedSet device algorithm produces an ArrayHandle containing the indices of all set bits, in no particular order. The new AtomicInterface classes provide an abstraction into bitwise atomic operations across control and execution environments and are used to implement the BitPortals.	2019-04-11 08:27:17 -04:00
Robert Maynard	89ec4aae2f	Reduction on CUDA handles different input and output types better When reducing an input type that differs from the output type you need to write a custom binary operator that also implements how to do the unary transformation.	2019-04-10 14:44:44 -04:00
Robert Maynard	1d20ae4f7b	Move DeviceAdapterTag to vtkm/cont	2019-04-04 11:58:51 -04:00
Robert Maynard	7f612502ac	Merge topic 'remove_unneeded_cont_exec_markup' f1056affa Move select functions to host only to remove host/device suppressions 4f2156dfa Thrust detail::aligned_reinterpret_cast doesn't warn now f4840618c Make sure ThrustPatches is included before thrust. b2bbd66e6 Merge branch 'upstream-taotuple' into update_taoo 4ec6fc812 taotuple 2019-04-03 (8e70fa8a) Acked-by: Kitware Robot <kwrobot@kitware.com> Acked-by: Allison Vacanti <allison.vacanti@kitware.com> Merge-request: !1607	2019-04-04 09:32:04 -04:00
Sujin Philip	c6bead8388	Rename CellLocatorTwoLevelUniformGrid to CellLocatorUniformBins Also make it a concrete sub-class of vtkm::cont::CellLocator Fixes issue #251	2019-04-03 10:21:56 -04:00
Robert Maynard	f4840618cf	Make sure ThrustPatches is included before thrust.	2019-04-03 08:51:05 -04:00


361 Commits
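Commit 18b5be92d6 works around the nvcc/Thrust problem by giving `Reduce` a special condition for `ArrayHandleMultiplexer`. A minimal sketch of that kind of routing, assuming plain overload resolution is enough to express the "special condition": `BasicArray`, `MultiplexerArray`, `ThrustReduce`, `GeneralReduce`, and `lastBackend` are hypothetical stand-ins, not VTK-m's actual classes or signatures, and `std::accumulate` merely plays the role of the Thrust call.

```cpp
#include <cassert>
#include <numeric>
#include <string>
#include <vector>

// Hypothetical stand-ins: a basic array and a multiplexer over several array types.
struct BasicArray { std::vector<int> values; };
template <typename... ArrayTypes>
struct MultiplexerArray { std::vector<int> values; };

// Records which path handled the most recent call (illustration only).
std::string lastBackend;

// Stand-in for the Thrust-based reduction (std::accumulate plays thrust::reduce here).
template <typename Array>
int ThrustReduce(const Array& array, int initial) {
  lastBackend = "thrust";
  return std::accumulate(array.values.begin(), array.values.end(), initial);
}

// Stand-in for the generic algorithm from DeviceAdapterAlgorithmGeneral.
template <typename Array>
int GeneralReduce(const Array& array, int initial) {
  lastBackend = "general";
  int result = initial;
  for (int value : array.values) { result += value; }
  return result;
}

// Default: every array type goes through the Thrust path.
template <typename Array>
int Reduce(const Array& array, int initial) {
  return ThrustReduce(array, initial);
}

// Special condition: partial ordering prefers this more specialized overload
// for any multiplexer, which routes it around Thrust.
template <typename... ArrayTypes>
int Reduce(const MultiplexerArray<ArrayTypes...>& array, int initial) {
  return GeneralReduce(array, initial);
}
```

Because the multiplexer overload is more specialized, callers keep writing `Reduce(array, init)` and never need to know that one array type takes a different path.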
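The pinning rule in a47fd42bc1 ("when a user provides memory, pin that memory so that the `ArrayHandle` never deletes it") can be illustrated with a small host-side buffer. This is a hedged sketch only: `HostBuffer`, `InvalidateHostCopy`, and `WriteBack` are hypothetical names, and the real `Buffer` class manages devices through `DeviceAdapterMemoryManager` rather than a `std::vector`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical host-side buffer illustrating "pinned" user memory.
class HostBuffer {
public:
  // Memory allocated and owned by the buffer: free to release.
  explicit HostBuffer(std::size_t size)
    : owned_(size), data_(owned_.data()), size_(size), pinned_(false) {}
  // Memory provided by the user: pinned, so it is never released.
  HostBuffer(double* userData, std::size_t size)
    : data_(userData), size_(size), pinned_(true) {}

  // data_ may point into owned_, so copying is disabled.
  HostBuffer(const HostBuffer&) = delete;
  HostBuffer& operator=(const HostBuffer&) = delete;

  // Called when another location (e.g. a device) becomes the only valid copy.
  void InvalidateHostCopy() {
    if (!pinned_) {
      owned_.clear();
      owned_.shrink_to_fit();
      data_ = nullptr;
    }
    // Pinned memory is kept so results can later land in the user's array.
  }

  // Copy results back to the host, reallocating only if memory was released.
  void WriteBack(const std::vector<double>& results) {
    if (data_ == nullptr) {
      owned_.resize(size_);
      data_ = owned_.data();
    }
    for (std::size_t i = 0; i < size_ && i < results.size(); ++i) {
      data_[i] = results[i];
    }
  }

  double* Data() const { return data_; }
  bool IsPinned() const { return pinned_; }

private:
  std::vector<double> owned_;
  double* data_;
  std::size_t size_;
  bool pinned_;
};
```

With a pinned buffer, results written back after device execution appear directly in the array the user handed over, which is the behavior the commit describes.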
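Commit 813f5a422f replaces specialization with detection: a portal that defines `IteratorType`, `GetIteratorBegin()`, and `GetIteratorEnd()` gets its own iterators, and other portals get a generic one. A sketch of that detection in C++14, assuming SFINAE on member presence; `HasCustomIterators`, `PortalIterators`, `PortalIndexIterator`, `PlainPortal`, `PointerPortal`, and `SumPortal` are hypothetical names, and the `Get`/`GetNumberOfValues` portal methods are modeled only loosely on VTK-m portals.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>
#include <utility>
#include <vector>

// C++11-compatible void_t (the struct form avoids the CWG 1558 pitfall).
template <typename...> struct MakeVoid { using type = void; };
template <typename... Ts> using VoidT = typename MakeVoid<Ts...>::type;

// True when a portal advertises IteratorType, GetIteratorBegin(), and GetIteratorEnd().
template <typename Portal, typename = void>
struct HasCustomIterators : std::false_type {};
template <typename Portal>
struct HasCustomIterators<
  Portal,
  VoidT<typename Portal::IteratorType,
        decltype(std::declval<const Portal&>().GetIteratorBegin()),
        decltype(std::declval<const Portal&>().GetIteratorEnd())>> : std::true_type {};

// Fallback iterator that walks a portal by index through Get().
template <typename Portal>
struct PortalIndexIterator {
  const Portal* portal;
  long index;
  auto operator*() const { return portal->Get(index); }
  PortalIndexIterator& operator++() { ++index; return *this; }
  bool operator!=(const PortalIndexIterator& other) const { return index != other.index; }
};

// Generic case: no custom iterators advertised.
template <typename Portal, bool Custom = HasCustomIterators<Portal>::value>
struct PortalIterators {
  using IteratorType = PortalIndexIterator<Portal>;
  explicit PortalIterators(const Portal& p) : portal(&p) {}
  IteratorType Begin() const { return {portal, 0}; }
  IteratorType End() const { return {portal, portal->GetNumberOfValues()}; }
  const Portal* portal;
};

// Detected case: use the portal's own iterators.
template <typename Portal>
struct PortalIterators<Portal, true> {
  using IteratorType = typename Portal::IteratorType;
  explicit PortalIterators(const Portal& p) : portal(&p) {}
  IteratorType Begin() const { return portal->GetIteratorBegin(); }
  IteratorType End() const { return portal->GetIteratorEnd(); }
  const Portal* portal;
};

// Portal with only Get() and GetNumberOfValues().
struct PlainPortal {
  std::vector<int>* data;
  int Get(long i) const { return (*data)[static_cast<std::size_t>(i)]; }
  long GetNumberOfValues() const { return static_cast<long>(data->size()); }
};

// Portal that advertises raw-pointer iterators.
struct PointerPortal {
  using IteratorType = const int*;
  const int* begin;
  const int* end;
  IteratorType GetIteratorBegin() const { return begin; }
  IteratorType GetIteratorEnd() const { return end; }
};

// Works for either kind of portal without any specialization being visible here.
template <typename Portal>
int SumPortal(const Portal& portal) {
  PortalIterators<Portal> iterators(portal);
  int sum = 0;
  for (auto it = iterators.Begin(); it != iterators.End(); ++it) { sum += *it; }
  return sum;
}
```

The design point is that `SumPortal` only depends on the primary template; the choice of iterator is made by a trait evaluated at the point of use, so no specialization has to be declared before the convenience function.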
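Commits ea0bbfeefc and b9affb7edc describe an RAII helper that raises the CUDA stack size and restores the previous value even if an exception is thrown, with copying disabled. The sketch below shows the pattern under the assumption that the limit can be read and written through two functions; `GetStackLimit`, `SetStackLimit`, `ScopedStackSize`, and `RunAdvection` are hypothetical, and the global variable stands in for the device setting so the example runs without a GPU. Real CUDA code would use `cudaDeviceGetLimit` and `cudaDeviceSetLimit` with `cudaLimitStackSize`.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>

// Stand-in for the device stack limit. Real CUDA code would instead call
// cudaDeviceGetLimit(&size, cudaLimitStackSize) and
// cudaDeviceSetLimit(cudaLimitStackSize, size).
static std::size_t g_stackLimit = 1024;
std::size_t GetStackLimit() { return g_stackLimit; }
void SetStackLimit(std::size_t bytes) { g_stackLimit = bytes; }

// RAII helper: raise the limit for a scope and restore it on exit,
// including when an exception propagates out of the scope.
class ScopedStackSize {
public:
  explicit ScopedStackSize(std::size_t bytes) : previous_(GetStackLimit()) {
    SetStackLimit(bytes);
  }
  ~ScopedStackSize() { SetStackLimit(previous_); }

  // A copy's destructor would restore the old limit a second time, so copying is disabled.
  ScopedStackSize(const ScopedStackSize&) = delete;
  ScopedStackSize& operator=(const ScopedStackSize&) = delete;

private:
  std::size_t previous_;
};

// Hypothetical worklet Run() that needs a larger stack and may throw.
void RunAdvection(bool fail) {
  ScopedStackSize guard(16 * 1024);
  if (fail) { throw std::runtime_error("advection failed"); }
}
```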
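Commit 884616788a adds a `Get(idx)` atomic load to `AtomicArrayExecutionObject` and updates compare-and-swap (CAS) code to use it instead of `Add(idx, 0)`. A sketch of such a CAS loop, built on `std::atomic` rather than VTK-m's `AtomicInterface`: `AtomicIntArray` and `AtomicMax` are hypothetical, and while the method names echo the commit message, these signatures are illustrative only.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical atomic array; method names echo the commit message,
// but the signatures here are illustrative only.
class AtomicIntArray {
public:
  explicit AtomicIntArray(std::size_t size) : values_(size) {
    for (auto& value : values_) { value.store(0); }
  }
  // Atomic load (replaces the older Add(idx, 0) idiom).
  int Get(std::size_t index) const { return values_[index].load(); }
  // Atomic add; returns the value before the addition.
  int Add(std::size_t index, int delta) { return values_[index].fetch_add(delta); }
  // Stores newValue only if the slot still holds expected.
  // Returns the value observed before the attempt (equal to expected on success).
  int CompareAndSwap(std::size_t index, int newValue, int expected) {
    values_[index].compare_exchange_strong(expected, newValue);
    return expected;
  }

private:
  std::vector<std::atomic<int>> values_;
};

// CAS loop: raise slot `index` to at least `value`, safe under concurrent callers.
void AtomicMax(AtomicIntArray& array, std::size_t index, int value) {
  int current = array.Get(index);
  while (current < value) {
    int seen = array.CompareAndSwap(index, value, current);
    if (seen == current) { break; }  // swap succeeded
    current = seen;                  // another thread changed the slot; retry
  }
}
```

Reading the starting value with a plain atomic load makes the intent clear and avoids issuing a read-modify-write just to observe the slot.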
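Commits 56cc5c3d3a and f370857c15 describe BitFields stored as a contiguous buffer of bits, a `CountSetBits` algorithm, and `BitFieldToUnorderedSet`, which returns the indices of all set bits in no particular order. The serial sketch below assumes 32-bit words and a bit count that may end partway through the last word; the free functions `CountBitsInWord`, `CountSetBits`, and `BitFieldToSet` are hypothetical and not VTK-m's device algorithms. A parallel version would process words concurrently, which is why the real output order is unspecified; this serial one happens to produce ascending indices.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Count set bits in one word; each iteration clears the lowest set bit.
int CountBitsInWord(std::uint32_t word) {
  int count = 0;
  while (word != 0) {
    word &= word - 1;
    ++count;
  }
  return count;
}

// Count set bits among the first numBits bits; bits past numBits are ignored.
int CountSetBits(const std::vector<std::uint32_t>& words, std::size_t numBits) {
  int total = 0;
  for (std::size_t w = 0; w < words.size(); ++w) {
    std::size_t firstBit = w * 32;
    if (firstBit >= numBits) { break; }
    std::uint32_t word = words[w];
    std::size_t remaining = numBits - firstBit;
    if (remaining < 32) { word &= (std::uint32_t{1} << remaining) - 1; }
    total += CountBitsInWord(word);
  }
  return total;
}

// Collect the index of every set bit among the first numBits bits.
std::vector<std::size_t> BitFieldToSet(const std::vector<std::uint32_t>& words,
                                       std::size_t numBits) {
  std::vector<std::size_t> indices;
  for (std::size_t w = 0; w < words.size(); ++w) {
    std::uint32_t word = words[w];
    while (word != 0) {
      int bit = 0;  // position of the lowest set bit
      while (((word >> bit) & 1u) == 0) { ++bit; }
      std::size_t index = w * 32 + static_cast<std::size_t>(bit);
      if (index < numBits) { indices.push_back(index); }
      word &= word - 1;  // clear the lowest set bit
    }
  }
  return indices;
}
```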
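Commit 89ec4aae2f notes that when the reduction's input and output types differ, the binary operator must also implement the unary transformation. A sketch of why, assuming a min/max reduction of `int` values into a `Range` result: a parallel reduction can combine raw inputs, partial results, or one of each, so the functor needs an overload for every pairing plus the unary input-to-output conversion. `Range`, `MinMaxOp`, and `ReduceMinMax` are hypothetical, and `ReduceMinMax` only imitates a two-level reduction serially.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Output type: running minimum and maximum.
struct Range {
  int min;
  int max;
};

// Binary operator for an input type (int) that differs from the output type (Range).
struct MinMaxOp {
  // Unary transformation: one input becomes a partial result.
  Range operator()(int v) const { return Range{v, v}; }
  // Every mix of raw inputs and partial results is handled.
  Range operator()(int a, int b) const { return Range{a < b ? a : b, a < b ? b : a}; }
  Range operator()(const Range& r, int v) const {
    return Range{v < r.min ? v : r.min, v > r.max ? v : r.max};
  }
  Range operator()(int v, const Range& r) const { return (*this)(r, v); }
  Range operator()(const Range& a, const Range& b) const {
    return Range{b.min < a.min ? b.min : a.min, b.max > a.max ? b.max : a.max};
  }
};

// Serial imitation of a two-level reduction: pairs of inputs become partial
// results, which are then folded together. Assumes a non-empty input.
Range ReduceMinMax(const std::vector<int>& input) {
  MinMaxOp op;
  Range result = op(input[0]);
  for (std::size_t i = 1; i + 1 < input.size(); i += 2) {
    result = op(result, op(input[i], input[i + 1]));
  }
  if (input.size() % 2 == 0) { result = op(result, input.back()); }
  return result;
}
```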
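Commit 44c4f0838f adds a `vtkm/Algorithms.h` header with device-friendly binary search algorithms. The sketch below shows what "device-friendly" typically means in practice: iterative, no recursion, no allocation, no exceptions, and only `operator<` on the element type. `LowerBoundIndex` and `BinarySearchIndex` are hypothetical names and signatures, not the functions that header actually declares.

```cpp
#include <cassert>
#include <cstdint>

// First index in [0, n) whose value is not less than key (n if none).
// Iterative and allocation-free, so it could run inside a GPU kernel.
template <typename T>
std::int64_t LowerBoundIndex(const T* data, std::int64_t n, const T& key) {
  std::int64_t first = 0;
  std::int64_t count = n;
  while (count > 0) {
    std::int64_t step = count / 2;
    std::int64_t mid = first + step;
    if (data[mid] < key) {
      first = mid + 1;       // answer lies to the right of mid
      count -= step + 1;
    } else {
      count = step;          // answer is mid or to its left
    }
  }
  return first;
}

// Index of key in sorted data, or -1 if it is absent.
template <typename T>
std::int64_t BinarySearchIndex(const T* data, std::int64_t n, const T& key) {
  std::int64_t i = LowerBoundIndex(data, n, key);
  return (i < n && !(key < data[i])) ? i : -1;
}
```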
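Commit 4020f51988 makes the `RuntimeDeviceTracker` a per-thread object that can only be reached by reference and cannot be copied. A hedged sketch of that shape using `thread_local`: `DeviceTracker`, `GetDeviceTracker`, and the CUDA on/off flag are hypothetical simplifications, not VTK-m's actual tracker interface. Each thread that calls `GetDeviceTracker()` lazily gets its own instance, and the deleted copy operations stop callers from accidentally modifying a detached copy.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical per-thread device tracker: reachable only by reference.
class DeviceTracker {
public:
  DeviceTracker(const DeviceTracker&) = delete;
  DeviceTracker& operator=(const DeviceTracker&) = delete;

  void SetCudaEnabled(bool enabled) { cudaEnabled_ = enabled; }
  bool IsCudaEnabled() const { return cudaEnabled_; }

private:
  DeviceTracker() = default;  // only GetDeviceTracker() may construct one
  friend DeviceTracker& GetDeviceTracker();
  bool cudaEnabled_ = true;
};

// Each thread lazily constructs its own tracker on first use.
DeviceTracker& GetDeviceTracker() {
  thread_local DeviceTracker tracker;
  return tracker;
}
```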