Commit Graph

357 Commits

Author SHA1 Message Date
Kenneth Moreland
4f9fa08fa1 Remove ArrayHandleStreaming capabilities
The `ArrayHandleStreaming` class stems from an old research project
experimenting with bringing data from an `ArrayHandle` in parts and
overlapping device transfer and execution. It works, but only in very
limited contexts. Thus, it is not actually used today. Plus, the feature
requires global indexing to be permutated throughout the worklet
dispatching classes of VTK-m for no further reason.

Because it is not really used, there are other more promising approaches
on the horizon, and it makes further scheduling improvements difficult,
we are removing this functionality.
2020-03-24 15:01:56 -06:00
Robert Maynard
8377806778 Merge topic 'introduce_mapfield_3d_scheduling'
1f1688483 Initial infrastructure to allow WorkletMapField to have 3D scheduling

Acked-by: Kitware Robot <kwrobot@kitware.com>
Acked-by: Kenneth Moreland <kmorel@sandia.gov>
Merge-request: !1938
2020-02-27 08:02:52 -05:00
Kenneth Moreland
ec34cb56c4 Use new ways to get array portal in control environment
Also fix deadlocks that occur when portals are not destroyed
in time.
2020-02-26 13:10:46 -07:00
Robert Maynard
1f1688483e Initial infrastructure to allow WorkletMapField to have 3D scheduling 2020-02-25 15:23:41 -05:00
Kenneth Moreland
3671cbe168 Fix token issues with CUDA 2020-02-25 09:39:30 -07:00
Kenneth Moreland
ad0a53af71 Convert execution preparation to use tokens
Marked the old versions of PrepareFor* that do not use tokens as
deprecated and moved all of the code to use the new versions that
require a token. This makes the scope of the execution object more
explicit so that it will be kept while in use and can potentially be
reclaimed afterward.
2020-02-25 09:39:19 -07:00
Kenneth Moreland
76ce9c87f0 Support using Token calling PrepareForExecution in ExecutionObject
The old version of ExecutionObject (that only takes a device) is still
supported, but you will get a deprecated warning if that is what is
defined.

Supporing this also included sending vtkm::cont::Token through the
vtkm::cont::arg::Transport mechanism, which was a change that propogated
through a lot of code.
2020-02-25 07:41:39 -07:00
Allison Vacanti
46b7155bdb Add 64-bit CUDA atomic store. 2020-01-08 10:58:51 -05:00
Allison Vacanti
539f6e5ad7 Port benchmarking framework to Google Benchmark. 2020-01-08 10:58:51 -05:00
Allison Vacanti
44c4f0838f Add vtkm/Algorithms.h header with device-friendly binary search algorithms. 2019-12-20 12:35:10 -05:00
Allison Vacanti
813f5a422f Fixup custom portal iterator logic.
The convenience functions `ArrayPortalToIteratorBegin()` and
`ArrayPortalToIteratorEnd()` wouldn't detect specializations of
`ArrayPortalToIterators<PortalType>` since the specializations aren't
visible when the `Begin`/`End` functions are declared.

Since the CUDA iterators rely on a specialization, the convenience
functions would not compile on CUDA.

Now, instead of specializing `ArrayPortalToIterators` to provide custom
iterators for a particular portal, the portal may advertise custom
iterators by defining `IteratorType`, `GetIteratorBegin()`, and
`GetIteratorEnd()`. `ArrayPortalToIterators` will detect such portals
and automatically switch to using the specialized portals.

This eliminates the need for the specializations to be visible to the
convenience functions and allows them to be usable on CUDA.
2019-12-17 15:39:51 -05:00
Kenneth Moreland
92db376236 Convert uses of ListTagBase to List 2019-12-06 15:37:46 -07:00
Kenneth Moreland
5ab0b5bb1d Access ArrayHandle internals in a critical section
Repeat the changes of the previous commit with the specialized
ArrayHandle for basic storage.
2019-11-20 14:42:58 -07:00
Robert Maynard
5c56ff945f Label tests which exercise a given Device Adapter
This allows developers an easy way to run all OpenMP tests
2019-09-13 15:52:40 -04:00
Allison Vacanti
b9affb7edc Disable copy for RAII helper. 2019-09-09 17:59:38 -04:00
Allison Vacanti
ea0bbfeefc Increase CUDA stack size for ParticleAdvection worklets.
Sometimes the CUDA runtime would not allocate sufficient stack
space for the particle advection code to run. This issue was exposed by
!1737 -- for some reason, once those changes to unrelated filters/worklets
are added to VTK, CUDA allocates less stack and the following tests would
fail:

UnitTestLagrangianFilterCUDA
UnitTestLagrangianStructuresFilterCUDA
UnitTestStreamlineFilterCUDA
UnitTestStreamSurfaceFilterCUDA

These were fixed by increasing the stack size in the particle advection
worklet Run(...) methods.

An RAII helper has been added that will restore the previous stack size
in case an exception is thrown, and the KDTree code has been updated
to use this helper when it adjusts the CUDA stack allocation.
2019-09-09 16:06:23 -04:00
Allison Vacanti
884616788a Simplify and extend AtomicArray implementation.
- Use AtomicInterface to implement device-specific atomic operations.
- Remove DeviceAdapterAtomicArrayImplementations.
- Extend supported atomic types to include unsigned 32/64-bit ints.
- Add a static_assert to check that AtomicArray type is supported.
- Add documentation for AtomicArrayExecutionObject, including a CAS
  example.
- Add a `T Get(idx)` method to AtomicArrayExecutionObject that does
  an atomic load, and update existing CAS usage to use this instead
  of `Add(idx, 0)`.
2019-08-23 15:40:37 -04:00
Allison Vacanti
0e728c8000 Update atomic interfaces to support Add/CAS for UInt32/64.
These will be used for the AtomicArray implementation.
2019-08-23 15:40:37 -04:00
Allison Vacanti
112024dae2 Fix CUDA shfl usage.
There was a bug in the implementations of CountSetBits and
BitFieldToUnorderedSet.
2019-08-01 10:57:57 -04:00
Kenneth Moreland
5e23853521 Create ArrayHandleMultiplexer 2019-07-22 08:36:28 -06:00
Mark Kim
8dbb1c4de3 Merge branch 'master' of gitlab.kitware.com:m-kim/vtk-m into advdatamodel 2019-06-26 19:37:47 -04:00
Allison Vacanti
920ef9b3b9 Merge topic 'bit_algorithms'
f370857c1 Add CountSetBits and Fill device algorithms.

Acked-by: Kitware Robot <kwrobot@kitware.com>
Acked-by: Robert Maynard <robert.maynard@kitware.com>
Merge-request: !1696
2019-06-25 15:42:16 -04:00
Allison Vacanti
f370857c15 Add CountSetBits and Fill device algorithms. 2019-06-25 11:30:39 -04:00
Mark Kim
699b57191f Merge branch 'master' of gitlab.kitware.com:m-kim/vtk-m into advdatamodel 2019-06-25 10:36:47 -04:00
Mark Kim
cffd3873fc Merge branch 'advdatamodel' 2019-06-20 22:20:44 -04:00
Mark Kim
6e1d3a84f0 First Extrude commit.
how did any of this work?

match other CellSet file layouts.

???

compile in CUDA.

unit tests.

also only serial.

make error message accurate

Well, this compiles and works now.

Did it ever?

use CellShapeTagGeneric

UnitTest matches previous changes.

whoops

Fix linking problems.

Need the same interface

as other ThreadIndices.

add filter test

okay, let's try duplicating CellSetStructure.

okay

inching...

change to wedge in CellSetListTag

Means changing these to support it.

switch back to wedge from generic

compiles and runs

remove ExtrudedType

need vtkm_worklet

vtkm_worklet needs to be included

fix segment count for wedge specialization

need to actually save the index

for the other constructor.

specialize on Explicit

clean up warning

angled brackets not quotes.

formatting
2019-06-20 22:17:24 -04:00
Robert Maynard
1ea386222e cuda copy functions don't launch on length zero arrays 2019-06-20 16:54:23 -04:00
Robert Maynard
8aaf922aa4 Introduce a log level that details kernel launch parameters 2019-06-18 15:01:07 -04:00
Kenneth Moreland
f11702ae92 Fix for rogue definition of PASCAL macro 2019-06-05 10:09:49 -06:00
Robert Maynard
4020f51988 RuntimeDeviceTracker can't be copied and is only accessible via reference.
As the RuntimeDeviceTracker is a per thread construct we now make
it explicit that you can only get a reference to the per-thread
version and can't copy it.
2019-05-20 11:43:05 -04:00
Robert Maynard
d1ce4a0bca Fix the default launch sizes for Tesla hardware.
The 8x8x8 is a better launch strategy for most VTK-m kernels.
The current problem is that a couple of VTK-m kernels use a
high number of registers and this number of threads combines to
require too many registers.

What we should do in the longer run is have more controls over
kernel launches on a per kernel basis. This will require VTK-m
to extract the number of registers being used by each kernel
2019-05-06 16:12:15 -04:00
Robert Maynard
770912f991 Correct compiler issues found with GCC 4.8.5 + CUDA 9.2 on summit 2019-05-02 10:27:48 -04:00
Robert Maynard
065d117838 Testing Device Adapter now uses ArrayHandle for all device transfers
The consistent API for control to execution memory transfers is
the ArrayHandle class. Previously the tests would verify memory
transfer by calling the ArrayManagerExecution class directly. This
is problematic as the class isn't used by ArrayHandle<T, StorageBasic>.
2019-04-30 13:50:08 -04:00
Robert Maynard
63c931e639 Correct location of ThrustPatches which clang formatter moved 2019-04-23 15:02:58 -04:00
Robert Maynard
ff687016ee For VTK-m libs all includes of DeviceAdapterTagCuda happen from cuda files
It is very easy to cause ODR violations with DeviceAdapterTagCuda.
If you include that header from a C++ file and a CUDA file inside
the same program we an ODR violation. The reasons is that the C++
versions will say the tag is invalid, and the CUDA will say the
tag is valid.

The solution to this is that any compilation unit that includes
DeviceAdapterTagCuda from a version of VTK-m that has CUDA enabled
must be invoked by the cuda compiler.
2019-04-22 10:39:54 -04:00
nadavi
fbcea82e78 conslidate the license statement 2019-04-17 10:57:13 -06:00
Robert Maynard
6c5c197a37 Merge topic 'support_cuda_scheduling_parameters_via_runtime'
047b64651 VTK-m now provides better scheduling parameters controls

Acked-by: Kitware Robot <kwrobot@kitware.com>
Acked-by: Kenneth Moreland <kmorel@sandia.gov>
Merge-request: !1643
2019-04-17 10:04:19 -04:00
Robert Maynard
047b646517 VTK-m now provides better scheduling parameters controls
VTK-m now offers a more GPU aware set of defaults for kernel scheduling.
When VTK-m first launches a kernel we do system introspection and determine
what GPU's are on the machine and than match this information to a preset
table of values. The implementation is designed in a way that allows for
VTK-m to offer both specific presets for a given GPU ( V100 ) or for
an entire generation of cards ( Pascal ).

Currently VTK-m offers preset tables for the following GPU's:
- Tesla V100
- Tesla P100

If the hardware doesn't match a specific GPU card we than try to find the
nearest know hardware generation and use those defaults. Currently we offer
defaults for
- Older than Pascal Hardware
- Pascal Hardware
- Volta+ Hardware

Some users have workloads that don't align with the defaults provided by
VTK-m. When that is the cause, it is possible to override the defaults
by binding a custom function to `vtkm::cont::cuda::InitScheduleParameters`.
As shown below:

```cpp
  ScheduleParameters CustomScheduleValues(char const* name,
                                          int major,
                                          int minor,
                                          int multiProcessorCount,
                                          int maxThreadsPerMultiProcessor,
                                          int maxThreadsPerBlock)
  {

    ScheduleParameters params  {
        64 * multiProcessorCount,  //1d blocks
        64,                        //1d threads per block
        64 * multiProcessorCount,  //2d blocks
        { 8, 8, 1 },               //2d threads per block
        64 * multiProcessorCount,  //3d blocks
        { 4, 4, 4 } };             //3d threads per block
    return params;
  }
  vtkm::cont::cuda::InitScheduleParameters(&CustomScheduleValues);
```
2019-04-17 08:32:16 -04:00
Robert Maynard
ff30684c8e Removes the default device macros from VTK-m
Fixes #116
2019-04-15 08:15:36 -04:00
Robert Maynard
a5dbe1ece3 Merge topic 'bitfields'
661fb64de AtomicInterfaceControl functions are marked with VTKM_SUPPRESS_EXEC_WARNINGS
0c70f9b9a Add BitFieldIn/Out/InOut worklet signature tags.
a66510e81 Add ArrayHandleBitField, a boolean-valued AH backed by a BitField.
56cc5c3d3 Add support for BitFields.
d01b97382 Allow VTKM_SUPPRESS_EXEC_WARNINGS to be used inside macros.
2f2ca9370 Add bit operations FindFirstSetBit and CountSetBits to Math.h.

Acked-by: Kitware Robot <kwrobot@kitware.com>
Merge-request: !1629
2019-04-11 12:32:03 -04:00
Allison Vacanti
56cc5c3d3a Add support for BitFields.
BitFields are:
- Stored in memory using a contiguous buffer of bits.
- Accessible via portals, a la ArrayHandle.
- Portals operate on individual bits or words.
- Operations may be atomic for safe use from concurrent kernels.

The new BitFieldToUnorderedSet device algorithm produces an ArrayHandle
containing the indices of all set bits, in no particular order.

The new AtomicInterface classes provide an abstraction into bitwise
atomic operations across control and execution environments and are used
to implement the BitPortals.
2019-04-11 08:27:17 -04:00
Robert Maynard
89ec4aae2f Reduction on CUDA handles different input and output types better
When reducing an input type that differs from the output type
you need to write a custom binary operator that also implements
how to do the unary transformation.
2019-04-10 14:44:44 -04:00
Robert Maynard
1d20ae4f7b Move DeviceAdapterTag to vtkm/cont 2019-04-04 11:58:51 -04:00
Robert Maynard
7f612502ac Merge topic 'remove_unneeded_cont_exec_markup'
f1056affa Move select functions to host only to remove host/device suppressions
4f2156dfa Thrust detail::aligned_reinterpret_cast doesn't warn now
f4840618c Make sure ThrustPatches is included before thrust.
b2bbd66e6 Merge branch 'upstream-taotuple' into update_taoo
4ec6fc812 taotuple 2019-04-03 (8e70fa8a)

Acked-by: Kitware Robot <kwrobot@kitware.com>
Acked-by: Allison Vacanti <allison.vacanti@kitware.com>
Merge-request: !1607
2019-04-04 09:32:04 -04:00
Sujin Philip
c6bead8388 Rename CellLocatorTwoLevelUniformGrid to CellLocatorUniformBins
Also make it a concrete sub-class of vtkm::cont::CellLocator
Fixes issue #251
2019-04-03 10:21:56 -04:00
Robert Maynard
f4840618cf Make sure ThrustPatches is included before thrust. 2019-04-03 08:51:05 -04:00
Robert Maynard
b9e0e541b8 VTK-m once again uses consistent include style 2019-03-28 14:12:08 -04:00
Robert Maynard
256e0c3c11 Merge topic 'rename_to_GetRuntimeDeviceTracker'
ae11e115a RuntimeDeviceTracker: Remove `Global` from names

Acked-by: Kitware Robot <kwrobot@kitware.com>
Acked-by: Kenneth Moreland <kmorel@sandia.gov>
Merge-request: !1592
2019-03-24 08:17:02 -04:00
Robert Maynard
ae11e115a0 RuntimeDeviceTracker: Remove Global from names 2019-03-22 08:53:26 -07:00
Robert Maynard
6cdf6cb672 Less aggressive defaults for VTK-m compared to summit.
Since we don't have per system checks currently built into
vtk-m we can't use the tuned values for Summit, as they
don't run on all our hardware.
2019-03-20 09:30:34 -07:00