Commit Graph

140 Commits

Author SHA1 Message Date
Kenneth Moreland
c44f686496 Add hints to device adapter scheduler
The `DeviceAdapter` provides an abstract interface to the accelerator
devices worklets and other algorithms run on. As such, the programmer has
less control about how the device launches each worklet. Each device
adapter has its own configuration parameters and other ways to attempt to
optimize how things are run, but these are always a universal set of
options that are applied to everything run on the device. There is no way
to specify launch parameters for a particular worklet.

To provide this information, VTK-m now supports `Hint`s to the device
adapter. The `DeviceAdapterAlgorithm::Schedule` method takes a templated
argument that is of the type `HintList`. This object contains a template
list of `Hint` types that provide suggestions on how to launch the parallel
execution. The device adapter will pick out hints that pertain to it and
adjust its launching accordingly.

These are called hints rather than, say, directives, because they don't
force the device adapter to do anything. The device adapter is free to
ignore any (and all) hints. The point is that the device adapter can take
into account the information to try to optimize for itself.

A provided hint can be tied to specific device adapters. In this way, an
worklet can further optimize itself. If multiple hints match a device
adapter, the last one in the list will be selected.

The `Worklet` base now has an internal type named `Hints` that points to a
`HintList` that is applied when the worklet is scheduled. Derived worklet
classes can provide hints by simply defining their own `Hints` type.
2024-02-09 10:42:23 -05:00
Mark Bolstad
4676d07f09 Fix error under CUDA with private class declarations
nvcc doesn't like the couple of places that we have Impl classes hiding behind a
private declaration. So for VTKM_CUDA only, ifdef the private declaration out
2023-12-05 10:03:40 -07:00
Ben Boeckel
6c00a87554 clang-tidy: fix readability-redundant-access-specifiers lints 2023-12-01 07:01:11 -05:00
Kenneth Moreland
34412ff298 Deprecate ArrayHandle::Shrink
This method has been subsumed by Allocate with vtkm::CopyFlag::On.
2021-02-01 08:07:40 -07:00
Sujin Philip
e57f5a175d Fix DeviceAdapterAlgorithmGeneral Reduce
It was using `ArrayHandleImplicit` in an unsupported manner.
2021-01-07 16:24:53 -05:00
Kenneth Moreland
4345fe26b0 Store the number of bits of a BitField in the Buffer's metadata
The number of bits in a `BitField` cannot be directly implied from the
size of the buffer (because the buffer gets padded to the nearest sized
word). Thus, the `BitField stored the number of bits in its own
internals.

Unfortunately, that caused issues when passing the `BitField` data
between it and an `ArrayHandleBitField`. If the `ArrayHandleBitField`
resized itself, the `BitField` would not see the new size because it
ignored the new buffer size.

To get around this problem, `BitField` now declares its own
`BufferMetaData` that stores the number of bits. Now, since the number
of bits is stored in the `Buffer` object, it is sufficient to just share
the `Buffer` to synchronize all of the state.
2020-08-24 17:09:30 -06:00
Kitware Robot
cf0cdcf7d1 clang-format: reformat the repository with clang-format-9 2020-08-24 14:01:08 -04:00
Sujin Philip
452f61e290 Add Kokkos backend 2020-08-12 13:55:24 -04:00
Kenneth Moreland
6dc0b394a9 Fix reduce-by-key with a fancy output array
If you gave ReduceByKey a fancy output array that decorated another
array, you could get a runtime error for using an invalid array (if the
device adapter used the generic algorithm). The problem was that
ReduceByKey creates a temporary array, and that array was given the same
storage as the output array. That might not be valid for fancy arrays,
so instead use the default storage for the temporary array.
2020-04-16 14:19:44 -06:00
Kenneth Moreland
52f157e420 Fix scan-by-key with a fancy output array
If you gave ScanInclusiveByKey a fancy output array that decorated
another array, you would get a runtime error for using an invalid array.
The problem was that ScanInclusiveByKey creates a temporary output array
and then copies the result to the actual output array. The problem was
that the temporary output array was given the same storage as the output
array, which won't work if the output array is fancy. Instead, make the
storage for the temporary array default.
2020-04-16 10:37:48 -06:00
Kenneth Moreland
4f9fa08fa1 Remove ArrayHandleStreaming capabilities
The `ArrayHandleStreaming` class stems from an old research project
experimenting with bringing data from an `ArrayHandle` in parts and
overlapping device transfer and execution. It works, but only in very
limited contexts. Thus, it is not actually used today. Plus, the feature
requires global indexing to be permutated throughout the worklet
dispatching classes of VTK-m for no further reason.

Because it is not really used, there are other more promising approaches
on the horizon, and it makes further scheduling improvements difficult,
we are removing this functionality.
2020-03-24 15:01:56 -06:00
Kenneth Moreland
ec34cb56c4 Use new ways to get array portal in control environment
Also fix deadlocks that occur when portals are not destroyed
in time.
2020-02-26 13:10:46 -07:00
Kenneth Moreland
b2fdf236e7 Fix deadlocks in device adapters and low level tests
The new Token functionality makes it easy for a thread to deadlock
itself if it does not detach a token after it is done.
2020-02-25 09:39:27 -07:00
Kenneth Moreland
ad0a53af71 Convert execution preparation to use tokens
Marked the old versions of PrepareFor* that do not use tokens as
deprecated and moved all of the code to use the new versions that
require a token. This makes the scope of the execution object more
explicit so that it will be kept while in use and can potentially be
reclaimed afterward.
2020-02-25 09:39:19 -07:00
Allison Vacanti
afe1bd12dd Add ScanExtended device algorithm.
This behaves just like `ScanExclusive`, but rather than returning the
total sum, it is appended to the end of the output array.

This is in preparation for the CellSetExplicit refactoring described in
issue #408.
2019-09-03 15:02:41 -04:00
Allison Vacanti
884616788a Simplify and extend AtomicArray implementation.
- Use AtomicInterface to implement device-specific atomic operations.
- Remove DeviceAdapterAtomicArrayImplementations.
- Extend supported atomic types to include unsigned 32/64-bit ints.
- Add a static_assert to check that AtomicArray type is supported.
- Add documentation for AtomicArrayExecutionObject, including a CAS
  example.
- Add a `T Get(idx)` method to AtomicArrayExecutionObject that does
  an atomic load, and update existing CAS usage to use this instead
  of `Add(idx, 0)`.
2019-08-23 15:40:37 -04:00
Allison Vacanti
f370857c15 Add CountSetBits and Fill device algorithms. 2019-06-25 11:30:39 -04:00
nadavi
fbcea82e78 conslidate the license statement 2019-04-17 10:57:13 -06:00
Allison Vacanti
56cc5c3d3a Add support for BitFields.
BitFields are:
- Stored in memory using a contiguous buffer of bits.
- Accessible via portals, a la ArrayHandle.
- Portals operate on individual bits or words.
- Operations may be atomic for safe use from concurrent kernels.

The new BitFieldToUnorderedSet device algorithm produces an ArrayHandle
containing the indices of all set bits, in no particular order.

The new AtomicInterface classes provide an abstraction into bitwise
atomic operations across control and execution environments and are used
to implement the BitPortals.
2019-04-11 08:27:17 -04:00
Allison Vacanti
bd337854ec Initial implementation of general logging.
Addresses #291.
2018-10-02 11:37:55 -04:00
Allison Vacanti
be0c6a17a9 Move DevAdaptAtomicArrayImplementation to its own file. 2018-05-29 14:07:29 -04:00
Robert Maynard
8808b41fbd Merge branch 'master' into vtk-m-cmake_refactor 2018-03-29 22:51:26 -04:00
Robert Maynard
7c54125b66 Switch over from static const to static constexpr where possible. 2018-03-10 11:39:58 -05:00
Robert Maynard
e630ac5aa4 Merge branch 'master' into vtk-m-cmake_refactor 2018-02-23 14:52:00 -05:00
Robert Maynard
022c987182 Correct warnings and errors found with MSVC2017+CUDA9 2018-01-31 15:58:45 -05:00
Robert Maynard
0c5a087e41 Merge topic 'dont_allow_rvalue_tasks'
ef611239 Don't allow DeviceTaskTypes to construct tasks from rvalues.

Acked-by: Kitware Robot <kwrobot@kitware.com>
Merge-request: !1062
2018-01-30 08:37:29 -05:00
luz.paz
80b11afa24 Misc. typos
Found via `codespell -q 3` via downstream VTK
2018-01-30 06:51:47 -05:00
Robert Maynard
ef611239f6 Don't allow DeviceTaskTypes to construct tasks from rvalues. 2018-01-18 13:55:37 -05:00
Robert Maynard
7d7c6ab1ab Don't allow DeviceTaskTypes to construct tasks from rvalues. 2018-01-18 13:51:30 -05:00
Li-Ta Lo
a713a0d889 Generalize and documentation for DeviceAdapterAlgorithm::Transform
Generalize DeviceAdapterAlgorithm::Transform to accept input array of different value and storage type.
Add doxygen documentation in DeviceAdapterAlgorithm.h
2018-01-16 14:43:31 -07:00
Li-Ta Lo
2e88f4220a Connected component for triangle mesh 2017-12-22 10:31:02 -07:00
Kenneth Moreland
37d0100828 Merge topic 'correct-simple-unique'
7a2ef646 Correct the implementation of DeviceAdapterAlgorithmGeneral::Unique

Acked-by: Kitware Robot <kwrobot@kitware.com>
Acked-by: Allison Vacanti <allison.vacanti@kitware.com>
Merge-request: !982
2017-10-25 10:30:10 -04:00
Kenneth Moreland
7a2ef6464a Correct the implementation of DeviceAdapterAlgorithmGeneral::Unique
The implementation of the simplified version of
DeviceAdapterAlgorithmGeneral::Unique had two errors.

First, the implementation is such that it calls the more complex version
of Unique (which specifies a binary predicate to establish equality).
However, it was not calling the Unique method in the DerivedAlgorithm
like it should have been. Instead, it was calling its own Unique
algorithm, which might not be as efficient as the specialized Unique for
the device.

Second, it was using std::equal_to as its binary predicate. Using
functors from std can be dangerous because they are not marked with
VTKM_EXEC, so have the potential to not work in the execution
environment. Instead, use the readily available vtkm::Equal binary
predicate.
2017-10-24 16:57:18 -06:00
Kenneth Moreland
e50ec6b667 Fix type error in ScanExclusiveByKey
The implementation of ScanExclusiveByKey in
DeviceAdapterAlgorithmGeneral by shifting values in the input values
array and then calling ScanInclusiveByKey. However, the temporary
shifted values array was created using the key type instead of the
values type. This caused a compile error when the keys and values had
different types.
2017-10-24 16:12:30 -06:00
Allison Vacanti
1018d981a0 Check for overlap in CopySubRange.
Some parallel copy implementations will not handle this sanely.
2017-10-11 16:52:32 -04:00
Allison Vacanti
c2a7e4faba Adapt ReduceByKey to handle ArrayHandleDiscard for output keys.
Often times we don't care about the output keys, and it's useful to
be able to pass an ArrayHandleDiscard into the algorithm to save
memory in these cases.
2017-10-10 10:28:51 -04:00
Robert Maynard
311618a15f Enable highest level of warnings(W4) under MSVC
This will make VTK-m warning level match the one used by VTK. This commit
also resolves the first round of warnings that W4 exposes.
2017-09-22 13:04:28 -04:00
Kenneth Moreland
c3a3184d51 Update copyright for Sandia
Sandia National Laboratories recently changed management from the
Sandia Corporation to the National Technology & Engineering Solutions
of Sandia, LLC (NTESS). The copyright statements need to be updated
accordingly.
2017-09-20 15:33:44 -06:00
Robert Maynard
89f439999a Reduce the amount of typedef statements in DeviceAdapters
By using the auto keyword and decltype we can reduce the number of
complex typedefs that exist when writing device adapter algorithms.
The goal being that it is easier for developers to see the actual
algorithms being implemented, by reducing the amount of template
'noise'.
2017-08-16 14:23:21 -04:00
Robert Maynard
e095831199 ArrayHandleReverse now supports shrink/allocate under all conditions.
Previously ArrayHandleReverse would only work if it was provided an explicit
users array to map too. But this doesn't need to be so, if a user wants to
start by constructing an ArrayHandleReverse we should allow that.

The side effect of this, is that some very tricky code in the DeviceAdapters
can be removed, that explicitly was added to allow output to ArrayHandleReverse
2017-08-16 14:20:05 -04:00
Robert Maynard
ac90e2dc73 vtkm::cont::internal now prefers 'using' over 'typedef' 2017-08-10 15:26:05 -04:00
Robert Maynard
b85cdd9080 Convert VTK-m over to use 'using' instead of 'typedef' 2017-08-07 14:05:43 -04:00
Sujin Philip
c4e3102084 Simplify ArrayHandleImplicit template 2017-06-08 16:46:45 -04:00
Robert Maynard
5dd346007b Respect VTK-m convention of parameters all or nothing on a line
clang-format BinPack settings have been disabled to make sure that the
VTK-m style guideline is obeyed.
2017-05-26 13:53:28 -04:00
Robert Maynard
ac4b2a1a3b Merge topic 'array_handle_reverse_write_2'
af61c7e6 Merge branch 'array_handle_reverse_write_2' of gitlab.kitware.com:ollielo/vtk-m into array_handle_reverse_write_2
869c9789 Merge branch 'array_handle_reverse_write_2' of gitlab.kitware.com:ollielo/vtk-m into array_handle_reverse_write_2
14592b82 Merge branch 'master' of gitlab.kitware.com:vtk/vtk-m into array_handle_reverse_write_2
1c50c86a remove commented out code to print values to stdout
80d4ee90 Enable output to ArrayHandleReverse
f64cb5ef Added test for using ArrayHandleReverse as output of ScanInclusiveByKey
87fe6768 thruw ErrorBadType, add PrepareForOutput from Pat
60542804 fixed a typo and added write support for ArrayHandleReverse

Acked-by: Kitware Robot <kwrobot@kitware.com>
Merge-request: !783
2017-05-25 17:50:29 -04:00
Li-Ta Lo
14592b8279 Merge branch 'master' of gitlab.kitware.com:vtk/vtk-m into array_handle_reverse_write_2 2017-05-25 09:05:14 -06:00
Robert Maynard
60a405ef65 Add TaskTiling1D/3D which use faux virtuals to reduce binary size.
Redesigns the TBB and Serial backends and the vtkm::exec::Task concept so that
we can re-use the same launching logic for all Worklets, instead of generating
per worlet code. To keep the performance the same the TilingTask now is past
a range of indices to work on, rather than a single index.

Binary size reduction:
WorkletTests_SERIAL old - 19MB
WorkletTests_SERIAL new - 18MB

WorkletTests_TBB old - 39MB
WorkletTests_TBB new - 18MB

libvtkAcceleratorsVTKm old - 48MB
libvtkAcceleratorsVTKm new - 19MB
2017-05-25 11:00:01 -04:00
Kitware Robot
4ade5f5770 clang-format: apply to the entire tree 2017-05-25 07:51:37 -04:00
Li-Ta Lo
f64cb5efb4 Added test for using ArrayHandleReverse as output of ScanInclusiveByKey
Relexed the constrain for ArrayHandleReverse to allocate storage for the
underlying ArryaHandle
2017-05-24 16:02:46 -06:00
Kitware Robot
efbde1d54b clang-format: sort include directives 2017-05-18 12:59:33 -04:00