The `DeviceAdapter` provides an abstract interface to the accelerator
devices worklets and other algorithms run on. As such, the programmer has
less control about how the device launches each worklet. Each device
adapter has its own configuration parameters and other ways to attempt to
optimize how things are run, but these are always a universal set of
options that are applied to everything run on the device. There is no way
to specify launch parameters for a particular worklet.
To provide this information, VTK-m now supports `Hint`s to the device
adapter. The `DeviceAdapterAlgorithm::Schedule` method takes a templated
argument that is of the type `HintList`. This object contains a template
list of `Hint` types that provide suggestions on how to launch the parallel
execution. The device adapter will pick out hints that pertain to it and
adjust its launching accordingly.
These are called hints rather than, say, directives, because they don't
force the device adapter to do anything. The device adapter is free to
ignore any (and all) hints. The point is that the device adapter can take
into account the information to try to optimize for itself.
A provided hint can be tied to specific device adapters. In this way, an
worklet can further optimize itself. If multiple hints match a device
adapter, the last one in the list will be selected.
The `Worklet` base now has an internal type named `Hints` that points to a
`HintList` that is applied when the worklet is scheduled. Derived worklet
classes can provide hints by simply defining their own `Hints` type.
Several revisions ago, the ability to use virtual methods in the
execution environment was deprecated. Completely remove this
functionality for the VTK-m 2.0 release.
This mechanism sets up CMake variables that allow a user to select which
modules/libraries to create. Dependencies will be tracked down to ensure
that all of a module's dependencies are also enabled.
The modules are also arranged into groups.
Groups allow you to set the enable flag for a group of modules at once.
Thus, if you have several modules that are likely to be used together,
you can create a group for them.
This can be handy in converting user-friendly CMake options (such as
`VTKm_ENABLE_RENDERING`) to the modules that enable that by pointing to
the appropriate group.
Previously, each device adapter implementation had their own version of
this test by including a common header. Simplify this by making a single
test in UnitTests_vtkm_cont_testing, which can now be compiled for and
tested on a device.
Previously, each device adapter implementation had their own version of
this test by including a common header. Simplify this by making a single
test in UnitTests_vtkm_cont_testing, which can now be compiled for and
tested on a device.
Previously, each device adapter implementation had their own version of
this test by including a common header. Simplify this by making a single
test in UnitTests_vtkm_cont_testing, which can now be compiled for and
tested on a device.
Previously, each device adapter implementation had their own version of
these tests by including a common header. Simplify this by making a
single test in UnitTests_vtkm_cont_testing for each, which can now be
compiled for and tested on a device.
Previously, each device adapter implementation had their own version of
this test by including a common header. Simplify this by making a single
test in UnitTests_vtkm_cont_testing, which can now be compiled for and
tested on a device.
Also re-enabled the testing of ranges for Vecs of size 9, which is now
supported.
Previously, each device adapter implementation had their own version of
this test by including a common header. Simplify this by making a single
test in UnitTests_vtkm_cont_testing, which can now be compiled for and
tested on a device.
Previously, each device adapter implementation had their own version of
these tests by including a common header. Simplify this by making a
single test in UnitTests_vtkm_cont_testing for each, which can now be
compiled for and tested on a device.
Previously, each device adapter implementation had their own version of
this test by including a common header. Simplify this by making a single
test in UnitTests_vtkm_cont_testing, which can now be compiled for and
tested on a device.
This header file contained tests for a bunch of fancy array handles so
that they could be compiled for each device. These tests were bunched
together because they were replicated for each device implementation,
which was a hassle. However, having a bunch of tests crammed together is
problematic for a number of reasons.
The new testing no longer has a need to make a separate test for each
device. Thus, the tests for the individual devices are removed, and the
tests are split up and added to the basic vtkm_cont tests. In some
cases, individual tests already existed there as well (probably because
the developer did not see the test).
Previously, each device adapter implementation had their own version of
this test by including a common header. Simplify this by making a single
test in UnitTests_vtkm_cont_testing, which can now be compiled for and
tested on a device.
Previously, each device adapter implementation had their own version of
this test by including a common header. Simplify this by making a single
test in UnitTests_vtkm_cont_testing, which can now be compiled for and
tested on a device.
The `vtkm_unit_tests` function in the CMake build now allows you to specify
which files need to be compiled with a device compiler using the
`DEVICE_SOURCES` argument. Previously, the only way to specify that unit
tests needed to be compiled with a device compiler was to use the
`ALL_BACKENDS` argument, which would automatically compile everything with
the device compiler as well as test the code on all backends.
`ALL_BACKENDS` is still supported, but it no longer changes the sources to
be compiled with the device compiler.
TBB 2020 introduced a new class called `task_group`. TBB 2021 removed
the old class `task` as its functionality was replaced by `task_group`.
Our parallel radix sort for TBB was implemented using `task`s, so change
it to use `task_group` (which actually cleans up the code a bit).
Years ago we discovered a problem with TBB's parallel sort, which we
patch in our local repo and submitted a change to TBB, which has been
accepted.
The code to decide whether to use our parallel_sort patch does not work
with the latest versions of TBB because it requires including a header
that changed names to get the TBB version.
We no longer support any TBB version with this bug, so just remove the
patch from VTK-m.
Deprecate `VirtualObjectHandle` and all other classes that are used to
implement objects with virtual methods in the execution environment.
Additionally, the code is updated so that if the
`VTKm_NO_DEPRECATED_VIRTUAL` flag is set none of the code is compiled at
all. This opens us up to opportunities that do not work with virtual
methods such as backends that do not support virtual methods and dynamic
libraries for CUDA.
This class was used indirectly by the old `ArrayHandle`, through
`ArrayHandleTransfer`, to move data to and from a device. This
functionality has been replaced in the new `ArrayHandle`s through the
`Buffer` class (which can be compiled into libraries rather than make
every translation unit compile their own template).
This commit removes `ArrayManagerExecution` and all the implementations
that the device adapters were required to make. None of this code was in
any use anymore.
When `DeviceAdapterAlgorithm::ScheduleTask` was called directly (i.e.
not through `Schedule`), nothing was added to the log. Adding
`VTKM_LOG_SCOPE` to these methods so that all scheduling is added to the
performance log.
The new name reflects better what the underlying algorithm does. It also
helps prevent confusion about what types of data the locator is good
for. The old name suggested it only worked for structured grids, which
is not the case.
As we remove more and more virtual methods from VTK-m, I expect several
users will be interested in completely removing them from the build for
several reasons.
1. They may be compiling for hardware that does not support virtual
methods.
2. They may need to compile for CUDA but need shared libraries.
3. It should go a bit faster.
To enable this, a CMake option named `VTKm_NO_DEPRECATED_VIRTUAL` is
added. It defaults to `OFF`. But when it is `ON`, none of the code that
both uses virtuals and is deprecated will be built.
Currently, only `ArrayHandleVirtual` is deprecated, so the rest of the
virtual classes will still be built. As we move forward, more will be
removed until all virtual method functionality is removed.
Now that we have atomic free functions (e.g. `vtkm::AtomicAdd()`), we no
longer need special implementations for control and each execution
device. (Well, technically we do have special implementations for each,
but they are handled with compiler directives in the free functions.)
Convert the old atomic interface classes (`AtomicInterfaceControl` and
`AtomicInterfaceExecution`) to use the new atomic free functions. This
will allow us to test the new atomic functions everywhere that atomics
are used in VTK-m.
Once verified, we can deprecate the old atomic interface classes.
Some of the code in the base `vtkm` namespace is device specific. For
example, the functions in `Math.h` are customized for specific devices.
Thus, we want this code to be specially compiled and run on these
devices.
Previously, we made a header file and then added separate tests to each
device package. That was created before we had ways of running on any
device. Now, it is much easier to compile the test a single time for all
devices and use the `ALL_BACKENDS` feature of `vtkm_unit_tests` CMake
function to automatically create the test for all devices.
The buffer class encapsulates the movement of raw C arrays between
host and devices.
The `Buffer` class itself is not associated with any device. Instead,
`Buffer` is used in conjunction with a new templated class named
`DeviceAdapterMemoryManager` that can allocate data on a given
device and transfer data as necessary. `DeviceAdapterMemoryManager`
will eventually replace the more complicated device adapter classes
that manage data on a device.
The code in `DeviceAdapterMemoryManager` is actually enclosed in
virtual methods. This allows us to limit the number of classes that
need to be compiled for a device. Rather, the implementation of
`DeviceAdapterMemoryManager` is compiled once with whatever compiler
is necessary, and then the `RuntimeDeviceInformation` is used to
get the correct object instance.
The `ArrayHandleStreaming` class stems from an old research project
experimenting with bringing data from an `ArrayHandle` in parts and
overlapping device transfer and execution. It works, but only in very
limited contexts. Thus, it is not actually used today. Plus, the feature
requires global indexing to be permutated throughout the worklet
dispatching classes of VTK-m for no further reason.
Because it is not really used, there are other more promising approaches
on the horizon, and it makes further scheduling improvements difficult,
we are removing this functionality.
1f1688483 Initial infrastructure to allow WorkletMapField to have 3D scheduling
Acked-by: Kitware Robot <kwrobot@kitware.com>
Acked-by: Kenneth Moreland <kmorel@sandia.gov>
Merge-request: !1938