Some versions of doxygen have issues with documenting `typedef`s (or the
equivalent `using`). This was causing warnings with doxygen and failing
to create some of the documentation. This fixes the problem by moving the
documentation to the classes things are defined to.
The `DeviceAdapter` provides an abstract interface to the accelerator
devices worklets and other algorithms run on. As such, the programmer has
less control about how the device launches each worklet. Each device
adapter has its own configuration parameters and other ways to attempt to
optimize how things are run, but these are always a universal set of
options that are applied to everything run on the device. There is no way
to specify launch parameters for a particular worklet.
To provide this information, VTK-m now supports `Hint`s to the device
adapter. The `DeviceAdapterAlgorithm::Schedule` method takes a templated
argument that is of the type `HintList`. This object contains a template
list of `Hint` types that provide suggestions on how to launch the parallel
execution. The device adapter will pick out hints that pertain to it and
adjust its launching accordingly.
These are called hints rather than, say, directives, because they don't
force the device adapter to do anything. The device adapter is free to
ignore any (and all) hints. The point is that the device adapter can take
into account the information to try to optimize for itself.
A provided hint can be tied to specific device adapters. In this way, an
worklet can further optimize itself. If multiple hints match a device
adapter, the last one in the list will be selected.
The `Worklet` base now has an internal type named `Hints` that points to a
`HintList` that is applied when the worklet is scheduled. Derived worklet
classes can provide hints by simply defining their own `Hints` type.
The `vtkm::worklet::Keys` object held a `SortedValuesMap` array, an
`Offsets` array, a `Counts` array, and (optionally) a `UniqueKeys` array.
Of these, the `Counts` array is redundant because the counts are trivially
computed by subtracting adjacent entries in the offsets array. This pattern
shows up a lot in VTK-m, and most places we have moved to removing the
counts and just using the offsets.
This change removes the `Count` array from the `Keys` object. Where the
count is needed internally, adjacent offsets are subtracted. The deprecated
`GetCounts` method is implemented by copying values into a new array.
There are occasions when you need a worklet to opeate on 2D or 3D
indices. Most worklets operate on 1D indices, which requires recomputing
the 3D index in each worklet instance. A workaround is to use a worklet
that does a 3D scheduling and pull the working index from that.
The problem was that there was no easy way to get this 3D index. To
provide this option, a feature was added to the `BoundaryState` class
that can be provided by `WorkletPointNeighborhood`.
Thus, to get a 3D index in a worklet, use the
`WorkletPointNeighborhood`, add `Boundary` as an argument to the
`ExecutionSignature`, and then call `GetCenterIndex` on the
`BoundaryState` object passed to the worklet operator.
The structured connectivity classes are templated on two tags to
determine what 2 incident topological elements are being accessed. Back
in the day, these were called the "from" elements and "to" elements, as
taken from VTK filter names like `PointDataToCellData`. However, these
names were found to be very confusion, and after much debate they have
been renamed to the visit element type and the incident element type.
Meaning that a worklet is "visiting" elements of a particular type (such
as visiting each cell) and can access "incident" elements of a
particular type (such as the points incident on the cell).
I found a few methods converting flat and logical indices using the old,
confusing from/to convention. This changes them to the new convention.
The `Fetch::Load` for an input array of a topology map for an XGC array
(that uses `ConnectivityExtrude`) was failing to compile because it
creates a return `Vec` and then fills it. That does not work if the
input array has values that cannot be default constructed. This is the
case for `ArrayHandleRecombineVec`, which creates values that lazily
pull data out of portals.
Change the code to careful construct the return `Vec` such that it does
not require the default constructor of the components.
When you use an `ArrayHandle` as an output array in a worklet (for example,
as a `FieldOut`), the fetch operation does not read values from the array
during the `Load`. Instead, it just constructs a new object. This makes
sense as an output array is expected to have garbage in it anyway.
This is a problem for some special arrays that contain `Vec`-like objects
that are sized dynamically. For example, if you use an
`ArrayHandleGroupVecVariable`, each entry is a dynamically sized `Vec`. The
array is referenced by creating a special version of `Vec` that holds a
reference to the array portal and an index. Components are retrieved and
set by accessing the memory in the array portal. This allows us to have a
dynamically sized `Vec` in the execution environment without having to
allocate within the worklet.
The problem comes when we want to use one of these arrays with `Vec`-like
objects for an output. The typical fetch fails because you cannot construct
one of these `Vec`-like objects without an array portal to bind it to. In
these cases, we need the fetch to create the `Vec`-like object by reading
it from the array. Even though the data will be garbage, you get the
necessary buffer into the array (and nothing more).
Previously, the problem was fixed by creating partial specializations of
the `Fetch` for these `ArrayHandle`s. This worked OK as long as you were
using the array directly. However, the approach failed if the `ArrayHandle`
was wrapped in another `ArrayHandle` (for example, if an `ArrayHandleView`
was applied to an `ArrayHandleGroupVecVariable`).
To get around this problem and simplify things, the basic `Fetch` for
direct output arrays is changed to handle all cases where the values in the
`ArrayHandle` cannot be directly constructed. A compile-time check of the
array's value type is checked with `std::is_default_constructible`. If it
can be constructed, then the array is not accessed. If it cannot be
constructed, then it grabs a value out of the array.
CUDA 12 adds a `cub::Swap` function that creates ambiguity with `vtkm::Swap`.
This happens when a function from the `cub` namespace is called with an object
of a class defined in the `vtkm` namespace as an argument. If that function
has an unqualified call to `Swap`, it results in ADL being used, causing the
templated functions `cub::Swap` and `vtkm::Swap` to conflict.
With the major revision 2.0 of VTK-m, many items previously marked as
deprecated were removed. If updating to a new version of VTK-m, it is
recommended to first update to VTK-m 1.9, which will include the deprecated
features but provide warnings (with the right compiler) that will point to
the replacement code. Once the deprecations have been fixed, updating to
2.0 should be smoother.
For several versions, VTK-m has had a `Variant` templated class. This acts
like a templated union where the object will store one of a list of types
specified as the template arguments. (There are actually 2 versions for the
control and execution environments, respectively.)
Because this is a complex class that required several iterations to work
through performance and compiler issues, `Variant` was placed in the
`internal` namespace to avoid complications with backward compatibility.
However, the class has been stable for a while, so let us expose this
helpful tool for wider use.
Several revisions ago, the ability to use virtual methods in the
execution environment was deprecated. Completely remove this
functionality for the VTK-m 2.0 release.
`ExecutionWholeArray` is an archaic class in VTK-m that is a thin wrapper
around an array portal. In the early days of VTK-m, this class was used to
transfer whole arrays to the execution environment. However, now the
supported method is to use `WholeArray*` tags in the `ControlSignature` of
a worklet.
Nevertheless, the `WholeArray*` tags caused the array portal transferred to
the worklet to be wrapped inside of an `ExecutionWholeArray` class. This
is unnecessary and can cause confusion about the types of data being used.
Most code is unaffected by this change. Some code that had to work around
the issue of the portal wrapped in another class used the `GetPortal`
method which is no longer needed (for obvious reasons). One extra feature
that `ExecutionWholeArray` had was that it provided an subscript operator
(somewhat incorrectly). Thus, any use of '[..]' to index the array portal
have to be changed to use the `Get` method.
This mechanism sets up CMake variables that allow a user to select which
modules/libraries to create. Dependencies will be tracked down to ensure
that all of a module's dependencies are also enabled.
The modules are also arranged into groups.
Groups allow you to set the enable flag for a group of modules at once.
Thus, if you have several modules that are likely to be used together,
you can create a group for them.
This can be handy in converting user-friendly CMake options (such as
`VTKm_ENABLE_RENDERING`) to the modules that enable that by pointing to
the appropriate group.
Rather than try to collect all `LastCell` types inside of a single
header and make an uber type, have each cell locator define its own cell
locator type and use that.
The `Variant` class was missing a way to check the type. You could do it
indirectly using `variant.GetIndex() == variant.GetIndexOf<T>()`, but
having this convenience function is more clear.
`Variant::CastAndCall` was using the C++11 style for an `auto` return
where the return type was specified with a `->` that got the `decltype`
of the return value of the functor. This was used as part of SFINAE to
pick whether you needed the const or non-const version.
However, this was causing a problem with functions that got an error
when deducing the return type for that. This was particularly
problematic for lambda functions. For example, say you have the
following simple `CastAndCall`.
```cpp
variant.CastAndCall([](auto& x){ ++x; });
```
To determine the return type of the lambda (`void`), the function has to
be compiled. But when it is compiled with a const type, which happens
when deducing the const version of `CastAndCall`, you get a compile
error. This error is not considered a substitution error (hence SFINAE),
it is an outright error. So you get a compile error just trying to
deduce the type.
The solution was to move to the C++14 version of an auto return type. In
this case, the return type is no longer important for SFINAE and is
delayed until the function is actually compiled with the specific
template parameters. This would be a problem if the const version of
`CastAndCall` was used when the non-const version was needed. But now
both versions will pass SFINAE and thus the non-const version will be
chosen as long as the `Variant` object itself is non-const. If the
`Variant` object itself is const, then that is in fact a legitimate
error, so a compile error is OK.
One thing I find wierd is that `CastAndCall` still has a `noexcept`
expression that will likewise cause a compile error in this case.
However, it is still working. I _think_ the difference is that
`noexcept` is not used to determine template substitution/overloaded, so
is therefore ignored until the function is actually compiled.
There was a typo in the declaration of the `CastAndCall` for a non-const
`Variant`. When determining the `noexcept` status of the function being
called, it was passing in a const reference instead of a regular
reference, which is what is actually passed to the function. This
potentially causes the function call to not match and fail to compile.
There was a bug where if you attempted to copy a variant that was not
valid (i.e. did not hold an object), a seg fault could happen. This has
been changed to set the target variant to also be invalid.
We have been doing a better job at hiding device code (and moving code
into libraries). Smoke out source that no longer needs to be compiled by
device compilers.