For several versions, VTK-m has had a `Variant` templated class. This acts
like a templated union where the object will store one of a list of types
specified as the template arguments. (There are actually 2 versions for the
control and execution environments, respectively.)
Because this is a complex class that required several iterations to work
through performance and compiler issues, `Variant` was placed in the
`internal` namespace to avoid complications with backward compatibility.
However, the class has been stable for a while, so let us expose this
helpful tool for wider use.
This mechanism sets up CMake variables that allow a user to select which
modules/libraries to create. Dependencies will be tracked down to ensure
that all of a module's dependencies are also enabled.
The modules are also arranged into groups.
Groups allow you to set the enable flag for a group of modules at once.
Thus, if you have several modules that are likely to be used together,
you can create a group for them.
This can be handy in converting user-friendly CMake options (such as
`VTKm_ENABLE_RENDERING`) to the modules that enable that by pointing to
the appropriate group.
The `Variant` class was missing a way to check the type. You could do it
indirectly using `variant.GetIndex() == variant.GetIndexOf<T>()`, but
having this convenience function is more clear.
`Variant::CastAndCall` was using the C++11 style for an `auto` return
where the return type was specified with a `->` that got the `decltype`
of the return value of the functor. This was used as part of SFINAE to
pick whether you needed the const or non-const version.
However, this was causing a problem with functions that got an error
when deducing the return type for that. This was particularly
problematic for lambda functions. For example, say you have the
following simple `CastAndCall`.
```cpp
variant.CastAndCall([](auto& x){ ++x; });
```
To determine the return type of the lambda (`void`), the function has to
be compiled. But when it is compiled with a const type, which happens
when deducing the const version of `CastAndCall`, you get a compile
error. This error is not considered a substitution error (hence SFINAE),
it is an outright error. So you get a compile error just trying to
deduce the type.
The solution was to move to the C++14 version of an auto return type. In
this case, the return type is no longer important for SFINAE and is
delayed until the function is actually compiled with the specific
template parameters. This would be a problem if the const version of
`CastAndCall` was used when the non-const version was needed. But now
both versions will pass SFINAE and thus the non-const version will be
chosen as long as the `Variant` object itself is non-const. If the
`Variant` object itself is const, then that is in fact a legitimate
error, so a compile error is OK.
One thing I find wierd is that `CastAndCall` still has a `noexcept`
expression that will likewise cause a compile error in this case.
However, it is still working. I _think_ the difference is that
`noexcept` is not used to determine template substitution/overloaded, so
is therefore ignored until the function is actually compiled.
There was a typo in the declaration of the `CastAndCall` for a non-const
`Variant`. When determining the `noexcept` status of the function being
called, it was passing in a const reference instead of a regular
reference, which is what is actually passed to the function. This
potentially causes the function call to not match and fail to compile.
There was a bug where if you attempted to copy a variant that was not
valid (i.e. did not hold an object), a seg fault could happen. This has
been changed to set the target variant to also be invalid.
Previously, if you called `Get` on a `Variant` with a type that is not
in the list of types supported by the `Variant`, that would attempt to
look up the type at index `-1` and could spin the compiler into an
endless loop.
Instead, check for the case where you are attempting to get a type from
the `Variant` not listed in its templat arguments. In this case, instead
of producing a compiler error, produce a runtime error. Although this
increases the possibility that a bad compile path is being generated, it
simplifies creating templated code that produces cases we don't care
about.
Create a `VaraintUnion` that is an actual C++ `union` to store the data
in a `Variant`.
You may be asking yourself, why not just use an `std::aligned_union`
rather than a real union type? That was our first implementation, but
the problem is that the `std::aligned_union` reference needs to be
recast to the actual type. Typically you would do that with
`reinterpret_cast`. However, doing that leads to undefined behavior. The
C++ compiler assumes that 2 pointers of different types point to
different memory (even if it is clear that they are set to the same
address). That means optimizers can remove code because it "knows" that
data in one type cannot affect data in another type. To safely change
the type of an `std::aligned_union`, you really have to do an
`std::memcpy`. This is problematic for types that cannot be trivially
copied. Another problem is that we found that device compilers do not
optimize the memcpy as well as most CPU compilers. Likely, memcpy is
used much less frequently on GPU devices.
`std::is_trivial` is part of the C++14 specification. However, we have
encountered multiple compilers that purport to implement C++14 but do
not implement `std::is_trivial` and the like checks correctly.
To avoid such issues, only use `std::is_trivial` on compilers that we
have tested to support it.
The newer version of `ArrayHandle` no longer supports different types of
portals for different devices. Thus, the `ReadPortalType` and
`WritePortalType` are sufficient for all types of portals across all
devices.
This significantly simplifies supporting execution objects on devices,
and thus this change also includes many changes to various execution
objects to remove their dependence on the device adapter tag.
The `Variant` class has separate implementations for its move and copy
constructors/assignment operators depending on whether the classes it
holds can be trivially moved. If the objects are trivial, Variant is
trivial as well. However, in the case where the objects are not trivial,
special construction and copying needs to be done.
Previously, the non-trivial `Variant` defined a move constructor that
did a byte copy of the contained object and reset the right hand side
object so that it did not attempt to destroy the object. That usually
works because it guarantees that only one version of the `Variant` will
attempt to destroy the object and its resources should be cleaned up
correctly.
But C++ is a funny language that lets you do weird things. Turns out
there are cases where moving the location of memory for an object
without calling the proper copy method can invalidate the object. For
example, if the object holds a pointer to one of its own members, that
pointer will become invalid. Also, if it points to something that points
back, then the object will need to update those pointers when it is
moved. GCC's version of `std::string` seems to be a type like this.
Solve the problem by simply deleting the move constructors. The copy
constructors and destructor will be called instead to properly manage
the object. A test for these conditions is added to `UnitTestVariant`.
The `Variant` class is templated to hold objects of other types.
Depending on whether those objects of are meant to be used in the
control or execution side, the methods on `Variant` might need to be
declared with (or without) special modifiers. We can sometimes try to
compile the `Variant` methods for both host and device and ask the
device compiler to ignore incompatibilities, but that does not always
work.
To get around that, create two different implementations of `Variant`.
Their API and implementation is exactly the same except one declares its
methods with `VTKM_CONT` and the other its methods `VTKM_EXEC`.
The only reason Keys has a template is so that it can hold a UniqueKeys
array and provide the key for each group. If that is not needed and you
want to implement a library function that takes a keys object, you can
now grab the Keys superclass KeysBase. KeysBase is not templated, so you
can pass it to a standard method in a library.
This change is needed for being able to use different thread indices types
without changing Fetchs. Basically decoupling those two areas.
1. This commit removes concrete specialization instantiations of
ThreadIndicesTypes in all of the Fetch's specializations.
2. It also moves the ThreadIndicesType template parameter from the Fetch
struct to a template parameter in their methods Load/Store.
Signed-off-by: Vicente Adolfo Bolea Sanchez <vicente.bolea@kitware.com>
The `ArrayHandleStreaming` class stems from an old research project
experimenting with bringing data from an `ArrayHandle` in parts and
overlapping device transfer and execution. It works, but only in very
limited contexts. Thus, it is not actually used today. Plus, the feature
requires global indexing to be permutated throughout the worklet
dispatching classes of VTK-m for no further reason.
Because it is not really used, there are other more promising approaches
on the horizon, and it makes further scheduling improvements difficult,
we are removing this functionality.
Yet more ways that we can reduce the complexity of `FunctionInterface`.
This is another step in figuring out what set of features the replacement
for `FunctionInterface` needs to have.
It is very easy to cause ODR violations with DeviceAdapterTagCuda.
If you include that header from a C++ file and a CUDA file inside
the same program we an ODR violation. The reasons is that the C++
versions will say the tag is invalid, and the CUDA will say the
tag is valid.
The solution to this is that any compilation unit that includes
DeviceAdapterTagCuda from a version of VTK-m that has CUDA enabled
must be invoked by the cuda compiler.
This adds an ExecutionSignature tag named Device that passes the
DeviceAdapterTag as an argument to the worklet's operator(). This allows
worklets to specialize their code based on the device.
Mask objects allow you to specify which output values should be
generated when a worklet is run. That is, the Mask allows you to skip
the invocation of a worklet for any number of outputs.
`vtkm::cont::testing` now initializes with logging enabled and support
for device being passed on the command line, `vtkm::testing` only
enables logging.
There was a direct edit to WorkletInvokeFunctorDetail.h that was not
reflected in WorkletInvokeFunctorDetail.h.in. This makes the two files
consistent so that future edits will not loose the changes.
While making changes to how execution objects work, we had agreed to
name the base object ExecutionObjectBase instead of its original name of
ExecutionObjectFactoryBase. Somehow that change did not make it through.
In order to make the change from the current way execution obejcts are utilized to the new proposed executionObjectFactory process type checks now has to look for the new execution object factory class to check against.
This allows for easier host side logic when determining grid and block
sizes, and allows for a smaller library side by moving some logic
into compiled in functions.
1. Add a 'FastVec' class that copies input vector types to an efficient
Vec type on the stack. Specializations avoid copies of already efficient
types.
2. Update 'WorldCoordinatesToParametricCoordinates' functions to utilize the
'FastVec' class. This should improve performance when the passed in
vectors are of slow types like 'vtkm::VecFromPortalPermute'.
3. Since most input Vec types will convert to the same 'FastVec' type this
also reduces the code generations. Some code refactoring was required for
this.
Sandia National Laboratories recently changed management from the
Sandia Corporation to the National Technology & Engineering Solutions
of Sandia, LLC (NTESS). The copyright statements need to be updated
accordingly.