We have run into an issue with some Intel compilers where if a `union`
contains a `struct` that has some padding for byte alignment, the value
copy might skip over that padding even when the `union` contains a different
type where those bytes are valid. This breaks the value copy of our
`Variant` class.
This is not a unique problem. We have seen the same thing in other
compilers and already have a workaround for when this happens. The
workaround creates a special struct that has no padding placed at the front
of the `union`. The Intel compiler adds a fun twist in that this
placeholder structure only works if the alignment is at least as high as
the struct that follows it.
To get around this problem, make the alignment of the placeholder `struct`
at large as possible for the size of the `union`.
It was taking too long to compile with a device. Redesign the
placeholder to be simpler and compile faster.
The structured connectivity classes are templated on two tags to
determine what 2 incident topological elements are being accessed. Back
in the day, these were called the "from" elements and "to" elements, as
taken from VTK filter names like `PointDataToCellData`. However, these
names were found to be very confusion, and after much debate they have
been renamed to the visit element type and the incident element type.
Meaning that a worklet is "visiting" elements of a particular type (such
as visiting each cell) and can access "incident" elements of a
particular type (such as the points incident on the cell).
I found a few methods converting flat and logical indices using the old,
confusing from/to convention. This changes them to the new convention.
This commit adds the flag VTKm_ENABLE_GPU_MPI which when enable it
will use GPU AWARE MPI.
- This will only work with GPUs and MPI implementation that supports GPU
AWARE MPI calls.
- Enabling VTKm_ENABLE_GPU_MPI without MPI/GPU support might results in
errors when running VTK-m with DIY/MPI.
- Only the following tests can run with this feature if enabled:
- UnitTestSerializationDataSet
- UnitTestSerializationArrayHandle
Add a regression test for passing a variant as an argument to a worklet.
Variants are tricky objects, and the compiler might make some strange
assumptions during optimization.
One case that popped up in particular was when a variant contained two
objects with the same `sizeof` but the first object had padding. When the
variant contained the second object, the value under the padded part of
the first object was not set.
I've been seeing errors in a nightly build that compiles for CUDA Pascal
using GCC5. The issue is that one of the `ArrayHandleMultiplexer` tests
is failing to copy an implicit array correctly. I think the problem is
that in this test the first and second type of the `Variant` are the same
size, but the first type has some padding in the middle whereas the
second type does not. When using this second type, the values in the
same position of the padding of the first type don't seem to be
initialized properly in the kernel invocation.
My nonexhaustive experiment shows that things work OK as long as the
first type is large enough and has no fillers. Enforce this by adding an
internal entry to the union that is completely full.
The `VecTraits` class allows templated functions, methods, and classes to
treat type arguments uniformly as `Vec` types or to otherwise differentiate
between scalar and vector types. This only works for types that `VecTraits`
is defined for.
The `VecTraits` templated class now has a default implementation that will
be used for any type that does not have a `VecTraits` specialization. This
removes many surprise compiler errors when using a template that, unknown
to you, has `VecTraits` in its implementation.
One potential issue is that if `VecTraits` gets defined for a new type, the
behavior of `VecTraits` could change for that type in backward-incompatible
ways. If `VecTraits` is used in a purely generic way, this should not be an
issue. However, if assumptions were made about the components and length,
this could cause problems.
Fixes#589
267ee49cb docker: update kokkos hip image
67bf9a966 docker: update kokkos hip image
50d4ab5cc CMake: VTKm_ENABLE_KOKKOS_THRUST to depend on Kokkos with CUDA or HIP enabled
0604f314a Fix for building SERIAL unit tests with KOKKOS_HIP/CUDA enabled
802bf80b0 CI: Add rocthrust installation step
fda475d5b Rework Thrust CMake options
e6f63a807 Adding CMake tweaks to turn off thrust algorithms if thrust is not detected.
5a72275ed Rewrite sorting specialization using std::enable_if_t
...
Acked-by: Kitware Robot <kwrobot@kitware.com>
Merge-request: !2926
With the major revision 2.0 of VTK-m, many items previously marked as
deprecated were removed. If updating to a new version of VTK-m, it is
recommended to first update to VTK-m 1.9, which will include the deprecated
features but provide warnings (with the right compiler) that will point to
the replacement code. Once the deprecations have been fixed, updating to
2.0 should be smoother.
For several versions, VTK-m has had a `Variant` templated class. This acts
like a templated union where the object will store one of a list of types
specified as the template arguments. (There are actually 2 versions for the
control and execution environments, respectively.)
Because this is a complex class that required several iterations to work
through performance and compiler issues, `Variant` was placed in the
`internal` namespace to avoid complications with backward compatibility.
However, the class has been stable for a while, so let us expose this
helpful tool for wider use.
Several revisions ago, the ability to use virtual methods in the
execution environment was deprecated. Completely remove this
functionality for the VTK-m 2.0 release.
This mechanism sets up CMake variables that allow a user to select which
modules/libraries to create. Dependencies will be tracked down to ensure
that all of a module's dependencies are also enabled.
The modules are also arranged into groups.
Groups allow you to set the enable flag for a group of modules at once.
Thus, if you have several modules that are likely to be used together,
you can create a group for them.
This can be handy in converting user-friendly CMake options (such as
`VTKm_ENABLE_RENDERING`) to the modules that enable that by pointing to
the appropriate group.
The `Variant` class was missing a way to check the type. You could do it
indirectly using `variant.GetIndex() == variant.GetIndexOf<T>()`, but
having this convenience function is more clear.
This commit address issue #713 and removes the need to pass information
about the spatial decomposition (block origins, block sizes, block index
and blocks per dimension) to the contour tree filter. Block origin
information is added to CellSetStructured (as GlobalPointSize) and the
distributed contour tree filter will get this information from
CellSetStructured instead of expecting it as parameters to the
constructor. Information about blocks per dimension and block indices
are computed from the information in CellSetStructure albeit requiring
global communication across all ranks. To avoid this communication cost,
the caller of the filter may explicitly specify this information via the
SetBlockIndices() method.
`Variant::CastAndCall` was using the C++11 style for an `auto` return
where the return type was specified with a `->` that got the `decltype`
of the return value of the functor. This was used as part of SFINAE to
pick whether you needed the const or non-const version.
However, this was causing a problem with functions that got an error
when deducing the return type for that. This was particularly
problematic for lambda functions. For example, say you have the
following simple `CastAndCall`.
```cpp
variant.CastAndCall([](auto& x){ ++x; });
```
To determine the return type of the lambda (`void`), the function has to
be compiled. But when it is compiled with a const type, which happens
when deducing the const version of `CastAndCall`, you get a compile
error. This error is not considered a substitution error (hence SFINAE),
it is an outright error. So you get a compile error just trying to
deduce the type.
The solution was to move to the C++14 version of an auto return type. In
this case, the return type is no longer important for SFINAE and is
delayed until the function is actually compiled with the specific
template parameters. This would be a problem if the const version of
`CastAndCall` was used when the non-const version was needed. But now
both versions will pass SFINAE and thus the non-const version will be
chosen as long as the `Variant` object itself is non-const. If the
`Variant` object itself is const, then that is in fact a legitimate
error, so a compile error is OK.
One thing I find wierd is that `CastAndCall` still has a `noexcept`
expression that will likewise cause a compile error in this case.
However, it is still working. I _think_ the difference is that
`noexcept` is not used to determine template substitution/overloaded, so
is therefore ignored until the function is actually compiled.
There was a typo in the declaration of the `CastAndCall` for a non-const
`Variant`. When determining the `noexcept` status of the function being
called, it was passing in a const reference instead of a regular
reference, which is what is actually passed to the function. This
potentially causes the function call to not match and fail to compile.
There was a bug where if you attempted to copy a variant that was not
valid (i.e. did not hold an object), a seg fault could happen. This has
been changed to set the target variant to also be invalid.
The current implementation of the AMD HIP compiler adds default compiler
flags to attempt to inline everything possible. This is wrecking havoc
with the `Variant::CastAndCall` function, whose implementation has one
or more large switch statements with each case calling a different
potentially inline function. In some parts of the VTK-m code, this is
dragging the compilation on for days as it tries to resolve a
`Variant::CastAndCall` within a `Variant::CastAndCall`. This will
probably need to be addressed in the compiler, but meanwhile we will
force the inlining to be turned off for the function called by
`Variant::CastAndCall`. It is unclear if this will result in some extra
runtime overhead, but the change is worth it to get reasonable compile
times.
Thanks to Nick Curtis for tracking this down and providing the solution.
The previous implementation of `Variant`'s `CastAndCall` generated a
switch statement with 20 cases (plus a default) regardless of how many
types were handled by the `Variant` (with the excess doing nothing
useful). This reduced the amount of code, but caused the compiler to
have to build many more instructions (and optimize for them). This in
turn lead to large compile times and unnecessary large libraries/
executables.
This change makes a different function to use for `CastAndCall` so that
the number of cases in the switch matches exactly the number of types in
the `Variant`'s union.
Because the size of VariantImplDetail.h was getting large, I also
reduced the maximum expansions for the code. This does not seem to
negatively affect compile time, and I doubt it will have an noticible
difference in running time (when in release mode).
I also modified some other parts of this code to match the expansion
without making unnecessary defaults.
Recently, an instantiation method was added to the VTK-m configuration
files to set up a set of source files that compile instances of a template.
This allows the template instances to be compiled exactly once in separate
build files.
However, the implementation made the assumption that the instantiations
were happening for VTK-m filters. Now that the VTK-m filters are being
redesigned, this assumption is broken.
Thus, the instantiation code has been redesigned to be more general. It can
now be applied to code within the new filter structure. It can also be
applied anywhere else in the VTK-m source code.
In pretty much any practical circumstance, whenusing `ListAll` or
`ListAny`, you have a list of types on which you run some sort of
predicate on each item in the list to determine whether any or all of
the items match the predicate. To make this easier, add a second
argument to `ListAll` and `ListAny` to provide a predicate that will
automatically be added.
If no predicate is given, then the operation is run directly on the
list. This is implemented by just using an identity operation.
GCC 11 is having trouble compiling brigand.hpp at all, even before we
instantiate any templates. Since we no longer need it, let's get rid of
it. It was always placed in an internal namespace.
Add deprecation warnings to the code whenever someone uses brigand.hpp.
We are no longer supporting this header file, but we'll give code a
chance to transition off of it.
Also added some other deprecation warnings to other header files that
are themselves deprecated but only issued warnings if you used something
in it.
These new features to VTK-m lists allow you to compute a single value
from a list. `ListReduce` allows you to compute a value based on a
predicate. `ListAll` and `ListAny` use this feature to determine if all
or any of a list of `true_type` or `false_type` objects are true.
Previously, if you called `Get` on a `Variant` with a type that is not
in the list of types supported by the `Variant`, that would attempt to
look up the type at index `-1` and could spin the compiler into an
endless loop.
Instead, check for the case where you are attempting to get a type from
the `Variant` not listed in its templat arguments. In this case, instead
of producing a compiler error, produce a runtime error. Although this
increases the possibility that a bad compile path is being generated, it
simplifies creating templated code that produces cases we don't care
about.
Deprecate `VirtualObjectHandle` and all other classes that are used to
implement objects with virtual methods in the execution environment.
Additionally, the code is updated so that if the
`VTKm_NO_DEPRECATED_VIRTUAL` flag is set none of the code is compiled at
all. This opens us up to opportunities that do not work with virtual
methods such as backends that do not support virtual methods and dynamic
libraries for CUDA.
For some reason some versions of the CUDA compiler would return true for
`is_trivially_copyable` on a `VariantUnion` even when the types of the
union caused the copy constructor to get deleted.
Solve the problem by using `AllTriviallyCopyable` instead of directly
caling `is_trivially_copyable` on the union.
Nvcc was having troubles resolving the return type of the overloaded
function to get a value out of a `VariantUnion`. Replace the
implementation with a class with specializations. This is more verbose,
but easier on the compiler.
Create a `VaraintUnion` that is an actual C++ `union` to store the data
in a `Variant`.
You may be asking yourself, why not just use an `std::aligned_union`
rather than a real union type? That was our first implementation, but
the problem is that the `std::aligned_union` reference needs to be
recast to the actual type. Typically you would do that with
`reinterpret_cast`. However, doing that leads to undefined behavior. The
C++ compiler assumes that 2 pointers of different types point to
different memory (even if it is clear that they are set to the same
address). That means optimizers can remove code because it "knows" that
data in one type cannot affect data in another type. To safely change
the type of an `std::aligned_union`, you really have to do an
`std::memcpy`. This is problematic for types that cannot be trivially
copied. Another problem is that we found that device compilers do not
optimize the memcpy as well as most CPU compilers. Likely, memcpy is
used much less frequently on GPU devices.
I have run into strange problems with reference types sneaking into
templates. Reference types can cause weird errors when passing things
across hosts and devices, and it's best not to let them sneak into an
object like Varient where they become more difficult to check for.