Scatter in worklets
Add the functionality to perform a scatter operation from input to output in a worklet invocation. This allows you to, for example, specify a variable amount of outputs generated for each input.
See merge request !221
I noticed a failure in a dashboard run of UnitTestParametricCoordinates.
This test uses randomly generated numbers to test the behavior of some
cell shapes, and there was an instance that occured with seed 1447261681
that caused one of the comparisons to be just slightly larger than the
default tolerance but still within reasonable value.
I just increased the tolerance of that particular comparison. Hopefully
this will prevent all future failures.
Array transforms can now be created with an inverse functor, allowing for
casts back into the native array type. As a result, array transforms with
both a functor and inverse functor defined can perform read and write
operations. As an example, ArrayHandleCast now supports this operation. The
original implementation of ArrayHandleCast (i.e. read only) has been renamed
'ArrayHandleCastForInput'.
Previously, there was a table holding the number of vertices produced
for each MC case. However, what we really need is the number of
triangles, so we would have to divide that by 3. Instead, just store the
number of triangles.
The original workaround for inclusive_scan bugs in thrust 1.8 only solved the
issue for basic arithmetic types such as int, float, double. Now we go one
step further and fix the problem for all types.
The solution is to provide a proper implementation of destructive_accumulate_n
and make sure it exists before any includes of thrust occur.
Recent changes to algorithm implementations caused CellDerivative to be
called in a way such that it gave conversion warnings on some compilers.
Fix that.
The previous implementation was declaring static arrays in methods,
which cannot be used on a CUDA device. Instead, make static tables that
can be passed to the device with array handles (much like clip tables
do).
This has been requested on the mailing list to make it easier to
interpolate integer vectors.
There are a couple of downsides to this addition. First, it implicitly
casts doubles back to whatever the vector type is, which can cause a
loss of precision. Second, it makes it more likely to get overload
errors when multiplying with Vec. In particular, the operator to cast
Vec of size 1 to the component class had to be removed.
The tetrahedralize algorithms have been changed to use the Scatter
classes to build indices rather than build them on their own.
To implement this efficiently with structured grids, a new ScatterUniform
class was made. I also added a new execution argument tag that allows
you to get the thread indices object from within the worklet.
It is the case that there are two ways to create the output to input map in
a count scatter. The first is to use a parallel find for every output index.
The second, which is used when there are lots of output, is to iterate over
the input and write out the reverse map. In this case, it is trivial to also
write out the visit indices, so do that instead of a bunch more searches.
Previously, each VecFromPortalPermute (the type that held the from field
values) held its own copy of the indices. For point to cell on
structured grids, this was a lot of repeated data values, which has the
potential to fill up cache and registers. Instead, just use pointer
references.
Previously, there was a declaration ConstArrayPortalFromThrust<const T>
in ArrayManagerExecutionThrustDevice. This proved problematic because
values read from the array in the worklet were typed as const T rather
than simply T. Any Vec or Matrix built from that type would then fail
because they are not meant to work with a const value (which means they
have to be set on construction and never changed.
Instead, declare ConstArrayPortalFromThrust<T> and internally set all
the Thrust pointers to have type const T. Also declare other thrust
pointers used as method parameters to have const T rather than T. This
should work as conversion from T to const T should be fine, but not the
other way around.
The original isosurface code was not treating the origin and spacing of
the point coordinates correctly. Instead, it was ignoring the origin and
spacing and instead scaling all point coordinates to be in the unit cube
in world space (except there was also an off-by-one error in that). This
change recompensates by adjusting the origin and spacing to make the
correct position where the geometry was previously errantly placed.
Now that ScatterCounting is implemented, we can use that to implement a
good part of the triangle generation in the isosurface algorithm. This
changes the worklet from a basic map to a topology map, which also
reduces a lot of code.
This changes the interface to the ThreadIndices classes to have both
input and output indices. It also adds a visit index to ThreadIndices.
Also added the VisitIndex execution signature tag, which relies on this
behavior.
The parallel implementation in CellSetExplicit that builds cell-to-point
connectivity from point-to-cell connectivity uses a parallel sort-by-
key. The sort-by-key in the device adapter is not guaranteed to be
stable, so values associated with a particular key can be in any order.
The test for the result was expecting the connectivity array to be in a
particular order. Change the test to allow any connectivity ordering
that is still valid.