Scatter in worklets
Add the functionality to perform a scatter operation from input to output in a worklet invocation. This allows you to, for example, specify a variable number of outputs generated for each input.
See merge request !221
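As a rough illustration of the idea (not the VTK-m implementation), a scatter with a variable number of outputs per input can be driven by an array of per-input counts. The helper name BuildOutputToInputMap is hypothetical, and the sketch runs serially and assumes non-negative counts.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: for each output slot, record which input produced it.
// counts[i] is the number of outputs generated by input i.
std::vector<std::size_t> BuildOutputToInputMap(const std::vector<int>& counts)
{
  std::vector<std::size_t> outputToInput;
  for (std::size_t inputIndex = 0; inputIndex < counts.size(); ++inputIndex)
  {
    assert(counts[inputIndex] >= 0);
    // An input with a count of zero contributes no output slots at all.
    for (int visit = 0; visit < counts[inputIndex]; ++visit)
    {
      outputToInput.push_back(inputIndex);
    }
  }
  return outputToInput;
}
```

With counts of 2, 0, and 3, the map has five entries: two for input 0, none for input 1, and three for input 2.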
Array transforms can now be created with an inverse functor, allowing for
casts back into the native array type. As a result, array transforms with
both a functor and inverse functor defined can perform read and write
operations. As an example, ArrayHandleCast now supports this operation. The
original implementation of ArrayHandleCast (i.e. read only) has been renamed
'ArrayHandleCastForInput'.
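The read/write behavior can be sketched with a small stand-alone view class. CastView is a hypothetical name and does not reproduce the ArrayHandleCast interface; it only shows a forward cast on read paired with an inverse cast on write.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical read/write view that presents storage of StoredT as ViewT.
template <typename ViewT, typename StoredT>
class CastView
{
public:
  explicit CastView(std::vector<StoredT>& storage)
    : Storage(&storage)
  {
  }

  // Read path: the forward functor (here a static_cast to the view type).
  ViewT Get(std::size_t index) const
  {
    assert(index < this->Storage->size());
    return static_cast<ViewT>((*this->Storage)[index]);
  }

  // Write path: the inverse functor casts back into the native array type.
  void Set(std::size_t index, ViewT value)
  {
    assert(index < this->Storage->size());
    (*this->Storage)[index] = static_cast<StoredT>(value);
  }

private:
  std::vector<StoredT>* Storage;
};
```

Without the inverse cast, only Get could be provided, which corresponds to the read-only case.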
The original workaround for inclusive_scan bugs in thrust 1.8 only solved the
issue for basic arithmetic types such as int, float, double. Now we go one
step further and fix the problem for all types.
The solution is to provide a proper implementation of destructive_accumulate_n
and make sure it exists before any includes of thrust occur.
Previously, there was a declaration ConstArrayPortalFromThrust<const T>
in ArrayManagerExecutionThrustDevice. This proved problematic because
values read from the array in the worklet were typed as const T rather
than simply T. Any Vec or Matrix built from that type would then fail
because they are not meant to work with a const value (which means they
have to be set on construction and never changed).
Instead, declare ConstArrayPortalFromThrust<T> and internally set all
the Thrust pointers to have type const T. Also declare other thrust
pointers used as method parameters to have const T rather than T. This
should work as conversion from T to const T should be fine, but not the
other way around.
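A minimal sketch of the distinction, using a hypothetical ConstPortalSketch class rather than the Thrust-backed portal: the template parameter and value type stay plain T, while only the internal pointer is const T.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

// Hypothetical read-only portal. Values come back as T, not const T.
template <typename T>
class ConstPortalSketch
{
public:
  typedef T ValueType; // plain T, so types built from it remain assignable

  // Accepts const T*; a T* converts implicitly, but not the other way around.
  ConstPortalSketch(const T* begin, std::size_t size)
    : Begin(begin)
    , Size(size)
  {
  }

  std::size_t GetNumberOfValues() const { return this->Size; }

  ValueType Get(std::size_t index) const
  {
    assert(index < this->Size);
    return this->Begin[index];
  }

private:
  const T* Begin; // the const lives on the pointer, not on the value type
  std::size_t Size;
};
```

A value returned by Get can be modified locally without touching the underlying array.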
Now that ScatterCounting is implemented, we can use that to implement a
good part of the triangle generation in the isosurface algorithm. This
changes the worklet from a basic map to a topology map, which also
reduces a lot of code.
The parallel implementation in CellSetExplicit that builds cell-to-point
connectivity from point-to-cell connectivity uses a parallel
sort-by-key. The sort-by-key in the device adapter is not guaranteed to be
stable, so values associated with a particular key can be in any order.
The test for the result was expecting the connectivity array to be in a
particular order. Change the test to allow any connectivity ordering
that is still valid.
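One way to write such an order-insensitive check is to sort each key's segment before comparing. This is a sketch under assumptions, not the actual test code: SameConnectivityAnyOrder is a hypothetical name, and offsets is assumed to hold one start position per key plus a final end position.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical check: entries offsets[i]..offsets[i+1] belong to key i.
// Two connectivity arrays match if every segment holds the same values,
// regardless of the order within the segment.
bool SameConnectivityAnyOrder(const std::vector<std::size_t>& offsets,
                              std::vector<int> actual,
                              std::vector<int> expected)
{
  if (actual.size() != expected.size()) { return false; }
  if (offsets.empty() || offsets.back() != actual.size()) { return false; }
  if (!std::is_sorted(offsets.begin(), offsets.end())) { return false; }

  for (std::size_t i = 0; i + 1 < offsets.size(); ++i)
  {
    const std::ptrdiff_t first = static_cast<std::ptrdiff_t>(offsets[i]);
    const std::ptrdiff_t last = static_cast<std::ptrdiff_t>(offsets[i + 1]);
    std::sort(actual.begin() + first, actual.begin() + last);
    std::sort(expected.begin() + first, expected.begin() + last);
  }
  return actual == expected;
}
```

Values that move between segments still cause a mismatch, so the check only relaxes the ordering within a key.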
A recent change to the DeviceAdapter header includes the TBB device if
available instead of the serial device. Thus, DeviceAdapterTagSerial was
not defined automatically in all cases for the build of
UnitTestDataSetPermutation. Add the header for that explicitly.
adding VTK file exporter and test cases
This adds a legacy VTK file exporter which supports unstructured, explicit, and point meshes. (Single Cell Type cell sets are also supported.)
See merge request !247
CUDA default constructors, destructors, and assignment operators
Several classes exclusively work in the control environment. However, CUDA likes to add __device__ to constructors, destructors, and assignment operators it automatically creates. This in turn causes warnings about the __device__ function using host-only classes (like boost::shared_ptr). Solve this problem by adding explicit methods for all of these.
See merge request !245
The DynamicArrayHandle and DynamicCellSet classes exclusively work in
the control environment. However, CUDA likes to add __device__ to
constructors, destructors, and assignment operators it automatically
adds. This in turn causes warnings about the __device__ function using
host-only classes (like boost::shared_ptr). Solve this problem by adding
explicit methods for all of these.
The CellSet classes all exclusively work in the control environment.
However, CUDA likes to add __device__ to constructors, destructors, and
assignment operators it automatically adds. This in turn causes warnings
about the __device__ function using host-only classes (like
boost::shared_ptr). Solve this problem by adding explicit methods for
all of these.
The ArrayHandle classes all exclusively work in the control environment.
However, CUDA likes to add __device__ to constructors, destructors, and
assignment operators it automatically adds. This in turn causes warnings
about the __device__ function using host-only classes (like
boost::shared_ptr). Solve this problem by adding explicit methods for
all of these.
Implemented this by wrapping up all these default objects in a macro.
This also solved the problem of other constructors that are necessary
for array handles such as a constructor that takes the base array
handle.
With CUDA 7.5 on MSVC, nvcc sometimes emits a spurious warning about an
undefined method that is clearly defined. The CUDA development team is
aware of the problem and is going to fix it, but these changes will work
around the problem for now.
Thanks to Tom Fogal from NVIDIA for these fixes.
Under CUDA, the default constructors and destructors created are exported
as __host__ and __device__, which causes problems because they use a boost
pointer that only works on the host. The explicit copy constructors and
destructors do the same thing as the default ones, except that they are
declared to work only on the host.
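The pattern can be sketched as follows. SKETCH_CONT and HostOnlyHandle are hypothetical names, and std::shared_ptr stands in for boost::shared_ptr; the macro is assumed to expand to __host__ only when compiling with nvcc.

```cpp
#include <cassert>
#include <memory>

// Assumption: mark functions host-only under nvcc, expand to nothing elsewhere.
#ifdef __CUDACC__
#define SKETCH_CONT __host__
#else
#define SKETCH_CONT
#endif

// Hypothetical control-environment class holding a host-only smart pointer.
class HostOnlyHandle
{
public:
  SKETCH_CONT HostOnlyHandle()
    : Internals(std::make_shared<int>(0))
  {
  }

  // Same behavior as the compiler-generated versions, but declared explicitly
  // so that they are never generated as __device__ functions.
  SKETCH_CONT HostOnlyHandle(const HostOnlyHandle& src)
    : Internals(src.Internals)
  {
  }

  SKETCH_CONT ~HostOnlyHandle() {}

  SKETCH_CONT HostOnlyHandle& operator=(const HostOnlyHandle& src)
  {
    this->Internals = src.Internals;
    return *this;
  }

  SKETCH_CONT int& Value() { return *this->Internals; }

private:
  std::shared_ptr<int> Internals; // stands in for boost::shared_ptr
};
```

Copies share the same internals, exactly as the implicitly generated copy operations would.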
This now allows for even more efficient construction of uniform point
coordinates when running under the 3d scheduler, since we don't need to go
from 3d index to flat index and back to 3d index. Instead, we stay in 3d
index space.
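The saving can be seen in a small sketch. Id3Sketch, Point3Sketch, FlatToId3, and UniformPoint are hypothetical names, and the point is assumed to be computed as origin plus spacing times index on each axis.

```cpp
#include <cassert>

struct Id3Sketch { long long I, J, K; };
struct Point3Sketch { double X, Y, Z; };

// The conversion that is skipped when the scheduler already supplies a 3D
// index: recovering (i, j, k) from a flat index costs a modulo and divisions.
Id3Sketch FlatToId3(long long flat, const Id3Sketch& dims)
{
  assert(flat >= 0 && flat < dims.I * dims.J * dims.K);
  Id3Sketch index;
  index.I = flat % dims.I;
  index.J = (flat / dims.I) % dims.J;
  index.K = flat / (dims.I * dims.J);
  return index;
}

// With a 3D index in hand, a uniform point is one multiply-add per axis.
Point3Sketch UniformPoint(const Id3Sketch& index,
                          const Point3Sketch& origin,
                          const Point3Sketch& spacing)
{
  Point3Sketch point;
  point.X = origin.X + spacing.X * static_cast<double>(index.I);
  point.Y = origin.Y + spacing.Y * static_cast<double>(index.J);
  point.Z = origin.Z + spacing.Z * static_cast<double>(index.K);
  return point;
}
```

Under a 3D scheduler only UniformPoint would be needed per thread; FlatToId3 is shown to make the avoided work explicit.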
Change Fetches to use ThreadIndices instead of Invocation.
Previously, all Fetch objects received an Invocation object in their
Load and Store methods. The point of this was that it allowed the Fetch
to get data from any of the execution objects. However, every Fetch
either just got data directly from its associated execution object or
else used a secondary execution object (the input domain) to get indices
into their own execution object.
This left two potential areas for improvement. First, pulling data out
of the Invocation object was unnecessarily complicated. It would be much
nicer to get data directly from the associated execution object. Second,
when getting index information from the input domain, it was often the
case that extra computations were necessary (particularly on structured
cell sets). There was no way to share the index information among
Fetches, and therefore the computations were replicated.
This change removes the Invocation from the Fetch Load and Store.
Instead, it passes the associated execution object and a new object type
called the ThreadIndices. The ThreadIndices are customized for the input
domain and therefore have all the information needed for a redirected
lookup. It is also a thread-local object so it can cache computed
indices and save on computation time.
See merge request !233
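The shape of the new interface might look roughly like the sketch below. All names here (ThreadIndicesSketch, FetchDirectInSketch, FetchDirectOutSketch) are hypothetical, std::vector stands in for an execution object, and the real Fetch classes are templated differently.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical thread-local index bundle, computed once per invocation and
// shared by every Fetch so that index computations are not repeated.
struct ThreadIndicesSketch
{
  std::size_t OutputIndex;
  std::size_t InputIndex; // cached redirected index into the input domain
};

// Hypothetical input fetch: receives its own execution object directly
// instead of digging it out of an Invocation.
template <typename T>
struct FetchDirectInSketch
{
  T Load(const ThreadIndicesSketch& indices,
         const std::vector<T>& execObject) const
  {
    assert(indices.InputIndex < execObject.size());
    return execObject[indices.InputIndex];
  }
};

// Hypothetical output fetch with the matching Store signature.
template <typename T>
struct FetchDirectOutSketch
{
  void Store(const ThreadIndicesSketch& indices,
             std::vector<T>& execObject,
             const T& value) const
  {
    assert(indices.OutputIndex < execObject.size());
    execObject[indices.OutputIndex] = value;
  }
};
```

Because both fetches read the same ThreadIndicesSketch, any expensive index lookup would be done once when the indices are built rather than once per Fetch.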
Array handles for CUDA device pointers have been implemented. The data for
these handles exists solely on the exec side (info such as length can be
queried from the cont side).
9a8809f9 Add CellSetPermutation which allows custom iteration over a cell set.
66f6db5a IsWriteableArrayHandle now can tell if an array handle can be written to
20f3fb50 Update VertexClustering to use vtkm::cont::CellSetSingleType.
154896b7 Extend the test for DataSetSingleType.
Acked-by: Kitware Robot <kwrobot@kitware.com>
Merge-request: !228
When you create a CellSetPermutation you provide an array of the cell ids that
you want to iterate. This allows the user to do custom blanking of a data set,
or to do multi iteration over a set of cells.
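The effect of the cell id array can be sketched independently of the VTK-m classes. ForEachPermutedCell is a hypothetical helper, and the visitor stands in for whatever work is done per cell.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: visit cells in the order, and with the multiplicity,
// given by validCellIds. Cells not listed are skipped (blanked); cells
// listed more than once are visited more than once.
template <typename Visitor>
void ForEachPermutedCell(const std::vector<std::size_t>& validCellIds,
                         std::size_t numberOfCells,
                         Visitor visit)
{
  for (std::size_t i = 0; i < validCellIds.size(); ++i)
  {
    assert(validCellIds[i] < numberOfCells);
    visit(validCellIds[i]);
  }
}
```

For a set of four cells and the id array 3, 1, 1, cells 0 and 2 are blanked, cell 3 is visited first, and cell 1 is visited twice.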