CUDA default constructors, destructors, and assignment operators
Several classes exclusively work in the control environment. However, CUDA likes to add __device__ to constructors, destructors, and assignment operators it automatically creates. This in turn causes warnings about the __device__ function using host-only classes (like boost::shared_ptr). Solve this problem by adding explicit methods for all of these.
See merge request !245
The DynamicArrayHandle and DynamicCellSet classes exclusively work in
the control environment. However, CUDA likes to add __device__ to
constructors, destructors, and assignment operators it automatically
adds. This in turn causes warnings about the __device__ function using
host-only classes (like boost::shared_ptr). Solve this problem by adding
explicit methods for all of these.
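As a rough illustration of the pattern described above, the following sketch declares every special member explicitly and marks it host-only. The names EXAMPLE_CONT and ExampleHostHandle are hypothetical, and std::shared_ptr stands in for boost::shared_ptr so the snippet is self-contained; it is not the actual VTK-m code.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-in for a control-side annotation: expands to __host__
// when compiled by nvcc and to nothing otherwise.
#if defined(__CUDACC__)
#define EXAMPLE_CONT __host__
#else
#define EXAMPLE_CONT
#endif

// Hypothetical control-environment class that owns a host-only smart pointer.
class ExampleHostHandle
{
public:
  EXAMPLE_CONT ExampleHostHandle() : Data(std::make_shared<int>(0)) {}

  // These do the same work as the compiler-generated versions, but because
  // they are written out, nvcc does not get to tag them with __device__.
  EXAMPLE_CONT ExampleHostHandle(const ExampleHostHandle& src) : Data(src.Data) {}
  EXAMPLE_CONT ~ExampleHostHandle() {}
  EXAMPLE_CONT ExampleHostHandle& operator=(const ExampleHostHandle& src)
  {
    this->Data = src.Data;
    return *this;
  }

  EXAMPLE_CONT void Set(int value)
  {
    assert(this->Data != nullptr);
    *this->Data = value;
  }

  EXAMPLE_CONT int Get() const
  {
    assert(this->Data != nullptr);
    return *this->Data;
  }

private:
  std::shared_ptr<int> Data; // copies share the same underlying value
};
```

Copies made through these members share the underlying data, just as the implicit member-wise copy would.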
The CellSet classes all exclusively work in the control environment.
However, CUDA likes to add __device__ to constructors, destructors, and
assignment operators it automatically adds. This in turn causes warnings
about the __device__ function using host-only classes (like
boost::shared_ptr). Solve this problem by adding explicit methods for
all of these.
The ArrayHandle classes all exclusively work in the control environment.
However, CUDA likes to add __device__ to constructors, destructors, and
assignment operators it automatically adds. This in turn causes warnings
about the __device__ function using host-only classes (like
boost::shared_ptr). Solve this problem by adding explicit methods for
all of these.
Implemented this by wrapping up all these default objects in a macro.
This also solved the problem of other constructors that are necessary
for array handles such as a constructor that takes the base array
handle.
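A macro along these lines might look like the sketch below. EXAMPLE_HANDLE_SUBCLASS, ExampleBaseHandle, and ExampleFancyHandle are hypothetical names chosen for illustration, and std::shared_ptr again stands in for the boost pointer; the real macro may take different arguments.

```cpp
#include <cassert>
#include <memory>

#if defined(__CUDACC__)
#define EXAMPLE_CONT __host__
#else
#define EXAMPLE_CONT
#endif

// Hypothetical macro: stamps out host-only special members for a subclass,
// plus a constructor that accepts the base handle.
#define EXAMPLE_HANDLE_SUBCLASS(classname, superclass)                 \
  EXAMPLE_CONT classname() : superclass() {}                           \
  EXAMPLE_CONT classname(const classname& src) : superclass(src) {}    \
  EXAMPLE_CONT classname(const superclass& src) : superclass(src) {}   \
  EXAMPLE_CONT ~classname() {}                                         \
  EXAMPLE_CONT classname& operator=(const classname& src)              \
  {                                                                    \
    this->superclass::operator=(src);                                  \
    return *this;                                                      \
  }

// Hypothetical base handle with its own explicit host-only members.
class ExampleBaseHandle
{
public:
  EXAMPLE_CONT ExampleBaseHandle() : Data(std::make_shared<int>(0)) {}
  EXAMPLE_CONT ExampleBaseHandle(const ExampleBaseHandle& src) : Data(src.Data) {}
  EXAMPLE_CONT ~ExampleBaseHandle() {}
  EXAMPLE_CONT ExampleBaseHandle& operator=(const ExampleBaseHandle& src)
  {
    this->Data = src.Data;
    return *this;
  }

  EXAMPLE_CONT int& Value()
  {
    assert(this->Data != nullptr);
    return *this->Data;
  }

private:
  std::shared_ptr<int> Data;
};

// A subclass only needs one macro line to get all of the members above.
class ExampleFancyHandle : public ExampleBaseHandle
{
public:
  EXAMPLE_HANDLE_SUBCLASS(ExampleFancyHandle, ExampleBaseHandle)
};
```

Keeping the boilerplate in one macro means a new handle subclass cannot accidentally fall back to a compiler-generated member.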
nvcc in CUDA 7.5 sometimes emits a spurious warning on MSVC about an
undefined method that is clearly defined. The CUDA development team is
aware of the problem and is going to fix it, but these changes work
around it for now.
Thanks to Tom Fogal from NVIDIA for these fixes.
Under CUDA, the automatically generated constructors and destructors are
exported as __host__ and __device__, which causes problems because they
use a boost pointer that only works on the host. The explicit copy
constructors and destructors do the same thing as the default ones but
are declared to work only on the host.
This now allows for even more efficient construction of uniform point
coordinates when running under the 3D scheduler. We no longer need to go
from 3D index to flat index and back to 3D index; we stay in 3D index
space throughout.
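To make the saving concrete, here is a minimal sketch with hypothetical names (UniformPointsExample, Get, GetFromFlat). It assumes a point coordinate is origin plus index times spacing along each axis, and it shows the division and modulus that the flat-index path needs and the direct path avoids.

```cpp
#include <array>
#include <cassert>

using Id3 = std::array<long long, 3>;
using Vec3 = std::array<double, 3>;

// Hypothetical uniform point coordinates: origin + index * spacing per axis.
struct UniformPointsExample
{
  Id3 Dimensions;
  Vec3 Origin;
  Vec3 Spacing;

  // Direct path: the caller already has (i, j, k), so only a multiply and an
  // add per axis are required.
  Vec3 Get(const Id3& ijk) const
  {
    Vec3 result;
    for (int axis = 0; axis < 3; ++axis)
    {
      result[axis] =
        this->Origin[axis] + this->Spacing[axis] * static_cast<double>(ijk[axis]);
    }
    return result;
  }

  // Flat path: the index must first be unpacked with division and modulus.
  Vec3 GetFromFlat(long long flat) const
  {
    const Id3& dims = this->Dimensions;
    assert(flat >= 0 && flat < dims[0] * dims[1] * dims[2]);
    Id3 ijk;
    ijk[0] = flat % dims[0];
    ijk[1] = (flat / dims[0]) % dims[1];
    ijk[2] = flat / (dims[0] * dims[1]);
    return this->Get(ijk);
  }
};
```

Both paths produce the same coordinate; the 3D path simply skips the unpacking step.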
Change Fetches to use ThreadIndices instead of Invocation.
Previously, all Fetch objects received an Invocation object in their
Load and Store methods. The point of this was that it allowed the Fetch
to get data from any of the execution objects. However, every Fetch
either just got data directly from its associated execution object or
else used a secondary execution object (the input domain) to get indices
into their own execution object.
This left two potential areas for improvement. First, pulling data out
of the Invocation object was unnecessarily complicated. It would be much
nicer to get data directly from the associated execution object. Second,
when getting index information from the input domain, it was often the
case that extra computations were necessary (particularly on structured
cell sets). There was no way to share the index information among
Fetches, and therefore the computations were replicated.
This change removes the Invocation from the Fetch Load and Store.
Instead, it passes the associated execution object and a new object type
called the ThreadIndices. The ThreadIndices are customized for the input
domain and therefore have all the information needed for a redirected
lookup. It is also a thread-local object so it can cache computed
indices and save on computation time.
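The shape of the new interface can be sketched roughly as follows. All type names here (ThreadIndicesExample, FetchDirectExample, FetchGatherExample) are hypothetical, and std::vector stands in for an execution object; the point is only that Load receives the indices and its own execution object rather than a whole Invocation.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical thread-local index object for a cell-based input domain.
struct ThreadIndicesExample
{
  std::size_t InputIndex;                // the cell this thread is visiting
  std::vector<std::size_t> PointIndices; // computed once, reused by every fetch
};

// Fetch that reads straight from its associated execution object.
struct FetchDirectExample
{
  template <typename ExecObjectType>
  typename ExecObjectType::value_type Load(const ThreadIndicesExample& indices,
                                           const ExecObjectType& execObject) const
  {
    assert(indices.InputIndex < execObject.size());
    return execObject[indices.InputIndex];
  }
};

// Fetch that performs a redirected lookup using the cached point indices.
struct FetchGatherExample
{
  template <typename ExecObjectType>
  std::vector<typename ExecObjectType::value_type> Load(
    const ThreadIndicesExample& indices,
    const ExecObjectType& execObject) const
  {
    std::vector<typename ExecObjectType::value_type> values;
    for (std::size_t pointIndex : indices.PointIndices)
    {
      assert(pointIndex < execObject.size());
      values.push_back(execObject[pointIndex]);
    }
    return values;
  }
};
```

Because the point indices live in the thread-local object, several gather-style fetches in the same thread can reuse them without recomputing anything from the input domain.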
See merge request !233
Array handles for CUDA device pointers have been implemented. The data for
these handles exists solely on the exec side (information such as length
can still be queried from the cont side).
9a8809f9 Add CellSetPermutation which allows custom iteration over a cell set.
66f6db5a IsWriteableArrayHandle now can tell if an array handle can be written to
20f3fb50 Update VertexClustering to use vtkm::cont::CellSetSingleType.
154896b7 Extend the test for DataSetSingleType.
Acked-by: Kitware Robot <kwrobot@kitware.com>
Merge-request: !228
When you create a CellSetPermutation, you provide an array of the cell ids
that you want to iterate over. This allows the user to do custom blanking of
a data set, or to do multi-iteration over a set of cells.
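The idea can be sketched as a thin view that redirects every cell lookup through the id array. CellPermutationExample is a hypothetical name, and a std::vector of cells stands in for a real cell set.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical permutation view: iteration order and membership are decided
// entirely by the array of cell ids supplied at construction.
template <typename CellType>
class CellPermutationExample
{
public:
  CellPermutationExample(const std::vector<std::size_t>& validCellIds,
                         const std::vector<CellType>& fullCells)
    : ValidCellIds(validCellIds)
    , FullCells(fullCells)
  {
  }

  // The view has as many cells as there are ids, not as many as the full set.
  std::size_t GetNumberOfCells() const { return this->ValidCellIds.size(); }

  // Look up the original cell that the index-th id refers to.
  const CellType& GetCell(std::size_t index) const
  {
    assert(index < this->ValidCellIds.size());
    const std::size_t original = this->ValidCellIds[index];
    assert(original < this->FullCells.size());
    return this->FullCells[original];
  }

private:
  std::vector<std::size_t> ValidCellIds;
  std::vector<CellType> FullCells;
};
```

Leaving an id out of the array blanks that cell, and listing an id more than once visits the cell more than once.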
8bc40880 Add a test for CellSetExplicit::GetIndices
fa81d1de CellSetExplicit always calls methods using "this->"
9e496306 Allow incremental construction of CellSetSingleType.
3e307879 Mark CellSetExplicit::Fill as host side only.
Acked-by: Kitware Robot <kwrobot@kitware.com>
Merge-request: !227
Remove the const from the ValueType of the delegate portal in
ArrayPortalGroupVec. This was creating a Vec with a const type, which
was immutable, which was problematic when trying to create the Vec in
the first place.
The testing of ArrayHandleGroupVec was just using the == operator to
check values. Even though we are not doing any math, optimizers can
sometimes make float values slightly different anyway, so test_equal
should give the correct comparison.
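For readers unfamiliar with tolerance-based comparison, a minimal sketch follows. ApproxEqualExample is a hypothetical helper, not the test_equal implementation itself, and the default tolerance and scaling rule are assumptions.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Hypothetical tolerance-based comparison: the allowed difference grows with
// the magnitude of the operands but never drops below the tolerance itself.
inline bool ApproxEqualExample(double a, double b, double tolerance = 0.00001)
{
  assert(tolerance >= 0.0);
  const double scale = std::max(1.0, std::max(std::fabs(a), std::fabs(b)));
  return std::fabs(a - b) <= tolerance * scale;
}
```

A comparison like this accepts values that differ only by rounding noise while still rejecting genuinely different results.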
Previously, if you created a CellSetExplicit and didn't set the number of
points, you would get a runtime error when an array's bounds were overrun.
Now we account for this use case and properly generate the Cell To Point
Connectivity.
Even when using implicit indices, the ConnectivityExplicit would generate
the code to compute the IndexOffsets, which would then fail to compile
because the ArrayHandle only supports read operations. This fixes that issue.
19cebccf Correct issues that buildbot brought up in the code.
c6dbc0f2 GetNumberOfPointsInCell consistently returns a vtkm::IdComponent
25ff1e94 CellSetExplicit storage tags are now easier to override.
935b3fd6 CellSetExplicit uses UInt8 for shape, and IdComponent for numIndices.
Acked-by: Kitware Robot <kwrobot@kitware.com>
Merge-request: !210
Add new version of DynamicArrayHandle::CastToArrayHandle
This takes a reference to an array handle and fills it. This removes a lot of the pain of determining template arguments.
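The convenience comes from template argument deduction, as the sketch below tries to show. DynamicHandleExample, TypedHandleExample, and both CastTo overloads are hypothetical stand-ins written for illustration, not the actual DynamicArrayHandle interface.

```cpp
#include <cassert>
#include <memory>
#include <stdexcept>
#include <vector>

// Hypothetical type-erased storage.
struct HandleBaseExample
{
  virtual ~HandleBaseExample() {}
};

template <typename T>
struct TypedHandleExample : HandleBaseExample
{
  std::vector<T> Values;
};

class DynamicHandleExample
{
public:
  template <typename T>
  explicit DynamicHandleExample(const TypedHandleExample<T>& handle)
    : Held(std::make_shared<TypedHandleExample<T>>(handle))
  {
  }

  // Return-value style: the caller must spell out the full handle type.
  template <typename HandleType>
  HandleType CastTo() const
  {
    const HandleType* typed = dynamic_cast<const HandleType*>(this->Held.get());
    if (typed == nullptr)
    {
      throw std::runtime_error("held array does not match requested type");
    }
    return *typed;
  }

  // Reference style: the handle type is deduced from the argument.
  template <typename HandleType>
  void CastTo(HandleType& out) const
  {
    out = this->CastTo<HandleType>();
  }

private:
  std::shared_ptr<HandleBaseExample> Held;
};
```

With the reference form a caller writes dynamicHandle.CastTo(result) and never has to repeat the template arguments of result.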
See merge request !205
98885186 Fix CopyInto tests that use different DeviceAdapterTag
69b2ad2a Add unit tests for CopyInto function
2c55b15c Add additional control logic for CopyInto function
20c1a048 CopyInto function for ArrayHandles
Acked-by: Kitware Robot <kwrobot@kitware.com>
Acked-by: Robert Maynard <robert.maynard@kitware.com>
Merge-request: !202
Xcode 7 warnings
The Xcode 7 compiler has a new warning for unused typedefs. The Boost code we use has some instances where this warning gets issued. Suppress these warnings.
See merge request !199
This is to be used in place of BOOST_STATIC_ASSERT so that we can
control its implementation.
The implementation is designed to fix the issue where the latest Xcode
clang compiler gives a warning about unused typedefs when the boost
static assert is used within a function. (This warning also happens when
using the C++11 static_assert keyword.) You can suppress this warning
with _Pragma commands, but _Pragma commands inside a block are not
supported in GCC. The implementation of VTKM_STATIC_ASSERT handles all
current cases.
ArrayHandles in DAX have a CopyInto function which allows the user to copy an array handle's data into a compatible STL type iterator. Originally this was fairly straightforward to implement since array handles in DAX are templated on the DeviceAdapterTag. In contrast, VTK-m array handles use a polymorphic ArrayHandleExecutionManager under the hood, allowing a single array handle to interface with multiple devices at runtime. To achieve this, virtual functions are used. This makes implementing the CopyInto function difficult since it is templated on the IteratorType and virtual functions cannot be templated.
To work around this, I've implemented a concrete templated CopyInto function in the class derived from ArrayHandleExecutionManagerBase. In the ArrayHandle class, CopyInto dynamically casts the base class into the derived class, then calls the CopyInto function defined in the derived class.
The drawback to this approach is that, should the user define their own class that inherits from ArrayHandleExecutionManagerBase, they are not forced to implement the CopyInto function, unlike the other virtual functions.
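The workaround can be sketched as follows. ManagerBaseExample, ManagerExample, and CopyHandleExample are hypothetical names, and a std::vector stands in for device memory. The base class keeps its ordinary virtual functions, the derived class adds a non-virtual templated CopyInto, and the handle reaches it through a dynamic_cast.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <memory>
#include <stdexcept>
#include <vector>

// Hypothetical polymorphic base: virtual functions cannot be templated, so
// there is no way to declare CopyInto here.
class ManagerBaseExample
{
public:
  virtual ~ManagerBaseExample() {}
  virtual std::size_t GetNumberOfValues() const = 0;
};

// Hypothetical concrete manager that adds a templated, non-virtual CopyInto.
template <typename T>
class ManagerExample : public ManagerBaseExample
{
public:
  explicit ManagerExample(const std::vector<T>& values) : Values(values) {}

  std::size_t GetNumberOfValues() const override { return this->Values.size(); }

  template <typename IteratorType>
  void CopyInto(IteratorType dest) const
  {
    std::copy(this->Values.begin(), this->Values.end(), dest);
  }

private:
  std::vector<T> Values;
};

// Hypothetical handle: casts the base pointer to the known derived type and
// forwards the call.
template <typename T>
class CopyHandleExample
{
public:
  explicit CopyHandleExample(std::shared_ptr<ManagerBaseExample> manager)
    : ExecManager(manager)
  {
    assert(this->ExecManager != nullptr);
  }

  template <typename IteratorType>
  void CopyInto(IteratorType dest) const
  {
    const ManagerExample<T>* typed =
      dynamic_cast<const ManagerExample<T>*>(this->ExecManager.get());
    if (typed == nullptr)
    {
      // A user-defined manager that lacks CopyInto ends up here at runtime.
      throw std::runtime_error("execution manager does not provide CopyInto");
    }
    typed->CopyInto(dest);
  }

private:
  std::shared_ptr<ManagerBaseExample> ExecManager;
};
```

The runtime_error branch is where the drawback shows up in this sketch: nothing at compile time forces a custom manager to supply CopyInto.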
fd685210 Always install all device headers even when device isn't enabled.
b1663b24 Add an example of using multiple backends from a single translation unit.
fc0ff69d Methods with try/catch need to be host only.
4d635d64 DeviceAdapter Tags now always exist, and contain if the device is valid.
cf32b430 Teach Configure.h to store if TBB and CUDA are enabled.
Acked-by: Kitware Robot <kwrobot@kitware.com>
Acked-by: Kenneth Moreland <kmorel@sandia.gov>
Merge-request: !198
When compiling with CUDA and TBB enabled in a single translation unit, you
need to make sure all try/catch blocks are marked as host only; otherwise
the CUDA compiler will error out.
Previously it was really hard to verify whether a device adapter was
valid, since you had to check for the existence of the tag. Now the tag
always exists, and you instead query the traits of the DeviceAdapter to
see if it is a valid adapter.
This makes compiling with multiple backends a lot easier.
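A minimal sketch of this arrangement is shown below. The tag and traits names (ExampleTagSerial, ExampleTagCuda, ExampleDeviceTraits) are hypothetical, and using __CUDACC__ to decide validity is an assumption made only to keep the snippet self-contained.

```cpp
#include <cassert>
#include <string>

// Hypothetical device tags: both are always declared, whether or not the
// corresponding backend was enabled.
struct ExampleTagSerial
{
};
struct ExampleTagCuda
{
};

// Hypothetical traits: validity is a compile-time constant on the traits
// rather than being implied by whether the tag exists.
template <typename Tag>
struct ExampleDeviceTraits;

template <>
struct ExampleDeviceTraits<ExampleTagSerial>
{
  static constexpr bool Valid = true;
};

template <>
struct ExampleDeviceTraits<ExampleTagCuda>
{
#if defined(__CUDACC__)
  static constexpr bool Valid = true;
#else
  static constexpr bool Valid = false;
#endif
};

// Generic code can mention any tag and branch on its validity.
template <typename Tag>
std::string DescribeDeviceExample(Tag)
{
  return ExampleDeviceTraits<Tag>::Valid ? "enabled" : "disabled";
}
```

Code that mentions ExampleTagCuda still compiles when CUDA is off, because the tag is always a real type and only its Valid flag changes.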
The boost assert macros seem to have an issue where they define an
unused typedef. This is causing the Xcode 7 compiler to issue a warning.
Since the offending code is in a macro, the warning is identified with
the VTK-m header even though the code is in boost. To get around this,
wrap all uses of the boost assert that is causing the warning in the
third party pre/post macros to disable the warning.
Modify ArrayHandleCounting so that it supports both a starting value and
a step (increment). This adds a multiplication, but the common case that
does not use it is already in a separate class (ArrayHandleIndex).
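As a small illustration, the value at each index of such an array is start + step * index. CountingPortalExample below is a hypothetical stand-in for the array's portal.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical implicit counting portal: no storage, values are computed on
// demand from a start value and a step.
template <typename T>
class CountingPortalExample
{
public:
  CountingPortalExample(T start, T step, std::size_t numberOfValues)
    : Start(start)
    , Step(step)
    , NumberOfValues(numberOfValues)
  {
  }

  std::size_t GetNumberOfValues() const { return this->NumberOfValues; }

  // The multiplication here is the extra cost mentioned above.
  T Get(std::size_t index) const
  {
    assert(index < this->NumberOfValues);
    return this->Start + this->Step * static_cast<T>(index);
  }

private:
  T Start;
  T Step;
  std::size_t NumberOfValues;
};
```

A portal that always starts at zero with a step of one can return the index itself and skip the arithmetic, which is the case the separate index class covers.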
By introducing our own custom thrust execution policy we can make sure
to hit the fastest code paths in thrust for the sort operation. This makes
sure that for UInt32, Int32, and Float32 we use the radix sort from thrust,
which offers a 2x to 3x speed improvement over the merge sort implementation.
Second, by telling thrust that our BinaryOperators are commutative, we
make sure that we get the fastest code paths when executing Inclusive
and Exclusive Scan.
Benchmark 'Radix Sort on 1048576 random values vtkm::Int32' results:
median = 0.0117049s
median abs dev = 0.00324614s
mean = 0.0167615s
std dev = 0.00786269s
min = 0.00845875s
max = 0.0389063s
Benchmark 'Radix Sort on 1048576 random values vtkm::Float32' results:
median = 0.0234463s
median abs dev = 0.000317249s
mean = 0.021452s
std dev = 0.00470307s
min = 0.011255s
max = 0.0250643s
Benchmark 'Merge Sort on 1048576 random values vtkm::Int32' results:
median = 0.0310486s
median abs dev = 0.000182129s
mean = 0.0286914s
std dev = 0.00634102s
min = 0.0116225s
max = 0.0317379s
Benchmark 'Merge Sort on 1048576 random values vtkm::Float32' results:
median = 0.0310617s
median abs dev = 0.000193583s
mean = 0.0295779s
std dev = 0.00491531s
min = 0.0147257s
max = 0.032307s
adding cell-to-point topology support and worklet
This adds code to support a cell-to-point topological mapping worklet.
For explicit cell sets, there is code to calculate a cell-to-point topology from the canonical point-to-cell topology. (It is not parallelized at this point.) Most of the required code for structured grids was already in place.
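A serial version of that calculation might look like the sketch below. It assumes the canonical connectivity lists the point ids of each cell and that the derived connectivity lists the cells incident on each point; BuildReverseConnectivityExample is a hypothetical name, and nested std::vector containers stand in for the real connectivity arrays.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical serial construction of the reverse connectivity: for every
// point, collect the ids of the cells that use it.
inline std::vector<std::vector<std::size_t>> BuildReverseConnectivityExample(
  const std::vector<std::vector<std::size_t>>& pointsOfCells,
  std::size_t numberOfPoints)
{
  std::vector<std::vector<std::size_t>> cellsOfPoints(numberOfPoints);
  for (std::size_t cell = 0; cell < pointsOfCells.size(); ++cell)
  {
    for (std::size_t point : pointsOfCells[cell])
    {
      assert(point < numberOfPoints);
      cellsOfPoints[point].push_back(cell);
    }
  }
  return cellsOfPoints;
}
```

Each output list grows as cells are visited in order, which is simple to do serially; a parallel version would need a different strategy such as sorting (point, cell) pairs.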
See merge request !154
514ac54e Add custom operator and initial value support to ExclusiveScan
Acked-by: Kitware Robot <kwrobot@kitware.com>
Acked-by: Robert Maynard <robert.maynard@kitware.com>
Merge-request: !148
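For reference, an exclusive scan with a custom operator and initial value behaves like the serial sketch below. ExclusiveScanExample is a hypothetical function written for illustration, and returning the final accumulated value is an assumption of this sketch rather than something stated above.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical serial exclusive scan: output[i] combines initialValue with
// input[0..i-1] using op, so input[i] itself is excluded.
template <typename T, typename BinaryOperator>
T ExclusiveScanExample(const std::vector<T>& input,
                       std::vector<T>& output,
                       BinaryOperator op,
                       T initialValue)
{
  output.resize(input.size());
  T running = initialValue;
  for (std::size_t i = 0; i < input.size(); ++i)
  {
    output[i] = running;
    running = op(running, input[i]);
  }
  return running; // the scan over the whole input
}
```

With addition and an initial value of zero this reduces to the usual prefix sum; other operators such as maximum need an initial value that acts as their identity.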