A recent change to the DeviceAdapter header includes the TBB device if
available instead of the serial device. Thus, DeviceAdapterTagSerial was
not defined automatically in all cases for the build of
UnitTestDataSetPermutation. Add the header for that explicitly.
adding VTK file exporter and test cases
This adds a legacy VTK file exporter which supports unstructured, explicit, and point meshes. (Single Cell Type cell sets are also supported.)
See merge request !247
ca71d70b Update worklet UnitTests to not try statically known invalid combinations
6b2edb70 Update UnitTestDispatcherBase to use verify DynamicTransform error messages.
9fdc0f09 Improve the error message for Invoke type mismatch at compile time.
54d25fae Only perform DynamicTransformCont if at least one parameter is dynamic.
Acked-by: Kitware Robot <kwrobot@kitware.com>
Merge-request: !249
8816642d Move algorithms out of DeviceAdapterAlgorithmGeneral to reduce compilation size
c78e54fa Move algorithms out of DeviceAdapterAlgorithmTBB to reduce compilation size.
Acked-by: Kitware Robot <kwrobot@kitware.com>
Acked-by: Kenneth Moreland <kmorel@sandia.gov>
Merge-request: !248
Previously we always ran DynamicTransformCont even if we knew all the types.
By checking for Dynamic types first, we save roughly 3% on the binary size.
This also is a good starting point for a redesign of DynamicTransformCont
39142d83 Add convenience tags like FieldInPoint, FieldInCell, to WorkletMapPointToCell
f34119b6 Clarify the name of worklet for point to cell operations.
2c767e10 Add WorkletMapTopologyBase to make Dispatcher templates easier to read.
05d397cb Remove unnecessary template parameters from DispatcherMapField
Acked-by: Kitware Robot <kwrobot@kitware.com>
Merge-request: !246
CUDA default constructors, destructors, and assignment operators
Several classes exclusively work in the control environment. However, CUDA likes to add __device__ to constructors, destructors, and assignment operators it automatically creates. This in turn causes warnings about the __device__ function using host-only classes (like boost::shared_ptr). Solve this problem by adding explicit methods for all of these.
See merge request !245
DispatcherMapField was templated on the device adapter but it
actually doesn't need to be, only BasicInvoke and subsequent
methods need to be templated on the device.
The DynamicArrayHandle and DynamicCellSet classes exclusively work in
the control environment. However, CUDA likes to add __device__ to
constructors, destructors, and assignment operators it automatically
adds. This in turn causes warnings about the __device__ function using
host-only classes (like boost::shared_ptr). Solve this problem by adding
explicit methods for all of these.
The CellSet classes all exclusively work in the control environment.
However, CUDA likes to add __device__ to constructors, destructors, and
assignment operators it automatically adds. This in turn causes warnings
about the __device__ function using host-only classes (like
boost::shared_ptr). Solve this problem by adding explicit methods for
all of these.
The ArrayHandle classes all exclusively work in the control environment.
However, CUDA likes to add __device__ to constructors, destructors, and
assignment operators it automatically adds. This in turn causes warnings
about the __device__ function using host-only classes (like
boost::shared_ptr). Solve this problem by adding explicit methods for
all of these.
Implemented this by wrapping up all these default objects in a macro.
This also solved the problem of other constructors that are necessary
for array handles such as a constructor that takes the base array
handle.
There is a strange nvcc warning in CUDA 7.5 that sometimes happens on MSVC
that causes it to emit a warning for an undefined method that is clearly
defined. The CUDA development team is aware of the problem and is going
to fix it, but these changes will work around the problem for now.
Thanks to Tom Fogal from NVIDIA for these fixes.
Under CUDA, the default constructors and destructors created are exported
as __host__ and __device__, which causes problems because they used a boost
pointer that only works on the host. The explicit copy constructors and
destructors do the same thing as the default ones except declared to only
work on the host.
This now allows for even more efficient construction of uniform point
coordinates when running under the 3d scheduler, since we don't need to go
from 3d index to flat index to 3d index, instead we stay in 3d index
Change Fetches to use ThreadIndices instead of Invocation.
Previously, all Fetch objects received an Invocation object in their
Load and Store methods. The point of this was that it allowed the Fetch
to get data from any of the execution objects. However, every Fetch
either just got data directly from its associated execution object or
else used a secondary execution object (the input domain) to get indices
into their own execution object.
This left two potential areas for improvement. First, pulling data out
of the Invocation object was unnecessarily complicated. It would be much
nicer to get data directly from the associated execution object. Second,
when getting index information from the input domain, it was often the
case that extra computations were necessary (particularly on structured
cell sets). There was no way to share the index information among
Fetches, and therefore the computations were replicated.
This change removes the Invocation from the Fetch Load and Store.
Instead, it passes the associated execution object and a new object type
called the ThreadIndices. The ThreadIndices are customized for the input
domain and therefore have all the information needed for a redirected
lookup. It is also a thread-local object so it can cache computed
indices and save on computation time.
See merge request !233
Array handles for cuda device pointers have been implemented. The data for
these handles exists solely on the exec side (info such as length can be
queried from the cont side).
Previously, all Fetch objects received an Invocation object in their
Load and Store methods. The point of this was that it allowed the Fetch
to get data from any of the execution objects. However, every Fetch
either just got data directly from its associated execution object or
else used a secondary execution object (the input domain) to get indices
into their own execution object.
This left two potential areas for improvement. First, pulling data out
of the Invocation object was unnecessarily complicated. It would be much
nicer to get data directly from the associated execution object. Second,
when getting index information from the input domain, it was often the
case that extra computations were necessary (particularly on structured
cell sets). There was no way to share the index information among
Fetches, and therefore the computations were replicated.
This change removes the Invocation from the Fetch Load and Store.
Instead, it passes the associated execution object and a new object type
called the ThreadIndices. The ThreadIndices are customized for the input
domain and therefore have all the information needed for a redirected
lookup. It is also a thread-local object so it can cache computed
indices and save on computation time.
The current CUDA compiler is crashing when using _Pragma in code blocks,
which were being used with clang on apple. Turn these off when compiling
with CUDA.
9a8809f9 Add CellSetPermutation which allows custom iteration over a cell set.
66f6db5a IsWriteableArrayHandle now can tell if an array handle can be written too
20f3fb50 Update VertexClustering to use vtkm::cont::CellSetSingleType.
154896b7 Extend the test for DataSetSingleType.
Acked-by: Kitware Robot <kwrobot@kitware.com>
Merge-request: !228
When you create a CellSetPermutation you provide an array of the cell ids that
you want to iterate. This allows the user to do custom blanking of a data set,
or to do multi iteration over a set of cells.
8bc40880 Add a test for CellSetExplicit::GetIndices
fa81d1de CellSetExplicit always calls methods using "this->"
9e496306 Allow incremental construction of CellSetSingleType.
3e307879 Mark CellSetExplicit::Fill as host side only.
Acked-by: Kitware Robot <kwrobot@kitware.com>
Merge-request: !227
Remove the const from the ValueType of the delegate portal in
ArrayPortalGroupVec. This was creating a Vec with a const type, which
was immutable, which was problematic when trying to create the Vec in
the first place.
The testing of ArrayHandleGroupVec was just using the == operator to
check values. Even though we are not doing any math, optimizers can
sometimes make float values slightly different anyway, so test_equal
should give the correct comparison.
Previously if you created a cell set explicit and didn't set the number of
points you would get a runtime error when you over-ran an array's bounds.
Now we account for this use case and properly generate the Cell To Point
Connectivity.
dbed8827 Correct issues found in code review.
82b977da CellSetExplicit::CreateConnectivity now works with implicit NumIndices.
30f5d628 ConnectivityExplicit will not generate IndexOffsets when they are implicit.
f04ea6d7 CellSetExplicit has userdefined offset storage.
Acked-by: Kitware Robot <kwrobot@kitware.com>
Acked-by: Kenneth Moreland <kmorel@sandia.gov>
Tested-by: buildbot <buildbot@kitware.com>
Merge-request: !212
There was an error in the test_equal comparison that would return true
when the second value was 0 (or close to 0) and the first value was not.
This was a bug I introduced with commit
b270438fea8a9fe4200322bfe6344ab2075c4d9a. I clearly misinterpreted how a
conditional worked.
Even when using implicit index's the ConnectivityExplicit would generate
the code to compute the IndexOffsets, which would than fail to compile as
the ArrayHandle would only support read operations. This fixes that issue.
19cebccf Correct issues that buildbot brought up in the code.
c6dbc0f2 GetNumberOfPointsInCell consistently returns a vtkm::IdComponent
25ff1e94 CellSetExplicit storage tags are now easier to override.
935b3fd6 CellSetExplicit uses UInt8 for shape, and IdComponent for numIndices.
Acked-by: Kitware Robot <kwrobot@kitware.com>
Merge-request: !210
Add new version of DynamicArrayHandle::CastToArrayHandle
This takes a reference to an array handle and fills it. This removes a lot of the pain of determining template arguments.
See merge request !205
These changes are basically to support some upcoming changes to
ArrayHandleZip. The major additions are an implementation of VecTraits
for Pair and an overloaded << operator to ostream for Pair.
I also had to declare the operator<< for Pair to be in Types.h. Under
some circumstances the operator has to either be declared before the
template is declared or declared in the vtkm namespace. (The reasons are
described at <http://clang.llvm.org/compatibility.html#dep_lookup> but I
still don't understand.) I tried adding it to the vtkm namespace, but
that caused several of the other operator<< to fail. Since there is no
way to guarantee that Pair.h is declared before, say, ArrayHandle.h, I
moved the implementation to Types.h.
Since I was moving operator<< to Types.h, I went ahead and moved the
TypeTraits and VecTraits to their respective headers. Since Pair is
declared (but not implemented) in Types.h, these templated classes can
be implemented without including Pair.h.
98885186 Fix CopyInto tests that use different DeviceAdapterTag
69b2ad2a Add unit tests for CopyInto function
2c55b15c Add additional control logic for CopyInto function
20c1a048 CopyInto function for ArrayHandles
Acked-by: Kitware Robot <kwrobot@kitware.com>
Acked-by: Robert Maynard <robert.maynard@kitware.com>
Merge-request: !202
Xcode 7 warnings
The XCode 7 compiler has a new warning for unused typedefs. The Boost code we use has some instances where this warning gets issued. Suppress these warnings.
See merge request !199
This is to be used in place of BOOST_STATIC_ASSERT so that we can
control its implementation.
The implementation is designed to fix the issue where the latest XCode
clang compiler gives a warning about a unused typedefs when the boost
static assert is used within a function. (This warning also happens when
using the C++11 static_assert keyword.) You can suppress this warning
with _Pragma commands, but _Pragma commands inside a block is not
supported in GCC. The implementation of VTKM_STATIC_ASSERT handles all
current cases.
ArrayHandles in DAX have a CopyInto function which allows the user to copy an array handle's data into a compatible STL type iterator. Originally this was fairly straight forward to implement since array handles in DAX are templated on the DeviceAdapterTag. In contrast, VTKm array handles use a polymorphic ArrayHandleExecutionManager under the hood allowing a single array handle to interface with multiple devices at runtime. To achieve this virtual functions are used. This makes implementing the CopyInto function difficult since it is templated on the IteratorType and virtual functions cannot be templated.
To work around this, I've implemented a concrete templated CopyInto function in the class derived from ArrayHandleExecutionManagerBase. In the ArrayHandle class, CopyInto dynamically casts the base class into the derived class, then calls the CopyInto function defined in the derived class.
The drawback to this approach is that, should the user define their own class that inherits from ArrayHandleExectionManagerBase, they are not forced to implement the CopyInto function, unlike the other virtual functions.
5e72d3a8 Rename kernels directory to splatkernels to avoid confusion
7a2225cf Add Copyright text
d04e4dfa Remove c++11 constexpr keyword
ed5faf5b Fix for M_PI on windows, use vtkm::Pi()
fe284ffb Add unit test for splat kernel integral.
29001e37 Change GaussianSplatter to KernelSplatter to support other kernels
378cb17e Code cleanup, style, debug, unused vars
65d2980f Fix clang compile error, cleanup debug messages
...
Acked-by: Kitware Robot <kwrobot@kitware.com>
Acked-by: Robert Maynard <robert.maynard@kitware.com>
Merge-request: !193
fd685210 Always install all device headers even when device isn't enabled.
b1663b24 Add an example of using multiple backends from a single translation unit.
fc0ff69d Methods with try/catch need to be host only.
4d635d64 DeviceAdapter Tags now always exist, and contain if the device is valid.
cf32b430 Teach Configure.h to store if TBB and CUDA are enabled.
Acked-by: Kitware Robot <kwrobot@kitware.com>
Acked-by: Kenneth Moreland <kmorel@sandia.gov>
Merge-request: !198
The recently added pragma to suppress warnings about unused local
typedefs caused lots of dashboard failures because many GCC and clang
compiler do not have this warning so did not recognized the pragma to
suppress it. Now only use the pragma on clang compilers with a large
enough version.
I also discovered that the check for VTKM_CLANG was wrong (at least for
the most modern versions of XCode). Fixed that as well as some uses of
VTKM_CLANG that were wrong.
When compiling with cuda and tbb enabled in a single translation unit you
need to make sure all try/catch blocks are marked as host only otherwise
the cuda compiler will error out.
Previously it was really hard to verify if a device adapter was valid. Since
you would have to check for the existence of the tag. Now the tag always
exists, but instead you query the traits of the DeviceAdapter to see if
it is a valid adapter.
This makes compiling with multiple backends alot easier.
The boost assert macros seem to have an issue where they define an
unused typedef. This is causing the XCode 7 compiler to issue a warning.
Since the offending code is in a macro, the warning is identified with
the VTK-m header even though the code is in boost. To get around this,
wrap all uses of the boost assert that is causing the warning in the
third party pre/post macros to disable the warning.
XCode 7 added a warning for unused typedefs. VTK-m proper does not have
any instances of this, but there are several cases in the Boost code
that we are using. Add an exception to the third party pragmas to
supporess this warning.
Performance testing has shown that the combined lower-bounds + update-connectivity
worklet is marginally, but consistently faster than DeviceAdapterAlgorithm::LowerBounds
followed by an update-connectivity worklet. It also requires less memory.
Add ArrayHandleIndex
Add an ```ArrayHandleIndex``` class that echoes the index. This is really a special version of ```ArrayHandleCounting``` that is of type ```vtkm::Id``` and starts at 0, but it is the most common case and does not need the extra addition.
Also modified ```ArrayHandleCounting``` to support a step, which makes the use a bit more powerful.
See merge request !192
Modify ArrayHandleCounting so that it supports both a starting value and
a step (increment). This adds a multiplication, but the common case that
does not use it is already in a separate class (ArrayHandleIndex).
Template the splatter algorithm over Kernel type and use abstract kernel
interface to fetch the kernel value.
Add Gaussian, Spline3rdOrder kernel classes templated over dimension.
Other kernels can/will be added in future.
Kernel classes are defined such that the integral of the volume is unity
as is the convention in (for example) SPH simulations.
Clean up the implementation to take a uniform grid dataset as input
and splat using the correct world coordinates of the points into
the volume. Kernels are centred around the splat points in world coordinates
(floating point) and not quantized to the nearest voxel.
Skip the generation of all volume points which is unnecessary (memory usage).
Add ability to specify a radius array with one entry per splat point so
that the size of each splat may be controlled arbitrarily.
form of an ArrayHandle. Histogram takes the number of bins and returns the bin array.
Statistics returns a structure with standard stats plus raw and central moments.
When ExecutionWholeArray is passed to a worklet, it is expected to
behave like an array portal. However, it was missing the
GetNumberOfValues method and the ValueType typedef. These are now added.
Update VTK-m to export to Configure.h if we have opengl interop enabled.
At the same time I rearranged the CMakeLists.txt so that the configuration
for Configure.h is later so that it catches all CMake Variables.
See merge request !184
By introducing our own custom thrust execution policy we can make sure
to hit the fastest code paths in thrust for the sort operation. This makes
sure that for UInt32,Int32, and Float32 we use the radix sort from thrust
which offers a 2x to 3x speed improvement over the merge sort implementation.
Secondly by telling thrust that our BinaryOperators are commutative we
make sure that we get the fastest code paths when executing Inclusive
and Exclusive Scan
Benchmark 'Radix Sort on 1048576 random values vtkm::Int32' results:
median = 0.0117049s
median abs dev = 0.00324614s
mean = 0.0167615s
std dev = 0.00786269s
min = 0.00845875s
max = 0.0389063s
Benchmark 'Radix Sort on 1048576 random values vtkm::Float32' results:
median = 0.0234463s
median abs dev = 0.000317249s
mean = 0.021452s
std dev = 0.00470307s
min = 0.011255s
max = 0.0250643s
Benchmark 'Merge Sort on 1048576 random values vtkm::Int32' results:
median = 0.0310486s
median abs dev = 0.000182129s
mean = 0.0286914s
std dev = 0.00634102s
min = 0.0116225s
max = 0.0317379s
Benchmark 'Merge Sort on 1048576 random values vtkm::Float32' results:
median = 0.0310617s
median abs dev = 0.000193583s
mean = 0.0295779s
std dev = 0.00491531s
min = 0.0147257s
max = 0.032307s
Although this is labeled as a unit test, this is more of an integration
test to make sure that when a uniform point coordinate is passed as a
field to a WorkletTopologyMap from points to cells with a regular cell
set, you get the specialized VecRectilinearPointCoordinates in the
worklet argument.
In the special case where you are loading the point coordinates for a
structured grid in a point to cell map (an important use case), create a
VecRectilinearPointCoordinates rather than build a Vec of the values.
This will activate the cell specalizations in previous commits.
These changes also added some flat-to-logical index conversion and vice
versa in ConnectivityStructuredInternals. This change also fixed a bug
in getting cells attached to points in 2D grids. (Actually, technically
someone else fixed it and checked it in first. The changes were merged
during a rebase.)
I also added a specalization to Vec for 1D that implicitly converts
between the 1D Vec and the component. This can be convenient when
templating on the Vec length.
This class implicitly stores the point coordinates for a rectilinear
cell (such as a voxel) with just the location of the lower left point
and the spacing in each dimension. In addition to saving space, this
class should allow cell-specific functions to specialize for faster
processing.
adding cell-to-point topology support and worklet
This adds code to support a cell-to-point topological mapping worklet.
For explicit cell set, there is code to calculate a cell-to-point topology from the canonical point-to-cell topology. (It is not parallelized at this point.) Most of the required code for structured grids was already in place.
See merge request !154
fixing 2d structured connectivity calc for logical point indices.
`CalculateLogicalPointIndices` in `ConnectivityStructuredInternals.h` for 2D grids was incorrect -- it was using cell dimensions instead of point dimensions. This fixes the calculation and the unit test for the data set.
See merge request !153