I noticed a failure in a dashboard run of UnitTestParametricCoordinates.
This test uses randomly generated numbers to test the behavior of some
cell shapes, and there was an instance that occured with seed 1447261681
that caused one of the comparisons to be just slightly larger than the
default tolerance but still within reasonable value.
I just increased the tolerance of that particular comparison. Hopefully
this will prevent all future failures.
Array transforms can now be created with an inverse functor, allowing for
casts back into the native array type. As a result, array transforms with
both a functor and inverse functor defined can perform read and write
operations. As an example, ArrayHandleCast now supports this operation. The
original implementation of ArrayHandleCast (i.e. read only) has been renamed
'ArrayHandleCastForInput'.
Previously, there was a table holding the number of vertices produced
for each MC case. However, what we really need is the number of
triangles, so we would have to divide that by 3. Instead, just store the
number of triangles.
The original workaround for inclusive_scan bugs in thrust 1.8 only solved the
issue for basic arithmetic types such as int, float, double. Now we go one
step further and fix the problem for all types.
The solution is to provide a proper implementation of destructive_accumulate_n
and make sure it exists before any includes of thrust occur.
Recent changes to algorithm implementations caused CellDerivative to be
called in a way such that it gave conversion warnings on some compilers.
Fix that.
The previous implementation was declaring static arrays in methods,
which cannot be used on a CUDA device. Instead, make static tables that
can be passed to the device with array handles (much like clip tables
do).
This has been requested on the mailing list to make it easier to
interpolate integer vectors.
There are a couple of downsides to this addition. First, it implicitly
casts doubles back to whatever the vector type is, which can cause a
loss of precision. Second, it makes it more likely to get overload
errors when multiplying with Vec. In particular, the operator to cast
Vec of size 1 to the component class had to be removed.
The tetrahedralize algorithms have been changed to use the Scatter
classes to build indices rather than build them on their own.
To implement this efficiently with structured grids, a new ScatterUniform
class was made. I also added a new execution argument tag that allows
you to get the thread indices object from within the worklet.
It is the case that there are two ways to create the output to input map in
a count scatter. The first is to use a parallel find for every output index.
The second, which is used when there are lots of output, is to iterate over
the input and write out the reverse map. In this case, it is trivial to also
write out the visit indices, so do that instead of a bunch more searches.
Previously, each VecFromPortalPermute (the type that held the from field
values) held its own copy of the indices. For point to cell on
structured grids, this was a lot of repeated data values, which has the
potential to fill up cache and registers. Instead, just use pointer
references.
Previously, there was a declaration ConstArrayPortalFromThrust<const T>
in ArrayManagerExecutionThrustDevice. This proved problematic because
values read from the array in the worklet were typed as const T rather
than simply T. Any Vec or Matrix built from that type would then fail
because they are not meant to work with a const value (which means they
have to be set on construction and never changed.
Instead, declare ConstArrayPortalFromThrust<T> and internally set all
the Thrust pointers to have type const T. Also declare other thrust
pointers used as method parameters to have const T rather than T. This
should work as conversion from T to const T should be fine, but not the
other way around.
The original isosurface code was not treating the origin and spacing of
the point coordinates correctly. Instead, it was ignoring the origin and
spacing and instead scaling all point coordinates to be in the unit cube
in world space (except there was also an off-by-one error in that). This
change recompensates by adjusting the origin and spacing to make the
correct position where the geometry was previously errantly placed.
Now that ScatterCounting is implemented, we can use that to implement a
good part of the triangle generation in the isosurface algorithm. This
changes the worklet from a basic map to a topology map, which also
reduces a lot of code.
This changes the interface to the ThreadIndices classes to have both
input and output indices. It also adds a visit index to ThreadIndices.
Also added the VisitIndex execution signature tag, which relies on this
behavior.
The parallel implementation in CellSetExplicit that builds cell-to-point
connectivity from point-to-cell connectivity uses a parallel sort-by-
key. The sort-by-key in the device adapter is not guaranteed to be
stable, so values associated with a particular key can be in any order.
The test for the result was expecting the connectivity array to be in a
particular order. Change the test to allow any connectivity ordering
that is still valid.
A recent change to the DeviceAdapter header includes the TBB device if
available instead of the serial device. Thus, DeviceAdapterTagSerial was
not defined automatically in all cases for the build of
UnitTestDataSetPermutation. Add the header for that explicitly.
adding VTK file exporter and test cases
This adds a legacy VTK file exporter which supports unstructured, explicit, and point meshes. (Single Cell Type cell sets are also supported.)
See merge request !247
ca71d70b Update worklet UnitTests to not try statically known invalid combinations
6b2edb70 Update UnitTestDispatcherBase to use verify DynamicTransform error messages.
9fdc0f09 Improve the error message for Invoke type mismatch at compile time.
54d25fae Only perform DynamicTransformCont if at least one parameter is dynamic.
Acked-by: Kitware Robot <kwrobot@kitware.com>
Merge-request: !249
8816642d Move algorithms out of DeviceAdapterAlgorithmGeneral to reduce compilation size
c78e54fa Move algorithms out of DeviceAdapterAlgorithmTBB to reduce compilation size.
Acked-by: Kitware Robot <kwrobot@kitware.com>
Acked-by: Kenneth Moreland <kmorel@sandia.gov>
Merge-request: !248
Previously we always ran DynamicTransformCont even if we knew all the types.
By checking for Dynamic types first, we save roughly 3% on the binary size.
This also is a good starting point for a redesign of DynamicTransformCont
39142d83 Add convenience tags like FieldInPoint, FieldInCell, to WorkletMapPointToCell
f34119b6 Clarify the name of worklet for point to cell operations.
2c767e10 Add WorkletMapTopologyBase to make Dispatcher templates easier to read.
05d397cb Remove unnecessary template parameters from DispatcherMapField
Acked-by: Kitware Robot <kwrobot@kitware.com>
Merge-request: !246
CUDA default constructors, destructors, and assignment operators
Several classes exclusively work in the control environment. However, CUDA likes to add __device__ to constructors, destructors, and assignment operators it automatically creates. This in turn causes warnings about the __device__ function using host-only classes (like boost::shared_ptr). Solve this problem by adding explicit methods for all of these.
See merge request !245
DispatcherMapField was templated on the device adapter but it
actually doesn't need to be, only BasicInvoke and subsequent
methods need to be templated on the device.
The DynamicArrayHandle and DynamicCellSet classes exclusively work in
the control environment. However, CUDA likes to add __device__ to
constructors, destructors, and assignment operators it automatically
adds. This in turn causes warnings about the __device__ function using
host-only classes (like boost::shared_ptr). Solve this problem by adding
explicit methods for all of these.
The CellSet classes all exclusively work in the control environment.
However, CUDA likes to add __device__ to constructors, destructors, and
assignment operators it automatically adds. This in turn causes warnings
about the __device__ function using host-only classes (like
boost::shared_ptr). Solve this problem by adding explicit methods for
all of these.
The ArrayHandle classes all exclusively work in the control environment.
However, CUDA likes to add __device__ to constructors, destructors, and
assignment operators it automatically adds. This in turn causes warnings
about the __device__ function using host-only classes (like
boost::shared_ptr). Solve this problem by adding explicit methods for
all of these.
Implemented this by wrapping up all these default objects in a macro.
This also solved the problem of other constructors that are necessary
for array handles such as a constructor that takes the base array
handle.
There is a strange nvcc warning in CUDA 7.5 that sometimes happens on MSVC
that causes it to emit a warning for an undefined method that is clearly
defined. The CUDA development team is aware of the problem and is going
to fix it, but these changes will work around the problem for now.
Thanks to Tom Fogal from NVIDIA for these fixes.
Under CUDA, the default constructors and destructors created are exported
as __host__ and __device__, which causes problems because they used a boost
pointer that only works on the host. The explicit copy constructors and
destructors do the same thing as the default ones except declared to only
work on the host.
This now allows for even more efficient construction of uniform point
coordinates when running under the 3d scheduler, since we don't need to go
from 3d index to flat index to 3d index, instead we stay in 3d index
Change Fetches to use ThreadIndices instead of Invocation.
Previously, all Fetch objects received an Invocation object in their
Load and Store methods. The point of this was that it allowed the Fetch
to get data from any of the execution objects. However, every Fetch
either just got data directly from its associated execution object or
else used a secondary execution object (the input domain) to get indices
into their own execution object.
This left two potential areas for improvement. First, pulling data out
of the Invocation object was unnecessarily complicated. It would be much
nicer to get data directly from the associated execution object. Second,
when getting index information from the input domain, it was often the
case that extra computations were necessary (particularly on structured
cell sets). There was no way to share the index information among
Fetches, and therefore the computations were replicated.
This change removes the Invocation from the Fetch Load and Store.
Instead, it passes the associated execution object and a new object type
called the ThreadIndices. The ThreadIndices are customized for the input
domain and therefore have all the information needed for a redirected
lookup. It is also a thread-local object so it can cache computed
indices and save on computation time.
See merge request !233
Array handles for cuda device pointers have been implemented. The data for
these handles exists solely on the exec side (info such as length can be
queried from the cont side).
Previously, all Fetch objects received an Invocation object in their
Load and Store methods. The point of this was that it allowed the Fetch
to get data from any of the execution objects. However, every Fetch
either just got data directly from its associated execution object or
else used a secondary execution object (the input domain) to get indices
into their own execution object.
This left two potential areas for improvement. First, pulling data out
of the Invocation object was unnecessarily complicated. It would be much
nicer to get data directly from the associated execution object. Second,
when getting index information from the input domain, it was often the
case that extra computations were necessary (particularly on structured
cell sets). There was no way to share the index information among
Fetches, and therefore the computations were replicated.
This change removes the Invocation from the Fetch Load and Store.
Instead, it passes the associated execution object and a new object type
called the ThreadIndices. The ThreadIndices are customized for the input
domain and therefore have all the information needed for a redirected
lookup. It is also a thread-local object so it can cache computed
indices and save on computation time.
The current CUDA compiler is crashing when using _Pragma in code blocks,
which were being used with clang on apple. Turn these off when compiling
with CUDA.
9a8809f9 Add CellSetPermutation which allows custom iteration over a cell set.
66f6db5a IsWriteableArrayHandle now can tell if an array handle can be written too
20f3fb50 Update VertexClustering to use vtkm::cont::CellSetSingleType.
154896b7 Extend the test for DataSetSingleType.
Acked-by: Kitware Robot <kwrobot@kitware.com>
Merge-request: !228
When you create a CellSetPermutation you provide an array of the cell ids that
you want to iterate. This allows the user to do custom blanking of a data set,
or to do multi iteration over a set of cells.
8bc40880 Add a test for CellSetExplicit::GetIndices
fa81d1de CellSetExplicit always calls methods using "this->"
9e496306 Allow incremental construction of CellSetSingleType.
3e307879 Mark CellSetExplicit::Fill as host side only.
Acked-by: Kitware Robot <kwrobot@kitware.com>
Merge-request: !227
Remove the const from the ValueType of the delegate portal in
ArrayPortalGroupVec. This was creating a Vec with a const type, which
was immutable, which was problematic when trying to create the Vec in
the first place.
The testing of ArrayHandleGroupVec was just using the == operator to
check values. Even though we are not doing any math, optimizers can
sometimes make float values slightly different anyway, so test_equal
should give the correct comparison.
Previously if you created a cell set explicit and didn't set the number of
points you would get a runtime error when you over-ran an array's bounds.
Now we account for this use case and properly generate the Cell To Point
Connectivity.
dbed8827 Correct issues found in code review.
82b977da CellSetExplicit::CreateConnectivity now works with implicit NumIndices.
30f5d628 ConnectivityExplicit will not generate IndexOffsets when they are implicit.
f04ea6d7 CellSetExplicit has userdefined offset storage.
Acked-by: Kitware Robot <kwrobot@kitware.com>
Acked-by: Kenneth Moreland <kmorel@sandia.gov>
Tested-by: buildbot <buildbot@kitware.com>
Merge-request: !212
There was an error in the test_equal comparison that would return true
when the second value was 0 (or close to 0) and the first value was not.
This was a bug I introduced with commit
b270438fea8a9fe4200322bfe6344ab2075c4d9a. I clearly misinterpreted how a
conditional worked.
Even when using implicit index's the ConnectivityExplicit would generate
the code to compute the IndexOffsets, which would than fail to compile as
the ArrayHandle would only support read operations. This fixes that issue.
19cebccf Correct issues that buildbot brought up in the code.
c6dbc0f2 GetNumberOfPointsInCell consistently returns a vtkm::IdComponent
25ff1e94 CellSetExplicit storage tags are now easier to override.
935b3fd6 CellSetExplicit uses UInt8 for shape, and IdComponent for numIndices.
Acked-by: Kitware Robot <kwrobot@kitware.com>
Merge-request: !210
Add new version of DynamicArrayHandle::CastToArrayHandle
This takes a reference to an array handle and fills it. This removes a lot of the pain of determining template arguments.
See merge request !205
These changes are basically to support some upcoming changes to
ArrayHandleZip. The major additions are an implementation of VecTraits
for Pair and an overloaded << operator to ostream for Pair.
I also had to declare the operator<< for Pair to be in Types.h. Under
some circumstances the operator has to either be declared before the
template is declared or declared in the vtkm namespace. (The reasons are
described at <http://clang.llvm.org/compatibility.html#dep_lookup> but I
still don't understand.) I tried adding it to the vtkm namespace, but
that caused several of the other operator<< to fail. Since there is no
way to guarantee that Pair.h is declared before, say, ArrayHandle.h, I
moved the implementation to Types.h.
Since I was moving operator<< to Types.h, I went ahead and moved the
TypeTraits and VecTraits to their respective headers. Since Pair is
declared (but not implemented) in Types.h, these templated classes can
be implemented without including Pair.h.
98885186 Fix CopyInto tests that use different DeviceAdapterTag
69b2ad2a Add unit tests for CopyInto function
2c55b15c Add additional control logic for CopyInto function
20c1a048 CopyInto function for ArrayHandles
Acked-by: Kitware Robot <kwrobot@kitware.com>
Acked-by: Robert Maynard <robert.maynard@kitware.com>
Merge-request: !202
Xcode 7 warnings
The XCode 7 compiler has a new warning for unused typedefs. The Boost code we use has some instances where this warning gets issued. Suppress these warnings.
See merge request !199
This is to be used in place of BOOST_STATIC_ASSERT so that we can
control its implementation.
The implementation is designed to fix the issue where the latest XCode
clang compiler gives a warning about a unused typedefs when the boost
static assert is used within a function. (This warning also happens when
using the C++11 static_assert keyword.) You can suppress this warning
with _Pragma commands, but _Pragma commands inside a block is not
supported in GCC. The implementation of VTKM_STATIC_ASSERT handles all
current cases.
ArrayHandles in DAX have a CopyInto function which allows the user to copy an array handle's data into a compatible STL type iterator. Originally this was fairly straight forward to implement since array handles in DAX are templated on the DeviceAdapterTag. In contrast, VTKm array handles use a polymorphic ArrayHandleExecutionManager under the hood allowing a single array handle to interface with multiple devices at runtime. To achieve this virtual functions are used. This makes implementing the CopyInto function difficult since it is templated on the IteratorType and virtual functions cannot be templated.
To work around this, I've implemented a concrete templated CopyInto function in the class derived from ArrayHandleExecutionManagerBase. In the ArrayHandle class, CopyInto dynamically casts the base class into the derived class, then calls the CopyInto function defined in the derived class.
The drawback to this approach is that, should the user define their own class that inherits from ArrayHandleExectionManagerBase, they are not forced to implement the CopyInto function, unlike the other virtual functions.
5e72d3a8 Rename kernels directory to splatkernels to avoid confusion
7a2225cf Add Copyright text
d04e4dfa Remove c++11 constexpr keyword
ed5faf5b Fix for M_PI on windows, use vtkm::Pi()
fe284ffb Add unit test for splat kernel integral.
29001e37 Change GaussianSplatter to KernelSplatter to support other kernels
378cb17e Code cleanup, style, debug, unused vars
65d2980f Fix clang compile error, cleanup debug messages
...
Acked-by: Kitware Robot <kwrobot@kitware.com>
Acked-by: Robert Maynard <robert.maynard@kitware.com>
Merge-request: !193
fd685210 Always install all device headers even when device isn't enabled.
b1663b24 Add an example of using multiple backends from a single translation unit.
fc0ff69d Methods with try/catch need to be host only.
4d635d64 DeviceAdapter Tags now always exist, and contain if the device is valid.
cf32b430 Teach Configure.h to store if TBB and CUDA are enabled.
Acked-by: Kitware Robot <kwrobot@kitware.com>
Acked-by: Kenneth Moreland <kmorel@sandia.gov>
Merge-request: !198
The recently added pragma to suppress warnings about unused local
typedefs caused lots of dashboard failures because many GCC and clang
compiler do not have this warning so did not recognized the pragma to
suppress it. Now only use the pragma on clang compilers with a large
enough version.
I also discovered that the check for VTKM_CLANG was wrong (at least for
the most modern versions of XCode). Fixed that as well as some uses of
VTKM_CLANG that were wrong.
When compiling with cuda and tbb enabled in a single translation unit you
need to make sure all try/catch blocks are marked as host only otherwise
the cuda compiler will error out.