a7127f0f Adding vtkm::cont::RuntimeDeviceInformation.
7d249e89 Move DeviceAdapterTraits into vtkm::cont as they are user API.
Acked-by: Kitware Robot <kwrobot@kitware.com>
Acked-by: Kenneth Moreland <kmorel@sandia.gov>
Merge-request: !287
The RuntimeDeviceInformation class allows developers to check if a given
device is supported on a machine at runtime. This allows developers to properly
check for CUDA support before running any worklets.
Older GCC compilers (earlier than 4.6) do not support the diagnostic
pragmas that push and pop the warning levels. This change just disables
the warnings we need to for third party libraries and leaves them off.
This means you might miss some legit warnings on older compilers, but
most developers use newer compilers anyway, and the push/pop should
still work there.
When writing multiple backend code users of vtkm need to use the
DeviceAdapterTraits classes, so therefore we should move them to vtkm::cont
to signify this.
WholeArray tag for ControlSignature
Add WholeArrayIn, WholeArrayInOut, and WholeArrayOut tags for ControlSignature. They tags behave similarly to using an ExecObject tag with an ExecutionWholeArray or ExecutionWholeArrayConst object. However, the WholeArray* tags can simplify some implementations in two ways. First, it allows you to specify more precisely what data is passed in. You have to pass in an ArrayHandle or else an error will occur (as opposed to be able to pass in any type of execution object). Second, this allows you to easily pass in arrays stored in DynamicArrayHandle objects. The Invoke mechanism will automatically find the appropriate static class. This cannot be done easily with ExecutionWholeArray.
See merge request !284
The two worklets for marching cubes use tables stored in arrays that
have random access. Previously, they arrays were passed using the
ExecObject tag in ControlSignature along with ExecutionWholeArrayConst.
This changes to using a WholeArrayIn tag and just passing the
ArrayHandle directly to the Invoke method. The end result is the same,
but the code is a bit cleaner.
The WholeArrayIn, WholeArrayInOut, and WholeArrayOut ControlSignature
tags behave similarly to using an ExecObject tag with an
ExecutionWholeArray or ExecutionWholeArrayConst object. However, the
WholeArray* tags can simplify some implementations in two ways. First,
it allows you to specify more precisely what data is passed in. You have
to pass in an ArrayHandle or else an error will occur (as opposed to be
able to pass in any type of execution object). Second, this allows you
to easily pass in arrays stored in DynamicArrayHandle objects. The
Invoke mechanism will automatically find the appropriate static class.
This cannot be done easily with ExecutionWholeArray.
We had a report that vtkm::Min/Max was significantly slower than other
products. This was traced back to the fact that these functions were not
completely inlining because they were calling fmin or fmax, and that
resulted in an actual C library call. It turns out using the templated
functions in the std namespace is faster.
This change has the VTK-m min/max functions use the std version in
almost all circumstances. The one exception (so far) is that fmin and
fmax are used for CUDA devices since the std functions are not declared
to run on the device and the nvcc compiler treats these functions
special.
2df501c7 Suppress failure to vectorize warnings from the Intel Compiler.
68e3032b Properly detect OSX Intel Compiler as Intel not Clang.
646b3ad7 Suppress notification about failure to vectorize on release builds.
05e8f592 Disable vectorization pragma's if we detect the compiler doesn't support them.
Acked-by: Kitware Robot <kwrobot@kitware.com>
Merge-request: !282
When a C++ object is constructed, the fields (ivars) of that object are
initialized in the order they are declared in the structure regardless
of the order of initializers listed in the constructor. Thus, it is good
C++ convention to list the initializers of the constructor in the same
order they are declared in the class so that there is no confusion about
the order of initialization (which can matter if there are any
dependencies). To help enforce this convention, some compilers warn if
the order does not match. This commit fixes that issue.
This commit also removes trailing whitespace at the end of some lines in
StreamLineUniformGrid.h. My editor does this automatically because
trailing whitespace bugs some programmers.
4ceb111a Enable vectorization inside the Serial and TBB backends.
514ea09a Teach VTK-m how to enable vectorization for gcc, clang, and icc.
Acked-by: Kitware Robot <kwrobot@kitware.com>
Acked-by: Kenneth Moreland <kmorel@sandia.gov>
Merge-request: !275
Scatter in worklets
Add the functionality to perform a scatter operation from input to output in a worklet invocation. This allows you to, for example, specify a variable amount of outputs generated for each input.
See merge request !221
I noticed a failure in a dashboard run of UnitTestParametricCoordinates.
This test uses randomly generated numbers to test the behavior of some
cell shapes, and there was an instance that occured with seed 1447261681
that caused one of the comparisons to be just slightly larger than the
default tolerance but still within reasonable value.
I just increased the tolerance of that particular comparison. Hopefully
this will prevent all future failures.
Array transforms can now be created with an inverse functor, allowing for
casts back into the native array type. As a result, array transforms with
both a functor and inverse functor defined can perform read and write
operations. As an example, ArrayHandleCast now supports this operation. The
original implementation of ArrayHandleCast (i.e. read only) has been renamed
'ArrayHandleCastForInput'.
Previously, there was a table holding the number of vertices produced
for each MC case. However, what we really need is the number of
triangles, so we would have to divide that by 3. Instead, just store the
number of triangles.
The original workaround for inclusive_scan bugs in thrust 1.8 only solved the
issue for basic arithmetic types such as int, float, double. Now we go one
step further and fix the problem for all types.
The solution is to provide a proper implementation of destructive_accumulate_n
and make sure it exists before any includes of thrust occur.
Recent changes to algorithm implementations caused CellDerivative to be
called in a way such that it gave conversion warnings on some compilers.
Fix that.
The previous implementation was declaring static arrays in methods,
which cannot be used on a CUDA device. Instead, make static tables that
can be passed to the device with array handles (much like clip tables
do).
This has been requested on the mailing list to make it easier to
interpolate integer vectors.
There are a couple of downsides to this addition. First, it implicitly
casts doubles back to whatever the vector type is, which can cause a
loss of precision. Second, it makes it more likely to get overload
errors when multiplying with Vec. In particular, the operator to cast
Vec of size 1 to the component class had to be removed.
The tetrahedralize algorithms have been changed to use the Scatter
classes to build indices rather than build them on their own.
To implement this efficiently with structured grids, a new ScatterUniform
class was made. I also added a new execution argument tag that allows
you to get the thread indices object from within the worklet.
It is the case that there are two ways to create the output to input map in
a count scatter. The first is to use a parallel find for every output index.
The second, which is used when there are lots of output, is to iterate over
the input and write out the reverse map. In this case, it is trivial to also
write out the visit indices, so do that instead of a bunch more searches.
Previously, each VecFromPortalPermute (the type that held the from field
values) held its own copy of the indices. For point to cell on
structured grids, this was a lot of repeated data values, which has the
potential to fill up cache and registers. Instead, just use pointer
references.
Previously, there was a declaration ConstArrayPortalFromThrust<const T>
in ArrayManagerExecutionThrustDevice. This proved problematic because
values read from the array in the worklet were typed as const T rather
than simply T. Any Vec or Matrix built from that type would then fail
because they are not meant to work with a const value (which means they
have to be set on construction and never changed.
Instead, declare ConstArrayPortalFromThrust<T> and internally set all
the Thrust pointers to have type const T. Also declare other thrust
pointers used as method parameters to have const T rather than T. This
should work as conversion from T to const T should be fine, but not the
other way around.
The original isosurface code was not treating the origin and spacing of
the point coordinates correctly. Instead, it was ignoring the origin and
spacing and instead scaling all point coordinates to be in the unit cube
in world space (except there was also an off-by-one error in that). This
change recompensates by adjusting the origin and spacing to make the
correct position where the geometry was previously errantly placed.
Now that ScatterCounting is implemented, we can use that to implement a
good part of the triangle generation in the isosurface algorithm. This
changes the worklet from a basic map to a topology map, which also
reduces a lot of code.
This changes the interface to the ThreadIndices classes to have both
input and output indices. It also adds a visit index to ThreadIndices.
Also added the VisitIndex execution signature tag, which relies on this
behavior.
The parallel implementation in CellSetExplicit that builds cell-to-point
connectivity from point-to-cell connectivity uses a parallel sort-by-
key. The sort-by-key in the device adapter is not guaranteed to be
stable, so values associated with a particular key can be in any order.
The test for the result was expecting the connectivity array to be in a
particular order. Change the test to allow any connectivity ordering
that is still valid.
A recent change to the DeviceAdapter header includes the TBB device if
available instead of the serial device. Thus, DeviceAdapterTagSerial was
not defined automatically in all cases for the build of
UnitTestDataSetPermutation. Add the header for that explicitly.
adding VTK file exporter and test cases
This adds a legacy VTK file exporter which supports unstructured, explicit, and point meshes. (Single Cell Type cell sets are also supported.)
See merge request !247
ca71d70b Update worklet UnitTests to not try statically known invalid combinations
6b2edb70 Update UnitTestDispatcherBase to use verify DynamicTransform error messages.
9fdc0f09 Improve the error message for Invoke type mismatch at compile time.
54d25fae Only perform DynamicTransformCont if at least one parameter is dynamic.
Acked-by: Kitware Robot <kwrobot@kitware.com>
Merge-request: !249
8816642d Move algorithms out of DeviceAdapterAlgorithmGeneral to reduce compilation size
c78e54fa Move algorithms out of DeviceAdapterAlgorithmTBB to reduce compilation size.
Acked-by: Kitware Robot <kwrobot@kitware.com>
Acked-by: Kenneth Moreland <kmorel@sandia.gov>
Merge-request: !248
Previously we always ran DynamicTransformCont even if we knew all the types.
By checking for Dynamic types first, we save roughly 3% on the binary size.
This also is a good starting point for a redesign of DynamicTransformCont
39142d83 Add convenience tags like FieldInPoint, FieldInCell, to WorkletMapPointToCell
f34119b6 Clarify the name of worklet for point to cell operations.
2c767e10 Add WorkletMapTopologyBase to make Dispatcher templates easier to read.
05d397cb Remove unnecessary template parameters from DispatcherMapField
Acked-by: Kitware Robot <kwrobot@kitware.com>
Merge-request: !246
CUDA default constructors, destructors, and assignment operators
Several classes exclusively work in the control environment. However, CUDA likes to add __device__ to constructors, destructors, and assignment operators it automatically creates. This in turn causes warnings about the __device__ function using host-only classes (like boost::shared_ptr). Solve this problem by adding explicit methods for all of these.
See merge request !245
DispatcherMapField was templated on the device adapter but it
actually doesn't need to be, only BasicInvoke and subsequent
methods need to be templated on the device.
The DynamicArrayHandle and DynamicCellSet classes exclusively work in
the control environment. However, CUDA likes to add __device__ to
constructors, destructors, and assignment operators it automatically
adds. This in turn causes warnings about the __device__ function using
host-only classes (like boost::shared_ptr). Solve this problem by adding
explicit methods for all of these.
The CellSet classes all exclusively work in the control environment.
However, CUDA likes to add __device__ to constructors, destructors, and
assignment operators it automatically adds. This in turn causes warnings
about the __device__ function using host-only classes (like
boost::shared_ptr). Solve this problem by adding explicit methods for
all of these.
The ArrayHandle classes all exclusively work in the control environment.
However, CUDA likes to add __device__ to constructors, destructors, and
assignment operators it automatically adds. This in turn causes warnings
about the __device__ function using host-only classes (like
boost::shared_ptr). Solve this problem by adding explicit methods for
all of these.
Implemented this by wrapping up all these default objects in a macro.
This also solved the problem of other constructors that are necessary
for array handles such as a constructor that takes the base array
handle.
There is a strange nvcc warning in CUDA 7.5 that sometimes happens on MSVC
that causes it to emit a warning for an undefined method that is clearly
defined. The CUDA development team is aware of the problem and is going
to fix it, but these changes will work around the problem for now.
Thanks to Tom Fogal from NVIDIA for these fixes.
Under CUDA, the default constructors and destructors created are exported
as __host__ and __device__, which causes problems because they used a boost
pointer that only works on the host. The explicit copy constructors and
destructors do the same thing as the default ones except declared to only
work on the host.
This now allows for even more efficient construction of uniform point
coordinates when running under the 3d scheduler, since we don't need to go
from 3d index to flat index to 3d index, instead we stay in 3d index
Change Fetches to use ThreadIndices instead of Invocation.
Previously, all Fetch objects received an Invocation object in their
Load and Store methods. The point of this was that it allowed the Fetch
to get data from any of the execution objects. However, every Fetch
either just got data directly from its associated execution object or
else used a secondary execution object (the input domain) to get indices
into their own execution object.
This left two potential areas for improvement. First, pulling data out
of the Invocation object was unnecessarily complicated. It would be much
nicer to get data directly from the associated execution object. Second,
when getting index information from the input domain, it was often the
case that extra computations were necessary (particularly on structured
cell sets). There was no way to share the index information among
Fetches, and therefore the computations were replicated.
This change removes the Invocation from the Fetch Load and Store.
Instead, it passes the associated execution object and a new object type
called the ThreadIndices. The ThreadIndices are customized for the input
domain and therefore have all the information needed for a redirected
lookup. It is also a thread-local object so it can cache computed
indices and save on computation time.
See merge request !233
Array handles for cuda device pointers have been implemented. The data for
these handles exists solely on the exec side (info such as length can be
queried from the cont side).