Synchronize the CUDA timer on both the start and end events
Previously, the timer for CUDA devices only called cudaEventSynchronize
at the end event when asking for the elapsed time. This, however, could
allow time to pass from when the timer was reset to when the start event
happened that was not recorded in the timer. This added synchronization
should make sure that all time spent in CUDA is recorded.
See merge request !291
DataSetBuilder fixes
There were some compiler warnings and testing issues with the
recently merged DataSetBuilder branch. This fixes some of
those problems.
See merge request !316
Reduce the number of conditions UnitTestArrayHandleCartesianProduct
tries so that it is faster. All the conditions are pretty similar, so it
should be OK to reduce some.
Previously UnitTestDataSetBuilderRectilinear and
UnitTestDataSetBuilderRegular did an exhaustive test of all possible
grid sizes within a certain amount of dimensions and with some number of
ways to build point arrays. This created hundreds of thousands of cases
that were checked, and it was running a long time. At best it wasted
time and at worst ctest reported the test as timed out.
It is not really necessary to perform an exhaustive test. The tests are
changed to do 10 trials using random values for dimensions and fill
methods.
f86382f0 Fix support for CoordinateSystems using ArrayHandleCartesianProduct.
d6a2a142 Add toleranced compare for values. Add tests for vtkm::Float32,Float64,Id typed arrays.
5d438353 Add toleranced comparisions for bounds validation. Also, add vtkm::Float32 and vtkm::Float64 to the testing for rectilinear and regular datasets.
b225ae97 Rectilinear coordinates (created with DataSetBuilderRectilinear) are now converted to vtkm::FloatDefault. This reduces the number of types to consider when casting inside CoordinateSystem, and was felt by all to be a reasonable restriction.
d755e43d Use ArrayHandleCompositeVector to represent separated point arrays for DataSetBuilderExplicit.h.
c7b0ffb8 Add tests for DataSetBuilderExplicit. Added cont/testing/ExplicitTestData.h which includes several explicit datasets. These datasets come from VTK data generated in VisIt. The new unit tests build datasets in several different ways and do some basic validation.
b4d04fff Add specialization of printSummary_ArrayHandle for UInt8. It prints them as characters, which are a little hard to understand to this computer scientist.
bd929c20 Fix compiler warnings.
...
Acked-by: Kitware Robot <kwrobot@kitware.com>
Acked-by: Kenneth Moreland <kmorel@sandia.gov>
Merge-request: !262
f5f9939f Update all of vtkm to understand it can only identify as one compiler.
c706c826 Configure.h can only state a machine is a one compiler.
Acked-by: Kitware Robot <kwrobot@kitware.com>
Merge-request: !307
It was previously using vtkm::cont::CellSetListTagCommon which would
break the ability for people to specify a custom value for
VTKM_DEFAULT_CELL_SET_LIST_TAG.
Add lots of checks to CUDA calls in the timer to try to identify any
problems that might be showing up on the dashboard.
Also adding some print statements around the sleep function in the
device adapter testing. For some reason the problem just went away with
them.
Previously, the timer for CUDA devices only called cudaEventSynchronize
at the end event when asking for the elapsed time. This, however, could
allow time to pass from when the timer was reset to when the start event
happened that was not recorded in the timer. This added synchronization
should make sure that all time spent in CUDA is recorded.
It used to be possible for vtk-m to say it was multiple compilers, for example
it could be both GCC and PGI, Clang and MSVC ( yes possible ), or Intel and
Clang.
This logical restructure now makes that impossible, and instead prefers a system
where we choose the most specialized version of the compiler over the most
general, where general is GCC / Clang.
b70a000a Allow resetting the Type and Storage of a DynamicArrayHandle in a single call.
Acked-by: Kitware Robot <kwrobot@kitware.com>
Acked-by: Kenneth Moreland <kmorel@sandia.gov>
Merge-request: !302
82a573f7 MarchingCubes is now able to not generate normals.
502e7c28 Merge branch 'cleanup_scatter_counting_uses' into marching_cubes_normal_generation_option
8079dc28 MarchingCubes generate step now requires a ScatterCounting object.
4cd2f582 Add a default constructor for ScatterCounting.
Acked-by: Kitware Robot <kwrobot@kitware.com>
Acked-by: Kenneth Moreland <kmorel@sandia.gov>
Merge-request: !300
vtkm::filter has the use case where we need to reset both the type and
storage of an array handle, by doing both at the same time we can reduce
the number of temporary objects, and invalid conversions of arrays.
WorkletMapPointToCell is a convenience subclass of WorkletMapTopology.
As such, it renames all the From/To signature tags to say Point/Cell to
be easier to read. However, the alias for FromCount was missing. Add the
alias PointCount.
Added cont/testing/ExplicitTestData.h which includes several explicit datasets. These datasets come from VTK data generated in VisIt. The new unit tests build datasets in several different ways and do some basic validation.
Add some new methods for DataSetFieldAdd class to improve usability.
c7756c78 TBB SortByKey works now when the key is a ArrayHandleZip.
Acked-by: Kitware Robot <kwrobot@kitware.com>
Acked-by: Kenneth Moreland <kmorel@sandia.gov>
Merge-request: !301
30ac46f2 We know the exact FieldOutCell types for MarchingCubes, so restrict it.
Acked-by: Kitware Robot <kwrobot@kitware.com>
Acked-by: Kenneth Moreland <kmorel@sandia.gov>
Merge-request: !299
By restricting the types explicitly in the ControlSignature we reduce
the code bloat, if we ever pass in a DynamicArrayHandle as one of those
parameters.
Without a default constructor for ScatterCounting any class that wants
to hold onto a ScatterCounting object is required to know what device
they are running on. By allowing default construction, we can move that
requirement to just have a method on the object require a device adapter
object.
Previously, the ArrayHandleCompositeVector had a separate implementation
of ArrayPortal for the control and execution environments. Because I was
lazy when I implemented it, the control version did not support Get.
Since originally implementing this class, VTK-m now allows defining
methods that are declared as working in both control and execution
environments (VTKM_EXEC_CONT_EXPORT) but only work in one or the other
depending on methods of templated subclasses they call. Thus, solve this
problem by simply removing the control version of the portal and use the
same portal for both.
1. Additional ASSERT calls to validate arguments in: DataSetBuilderRegular
2. Fix some untested compile errors in DataSetBuilderRectilinear
3. Added a new unit test, cont/testing/UnitTestDataSetBuilderRectilinear.cxx
4. Provided additional tests for UnitTestDataSetBuilderRegular.cxx.
The new tests in (4) were also included in (3), and provide a much more robust way of validating datasets created. It has nested for loops to do an all-all test on various ways to specify the X,Y, and Z coordinates. It computes the bounds on the coordinate system and make sure they are correct.
Note: The GetBounds() call for Rectilinear is not working, and is an item for future discussion. It is disabled for now.
Previously each device adapter only had a unique string name. This was
not the best when it came to developing data structures to track the status
of a given device at runtime.
This adds in a unique numeric identifier to each device adapter. This will
allow classes to easily create bitmasks / lookup tables for the validity of
devices.
a7127f0f Adding vtkm::cont::RuntimeDeviceInformation.
7d249e89 Move DeviceAdapterTraits into vtkm::cont as they are user API.
Acked-by: Kitware Robot <kwrobot@kitware.com>
Acked-by: Kenneth Moreland <kmorel@sandia.gov>
Merge-request: !287
The RuntimeDeviceInformation class allows developers to check if a given
device is supported on a machine at runtime. This allows developers to properly
check for CUDA support before running any worklets.
Older GCC compilers (earlier than 4.6) do not support the diagnostic
pragmas that push and pop the warning levels. This change just disables
the warnings we need to for third party libraries and leaves them off.
This means you might miss some legit warnings on older compilers, but
most developers use newer compilers anyway, and the push/pop should
still work there.
When writing multiple backend code users of vtkm need to use the
DeviceAdapterTraits classes, so therefore we should move them to vtkm::cont
to signify this.
WholeArray tag for ControlSignature
Add WholeArrayIn, WholeArrayInOut, and WholeArrayOut tags for ControlSignature. They tags behave similarly to using an ExecObject tag with an ExecutionWholeArray or ExecutionWholeArrayConst object. However, the WholeArray* tags can simplify some implementations in two ways. First, it allows you to specify more precisely what data is passed in. You have to pass in an ArrayHandle or else an error will occur (as opposed to be able to pass in any type of execution object). Second, this allows you to easily pass in arrays stored in DynamicArrayHandle objects. The Invoke mechanism will automatically find the appropriate static class. This cannot be done easily with ExecutionWholeArray.
See merge request !284
The two worklets for marching cubes use tables stored in arrays that
have random access. Previously, they arrays were passed using the
ExecObject tag in ControlSignature along with ExecutionWholeArrayConst.
This changes to using a WholeArrayIn tag and just passing the
ArrayHandle directly to the Invoke method. The end result is the same,
but the code is a bit cleaner.
The WholeArrayIn, WholeArrayInOut, and WholeArrayOut ControlSignature
tags behave similarly to using an ExecObject tag with an
ExecutionWholeArray or ExecutionWholeArrayConst object. However, the
WholeArray* tags can simplify some implementations in two ways. First,
it allows you to specify more precisely what data is passed in. You have
to pass in an ArrayHandle or else an error will occur (as opposed to be
able to pass in any type of execution object). Second, this allows you
to easily pass in arrays stored in DynamicArrayHandle objects. The
Invoke mechanism will automatically find the appropriate static class.
This cannot be done easily with ExecutionWholeArray.
We had a report that vtkm::Min/Max was significantly slower than other
products. This was traced back to the fact that these functions were not
completely inlining because they were calling fmin or fmax, and that
resulted in an actual C library call. It turns out using the templated
functions in the std namespace is faster.
This change has the VTK-m min/max functions use the std version in
almost all circumstances. The one exception (so far) is that fmin and
fmax are used for CUDA devices since the std functions are not declared
to run on the device and the nvcc compiler treats these functions
special.
2df501c7 Suppress failure to vectorize warnings from the Intel Compiler.
68e3032b Properly detect OSX Intel Compiler as Intel not Clang.
646b3ad7 Suppress notification about failure to vectorize on release builds.
05e8f592 Disable vectorization pragma's if we detect the compiler doesn't support them.
Acked-by: Kitware Robot <kwrobot@kitware.com>
Merge-request: !282
When a C++ object is constructed, the fields (ivars) of that object are
initialized in the order they are declared in the structure regardless
of the order of initializers listed in the constructor. Thus, it is good
C++ convention to list the initializers of the constructor in the same
order they are declared in the class so that there is no confusion about
the order of initialization (which can matter if there are any
dependencies). To help enforce this convention, some compilers warn if
the order does not match. This commit fixes that issue.
This commit also removes trailing whitespace at the end of some lines in
StreamLineUniformGrid.h. My editor does this automatically because
trailing whitespace bugs some programmers.
4ceb111a Enable vectorization inside the Serial and TBB backends.
514ea09a Teach VTK-m how to enable vectorization for gcc, clang, and icc.
Acked-by: Kitware Robot <kwrobot@kitware.com>
Acked-by: Kenneth Moreland <kmorel@sandia.gov>
Merge-request: !275
Scatter in worklets
Add the functionality to perform a scatter operation from input to output in a worklet invocation. This allows you to, for example, specify a variable amount of outputs generated for each input.
See merge request !221
I noticed a failure in a dashboard run of UnitTestParametricCoordinates.
This test uses randomly generated numbers to test the behavior of some
cell shapes, and there was an instance that occured with seed 1447261681
that caused one of the comparisons to be just slightly larger than the
default tolerance but still within reasonable value.
I just increased the tolerance of that particular comparison. Hopefully
this will prevent all future failures.
Array transforms can now be created with an inverse functor, allowing for
casts back into the native array type. As a result, array transforms with
both a functor and inverse functor defined can perform read and write
operations. As an example, ArrayHandleCast now supports this operation. The
original implementation of ArrayHandleCast (i.e. read only) has been renamed
'ArrayHandleCastForInput'.
Previously, there was a table holding the number of vertices produced
for each MC case. However, what we really need is the number of
triangles, so we would have to divide that by 3. Instead, just store the
number of triangles.
The original workaround for inclusive_scan bugs in thrust 1.8 only solved the
issue for basic arithmetic types such as int, float, double. Now we go one
step further and fix the problem for all types.
The solution is to provide a proper implementation of destructive_accumulate_n
and make sure it exists before any includes of thrust occur.
Recent changes to algorithm implementations caused CellDerivative to be
called in a way such that it gave conversion warnings on some compilers.
Fix that.
The previous implementation was declaring static arrays in methods,
which cannot be used on a CUDA device. Instead, make static tables that
can be passed to the device with array handles (much like clip tables
do).
This has been requested on the mailing list to make it easier to
interpolate integer vectors.
There are a couple of downsides to this addition. First, it implicitly
casts doubles back to whatever the vector type is, which can cause a
loss of precision. Second, it makes it more likely to get overload
errors when multiplying with Vec. In particular, the operator to cast
Vec of size 1 to the component class had to be removed.
The tetrahedralize algorithms have been changed to use the Scatter
classes to build indices rather than build them on their own.
To implement this efficiently with structured grids, a new ScatterUniform
class was made. I also added a new execution argument tag that allows
you to get the thread indices object from within the worklet.
It is the case that there are two ways to create the output to input map in
a count scatter. The first is to use a parallel find for every output index.
The second, which is used when there are lots of output, is to iterate over
the input and write out the reverse map. In this case, it is trivial to also
write out the visit indices, so do that instead of a bunch more searches.
Previously, each VecFromPortalPermute (the type that held the from field
values) held its own copy of the indices. For point to cell on
structured grids, this was a lot of repeated data values, which has the
potential to fill up cache and registers. Instead, just use pointer
references.
Previously, there was a declaration ConstArrayPortalFromThrust<const T>
in ArrayManagerExecutionThrustDevice. This proved problematic because
values read from the array in the worklet were typed as const T rather
than simply T. Any Vec or Matrix built from that type would then fail
because they are not meant to work with a const value (which means they
have to be set on construction and never changed.
Instead, declare ConstArrayPortalFromThrust<T> and internally set all
the Thrust pointers to have type const T. Also declare other thrust
pointers used as method parameters to have const T rather than T. This
should work as conversion from T to const T should be fine, but not the
other way around.
The original isosurface code was not treating the origin and spacing of
the point coordinates correctly. Instead, it was ignoring the origin and
spacing and instead scaling all point coordinates to be in the unit cube
in world space (except there was also an off-by-one error in that). This
change recompensates by adjusting the origin and spacing to make the
correct position where the geometry was previously errantly placed.
Now that ScatterCounting is implemented, we can use that to implement a
good part of the triangle generation in the isosurface algorithm. This
changes the worklet from a basic map to a topology map, which also
reduces a lot of code.
This changes the interface to the ThreadIndices classes to have both
input and output indices. It also adds a visit index to ThreadIndices.
Also added the VisitIndex execution signature tag, which relies on this
behavior.
The parallel implementation in CellSetExplicit that builds cell-to-point
connectivity from point-to-cell connectivity uses a parallel sort-by-
key. The sort-by-key in the device adapter is not guaranteed to be
stable, so values associated with a particular key can be in any order.
The test for the result was expecting the connectivity array to be in a
particular order. Change the test to allow any connectivity ordering
that is still valid.