MultiBlock now uses `diy::reduce` for reductions rather than using proxy
collectives. To support using `diy::reduce` operations on a
vtkm::cont::MultiBlock, added AssignerMultiBlock and
DecomposerMultiBlock classes. This are helper classes that provide DIY
concepts on top of a existing MultiBlock.
vtkm::Bounds and vtkm::Range now uses default copy-constructor and
assignment operator. That way `std::is_trivially_copyable` succeeds for
these basic types.
1. Add option to copy user supplied array in make_ArrayHandle.
2. Replace Field constructors that take user supplied arrays with make_Field.
3. Replace CoordinateSystem constructors that take user supplied arrays with
make_CoordinateSystem.
Updating MultiBlock to use `diy` for computing block summaries like
ranges, bounds etc. This makes it possible to MultiBlock to
work in distributed operations without explicit logic.
f9f205e9 ListCrossProduct now uses a special version for MSVC2013
c02349a8 ListCrossProduct now uses a lazy evaluation implementation
7b1b9e44 Correctly forward rvalue functors when passed to CastAndCall
Acked-by: Kitware Robot <kwrobot@kitware.com>
Merge-request: !1018
The lazy version that was implemented to get the Intel compiler to work doesn't
work under MSVC2013. Since MSVC2013 is our least c++11 conforming compiler we
add a special MSVS2013 code path, since it will be the first compiler we drop.
The intel compiler could not generate code in a timely manner ( 12+ hours ) when
asked to produce a cross product of very long lists. By moving to a lazy
evaluation scheme we now have all compilers product a cross product in a
reasonable amount of time ( 2-4 seconds ).
This resolves Issues:
- https://gitlab.kitware.com/vtk/vtk-m/issues/190
- https://gitlab.kitware.com/vtk/vtk/issues/17196
If a global static array is declared with VTKM_EXEC_CONSTANT and the code
is compiled by nvcc (for multibackend code) then the array is only accesible
on the GPU. If for some reason a worklet fails on the cuda backend and it is
re-executed on any of the CPU backends, it will continue to fail.
We couldn't find a simple way to declare the array once and have it available
on both CPU and GPU. The approach we are using here is to declare the arrays
as static inside some "Get" function which is marked as VTKM_EXEC_CONT.
f6e18ac4 Remove IntegerSequence.h as we don't need it in vtk-m anymore
7f762204 Redesign the Dispatcher to not need FunctionInterface to convert dynamic types
Acked-by: Kitware Robot <kwrobot@kitware.com>
Merge-request: !1010
203205a1 TryExecute RuntimeDeviceTracker can't be a const ref anymore.
dfb9cc62 Allow users to pass multiple arguments to TryExecute
5384305d Update tests and a single worklet to verify new CastAndCall works
2ff14a81 Allow users to pass multiple arguments to CastAndCall
Acked-by: Kitware Robot <kwrobot@kitware.com>
Merge-request: !1005
Previously we allowed a const ref as we would make a copy, this only works
as it relies on RuntimeDeviceTracker implementing state through a shared_ptr.
Instead if we require modifiable types only we can make TryExecute more
efficient and clearer on what it does.
By using perfect forwarding we can reduce not only the amount of TryExecute
signatures, but we can enable the ability to pass temporary functors to
TryExecute.
At the same time we have optimized TryExecute by moving the string generation
code into a single function that is compiled into the vtkm_cont library.
The end result is that the vtkm_rendering library size has been reduced from
12MB to 11MB, and we shave off about 5% of our build time.
Rather than requiring all the arguments to be placed as member variables to
the functor you can now pass extra arguments that will be added to the functor
call signature.
So for example:
vtkm::ForEach(functor, vtkm::TypeListTagCommon(), double{42.0}, int{42});
will be converted into:
functor(vtkm::Int32, double, int)
functor(vtkm::Int64, double, int)
functor(vtkm::Float32, double, int)
functor(vtkm::Float64, double, int)
...
b3a7c697 MarchingCubes: Now is able to run on Uint8/Int8 types
Acked-by: Kitware Robot <kwrobot@kitware.com>
Acked-by: Sujin Philip <sujin.philip@kitware.com>
Merge-request: !995
The speed improvement fix regressed support for non scalar types, this
correct that issue.
The issue was found while trying to bump the VTK-m version inside VTK.
f6ead29c Removing a debug variable that was causing cuda to crash
565e69c5 Merge branch 'master' into visit_changes
81afcb62 further removing debug statements
a3bf1b26 Merge branch 'master' of https://gitlab.kitware.com/mclarsen/vtk-m into visit_changes
fdd5d1c8 removing debug statements
e60e330c Merge branch 'connectivity_tracer_additions' into visit_changes
2cb26b2d something
0bea9ce9 tracking path lengths if present
...
Acked-by: Kitware Robot <kwrobot@kitware.com>
Merge-request: !992
32148cdb A first pass at improving the compile time of MarchingCubes
Acked-by: Kitware Robot <kwrobot@kitware.com>
Acked-by: Matt Larsen <mlarsen@cs.uoregon.edu>
Merge-request: !989
f4739743 Fixing error with uninitialized variable
aeeb954f correcting typo
46f58dff fixing various warnings
14b7789f fixing ambiguous min max call
45003311 Merge remote-tracking branch 'upstream/master' into support_2D_and_1D_plots
73255080 removing debug statements
035814a4 adding 2d ortho support to ray tracing and updating WireFramer to support 2d lines and 1D line plots
Acked-by: Kitware Robot <kwrobot@kitware.com>
Merge-request: !985
5ada2812 Some fixes for CellLocatorTwoLevelUniformGrid
Acked-by: Kitware Robot <kwrobot@kitware.com>
Acked-by: Robert Maynard <robert.maynard@kitware.com>
Merge-request: !979
The implementation of the simplified version of
DeviceAdapterAlgorithmGeneral::Unique had two errors.
First, the implementation is such that it calls the more complex version
of Unique (which specifies a binary predicate to establish equality).
However, it was not calling the Unique method in the DerivedAlgorithm
like it should have been. Instead, it was calling its own Unique
algorithm, which might not be as efficient as the specialized Unique for
the device.
Second, it was using std::equal_to as its binary predicate. Using
functors from std can be dangerous because they are not marked with
VTKM_EXEC, so have the potential to not work in the execution
environment. Instead, use the readily available vtkm::Equal binary
predicate.
The implementation of ScanExclusiveByKey in
DeviceAdapterAlgorithmGeneral by shifting values in the input values
array and then calling ScanInclusiveByKey. However, the temporary
shifted values array was created using the key type instead of the
values type. This caused a compile error when the keys and values had
different types.
5a99dd76 Only use cuda hints for CUDA 8.0+.
Acked-by: Kitware Robot <kwrobot@kitware.com>
Acked-by: Robert Maynard <robert.maynard@kitware.com>
Merge-request: !978
1. Fix incorrect computation of grid dimensions.
2. Add checks for empty bounding box of bins.
3. Workaround issues caused by floating point precision.
For std::copy to optimize a copy to memcpy, the valuetype must be both
trivially constructable and trivially copyable.
The new copy benchmarks highlighted an issue that std::copy'ing pairs
and vecs were not optimized to memcpy. For a 256 MiB buffer on my
laptop w/ GCC, the serial copy speeds were:
UInt8: 10.10 GiB/s
Vec<UInt8, 2> 3.12 GiB/s
Pair<UInt32, Float32> 6.92 GiB/s
After this patch, the optimization occurs and a bitwise copy occurs:
UInt8: 10.12 GiB/s
Vec<UInt8, 2> 9.66 GiB/s
Pair<UInt32, Float32> 9.88 GiB/s
Check were also added to the Vec and Pair unit tests to ensure that
this classes continue to be trivial.
The ArrayHandleSwizzle test was refactored a bit to eliminate a new
'possibly uninitialized memory' warning introduced with the default
Vec ctors.
In generic code, it's a pain to use the equality operators since they
requires the ValueType and Storage to match, else the operator is undefined.
This commit adds operators for such comparisons, as well as a unit test.
8fabece1 Use median point from cluster as representative vertex.
c7bf0c95 Compute PointIdMap while reducing cluster ids.
5dee7c6a Select input point from cluster rather than averaging.
28e76ddb Update vertex clustering benchmarking code.
e3c9e7bb Optimize cell map computation.
d7669650 Use requested grid in VertexClustering worklet.
0472dc11 Fix warning on Cuda.
3f4e17e2 Add field mapping to VertexClustering.
...
Acked-by: Kitware Robot <kwrobot@kitware.com>
Acked-by: Robert Maynard <robert.maynard@kitware.com>
Merge-request: !960
This is a bit counterintuitive, but choosing a random point from each
cluster rather than averaging them gives a better visual result. The
averages poorly represent an surface that runs through the grid block and
tends to bias the output points towards the center of each block, creating
very noticeable grid artifacts that look blocky.
This is to match the default behavior of vtkQuadricClustering. If we
want to add this functionality back, it should go into the filter as
an option that adjusts nDivisions before calling the worklet.
assert(false && ""); emitted a
"warning : controlling expression is constant"
Replace the assertion with an exception, which is more appropriate here
anyway.
Often times we don't care about the output keys, and it's useful to
be able to pass an ArrayHandleDiscard into the algorithm to save
memory in these cases.
Rob noticed a degridation in performance in some worklet tests when
ArrayCopy was added. I hypothesize that this slowdown is doing the array
copy with TBB instead of serial in the serial tests. (There have been
some checks in the existing code to suggest that some operations in TBB
can be slower than serial.) This change forces the array copy to be on
the device for which we are testing.
1147edb1 MarchingCubes now uses Gradient fast paths when possible.
d7d5da4f More changes to Neighborhood code to make it more easy to use.
Acked-by: Kitware Robot <kwrobot@kitware.com>
Acked-by: Sujin Philip <sujin.philip@kitware.com>
Merge-request: !916
This ensures that the order of the values presented to the
WorkletReduceByKey functor is consistent.
After this change, the key array used to build the worklet::Keys object
is no longer modified. The sorted keys can be obtained by using permuting
the input keys with Keys::GetSortedValuesMap().
In a previous commit I made a version of make_ArrayHandleCast that
returned the same array if no cast was needed. That should shorten
template type names and make them easier to read. However, some
compilers were having trouble distinguishing between the two versions I
had created. This change uses an internal structure to make the
resolution easier.
The idea of the test was to turn off the "default" storage to ensure
that the fancy array was not making assumptions about the storage of its
delegate array. But there is lots of code elsewhere that uses the
default storage (rightly so) to create intermediate arrays, which will
fail if you disable the default storage. This was causing a test to
fail, so turn default storage back on for this case.
This is still more convenient than declaring DeviceAdapterAlgorithm just
to copy two arrays. Now the function works whether or not you know what
the device should be.
This is a convenience method to do a deep copy of an array. This comes
up a lot, but can be a pain if you don't have a specific device adapter
on which to do the copy.