Created split implementation. Parallel radix
sort calls moved to vtkm_cont library.
Added key-value radix sorts. SortByKey will invoke
radix sort when the key is a fundamental C++ numeric
or character type.
Added fast path for vtkm::SortLess and vtkm::SortGreater
calls to Sort and SortByKey.
Using WrappedBinaryOperator avoids warnings on VS2017 when scanning
arrays of types narrower than 32 bits, while also properly supporting
fancy arrays.
Cell dimensions for structured data are computed by subtracting vtkm::Id3(1)
from the point dimensions. This fix prevents a dimension component from being
less than 1 in the 2D and 1D cases.
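A minimal sketch of the clamped computation described above, assuming a `std::array<long long, 3>` stands in for vtkm::Id3; `ComputeCellDimensions` is a hypothetical name, not the VTK-m function.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>

// Stand-in for vtkm::Id3 (assumption: three signed 64-bit components).
using Id3 = std::array<long long, 3>;

// Hypothetical helper: cell dimensions are the point dimensions minus one,
// clamped so no component drops below 1. For a 2D grid with point
// dimensions (5, 4, 1) this yields (4, 3, 1) rather than (4, 3, 0).
Id3 ComputeCellDimensions(const Id3& pointDims)
{
  Id3 cellDims{};
  for (std::size_t i = 0; i < cellDims.size(); ++i)
  {
    cellDims[i] = std::max(pointDims[i] - 1, 1LL);
  }
  return cellDims;
}
```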
bdb9c37e update based on issues pointed out by Robert
a713a0d8 Generalize and documentation for DeviceAdapterAlgorithm::Transform
29232c49 Revert un-intended change to examples
7ef956a9 Merge branch 'master' into connected_component
a9ed1ecf add CMakeLists.txt for header files
ba3cba64 update copyright statements
aa96874e Merge branch 'connected_component' of gitlab.kitware.com:ollielo/vtk-m into connected_component
2f07119e Merge branch 'master' into connected_component
...
Acked-by: Kitware Robot <kwrobot@kitware.com>
Acked-by: Robert Maynard <robert.maynard@kitware.com>
Merge-request: !1044
Generalize DeviceAdapterAlgorithm::Transform to accept input arrays of different value and storage types.
Add Doxygen documentation in DeviceAdapterAlgorithm.h.
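To illustrate what accepting inputs of different value and storage types means, here is a hypothetical serial `TransformSketch` (not the DeviceAdapterAlgorithm signature). The two inputs may be different container types with different element types; `std::deque` merely stands in for a second storage type.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <stdexcept>
#include <vector>

// Hypothetical sketch: InputA, InputB and Output are independent template
// parameters, so each may have its own element type and container ("storage").
template <typename InputA, typename InputB, typename Output, typename BinaryFunctor>
void TransformSketch(const InputA& a, const InputB& b, Output& out, BinaryFunctor f)
{
  if (a.size() != b.size())
  {
    throw std::invalid_argument("inputs must have the same length");
  }
  out.resize(a.size());
  for (std::size_t i = 0; i < a.size(); ++i)
  {
    out[i] = f(a[i], b[i]);
  }
}
```

For example, a `std::vector<int>` and a `std::deque<double>` can be combined elementwise into a `std::vector<double>`.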
Parallel radix sorting will be invoked in DeviceAdapterAlgorithmTBB.h when
the input is ArrayHandle<T, vtkm::cont::StorageTagBasic> where T is one of
the following basic C++ types:
unsigned int
unsigned short int
unsigned long int
unsigned long long int
unsigned char
char16_t
char32_t
wchar_t
char
short
int
long long
signed char
float
double
If a comparison operator is provided, it must be of type std::less<T> or std::greater<T>.
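The eligibility rule above can be written as a compile-time check. This is a hedged sketch with hypothetical names (`IsOneOf`, `CanUseRadixSortSketch`); it mirrors the listed types and comparators but is not the library's own trait.

```cpp
#include <functional>
#include <type_traits>

// True if T is exactly one of Candidates.
template <typename T, typename... Candidates>
struct IsOneOf : std::false_type {};

template <typename T, typename First, typename... Rest>
struct IsOneOf<T, First, Rest...>
  : std::conditional<std::is_same<T, First>::value, std::true_type, IsOneOf<T, Rest...>>::type
{
};

// The key types listed above.
template <typename T>
using IsRadixKey = IsOneOf<T, unsigned int, unsigned short int, unsigned long int,
                           unsigned long long int, unsigned char, char16_t, char32_t,
                           wchar_t, char, short, int, long long, signed char, float, double>;

// Only std::less<T> (ascending) and std::greater<T> (descending) qualify.
template <typename Compare, typename T>
struct IsRadixCompare
  : std::integral_constant<bool, std::is_same<Compare, std::less<T>>::value ||
                                   std::is_same<Compare, std::greater<T>>::value>
{
};

// Hypothetical check; anything else would fall back to a comparison sort.
template <typename T, typename Compare>
constexpr bool CanUseRadixSortSketch()
{
  return IsRadixKey<T>::value && IsRadixCompare<Compare, T>::value;
}
```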
The radix sort implementation is the Satish et al. parallel radix sort, as documented
in the following citation:
Fast sort on CPUs and GPUs: a case for bandwidth oblivious SIMD sort.
N. Satish, C. Kim, J. Chhugani, A. D. Nguyen, V. W. Lee, D. Kim, and P. Dubey.
In Proc. SIGMOD, pages 351–362, 2010
The implementation is based on Takuya Akiba's GitHub source code, with the following
changes:
- Changed parallel threading from OpenMP to TBB tasks
- Removed pair sorting
- Added a minimum size threshold for parallel sorting; below it, the serial radix sort (kxsort) is invoked instead
- Added std::greater<T> and std::less<T> to interface for descending order sorts
- Added can_use_parallel_radix_sort<T, F>() function to determine if parallel radix sorting
is possible for type T and compare function F (fallback is std::sort() if not possible)
- Added linear scaling of threads used by the algorithm for more stable performance
on machines with lots of available threads (KNL and Haswell)
Added the kxsort implementation (a serial MSD radix sort by Dinghua Li, from GitHub) without modification.
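For readers unfamiliar with MSD (most-significant-digit) radix sort, the following is a small illustrative version for 32-bit unsigned keys. It is neither kxsort nor the Satish et al. algorithm; the cutoff of 32 elements and the fallback to `std::sort` for small buckets are arbitrary choices made for this sketch.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

namespace sketch
{
// Sorts v[lo, hi) by the byte selected by `shift` (24, 16, 8, 0), most
// significant byte first, then recurses into each of the 256 buckets.
void MsdRadixSort(std::vector<std::uint32_t>& v, std::size_t lo, std::size_t hi, int shift,
                  std::vector<std::uint32_t>& scratch)
{
  if (hi - lo < 32) // small bucket: a comparison sort is cheaper
  {
    std::sort(v.begin() + lo, v.begin() + hi);
    return;
  }
  // Histogram shifted by one so the prefix sum gives each bucket's start.
  std::size_t start[257] = {};
  for (std::size_t i = lo; i < hi; ++i)
  {
    ++start[((v[i] >> shift) & 0xFFu) + 1];
  }
  for (int b = 0; b < 256; ++b)
  {
    start[b + 1] += start[b];
  }
  // Scatter into scratch by bucket, then copy back.
  std::size_t next[256];
  std::copy(start, start + 256, next);
  for (std::size_t i = lo; i < hi; ++i)
  {
    scratch[lo + next[(v[i] >> shift) & 0xFFu]++] = v[i];
  }
  std::copy(scratch.begin() + lo, scratch.begin() + hi, v.begin() + lo);
  if (shift == 0)
  {
    return; // all bytes consumed
  }
  for (int b = 0; b < 256; ++b)
  {
    MsdRadixSort(v, lo + start[b], lo + start[b + 1], shift - 8, scratch);
  }
}

void RadixSort(std::vector<std::uint32_t>& v)
{
  std::vector<std::uint32_t> scratch(v.size());
  MsdRadixSort(v, 0, v.size(), 24, scratch);
}
} // namespace sketch
```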
* Add FindDeviceAdapterTagAndCall
* Add support for multiple arguments to be passed to the functor in
'ForEachValidDevice' and 'FindDeviceAdapterTagAndCall'.
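A hedged sketch of forwarding extra arguments to a functor for every entry in a type list, in the spirit of 'ForEachValidDevice'. The tags, `TagList`, `ForEachTag`, and `AppendName` are hypothetical stand-ins; the real VTK-m functions also check device validity, which is not modeled here.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical tags and list type (not VTK-m's device adapter tags).
struct SerialTag { static const char* Name() { return "Serial"; } };
struct TbbTag { static const char* Name() { return "TBB"; } };
template <typename... Tags>
struct TagList {};

// Base case: no tags left.
template <typename Functor, typename... Args>
void ForEachTag(TagList<>, Functor&&, Args&&...) {}

// Calls f(Tag{}, args...) for each tag in order; the extra arguments are
// passed to every call (as lvalues, so nothing is moved-from early).
template <typename Tag, typename... Rest, typename Functor, typename... Args>
void ForEachTag(TagList<Tag, Rest...>, Functor&& f, Args&&... args)
{
  f(Tag{}, args...);
  ForEachTag(TagList<Rest...>{}, std::forward<Functor>(f), std::forward<Args>(args)...);
}

// Example functor taking two arguments besides the tag.
struct AppendName
{
  template <typename Tag>
  void operator()(Tag, std::vector<std::string>& out, const std::string& prefix) const
  {
    out.push_back(prefix + Tag::Name());
  }
};
```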
When using vtkm::dot on narrow types, the values easily roll over.
Instead, the result type of vtkm::dot should be wide enough (32 bits) to store
the result when this occurs.
Fixes #193
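The rollover is easy to reproduce. This sketch (hypothetical `DotSketch`, with `std::array` in place of vtkm::Vec) takes the result type from `T * T`, which integral promotion makes `int` for 8-bit operands. That is one way to get a wide-enough result, not necessarily how vtkm::dot selects its return type.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Result type of T * T: int for narrow integer types (32 bits on common
// platforms), T itself for float/double.
template <typename T>
using DotResult = decltype(T{} * T{});

template <typename T, std::size_t N>
DotResult<T> DotSketch(const std::array<T, N>& a, const std::array<T, N>& b)
{
  DotResult<T> sum{};
  for (std::size_t i = 0; i < N; ++i)
  {
    sum += a[i] * b[i];
  }
  return sum;
}

// For comparison: accumulating into T itself wraps around for narrow types.
template <typename T, std::size_t N>
T NarrowDot(const std::array<T, N>& a, const std::array<T, N>& b)
{
  T sum{};
  for (std::size_t i = 0; i < N; ++i)
  {
    sum = static_cast<T>(sum + a[i] * b[i]);
  }
  return sum;
}
```

With three components of 200 and 2, the true dot product is 1200, while the 8-bit accumulator ends at 1200 mod 256 = 176.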
MultiBlock now uses `diy::reduce` for reductions rather than using proxy
collectives. To support using `diy::reduce` operations on a
vtkm::cont::MultiBlock, added AssignerMultiBlock and
DecomposerMultiBlock classes. These are helper classes that provide DIY
concepts on top of an existing MultiBlock.
1. Add an option to copy the user-supplied array in make_ArrayHandle.
2. Replace Field constructors that take user-supplied arrays with make_Field.
3. Replace CoordinateSystem constructors that take user-supplied arrays with
make_CoordinateSystem.
Update MultiBlock to use `diy` for computing block summaries such as
ranges, bounds, etc. This makes it possible for MultiBlock to
work in distributed operations without explicit logic.
f9f205e9 ListCrossProduct now uses a special version for MSVC2013
c02349a8 ListCrossProduct now uses a lazy evaluation implementation
7b1b9e44 Correctly forward rvalue functors when passed to CastAndCall
Acked-by: Kitware Robot <kwrobot@kitware.com>
Merge-request: !1018
The Intel compiler could not generate code in a timely manner (12+ hours) when
asked to produce a cross product of very long lists. By moving to a lazy
evaluation scheme, all compilers now produce a cross product in a
reasonable amount of time (2-4 seconds).
This resolves Issues:
- https://gitlab.kitware.com/vtk/vtk-m/issues/190
- https://gitlab.kitware.com/vtk/vtk/issues/17196
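For readers who have not seen a type-list cross product, this is a plain recursive formulation. It is not the lazy-evaluation scheme the commit introduces, and `List`, `Concat`, `PairWithAll`, and `CrossProductSketch` are hypothetical names; it only shows what result the metaprogram has to produce.

```cpp
#include <type_traits>
#include <utility>

template <typename... Ts>
struct List {};

// Concatenate two lists.
template <typename L1, typename L2>
struct Concat;
template <typename... As, typename... Bs>
struct Concat<List<As...>, List<Bs...>>
{
  using type = List<As..., Bs...>;
};

// Pair one type T with every element of a list.
template <typename T, typename L>
struct PairWithAll;
template <typename T, typename... Bs>
struct PairWithAll<T, List<Bs...>>
{
  using type = List<std::pair<T, Bs>...>;
};

// Cross product: each A in the first list paired with every B in the second.
template <typename L1, typename L2>
struct CrossProductSketch;
template <typename L2>
struct CrossProductSketch<List<>, L2>
{
  using type = List<>;
};
template <typename L2, typename A, typename... As>
struct CrossProductSketch<List<A, As...>, L2>
{
  using type = typename Concat<typename PairWithAll<A, L2>::type,
                               typename CrossProductSketch<List<As...>, L2>::type>::type;
};
```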
f6e18ac4 Remove IntegerSequence.h as we don't need it in vtk-m anymore
7f762204 Redesign the Dispatcher to not need FunctionInterface to convert dynamic types
Acked-by: Kitware Robot <kwrobot@kitware.com>
Merge-request: !1010
Previously we allowed a const reference because we would make a copy; this only
worked because RuntimeDeviceTracker implements its state through a shared_ptr.
By requiring modifiable types only, we can make TryExecute more efficient and
clearer about what it does.
By using perfect forwarding we not only reduce the number of TryExecute
signatures, but also enable passing temporary functors to
TryExecute.
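A minimal sketch of a perfect-forwarding TryExecute-style helper, assuming hypothetical `DeviceA`/`DeviceB` tags, a `DeviceList`, and a `TrackerSketch` in place of RuntimeDeviceTracker. The tracker is taken by non-const reference and the functor by forwarding reference, so a temporary functor can be passed directly and one signature covers any argument list.

```cpp
#include <cassert>
#include <utility>

// Hypothetical stand-ins for device adapter tags and a runtime tracker.
struct DeviceA { static constexpr int Id = 0; };
struct DeviceB { static constexpr int Id = 1; };
template <typename... Devices>
struct DeviceList {};

struct TrackerSketch
{
  bool Enabled[2] = { false, true }; // DeviceA disabled, DeviceB enabled
  bool CanRunOn(int id) const { return Enabled[id]; }
};

// Base case: no device accepted the work.
template <typename Functor, typename... Args>
bool TryExecuteSketch(DeviceList<>, TrackerSketch&, Functor&&, Args&&...)
{
  return false;
}

// Tries each enabled device in order until the functor reports success.
template <typename Device, typename... Rest, typename Functor, typename... Args>
bool TryExecuteSketch(DeviceList<Device, Rest...>, TrackerSketch& tracker, Functor&& f,
                      Args&&... args)
{
  if (tracker.CanRunOn(Device::Id) && f(Device{}, args...))
  {
    return true;
  }
  return TryExecuteSketch(DeviceList<Rest...>{}, tracker, std::forward<Functor>(f),
                          std::forward<Args>(args)...);
}

// Example functor: records which device ran it and reports success.
struct RecordDevice
{
  template <typename Device>
  bool operator()(Device, int& ranOn) const
  {
    ranOn = Device::Id;
    return true;
  }
};
```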
At the same time we have optimized TryExecute by moving the string generation
code into a single function that is compiled into the vtkm_cont library.
The end result is that the vtkm_rendering library size has been reduced from
12MB to 11MB, and we shave off about 5% of our build time.
5ada2812 Some fixes for CellLocatorTwoLevelUniformGrid
Acked-by: Kitware Robot <kwrobot@kitware.com>
Acked-by: Robert Maynard <robert.maynard@kitware.com>
Merge-request: !979
The implementation of the simplified version of
DeviceAdapterAlgorithmGeneral::Unique had two errors.
First, the implementation calls the more complex version
of Unique (which specifies a binary predicate to establish equality).
However, it was not calling the Unique method in the DerivedAlgorithm
like it should have been. Instead, it was calling its own Unique
algorithm, which might not be as efficient as the specialized Unique for
the device.
Second, it was using std::equal_to as its binary predicate. Using
functors from std can be dangerous because they are not marked with
VTKM_EXEC, and so may not work in the execution
environment. Instead, use the readily available vtkm::Equal binary
predicate.
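Both fixes can be seen in a small CRTP sketch (all names hypothetical, with `EqualSketch` standing in for vtkm::Equal): the one-argument Unique forwards to `DerivedAlgorithm::Unique`, so a device-specific two-argument overload is picked up instead of the generic fallback.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Stand-in for vtkm::Equal.
struct EqualSketch
{
  template <typename T>
  bool operator()(const T& a, const T& b) const { return a == b; }
};

// CRTP base in the style of DeviceAdapterAlgorithmGeneral.
template <typename DerivedAlgorithm>
struct GeneralSketch
{
  template <typename T>
  static void Unique(std::vector<T>& values)
  {
    // Dispatch through the derived algorithm, not GeneralSketch::Unique.
    DerivedAlgorithm::Unique(values, EqualSketch{});
  }

  template <typename T, typename BinaryPredicate>
  static void Unique(std::vector<T>& values, BinaryPredicate pred)
  {
    // Generic fallback: keep the first element of each run of equal values.
    std::size_t out = 0;
    for (std::size_t i = 0; i < values.size(); ++i)
    {
      if (out == 0 || !pred(values[out - 1], values[i]))
      {
        values[out++] = values[i];
      }
    }
    values.resize(out);
  }
};

// A "device" with its own two-argument Unique, which hides the base version.
struct SerialSketch : GeneralSketch<SerialSketch>
{
  using GeneralSketch<SerialSketch>::Unique; // keep the one-argument overload visible
  static int SpecializedCalls;

  template <typename T, typename BinaryPredicate>
  static void Unique(std::vector<T>& values, BinaryPredicate pred)
  {
    ++SpecializedCalls;
    values.erase(std::unique(values.begin(), values.end(), pred), values.end());
  }
};
int SerialSketch::SpecializedCalls = 0;
```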
The implementation of ScanExclusiveByKey in
DeviceAdapterAlgorithmGeneral works by shifting values in the input values
array and then calling ScanInclusiveByKey. However, the temporary
shifted values array was created using the key type instead of the
value type. This caused a compile error when the keys and values had
different types.
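A serial sketch of the shift-then-inclusive-scan approach, with hypothetical names. The shifted temporary is a `std::vector<V>` of the value type, so the key type `K` and value type `V` may differ; each key segment is seeded with `init`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

template <typename K, typename V>
std::vector<V> ScanExclusiveByKeySketch(const std::vector<K>& keys,
                                        const std::vector<V>& values,
                                        const V& init)
{
  const std::size_t n = keys.size();
  // Shift values right by one within each segment; uses V, not K.
  std::vector<V> shifted(n);
  for (std::size_t i = 0; i < n; ++i)
  {
    const bool segmentStart = (i == 0) || !(keys[i] == keys[i - 1]);
    shifted[i] = segmentStart ? init : values[i - 1];
  }
  // Inclusive scan by key over the shifted values.
  std::vector<V> result(n);
  for (std::size_t i = 0; i < n; ++i)
  {
    const bool segmentStart = (i == 0) || !(keys[i] == keys[i - 1]);
    result[i] = segmentStart ? shifted[i] : result[i - 1] + shifted[i];
  }
  return result;
}
```

With `int` keys {0, 0, 0, 1, 1}, `double` values {1.5, 2.5, 3.0, 4.0, 5.0}, and `init` 0.0, the result is {0.0, 1.5, 4.0, 0.0, 4.0}.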
1. Fix incorrect computation of grid dimensions.
2. Add checks for an empty bounding box of bins.
3. Work around issues caused by floating point precision.
For std::copy to optimize a copy to memcpy, the value type must be both
trivially constructible and trivially copyable.
The new copy benchmarks highlighted that std::copy'ing pairs
and vecs was not optimized to memcpy. For a 256 MiB buffer on my
laptop w/ GCC, the serial copy speeds were:
UInt8: 10.10 GiB/s
Vec<UInt8, 2>: 3.12 GiB/s
Pair<UInt32, Float32>: 6.92 GiB/s
After this patch, the optimization applies and a bitwise copy is performed:
UInt8: 10.12 GiB/s
Vec<UInt8, 2>: 9.66 GiB/s
Pair<UInt32, Float32>: 9.88 GiB/s
Checks were also added to the Vec and Pair unit tests to ensure that
these classes continue to be trivial.
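The triviality requirement can be checked at compile time with standard type traits. In this sketch, `Vec2Trivial` and `Vec2NonTrivial` are hypothetical stand-ins for vtkm::Vec: a defaulted default constructor keeps the type trivial, while a user-provided empty one does not, even though both remain trivially copyable.

```cpp
#include <cassert>
#include <type_traits>

template <typename T>
struct Vec2Trivial
{
  Vec2Trivial() = default; // defaulted: the type stays trivial
  Vec2Trivial(T x, T y) : Components{ x, y } {}
  T Components[2];
};

template <typename T>
struct Vec2NonTrivial
{
  Vec2NonTrivial() {} // user-provided: no longer trivially default constructible
  Vec2NonTrivial(T x, T y) : Components{ x, y } {}
  T Components[2];
};

static_assert(std::is_trivial<Vec2Trivial<unsigned char>>::value,
              "defaulted ctor keeps the vec trivial");
static_assert(std::is_trivially_copyable<Vec2NonTrivial<unsigned char>>::value,
              "still bitwise copyable");
static_assert(!std::is_trivial<Vec2NonTrivial<unsigned char>>::value,
              "user-provided ctor breaks triviality");
```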
The ArrayHandleSwizzle test was refactored a bit to eliminate a new
'possibly uninitialized memory' warning introduced with the default
Vec ctors.
In generic code, it's a pain to use the equality operators since they
require the ValueType and Storage to match; otherwise the operator is undefined.
This commit adds operators for such comparisons, as well as a unit test.
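A hedged sketch of such heterogeneous operators on a hypothetical `HandleSketch<T, Storage>`. It assumes that two handles compare equal when they share a buffer and that handles with a different value type or storage always compare unequal; partial ordering picks the same-type overload when the types match.

```cpp
#include <cassert>
#include <memory>

struct StorageBasic {};
struct StorageOther {};

template <typename T, typename Storage>
struct HandleSketch
{
  std::shared_ptr<T> Buffer = std::make_shared<T>();
};

// Same value type and storage: equal when sharing the same buffer.
template <typename T, typename S>
bool operator==(const HandleSketch<T, S>& a, const HandleSketch<T, S>& b)
{
  return a.Buffer == b.Buffer;
}

// Different value type or storage: defined, and always false.
template <typename T1, typename S1, typename T2, typename S2>
bool operator==(const HandleSketch<T1, S1>&, const HandleSketch<T2, S2>&)
{
  return false;
}

template <typename T1, typename S1, typename T2, typename S2>
bool operator!=(const HandleSketch<T1, S1>& a, const HandleSketch<T2, S2>& b)
{
  return !(a == b);
}
```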
8fabece1 Use median point from cluster as representative vertex.
c7bf0c95 Compute PointIdMap while reducing cluster ids.
5dee7c6a Select input point from cluster rather than averaging.
28e76ddb Update vertex clustering benchmarking code.
e3c9e7bb Optimize cell map computation.
d7669650 Use requested grid in VertexClustering worklet.
0472dc11 Fix warning on Cuda.
3f4e17e2 Add field mapping to VertexClustering.
...
Acked-by: Kitware Robot <kwrobot@kitware.com>
Acked-by: Robert Maynard <robert.maynard@kitware.com>
Merge-request: !960
assert(false && ""); emitted a
"warning : controlling expression is constant"
Replace the assertion with an exception, which is more appropriate here
anyway.
Often we don't care about the output keys, and it's useful to
be able to pass an ArrayHandleDiscard into the algorithm to save
memory in these cases.
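The same idea can be sketched with standard-library iterators: an output that accepts writes and drops them. `DiscardIterator` and `ReduceByKeySketch` are hypothetical names, not ArrayHandleDiscard or the VTK-m ReduceByKey.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <vector>

// Output iterator that ignores everything written to it.
struct DiscardIterator
{
  using iterator_category = std::output_iterator_tag;
  using value_type = void;
  using difference_type = std::ptrdiff_t;
  using pointer = void;
  using reference = void;

  struct Sink
  {
    template <typename T>
    Sink& operator=(const T&) { return *this; } // drop the value
  };
  Sink operator*() const { return Sink{}; }
  DiscardIterator& operator++() { return *this; }
  DiscardIterator operator++(int) { return *this; }
};

// Serial reduce-by-key (sum) over runs of equal keys. Returns the number of runs.
template <typename K, typename V, typename KeyOut, typename ValueOut>
std::size_t ReduceByKeySketch(const std::vector<K>& keys, const std::vector<V>& values,
                              KeyOut keysOut, ValueOut valuesOut)
{
  std::size_t segments = 0;
  std::size_t i = 0;
  while (i < keys.size())
  {
    V sum = values[i];
    std::size_t j = i + 1;
    while (j < keys.size() && keys[j] == keys[i])
    {
      sum = sum + values[j];
      ++j;
    }
    *keysOut++ = keys[i];
    *valuesOut++ = sum;
    ++segments;
    i = j;
  }
  return segments;
}
```

Passing `DiscardIterator{}` for the keys means no storage is allocated for them, while the sums still go to a real container.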
In a previous commit I made a version of make_ArrayHandleCast that
returned the same array if no cast was needed. That should shorten
template type names and make them easier to read. However, some
compilers were having trouble distinguishing between the two versions I
had created. This change uses an internal structure to make the
resolution easier.
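The commit does not show the internal structure it uses, so the following is only one common way to do it, with hypothetical names: a helper struct, specialized on whether the value types already match, chooses the return type, leaving a single public function instead of two overloads the compiler must disambiguate.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>
#include <vector>

// Hypothetical cast wrapper standing in for ArrayHandleCast.
template <typename TargetT, typename ArrayT>
struct CastArraySketch
{
  ArrayT Source;
  TargetT Get(std::size_t i) const { return static_cast<TargetT>(Source[i]); }
};

namespace detail
{
// Primary template: a cast is needed, so wrap the array.
template <typename TargetT, typename ArrayT,
          bool SameType = std::is_same<TargetT, typename ArrayT::value_type>::value>
struct MakeCastImpl
{
  using ReturnType = CastArraySketch<TargetT, ArrayT>;
  static ReturnType Make(const ArrayT& a) { return ReturnType{ a }; }
};

// Value type already matches: return the array unchanged.
template <typename TargetT, typename ArrayT>
struct MakeCastImpl<TargetT, ArrayT, true>
{
  using ReturnType = ArrayT;
  static ReturnType Make(const ArrayT& a) { return a; }
};
} // namespace detail

// Single entry point; the internal struct selects the behavior.
template <typename TargetT, typename ArrayT>
typename detail::MakeCastImpl<TargetT, ArrayT>::ReturnType MakeCastSketch(const ArrayT& a)
{
  return detail::MakeCastImpl<TargetT, ArrayT>::Make(a);
}
```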
The idea of the test was to turn off the "default" storage to ensure
that the fancy array was not making assumptions about the storage of its
delegate array. But a lot of code elsewhere uses the
default storage (rightly so) to create intermediate arrays, and that code will
fail if you disable the default storage. This was causing a test to
fail, so turn default storage back on for this case.
This is still more convenient than declaring DeviceAdapterAlgorithm just
to copy two arrays. Now the function works whether or not you know what
the device should be.
This is a convenience method to do a deep copy of an array. This comes
up a lot, but can be a pain if you don't have a specific device adapter
on which to do the copy.
There are still some warnings left:
* Some text in markdown files is incorrectly picked up as
doxygen commands
* ArrayPortalTransform weirdly inherits from a specialized
version of itself. It's technically correct C++ code, but
gives doxygen fits.
Sandia National Laboratories recently changed management from the
Sandia Corporation to the National Technology & Engineering Solutions
of Sandia, LLC (NTESS). The copyright statements need to be updated
accordingly.
3b03177c Add TBB specialization of Unique.
94d668dd Add serial version of Unique.
Acked-by: Kitware Robot <kwrobot@kitware.com>
Acked-by: Robert Maynard <robert.maynard@kitware.com>
Merge-request: !933
TBB's ReduceByKey was using the generic DeviceAdapterGeneral
implementation and was about 50x slower than the serial implementation,
which is very efficient.
This patch improves TBB's RBK implementation significantly, though it still
does not scale well. On a quad-core processor, this implementation performs
comparably to or slightly worse than the highly efficient serial algorithm.
More than 4 cores may be needed to see enough parallel speedup to
overcome the TBB overhead, and grain size does not seem to affect the
performance significantly.
Previously, ConvertNumComponentsToOffsets always used TryCompile on the
global set of runtime devices. That is still the default behavior, but
now you are able to specify your own runtime tracker. Also, there are
now versions of ConvertNumComponentsToOffsets that take a device adapter
tag.
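For context, the computation itself amounts to an exclusive prefix sum of the component counts. This is a hedged serial sketch (hypothetical `NumComponentsToOffsetsSketch`); device selection via a runtime tracker or device adapter tag is not modeled.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// offsets[i] is where entry i starts; totalSize receives the sum of all counts.
std::vector<long long> NumComponentsToOffsetsSketch(const std::vector<int>& numComponents,
                                                    long long& totalSize)
{
  std::vector<long long> offsets(numComponents.size());
  long long running = 0;
  for (std::size_t i = 0; i < numComponents.size(); ++i)
  {
    offsets[i] = running;
    running += numComponents[i];
  }
  totalSize = running;
  return offsets;
}
```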
Previously, once an ArrayHandle was stolen it was placed in an invalid state
where it could not be used again by VTK-m. Now, after being stolen it
is placed into a state where it is identical to memory allocated outside
of VTK-m and passed in.
By using the auto keyword and decltype we can reduce the number of
complex typedefs that exist when writing device adapter algorithms.
The goal is to make it easier for developers to see the actual
algorithms being implemented by reducing the amount of template
'noise'.
Previously ArrayHandleReverse would only work if it was provided an explicit
user array to map to. But this need not be so: if a user wants to
start by constructing an ArrayHandleReverse, we should allow that.
A side effect is that some very tricky code in the DeviceAdapters,
which was added explicitly to allow output to ArrayHandleReverse, can be removed.
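A minimal sketch of a reverse array that can be constructed empty, allocated later, and written through, using a hypothetical `ReverseArraySketch` over a `std::vector` rather than the ArrayHandleReverse API. Index `i` maps to `size - 1 - i` in the underlying storage.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

template <typename T>
class ReverseArraySketch
{
public:
  ReverseArraySketch() = default; // no user array needed up front
  explicit ReverseArraySketch(std::vector<T> data) : Data(std::move(data)) {}

  void Allocate(std::size_t n) { Data.resize(n); }
  std::size_t size() const { return Data.size(); }

  // Reads and writes land in reversed positions of the underlying storage.
  T& operator[](std::size_t i) { return Data[Data.size() - 1 - i]; }
  const T& operator[](std::size_t i) const { return Data[Data.size() - 1 - i]; }

  const std::vector<T>& Underlying() const { return Data; }

private:
  std::vector<T> Data;
};
```

Because writes go straight to the mapped positions, the view can be used as an algorithm output without special-case code.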
75517554 Move check for cell variables so it gets executed.
147247e8 Code formatting changes and compiler warning fixes.
a3fd135b Fix errors and warnings on Mac and Windows
347af497 Poly Data for External Faces
aeed7a07 Cell variables for External Faces
ad13e9b4 Merge branch 'master' into external-faces-production
ab25c160 External Faces Uniform and Rectilinear grids
Acked-by: Kitware Robot <kwrobot@kitware.com>
Acked-by: Kenneth Moreland <kmorel@sandia.gov>
Merge-request: !860