1. Add a 'FastVec' class that copies input vector types to an efficient
Vec type on the stack. Specializations avoid copies of already efficient
types.
2. Update 'WorldCoordinatesToParametricCoordinates' functions to utilize the
'FastVec' class. This should improve performance when the passed in
vectors are of slow types like 'vtkm::VecFromPortalPermute'.
3. Since most input Vec types will convert to the same 'FastVec' type this
also reduces the code generations. Some code refactoring was required for
this.
22ea5833 iVTK-m CUDA backend doesn't use thrust::cuda::pointer any more.
Acked-by: Kitware Robot <kwrobot@kitware.com>
Acked-by: Allison Vacanti <allison.vacanti@kitware.com>
Merge-request: !1076
When the implementation for the cell derivative functions was made, the
special cases for creating the Jacobian for wedge and pyramid cell
shapes was not working. Instead, it just used the hexahedra case for a
degenerate cell. This fixes the issues with the special cases.
There were multiple issues to be fixed. There were some complaints
by the compiler about types. There was a mistake in the pyramid table.
But the biggest issue was a problem with macro expansions. It was
the classic tale of forgetting that you have to encase parameters
in parenthesis to make sure that operator precedence comes out as
expected.
Created split implementation. Parallel radix
sort calls moved to vtkm_cont library.
Added key value radix sorts. SortByKey will invoke
radix sort when the key is a fundamental C++ numeric
or character type.
Added fast path for vtkm::SortLess and vtkm::SortGreater
calls to Sort and SortByKey.
By using WrappedBinaryOperator we will not get warnings on vs2017 when
scanning <32bit arrays, and at the same time also properly support
fancy arrays.
c6565cde correcting compile error with CanvasGL
c57fad48 Depth no considered with annotations. Fix for volume renderer. Consistent color blending.
Acked-by: Kitware Robot <kwrobot@kitware.com>
Merge-request: !1068
b420a4b3 fixing typo
2bb64e65 fixing opacity for veritical bar
80081f2a adding finer grain control over color bar and scalar field label
Acked-by: Kitware Robot <kwrobot@kitware.com>
Merge-request: !1063
Cell dimension for structured data is computed by subtracting Point dimensions
by vtkm::Id3(1). This fix prevents a dimension component from being less than
1 for 2D and 1D cases.
bdb9c37e update based on issues pointed out by Robert
a713a0d8 Generalize and documentation for DeviceAdapterAlgorithm::Transform
29232c49 Revert un-intended change to examples
7ef956a9 Merge branch 'master' into connected_component
a9ed1ecf add CMakeLists.txt for header files
ba3cba64 update copyright statements
aa96874e Merge branch 'connected_component' of gitlab.kitware.com:ollielo/vtk-m into connected_component
2f07119e Merge branch 'master' into connected_component
...
Acked-by: Kitware Robot <kwrobot@kitware.com>
Acked-by: Robert Maynard <robert.maynard@kitware.com>
Merge-request: !1044
Generalize DeviceAdapterAlgorithm::Transform to accept input array of different value and storage type.
Add doxygen documentation in DeviceAdapterAlgorithm.h
The ScalarsToColors is the first step in reproducing the color
capability that exists in VTK. The next step after this is to
provide a comparable lookup table.
Visual Studio default toolset uses 32bit executables for compiling which
means that if it uses more than 4GB of memory per instance it crashes.
By moving the ConnectivityTracer into a separate compilation unit we
can help out the compiler.
This also improved compilation times and library size:
Old:
build time (j8): 48.62 real
lib size: 6.2MB
New:
build time (j8): 41.31 real
lib size: 5.0MB
Using std::tolower with std::string actually can cause undefined behavior,
and only MSVC 2017 looks to complain about it. You can find more
information on the exact reason at: http://en.cppreference.com/w/cpp/string/byte/tolower
70fcd1d1 Update CoordinateSystem to use the Virtual Array
950b12b1 Add ArrayHandleVirtualCoordinates
a4d0b57b Make ForEachValidDevice internal
59dc78fd Add ErrorBadDevice
4810cef7 Field::SetData should accept any ArrayHandle type
Acked-by: Kitware Robot <kwrobot@kitware.com>
Acked-by: Robert Maynard <robert.maynard@kitware.com>
Merge-request: !1021
Parallel radix sorting will be invoked in DeviceAdapterAlgorthmTBB.h when
the input is ArrayHandle<T, vtkm::cont::StorageTagBasic> where T is one of
the following basic C++ types:
unsigned int
unsigned short int
unsigned long int
unsigned long long int
unsigned char
char16_t
char32_t
wchar_t
char
short
int
long long
signed char
float
double
If a comparison operator is provided, it must be type std::less<T> or std::greater<T>.
Radix sort implementation is Satish parallel radix sort as documented in the
following citation:
Fast sort on CPUs and GPUs: a case for bandwidth oblivious SIMD sort.
N. Satish, C. Kim, J. Chhugani, A. D. Nguyen, V. W. Lee, D. Kim, and P. Dubey.
In Proc. SIGMOD, pages 351–362, 2010
Implementation is based on Takuya Akiba's GitHub source code with the following
changes:
- Changed parallel threading from OpenMP to TBB tasks
- Removed pair sorting
- Added minimum threshold for parallel, will instead invoke serial radix sort (kxsort)
- Added std::greater<T> and std::less<T> to interface for descending order sorts
- Added can_use_parallel_radix_sort<T, F>() function to determine if parallel radix sorting
is possible for type T and compare function F (fallback is std::sort() if not possible)
- Added linear scaling of threads used by the algorithm for more stable performance
on machines with lots of available threads (KNL and Haswell)
Added kxsort (serial MSD radix sort by Dinghua Li via GitHub) implementation without modification.
* Add FindDeviceAdapterTagAndCall
* Add support for multiple arguments to be passed to the functor in
'ForEachValidDevice' and 'FindDeviceAdapterTagAndCall'.
a9e64c4b FloatPointReturnType is float if 'T' is < 32bytes instead of being double.
Acked-by: Kitware Robot <kwrobot@kitware.com>
Acked-by: Kenneth Moreland <kmorel@sandia.gov>
Acked-by: Thomas Otahal <tjotaha@sandia.gov>
Merge-request: !1048
In trying to give error diagnostics with template definitions of invalid
types, the user encounters some pretty confusing error messages at
first. There is no way to get the compiler to give exactly the
diagnostics we want in a nice readable error message, so we are putting
some verbose instructions as comments in the code. However, a user might
not know to look at the source code since the error happens deep in an
unfamiliar (and complicated) class. Thus, add (yet another) error at the
front that gives a (hopefully) clear indication to look at the source
code for help in understanding the errors.
When one of the parameters to DispatcherBase::Invoke is incorrect, you
get an error in a strange place. It is deep in a call stack using some
heavily templated types rather than where the Invoke is actually called.
Formerly, the code was structured to give a very obfuscated error
message. Try to make this easier on users by first adding helpful hints
in the code around where the error is to explain why the error occured
and helpful advice on how to track down the problem. Second, reconfigure
the static_assert so that the compiler gives a readable error message.
Third, add some auxiliary error messages that provide additional
information about the types and classes involved in the error.
There were some recent changes to supress some CUDA errors in (among
many other places) some Math.h functions. These changes were done to
Math.h, but they also need to be reflected in Math.h.in so we can
continue to mainting the automatically generated functions.
Previously FloatPointReturnType would always be double for types that
are not float, which caused Int8/Int16 types to promote to double instead
of float.
When using vtkm::dot on narrow types you easily rollover the values.
Instead the result type of vtkm::dot should be wide enough to store the results
(32bits) when this occurs.
Fixes#193
MultiBlock now uses `diy::reduce` for reductions rather than using proxy
collectives. To support using `diy::reduce` operations on a
vtkm::cont::MultiBlock, added AssignerMultiBlock and
DecomposerMultiBlock classes. This are helper classes that provide DIY
concepts on top of a existing MultiBlock.
vtkm::Bounds and vtkm::Range now uses default copy-constructor and
assignment operator. That way `std::is_trivially_copyable` succeeds for
these basic types.
1. Add option to copy user supplied array in make_ArrayHandle.
2. Replace Field constructors that take user supplied arrays with make_Field.
3. Replace CoordinateSystem constructors that take user supplied arrays with
make_CoordinateSystem.
Updating MultiBlock to use `diy` for computing block summaries like
ranges, bounds etc. This makes it possible to MultiBlock to
work in distributed operations without explicit logic.
f9f205e9 ListCrossProduct now uses a special version for MSVC2013
c02349a8 ListCrossProduct now uses a lazy evaluation implementation
7b1b9e44 Correctly forward rvalue functors when passed to CastAndCall
Acked-by: Kitware Robot <kwrobot@kitware.com>
Merge-request: !1018
The lazy version that was implemented to get the Intel compiler to work doesn't
work under MSVC2013. Since MSVC2013 is our least c++11 conforming compiler we
add a special MSVS2013 code path, since it will be the first compiler we drop.
The intel compiler could not generate code in a timely manner ( 12+ hours ) when
asked to produce a cross product of very long lists. By moving to a lazy
evaluation scheme we now have all compilers product a cross product in a
reasonable amount of time ( 2-4 seconds ).
This resolves Issues:
- https://gitlab.kitware.com/vtk/vtk-m/issues/190
- https://gitlab.kitware.com/vtk/vtk/issues/17196
If a global static array is declared with VTKM_EXEC_CONSTANT and the code
is compiled by nvcc (for multibackend code) then the array is only accesible
on the GPU. If for some reason a worklet fails on the cuda backend and it is
re-executed on any of the CPU backends, it will continue to fail.
We couldn't find a simple way to declare the array once and have it available
on both CPU and GPU. The approach we are using here is to declare the arrays
as static inside some "Get" function which is marked as VTKM_EXEC_CONT.
f6e18ac4 Remove IntegerSequence.h as we don't need it in vtk-m anymore
7f762204 Redesign the Dispatcher to not need FunctionInterface to convert dynamic types
Acked-by: Kitware Robot <kwrobot@kitware.com>
Merge-request: !1010
203205a1 TryExecute RuntimeDeviceTracker can't be a const ref anymore.
dfb9cc62 Allow users to pass multiple arguments to TryExecute
5384305d Update tests and a single worklet to verify new CastAndCall works
2ff14a81 Allow users to pass multiple arguments to CastAndCall
Acked-by: Kitware Robot <kwrobot@kitware.com>
Merge-request: !1005