Also
- Renamed vtkm::cont::make_DeviceAdapterIdFromName to just overload
make_DeviceAdapterId.
- Refactored CMake logic for unit tests
- Since we're now querying the device tracker for the names, they
cannot be all caps.
- Updated usages of InitLogging to use Initialize instead.
- Added changelog.
VTK-m has been updated to replace old per device worklet testing executables with a device
dependent shared library so that it's able to accept a device adapter
at runtime.
Meanwhile, it updates the testing infrastructure APIs. vtkm::cont::testing::Run
function would call ForceDevice when needed and if users need the device
adapter info at runtime, RunOnDevice function would pass the adapter into the functor.
Optional Parser is bumped from 1.3 to 1.7.
By making RuntimeDeviceInformation class template independent, vtkm is
able to detect
device info at runtime with a runtime specified deviceId. In the past
it's impossible
because the CRTP pattern does not allow function overloading(compiler
would complain
that DeviceAdapterRuntimeDetector does not have Exists() function
defined).
Now that the dispatcher does its own TryExecute, filters do not need to
do that. This change requires all worklets called by filters to be able
to execute without knowing the device a priori.
Implement more general versions of `test_equal_ArrayHandles`, `test_equal_CellSets`, `test_equal_Fields`, and `test_equal_DataSets` functions and put them
in vtkm/cont/testing/Testing.hi with the hope that they will be useful for
others also.
This is a subclass of ExecutionObject and a superset of its
functionality. In addition to having a PrepareForExecution method, it
also has a PrepareForControl method that gets an object appropriate for
the control environment. This is helpful for situations where you need
code to work in both environments, such as the functor in an
ArrayHandleTransform.
Also added several runtime checks for execution objects and execution
and cotnrol objects.
This change allows you to set a subclass of
vtkm::cont::ExecutionObjectBase as a functor
used in ArrayHandleTransform. This latter class will then detect that
the functor is an ExecObject and will call PrepareForExecution with the
appropriate device to get the actual Functor object.
This change allows you to use virtual objects and other device dependent
objects as functors for ArrayHandleTransform without knowing a priori
what device the portal will be used on.
Rather than force all dispatchers to be templated on a device adapter,
instead use a TryExecute internally within the invoke to select a device
adapter.
Because this removes the need to declare a device when invoking a
worklet, this commit also removes the need to declare a device in
several other areas of the code.
Adds a fancy array handle that restricts access to an array to some
window of values. It takes a start offset and a size and represents the
values between that start offset and size past that.
554bc3d36 At runtime TryExecute supports a specific deviceId to execute on.
Acked-by: Kitware Robot <kwrobot@kitware.com>
Acked-by: Kenneth Moreland <kmorel@sandia.gov>
Merge-request: !1334
The OpenMP Device Reduction algorithm previously used a std::vector<T>
to store the reduction results of each thread. This caused problems
when T=bool as the types became a proxy type which isn't usable
with vtkm BinaryOperators.
Additionally by fixing this issue in the FunctorsOpenMP we
can remove a workaround in FunctorsGeneral that caused
compile failures when using complex BinaryOperators
such as MinAndMax.
std::random_shuffle is deprecated in C++14 because it's using std::rand
which uses a non uniform distribution and the underlying algorithm is
unspecified. Using std::shuffle can provide a reliable result in a 64
bit version.
Previously ArrayHandleBasicImpl had no support for OpenMP since
we forgot to update the implementation. This version will
work when adding new devices without any changes.
96ae94420 Simplified execution object creation for atomic array
0bd197af9 moved TwoLevelUniformGridExecutionObject to vtkm/exec/internal
6ce895be8 simplified how atomic arrays create execution objects
f1ee5b92a fix a rebase error
25d140361 fix bad rabse for wireframer
f892695f1 fixing so wierd merging issue
9bb00ec66 moved the execution object for TwoLevelUniform grid to vrkm::exec
db1c9bfee Change the namespacing of atomic array
...
Acked-by: Kitware Robot <kwrobot@kitware.com>
Acked-by: Robert Maynard <robert.maynard@kitware.com>
Merge-request: !1243
Most of the arguments given to device adapter algorithms are actually
control-side arguments that get converted to execution objects internally
(usually a `vtkm::cont::ArrayHandle`). However, some of the algorithms,
take an argument that is passed directly to the execution environment, such
as the predicate argument of `Sort`. If the argument is a plain-old-data
(POD) type, which is common enough, then you can just pass the object
straight through. However, if the object has any special elements that have
to be transferred to the execution environment, such as internal arrays,
passing this to the `vtkm::cont::Algorithm` functions becomes
problematic.
To cover this use case, all the `vtkm::cont::Algorithm` functions now
support automatically transferring objects that support the `ExecObject`
worklet convention. If any argument to any of the `vtkm::cont::Algorithm`
functions inherits from `vtkm::cont::ExecutionObjectBase`, then the
`PrepareForExecution` method is called with the device the algorithm is
running on, which allows these device-specific objects to be used without
the hassle of creating a `TryExecute`.
1. Have a per-thread pinned array for cuda errors
2. Check for errors before scheduling new tasks and at explicit sync points
3. Remove explicit synchronizations from most places
Addresses part 2 of #168
There was an error in TestingPointLocatorUniformGrid in which it was
creating arrays of type vtkm::Float32 and passing them to a worklet that
expected vtkm::FloatDefault. This is corrected.
Previously when PointLocatorUniformGrid.h was compiled by the CUDA
compiler, the compiler would crash. Apparently during the ptxas
part of the compiler goes into a crazy recursion and runs out of
stack space. This appears to be a long-standing bug in CUDA
(been there for multiple releases) without a clear reason why it
sometimes rears its ugly head. (See for example
https://devtalk.nvidia.com/default/topic/1028825/cuda-programming-and-performance/-ptxas-died-with-status-0xc00000fd-stack_overflow-/)
The problem appears to be when having a doubly or triply nested
loop over a box of values to check in the uniform array. This
appears to fix the problem by converting that to a single for
loop with some index magic to convert that to 3D indices.
Found via `codespell` and `grep`
more typos
includes source typo change and a typo that needs further review
follow-up typos
Follow-up typos
Revert a commit
Previously, it was not possible to call Allocate or Shrink on an
implicit storage. The reason for this is that the implicit storage does
not represent any real memory and any attempt to modify it is wrong.
However, there are some rare cases where ArrayHandle will attempt to
"allocate" the storage even when behaving in a read-only manner. The use
case this is being created for is when an ArrayHandleImplicit first
calls ReleaseResources and then calls ReleaseResourcesExecution (or
anything else that tries to get a control-side portal). In this case,
the ReleaseResources makes the control side portal invalid and the
ReleaseResourcesExecution attempts to make it valid again by allocating
the storage to size 0. This change solves the problem by allowing the
implicit storage to be "allocated" to something smaller than originally
created.
157894436 Enable Vec construction with intializer_list of single value
79afe2a16 Simplify make_ArrayHandleSwizzle
ae8d994d2 Add support for initializer lists in Vec
Acked-by: Kitware Robot <kwrobot@kitware.com>
Acked-by: Allison Vacanti <allison.vacanti@kitware.com>
Merge-request: !1239
- Adding API files
- Adding back Manish's BoundingIntervalHierarchy search structure
- Updating CMakeLists.txt to accomodate these changes
- Adding the old test file from Manish - won't build for now
-Changing the existing CellLocator.h to CellLocatorHelper.h,
it's used by CellLocatorTwoLevelUniformGrid.h
-Changing unit tests and worklets that use CellLocator.h to use CellLocatorHelper.h
183bcf109 Add initial version of an OpenMP backend.
7b5ad3e80 Expand device scheduler test to check for overlap.
e621b6ba3 Generalize the TBB radix sort implementation.
d60278434 Specialize swap for ArrayPortalValueReference types.
761f8986f Cache inputs to SSI::Unique benchmark.
Acked-by: Kitware Robot <kwrobot@kitware.com>
Acked-by: Robert Maynard <robert.maynard@kitware.com>
Merge-request: !1099
This fix addresses issue #110. Test functions in TestingDeviceAdapter have
been updated to test large arrays through random indices using
std::uniform_int_distribution rather than testing every 100th or 50th value.
1c5feeb1 Make sure all device specific tests use the intended device.
30205be8 Make sure PointLocatorUniformGrid always uses the provided device adapter
74df09fb Remove unneeded trailing ; from PointLocatorUniformGrid
Acked-by: Kitware Robot <kwrobot@kitware.com>
Acked-by: Allison Vacanti <allison.vacanti@kitware.com>
Merge-request: !1209
c3c8d0fd Refactor ArrayHandleCompositeVector to simplify usage and impl.
ec88b7dd Markup array portals for use in the exec env.
3159b376 Make Swizzle and ExtractComponent array parameters runtime vars.
19344707 Add vtkm_taotuple to build system.
d23c4452 Merge branch 'upstream-taotuple' into tmp
b3b14de7 taotuple 2018-05-15 (e3de8c97)
b00f6c1c Add taocpp/tuple as a thirdparty package.
Acked-by: Kenneth Moreland <kmorel@sandia.gov>
Merge-request: !1199
- Use tao::tuple instead of FunctionInterface to hold array/portal
collections.
- Type signatures are simplified. Now just use:
- ArrayHandleCompositeVector<ArrayT1, ArrayT2, ...>
- make_ArrayHandleCompositeVector(array1, array2, ...)
instead of relying on helper structs to determine types.
- No longer support component selection from an input array. All
input arrays must have the same ValueType (See ArrayHandleSwizzle
and ArrayHandleExtractComponent as the replacements for these
usecases.
This means that we not only setup the runtime device tracker
to force the intended device, it also means making sure
the default device is the error device.
The previous implementation of DeviceAdapterRuntimeDetector caused
multiple differing definitions of the same class to exist and
was causing the runtime device tracker to report CUDA as disabled
when it actually was enabled.
The ODR was caused by having a default implementation for
DeviceAdapterRuntimeDetector and a specific specialization for
CUDA. If a library had both CUDA and C++ sources it would pick up
both implementations and would have undefined behavior. In general
it would think the CUDA backend was disabled.
To avoid this kind of situation in the future I have reworked VTK-m
so that each device adapter must implement DeviceAdapterRuntimeDetector
for that device.
Modify PointLocatorUniform::Build to return an ExecutionObjec, Locator.
The Locator can the be passed to Worklets for finding neareast neighbor
point by calling the FindNeareastPoint method.
Somehow deducing the parameters to a static function was causing
gcc to segfault and crash. Changed around the code to use a free
function and everything works properly
Previously, when a Worklet needed a scatter, the scatter object was
stored in the Worklet object. That was problematic because that means
the Scatter, which is a control object, was shoved into the execution
environment.
To prevent that, move the Scatter into the Dispatcher object. The
worklet still declares a ScatterType alias, but no longer has a
GetScatter method. Instead, the Dispatcher now takes a Scatter object in
its constructor. If using the default scatter (ScatterIdentity), the
default constructor is used. If using another type of Scatter that
requires data to set up its state, then the caller of the worklet needs
to provide that to the dispatcher. For convenience, worklets are
encouraged to have a MakeScatter method to help construct a proper
scatter object.
`vtkm_unit_tests` now supports an MPI option that can be used to add
test that run with MPI. Adding `UnitTestFieldRangeGlobalCompute` to test
global ranges for fields.
Removing MultiBlock::GetGlobalRange API to keep things consistent with
DataSet API. Instead, one should use `FieldRangeCompute` or
`FieldRangeGlobalCompute` as appropriate.
Previously memory that was allocated outside of VTK-m was impossible to transfer to
VTK-m as we didn't know how to free it. By extending the ArrayHandle constructors
to support a Storage object that is being moved, we can clearly express that
the ArrayHandle now owns memory it didn't allocate.
Here is an example of how this is done:
```cpp
T* buffer = new T[100];
auto user_free_function = [](void* ptr) { delete[] static_cast<T*>(ptr); };
vtkm::cont::internal::Storage<T, vtkm::cont::StorageTagBasic>
storage(buffer, 100, user_free_function);
vtkm::cont::ArrayHandle<T> arrayHandle(std::move(storage));
```
This fixes the three following issues with StorageBasic.
1. Memory that was allocated by VTK-m and Stolen by the user needed the
proper free function called which is generally StorageBasicAllocator::deallocate.
But that was hard for the user to hold onto. So now we provide a function
pointer to the correct free function.
2. Memory that was allocated outside of VTK-m was impossible to transfer to
VTK-m as we didn't know how to free it. This is now resolved by allowing the
user to specify a free function to be called on release.
3. When the CUDA backend allocates memory for an ArrayHandle that has no
control representation, and the location we are running on supports concurrent
managed access we want to specify that cuda managed memory as also the host memory.
This requires that StorageBasic be able to call an arbitrary new delete function
which is chosen at runtime.
Changed the "default" ColorTable preset from "cool to warm" to
"viridis." Also made a default constructor for ColorTable that sets it
to this default preset.
The main reason to change to viridis for the default is that it is in
LAB space. We are concerned that having the default ColorTable preset
being Diverging space could lead to users using that color space
inappropriately.
The problem is that there is no good "default" constructor for
ColorTable. The previous default constructor created an empty color
table, but that would be confusing if someone actually tried to use it.
We could set ot to the default preset, but the default preset uses the
diverging color map, which could foul people up if they actually want to
edit or create their own color map. Instead, force the declaration of
ColorTable to indicate what you plan to do with it.
The new and improved vtkm::cont::ColorTable provides a more feature complete
color table implementation that is modeled after
vtkDiscretizableColorTransferFunction. This class therefore supports different
color spaces ( rgb, lab, hsv, diverging ) and supports execution across all
device adapters.
DIY now depends on MPI optionally. Hence we no longer need to depend on
DIY optionally based on whether MPI was enabled. Update cmake and c++
code to always use DIY-based components.
DIY is built with MPI support if VTKm_ENABLE_MPI is ON.
By hard coding the PrepareForDevice to know about all the different VTK-m
devices, we can have a single base class do the execution allocation, and not
have that logic repeated in each child class.
When using vtkm::dot on narrow types you easily rollover the values.
Instead the result type of vtkm::dot should be wide enough to store the results
(32bits) when this occurs.
Fixes#193
1. Add option to copy user supplied array in make_ArrayHandle.
2. Replace Field constructors that take user supplied arrays with make_Field.
3. Replace CoordinateSystem constructors that take user supplied arrays with
make_CoordinateSystem.
Updating MultiBlock to use `diy` for computing block summaries like
ranges, bounds etc. This makes it possible to MultiBlock to
work in distributed operations without explicit logic.
Previously we allowed a const ref as we would make a copy, this only works
as it relies on RuntimeDeviceTracker implementing state through a shared_ptr.
Instead if we require modifiable types only we can make TryExecute more
efficient and clearer on what it does.
By using perfect forwarding we can reduce not only the amount of TryExecute
signatures, but we can enable the ability to pass temporary functors to
TryExecute.
At the same time we have optimized TryExecute by moving the string generation
code into a single function that is compiled into the vtkm_cont library.
The end result is that the vtkm_rendering library size has been reduced from
12MB to 11MB, and we shave off about 5% of our build time.
The implementation of ScanExclusiveByKey in
DeviceAdapterAlgorithmGeneral by shifting values in the input values
array and then calling ScanInclusiveByKey. However, the temporary
shifted values array was created using the key type instead of the
values type. This caused a compile error when the keys and values had
different types.
For std::copy to optimize a copy to memcpy, the valuetype must be both
trivially constructable and trivially copyable.
The new copy benchmarks highlighted an issue that std::copy'ing pairs
and vecs were not optimized to memcpy. For a 256 MiB buffer on my
laptop w/ GCC, the serial copy speeds were:
UInt8: 10.10 GiB/s
Vec<UInt8, 2> 3.12 GiB/s
Pair<UInt32, Float32> 6.92 GiB/s
After this patch, the optimization occurs and a bitwise copy occurs:
UInt8: 10.12 GiB/s
Vec<UInt8, 2> 9.66 GiB/s
Pair<UInt32, Float32> 9.88 GiB/s
Check were also added to the Vec and Pair unit tests to ensure that
this classes continue to be trivial.
The ArrayHandleSwizzle test was refactored a bit to eliminate a new
'possibly uninitialized memory' warning introduced with the default
Vec ctors.
In generic code, it's a pain to use the equality operators since they
requires the ValueType and Storage to match, else the operator is undefined.
This commit adds operators for such comparisons, as well as a unit test.
8fabece1 Use median point from cluster as representative vertex.
c7bf0c95 Compute PointIdMap while reducing cluster ids.
5dee7c6a Select input point from cluster rather than averaging.
28e76ddb Update vertex clustering benchmarking code.
e3c9e7bb Optimize cell map computation.
d7669650 Use requested grid in VertexClustering worklet.
0472dc11 Fix warning on Cuda.
3f4e17e2 Add field mapping to VertexClustering.
...
Acked-by: Kitware Robot <kwrobot@kitware.com>
Acked-by: Robert Maynard <robert.maynard@kitware.com>
Merge-request: !960
The idea of the test was to turn off the "default" storage to ensure
that the fancy array was not making assumptions about the storage of its
delegate array. But there is lots of code elsewhere that uses the
default storage (rightly so) to create intermediate arrays, which will
fail if you disable the default storage. This was causing a test to
fail, so turn default storage back on for this case.
This is a convenience method to do a deep copy of an array. This comes
up a lot, but can be a pain if you don't have a specific device adapter
on which to do the copy.
Sandia National Laboratories recently changed management from the
Sandia Corporation to the National Technology & Engineering Solutions
of Sandia, LLC (NTESS). The copyright statements need to be updated
accordingly.
Previously, ConvertNumComponentsToOffsets always used TryCompile on the
global set of runtime devices. That is still the default behavior, but
now you are able to specify your own runtime tracker. Also, there are
now versions of ConvertNumComponentsToOffsets that take a device adapter
tag.
Previously once an ArrayHandle was stolen it was placed in an invalid state
where it could not used again by VTK-m. Now instead after being stolen it
is placed into a state where it is identical to memory allocated outside
of VTK-m and passed in.
75517554 Move check for cell variables to it gets executed.
147247e8 Code formatting changes and compiler warning fixes.
a3fd135b Fix errors and warnings on Mac and Windows
347af497 Poly Data for External Faces
aeed7a07 Cell variables for External Faces
ad13e9b4 Merge branch 'master' into external-faces-production
ab25c160 External Faces Uniform and Rectilear grids
Acked-by: Kitware Robot <kwrobot@kitware.com>
Acked-by: Kenneth Moreland <kmorel@sandia.gov>
Merge-request: !860