By making the RuntimeDeviceInformation class independent of templates,
VTK-m can query device information at runtime for a device id that is
itself specified at runtime. Previously this was impossible because the
CRTP pattern does not allow function overloading (the compiler would
complain that DeviceAdapterRuntimeDetector does not have an Exists()
function defined).
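A minimal sketch of the idea, not the actual VTK-m class: the names
`DeviceIdSketch` and `RuntimeDeviceInfoSketch` are hypothetical, and which
devices report as available is an assumption made for illustration.

```cpp
#include <cstdint>

// Hypothetical device ids; illustrative only, not VTK-m's real values.
enum class DeviceIdSketch : std::int8_t
{
  Serial = 1,
  Cuda = 2,
  Tbb = 3,
  OpenMP = 4
};

// A non-templated query class. The device is an ordinary runtime argument,
// so no per-device detector template has to be instantiated by the caller.
class RuntimeDeviceInfoSketch
{
public:
  bool Exists(DeviceIdSketch id) const
  {
    switch (id)
    {
      case DeviceIdSketch::Serial:
        return true; // assumption: a serial fallback is always present
      case DeviceIdSketch::Cuda:
      case DeviceIdSketch::Tbb:
      case DeviceIdSketch::OpenMP:
        return false; // assumption: disabled in this toy build
    }
    return false;
  }
};
```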
169ca722b Redesign vtkm::cont::ColorTable to work with separable compilation.
Acked-by: Kitware Robot <kwrobot@kitware.com>
Acked-by: Sujin Philip <sujin.philip@kitware.com>
Merge-request: !1445
In some functors within Algorithm.h, the functor did not have a
default constructor that initialized its fields. This could in turn
lead to a warning about using it uninitialized.
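The pattern being fixed looks roughly like the following. `ScaleFunctorSketch`
is a hypothetical functor, not one of the functors in Algorithm.h.

```cpp
// Hypothetical functor; the name and field are illustrative only.
struct ScaleFunctorSketch
{
  double Factor;

  // The default constructor gives the field a defined value, so a
  // default-constructed functor is never read uninitialized.
  ScaleFunctorSketch()
    : Factor(1.0)
  {
  }

  explicit ScaleFunctorSketch(double factor)
    : Factor(factor)
  {
  }

  double operator()(double x) const { return x * this->Factor; }
};
```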
Now that the dispatcher does its own TryExecute, filters do not need to
do that. This change requires all worklets called by filters to be able
to execute without knowing the device a priori.
Implement more general versions of the `test_equal_ArrayHandles`,
`test_equal_CellSets`, `test_equal_Fields`, and `test_equal_DataSets`
functions and put them in vtkm/cont/testing/Testing.h with the hope that
they will also be useful for others.
This is a subclass of ExecutionObject and a superset of its
functionality. In addition to having a PrepareForExecution method, it
also has a PrepareForControl method that gets an object appropriate for
the control environment. This is helpful for situations where you need
code to work in both environments, such as the functor in an
ArrayHandleTransform.
Also added several runtime checks for execution objects and execution
and control objects.
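A rough sketch of the shape of such an object, assuming simplified stand-ins
for the base classes and device tag (every name ending in `Sketch` is
hypothetical, as is `AddOffset`):

```cpp
// Hypothetical stand-ins for the base classes and a device tag.
struct ExecutionObjectBaseSketch
{
};
struct ExecutionAndControlObjectBaseSketch : ExecutionObjectBaseSketch
{
};
struct DeviceTagSketch
{
};

// The plain functor that does the work in either environment.
struct AddOffset
{
  int Offset;
  int operator()(int x) const { return x + this->Offset; }
};

// Wrapper that can hand out the functor for execution or for control.
struct AddOffsetObjectSketch : ExecutionAndControlObjectBaseSketch
{
  explicit AddOffsetObjectSketch(int offset)
    : Offset(offset)
  {
  }

  // Called with the device that was selected at runtime.
  template <typename Device>
  AddOffset PrepareForExecution(Device) const
  {
    return AddOffset{ this->Offset };
  }

  // Called when the functor is needed in the control environment.
  AddOffset PrepareForControl() const { return AddOffset{ this->Offset }; }

  int Offset;
};
```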
This change allows you to use a subclass of
vtkm::cont::ExecutionObjectBase as the functor in an
ArrayHandleTransform. The ArrayHandleTransform will then detect that
the functor is an ExecObject and will call PrepareForExecution with the
appropriate device to get the actual functor object.
This change allows you to use virtual objects and other device dependent
objects as functors for ArrayHandleTransform without knowing a priori
what device the portal will be used on.
Rather than force all dispatchers to be templated on a device adapter,
instead use a TryExecute internally within the invoke to select a device
adapter.
Because this removes the need to declare a device when invoking a
worklet, this commit also removes the need to declare a device in
several other areas of the code.
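The control flow can be sketched as follows. This is a simplified stand-in,
not the dispatcher code: `DeviceSketch`, `TryExecuteSketch`, and
`DispatcherSketch` are hypothetical names, and the device order is an
assumption.

```cpp
#include <stdexcept>
#include <vector>

enum class DeviceSketch
{
  Cuda,
  Tbb,
  Serial
};

// Try each device in priority order and stop at the first success.
template <typename Functor>
bool TryExecuteSketch(Functor&& functor, const std::vector<DeviceSketch>& devices)
{
  for (DeviceSketch device : devices)
  {
    try
    {
      if (functor(device))
      {
        return true;
      }
    }
    catch (const std::exception&)
    {
      // This device failed; fall through to the next candidate.
    }
  }
  return false;
}

// The caller of Invoke never names a device.
struct DispatcherSketch
{
  std::vector<DeviceSketch> Devices{ DeviceSketch::Cuda, DeviceSketch::Tbb, DeviceSketch::Serial };

  template <typename Worklet>
  void Invoke(Worklet&& worklet) const
  {
    const bool ran =
      TryExecuteSketch([&](DeviceSketch device) { return worklet(device); }, this->Devices);
    if (!ran)
    {
      throw std::runtime_error("no device could run the worklet");
    }
  }
};
```

The worklet only has to be callable with whichever device ends up being
selected, which is the requirement the commit message describes.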
a8fa8d918 Use device id names where possible.
Acked-by: Kitware Robot <kwrobot@kitware.com>
Acked-by: Robert Maynard <robert.maynard@kitware.com>
Merge-request: !1393
Adds a fancy array handle that restricts access to an array to some
window of values. It takes a start offset and a size and represents the
values from that start offset up to, but not including, the start offset
plus the size.
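The windowing idea can be illustrated with a small view over a std::vector.
`ArrayViewSketch` is a hypothetical class and does not follow the real
ArrayHandle interface.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical view class; it does not own the data it refers to.
template <typename T>
class ArrayViewSketch
{
public:
  ArrayViewSketch(std::vector<T>& source, std::size_t start, std::size_t size)
    : Source(&source)
    , Start(start)
    , Size(size)
  {
    assert(start + size <= source.size());
  }

  std::size_t GetNumberOfValues() const { return this->Size; }

  // Index 0 of the view maps to index Start of the source array.
  T Get(std::size_t index) const { return (*this->Source)[this->Start + index]; }

  void Set(std::size_t index, const T& value) const
  {
    (*this->Source)[this->Start + index] = value;
  }

private:
  std::vector<T>* Source;
  std::size_t Start;
  std::size_t Size;
};
```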
59c8bd28a vtkm::cont::Algorithm now can be told which device to run on at runtime
Acked-by: Kitware Robot <kwrobot@kitware.com>
Acked-by: Kenneth Moreland <kmorel@sandia.gov>
Merge-request: !1365
e34301eca Allow disabling/enabling of CUDA managed memory via an env variable
Acked-by: Kitware Robot <kwrobot@kitware.com>
Acked-by: Robert Maynard <robert.maynard@kitware.com>
Merge-request: !1359
By setting the environment variable "VTKM_MANAGEDMEMO_DISABLED" to 1,
users can disable CUDA managed memory even when the hardware supports
it.
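A minimal sketch of such a check, assuming the value is compared against the
exact string "1" (the real parsing rules are not stated here).
`ManagedMemoryDisabledByEnv` is a hypothetical helper name.

```cpp
#include <cstdlib>
#include <string>

// Returns true only when the variable is set to exactly "1".
// Assumption: any other value, or an unset variable, leaves managed
// memory enabled.
inline bool ManagedMemoryDisabledByEnv(const char* name = "VTKM_MANAGEDMEMO_DISABLED")
{
  const char* value = std::getenv(name);
  return value != nullptr && std::string(value) == "1";
}
```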
Calls to 'cudaFree' block execution on all CUDA devices. Reduce the number of
times this happens by having a deferred free mechanism that frees a pool
of pointers together when a threshold is reached.
Especially helpful during virtual object transfers, which require a few small
allocations and frees.
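The batching idea, sketched without any CUDA dependency.
`DeferredFreePoolSketch` is a hypothetical class; the threshold here counts
queued pointers, which is an assumption, and the release callback stands in
for the call to cudaFree. The sketch is not thread-safe.

```cpp
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

class DeferredFreePoolSketch
{
public:
  DeferredFreePoolSketch(std::size_t threshold, std::function<void(void*)> release)
    : Threshold(threshold)
    , Release(std::move(release))
  {
  }

  // Anything still queued is released when the pool goes away.
  ~DeferredFreePoolSketch() { this->Flush(); }

  // Queue the pointer instead of releasing it immediately.
  void Free(void* ptr)
  {
    this->Pending.push_back(ptr);
    if (this->Pending.size() >= this->Threshold)
    {
      this->Flush();
    }
  }

  // Release everything queued so far in one batch.
  void Flush()
  {
    for (void* ptr : this->Pending)
    {
      this->Release(ptr);
    }
    this->Pending.clear();
  }

  std::size_t PendingCount() const { return this->Pending.size(); }

private:
  std::size_t Threshold;
  std::function<void(void*)> Release;
  std::vector<void*> Pending;
};
```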
2c079b96d Make AtomicArrays work on CUDA 8.
Acked-by: Kitware Robot <kwrobot@kitware.com>
Acked-by: Robert Maynard <robert.maynard@kitware.com>
Merge-request: !1357
Also add a throwFailedRuntimeDeviceTransfer that throws a nicely
detailed message explaining why something couldn't be transferred to
the requested device adapter.
554bc3d36 At runtime TryExecute supports a specific deviceId to execute on.
Acked-by: Kitware Robot <kwrobot@kitware.com>
Acked-by: Kenneth Moreland <kmorel@sandia.gov>
Merge-request: !1334
Fixes issue #276.
OpenMP tests exhibit negative scaling when run in parallel, as we
have N OpenMP processes each spawning N threads. We speculate that
this causes excessive context switching and swapping and reduces
performance.
The OpenMP Device Reduction algorithm previously used a std::vector<T>
to store the reduction results of each thread. This caused problems
when T=bool because std::vector<bool> hands out a proxy type rather than
a bool&, and that proxy isn't usable with vtkm BinaryOperators.
Additionally, by fixing this issue in the FunctorsOpenMP we
can remove a workaround in FunctorsGeneral that caused
compile failures when using complex BinaryOperators
such as MinAndMax.
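The proxy problem and one way around it can be sketched as below. The
replacement storage (a `std::unique_ptr<T[]>`) is an assumption chosen for
illustration, not necessarily what the fix uses; `SameTypeAnd` and
`ChunkedReduceSketch` are hypothetical names, and the "threads" are simulated
serially.

```cpp
#include <cstddef>
#include <memory>
#include <type_traits>
#include <vector>

// std::vector<bool> is a packed specialization: element access yields a
// proxy object, not a bool&.
static_assert(!std::is_same<std::vector<bool>::reference, bool&>::value,
              "vector<bool> hands out a proxy");
static_assert(std::is_same<std::vector<int>::reference, int&>::value,
              "other vectors hand out real references");

// An operator that needs both arguments to have the same type T.
// Mixing a bool with a vector<bool> proxy would fail to deduce T.
struct SameTypeAnd
{
  template <typename T>
  T operator()(const T& a, const T& b) const
  {
    return a && b;
  }
};

template <typename T, typename BinaryOp>
T ChunkedReduceSketch(const T* values, std::size_t count, std::size_t numChunks,
                      BinaryOp op, T init)
{
  // One slot per simulated thread. A plain array holds real T objects,
  // even when T is bool.
  std::unique_ptr<T[]> partials(new T[numChunks]);
  for (std::size_t c = 0; c < numChunks; ++c)
  {
    T local = init;
    for (std::size_t i = c; i < count; i += numChunks)
    {
      local = op(local, values[i]);
    }
    partials[c] = local;
  }

  T result = init;
  for (std::size_t c = 0; c < numChunks; ++c)
  {
    result = op(result, partials[c]);
  }
  return result;
}
```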
std::random_shuffle is deprecated in C++14 because it typically relies on
std::rand, which does not guarantee a uniform distribution, and because
the underlying algorithm is unspecified. Using std::shuffle gives a
reliable result in a 64-bit version.
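A minimal sketch of the replacement call. The choice of std::mt19937_64 and
the helper name `ShuffledIndicesSketch` are assumptions; the commit does not
say which engine is used.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <random>
#include <vector>

inline std::vector<int> ShuffledIndicesSketch(int count, std::uint64_t seed)
{
  std::vector<int> indices(static_cast<std::size_t>(count));
  std::iota(indices.begin(), indices.end(), 0);

  // std::shuffle takes the random engine explicitly, so the source of
  // randomness no longer depends on std::rand.
  std::mt19937_64 engine(seed);
  std::shuffle(indices.begin(), indices.end(), engine);
  return indices;
}
```

With a fixed seed the permutation is repeatable on a given standard library
implementation; the exact order may differ between implementations.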
This reduces the cost of getting the thread runtime device tracker and
lowers the runtime overhead when a user constructs many short-lived
threads that use VTK-m.
Previously ArrayHandleBasicImpl had no support for OpenMP since
we forgot to update the implementation. This version will
work when adding new devices without any changes.
14824bd42 Make sure people always treat DeviceAdapterId as a proper type
Acked-by: Kitware Robot <kwrobot@kitware.com>
Acked-by: Sujin Philip <sujin.philip@kitware.com>
Merge-request: !1332
96ae94420 Simplified execution object creation for atomic array
0bd197af9 moved TwoLevelUniformGridExecutionObject to vtkm/exec/internal
6ce895be8 simplified how atomic arrays create execution objects
f1ee5b92a fix a rebase error
25d140361 fix bad rebase for wireframer
f892695f1 fixing so weird merging issue
9bb00ec66 moved the execution object for TwoLevelUniform grid to vtkm::exec
db1c9bfee Change the namespacing of atomic array
...
Acked-by: Kitware Robot <kwrobot@kitware.com>
Acked-by: Robert Maynard <robert.maynard@kitware.com>
Merge-request: !1243
Benchmarking in VTK showed significant overhead in the reverse
connectivity calculation in
ConnectivityExplicitInternals::ComputeCellToPointConnectivity.
This patch adds a ReverseConnectivityBuilder that reduces the amount of
time and memory needed to build the table by using an atomic histogram
approach that avoids a costly radix SortByKey.
Key operations in the new helper class are templated to allow this
approach to be reused by VTK-specific cell array converters.
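A serial sketch of a histogram-based build, to show why no sort is needed.
It is an illustration under assumptions, not the ReverseConnectivityBuilder
itself: the struct and function names are hypothetical, and the plain
increments marked below would be atomic operations in a parallel version.

```cpp
#include <cstddef>
#include <vector>

struct ReverseConnectivitySketch
{
  std::vector<int> Offsets;      // size numPoints + 1
  std::vector<int> Connectivity; // cell ids grouped by point
};

// conn is the flat list of point ids for all cells; the points of cell c
// are conn[cellOffsets[c]] .. conn[cellOffsets[c + 1] - 1].
inline ReverseConnectivitySketch BuildReverseSketch(const std::vector<int>& conn,
                                                    const std::vector<int>& cellOffsets,
                                                    int numPoints)
{
  const std::size_t nPoints = static_cast<std::size_t>(numPoints);
  ReverseConnectivitySketch out;

  // 1. Histogram: count how many cells use each point.
  std::vector<int> counts(nPoints, 0);
  for (int p : conn)
  {
    ++counts[static_cast<std::size_t>(p)]; // atomic add in a parallel version
  }

  // 2. Exclusive scan of the counts gives each point's start offset.
  out.Offsets.assign(nPoints + 1, 0);
  for (std::size_t p = 0; p < nPoints; ++p)
  {
    out.Offsets[p + 1] = out.Offsets[p] + counts[p];
  }

  // 3. Fill: each (cell, point) pair claims the next free slot of its point.
  out.Connectivity.assign(conn.size(), 0);
  std::vector<int> cursor(out.Offsets.begin(), out.Offsets.end() - 1);
  const int numCells = static_cast<int>(cellOffsets.size()) - 1;
  for (int c = 0; c < numCells; ++c)
  {
    for (int i = cellOffsets[static_cast<std::size_t>(c)];
         i < cellOffsets[static_cast<std::size_t>(c) + 1]; ++i)
    {
      const std::size_t point = static_cast<std::size_t>(conn[static_cast<std::size_t>(i)]);
      const int slot = cursor[point]++; // atomic fetch-and-add in parallel
      out.Connectivity[static_cast<std::size_t>(slot)] = c;
    }
  }
  return out;
}
```

In this serial form the cells of each point come out in ascending order; a
parallel version based on atomics would not guarantee that ordering.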
There is no real reason why you cannot construct an
ArrayPortalFromIterators on a device, so go ahead and let that happen.
(This removes some CUDA warnings about calling __host__ from
__device__.)
Having VTKM_EXEC on algorithms for CPU devices was problematic because
the algorithms were specific to the CPU, but during a CUDA compile it
would try to compile device code (for no reason, since it was never
called on a device).
Remove these modifiers on the premise that a device implementation knows
specifically which function modifiers to use and does not need the
VTK-m-defined catch-alls.
By making is_base_of part of PrepareArgForExec, we can shorten not only
the C++ code but also the code that is generated by it.
Also, return by && instead of by value when passing the argument through.
Changes thanks to Robert Maynard.
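A sketch of the two ideas together: the is_base_of test is folded into a
single entry point, and arguments that are not execution objects are passed
through as T&& without a copy. All names here are hypothetical and the real
PrepareArgForExec may be structured differently.

```cpp
#include <type_traits>
#include <utility>

struct ExecObjectBaseSketch
{
}; // hypothetical marker base class
struct DeviceTagSketch
{
}; // hypothetical device tag

// Chosen when T derives from the marker: ask the object for its
// execution-side representation.
template <typename T>
auto PrepareArgImpl(T&& obj, DeviceTagSketch device, std::true_type)
  -> decltype(obj.PrepareForExecution(device))
{
  return obj.PrepareForExecution(device);
}

// Chosen otherwise: hand the argument straight back as T&&.
template <typename T>
T&& PrepareArgImpl(T&& obj, DeviceTagSketch, std::false_type)
{
  return std::forward<T>(obj);
}

// Single entry point; the is_base_of check lives here.
template <typename T>
auto PrepareArgSketch(T&& obj, DeviceTagSketch device)
  -> decltype(PrepareArgImpl(std::forward<T>(obj), device,
                             std::is_base_of<ExecObjectBaseSketch,
                                             typename std::decay<T>::type>{}))
{
  return PrepareArgImpl(std::forward<T>(obj), device,
                        std::is_base_of<ExecObjectBaseSketch,
                                        typename std::decay<T>::type>{});
}

// Example execution object used to exercise the first overload.
struct ExampleExecObject : ExecObjectBaseSketch
{
  int PrepareForExecution(DeviceTagSketch) const { return 42; }
};
```

Because the pass-through overload returns a reference, callers should not
keep the result of passing a temporary beyond the full expression.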