Rather than force all dispatchers to be templated on a device adapter,
instead use a TryExecute internally within the invoke to select a device
adapter.
Because this removes the need to declare a device when invoking a
worklet, this commit also removes the need to declare a device in
several other areas of the code.
a8fa8d918 Use device id names where possible.
Acked-by: Kitware Robot <kwrobot@kitware.com>
Acked-by: Robert Maynard <robert.maynard@kitware.com>
Merge-request: !1393
Adds a fancy array handle that restricts access to an array to some
window of values. It takes a start offset and a size and represents the
values between that start offset and size past that.
59c8bd28a vtkm::cont::Algorithm now can be told which device to run on at runtime
Acked-by: Kitware Robot <kwrobot@kitware.com>
Acked-by: Kenneth Moreland <kmorel@sandia.gov>
Merge-request: !1365
e34301eca Allow disabling/enabling of CUDA managed memory via an env variable
Acked-by: Kitware Robot <kwrobot@kitware.com>
Acked-by: Robert Maynard <robert.maynard@kitware.com>
Merge-request: !1359
By setting the environment variable "VTKM_MANAGEDMEMO_DISABLED" to be 1,
users are able to disable CUDA managed memory even though the hardware is
capable of doing so.
Calls to 'cudaFree' block execution on all cuda devices. Reduce the number of
times this happens by having a deferred free mechanism that frees a pool
of pointers together when a threshold is reached.
Especially helpful during virtual object transfers that requires a few small
allocations and frees.
2c079b96d Make AtomicArrays work on CUDA 8.
Acked-by: Kitware Robot <kwrobot@kitware.com>
Acked-by: Robert Maynard <robert.maynard@kitware.com>
Merge-request: !1357
Also add a throwFailedRuntimeDeviceTransfer that throws a nicely
detailed message on why a something couldn't be transfered to
the requested device adapter.
554bc3d36 At runtime TryExecute supports a specific deviceId to execute on.
Acked-by: Kitware Robot <kwrobot@kitware.com>
Acked-by: Kenneth Moreland <kmorel@sandia.gov>
Merge-request: !1334
Fixes issue #276.
OpenMP tests when run in parallel exhibit negative scaling as we
have N openMP processes each spawning N threads. We speculate that
this causes excessive context switching and swapping and reduces
performance.
The OpenMP Device Reduction algorithm previously used a std::vector<T>
to store the reduction results of each thread. This caused problems
when T=bool as the types became a proxy type which isn't usable
with vtkm BinaryOperators.
Additionally by fixing this issue in the FunctorsOpenMP we
can remove a workaround in FunctorsGeneral that caused
compile failures when using complex BinaryOperators
such as MinAndMax.
std::random_shuffle is deprecated in C++14 because it's using std::rand
which uses a non uniform distribution and the underlying algorithm is
unspecified. Using std::shuffle can provide a reliable result in a 64
bit version.
It will reduce the cost of getting the thread runtime device tracker,
and will have a better runtime overhead if user constructs a lot of
short lived threads that use VTK-m.
Previously ArrayHandleBasicImpl had no support for OpenMP since
we forgot to update the implementation. This version will
work when adding new devices without any changes.
14824bd42 Make sure people always treat DeviceAdapterId as a proper type
Acked-by: Kitware Robot <kwrobot@kitware.com>
Acked-by: Sujin Philip <sujin.philip@kitware.com>
Merge-request: !1332