59c8bd28a vtkm::cont::Algorithm now can be told which device to run on at runtime
Acked-by: Kitware Robot <kwrobot@kitware.com>
Acked-by: Kenneth Moreland <kmorel@sandia.gov>
Merge-request: !1365
By facilitating brigand library, we can throw a meaningful compile time
error if the placeholder in execution signature exceeds the bound in
control signature.
e34301eca Allow disabling/enabling of CUDA managed memory via an env variable
Acked-by: Kitware Robot <kwrobot@kitware.com>
Acked-by: Robert Maynard <robert.maynard@kitware.com>
Merge-request: !1359
By setting the environment variable "VTKM_MANAGEDMEMO_DISABLED" to be 1,
users are able to disable CUDA managed memory even though the hardware is
capable of doing so.
Calls to 'cudaFree' block execution on all cuda devices. Reduce the number of
times this happens by having a deferred free mechanism that frees a pool
of pointers together when a threshold is reached.
Especially helpful during virtual object transfers that requires a few small
allocations and frees.
52549c85d Warning fixes to UnitTestLagrangianFilter.cxx
a99da2dff Added a Unit test for Lagrangian filter
3364d1c4e Lagrangian filter, example
Acked-by: Kitware Robot <kwrobot@kitware.com>
Acked-by: Kenneth Moreland <kmorel@sandia.gov>
Merge-request: !1351
The added files provide support for Lagrangian analysis of velocity fields of time-varying data. Examples show how to use the filter to generate data and a second example demonstrates consuming generated information to calculate new particle trajectories.
2c079b96d Make AtomicArrays work on CUDA 8.
Acked-by: Kitware Robot <kwrobot@kitware.com>
Acked-by: Robert Maynard <robert.maynard@kitware.com>
Merge-request: !1357
Also add a throwFailedRuntimeDeviceTransfer that throws a nicely
detailed message on why a something couldn't be transfered to
the requested device adapter.
554bc3d36 At runtime TryExecute supports a specific deviceId to execute on.
Acked-by: Kitware Robot <kwrobot@kitware.com>
Acked-by: Kenneth Moreland <kmorel@sandia.gov>
Merge-request: !1334
Fixes issue #276.
OpenMP tests when run in parallel exhibit negative scaling as we
have N openMP processes each spawning N threads. We speculate that
this causes excessive context switching and swapping and reduces
performance.
The OpenMP Device Reduction algorithm previously used a std::vector<T>
to store the reduction results of each thread. This caused problems
when T=bool as the types became a proxy type which isn't usable
with vtkm BinaryOperators.
Additionally by fixing this issue in the FunctorsOpenMP we
can remove a workaround in FunctorsGeneral that caused
compile failures when using complex BinaryOperators
such as MinAndMax.
std::random_shuffle is deprecated in C++14 because it's using std::rand
which uses a non uniform distribution and the underlying algorithm is
unspecified. Using std::shuffle can provide a reliable result in a 64
bit version.
It will reduce the cost of getting the thread runtime device tracker,
and will have a better runtime overhead if user constructs a lot of
short lived threads that use VTK-m.
Previously ArrayHandleBasicImpl had no support for OpenMP since
we forgot to update the implementation. This version will
work when adding new devices without any changes.
14824bd42 Make sure people always treat DeviceAdapterId as a proper type
Acked-by: Kitware Robot <kwrobot@kitware.com>
Acked-by: Sujin Philip <sujin.philip@kitware.com>
Merge-request: !1332
2c6ca906f Enable building the worklets when OpenMP is enabled.
d595abf90 WrappedBinaryOperator now supports std::vector<bool>::reference
Acked-by: Kitware Robot <kwrobot@kitware.com>
Acked-by: Allison Vacanti <allison.vacanti@kitware.com>
Merge-request: !1327
96ae94420 Simplified execution object creation for atomic array
0bd197af9 moved TwoLevelUniformGridExecutionObject to vtkm/exec/internal
6ce895be8 simplified how atomic arrays create execution objects
f1ee5b92a fix a rebase error
25d140361 fix bad rabse for wireframer
f892695f1 fixing so wierd merging issue
9bb00ec66 moved the execution object for TwoLevelUniform grid to vrkm::exec
db1c9bfee Change the namespacing of atomic array
...
Acked-by: Kitware Robot <kwrobot@kitware.com>
Acked-by: Robert Maynard <robert.maynard@kitware.com>
Merge-request: !1243
The oscillator is a simple analytical source of time-varying data.
It provides a function value at each point that is computed as a
sum of Gaussian kernels -- each with a specified position, amplitude,
frequency, and phase.
Benchmarking in VTK showed significant overhead in the computation
of the reverse connectivity calculation in
ConnectivityExplicitInternals::ComputeCellToPointConnectivity.
This patch adds a ReverseConnectivityBuilder that reduces the amount of
time and memory needed to build the table by using an atomic histogram
approach that avoids a costly radix SortByKey.
Key operations in the new helper class are templated to allow this
approach to be reused by VTK-specific cell array converters.
There is no real reason why you cannot construct an
ArrayPortalFromIterators on a device, so go ahead and let that happen.
(This removes some CUDA warnings about calling __host__ from
__device__.)
Having VTKM_EXEC on algorithms for CPU devices was problematic because
the algorithms were specific to the CPU, but during a CUDA compile it
would try to compile device code (for no reasons since it was never
called on a device).
Remove these identifiers for the idea that a device implementation knows
specifically what function modifiers to use and does not need the VTK-m
defined catch-alls.
By making is_base_of part of PrepareArgForExec, we can shorten not only
the C++ code but also the code that is generated by it.
Also, return && instead of by value when passing through the argument.
Changes thanks to Robert Maynard.
This commit adds a worklet as well as a filter that modify point coordinates by moving points
along point normals by the scalar amount times the scalar factor.
It's a simpified version of the vtkWarpScalar class in VTK. Additionally the filter doesn't
modify the point coordinates, but creates a new point coordinates that have been warped.
This commit adds a worklet as well as a filter that modify point coordinates by moving points
along a vector by a certain scalar amount.
It's a simpified version of the vtkWarpVector in VTK.
It doesn't modify the point coordinates, but creates a new point coordinates that have been warped.
The original design of invoke and the transport infrastructure
relied on the implementation behavior of vtkm::cont types
such as ArrayHandle that used an internal shared_ptr to managed
state. This allowed passing by value instead of passing by
non-const ref when needing to transfer information to the device.
As VTK-m adds support for classes that use virtuals the ability
to pass by base pointer type allows for us to invoke worklets
using a base type without the risk of type slicing.
Additional by moving over to a non-const ref Invocation we
can update all transports that have 'output' to now be
by ref and therefore support types that can't be copied while
being 'more' correct.
The invocation parameters need to be non const as we want to
be able to call non-const methods like `PrepareForOutput` on them
from a transport function.
The original implementation abused the fact that everything
could be copied by value and have that work properly. But
when we start introducing virtual classes copying by value of
a base type can cause type slicing.
Most of the arguments given to device adapter algorithms are actually
control-side arguments that get converted to execution objects internally
(usually a `vtkm::cont::ArrayHandle`). However, some of the algorithms,
take an argument that is passed directly to the execution environment, such
as the predicate argument of `Sort`. If the argument is a plain-old-data
(POD) type, which is common enough, then you can just pass the object
straight through. However, if the object has any special elements that have
to be transferred to the execution environment, such as internal arrays,
passing this to the `vtkm::cont::Algorithm` functions becomes
problematic.
To cover this use case, all the `vtkm::cont::Algorithm` functions now
support automatically transferring objects that support the `ExecObject`
worklet convention. If any argument to any of the `vtkm::cont::Algorithm`
functions inherits from `vtkm::cont::ExecutionObjectBase`, then the
`PrepareForExecution` method is called with the device the algorithm is
running on, which allows these device-specific objects to be used without
the hassle of creating a `TryExecute`.
1. Have a per-thread pinned array for cuda errors
2. Check for errors before scheduling new tasks and at explicit sync points
3. Remove explicit synchronizations from most places
Addresses part 2 of #168
b47b1f9ae Allow NDHistogram to take custom type
Acked-by: Kitware Robot <kwrobot@kitware.com>
Acked-by: Sujin Philip <sujin.philip@kitware.com>
Merge-request: !1299
There was an error in TestingPointLocatorUniformGrid in which it was
creating arrays of type vtkm::Float32 and passing them to a worklet that
expected vtkm::FloatDefault. This is corrected.
-For the BoundingIntervalHierarchy CUDA had failures with using
.cxx file to implement the virtual methods
-Moving the contents to the .hxx file after discussing with Rob
over email
-Need to still work on the .cxx implementation after merge
Error: Throwing an exception in CUDA code.
Fix: Change method throwing exception to VTKM_CONT.
New warning: host/device warning in taotuple.
Fix: Markup additional taotuple methods with suppressions.
This also updates our taotuple checkout to match upstream master.
80b12e325 Merge branch 'coordSysFilter' of https://gitlab.kitware.com/dpugmire/vtk-m into coordSysFilter
ab5eeab18 Fixes for making the filters non-templated.
27dade145 Fixes for coordinate systems w/ help from Sujin.
db5ded3a6 Add files for coord sys transform filters.
17087a26a Filter for coordinate system transform.
Acked-by: Kitware Robot <kwrobot@kitware.com>
Acked-by: Kenneth Moreland <kmorel@sandia.gov>
Merge-request: !1284
- Reducing the stack allocation for CUDA for the BIH unit test
- Adding changes from Ken's review
- Suppress ptxas stack size warning for BoundingIntervalHierarchy
The vtkm::exec::PointLocatorUniformGrid has a default constructor. It
was "helpfully" declared as VTKM_EXEC_CONT, but apparently that is the
wrong thing to do for constructors that are set to default.
Previously when PointLocatorUniformGrid.h was compiled by the CUDA
compiler, the compiler would crash. Apparently during the ptxas
part of the compiler goes into a crazy recursion and runs out of
stack space. This appears to be a long-standing bug in CUDA
(been there for multiple releases) without a clear reason why it
sometimes rears its ugly head. (See for example
https://devtalk.nvidia.com/default/topic/1028825/cuda-programming-and-performance/-ptxas-died-with-status-0xc00000fd-stack_overflow-/)
The problem appears to be when having a doubly or triply nested
loop over a box of values to check in the uniform array. This
appears to fix the problem by converting that to a single for
loop with some index magic to convert that to 3D indices.
- Changing the name PrepareForExecutionOnDevice to PrepareForExecutionImpl
- Adding changes suggested by Ollie and Ken to return the execution object
from PrepareForExecutionImpl using VirtualObjectHandle
- Updating PrepareForExecutionFunctor
Previously, to query whether an ArrayHandle was writable with
IsWriteableArrayHandle, you had to specify a device adapter. The idea
was that it would look at the portal used for that device adapter.
Instead, check the control pointer, which should give the same
indication without having to have a separate check for every type of
device.
947496550 constexpr construction for Vec classes
Acked-by: Kitware Robot <kwrobot@kitware.com>
Acked-by: Sujin Philip <sujin.philip@kitware.com>
Acked-by: Robert Maynard <robert.maynard@kitware.com>
Merge-request: !1254
880d8a989 Add `vtkm/Geometry.h` and test it.
Acked-by: Kitware Robot <kwrobot@kitware.com>
Acked-by: Robert Maynard <robert.maynard@kitware.com>
Merge-request: !1262
Vec class objects can now be constructed during compile-time
as constant expressions by calling Vec( T, ... ) constructors
or through brace-initialization.
Constant expression using fill constructor and nested vectors
of sizes greater than 4 are not supported yet.
Changes made to WrappedOperators.h for resolving overload
ambiguities in Vec construction and typecasting.
Appropriate test cases were added to UnitTestTypes.cxx.
Addresses issue #199.
This commit adds several geometric constructs to vtk-m
in the `vtkm/Geometry.h` header. They may be used from
both the execution and control environments.
We also add methods to perform projection and Gram-Schmidt
orthonormalization to `vtkm/VectorAnalysis.h`.
See `docs/changelog/geometry.md` included in this commit
for more information.
Found via `codespell` and `grep`
more typos
includes source typo change and a typo that needs further review
follow-up typos
Follow-up typos
Revert a commit