This commit adds several geometric constructs to vtk-m
in the `vtkm/Geometry.h` header. They may be used from
both the execution and control environments.
We also add methods to perform projection and Gram-Schmidt
orthonormalization to `vtkm/VectorAnalysis.h`.
See `docs/changelog/geometry.md` included in this commit
for more information.
When copying small arrays from cpu memory to pascal memory we would
see subsequent kernels fail as the memory transfer hadn't finished.
This is a bug as each stream should act like a FIFO queue. So
for now when encountering this use case we explicitly synchronize
after the memcpy.
This means that we not only setup the runtime device tracker
to force the intended device, it also means making sure
the default device is the error device.
The previous implementation of DeviceAdapterRuntimeDetector caused
multiple differing definitions of the same class to exist and
was causing the runtime device tracker to report CUDA as disabled
when it actually was enabled.
The ODR was caused by having a default implementation for
DeviceAdapterRuntimeDetector and a specific specialization for
CUDA. If a library had both CUDA and C++ sources it would pick up
both implementations and would have undefined behavior. In general
it would think the CUDA backend was disabled.
To avoid this kind of situation in the future I have reworked VTK-m
so that each device adapter must implement DeviceAdapterRuntimeDetector
for that device.
This is done to not only reduce the amount of code that users need
to generate but to reduce the amount of errors when using
the RuntimeDeviceTracker. If the runtime device tracker is initially
used in a library by a c++ file it will never properly detect the
cuda backend. By moving the code into vtkm_cont we can make sure
this problem doesn't occur.
The usage of uint was causing problems with CUDA + MSVC2015 as
type was not defined. Instead we use vtkm::Id as that was the expect
type to be passed to the task
This allows for easier host side logic when determining grid and block
sizes, and allows for a smaller library side by moving some logic
into compiled in functions.
Previously memory that was allocated outside of VTK-m was impossible to transfer to
VTK-m as we didn't know how to free it. By extending the ArrayHandle constructors
to support a Storage object that is being moved, we can clearly express that
the ArrayHandle now owns memory it didn't allocate.
Here is an example of how this is done:
```cpp
T* buffer = new T[100];
auto user_free_function = [](void* ptr) { delete[] static_cast<T*>(ptr); };
vtkm::cont::internal::Storage<T, vtkm::cont::StorageTagBasic>
storage(buffer, 100, user_free_function);
vtkm::cont::ArrayHandle<T> arrayHandle(std::move(storage));
```
This fixes the three following issues with StorageBasic.
1. Memory that was allocated by VTK-m and Stolen by the user needed the
proper free function called which is generally StorageBasicAllocator::deallocate.
But that was hard for the user to hold onto. So now we provide a function
pointer to the correct free function.
2. Memory that was allocated outside of VTK-m was impossible to transfer to
VTK-m as we didn't know how to free it. This is now resolved by allowing the
user to specify a free function to be called on release.
3. When the CUDA backend allocates memory for an ArrayHandle that has no
control representation, and the location we are running on supports concurrent
managed access we want to specify that cuda managed memory as also the host memory.
This requires that StorageBasic be able to call an arbitrary new delete function
which is chosen at runtime.
The new and improved vtkm::cont::ColorTable provides a more feature complete
color table implementation that is modeled after
vtkDiscretizableColorTransferFunction. This class therefore supports different
color spaces ( rgb, lab, hsv, diverging ) and supports execution across all
device adapters.
By hard coding the PrepareForDevice to know about all the different VTK-m
devices, we can have a single base class do the execution allocation, and not
have that logic repeated in each child class.
Sandia National Laboratories recently changed management from the
Sandia Corporation to the National Technology & Engineering Solutions
of Sandia, LLC (NTESS). The copyright statements need to be updated
accordingly.
By using the auto keyword and decltype we can reduce the number of
complex typedefs that exist when writing device adapter algorithms.
The goal being that it is easier for developers to see the actual
algorithms being implemented, by reducing the amount of template
'noise'.
These two algorithms does not return meaningful return values. Generic interface and
implementation are both void. Remove erronous return type and statement for CUDA backend.
The parameter sweeping code is only enabled when tuning for new GPU's
so we should move it to a separate header to make DeviceAdapterAlgorithmThrust
easier to read.
The old templated array transfer mechanism generated a lot of code
that ended up doing a simple, type-agnostic memcpy for most devices.
This patch specialized array handles for basic storage and uses a
fast-path array transfer implementation. This reduces the size of the
vtkm_cont library by 27% on gcc (from 6.2MB to 4.5MB).
Redesigns the TBB and Serial backends and the vtkm::exec::Task concept so that
we can re-use the same launching logic for all Worklets, instead of generating
per worlet code. To keep the performance the same the TilingTask now is past
a range of indices to work on, rather than a single index.
Binary size reduction:
WorkletTests_SERIAL old - 19MB
WorkletTests_SERIAL new - 18MB
WorkletTests_TBB old - 39MB
WorkletTests_TBB new - 18MB
libvtkAcceleratorsVTKm old - 48MB
libvtkAcceleratorsVTKm new - 19MB
Previously WorkletInvokeFunctor inherited from vtkm::exec::FunctorBase,
which is also the base class for all users Worklets and for all functors
based to DeviceAdapter::Schedule.
This is done for a few reasons. The first is that we reduce the
minimum size of user worklets. Previously the users worklet would hold
a reference to the error message, and so would the wrapper class added
when calling DeviceAdapter::Schedule. Now we only have the users worklet
holding a reference.
Second, by refactoring to have two base classes we can better improve
the documentation on what responsibilities FunctorBase.h has, compared
to TaskBase.
ec6589d3 Only enable -fPIC on component static libraries when necessary.
cbfe5fdd Fix up various issues with ArrayHandles in vtkm_cont.
355eea88 Get the vtkm cont cuda object to compile properly.
6ecc22bb First pass at compiling ArrayHandle into vtkm_cont.
Acked-by: Kitware Robot <kwrobot@kitware.com>
Merge-request: !715
5c735a38 this should resovle all the type conversion warnings
d8b02329 Merge branch 'scanbykey' of gitlab.kitware.com:ollielo/vtk-m into scanbykey
58ef7c8d one more attemp to get the data type right
22d0e355 attempt to fix warning on type conversion
ded4583a attempt to fix warning on type conversion
897b2f0f add tests for 1, 2 and ARRAY_SIZE elements for both ScanInclusiveByKey and ScanExclusiveByKey
c05a2c32 Merge branch 'scanbykey' of gitlab.kitware.com:ollielo/vtk-m into scanbykey
0e97fcb9 handle the case of 0 or 1 element in the input for ScanExclusiveByKey
...
Acked-by: Kitware Robot <kwrobot@kitware.com>
Merge-request: !746
- Exception classes cannot be exported due to MSVC's design decisions.
See http://stackoverflow.com/questions/24511376. We must leave these
classes as header only and silence the warnings.
- TransferResource in BufferState.h must remain a header-only class since
there is no vtkm_interop library to compile the class into.
- The VTKDataSetReader hierarchy must similarly remain header-only since
there is no vtkm_io library.
- The OptionParser Action classes are part of a header-only utility and
cannot be easily compiled into a library.
-
The current design for ArrayPortalVirtual makes it a requirement for all
array portals (that it wraps) to have Set defined. Thus, make sure Set is
defined for all ArrayPortal. Where Set is invalid, an assert is thrown if
something calls it at runtime.
Class that need to be passed across dynamic library boundaries such as
DynamicArrayHandle need to be properly export. One of 'tricks' of this
is that templated classes such as PolymorphicArrayHandleContainer need
the Type and Storage types to have public visibility.
This makes sure that all vtkm storage tags have public visibility so
in the future we can transfer dynamic array handles across libraries.
It is common practice to swallow a semicolon in a macro using a do/
while(false) loop that only executes once. However, the Visual Studio
2013 compiler was stupidly warning about having a constant expression
for the loop. Get around the problem by creating our own swallow
semicolon macros that disable this warning as necessary.
I noticed that when I attempted to execute a CUDA kernel with 0 blocks,
it generated a CUDA error. The error did not really matter since nothing
was really supposed to run anyway. However, once we are careful about
checking CUDA errors, it will cause test failures and likely other
misdiagnoses.
Most functions in the CUDA runtime API return an error code that must be
checked to determine whether the operation completed successfully. Most
operations in VTK-m just called the function and assumed it completed
correctly, which could lead to further errors. This change wraps most
CUDA calls in a VTKM_CUDA_CALL macro that checks the error code and
throws an exception if the call fails.
natively thrust::reduce is unable to handle the use case where the
iterator type and the initial value/return value are two different
types. We use array handle cast to work around this problem when
we detect this usecase.
Change the VTKM_CONT_EXPORT to VTKM_CONT. (Likewise for EXEC and
EXEC_CONT.) Remove the inline from these macros so that they can be
applied to everything, including implementations in a library.
Because inline is not declared in these modifies, you have to add the
keyword to functions and methods where the implementation is not inlined
in the class.
This allows callers to copy a subsection of an array into another array,
without clearing the contents of the destination array if a resize
is required.
The build automatically sets some macros when building CUDA files. Some
of the CUDA sources were setting the same macros, which was causing
warnings. Change the code to be more careful about setting preprocessor
macros.
First, be more explicit when we mean a range of values in a field or a
spacial bounds. Use the Range and Bounds structs in Field and
CoordinateSystem to make all of this more clear (and reduce a bit of
code as well).
These asserts are consolidated into the unified Assert.h. Also made some
minor edits to add asserts where appropriate and a little bit of
reconfiguring as found.
Add lots of checks to CUDA calls in the timer to try to identify any
problems that might be showing up on the dashboard.
Also adding some print statements around the sleep function in the
device adapter testing. For some reason the problem just went away with
them.
Previously, the timer for CUDA devices only called cudaEventSynchronize
at the end event when asking for the elapsed time. This, however, could
allow time to pass from when the timer was reset to when the start event
happened that was not recorded in the timer. This added synchronization
should make sure that all time spent in CUDA is recorded.
Previously each device adapter only had a unique string name. This was
not the best when it came to developing data structures to track the status
of a given device at runtime.
This adds in a unique numeric identifier to each device adapter. This will
allow classes to easily create bitmasks / lookup tables for the validity of
devices.
The RuntimeDeviceInformation class allows developers to check if a given
device is supported on a machine at runtime. This allows developers to properly
check for CUDA support before running any worklets.