Sandia National Laboratories recently changed management from the
Sandia Corporation to the National Technology & Engineering Solutions
of Sandia, LLC (NTESS). The copyright statements need to be updated
accordingly.
By using the auto keyword and decltype we can reduce the number of
complex typedefs that exist when writing device adapter algorithms.
The goal being that it is easier for developers to see the actual
algorithms being implemented, by reducing the amount of template
'noise'.
These two algorithms does not return meaningful return values. Generic interface and
implementation are both void. Remove erronous return type and statement for CUDA backend.
The parameter sweeping code is only enabled when tuning for new GPU's
so we should move it to a separate header to make DeviceAdapterAlgorithmThrust
easier to read.
The old templated array transfer mechanism generated a lot of code
that ended up doing a simple, type-agnostic memcpy for most devices.
This patch specialized array handles for basic storage and uses a
fast-path array transfer implementation. This reduces the size of the
vtkm_cont library by 27% on gcc (from 6.2MB to 4.5MB).
Redesigns the TBB and Serial backends and the vtkm::exec::Task concept so that
we can re-use the same launching logic for all Worklets, instead of generating
per worlet code. To keep the performance the same the TilingTask now is past
a range of indices to work on, rather than a single index.
Binary size reduction:
WorkletTests_SERIAL old - 19MB
WorkletTests_SERIAL new - 18MB
WorkletTests_TBB old - 39MB
WorkletTests_TBB new - 18MB
libvtkAcceleratorsVTKm old - 48MB
libvtkAcceleratorsVTKm new - 19MB
Previously WorkletInvokeFunctor inherited from vtkm::exec::FunctorBase,
which is also the base class for all users Worklets and for all functors
based to DeviceAdapter::Schedule.
This is done for a few reasons. The first is that we reduce the
minimum size of user worklets. Previously the users worklet would hold
a reference to the error message, and so would the wrapper class added
when calling DeviceAdapter::Schedule. Now we only have the users worklet
holding a reference.
Second, by refactoring to have two base classes we can better improve
the documentation on what responsibilities FunctorBase.h has, compared
to TaskBase.
ec6589d3 Only enable -fPIC on component static libraries when necessary.
cbfe5fdd Fix up various issues with ArrayHandles in vtkm_cont.
355eea88 Get the vtkm cont cuda object to compile properly.
6ecc22bb First pass at compiling ArrayHandle into vtkm_cont.
Acked-by: Kitware Robot <kwrobot@kitware.com>
Merge-request: !715
I noticed that when I attempted to execute a CUDA kernel with 0 blocks,
it generated a CUDA error. The error did not really matter since nothing
was really supposed to run anyway. However, once we are careful about
checking CUDA errors, it will cause test failures and likely other
misdiagnoses.
Most functions in the CUDA runtime API return an error code that must be
checked to determine whether the operation completed successfully. Most
operations in VTK-m just called the function and assumed it completed
correctly, which could lead to further errors. This change wraps most
CUDA calls in a VTKM_CUDA_CALL macro that checks the error code and
throws an exception if the call fails.
natively thrust::reduce is unable to handle the use case where the
iterator type and the initial value/return value are two different
types. We use array handle cast to work around this problem when
we detect this usecase.
Change the VTKM_CONT_EXPORT to VTKM_CONT. (Likewise for EXEC and
EXEC_CONT.) Remove the inline from these macros so that they can be
applied to everything, including implementations in a library.
Because inline is not declared in these modifies, you have to add the
keyword to functions and methods where the implementation is not inlined
in the class.