MultiBlock now uses `diy::reduce` for reductions rather than using proxy
collectives. To support using `diy::reduce` operations on a
vtkm::cont::MultiBlock, added AssignerMultiBlock and
DecomposerMultiBlock classes. This are helper classes that provide DIY
concepts on top of a existing MultiBlock.
By using perfect forwarding we can reduce not only the amount of TryExecute
signatures, but we can enable the ability to pass temporary functors to
TryExecute.
At the same time we have optimized TryExecute by moving the string generation
code into a single function that is compiled into the vtkm_cont library.
The end result is that the vtkm_rendering library size has been reduced from
12MB to 11MB, and we shave off about 5% of our build time.
This is a convenience method to do a deep copy of an array. This comes
up a lot, but can be a pain if you don't have a specific device adapter
on which to do the copy.
Sandia National Laboratories recently changed management from the
Sandia Corporation to the National Technology & Engineering Solutions
of Sandia, LLC (NTESS). The copyright statements need to be updated
accordingly.
This is part of #43, which will ultimately simplify the
ArrayHandleCompositeVector to a new implementation that can be easily
written to. Part of this effort will remove the ability to pull a single
component from a vector-typed input ArrayHandle for use in the
CompositeVector, and this new class makes sure we can still support that
usecase.
The old templated array transfer mechanism generated a lot of code
that ended up doing a simple, type-agnostic memcpy for most devices.
This patch specialized array handles for basic storage and uses a
fast-path array transfer implementation. This reduces the size of the
vtkm_cont library by 27% on gcc (from 6.2MB to 4.5MB).
Redesigns the TBB and Serial backends and the vtkm::exec::Task concept so that
we can re-use the same launching logic for all Worklets, instead of generating
per worlet code. To keep the performance the same the TilingTask now is past
a range of indices to work on, rather than a single index.
Binary size reduction:
WorkletTests_SERIAL old - 19MB
WorkletTests_SERIAL new - 18MB
WorkletTests_TBB old - 39MB
WorkletTests_TBB new - 18MB
libvtkAcceleratorsVTKm old - 48MB
libvtkAcceleratorsVTKm new - 19MB
Most uses of ArrayRangeCompute just want to get the range of the data
and probably don't have a particular device in mind. Thus, it is better
to use a TryExecute internally use whatever devices are available.
Note that when using TryExecute, the calling code is expected to be able
to support all devices. That might not always be the case. Thus, I am
experimenting a bit with how we incorporate this in a library. The
advantage of having the code compiled in a library is that you only have
to compile it once and the calling code does not need to worry about
CUDA, etc.
However, because ArrayRangeCompute is templated, we can only pre-compile
some subset of array handle types. The most common are compiled into the
code (matching all the predefined ArrayHandles as well as some special
cases). If the code wants to use some other type, it has to include
ArrayRangeCompute.hxx. The only place where this is necessary is a test
that intentially trys to find the range on an uncommon type.
If array portals were to support virtual methods, then we should be able
to modify this code so that we could precompile for all array handle
types.
The RuntimeDeviceTracker.cxx contains a library method that queries the
CUDA device, which only works if compiled as a CUDA source file.
This set up will allow code that is not compiled with CUDA use a
RuntimeDeviceTracker with other code that does use CUDA.
This allows code to include the RuntimeDeviceTracker without depending
on the device-specific adapters (I think).
Also changed the implementation to use a shared_ptr for the state so you
can pass it around and share the state easier.
Before it was in the vtkm::cont::internal namespace. However, we expect
to be using this feature quite a bit more as we want VTK-m to handle
multiple devices effectively (as in, just figure it out and go).
Previously if you constructed an array handle without allocating it, you
would get an error if you tried to use the array as input. This
conflicted with some recent changes to accept empty vectors.
Now when you try to use an unallocated ArrayHandle as input (calling
PrepareForInput or PrepareForInPlace), it internally calls Allocate(0)
(to establish internal state) and sets up a valid execution ArrayPortal
of size 0.
ArrayHandleDiscard is intended to be used for worklets that produce
multiple output arrays when one or more outputs is not needed. It
does not allocate space for its data and the Set method is a no-op,
allowing the compiler to prune unnecessary instructions.
Reading from the array handle is not allowed.
This reduces the number of weak vtables vtkm generates, resulting in
a reduction of binary sizes for projects that include vtkm classes in
multiple translation units.
This is a fancy array handle that can group entries in another array by
arbitrary amounts. This allows us to implement input and output arrays
with a different sized Vec for each instance. This is necessary for
generating new topologies with cells of different types.