Consumers of VTK-m when enabling of dropping of unused functions
will see VTK-m functions dropped. Previously this didn't happen
as VTK-m didn't build object files with the correct flags for this.
By allowing the linker to remove unused symbols we see a significant
saving the file size of VTK-m tests, examples, and benchmarks.
An OpenMP build of the tests and benchmarks goes from 168MB to
141MB which is roughly a 16% filesize reduction.
Initially I had presumed that these changes would increase link times.
But in measurements the total wall time for compilation of VTK-m has
stayed about the same ( seeing a decrease of 1.5% ). Presumably the
increased computation is offset by the reduction in file writing.
- Use AtomicInterface to implement device-specific atomic operations.
- Remove DeviceAdapterAtomicArrayImplementations.
- Extend supported atomic types to include unsigned 32/64-bit ints.
- Add a static_assert to check that AtomicArray type is supported.
- Add documentation for AtomicArrayExecutionObject, including a CAS
example.
- Add a `T Get(idx)` method to AtomicArrayExecutionObject that does
an atomic load, and update existing CAS usage to use this instead
of `Add(idx, 0)`.
The `From` and `To` nomenclature for topology mapping has been confusing for
both users and developers, especially at lower levels where the intention of
mapping attributes from one element to another is easily conflated with the
concept of mapping indices (which maps in the exact opposite direction).
These identifiers have been renamed to `VisitTopology` and `IncidentTopology`
to clarify the direction of the mapping. The order in which these template
parameters are specified for `WorkletMapTopology` have also been reversed,
since eventually there may be more than one `IncidentTopology`, and having
`IncidentTopology` at the end will allow us to replace it with a variadic
template parameter pack in the future.
Other implementation details supporting these worklets, include `Fetch` tags,
`Connectivity` classes, and methods on the various `CellSet` classes (such as
`PrepareForInput` have also reversed their template arguments. These will need
to be cautiously updated.
The convenience implementations of `WorkletMapTopology` have been renamed for
clarity as follows:
```
WorkletMapPointToCell --> WorkletVisitCellsWithPoints
WorkletMapCellToPoint --> WorkletVisitPointsWithCells
```
The `ControlSignature` tags have been renamed as follows:
```
FieldInTo --> FieldInVisit
FieldInFrom --> FieldInMap
FromCount --> IncidentElementCount
FromIndices --> IncidentElementIndices
```
The Invoker is a control side object that handles the construction
of the relevant worklet dispatcher. Moving it to control makes it
obvious that it isn't an algorithm itself but a way to launch
worklets.
There was a special case for ArrayHandleMultiplexer where if you gave it
just one type it would treat that as a value type rather than an array
to support and instead provide a default list of types. However, GCC 4.8
is having trouble compiling the code to create the default list, the
semantics are confusing, and the more I think about it the less likely I
think we will need this functionality. So, just getting rid of that.
Previously the "dynamic" array was taken from a VariantArrayHandle.
However, the VariantArrayHandle will actually cast to a basic array, so
the comparison is not particularly fair. Change that to an
ArrayHandleVirtual so that it is actually calling through a virtual
method.
Also make 2 versions of the multiplexer test. The first version has an
array that is at the 1st index and the second is at the last index. This
tests whether the compiled code has to do lots of actual comparisons to
get to the last index.
BitFields are:
- Stored in memory using a contiguous buffer of bits.
- Accessible via portals, a la ArrayHandle.
- Portals operate on individual bits or words.
- Operations may be atomic for safe use from concurrent kernels.
The new BitFieldToUnorderedSet device algorithm produces an ArrayHandle
containing the indices of all set bits, in no particular order.
The new AtomicInterface classes provide an abstraction into bitwise
atomic operations across control and execution environments and are used
to implement the BitPortals.
VTK-m has been updated to replace old per device benchmark executables with a device
dependent shared library so that it's able to accept a device adapter at runtime through
the "--device=" argument.
The timer class now is asynchronous and device independent. it's using an
similiar API as vtkOpenGLRenderTimer with Start(), Stop(), Reset(), Ready(),
and GetElapsedTime() function. For convenience and backward compability, Each
Start() function call will call Reset() internally and each GetElapsedTime()
function call will call Stop() function if it hasn't been called yet for keeping
backward compatibility purpose.
Bascially it can be used in two modes:
* Create a Timer without any device info. vtkm::cont::Timer time;
* It would enable timers for all enabled devices on the machine. Users can get a
specific elapsed time by passing a device id into the GetElapsedtime function.
If no device is provided, it would pick the maximum of all timer results - the
logic behind this decision is that if cuda is disabled, openmp, serial and tbb
roughly give the same results; if cuda is enabled it's safe to return the
maximum elapsed time since users are more interested in the device execution
time rather than the kernal launch time. The Ready function can be handy here
to query the status of the timer.
* Create a Timer with a device id. vtkm::cont::Timer time((vtkm::cont::DeviceAdapterTagCuda()));
* It works as the old timer that times for a specific device id.
This is a library that contains parts of worklets that can be
precompiled into a library.
Currently, this library contains the implementation of ScatterCounting.
The recent removal of type selectors in ControlSignature field tags
means that the BenchmarkFieldAlgorithms code was creating code paths
that were never followed and lead to impossible type conversions (e.g.
Vec to float). Fixed the problem by specifing the types for the
VariantArrayHandles better.
Also
- Renamed vtkm::cont::make_DeviceAdapterIdFromName to just overload
make_DeviceAdapterId.
- Refactored CMake logic for unit tests
- Since we're now querying the device tracker for the names, they
cannot be all caps.
- Updated usages of InitLogging to use Initialize instead.
- Added changelog.
Now that the dispatcher does its own TryExecute, filters do not need to
do that. This change requires all worklets called by filters to be able
to execute without knowing the device a priori.
Rather than force all dispatchers to be templated on a device adapter,
instead use a TryExecute internally within the invoke to select a device
adapter.
Because this removes the need to declare a device when invoking a
worklet, this commit also removes the need to declare a device in
several other areas of the code.