This merge request is Phase 1 of several to implement the distributed parallel
contour tree in VTKm. This merge requests adds the base outline for the
algorithm. The implementation of the details of the algorithm in the
BoundaryRestrictedAugmentedContourTree.h is currently still missing.
However, these will require a substantial (~3000) lines of additional code.
The goal is to stage the integration process across merge requests to make
the review process simpler.
The timer class now is asynchronous and device independent. it's using an
similiar API as vtkOpenGLRenderTimer with Start(), Stop(), Reset(), Ready(),
and GetElapsedTime() function. For convenience and backward compability, Each
Start() function call will call Reset() internally and each GetElapsedTime()
function call will call Stop() function if it hasn't been called yet for keeping
backward compatibility purpose.
Bascially it can be used in two modes:
* Create a Timer without any device info. vtkm::cont::Timer time;
* It would enable timers for all enabled devices on the machine. Users can get a
specific elapsed time by passing a device id into the GetElapsedtime function.
If no device is provided, it would pick the maximum of all timer results - the
logic behind this decision is that if cuda is disabled, openmp, serial and tbb
roughly give the same results; if cuda is enabled it's safe to return the
maximum elapsed time since users are more interested in the device execution
time rather than the kernal launch time. The Ready function can be handy here
to query the status of the timer.
* Create a Timer with a device id. vtkm::cont::Timer time((vtkm::cont::DeviceAdapterTagCuda()));
* It works as the old timer that times for a specific device id.
Now that the dispatcher does its own TryExecute, filters do not need to
do that. This change requires all worklets called by filters to be able
to execute without knowing the device a priori.