vtk-m/docs/changelog/1.6/release-notes.md
Vicente Adolfo Bolea Sanchez 19c1b99f18 1.6.0-rc1 is our 7th official release of VTK-m.
The major changes to VTK-m from 1.5.1 can be found in:
  docs/changelog/1.6.0/release-notes.md

Signed-off-by: Vicente Adolfo Bolea Sanchez <vicente.bolea@kitware.com>
2021-04-06 18:28:53 -04:00

112 KiB

VTK-m 1.6 Release Notes

Table of Contents

  1. Core
  • Add Kokkos backend
  • Deprecate DataSetFieldAdd
  • Move VTK file readers and writers into vtkm_io
  • Remove VTKDataSetWriter::WriteDataSet just_points parameter
  • Added VecFlat class
  • Add a vtkm::Tuple class
  • DataSet now only allows unique field names
  • Result DataSet of coordinate transform has its CoordinateSystem changed
  • vtkm::cont::internal::Buffer now can have ownership transferred
  • Configurable default types
  1. ArrayHandle
  • Shorter fancy array handle classnames
  • ReadPortal().Get(idx)
  • Precompiled ArrayCopy for UnknownArrayHandle
  • Create ArrayHandleOffsetsToNumComponents
  • Recombine extracted component arrays from unknown arrays
  • UnknownArrayHandle and UncertainArrayHandle for runtime-determined types
  • Support ArrayHandleSOA as a "default" array
  • Removed old ArrayHandle transfer mechanism
  • Order asynchronous ArrayHandle access
  • Improvements to moving data into ArrayHandle
  • Deprecate ArrayHandleVirtualCoordinates
  • ArrayHandleDecorator Allocate and Shrink Support
  • Portals may advertise custom iterators
  • Redesign of ArrayHandle to access data using typeless buffers
  • ArrayRangeCompute works on any array type without compiling device code
  • Implemented ArrayHandleRandomUniformBits and ArrayHandleRandomUniformReal
  • Extract component arrays from unknown arrays
  • ArrayHandleGroupVecVariable holds now one more offset.
  1. Control Environment
  • Algorithms for Control and Execution Environments
  1. Execution Environment
  • Scope ExecObjects with Tokens
  • Masks and Scatters Supported for 3D Scheduling
  • Virtual methods in execution environment deprecated
  • Deprecate Execute with policy
  1. Worklets and Filters
  • Enable setting invalid value in probe filter
  • Avoid raising errors when operating on cells
  • Add atomic free functions
  • Flying Edges
  • Filters specify their own field types
  1. Build
  • Disable asserts for CUDA architecture builds
  • Disable asserts for HIP architecture builds
  • Add VTKM_DEPRECATED macro
  1. Other
  • Porting layer for future std features
  • Removed OpenGL Rendering Classes
  • Reorganization of io directory
  • Implemented PNG/PPM image Readers/Writers
  • Updated Benchmark Framework
  • Provide scripts to build Gitlab-ci workers locally
  • Replaced vtkm::ListTag with vtkm::List
  • Add ListTagRemoveIf
  • Write uniform and rectilinear grids to legacy VTK files
  1. References

Core

Add Kokkos backend

Adds a new device backend Kokkos which uses the kokkos library for parallelism. User must provide the kokkos build and Vtk-m will use the default configured execution space.

Deprecate DataSetFieldAdd

The class vtkm::cont::DataSetFieldAdd is now deprecated. Its methods, AddPointField and AddCellField have been moved to member functions of vtkm::cont::DataSet, which simplifies many calls.

For example, the following code

vtkm::cont::DataSetFieldAdd fieldAdder;
fieldAdder.AddCellField(dataSet, "cellvar", values);

would now be

dataSet.AddCellField("cellvar", values);

Move VTK file readers and writers into vtkm_io

The legacy VTK file reader and writer were created back when VTK-m was a header-only library. Things have changed and we now compile quite a bit of code into libraries. At this point, there is no reason why the VTK file reader/writer should be any different.

Thus, VTKDataSetReader, VTKDataSetWriter, and several supporting classes are now compiled into the vtkm_io library. Also similarly updated BOVDataSetReader for good measure.

As a side effect, code using VTK-m will need to link to vtkm_io if they are using any readers or writers.

Remove VTKDataSetWriter::WriteDataSet just_points parameter

In the method VTKDataSetWriter::WriteDataSet, just_points parameter has been removed due to lack of usage.

The purpose of just_points was to allow exporting only the points of a DataSet without its cell data.

Added VecFlat class

vtkm::VecFlat is a wrapper around a Vec-like class that may be a nested series of vectors. For example, if you run a gradient operation on a vector field, you are probably going to get a Vec of Vecs that looks something like vtkm::Vec<vtkm::Vec<vtkm::Float32, 3>, 3>. That is fine, but what if you want to treat the result simply as a Vec of size 9?

The VecFlat wrapper class allows you to do this. Simply place the nested Vec as an argument to VecFlat and it will behave as a flat Vec class. (In fact, VecFlat is a subclass of Vec.) The VecFlat class can be copied to and from the nested Vec it is wrapping.

There is a vtkm::make_VecFlat convenience function that takes an object and returns a vtkm::VecFlat wrapped around it.

VecFlat works with any Vec-like object as well as scalar values. However, any type used with VecFlat must have VecTraits defined and the number of components must be static (i.e. known at compile time).

Add a vtkm::Tuple class

This change added a vtkm::Tuple class that is very similar in nature to std::tuple. This should replace our use of tao tuple.

The motivation for this change was some recent attempts at removing objects like Invocation and FunctionInterface. I expected these changes to speed up the build, but in fact they ended up slowing down the build. I believe the problem was that these required packing variable parameters into a tuple. I was using the tao tuple class, but it seemed to slow down the compile. (That is, compiling tao's tuple seemed much slower than compiling the equivalent FunctionInterface class.)

The implementation of vtkm::Tuple is using pyexpander to build lots of simple template cases for the object (with a backup implementation for even longer argument lists). I believe the compiler is better and parsing through thousands of lines of simple templates than to employ clever MPL to build general templates.

Usage

The vtkm::Tuple class is defined in the vtkm::Tuple.h header file. A Tuple is designed to behave much like a std::tuple with some minor syntax differences to fit VTK-m coding standards.

A tuple is declared with a list of template argument types.

vtkm::Tuple<vtkm::Id, vtkm::Vec3f, vtkm::cont::ArrayHandle<vtkm::Float32>> myTuple;

If given no arguments, a vtkm::Tuple will default-construct its contained objects. A vtkm::Tuple can also be constructed with the initial values of all contained objects.

vtkm::Tuple<vtkm::Id, vtkm::Vec3f, vtkm::cont::ArrayHandle<vtkm::Float32>> 
  myTuple(0, vtkm::Vec3f(0, 1, 2), array);

For convenience there is a vtkm::MakeTuple function that takes arguments and packs them into a Tuple of the appropriate type. (There is also a vtkm::make_tuple alias to the function to match the std version.)

auto myTuple = vtkm::MakeTuple(0, vtkm::Vec3f(0, 1, 2), array);

Data is retrieved from a Tuple by using the vtkm::Get method. The Get method is templated on the index to get the value from. The index is of type vtkm::IdComponent. (There is also a vtkm::get that uses a std::size_t as the index type as an alias to the function to match the std version.)

vtkm::Id a = vtkm::Get<0>(myTuple);
vtkm::Vec3f b = vtkm::Get<1>(myTuple);
vtkm::cont::ArrayHandle<vtkm::Float32> c = vtkm::Get<2>(myTuple);

Likewise vtkm::TupleSize and vtkm::TupleElement (and their aliases vtkm::Tuple_size, vtkm::tuple_element, and vtkm::tuple_element_t) are provided.

Extended Functionality

The vtkm::Tuple class contains some functionality beyond that of std::tuple to cover some common use cases in VTK-m that are tricky to implement. In particular, these methods allow you to use a Tuple as you would commonly use parameter packs. This allows you to stash parameter packs in a Tuple and then get them back out again.

For Each

vtkm::Tuple::ForEach() is a method that takes a function or functor and calls it for each of the items in the tuple. Nothing is returned from ForEach and any return value from the function is ignored.

ForEach can be used to check the validity of each item.

void CheckPositive(vtkm::Float64 x)
{
  if (x < 0)
  {
    throw vtkm::cont::ErrorBadValue("Values need to be positive.");
  }
}

// ...

  vtkm::Tuple<vtkm::Float64, vtkm::Float64, vtkm::Float64> tuple(
    CreateValue1(), CreateValue2(), CreateValue3());

  // Will throw an error if any of the values are negative.
  tuple.ForEach(CheckPositive);

ForEach can also be used to aggregate values.

struct SumFunctor
{
  vtkm::Float64 Sum = 0;
  
  template <typename T>
  void operator()(const T& x)
  {
    this->Sum = this->Sum + static_cast<vtkm::Float64>(x);
  }
};

// ...

  vtkm::Tuple<vtkm::Float32, vtkm::Float64, vtkm::Id> tuple(
    CreateValue1(), CreateValue2(), CreateValue3());

  SumFunctor sum;
  tuple.ForEach(sum);
  vtkm::Float64 average = sum.Sum / 3;

Transform

vtkm::Tuple::Transform is a method that builds a new Tuple by calling a function or functor on each of the items. The return value is placed in the corresponding part of the resulting Tuple, and the type is automatically created from the return type of the function.

struct GetReadPortalFunctor
{
  template <typename Array>
  typename Array::ReadPortal operator()(const Array& array) const
  {
    VTKM_IS_ARRAY_HANDLE(Array);
	return array.ReadPortal();
  }
};

// ...

  auto arrayTuple = vtkm::MakeTuple(array1, array2, array3);
  
  auto portalTuple = arrayTuple.Transform(GetReadPortalFunctor{});

Apply

vtkm::Tuple::Apply is a method that calls a function or functor using the objects in the Tuple as the arguments. If the function returns a value, that value is returned from Apply.

struct AddArraysFunctor
{
  template <typename Array1, typename Array2, typename Array3>
  vtkm::Id operator()(Array1 inArray1, Array2 inArray2, Array3 outArray) const
  {
    VTKM_IS_ARRAY_HANDLE(Array1);
    VTKM_IS_ARRAY_HANDLE(Array2);
    VTKM_IS_ARRAY_HANDLE(Array3);

    vtkm::Id length = inArray1.GetNumberOfValues();
	VTKM_ASSERT(inArray2.GetNumberOfValues() == length);
	outArray.Allocate(length);
	
	auto inPortal1 = inArray1.ReadPortal();
	auto inPortal2 = inArray2.ReadPortal();
	auto outPortal = outArray.WritePortal();
	for (vtkm::Id index = 0; index < length; ++index)
	{
	  outPortal.Set(index, inPortal1.Get(index) + inPortal2.Get(index));
	}
	
	return length;
  }
};

// ...

  auto arrayTuple = vtkm::MakeTuple(array1, array2, array3);

  vtkm::Id arrayLength = arrayTuple.Apply(AddArraysFunctor{});

If additional arguments are given to Apply, they are also passed to the function (before the objects in the Tuple). This is helpful for passing state to the function. (This feature is not available in either ForEach or Transform for technical implementation reasons.)

struct ScanArrayLengthFunctor
{
  template <std::size_t N, typename Array, typename... Remaining>
  std::array<vtkm::Id, N + 1 + sizeof...(Remaining)>
  operator()(const std::array<vtkm::Id, N>& partialResult,
             const Array& nextArray,
			 const Remaining&... remainingArrays) const
  {
    std::array<vtkm::Id, N + 1> nextResult;
	std::copy(partialResult.begin(), partialResult.end(), nextResult.begin());
    nextResult[N] = nextResult[N - 1] + nextArray.GetNumberOfValues();
	return (*this)(nextResult, remainingArray);
  }
  
  template <std::size_t N>
  std::array<vtkm::Id, N> operator()(const std::array<vtkm::Id, N>& result) const
  {
    return result;
  }
};

// ...

  auto arrayTuple = vtkm::MakeTuple(array1, array2, array3);
  
  std::array<vtkm::Id, 4> = 
    arrayTuple.Apply(ScanArrayLengthFunctor{}, std::array<vtkm::Id, 1>{ 0 });

DataSet now only allows unique field names

When you add a vtkm::cont::Field to a vtkm::cont::DataSet, it now requires every Field to have a unique name. When you attempt to add a Field to a DataSet that already has a Field of the same name and association, the old Field is removed and replaced with the new Field.

You are allowed, however, to have two Fields with the same name but different associations. For example, you could have a point Field named "normals" and also have a cell Field named "normals" in the same DataSet.

This new behavior matches how VTK's data sets manage fields.

The old behavior allowed you to add multiple Fields with the same name, but it would be unclear which one you would get if you asked for a Field by name.

Result DataSet of coordinate transform has its CoordinateSystem changed

When you run one of the coordinate transform filters, CylindricalCoordinateTransform or SphericalCoordinateTransform, the transform coordiantes are placed as the first CoordinateSystem in the returned DataSet. This means that after running this filter, the data will be moved to this new coordinate space.

Previously, the result of these filters was just placed in a named Field of the output. This caused some confusion because the filter did not seem to have any effect (unless you knew to modify the output data). Not using the result as the coordinate system seems like a dubious use case (and not hard to work around), so this is much better behavior.

vtkm::cont::internal::Buffer now can have ownership transferred

Memory once transferred to Buffer always had to be managed by VTK-m. This is problematic for applications that needed VTK-m to allocate memory, but have the memory ownership be longer than VTK-m.

Buffer::TakeHostBufferOwnership allows for easy transfer ownership of memory out of VTK-m. When taking ownership of an VTK-m buffer you are provided the following information:

  • Memory: A void* pointer to the array
  • Container: A void* pointer used to free the memory. This is necessary to support cases such as allocations transferred into VTK-m from a std::vector.
  • Delete: The function to call to actually delete the transferred memory
  • Reallocate: The function to call to re-allocate the transferred memory. This will throw an exception if users try to reallocate a buffer that was 'view' only
  • Size: The size in number of elements of the array

To properly steal memory from VTK-m you do the following:

  vtkm::cont::ArrayHandle<T> arrayHandle;

  ...

  auto stolen = arrayHandle.GetBuffers()->TakeHostBufferOwnership();
    
  ...

  stolen.Delete(stolen.Container);

Configurable default types

Because VTK-m compiles efficient code for accelerator architectures, it often has to compile for static types. This means that dynamic types often have to be determined at runtime and converted to static types. This is the reason for the CastAndCall architecture in VTK-m.

For this CastAndCall to work, there has to be a finite set of static types to try at runtime. If you don't compile in the types you need, you will get runtime errors. However, the more types you compile in, the longer the compile time and executable size. Thus, getting the types right is important.

The "right" types to use can change depending on the application using VTK-m. For example, when VTK links in VTK-m, it needs to support lots of types and can sacrifice the compile times to do so. However, if using VTK-m in situ with a fortran simulation, space and time are critical and you might only need to worry about double SoA arrays.

Thus, it is important to customize what types VTK-m uses based on the application. This leads to the oxymoronic phrase of configuring the default types used by VTK-m.

This is being implemented by providing VTK-m with a header file that defines the default types. The header file provided to VTK-m should define one or more of the following preprocessor macros:

  • VTKM_DEFAULT_TYPE_LIST - a vtkm::List of value types for fields that filters should directly operate on (where applicable).
  • VTKM_DEFAULT_STORAGE_LIST - a vtkm::List of storage tags for fields that filters should directly operate on.
  • VTKM_DEFAULT_CELL_SET_LIST_STRUCTURED - a vtkm::List of vtkm::cont::CellSet types that filters should operate on as a strutured cell set.
  • VTKM_DEFAULT_CELL_SET_LIST_UNSTRUCTURED - a vtkm::List of vtkm::cont::CellSet types that filters should operate on as an unstrutured cell set.
  • VTKM_DEFAULT_CELL_SET_LIST - a vtkm::List of vtkm::cont::CellSet types that filters should operate on (where applicable). The default of vtkm::ListAppend<VTKM_DEFAULT_CELL_SET_LIST_STRUCTURED, VTKM_DEFAULT_CELL_SET_LIST> is usually correct.

If any of these macros are not defined, a default version will be defined. (This is the same default used if no header file is provided.)

This header file is provided to the build by setting the VTKm_DEFAULT_TYPES_HEADER CMake variable. VTKm_DEFAULT_TYPES_HEADER points to the file, which will be configured and copied to VTK-m's build directory.

For convenience, header files can be added to the VTK_m source directory (conventionally under vtkm/cont/internal). If this is the case, an advanced CMake option should be added to select the provided header file.

ArrayHandle

Shorter fancy array handle classnames

Many of the fancy ArrayHandles use the generic builders like ArrayHandleTransform and ArrayHandleImplicit for their implementation. Such is fine, but because they use functors and other such generic items to template their Storage, you can end up with very verbose classnames. This is an issue for humans trying to discern classnames. It can also be an issue for compilers that end up with very long resolved classnames that might get truncated if they extend past what was expected.

The fix was for these classes to declare their own Storage tag and then implement their Storage and ArrayTransport classes as trivial subclasses of the generic ArrayHandleImplicit or ArrayHandleTransport.

As an added bonus, a lot of this shortening also means that storage that relies on other array handles now are just typed by the storage of the decorated type, not the array itself. This should make the types a little more robust.

Here is a list of classes that were updated.

ArrayHandleCast<TargetT, vtkm::cont::ArrayHandle<SourceT, SourceStorage>>

Old storage:

vtkm::cont::internal::StorageTagTransform<
  vtkm::cont::ArrayHandle<SourceT, SourceStorage>,
  vtkm::cont::internal::Cast<TargetT, SourceT>,
  vtkm::cont::internal::Cast<SourceT, TargetT>>

New Storage:

vtkm::cont::StorageTagCast<SourceT, SourceStorage>

(Developer's note: Implementing this change to ArrayHandleCast was a much bigger PITA than expected.)

ArrayHandleCartesianProduct<AH1, AH2, AH3>

Old storage:

vtkm::cont::internal::StorageTagCartesianProduct<
  vtkm::cont::ArrayHandle<ValueType, StorageTag1,
  vtkm::cont::ArrayHandle<ValueType, StorageTag2,
  vtkm::cont::ArrayHandle<ValueType, StorageTag3>>

New storage:

vtkm::cont::StorageTagCartesianProduct<StorageTag1, StorageTag2, StorageTag3>

ArrayHandleCompositeVector<AH1, AH2, ...>

Old storage:

vtkm::cont::internal::StorageTagCompositeVector<
  tao::tuple<
    vtkm::cont::ArrayHandle<ValueType, StorageType1>, 
	vtkm::cont::ArrayHandle<ValueType, StorageType2>,
	...
  >
>

New storage:

vtkm::cont::StorageTagCompositeVec<StorageType1, StorageType2>

ArrayHandleConcatinate

First an example with two simple types.

Old storage:

vtkm::cont::StorageTagConcatenate<
  vtkm::cont::ArrayHandle<ValueType, StorageTag1>,
  vtkm::cont::ArrayHandle<ValueType, StorageTag2>>

New storage:

vtkm::cont::StorageTagConcatenate<StorageTag1, StorageTag2>

Now a more specific example taken from the unit test of a concatination of a concatination.

Old storage:

vtkm::cont::StorageTagConcatenate<
  vtkm::cont::ArrayHandleConcatenate<
    vtkm::cont::ArrayHandle<ValueType, StorageTag1>,
	vtkm::cont::ArrayHandle<ValueType, StorageTag2>>,
  vtkm::cont::ArrayHandle<ValueType, StorageTag3>>

New storage:

vtkm::cont::StorageTagConcatenate<
  vtkm::cont::StorageTagConcatenate<StorageTag1, StorageTag2>, StorageTag3>

ArrayHandleConstant

Old storage:

vtkm::cont::StorageTagImplicit<
  vtkm::cont::detail::ArrayPortalImplicit<
    vtkm::cont::detail::ConstantFunctor<ValueType>>>

New storage:

vtkm::cont::StorageTagConstant

ArrayHandleCounting

Old storage:

vtkm::cont::StorageTagImplicit<vtkm::cont::internal::ArrayPortalCounting<ValueType>>

New storage:

vtkm::cont::StorageTagCounting

ArrayHandleGroupVec

Old storage:

vtkm::cont::internal::StorageTagGroupVec<
  vtkm::cont::ArrayHandle<ValueType, StorageTag>, N>

New storage:

vtkm::cont::StorageTagGroupVec<StorageTag, N>

ArrayHandleGroupVecVariable

Old storage:

vtkm::cont::internal::StorageTagGroupVecVariable<
  vtkm::cont::ArrayHandle<ValueType, StorageTag1>, 
  vtkm::cont::ArrayHandle<vtkm::Id, StorageTag2>>

New storage:

vtkm::cont::StorageTagGroupVecVariable<StorageTag1, StorageTag2>

ArrayHandleIndex

Old storage:

vtkm::cont::StorageTagImplicit<
  vtkm::cont::detail::ArrayPortalImplicit<vtkm::cont::detail::IndexFunctor>>

New storage:

vtkm::cont::StorageTagIndex

ArrayHandlePermutation

Old storage:

vtkm::cont::internal::StorageTagPermutation<
  vtkm::cont::ArrayHandle<vtkm::Id, StorageTag1>,
  vtkm::cont::ArrayHandle<ValueType, StorageTag2>>

New storage:

vtkm::cont::StorageTagPermutation<StorageTag1, StorageTag2>

ArrayHandleReverse

Old storage:

vtkm::cont::StorageTagReverse<vtkm::cont::ArrayHandle<ValueType, vtkm::cont::StorageTag>>

New storage:

vtkm::cont::StorageTagReverse<StorageTag>

ArrayHandleUniformPointCoordinates

Old storage:

vtkm::cont::StorageTagImplicit<vtkm::internal::ArrayPortalUniformPointCoordinates>

New Storage:

vtkm::cont::StorageTagUniformPoints

ArrayHandleView

Old storage:

vtkm::cont::StorageTagView<vtkm::cont::ArrayHandle<ValueType, StorageTag>>

New storage:

'vtkm::cont::StorageTagView<StorageTag>

ArrayPortalZip

Old storage:

vtkm::cont::internal::StorageTagZip<
  vtkm::cont::ArrayHandle<ValueType1, StorageTag1>,
  vtkm::cont::ArrayHandle<ValueType2, StorageTag2>>

New storage:

vtkm::cont::StorageTagZip<StorageTag1, StorageTag2>

ReadPortal().Get(idx)

Calling ReadPortal() in a tight loop is an antipattern. A call to ReadPortal() causes the array to be copied back to the control environment, and hence code like

for (vtkm::Id i = 0; i < array.GetNumberOfValues(); ++i) {
    vtkm::FloatDefault x = array.ReadPortal().Get(i);
}

is a quadratic-scaling loop.

We have remove almost all internal uses of the ReadPortal().Get antipattern, with the exception of 4 API calls into which the pattern is baked in: CellSetExplicit::GetCellShape, CellSetPermutation::GetNumberOfPointsInCell, CellSetPermutation::GetCellShape, and CellSetPermutation::GetCellPointIds. We expect these will need to be deprecated in the future.

Precompiled ArrayCopy for UnknownArrayHandle

Previously, in order to copy an UnknownArrayHandle, you had to specify some subset of types and then specially compile a copy for each potential type. With the new ability to extract a component from an UnknownArrayHandle, it is now feasible to precompile copying an UnknownArrayHandle to another array. This greatly reduces the overhead of using ArrayCopy to copy UnknownArrayHandles while simultaneously increasing the likelihood that the copy will be successful.

Create ArrayHandleOffsetsToNumComponents

ArrayHandleOffsetsToNumComponents is a fancy array that takes an array of offsets and converts it to an array of the number of components for each packed entry.

It is common in VTK-m to pack small vectors of variable sizes into a single contiguous array. For example, cells in an explicit cell set can each have a different amount of vertices (triangles = 3, quads = 4, tetra = 4, hexa = 8, etc.). Generally, to access items in this list, you need an array of components in each entry and the offset for each entry. However, if you have just the array of offsets in sorted order, you can easily derive the number of components for each entry by subtracting adjacent entries. This works best if the offsets array has a size that is one more than the number of packed vectors with the first entry set to 0 and the last entry set to the total size of the packed array (the offset to the end).

When packing data of this nature, it is common to start with an array that is the number of components. You can convert that to an offsets array using the vtkm::cont::ConvertNumComponentsToOffsets function. This will create an offsets array with one extra entry as previously described. You can then throw out the original number of components array and use the offsets with ArrayHandleOffsetsToNumComponents to represent both the offsets and num components while storing only one array.

This replaces the use of ArrayHandleDecorator in CellSetExplicit. The two implementation should do the same thing, but the new ArrayHandleOffsetsToNumComponents should be less complex for compilers.

Recombine extracted component arrays from unknown arrays

Building on the recent capability to extract component arrays from unknown arrays, there is now also the ability to recombine these extracted arrays to a single ArrayHandle. It might seem counterintuitive to break an ArrayHandle into component arrays and then combine the component arrays back into a single ArrayHandle, but this is a very handy way to run algorithms without knowing the exact ArrayHandle type.

Recall that when extracting a component array from an UnknownArrayHandle you only need to know the base component of the value type of the contained ArrayHandle. That makes extracting a component array independent from either the size of any Vec value type and any storage type.

The added UnknownArrayHandle::ExtractArrayFromComponents method allows you to use the functionality to transform the unknown array handle to a form of ArrayHandle that depends only on this base component type. This method internally uses a new ArrayHandleRecombineVec class, but this class is mostly intended for internal use by this class.

As an added convenience, UnknownArrayHandle now also provides the CastAndCallWithExtractedArray method. This method works like other CastAndCalls except that it uses the ExtractArrayFromComponents feature to allow you to handle most ArrayHandle types with few template instances.

UnknownArrayHandle and UncertainArrayHandle for runtime-determined types

Two new classes have been added to VTK-m: UnknownArrayHandle and UncertainArrayHandle. These classes serve the same purpose as the set of VariantArrayHandle classes and will replace them.

Motivated mostly by the desire to move away from ArrayHandleVirtual, we have multiple reasons to completely refactor the VariantArrayHandle class. These include changing the implementation, some behavior, and even the name.

Motivation

We have several reasons that have accumulated to revisit the implementation of VariantArrayHandle.

Move away from ArrayHandleVirtual

The current implementation of VariantArrayHandle internally stores the array wrapped in an ArrayHandleVirtual. That makes sense since you might as well consolidate the hierarchy of virtual objects into one.

Except ArrayHandleVirtual is being deprecated, so it no longer makes sense to use that internally.

So we will transition the class back to managing the data as typeless on its own. We will consider using function pointers rather than actual virtual functions because compilers can be slow in creating lots of virtual subclasses.

Reintroduce storage tag lists

The original implementation of VariantArrayHandle (which at the time was called DynamicArrayHandle) actually had two type lists: one for the array value type and one for the storage type. The storage type list was removed soon after ArrayHandleVirtual was introduced because whatever the type of array it could be access as ArrayHandleVirtual.

However, with ArrayHandleVirtual being deprecated, this feature is no longer possible. We are in need again for the list of storage types to try. Thus, we need to reintroduce this template argument to VariantArrayHandle.

More clear name

The name of this class has always been unsatisfactory. The first name, DynamicArrayHandle, makes it sound like the data is always changing. The second name, VariantArrayHandle, makes it sound like an array that holds a value type that can vary (like an std::variant).

We can use a more clear name that expresses better that it is holding an ArrayHandle of an unknown type.

Take advantage of default types for less templating

Once upon a time everything in VTK-m was templated header library. Things have changed quite a bit since then. The most recent development is the ability to select the "default types" with CMake configuration that allows you to select a global set of types you care about during compilation. This is so units like filters can be compiled into a library with all types we care about, and we don't have to constantly recompile units.

This means that we are becoming less concerned about maintaining type lists everywhere. Often we can drop the type list and pass data across libraries.

With that in mind, it makes less sense for VariantArrayHandle to actually be a using alias for VariantArrayHandleBase<VTKM_DEFAULT_TYPE_LIST>.

In response, we can revert the is-a relationship between the two. Have a completely typeless version as the base class and have a second version templated version to express when the type of the array has been partially narrowed down to given type lists.

New Name and Structure

The ultimate purpose of this class is to store an ArrayHandle where the value and storage types are unknown. Thus, an appropriate name for the class is UnknownArrayHandle.

UnknownArrayHandle is not templated. It simply stores an ArrayHandle in a typeless (void *) buffer. It does, however, contain many templated methods that allow you to query whether the contained array matches given types, to cast to given types, and to cast and call to a given functor (from either given type lists or default lists).

Rather than have a virtual class structure to manage the typeless array, the new management will use function pointers. This has shown to sometimes improve compile times and generate less code.

Sometimes it is the case that the set of potential types can be narrowed. In this case, the array ceases to be unknown and becomes uncertain. Thus, the companion class to UnknownArrayHandle is UncertainArrayHandle.

UncertainArrayHandle has two template parameters: a list of potential value types and a list of potential storage types. The behavior of UncertainArrayHandle matches that of UnknownArrayHandle (and might inherit from it). However, for CastAndCall operations, it will use the type lists defined in its template parameters.

Serializing UnknownArrayHandle

Because UnknownArrayHandle is not templated, it contains some opportunities to compile things into the vtkm_cont library. Templated methods like CastAndCall cannot be, but the specializations of DIY's serialize can be.

And since it only has to be compiled once into a library, we can spend some extra time compiling for more types. We don't have to restrict ourselves to VTKM_DEFAULT_TYPE_LIST. We can compile for vtkm::TypeListTagAll.

Support ArrayHandleSOA as a "default" array

Many programs, particularly simulations, store fields of vectors in separate arrays for each component. This maps to the storage of ArrayHandleSOA. The VTK-m code tends to prefer the AOS storage (which is what is implemented in ArrayHandleBasic, and the behavior of which is inherited from VTK). VTK-m should better support adding ArrayHandleSOA as one of the types.

We now have a set of default types for Ascent that uses SOA as one of the basic types.

Part of this change includes an intentional feature regression of ArrayHandleSOA to only support value types of Vec. Previously, scalar types were supported. However, the behavior of ArrayHandleSOA is exactly the same as ArrayHandleBasic, except a lot more template code has to be generated. That itself is not a huge deal, but because you have 2 types that essentially do the same thing, a lot of template code in VTK-m would unwind to create two separate code paths that do the same thing with the same data. To avoid creating those code paths, we simply make any use of ArrayHandleSOA without a Vec value invalid. This will prevent VTK-m from creating those code paths.

Removed old ArrayHandle transfer mechanism

Deleted the default implementation of ArrayTransfer. ArrayTransfer is used with the old ArrayHandle style to move data between host and device. The new version of ArrayHandle does not use ArrayTransfer at all because this functionality is wrapped in Buffer (where it can exist in a precompiled library).

Once all the old ArrayHandle classes are gone, this class will be removed completely. Although all the remaining ArrayHandle classes provide their own versions of ArrayTransfer, they still need the prototype to be defined to specialize. Thus, the guts of the default ArrayTransfer are removed and replaced with a compile error if you try to compile it.

Also removed ArrayManagerExecution. This class was used indirectly by the old ArrayHandle, through ArrayHandleTransfer, to move data to and from a device. This functionality has been replaced in the new ArrayHandles through the Buffer class (which can be compiled into libraries rather than make every translation unit compile their own template).

Order asynchronous ArrayHandle access

The recent feature of tokens that scope access to ArrayHandles allows multiple threads to use the same ArrayHandles without read/write hazards. The intent is twofold. First, it allows two separate threads in the control environment to independently schedule tasks. Second, it allows us to move toward scheduling worklets and other algorithms asynchronously.

However, there was a flaw with the original implementation. Once requests to an ArrayHandle get queued up, they are resolved in arbitrary order. This might mean that things run in surprising and incorrect order.

Problematic use case

To demonstrate the flaw in the original implementation, let us consider a future scenario where when you invoke a worklet (on OpenMP or TBB), the call to invoke returns immediately and the actual work is scheduled asynchronously. Now let us say we have a sequence of 3 worklets we wish to run: Worklet1, Worklet2, and Worklet3. One of Worklet1's parameters is a FieldOut that creates an intermediate ArrayHandle that we will simply call array. Worklet2 is given array as a FieldInOut to modify its values. Finally, Worklet3 is given array as a FieldIn. It is clear that for the computation to be correct, the three worklets must execute in the correct order of Worklet1, Worklet2, and Worklet3.

The problem is that if Worklet2 and Worklet3 are both scheduled before Worklet1 finishes, the order they are executed could be arbitrary. Let us say that Worklet1 is invoked, and the invoke call returns before the execution of Worklet1 finishes.

The calling code immediately invokes Worklet2. Because array is already locked by Worklet1, Worklet2 does not execute right away. Instead, it waits on a condition variable of array until it is free. But even though the scheduling of Worklet2 is blocked, the invoke returns because we are scheduling asynchronously.

Likewise, the calling code then immediately calls invoke for Worklet3. Worklet3 similarly waits on the condition variable of array until it is free.

Let us assume the likely event that both Worklet2 and Worklet3 get scheduled before Worklet1 finishes. When Worklet1 then later does finish, it's token relinquishes the lock on array, which wakes up the threads waiting for access to array. However, there is no imposed order on in what order the waiting threads will acquire the lock and run. (At least, I'm not aware of anything imposing an order.) Thus, it is quite possible that Worklet3 will wake up first. It will see that array is no longer locked (because Worklet1 has released it and Worklet2 has not had a chance to claim it).

Oops. Now Worklet3 is operating on array before Worklet2 has had a chance to put the correct values in it. The results will be wrong.

Queuing requests

What we want is to impose the restriction that locks to an ArrayHandle get resolved in the order that they are requested. In the previous example, we have 3 requests on an array that happen in a known order. We want control given to them in the same order.

To implement this, we need to impose another restriction on the condition_variable when waiting to read or write. We want the lock to go to the thread that first started waiting. To do this, we added an internal queue of Tokens to the ArrayHandle.

In ArrayHandle::WaitToRead and ArrayHandle::WaitToWrite, it first adds its Token to the back of the queue before waiting on the condition variable. In the CanRead and CanWrite methods, it checks this queue to see if the provided Token is at the front. If not, then the lock is denied and the thread must continue to wait.

Early enqueuing

Another issue that can happen in the previous example is that as threads are spawned for the 3 different worklets, they may actually start running in an unexpected order. So the thread running Worklet3 might actually start before the other 2 and place itself in the queue first.

The solution is to add a method to ArrayHandle called Enqueue. This method takes a Token object and adds that Token to the queue. However, regardless of where the Token ends up on the queue, the method immediately returns. It does not attempt to lock the ArrayHandle.

So now we can ensure that Worklet1 properly locks array with this sequence of events. First, the main thread calls array.Enqueue. Then a thread is spawned to call PrepareForOutput.

Even if control returns to the calling code and it calls invoke for Worklet2 before this spawned thread starts, Worklet2 cannot start first. When PrepareForInput is called on array, it is queued after the Token for Worklet1, even if Worklet1 has not started waiting on the array.

Improvements to moving data into ArrayHandle

We have made several improvements to adding data into an ArrayHandle.

Moving data from an std::vector

For numerous reasons, it is convenient to define data in a std::vector and then wrap that into an ArrayHandle. There are two obvious ways to do this. First, you could deep copy the data into an ArrayHandle, which has obvious drawbacks. Second, you could take the pointer for the data in the std::vector and use that as user-allocated memory in the ArrayHandle without deep copying it. The problem with this shallow copy is that it is unsafe. If the std::vector goes out of scope (or gets resized), then the data the ArrayHandle is pointing to becomes unallocated, which will lead to unpredictable behavior.

However, there is a third option. It is often the case that an std::vector is filled and then becomes unused once it is converted to an ArrayHandle. In this case, what we really want is to pass the data off to the ArrayHandle so that the ArrayHandle is now managing the data and not the std::vector.

C++11 has a mechanism to do this: move semantics. You can now pass variables to functions as an "rvalue" (right-hand value). When something is passed as an rvalue, it can pull state out of that variable and move it somewhere else. std::vector implements this movement so that an rvalue can be moved to another std::vector without actually copying the data. make_ArrayHandle now also takes advantage of this feature to move rvalue std::vectors.

There is a special form of make_ArrayHandle named make_ArrayHandleMove that takes an rvalue. There is also a special overload of make_ArrayHandle itself that handles an rvalue vector. (However, using the explicit move version is better if you want to make sure the data is actually moved.)

So if you create the std::vector in the call to make_ArrayHandle, then the data only gets created once.

auto array = vtkm::cont::make_ArrayHandleMove(std::vector<vtkm::Id>{ 2, 6, 1, 7, 4, 3, 9 });

Note that there is now a better way to express an initializer list to ArrayHandle documented below. But this form of ArrayHandleMove can be particularly useful for initializing an array to all of a particular value. For example, an easy way to initialize an array of 1000 elements all to 1 is

auto array = vtkm::cont::make_ArrayHandleMove(std::vector<vtkm::Id>(1000, 1));

You can also move the data from an already created std::vector by using the std::move function to convert it to an rvalue. When you do this, the std::vector becomes invalid after the call and any use will be undefined.

std::vector<vtkm::Id> vector;
// fill vector

auto array = vtkm::cont::make_ArrayHandleMove(std::move(vector));

Make ArrayHandle from initalizer list

A common use case for using std::vector (particularly in our unit tests) is to quickly add an initalizer list into an ArrayHandle. Repeating the example from above:

auto array = vtkm::cont::make_ArrayHandleMove(std::vector<vtkm::Id>{ 2, 6, 1, 7, 4, 3, 9 });

However, creating the std::vector should be unnecessary. Why not be able to create the ArrayHandle directly from an initializer list? Now you can by simply passing an initializer list to make_ArrayHandle.

auto array = vtkm::cont::make_ArrayHandle({ 2, 6, 1, 7, 4, 3, 9 });

There is an issue here. The type here can be a little ambiguous (for humans). In this case, array will be of type vtkm::cont::ArrayHandleBasic<int>, since that is what an integer literal defaults to. This could be a problem if, for example, you want to use array as an array of vtkm::Id, which could be of type vtkm::Int64. This is easily remedied by specifying the desired value type as a template argument to make_ArrayHandle.

auto array = vtkm::cont::make_ArrayHandle<vtkm::Id>({ 2, 6, 1, 7, 4, 3, 9 });

Deprecated make_ArrayHandle with default shallow copy

For historical reasons, passing an std::vector or a pointer to make_ArrayHandle does a shallow copy (i.e. CopyFlag defaults to Off). Although more efficient, this mode is inherintly unsafe, and making it the default is asking for trouble.

To combat this, calling make_ArrayHandle without a copy flag is deprecated. In this way, if you wish to do the faster but more unsafe creation of an ArrayHandle you should explicitly express that.

This requried quite a few changes through the VTK-m source (particularly in the tests).

Similar changes to Field

vtkm::cont::Field has a make_Field helper function that is similar to make_ArrayHandle. It also features the ability to create fields from std::vectors and C arrays. It also likewise had the same unsafe behavior by default of not copying from the source of the arrays.

That behavior has similarly been depreciated. You now have to specify a copy flag.

The ability to construct a Field from an initializer list of values has also been added.

Deprecate ArrayHandleVirtualCoordinates

As we port VTK-m to more types of accelerator architectures, supporting virtual methods is becoming more problematic. Thus, we are working to back out of using virtual methods in the execution environment.

One of the most widespread users of virtual methods in the execution environment is ArrayHandleVirtual. As a first step of deprecating this class, we first deprecate the ArrayHandleVirtualCoordinates subclass.

Not surprisingly, ArrayHandleVirtualCoordinates is used directly by CoordinateSystem. The biggest change necessary was that the GetData method returned an ArrayHandleVirtualCoordinates, which obviously would not work if that class is deprecated.

An oddness about this return type is that it is quite different from the superclass's method of the same name. Rather, Field returns a VariantArrayHandle. Since this had to be corrected anyway, it was decided to change CoordinateSystem's GetData to also return a VariantArrayHandle, although its typelist is set to just vtkm::Vec3f.

To try to still support old code that expects the deprecated behavior of returning an ArrayHandleVirtualCoordinates, CoordinateSystem::GetData actually returns a "hidden" subclass of VariantArrayHandle that automatically converts itself to an ArrayHandleVirtualCoordinates. (A deprecation warning is given if this is done.)

This approach to support deprecated code is not perfect. The returned value for CoordinateSystem::GetData can only be used as an ArrayHandle if a method is directly called on it or if it is cast specifically to ArrayHandleVirtualCoordinates or its superclass. For example, if passing it to a method argument typed as vtkm::cont::ArrayHandle<T, S> where T and S are template parameters, then the conversion will fail.

To continue to support ease of use, CoordinateSystem now has a method named GetDataAsMultiplexer that returns the data as an ArrayHandleMultiplexer. This can be employed to quickly use the CoordinateSystem as an array without the overhead of a CastAndCall.

ArrayHandleDecorator Allocate and Shrink Support

ArrayHandleDecorator can now be resized when given an appropriate decorator implementation.

Since the mapping between the size of an ArrayHandleDecorator and its source ArrayHandles is not well defined, resize operations (such as Shrink and Allocate) are not defined by default, and will throw an exception if called.

However, by implementing the methods AllocateSourceArrays and/or ShrinkSourceArrays on the implementation class, resizing the decorator is allowed. These methods are passed in a new size along with each of the ArrayHandleDecorator's source arrays, allowing developers to control how the resize operation should affect the source arrays.

For example, the following decorator implementation can be used to create a resizable ArrayHandleDecorator that is implemented using two arrays, which are combined to produce values via the expression:

[decorator value i] = [source1 value i] * 10 + [source2 value i]

Implementation:

  template <typename ValueType>
  struct DecompositionDecorImpl
  {
    template <typename Portal1T, typename Portal2T>
    struct Functor
    {
      Portal1T Portal1;
      Portal2T Portal2;

      VTKM_EXEC_CONT
      ValueType operator()(vtkm::Id idx) const
      {
        return static_cast<ValueType>(this->Portal1.Get(idx) * 10 + this->Portal2.Get(idx));
      }
    };

    template <typename Portal1T, typename Portal2T>
    struct InverseFunctor
    {
      Portal1T Portal1;
      Portal2T Portal2;

      VTKM_EXEC_CONT
      void operator()(vtkm::Id idx, const ValueType& val) const
      {
        this->Portal1.Set(idx, static_cast<ValueType>(std::floor(val / 10)));
        this->Portal2.Set(idx, static_cast<ValueType>(std::fmod(val, 10)));
      }
    };

    template <typename Portal1T, typename Portal2T>
    VTKM_CONT Functor<typename std::decay<Portal1T>::type, typename std::decay<Portal2T>::type>
    CreateFunctor(Portal1T&& p1, Portal2T&& p2) const
    {
      return { std::forward<Portal1T>(p1), std::forward<Portal2T>(p2) };
    }

    template <typename Portal1T, typename Portal2T>
    VTKM_CONT InverseFunctor<typename std::decay<Portal1T>::type, typename std::decay<Portal2T>::type>
    CreateInverseFunctor(Portal1T&& p1, Portal2T&& p2) const
    {
      return { std::forward<Portal1T>(p1), std::forward<Portal2T>(p2) };
    }

    // Resize methods:
    template <typename Array1T, typename Array2T>
    VTKM_CONT
    void AllocateSourceArrays(vtkm::Id numVals, Array1T&& array1, Array2T&& array2) const
    {
      array1.Allocate(numVals);
      array2.Allocate(numVals);
    }

    template <typename Array1T, typename Array2T>
    VTKM_CONT
    void ShrinkSourceArrays(vtkm::Id numVals, Array1T&& array1, Array2T&& array2) const
    {
      array1.Shrink(numVals);
      array2.Shrink(numVals);
    }
  };

  // Usage:
  vtkm::cont::ArrayHandle<ValueType> a1;
  vtkm::cont::ArrayHandle<ValueType> a2;
  auto decor = vtkm::cont::make_ArrayHandleDecorator(0, DecompositionDecorImpl<ValueType>{}, a1, a2);
  
  decor.Allocate(5);
  {
    auto decorPortal = decor.GetPortalControl();
    decorPortal.Set(0, 13);
    decorPortal.Set(1, 8);
    decorPortal.Set(2, 43);
    decorPortal.Set(3, 92);
    decorPortal.Set(4, 117);
  }

  // a1:    {   1,   0,   4,   9,   11 }
  // a2:    {   3,   8,   3,   2,    7 }
  // decor: {  13,   8,  43,  92,  117 }
 
  decor.Shrink(3);

  // a1:    {   1,   0,   4 }
  // a2:    {   3,   8,   3 }
  // decor: {  13,   8,  43 }

Portals may advertise custom iterators

The ArrayPortalToIterator utilities are used to produce STL-style iterators from vtk-m's ArrayHandle portals. By default, a facade class is constructed around the portal API, adapting it to an iterator interface.

However, some portals use iterators internally, or may be able to construct a lightweight iterator easily. For these, it is preferable to directly use the specialized iterators instead of going through the generic facade. A portal may now declare the following optional API to advertise that it has custom iterators:

struct MyPortal
{
  using IteratorType = ...; // alias to the portal's specialized iterator type
  IteratorType GetIteratorBegin(); // Return the begin iterator
  IteratorType GetIteratorEnd(); // Return the end iterator

  // ...rest of ArrayPortal API...
};

If these members are present, ArrayPortalToIterators will forward the portal's specialized iterators instead of constructing a facade. This works when using the ArrayPortalToIterators class directly, and also with the ArrayPortalToIteratorBegin and ArrayPortalToIteratorEnd convenience functions.

Redesign of ArrayHandle to access data using typeless buffers

The original implementation of ArrayHandle is meant to be very generic. To define an ArrayHandle, you actually create a Storage class that maintains the data and provides portals to access it (on the host). Because the Storage can provide any type of data structure it wants, you also need to define an ArrayTransfer that describes how to move the ArrayHandle to and from a device. It also has to be repeated for every translation unit that uses them.

This is a very powerful mechanism. However, one of the major problems with this approach is that every ArrayHandle type needs to have a separate compile path for every value type crossed with every device. Because of this limitation, the ArrayHandle for the basic storage has a special implementation that manages the actual data allocation and movement as void * arrays. In this way all the data management can be compiled once and put into the vtkm_cont library. This has dramatically improved the VTK-m compile time.

This new design replicates the basic ArrayHandle's success to all other storage types. The basic idea is to make the implementation of ArrayHandle storage slightly less generic. Instead of requiring it to manage the data it stores, it instead just builds ArrayPortals from void pointers that it is given. The management of void pointers can be done in non-templated classes that are compiled into a library.

This initial implementation does not convert all ArrayHandles to avoid making non-backward compatible changes before the next minor revision of VTK-m. In particular, it would be particularly difficult to convert ArrayHandleVirtual. It could be done, but it would be a lot of work for a class that will likely be removed.

Buffer

Key to these changes is the introduction of a vtkm::cont::internal::Buffer object. As the name implies, the Buffer object manages a single block of bytes. Buffer is agnostic to the type of data being stored. It only knows the length of the buffer in bytes. It is responsible for allocating space on the host and any devices as necessary and for transferring data among them. (Since Buffer knows nothing about the type of data, a precondition of VTK-m would be that the host and all devices have to have the same endian.)

The idea of the Buffer object is similar in nature to the existing vtkm::cont::internal::ExecutionArrayInterfaceBasicBase except that it will manage a buffer of data among the control and all devices rather than in one device through a templated subclass.

As explained below, ArrayHandle holds some fixed number of Buffer objects. (The number can be zero for implicit ArrayHandles.) Because all the interaction with the devices happen through Buffer, it will no longer be necessary to compile any reference to ArrayHandle for devices (e.g. you won't have to use nvcc just because the code links ArrayHandle.h).

Storage

The vtkm::cont::internal::Storage class changes dramatically. Although an instance will be kept, the intention is for Storage itself to be a stateless object. It will manage its data through Buffer objects provided from the ArrayHandle.

That said, it is possible for Storage to have some state. For example, the Storage for ArrayHandleImplicit must hold on to the instance of the portal used to manage the state.

ArrayTransport

The vtkm::cont::internal::ArrayTransfer class will be removed completely. All data transfers will be handled internally with the Buffer object

Portals

A big change for this design is that the type of a portal for an ArrayHandle will be the same for all devices and the host. Thus, we no longer need specialized versions of portals for each device. We only have one portal type. And since they are constructed from void * pointers, one method can create them all.

Advantages

The ArrayHandle interface should not change significantly for external uses, but this redesign offers several advantages.

Faster Compiles

Because the memory management is contained in a non-templated Buffer class, it can be compiled once in a library and used by all template instances of ArrayHandle. It should have similar compile advantages to our current specialization of the basic ArrayHandle, but applied to all types of ArrayHandles.

Fewer Templates

Hand-in-hand with faster compiles, the new design should require fewer templates and template instances. We have immediately gotten rid of ArrayTransport. Storage is also much shorter. Because all ArrayPortals are the same for every device and the host, we need many fewer versions of those classes. In the device adapter, we can probably collapse the three ArrayManagerExecution classes into a single, much simpler class that does simple memory allocation and copy.

Fewer files need to be compiled for CUDA

Including ArrayHandle.h no longer adds code that compiles for a device. Thus, we should no longer need to compile for a specific device adapter just because we access an ArrayHandle. This should make it much easier to achieve our goal of a "firewall". That is, code that just calls VTK-m filters does not need to support all its compilers and flags.

Simpler ArrayHandle specialization

The newer code should simplify the implementation of special ArrayHandles a bit. You need only implement an ArrayPortal that operates on one or more void * arrays and a simple Storage class.

Out of band memory sharing

With the current version of ArrayHandle, if you want to take data from one ArrayHandle you pretty much have to create a special template to wrap another ArrayHandle around that. With this new design, it is possible to take data from one ArrayHandle and give it to another ArrayHandle of a completely different type. You can't do this willy-nilly since different ArrayHandle types will interpret buffers differently. But there can be some special important use cases.

One such case could be an ArrayHandle that provides strided access to a buffer. (Let's call it ArrayHandleStride.) The idea is that it interprets the buffer as an array for a particular type (like a basic ArrayHandle) but also defines a stride, skip, and repeat so that given an index it looks up the value ((index / skip) % repeat) * stride. The point is that it can take an AoS array of tuples and represent an array of one of the components.

The point would be that if you had a VariantArrayHandle or Field, you could pull out an array of one of the components as an ArrayHandleStride. An ArrayHandleStride<vtkm::Float32> could be used to represent that data that comes from any basic ArrayHandle with vtkm::Float32 or a vtkm::Vec of that type. It could also represent data from an ArrayHandleCartesianProduct and ArrayHandleSoA. We could even represent an ArrayHandleUniformPointCoordinates by just making a small array. This allows us to statically access a whole bunch of potential array storage classes with a single type.

Potentially faster device transfers

There is currently a fast-path for basic ArrayHandles that does a block cuda memcpy between host and device. But for other ArrayHandles that do not defer their ArrayTransfer to a sub-array, the transfer first has to copy the data into a known buffer.

Because this new design stores all data in Buffer objects, any of these can be easily and efficiently copied between devices.

Disadvantages

This new design gives up some features of the original ArrayHandle design.

Can only interface data that can be represented in a fixed number of buffers

Because the original ArrayHandle design required the Storage to completely manage the data, it could represent it in any way possible. In this redesign, the data need to be stored in some fixed number of memory buffers.

This is a pretty open requirement. I suspect most data formats will be storable in this. The user's guide has an example of data stored in a std::deque that will not be representable. But that is probably not a particularly practical example.

VTK-m would only be able to support hosts and devices with the same endian

Because data are transferred as void * blocks of memory, there is no way to correct words if the endian on the two devices does not agree. As far as I know, there should be no issues with the proposed ECP machines.

If endian becomes an issue, it might be possible to specify a word length in the Buffer. That would assume that all numbers stored in the Buffer have the same word length.

ArrayPortals must be completely recompiled in each translation unit

We can declare that an ArrayHandle does not need to include the device adapter header files in part because it no longer needs specialized ArrayPortals for each device. However, that means that a translation unit compiled with the host compiler (say gcc) will produce different code for the ArrayPortals than those with the device compiler (say nvcc). This could lead to numerous linking problems.

To get around these issues, we will probably have to enforce no exporting of any of the ArrayPotal symbols and force them all to be recompiled for each translation unit. This will serve to increase the compile times a bit. We will probably also still encounter linking errors as there would be no way to enforce this requirement.

Cannot have specialized portals for the control environment

Because the new design unifies ArrayPortal types across control and execution environments, it is no longer possible to have a special version for the control environment to manage resources. This will require removing some recent behavior of control portals such as with MR !1988.

ArrayRangeCompute works on any array type without compiling device code

Originally, ArrayRangeCompute required you to know specifically the ArrayHandle type (value type and storage type) and to compile using any device compiler. The method is changed to include only overloads that have precompiled versions of ArrayRangeCompute.

Additionally, an ArrayRangeCompute overload that takes an UnknownArrayHandle has been added. In addition to allowing you to compute the range of arrays of unknown types, this implementation of ArrayRangeCompute serves as a fallback for ArrayHandle types that are not otherwise explicitly supported.

If you really want to make sure that you compute the range directly on an ArrayHandle of a particular type, you can include ArrayRangeComputeTemplate.h, which contains a templated overload of ArrayRangeCompute that directly computes the range of an ArrayHandle. Including this header requires compiling for device code.

Implemented ArrayHandleRandomUniformBits and ArrayHandleRandomUniformReal

ArrayHandleRandomUniformBits and ArrayHandleRandomUniformReal were added to provide an efficient way to generate pseudo random numbers in parallel. They are based on the Philox parallel pseudo random number generator. ArrayHandleRandomUniformBits provides 64-bits random bits in the whole range of UInt64 as its content while ArrayHandleRandomUniformReal provides random Float64 in the range of [0, 1). User can either provide a seed in the form of Vec<vtkm::Uint32, 1> or use the default random source provided by the C++ standard library. Both of the ArrayHandles are lazy evaluated as other Fancy ArrayHandles such that they only have O(1) memory overhead. They are stateless and functional and does not change once constructed. To generate a new set of random numbers, for example as part of a iterative algorithm, a new ArrayHandle needs to be constructed in each iteration. See the user's guide for more detail and examples.

Extract component arrays from unknown arrays

One of the problems with the data structures of VTK-m is that non-templated classes like DataSet, Field, and UnknownArrayHandle (formally VariantArrayHandle) internally hold an ArrayHandle of a particular type that has to be cast to the correct task before it can be reasonably used. That in turn is problematic because the list of possible ArrayHandle types is very long.

At one time we were trying to compensate for this by using ArrayHandleVirtual. However, for technical reasons this class is infeasible for every use case of VTK-m and has been deprecated. Also, this was only a partial solution since using it still required different code paths for, say, handling values of vtkm::Float32 and vtkm::Vec3f_32 even though both are essentially arrays of 32-bit floats.

The extract component feature compensates for this problem by allowing you to extract the components from an ArrayHandle. This feature allows you to create a single code path to handle ArrayHandles containing scalars or vectors of any size. Furthermore, when you extract a component from an array, the storage gets normalized so that one code path covers all storage types.

ArrayExtractComponent

The basic enabling feature is a new function named ArrayExtractComponent. This function takes takes an ArrayHandle and an index to a component. It then returns an ArrayHandleStride holding the selected component of each entry in the original array.

We will get to the structure of ArrayHandleStride later. But the important part is that ArrayHandleStride does not depend on the storage type of the original ArrayHandle. That means whether you extract a component from ArrayHandleBasic, ArrayHandleSOA, ArrayHandleCartesianProduct, or any other type, you get back the same ArrayHandleStride. Likewise, regardless of whether the input ArrayHandle has a ValueType of FloatDefault, Vec2f, Vec3f, or any other Vec of a default float, you get the same ArrayHandleStride. Thus, you can see how this feature can dramatically reduce code paths if used correctly.

It should be noted that ArrayExtractComponent will (logically) flatten the ValueType before extracting the component. Thus, nested Vecs such as Vec<Vec3f, 3> will be treated as a Vec<FloatDefault, 9>. The intention is so that the extracted component will always be a basic C type. For the purposes of this document when we refer to the "component type", we really mean the base component type.

Different ArrayHandle implementations provide their own implementations for ArrayExtractComponent so that the component can be extracted without deep copying all the data. We will visit how ArrayHandleStride can represent different data layouts later, but first let's go into the main use case.

Extract components from UnknownArrayHandle

The principle use case for ArrayExtractComponent is to get an ArrayHandle from an unknown array handle without iterating over every possible type. (Rather, we iterate over a smaller set of types.) To facilitate this, an ExtractComponent method has been added to UnknownArrayHandle.

To use UnknownArrayHandle::ExtractComponent, you must give it the component type. You can check for the correct component type by using the IsBaseComponentType method. The method will then return an ArrayHandleStride for the component type specified.

Example

As an example, let's say you have a worklet, FooWorklet, that does some per component operation on an array. Furthermore, let's say that you want to implement a function that, to the best of your ability, can apply FooWorklet on an array of any type. This function should be pre-compiled into a library so it doesn't have to be compiled over and over again. (MapFieldPermutation and MapFieldMergeAverage are real and important examples that have this behavior.)

Without the extract component feature, the implementation might look something like this (many practical details left out):

struct ApplyFooFunctor
{
  template <typename ArrayType>
  void operator()(const ArrayType& input, vtkm::cont::UnknownArrayHandle& output) const
  {
    ArrayType outputArray;
	vtkm::cont::Invoke invoke;
	invoke(FooWorklet{}, input, outputArray);
	output = outputArray;
  }
};

vtkm::cont::UnknownArrayHandle ApplyFoo(const vtkm::cont::UnknownArrayHandle& input)
{
  vtkm::cont::UnknownArrayHandle output;
  input.CastAndCallForTypes<vtkm::TypeListAll, VTKM_DEFAULT_STORAGE_LIST_TAG>(
    ApplyFooFunctor{}, output);
  return output;
}

Take a look specifically at the CastAndCallForTypes call near the bottom of this example. It calls for all types in vtkm::TypeListAll, which is about 40 instances. Then, it needs to be called for any type in the desired storage list. This could include basic arrays, SOA arrays, and lots of other specialized types. It would be expected for this code to generate over 100 paths for ApplyFooFunctor. This in turn contains a worklet invoke, which is not a small amount of code.

Now consider how we can use the ExtractComponent feature to reduce the code paths:

struct ApplyFooFunctor
{
  template <typename T>
  void operator()(T,
                  const vtkm::cont::UnknownArrayHandle& input, 
				  cont vtkm::cont::UnknownArrayHandle& output) const
  {
    if (!input.IsBasicComponentType<T>()) { return; }
	VTKM_ASSERT(output.IsBasicComponentType<T>());

	vtkm::cont::Invoke invoke;
	invoke(FooWorklet{}, input.ExtractComponent<T>(), output.ExtractComponent<T>());
  }
};

vtkm::cont::UnknownArrayHandle ApplyFoo(const vtkm::cont::UnknownArrayHandle& input)
{
  vtkm::cont::UnknownArrayHandle output = input.NewInstanceBasic();
  output.Allocate(input.GetNumberOfValues());
  vtkm::cont::ListForEach(ApplyFooFunctor{}, vtkm::TypeListScalarAll{}, input, output);
  return output;
}

The number of lines of code is about the same, but take a look at the ListForEach (which replaces the CastAndCallForTypes). This calling code takes TypeListScalarAll instead of TypeListAll, which reduces the instances created from around 40 to 13 (every basic C type). It is also no longer dependent on the storage, so these 13 instances are it. As an example of potential compile savings, changing the implementation of the MapFieldMergePermutation and MapFieldMergeAverage functions in this way reduced the filters_common library (on Mac, Debug build) by 24 MB (over a third of the total size).

Another great advantage of this approach is that even though it takes less time to compile and generates less code, it actually covers more cases. Have an array containg values of Vec<short, 13>? No problem. The values were actually stored in an ArrayHandleReverse? It will still work.

ArrayHandleStride

This functionality is made possible with the new ArrayHandleStride. This array behaves much like ArrayHandleBasic, except that it contains an offset parameter to specify where in the buffer array to start reading and a stride parameter to specify how many entries to skip for each successive entry. ArrayHandleStride also has optional parameters divisor and modulo that allow indices to be repeated at regular intervals.

Here are how ArrayHandleStride extracts components from several common arrays. For each of these examples, we assume that the ValueType of the array is Vec<T, N>. They are each extracting component.

Extracting from ArrayHandleBasic

When extracting from an ArrayHandleBasic, we just need to start at the proper component and skip the length of the Vec.

  • offset: component
  • stride: N

Extracting from ArrayHandleSOA

Since each component is held in a separate array, they are densly packed. Each component could be represented by ArrayHandleBasic, but of course we use ArrayHandleStride to keep the type consistent.

  • offset: 0
  • stride: 1

Extracting from ArrayHandleCartesianProduct

This array is the basic reason for implementing the divisor and modulo parameters. Each of the 3 components have different parameters, which are the following (given that dims[3] captures the size of the 3 arrays for each dimension).

  • offset: 0
  • stride: 1
  • case component == 0
    • divisor: ignored
    • modulo: dims[0]
  • case component == 1
    • divisor: dims[0]
    • modulo: dims[1]
  • case component == 2
    • divisor: dims[0]
    • modulo: ignored

Extracting from ArrayHandleUniformPointCoordinates

This array cannot be represented directly because it is fully implicit. However, it can be trivially converted to ArrayHandleCartesianProduct in typically very little memory. (In fact, EAVL always represented uniform point coordinates by explicitly storing a Cartesian product.) Thus, for very little overhead the ArrayHandleStride can be created.

Runtime overhead of extracting components

These benefits come at a cost, but not a large one. The "biggest" cost is the small cost of computing index arithmetic for each access into ArrayHandleStride. To make this as efficient as possible, there are conditions that skip over the modulo and divide steps if they are not necessary. (Integer modulo and divide tend to take much longer than addition and multiplication.) It is for this reason that we probably do not want to use this method all the time.

Another cost is the fact that not every ArrayHandle can be represented by ArrayHandleStride directly without copying. If you ask to extract a component that cannot be directly represented, it will be copied into a basic array, which is not great. To make matters worse, for technical reasons this copy happens on the host rather than the device.

ArrayHandleGroupVecVariable holds now one more offset.

This change affects the usage of both ConvertNumComponentsToOffsets and make_ArrayHandleGroupVecVariable.

The reason of this change is to remove a branch in ArrayHandleGroupVecVariable::Get which is used to avoid an array overflow, this in theory would increases the performance since at the CPU level it will remove penalties due to wrong branch predictions.

The change affects ConvertNumComponentsToOffsets by both:

  1. Increasing the numbers of elements in offsetsArray (its second parameter) by one.

  2. Setting sourceArraySize as the sum of all the elements plus the new one in offsetsArray

Note that not every specialization of ConvertNumComponentsToOffsets does return offsetsArray. Thus, some of them would not be affected.

Similarly, this change affects make_ArrayHandleGroupVecVariable since it expects its second parameter (offsetsArray) to be one element bigger than before.

Control Environment

Algorithms for Control and Execution Environments

The <vtkm/Algorithms.h> header has been added to provide common STL-style generic algorithms that are suitable for use in both the control and execution environments. This is necessary as the STL algorithms in the <algorithm> header are not marked up for use in execution environments such as CUDA.

In addition to the markup, these algorithms have convenience overloads to support ArrayPortals directly, simplifying their usage with VTK-m data structures.

Currently, three related algorithms are provided: LowerBounds, UpperBounds, and BinarySearch. BinarySearch differs from the STL std::binary_search algorithm in that it returns an iterator (or index) to a matching element, rather than just a boolean indicating whether a or not key is present.

The new algorithm signatures are:

namespace vtkm
{

template <typename IterT, typename T, typename Comp>
VTKM_EXEC_CONT 
IterT BinarySearch(IterT first, IterT last, const T& val, Comp comp);

template <typename IterT, typename T>
VTKM_EXEC_CONT 
IterT BinarySearch(IterT first, IterT last, const T& val);

template <typename PortalT, typename T, typename Comp>
VTKM_EXEC_CONT 
vtkm::Id BinarySearch(const PortalT& portal, const T& val, Comp comp);

template <typename PortalT, typename T>
VTKM_EXEC_CONT 
vtkm::Id BinarySearch(const PortalT& portal, const T& val);

template <typename IterT, typename T, typename Comp>
VTKM_EXEC_CONT 
IterT LowerBound(IterT first, IterT last, const T& val, Comp comp);

template <typename IterT, typename T>
VTKM_EXEC_CONT 
IterT LowerBound(IterT first, IterT last, const T& val);

template <typename PortalT, typename T, typename Comp>
VTKM_EXEC_CONT 
vtkm::Id LowerBound(const PortalT& portal, const T& val, Comp comp);

template <typename PortalT, typename T>
VTKM_EXEC_CONT 
vtkm::Id LowerBound(const PortalT& portal, const T& val);

template <typename IterT, typename T, typename Comp>
VTKM_EXEC_CONT 
IterT UpperBound(IterT first, IterT last, const T& val, Comp comp);

template <typename IterT, typename T>
VTKM_EXEC_CONT 
IterT UpperBound(IterT first, IterT last, const T& val);

template <typename PortalT, typename T, typename Comp>
VTKM_EXEC_CONT 
vtkm::Id UpperBound(const PortalT& portal, const T& val, Comp comp);

template <typename PortalT, typename T>
VTKM_EXEC_CONT 
vtkm::Id UpperBound(const PortalT& portal, const T& val);

}

Execution Environment

Scope ExecObjects with Tokens

When VTK-m's ArrayHandle was originally designed, it was assumed that the control environment would run on a single thread. However, multiple users have expressed realistic use cases in which they would like to control VTK-m from multiple threads (for example, to control multiple devices). Consequently, it is important that VTK-m's control classes work correctly when used simultaneously from multiple threads.

The original PrepareFor* methods of ArrayHandle returned an object to be used in the execution environment on a particular device that pointed to data in the array. The pointer to the data was contingent on the state of the ArrayHandle not changing. The assumption was that the calling code would immediately use the returned execution environment object and would not further change the ArrayHandle until done with the execution environment object.

This assumption is broken if multiple threads are running in the control environment. For example, if one thread has called PrepareForInput to get an execution array portal, the portal or its data could become invalid if another thread calls PrepareForOutput on the same array. Initially one would think that a well designed program should not share ArrayHandles in this way, but there are good reasons to need to do so. For example, when using vtkm::cont::PartitionedDataSet where multiple partitions share a coordinate system (very common), it becomes unsafe to work on multiple blocks in parallel on different devices.

What we really want is the code to be able to specify more explicitly when the execution object is in use. Ideally, the execution object itself would maintain the resources it is using. However, that will not work in this case since the object has to pass from control to execution environment and back. The resource allocation will break when the object is passed to an offloaded device and back.

Because we cannot use the object itself to manage its own resources, we use a proxy object we are calling a Token. The Token object manages the scope of the return execution object. As long as the Token is still in scope, the execution object will remain valid. When the Token is destroyed (or DetachFromAll is called on it), then the execution object is no longer protected.

When a Token is attached to an ArrayHandle to protect an execution object, it's read or write mode is recorded. Multiple Tokens can be attached to read the ArrayHandle at the same time. However, only one Token can be used to write to the ArrayHandle.

Basic ArrayHandle use

The basic use of the PrepareFor* methods of ArrayHandle remain the same. The only difference is the addition of a Token parameter.

template <typename Device>
void LowLevelArray(vtkm::cont::ArrayHandle<vtkm::Float32> array, Device)
{
  vtkm::cont::Token token;
  auto portal = array.PrepareForOutput(ARRAY_SIZE, Device{}, token);
  // At this point, array is locked from anyone else from reading or modifying
  vtkm::cont::DeviceAdapterAlgorithm<Device>::Schedule(MyKernel(portal), ARRAY_SIZE);

  // When the function finishes, token goes out of scope and array opens up
  // for other uses.
}

Execution objects

To make sure that execution objects are scoped correctly, many changes needed to be made to propagate a Token reference from the top of the scope to where the execution object is actually made. The most noticeable place for this was for implementations of vtkm::cont::ExecutionObjectBase. Most implementations of ExecutionObjectBase create an object that requires data from an ArrayHandle.

Previously, a subclass of ExecutionObjectBase was expected to have a method named PrepareForExecution that had a single argument: the device tag (or id) to make an object for. Now, subclasses of ExecutionObjectBase should have a PrepareForExecution that takes two arguments: the device and a Token to use for scoping the execution object.

struct MyExecObject : vtkm::cont::ExecutionObjectBase
{
  vtkm::cont::ArrayHandle<vtkm::Float32> Array;
  
  template <typename Device>
  VTKM_CONT
  MyExec<Device> PrepareForExecution(Device device, vtkm::cont::Token& token)
  {
    MyExec<Device> object;
	object.Portal = this->Array.PrepareForInput(device, token);
	return object;
  }
};

It actually still works to use the old style of PrepareForExecution. However, you will get a deprecation warning (on supported compilers) when you try to use it.

Invoke and Dispatcher

The Dispatcher classes now internally define a Token object during the call to Invoke. (Likewise, Invoker will have a Token defined during its invoke.) This internal Token is used when preparing ArrayHandles and ExecutionObjects for the execution environment. (Details in the next section on how that works.)

Because the invoke uses a Token to protect its arguments, it will block the execution of other worklets attempting to access arrays in a way that could cause read-write hazards. In the following example, the second worklet will not be able to execute until the first worklet finishes.

vtkm::cont::Invoker invoke;
invoke(Worklet1{}, input, intermediate);
invoke(Worklet2{}, intermediate, output); // Will not execute until Worklet1 finishes.

That said, invocations can share arrays if their use will not cause read-write hazards. In particular, two invocations can both use the same array if they are both strictly reading from it. In the following example, both worklets can potentially execute at the same time.

vtkm::cont::Invoker invoke;
invoke(Worklet1{}, input, output1);
invoke(Worklet2{}, input, output2); // Will not block

The same Token is used for all arguments to the Worklet. This deatil is important to prevent deadlocks if the same object is used in more than one Worklet parameter. As a simple example, if a Worklet has a control signature like

  using ControlSignature = void(FieldIn, FieldOut);

it should continue to work to use the same array as both fields.

vtkm::cont::Invoker invoke;
invoke(Worklet1{}, array, array);

Transport

The dispatch mechanism of worklets internally uses vtkm::cont::arg::Transport objects to automatically move data from the control environment to the execution environment. These Transport object now take a Token when doing the transportation. This all happens under the covers for most users.

Control Portals

The GetPortalConstControl and GetPortalControl methods have been deprecated. Instead, the methods ReadPortal and WritePortal should be used. The calling signature is the same as their predecessors, but the returned portal contains a reference back to the original ArrayHandle. The reference keeps track of whether the memory allocation has changed.

If the ArrayHandle is changed while the ArrayPortal still exists, nothing will happen immediately. However, if the portal is subsequently accessed (i.e. Set or Get is called on it), then a fatal error will be reported to the log.

Deadlocks

Now that portal objects from ArrayHandles have finite scope (as opposed to able to be immediately invalidated), the scopes have the ability to cause operations to block. This can cause issues if the ArrayHandle is attempted to be used by multiple Tokens at once.

The following is a contrived example of causing a deadlock.

vtkm::cont::Token token1;
auto portal1 = array.PrepareForInPlace(Device{}, token1);

vtkm::cont::Token token2;
auto portal2 = array.PrepareForInput(Device{}, token2);

The last line will deadlock as PrepareForInput waits for token1 to detach, which will never happen. To prevent this from happening, if you use the same Token on the array, it will always allow the action. Thus, the following will work fine.

vtkm::cont::Token token;

auto portal1 = array.PrepareForInPlace(Device{}, token);
auto portal2 = array.PrepareForInput(Device{}, token);

This prevents deadlock during the invocation of a worklet (so long as no intermediate object tries to create its own Token, which would be bad practice).

Deadlocks are more likely when actually running multiple threads in the control environment, but still pretty unlikely. One way it can occur is if you have one (or more) worklet that has two output fields. You then try to run the worklet(s) simultaneously on multiple threads. It could be that one thread locks the first output array and the other thread locks the second output array.

However, having multiple threads trying to write to the same output arrays at the same time without its own coordination is probably a bad idea in itself.

Masks and Scatters Supported for 3D Scheduling

Previous to this change worklets that wanted to use non-default vtkm::worklet::Mask or vtkm::worklet::Scatter wouldn't work when scheduled to run across vtkm::cont::CellSetStructured or other InputDomains that supported 3D scheduling.

This restriction was an inadvertent limitation of the VTK-m worklet scheduling algorithm. Lifting the restriction and providing sufficient information has been achieved in a manner that shouldn't degrade performance of any existing worklets.

Virtual methods in execution environment deprecated

The use of classes with any virtual methods in the execution environment is deprecated. Although we had code to correctly build virtual methods on some devices such as CUDA, this feature was not universally supported on all programming models we wish to support. Plus, the implementation of virtual methods is not hugely convenient on CUDA because the virtual methods could not be embedded in a library. To get around virtual methods declared in different libraries, all builds had to be static, and a special linking step to pull in possible virtual method implementations was required.

For these reasons, VTK-m is no longer relying on virtual methods. (Other approaches like multiplexers are used instead.) The code will be officially removed in version 2.0. It is still supported in a deprecated sense (you should get a warning). However, if you want to build without virtual methods, you can set the VTKm_NO_DEPRECATED_VIRTUAL CMake flag, and they will not be compiled.

Deprecate Execute with policy

The version of Filter::Execute that takes a policy as an argument is now deprecated. Filters are now able to specify their own fields and types, which is often why you want to customize the policy for an execution. The other reason is that you are compiling VTK-m into some other source that uses a particular types of storage. However, there is now a mechanism in the CMake configuration to allow you to provide a header that customizes the "default" types used in filters. This is a much more convenient way to compile filters for specific types.

One thing that filters were not able to do was to customize what cell sets they allowed using. This allows filters to self-select what types of cell sets they support (beyond simply just structured or unstructured). To support this, the lists SupportedCellSets, SupportedStructuredCellSets, and SupportedUnstructuredCellSets have been added to Filter. When you apply a policy to a cell set, you now have to also provide the filter.

Worklets and Filters

Enable setting invalid value in probe filter

Initially, the probe filter would simply not set a value if a sample was outside the input DataSet. This is not great as the memory could be left uninitalized and lead to unpredictable results. The testing compared these invalid results to 0, which seemed to work but is probably unstable.

This was partially fixed by a previous change that consolidated to mapping of cell data with a general routine that permuted data. However, the fix did not extend to point data in the input, and it was not possible to specify a particular invalid value.

This change specifically updates the probe filter so that invalid values are set to a user-specified value.

Avoid raising errors when operating on cells

Cell operations like interpolate and finding parametric coordinates can fail under certain conditions. The previous behavior was to call RaiseError on the worklet. By design, this would cause the worklet execution to fail. However, that makes the worklet unstable for a conditin that might be relatively common in data. For example, you wouldn't want a large streamline worklet to fail just because one cell was not found correctly.

To work around this, many of the cell operations in the execution environment have been changed to return an error code rather than raise an error in the worklet.

Error Codes

To support cell operations efficiently returning errors, a new enum named vtkm::ErrorCode is available. This is the current implementation of ErrorCode.

enum class ErrorCode
{
  Success,
  InvalidShapeId,
  InvalidNumberOfPoints,
  WrongShapeIdForTagType,
  InvalidPointId,
  InvalidEdgeId,
  InvalidFaceId,
  SolutionDidNotConverge,
  MatrixFactorizationFailed,
  DegenerateCellDetected,
  MalformedCellDetected,
  OperationOnEmptyCell,
  CellNotFound,

  UnknownError
};

A convenience function named ErrorString is provided to make it easy to convert the ErrorCode to a descriptive string that can be placed in an error.

New Calling Specification

Previously, most execution environment functions took as an argument the worklet calling the function. This made it possible to call RaiseError on the worklet. The result of the operation was typically returned. For example, here is how the old version of interpolate was called.

FieldType interpolatedValue =
  vtkm::exec::CellInterpolate(fieldValues, pcoord, shape, worklet);

The worklet is now no longer passed to the function. It is no longer needed because an error is never directly raised. Instead, an ErrorCode is returned from the function. Because the ErrorCode is returned, the computed result of the function is returned by passing in a reference to a variable. This is usually placed as the last argument (where the worklet used to be). here is the new version of how interpolate is called.

FieldType interpolatedValue;
vtkm::ErrorCode result =
  vtkm::exec::CellInterpolate(fieldValues, pcoord, shape, interpolatedValue);

The success of the operation can be determined by checking that the returned ErrorCode is equal to vtkm::ErrorCode::Success.

Add atomic free functions

Previously, all atomic functions were stored in classes named AtomicInterfaceControl and AtomicInterfaceExecution, which required you to know at compile time which device was using the methods. That in turn means that anything using an atomic needed to be templated on the device it is running on.

That can be a big hassle (and is problematic for some code structure). Instead, these methods are moved to free functions in the vtkm namespace. These functions operate like those in Math.h. Using compiler directives, an appropriate version of the function is compiled for the current device the compiler is using.

Flying Edges

Added the flying edges contouring algorithm to VTK-m. This algorithm only works on structured grids, but operates much faster than the traditional Marching Cubes algorithm.

The speed of VTK-m's flying edges is comprable to VTK's running on the same CPUs. VTK-m's implementation also works well on CUDA hardware.

The Flying Edges algorithm was introduced in this paper:

Schroeder, W.; Maynard, R. & Geveci, B. "Flying edges: A high-performance scalable isocontouring algorithm." Large Data Analysis and Visualization (LDAV), 2015. DOI 10.1109/LDAV.2015.7348069

Filters specify their own field types

Previously, the policy specified which field types the filter should operate on. The filter could remove some types, but it was not able to add any types.

This is backward. Instead, the filter should specify what types its supports and the policy may cull out some of those.

Build

Disable asserts for CUDA architecture builds

assert is supported on recent CUDA cards, but compiling it appears to be very slow. By default, the VTKM_ASSERT macro has been disabled whenever compiling for a CUDA device (i.e. when __CUDA_ARCH__ is defined).

Asserts for CUDA devices can be turned back on by turning the VTKm_NO_ASSERT_CUDA CMake variable off. Turning this CMake variable off will enable assertions in CUDA kernels unless there is another reason turning off all asserts (such as a release build).

Disable asserts for HIP architecture builds

assert is supported on recent HIP cards, but compiling it is very slow, as it triggers the usage of printf which. Currently (ROCm 3.7) printf has a severe performance penalty and should be avoided when possible. By default, the VTKM_ASSERT macro has been disabled whenever compiling for a HIP device via kokkos.

Asserts for HIP devices can be turned back on by turning the VTKm_NO_ASSERT_HIP CMake variable off. Turning this CMake variable off will enable assertions in HIP kernels unless there is another reason turning off all asserts (such as a release build).

Add VTKM_DEPRECATED macro

The VTKM_DEPRECATED macro allows us to remove (and usually replace) features from VTK-m in minor releases while still following the conventions of semantic versioning. The idea is that when we want to remove or replace a feature, we first mark the old feature as deprecated. The old feature will continue to work, but compilers that support it will start to issue a warning that the use is deprecated and should stop being used. The deprecated features should remain viable until at least the next major version. At the next major version, deprecated features from the previous version may be removed.

Declaring things deprecated

Classes and methods are marked deprecated using the VTKM_DEPRECATED macro. The first argument of VTKM_DEPRECATED should be set to the first version in which the feature is deprecated. For example, if the last released version of VTK-m was 1.5, and on the master branch a developer wants to deprecate a class foo, then the VTKM_DEPRECATED release version should be given as 1.6, which will be the next minor release of VTK-m. The second argument of VTKM_DEPRECATED, which is optional but highly encouraged, is a short message that should clue developers on how to update their code to the new changes. For example, it could point to the replacement class or method for the changed feature.

VTKM_DEPRECATED can be used to deprecate a class by adding it between the struct or class keyword and the class name.

struct VTKM_DEPRECATED(1.6, "OldClass replaced with NewClass.") OldClass
{
};

Aliases can similarly be depreciated, except the VTKM_DEPRECATED macro goes after the name in this case.

using OldAlias VTKM_DEPRECATED(1.6, "Use NewClass instead.") = NewClass;

Functions and methods are marked as deprecated by adding VTKM_DEPRECATED as a modifier before the return value and any markup (VTKM_CONT, VTKM_EXEC, or VTKM_EXEC_CONT).

VTKM_DEPRECATED(1.6, "You must now specify a tolerance.") void ImportantMethod(double x)
VTKM_EXEC_CONT
{
  this->ImportantMethod(x, 1e-6);
}

enums can be deprecated like classes using similar syntax.

enum struct VTKM_DEPRECATED(1.7, "Use NewEnum instead.") OldEnum
{
  OLD_VALUE
};

Individual items in an enum can also be marked as deprecated and intermixed with regular items.

enum struct NewEnum
{
  OLD_VALUE1 VTKM_DEPRECATED(1.7, "Use NEW_VALUE instead."),
  NEW_VALUE,
  OLD_VALUE2 VTKM_DEPRECATED(1.7) = 42
};

Using deprecated items

Using deprecated items should work, but the compiler will give a warning. That is the point. However, sometimes you need to legitimately use a deprecated item without a warning. This is usually because you are implementing another deprecated item or because you have a test for a deprecated item (that can be easily removed with the deprecated bit). To support this a pair of macros, VTKM_DEPRECATED_SUPPRESS_BEGIN and VTKM_DEPRECATED_SUPPRESS_END are provided. Code that legitimately uses deprecated items should be wrapped in these macros.

VTKM_DEPRECATED(1.6, "You must now specify both a value and tolerance.")
VTKM_EXEC_CONT
void ImportantMethod()
{
  // It can be the case that to implement a deprecated method you need to
  // use other deprecated features. To do that, just temporarily suppress
  // those warnings.
  VTKM_DEPRECATED_SUPPRESS_BEGIN
  this->ImportantMethod(0.0);
  VTKM_DEPRECATED_SUPPRESS_END
}

Other

Porting layer for future std features

Currently, VTK-m is using C++11. However, it is often useful to use features in the std namespace that are defined for C++14 or later. We can provide our own versions (sometimes), but it is preferable to use the version provided by the compiler if available.

There were already some examples of defining portable versions of C++14 and C++17 classes in a vtkmstd namespace, but these were sprinkled around the source code.

There is now a top level vtkmstd directory and in it are header files that provide portable versions of these future C++ classes. In each case, preprocessor macros are used to select which version of the class to use.

Removed OpenGL Rendering Classes

When the rendering library was first built, OpenGL was used to implement the components (windows, mappers, annotation, etc.). However, as the native ray casting became viable, the majority of the work has focused on using that. Since then, the original OpenGL classes have been largely ignored.

It has for many months been determined that it is not work attempting to maintain two different versions of the rendering libraries as features are added and changed. Thus, the OpenGL classes have fallen out of date and did not actually work.

These classes have finally been officially removed.

Reorganization of io directory

The vtkm/io directory has been flattened. Namely, the files in vtkm/io/reader and vtkm/io/writer have been moved up into vtkm/io, with the associated changes in namespaces.

In addition, vtkm/cont/EncodePNG.h and vtkm/cont/DecodePNG.h have been moved to a more natural home in vtkm/io.

Implemented PNG/PPM image Readers/Writers

The original implementation of writing image data was only performed as a proxy through the Canvas rendering class. In order to implement true support for image-based regression testing, this interface needed to be expanded upon to support reading/writing arbitrary image data and storing it in a vtkm::DataSet. Using the new vtkm::io::PNGReader and vtkm::io::PPMReader it is possible to read data from files and Cavases directly and store them as a point field in a 2D uniform vtkm::DataSet

auto reader = vtkm::io::PNGReader();
auto imageDataSet = reader.ReadFromFile("read_image.png");

Similarly, the new vtkm::io::PNGWriter and vtkm::io::PPMWriter make it possible to write out a 2D uniform vtkm::DataSet directly to a file.

auto writer = vtkm::io::PNGWriter();
writer.WriteToFile("write_image.png", imageDataSet);

If canvas data is to be written out, the reader provides a method for converting a canvas's data to a vtkm::DataSet.

auto reader = vtkm::io::PNGReader();
auto dataSet = reader.CreateImageDataSet(canvas);
auto writer = vtkm::io::PNGWriter();
writer.WriteToFile("output.png", dataSet);

Updated Benchmark Framework

The benchmarking framework has been updated to use Google Benchmark.

A benchmark is now a single function, which is passed to a macro:

void MyBenchmark(::benchmark::State& state)
{
  MyClass someClass;

  // Optional: Add a descriptive label with additional benchmark details:
  state.SetLabel("Blah blah blah.");

  // Must use a vtkm timer to properly capture eg. CUDA execution times.
  vtkm::cont::Timer timer;
  for (auto _ : state)
  {
    someClass.Reset();

    timer.Start();
    someClass.DoWork();
    timer.Stop();

    state.SetIterationTime(timer.GetElapsedTime());
  }

  // Optional: Report items and/or bytes processed per iteration in output:
  state.SetItemsProcessed(state.iterations() * someClass.GetNumberOfItems());
  state.SetBytesProcessed(state.iterations() * someClass.GetNumberOfBytes());
}
}
VTKM_BENCHMARK(MyBenchmark);

Google benchmark also makes it easy to implement parameter sweep benchmarks:

void MyParameterSweep(::benchmark::State& state)
{
  // The current value in the sweep:
  const vtkm::Id currentValue = state.range(0);

  MyClass someClass;
  someClass.SetSomeParameter(currentValue);

  vtkm::cont::Timer timer;
  for (auto _ : state)
  {
    someClass.Reset();

    timer.Start();
    someClass.DoWork();
    timer.Stop();

    state.SetIterationTime(timer.GetElapsedTime());
  }
}
VTKM_BENCHMARK_OPTS(MyBenchmark, ->ArgName("Param")->Range(32, 1024 * 1024));

will generate and launch several benchmarks, exploring the parameter space of SetSomeParameter between the values of 32 and (1024*1024). The chain of functions calls in the second argument is applied to an instance of ::benchmark::internal::Benchmark. See Google Benchmark's documentation for more details.

For more complex benchmark configurations, the VTKM_BENCHMARK_APPLY macro accepts a function with the signature void Func(::benchmark::internal::Benchmark*) that may be used to generate more complex configurations.

To instantiate a templated benchmark across a list of types, the VTKM_BENCHMARK_TEMPLATE* macros take a vtkm::List of types as an additional parameter. The templated benchmark function will be instantiated and called for each type in the list:

template <typename T>
void MyBenchmark(::benchmark::State& state)
{
  MyClass<T> someClass;

  // Must use a vtkm timer to properly capture eg. CUDA execution times.
  vtkm::cont::Timer timer;
  for (auto _ : state)
  {
    someClass.Reset();

    timer.Start();
    someClass.DoWork();
    timer.Stop();

    state.SetIterationTime(timer.GetElapsedTime());
  }
}
}
VTKM_BENCHMARK_TEMPLATE(MyBenchmark, vtkm::List<vtkm::Float32, vtkm::Vec3f_32>);

The benchmarks are executed by calling the VTKM_EXECUTE_BENCHMARKS(argc, argv) macro from main. There is also a VTKM_EXECUTE_BENCHMARKS_PREAMBLE(argc, argv, some_string) macro that appends the contents of some_string to the Google Benchmark preamble.

If a benchmark is not compatible with some configuration, it may call state.SkipWithError("Error message"); on the ::benchmark::State object and return. This is useful, for instance in the filter tests when the input is not compatible with the filter.

When launching a benchmark executable, the following options are supported by Google Benchmark:

  • --benchmark_list_tests: List all available tests.
  • --benchmark_filter="[regex]": Only run benchmark with names that match [regex].
  • --benchmark_filter="-[regex]": Only run benchmark with names that DON'T match [regex].
  • --benchmark_min_time=[float]: Make sure each benchmark repetition gathers [float] seconds of data.
  • --benchmark_repetitions=[int]: Run each benchmark [int] times and report aggregate statistics (mean, stdev, etc). A "repetition" refers to a single execution of the benchmark function, not an "iteration", which is a loop of the for(auto _:state){...} section.
  • --benchmark_report_aggregates_only="true|false": If true, only the aggregate statistics are reported (affects both console and file output). Requires --benchmark_repetitions to be useful.
  • --benchmark_display_aggregates_only="true|false": If true, only the aggregate statistics are printed to the terminal. Any file output will still contain all repetition info.
  • --benchmark_format="console|json|csv": Specify terminal output format: human readable (console) or csv/json formats.
  • --benchmark_out_format="console|json|csv": Specify file output format: human readable (console) or csv/json formats.
  • --benchmark_out=[filename]: Specify output file.
  • --benchmark_color="true|false": Toggle color output in terminal when using console output.
  • --benchmark_counters_tabular="true|false": Print counter information (e.g. bytes/sec, items/sec) in the table, rather than appending them as a label.

For more information and examples of practical usage, take a look at the existing benchmarks in vtk-m/benchmarking/.

Provide scripts to build Gitlab-ci workers locally

To simplify reproducing docker based CI workers locally, VTK-m has python program that handles all the work automatically for you.

The program is located in [Utilities/CI/reproduce_ci_env.py ] and requires python3 and pyyaml.

To use the program is really easy! The following two commands will create the build:rhel8 gitlab-ci worker as a docker image and setup a container just as how gitlab-ci would be before the actual compilation of VTK-m. Instead of doing the compilation, instead you will be given an interactive shell.

./reproduce_ci_env.py create rhel8
./reproduce_ci_env.py run rhel8

To compile VTK-m from the the interactive shell you would do the following:

> src]## cd build/
> build]## cmake --build .

Replaced vtkm::ListTag with vtkm::List

The original vtkm::ListTag was designed when we had to support compilers that did not provide C++11's variadic templates. Thus, the design hides type lists, which were complicated to support.

Now that we support C++11, variadic templates are trivial and we can easily create templated type aliases with using. Thus, it is now simpler to deal with a template that lists types directly.

Hence, vtkm::ListTag is deprecated and vtkm::List is now supported. The main difference between the two is that whereas vtkm::ListTag allowed you to create a list by subclassing another list, vtkm::List cannot be subclassed. (Well, it can be subclassed, but the subclass ceases to be considered a list.) Thus, where before you would declare a list like

struct MyList : vtkm::ListTagBase<Type1, Type2, Type3>
{
};

you now make an alias

using MyList = vtkm::List<Type1, Type2, Type3>;

If the compiler reports the MyList type in an error or warning, it actually uses the fully qualified vtkm::List<Type1, Type2, Type3>. Although this makes errors more verbose, it makes it easier to diagnose problems because the types are explicitly listed.

The new vtkm::List comes with a list of utility templates to manipulate lists that mostly mirrors those in vtkm::ListTag: VTKM_IS_LIST, ListApply, ListSize, ListAt, ListIndexOf, ListHas, ListAppend, ListIntersect, ListTransform, ListRemoveIf, and ListCross. All of these utilities become vtkm::List<> types (where applicable), which makes them more consistent than the old vtkm::ListTag versions.

Thus, if you have a declaration like

vtkm::ListAppend(vtkm::List<Type1a, Type2a>, vtkm::List<Type1b, Type2b>>

this gets changed automatically to

vtkm::List<Type1a, Type2a, Type1b, Type2b>

This is in contrast to the equivalent old version, which would create a new type for vtkm::ListTagAppend in addition to the ultimate actual list it constructs.

Add ListTagRemoveIf

It is sometimes useful to remove types from ListTags. This is especially the case when combining lists of types together where some of the type combinations may be invalid and should be removed. To handle this situation, a new ListTag type is added: ListTagRemoveIf.

ListTagRemoveIf is a template structure that takes two arguments. The first argument is another ListTag type to operate on. The second argument is a template that acts as a predicate. The predicate takes a type and declares a Boolean value that should be true if the type should be removed and false if the type should remain.

Here is an example of using ListTagRemoveIf to get basic types that hold only integral values.

template <typename T>
using IsRealValue =
  std::is_same<
    typename vtkm::TypeTraits<typename vtkm::VecTraits<T>::BaseComponentType>::NumericTag,
    vtkm::TypeTraitsRealTag>;

using MixedTypes =
  vtkm::ListTagBase<vtkm::Id, vtkm::FloatDefault, vtkm::Id3, vtkm::Vec3f>;

using IntegralTypes = vtkm::ListTagRemoveIf<MixedTypes, IsRealValue>;
// IntegralTypes now equivalent to vtkm::ListTagBase<vtkm::Id, vtkm::Id3>

Write uniform and rectilinear grids to legacy VTK files

As a programming convenience, all vtkm::cont::DataSet written by vtkm::io::VTKDataSetWriter were written as a structured grid. Although technically correct, it changed the structure of the data. This meant that if you wanted to capture data to run elsewhere, it would run as a different data type. This was particularly frustrating if the data of that structure was causing problems and you wanted to debug it.

Now, VTKDataSetWriter checks the type of the CoordinateSystem to determine whether the data should be written out as STRUCTURED_POINTS (i.e. a uniform grid), RECTILINEAR_GRID, or STRUCTURED_GRID (curvilinear).

References

Feature Merge Request
Add Kokkos backend Merge-request: !2164
Extract component arrays from unknown arrays Merge-request: !2354
ArrayHandleGroupVecVariable holds now one more offset. Merge-request: !1964
Create ArrayHandleOffsetsToNumComponents Merge-request: !2299
Implemented ArrayHandleRandomUniformBits and ArrayHandleRandomUniformReal Merge-request: !2116
ArrayRangeCompute works on any array type without compiling device code Merge-request: !2409
Algorithms for Control and Execution Environments Merge-request: !1920
Redesign of ArrayHandle to access data using typeless buffers Merge-request: !2347
vtkm::cont::internal::Buffer now can have ownership transferred Merge-request: !2200
Provide scripts to build Gitlab-ci workers locally Merge-request: !2030
Configurable default types Merge-request: !1997
Result DataSet of coordinate transform has its CoordinateSystem changed Merge-request: !2099
Precompiled ArrayCopy for UnknownArrayHandle Merge-request: !2396
Disable asserts for CUDA architecture builds Merge-request: !2157
Portals may advertise custom iterators Merge-request: !1929
DataSet now only allows unique field names Merge-request: !2099
ArrayHandleDecorator Allocate and Shrink Support Merge-request: !1933
Deprecate ArrayHandleVirtualCoordinates Merge-request: !2177
Deprecate DataSetFieldAdd Merge-request: !2106
Deprecate Execute with policy Merge-request: !2093
Virtual methods in execution environment deprecated Merge-request: !2256
Add VTKM_DEPRECATED macro Merge-request: !2266
Filters specify their own field types Merge-request: !2099
Flying Edges Merge-request: !2099
Updated Benchmark Framework Merge-request: !1936
Disable asserts for HIP architecture builds Merge-request: !2270
Implemented PNG/PPM image Readers/Writers Merge-request: !1967
Reorganization of io directory Merge-request: !2067
Add ListTagRemoveIf Merge-request: !1901
Masks and Scatters Supported for 3D Scheduling Merge-request: !1975
Improvements to moving data into ArrayHandle Merge-request: !2184
Avoid raising errors when operating on cells Merge-request: !2099
Order asynchronous ArrayHandle access Merge-request: !2130
Enable setting invalid value in probe filter Merge-request: !2122
ReadPortal().Get(idx) Merge-request: !2078
Recombine extracted component arrays from unknown arrays Merge-request: !2381
Removed old ArrayHandle transfer mechanism Merge-request: !2347
Removed OpenGL Rendering Classes Merge-request: !2099
Scope ExecObjects with Tokens Merge-request: !1988
Shorter fancy array handle classnames Merge-request: !1937
Support ArrayHandleSOA as a "default" array Merge-request: !2349
Porting layer for future std features Merge-request: !1977
Add a vtkm::Tuple class Merge-request: !1977
UnknownArrayHandle and UncertainArrayHandle for runtime-determined types Merge-request: !2202
Added VecFlat class Merge-request: !2354
Remove VTKDataSetWriter::WriteDataSet just_points parameter Merge-request: !2185
Move VTK file readers and writers into vtkm_io Merge-request: !2100
Write uniform and rectilinear grids to legacy VTK files Merge-request: !2173
Add atomic free functions Merge-request: !2223
Replaced vtkm::ListTag with vtkm::List Merge-request: !1918