The previous implementation of the `DotProduct` filter only worked with
arrays containing "common" types. The filter has been updated to use the
extract component feature of `UnknownArrayHandle` to support computing
the dot product of arrays of almost any type.
These methods somewhat simplify doing the `CastAndCall` from an
`UnknownArrayhandle` that comes from an input field.
The motivation for this change is for the User's guide, where I am
having a chicken-and-egg problem of wanting to describe how to make a
simple filter implementation without having to go into details about
`UnknownArrayHandle`.
The `ArrayCopy` method has been changed to be precompiled. It handles
most standard array types. But there are some special `ArrayHandle`
types that are not correctly handled, and these go to a slow fallback.
Find places in the code that use that fallback and fix them.
There are also some instances of replacing an `ArrayHandleCounting` with
an `ArrayHandleIndex`. This change is probably not strictly necessary to
make the `ArrayCopy` faster, but when it can be used `ArrayHandleIndex`
is generally better.
Many arrays decorate other arrays but still allow an efficient component
extraction. However, the component can only be extracted if it can be
efficiently extracted from the array being decorated. In this case, the
array reported that it could efficiently extract even though it could
not.
Fixed this by having the `ArrayExtractComponentImpl` classes inherit
from the respective superclass. This will in turn inhert the
`ArrayExtractComponentImplInefficient` if it is the base class.
There are some common uses of `ArrayCopy` that don't use basic arrays.
Rather than move away from `ArrayCopy` for these use cases, compile a
special fast path for these cases.
This required some restructuring of the code to get the template
resolution to work correctly.
Rather than require `ArrayCopy` to create special versions of copy for
all arrays, use a precompiled versions. This should speed up compiles,
reduce the amount of code being generated, and require the device
compiler on fewer source files.
There are some cases where you still need to copy arrays that are not
well supported by the precompiled versions in `ArrayCopy`. (It will
always work, but the fallback is very slow.) In this case, you will want
to switch over to `ArrayCopyDevice`, which has the old behavior.
There was a precompiled version of mapping permutations in the
filter library. However, there are reasons to be able to copy
a permutation of an array outside of filters. So add the
capability to the more general filter library.