ThreadIndicies constructor was templated on the invocation type, which created
thousand's of versions of that symbol which all had the same behavior. So now
remove that and move that logic into a Worklet function since it requires
the invocation info.
The two worklets for marching cubes use tables stored in arrays that
have random access. Previously, they arrays were passed using the
ExecObject tag in ControlSignature along with ExecutionWholeArrayConst.
This changes to using a WholeArrayIn tag and just passing the
ArrayHandle directly to the Invoke method. The end result is the same,
but the code is a bit cleaner.
The WholeArrayIn, WholeArrayInOut, and WholeArrayOut ControlSignature
tags behave similarly to using an ExecObject tag with an
ExecutionWholeArray or ExecutionWholeArrayConst object. However, the
WholeArray* tags can simplify some implementations in two ways. First,
it allows you to specify more precisely what data is passed in. You have
to pass in an ArrayHandle or else an error will occur (as opposed to be
able to pass in any type of execution object). Second, this allows you
to easily pass in arrays stored in DynamicArrayHandle objects. The
Invoke mechanism will automatically find the appropriate static class.
This cannot be done easily with ExecutionWholeArray.
The previous implementation was declaring static arrays in methods,
which cannot be used on a CUDA device. Instead, make static tables that
can be passed to the device with array handles (much like clip tables
do).
The tetrahedralize algorithms have been changed to use the Scatter
classes to build indices rather than build them on their own.
To implement this efficiently with structured grids, a new ScatterUniform
class was made. I also added a new execution argument tag that allows
you to get the thread indices object from within the worklet.
This changes the interface to the ThreadIndices classes to have both
input and output indices. It also adds a visit index to ThreadIndices.
Also added the VisitIndex execution signature tag, which relies on this
behavior.
Previously we always ran DynamicTransformCont even if we knew all the types.
By checking for Dynamic types first, we save roughly 3% on the binary size.
This also is a good starting point for a redesign of DynamicTransformCont
DispatcherMapField was templated on the device adapter but it
actually doesn't need to be, only BasicInvoke and subsequent
methods need to be templated on the device.
This now allows for even more efficient construction of uniform point
coordinates when running under the 3d scheduler, since we don't need to go
from 3d index to flat index to 3d index, instead we stay in 3d index
Previously, all Fetch objects received an Invocation object in their
Load and Store methods. The point of this was that it allowed the Fetch
to get data from any of the execution objects. However, every Fetch
either just got data directly from its associated execution object or
else used a secondary execution object (the input domain) to get indices
into their own execution object.
This left two potential areas for improvement. First, pulling data out
of the Invocation object was unnecessarily complicated. It would be much
nicer to get data directly from the associated execution object. Second,
when getting index information from the input domain, it was often the
case that extra computations were necessary (particularly on structured
cell sets). There was no way to share the index information among
Fetches, and therefore the computations were replicated.
This change removes the Invocation from the Fetch Load and Store.
Instead, it passes the associated execution object and a new object type
called the ThreadIndices. The ThreadIndices are customized for the input
domain and therefore have all the information needed for a redirected
lookup. It is also a thread-local object so it can cache computed
indices and save on computation time.
Xcode 7 warnings
The XCode 7 compiler has a new warning for unused typedefs. The Boost code we use has some instances where this warning gets issued. Suppress these warnings.
See merge request !199
This is to be used in place of BOOST_STATIC_ASSERT so that we can
control its implementation.
The implementation is designed to fix the issue where the latest XCode
clang compiler gives a warning about a unused typedefs when the boost
static assert is used within a function. (This warning also happens when
using the C++11 static_assert keyword.) You can suppress this warning
with _Pragma commands, but _Pragma commands inside a block is not
supported in GCC. The implementation of VTKM_STATIC_ASSERT handles all
current cases.
The boost assert macros seem to have an issue where they define an
unused typedef. This is causing the XCode 7 compiler to issue a warning.
Since the offending code is in a macro, the warning is identified with
the VTK-m header even though the code is in boost. To get around this,
wrap all uses of the boost assert that is causing the warning in the
third party pre/post macros to disable the warning.
The names of the cell shapes and header files changed superficially. The
check in missed the clip and external faces. Update that code.
There are also some less superficial changes that allow you to manage
cell shapes with tags. It might be worthwhile to update this code to use
that new infrastructure.
On one of my compile platforms, GCC was giving conversion warnings from
any boost include that was not wrapped in pragmas to disable conversion
warnings. To make things easier and more robust, I created a pair of
macros, VTKM_BOOST_PRE_INCLUDE and VTKM_BOOST_POST_INCLUDE, that should
be wrapped around any #include of a boost header file.
Using enable_if/disable_if as a return type has a negative impact on
binary size compared to use a boost::true/false_type as a method parameter.
For comparison the WorkletTests_TBB sees a 6-7% reduction in binary size when
compiled with O3.
Origin WorkletTests_TBB size details:
__TEXT __DATA __OBJC others dec hex
2363392 49152 0 4297793536 4300206080 1004ff000
Updated WorkletTests_TBB size details:
__TEXT __DATA __OBJC others dec hex
2215936 49152 0 4297568256 4299833344 1004a4000
The ExecObject tag for the ControlSignature was not declared right so
would cause a compile error if it was ever used. Clearly this was not
being tested properly, so the dispatcher base unit test now passes an
ExecObject parameter.
Previously, DispatcherBase had an ivar to determine whether to use the
numInstances passed on the stack or to use a 3D range held in a
different ivar. This change allows either a 1D range or 3D range to be
passed on the stack, which I expect to be closer to how we we handle
this when 3D ranges are fully supported.
This also fixes a bug introduced with commit
fdac208acbfa47b613d899a36cefc32a01e8f0a8 where the Use3DSchedule ivar
was not set correctly in UnitTestDispatcherBase.
The functors in the ForEach, StaticTransform, and DynamicTransform
methods sometimes can use the index of the parameter that they are
operating on. This can be a helpful diagnostic in compile and run-time
errors. It is also helpful when linking parameters from one
FunctionInterface with those of another.
This new features are now replacing implementations using the Zip
functionality that was removed earlier. The implementation is actually
simplified a bit.
It's easy to put accidently put something that is not a valid tag in a
ControlSignature or ExecutionSignature. Previously, when you did that
you got a weird error at the end of a very long template instantiation
chain that made it difficult to find the offending worklet.
This adds some type checks when the dispatcher is instantated to check
the signatures. It doesn't point directly to the signature or its
parameter, but it is much closer.
Instead of just checking that a dispatcher's Invoke input is an
ArrayHandle, also check that the ValueType of the ArrayHandle is
compatible with the types of the worklet operator. This is done by
adding a template argument to the ControlSignature tags that is a type
list tag that gets passed to the type check.
These changes support the implementation of DispatcherBase. This class
provides the basic functionality for calling an Invoke method in the
control environment, transferring data to the execution environment,
scheduling threads in the execution environment, pulling data for each
calling of the worklet method, and actually calling the worklet.