Previously WorkletInvokeFunctor inherited from vtkm::exec::FunctorBase,
which is also the base class for all users Worklets and for all functors
based to DeviceAdapter::Schedule.
This is done for a few reasons. The first is that we reduce the
minimum size of user worklets. Previously the users worklet would hold
a reference to the error message, and so would the wrapper class added
when calling DeviceAdapter::Schedule. Now we only have the users worklet
holding a reference.
Second, by refactoring to have two base classes we can better improve
the documentation on what responsibilities FunctorBase.h has, compared
to TaskBase.
The current design for ArrayPortalVirtual makes it a requirement for all
array portals (that it wraps) to have Set defined. Thus, make sure Set is
defined for all ArrayPortal. Where Set is invalid, an assert is thrown if
something calls it at runtime.
Previously, ExternalFaces really only supported tetrahedral meshes that
have only triangular faces. These changes support all mixes of cells and
their faces.
Most functions in the CUDA runtime API return an error code that must be
checked to determine whether the operation completed successfully. Most
operations in VTK-m just called the function and assumed it completed
correctly, which could lead to further errors. This change wraps most
CUDA calls in a VTKM_CUDA_CALL macro that checks the error code and
throws an exception if the call fails.
The cell derivative/gradient functions were all designed with scalars in
mind. Although the field type is templated and you could pass in a
vector type for the field, many of these classes would perform the
computation incorrectly. These changes specifically support derivatives
of vector types.
The shape array in CellSetExplicit was changed a while ago to be of type
vtkm::UInt8. However, the shape types remained at vtkm::IdComponent. I
can think of no reason why they should not be the same type.
Change the VTKM_CONT_EXPORT to VTKM_CONT. (Likewise for EXEC and
EXEC_CONT.) Remove the inline from these macros so that they can be
applied to everything, including implementations in a library.
Because inline is not declared in these modifies, you have to add the
keyword to functions and methods where the implementation is not inlined
in the class.
8dadf560 Add in support for vector fields to Gradient worklet.
d53f43fb CellDerivaties now support computing the derivatives of vtkm::Vec fields.
b4378c85 Allow vtkm::Matrix to support T values which are vtkm::Vec.
9caabf97 vtkm::Vec now supports +=, -=, *=, and /=.
Acked-by: Kitware Robot <kwrobot@kitware.com>
Acked-by: Kenneth Moreland <kmorel@sandia.gov>
Merge-request: !608
This usage of raw_referenc_cast returns a const PortalValue which is than
assigned too. So to work around the problem we need to mark operator= on
the class as const.
There were many tests that created code paths for every base and Vec
type that VTK-m supports (up to 4 components). Although this is
admirable, it is also excessive, and our compile times for the tests are
very long.
To shorten compile times, remove the TryAllTypes method. Replace it with
a version of TryTypes that uses a default list of "exemplar" set of
integers, floats, and Vecs.
The UnitTestParametricCoordinates test uses a pseudorandom number
generator to create some random set of parametric coordinates, convert
to world coordinates, and then back to parametric coordinates.
This has been working fine except that the world coordinate to parametric
coordinate conversion is not extremely precise for some cell shapes. (It
would not be cost effective to make it more precise.) Because of this,
the test_equal for this comparison has a pretty high threshold.
While looking at a dashboard I happened across a failure for this test.
It turns out that one of the parametric coordinates created for the
pyramid test for seed 1465529014 was just outside of this threshold, but
otherwise correct. I raised the threshold a little to try to prevent
this error.
ThreadIndicies constructor was templated on the invocation type, which created
thousand's of versions of that symbol which all had the same behavior. So now
remove that and move that logic into a Worklet function since it requires
the invocation info.
These asserts are consolidated into the unified Assert.h. Also made some
minor edits to add asserts where appropriate and a little bit of
reconfiguring as found.
All the other math functions are in the vtkm package. This one was in
vtkm::exec because it uses a callback method. This can be problematic on
CUDA the the declaration of NewtonsMethod does not match the callback
method. However, we now have a VTKM_SUPPRESS_EXEC_WARNINGS macro that
allows a VTKM_EXEC_CONT_EXPORT function (like NewtonsMethod) to call
either a VTKM_EXEC_EXPORT or VTKM_CONT_EXPORT without a warning.
5b705a52 Fixing return value for void function
Acked-by: Kitware Robot <kwrobot@kitware.com>
Acked-by: Robert Maynard <robert.maynard@kitware.com>
Merge-request: !334
As part of the work to reduce the number of copies of array handles the CUDA
backend was broken. The transportation of stack allocated classes to CUDA
relies on all member variables being value based, not references/pointers.
This correct the issue of sending references to host side memory to CUDA, at
the cost of two copies of the Invocation object.
When we move to C++11 we need to revisit this work and see if std::move
can help reduce the cost of these copies.
Scatter in worklets
Add the functionality to perform a scatter operation from input to output in a worklet invocation. This allows you to, for example, specify a variable amount of outputs generated for each input.
See merge request !221
I noticed a failure in a dashboard run of UnitTestParametricCoordinates.
This test uses randomly generated numbers to test the behavior of some
cell shapes, and there was an instance that occured with seed 1447261681
that caused one of the comparisons to be just slightly larger than the
default tolerance but still within reasonable value.
I just increased the tolerance of that particular comparison. Hopefully
this will prevent all future failures.
The original workaround for inclusive_scan bugs in thrust 1.8 only solved the
issue for basic arithmetic types such as int, float, double. Now we go one
step further and fix the problem for all types.
The solution is to provide a proper implementation of destructive_accumulate_n
and make sure it exists before any includes of thrust occur.
Recent changes to algorithm implementations caused CellDerivative to be
called in a way such that it gave conversion warnings on some compilers.
Fix that.
This has been requested on the mailing list to make it easier to
interpolate integer vectors.
There are a couple of downsides to this addition. First, it implicitly
casts doubles back to whatever the vector type is, which can cause a
loss of precision. Second, it makes it more likely to get overload
errors when multiplying with Vec. In particular, the operator to cast
Vec of size 1 to the component class had to be removed.
The tetrahedralize algorithms have been changed to use the Scatter
classes to build indices rather than build them on their own.
To implement this efficiently with structured grids, a new ScatterUniform
class was made. I also added a new execution argument tag that allows
you to get the thread indices object from within the worklet.
Previously, each VecFromPortalPermute (the type that held the from field
values) held its own copy of the indices. For point to cell on
structured grids, this was a lot of repeated data values, which has the
potential to fill up cache and registers. Instead, just use pointer
references.
Previously, there was a declaration ConstArrayPortalFromThrust<const T>
in ArrayManagerExecutionThrustDevice. This proved problematic because
values read from the array in the worklet were typed as const T rather
than simply T. Any Vec or Matrix built from that type would then fail
because they are not meant to work with a const value (which means they
have to be set on construction and never changed.
Instead, declare ConstArrayPortalFromThrust<T> and internally set all
the Thrust pointers to have type const T. Also declare other thrust
pointers used as method parameters to have const T rather than T. This
should work as conversion from T to const T should be fine, but not the
other way around.
This changes the interface to the ThreadIndices classes to have both
input and output indices. It also adds a visit index to ThreadIndices.
Also added the VisitIndex execution signature tag, which relies on this
behavior.
This now allows for even more efficient construction of uniform point
coordinates when running under the 3d scheduler, since we don't need to go
from 3d index to flat index to 3d index, instead we stay in 3d index
Previously, all Fetch objects received an Invocation object in their
Load and Store methods. The point of this was that it allowed the Fetch
to get data from any of the execution objects. However, every Fetch
either just got data directly from its associated execution object or
else used a secondary execution object (the input domain) to get indices
into their own execution object.
This left two potential areas for improvement. First, pulling data out
of the Invocation object was unnecessarily complicated. It would be much
nicer to get data directly from the associated execution object. Second,
when getting index information from the input domain, it was often the
case that extra computations were necessary (particularly on structured
cell sets). There was no way to share the index information among
Fetches, and therefore the computations were replicated.
This change removes the Invocation from the Fetch Load and Store.
Instead, it passes the associated execution object and a new object type
called the ThreadIndices. The ThreadIndices are customized for the input
domain and therefore have all the information needed for a redirected
lookup. It is also a thread-local object so it can cache computed
indices and save on computation time.
When you create a CellSetPermutation you provide an array of the cell ids that
you want to iterate. This allows the user to do custom blanking of a data set,
or to do multi iteration over a set of cells.
Xcode 7 warnings
The XCode 7 compiler has a new warning for unused typedefs. The Boost code we use has some instances where this warning gets issued. Suppress these warnings.
See merge request !199
This is to be used in place of BOOST_STATIC_ASSERT so that we can
control its implementation.
The implementation is designed to fix the issue where the latest XCode
clang compiler gives a warning about a unused typedefs when the boost
static assert is used within a function. (This warning also happens when
using the C++11 static_assert keyword.) You can suppress this warning
with _Pragma commands, but _Pragma commands inside a block is not
supported in GCC. The implementation of VTKM_STATIC_ASSERT handles all
current cases.
fd685210 Always install all device headers even when device isn't enabled.
b1663b24 Add an example of using multiple backends from a single translation unit.
fc0ff69d Methods with try/catch need to be host only.
4d635d64 DeviceAdapter Tags now always exist, and contain if the device is valid.
cf32b430 Teach Configure.h to store if TBB and CUDA are enabled.
Acked-by: Kitware Robot <kwrobot@kitware.com>
Acked-by: Kenneth Moreland <kmorel@sandia.gov>
Merge-request: !198
The boost assert macros seem to have an issue where they define an
unused typedef. This is causing the XCode 7 compiler to issue a warning.
Since the offending code is in a macro, the warning is identified with
the VTK-m header even though the code is in boost. To get around this,
wrap all uses of the boost assert that is causing the warning in the
third party pre/post macros to disable the warning.
When ExecutionWholeArray is passed to a worklet, it is expected to
behave like an array portal. However, it was missing the
GetNumberOfValues method and the ValueType typedef. These are now added.
By introducing our own custom thrust execution policy we can make sure
to hit the fastest code paths in thrust for the sort operation. This makes
sure that for UInt32,Int32, and Float32 we use the radix sort from thrust
which offers a 2x to 3x speed improvement over the merge sort implementation.
Secondly by telling thrust that our BinaryOperators are commutative we
make sure that we get the fastest code paths when executing Inclusive
and Exclusive Scan
Benchmark 'Radix Sort on 1048576 random values vtkm::Int32' results:
median = 0.0117049s
median abs dev = 0.00324614s
mean = 0.0167615s
std dev = 0.00786269s
min = 0.00845875s
max = 0.0389063s
Benchmark 'Radix Sort on 1048576 random values vtkm::Float32' results:
median = 0.0234463s
median abs dev = 0.000317249s
mean = 0.021452s
std dev = 0.00470307s
min = 0.011255s
max = 0.0250643s
Benchmark 'Merge Sort on 1048576 random values vtkm::Int32' results:
median = 0.0310486s
median abs dev = 0.000182129s
mean = 0.0286914s
std dev = 0.00634102s
min = 0.0116225s
max = 0.0317379s
Benchmark 'Merge Sort on 1048576 random values vtkm::Float32' results:
median = 0.0310617s
median abs dev = 0.000193583s
mean = 0.0295779s
std dev = 0.00491531s
min = 0.0147257s
max = 0.032307s
In the special case where you are loading the point coordinates for a
structured grid in a point to cell map (an important use case), create a
VecRectilinearPointCoordinates rather than build a Vec of the values.
This will activate the cell specalizations in previous commits.
These changes also added some flat-to-logical index conversion and vice
versa in ConnectivityStructuredInternals. This change also fixed a bug
in getting cells attached to points in 2D grids. (Actually, technically
someone else fixed it and checked it in first. The changes were merged
during a rebase.)
I also added a specalization to Vec for 1D that implicitly converts
between the 1D Vec and the component. This can be convenient when
templating on the Vec length.
There are several switch statements in ParametricCoordinates.h that have
a default clause that is an error condition. The code was simply calling
the RaiseError method on the worklet before quiting. Although the
behavior is supposed to be similar to throwing an exception, flow
actually does still continue. These error default clauses kept going
without breaking from the switch or initializing the variables the
method is supposed to set. Although the values should not matter once
the error is raised, I was getting compiler warnings about it. It's also
bad practice as certain NaN values can cause the program to signal out,
so it is best to set the data to something.
These cell types are inherited from VTK, but they are basically the same
as quad and hexahedron, respectively. The only useful difference is that
pixel and voxel are supposed to be axis aligned, but you cannot
determine that by the cell shape alone (at least not just from the cell
set).
A big issue with these is that their indexing is different that of quad
and hex. The development team had a long discussion about the benefits
of the alternate indexing, but after consulting with Berk Geveci and
Will Schroder from the VTK team, that indexing is not really taken
advantage of at the cell level. Thus, it is really just a nuisance in
VTK-m.
The original way was overconstrained with respect to the number of
parametric coordinates.
Also implemented WorldCoordinatesToParametricCoordinates for polygons.
The general polygon version is not implemented yet because I need to
change the way it is interpolated.
To get wedge to work correctly, I had to change the interpolation
slightly there. Previously the interpolation had a singularity at
the tip.
The functions for doing interpolations and derivatives were called
CellInterpolate and CellDerivative, but the file names were
Interpolate.h and Derivative. Now the files are CellInterpolate.h and
CellDerivative.h so they are more consistent and a bit easier to find.
The interpolate and derivative functions has the shape tag at end of the
function arguments (just before the "worklet" parameter, which is just
the error handling mechanism). However, the parametric coordinate
functions had the shape tag at the beginning. Moved the shape tag to the
end to be more consistent within these functions and also in other uses
throughtout VTK-m.
Previously, when you requested a CellShape in the ExecutionSignature,
you just got an ID stored in a vtkm::IdComponent. This change returns a
cell shape tag of the appropriate type (or generic if the type is not
known at compile time). This will allow functions called from the
worklet to specialize on the cell type better.
We have been using the term "shape" in the cell set and connectivity
classes. To be consistent, use the term "shape" for the geometric
identify of the cell everywhere.
The constructor for execution whole array was wrong and would result in compilation error if used. This fixes that and adds a test path for testing ExecutionWholeArray with ExecutionWholeArrayConst.
Variable topology fields
Changes to fetching in topology maps that lets you properly deal with cases where you do not know how many values are being fetched at compile time. For example, explicit cell sets can have any number of cell shapes that have different numbers of nodes.
This change should resolve issue #26.
See merge request !128
Workaround thrust 1.8 inclusive scan issue.
Starting in thrust 1.8 the implementation of scan inclusive inside
thrust became highly optimized by using parallel task groups. This
new implementation has a bug that only exists when using custom
binary operators, large size arrays, release mode, and no
debugger or mem-checker attached.
While I have submitted the issue to thrust, we need to be able
to work around the existing issue. The solution I have chosen is
to mark all vtkm::exec::cuda::interal::WrappedBinaryOperators
as being commutative as far as thrust is concerened. To make
sure we don't get any unexpected behavior I have also had
to create WrappedBinaryPredicate so that we don't mark any
predicate as commutative.
See merge request !129
Starting in thrust 1.8 the implementation of scan inclusive inside
thrust became highly optimized by using parallel task groups. This
new implementation has a bug that only exists when using custom
binary operators, large size arrays, release mode, and no
debugger or mem-checker attached.
While I have submitted the issue to thrust, we need to be able
to work around the existing issue. The solution I have chosen is
to mark all vtkm::exec::cuda::interal::WrappedBinaryOperators
as being commutative as far as thrust is concerened. To make
sure we don't get any unexpected behavior I have also had
to create WrappedBinaryPredicate so that we don't mark any
predicate as commutative.
BOOST_MPL_ASSERT is causing warnings in the PGI compiler. Apparently,
when BOOST_MPL_ASSERT succeeds it declares a static object with a unqiue
name scoped to the file. The problem is that the PGI compiler is pretty
picky about things being declared without being used, so it was emitting
useless warnings about successful BOOST_MPL_ASSERTs. However,
BOOST_STATIC_ASSERT does not seem to have this problem, so for the benefit
of PGI change the compile-time assert method.
The PGI compiler is fussy about finding declared variables and methods
that have limited scope and are never used. Thus, it is complaining about
some internal test classes that are properly implementing the ArrayPortal
interface even though not all of it is being accessed.
To get around the problem, put them in a non-anonymous namespace with a
name unlikely to conflict with anything. The compiler will recognize that
it is possible to access these classes outside the scope of the file and
shut up about items not being used.
This class was used to store a group of from field data in a topology
map. However, the fetching has been changed to use a customized class
for each type of fetch that can be optimized for the fetch type and does
not require to know the number of items in the fetch at compile time.
Thus, this class is no longer needed, so it is being removed.
This change removes the requirement to specify some maximum cell length
in each of the worklets, which is basically impossible. It also makes
some of the loading more lazy, which might help reduce the number of
registers required in a worklet.
Previously when you fetched the indices from an explicit cell set, you
would get back a Vec of a fixed length an expected to use a subset of
it. Now you get back a Vec-like object that reports the exact length.
This Vec-like is implemented with VecFromPortal, so that the data does
not need to be copied to the stack. Rather, it is pulled from memory as
requested.
We want to be able to get topological connections where it is difficult
to know how many values you get each time. In this change, the type of
the vector holding the from indices is determined from the connectivity
object, and the worklet does not know the type (it must be templated).
Although you do not need to specify the max number for this value set
(you still currently do for field values), we still need to change the
type for explicit sets that uses something that does not rely on the Vec
class. The cell-to-point method also needs a Vec wrapper that allows it
to shorten the vector dynamically.
Previously, all arrays passed to worklets were designated as either
input or output. No in-place operation was permitted. This change adds
the FieldInOut tag for ControlSignature in both WorkletMapField and
WorkletMapTopology that allows you to read and write from the same
array.
(Re-) Add a helper structure that holds the connectivity information for
a particular topology connection (e.g. from points to cells) to make it
easier to manage connections in multiple different directions in
CellSetExplicit.
Unlike the previous version of connectivity, this structure is
considered "internal" and not exposed through the API so that
CellSetExplicit can better manage the data. Also, many of the helper
methods remain in CellSetExplicit since they were specific for point-to-
Also, CellSetExplicit has a mechanism to take an arbitrary pair of
TopologyElementTags and get the appropriate connectivity. This should
simplify adding connections in the future.
In the CellSet and related classes, a connection was referred to by a
"from" topology element and a "to" topology element. However, in the
worklet control signature tags the elements were referred to by "src"
and "dest." To make things consistent, use "from" and "to" everywhere.
Most of VTK-m follows the convention of calling the 0D topology elements
"points" (which follows the convention of VTK). However, there were
several places where they were referred to as "nodes." Make things
consistent by calling them points everywhere.
Also merged some redundant ExecutionSignature tags.
Previously there was a Connectivity* structure for both the control
environment and the execution environment. This was necessary before
because the connectivity is explicit to the from and to topology
elements, so you would get this structure from the appropriate call to
CellSet*. However, the symantics are changed so that the type of
connectivity is selected in the worklet's dispatcher. Thus, it is now
much cleaner to manage the CellSet structure in the CellSet class itself
and just have a single set of Connectivity* classes in the execution
environment.
Also moved from vtkm namespace to vtkm::internal namespace. This change
is to then move the structured connectivity classes to the cont and exec
namespaces.
C and C++ has a funny feature where operations on small integers (char
and short) actually promote the result to a 32 bit integer. Most often
in our code the result is pushed back to the same type, and picky compilers
can then give a warning about an implicit type conversion (that we
inevitably don't care about). Here are a lot of changes to suppress
the warnings.
On one of my compile platforms, GCC was giving conversion warnings from
any boost include that was not wrapped in pragmas to disable conversion
warnings. To make things easier and more robust, I created a pair of
macros, VTKM_BOOST_PRE_INCLUDE and VTKM_BOOST_POST_INCLUDE, that should
be wrapped around any #include of a boost header file.
Some fixes to VertexClustering
VertexClustering previously only worked with data of a specific floating
point type (32 bit for point coordinates). Add some templates to accept
either 32 bit or 64 bit floating points for point coordintes and be a
bit more careful about implicit type conversions.
I also made some changes to conform better with the VTK-m coding
standards. The most common changes are using 2 space indentation for all
block levels, capitolizing and using camel case for all class members,
and prefixing "this->" to all use of internal class members.
See merge request !64
VertexClustering previously only worked with data of a specific floating
point type (32 bit for point coordinates). Add some templates to accept
either 32 bit or 64 bit floating points for point coordintes and be a
bit more careful about implicit type conversions.
I also made some changes to conform better with the VTK-m coding
standards. The most common changes are using 2 space indentation for all
block levels, capitolizing and using camel case for all class members,
and prefixing "this->" to all use of internal class members.
This is built ontop of the ExecutionObjectBase work, and is designed to show
other developers how they can create custom objects that are shared among
all worklets, but are passed as parameters to the worklet.
By mistake the cuda texture memory load code was not being used, so correct
that issue and allow loading of vtkm::Vec and primitive types. Currently
the only issue is loading int8/int16 uint8/uint16 through texture memory.
Instead of having a single specialization for sort and zip handles,
we know handle any fancy handles being passed to the cuda device adapter.
This was done by reworking how we represent fancy iterators inside thrust,
and instead of using a transform iterator + counting iterator we just use
a iterator_facade.
ExplicitConnectivity can now use different storage backends to allow
for cheaper representations of the data. For example a pool of triangles
can now implicit handles for shape and num indices.
You can use function level statics, but instead you must use class level
statics, this is due to how nvcc treats method statics as being shared
across all threads in a warp.
This includes changing methods like LoadDataForInput to PrepareForInput.
It also changed the interface a bit to save a reference to the storage
object. (Maybe it would be better to save a pointer?) These changes also
extend up to the ArrayManagerExecution class, so it can effect device
adapter implementations.
Porting the dax device adapter over to vtkm. Unlike the dax version, doesn't
use the thrust::device_vector, but instead uses thrust::system calls so that
we can support multiple thrust based backends.
Also this has Texture Memory support for input array handles. Some more work
will need to be done to ArrayHandle so that everything works when using an
ArrayHandle inplace with texture memory bindings.
ICC can be pretty thorough about finding unused elements. In this case
it was picking up an unused method in instances of a templated class
in an anonymous namespace. It was a method that should be there due to
the nature of the class, but it happened to not be used (which was OK,
too). To get around the problem, I just added some use of that method
in another method.
It's easy to put accidently put something that is not a valid tag in a
ControlSignature or ExecutionSignature. Previously, when you did that
you got a weird error at the end of a very long template instantiation
chain that made it difficult to find the offending worklet.
This adds some type checks when the dispatcher is instantated to check
the signatures. It doesn't point directly to the signature or its
parameter, but it is much closer.
MSVC is picky about type conversions. To get it to shut up, explicitly
cast the worklet return value to the fetch value in the
WorkletInvokeFunctor. The good is that it will help with needing
explicit conversions on these return values. But that is also bad in
that it might make some unexpected conversions possible.
One fix is a simple (pointless) compiler warning about precision. The
other fix is an error in one of the test codes that did not clear out
the message string in an error message buffer like it was supposed to.
These changes support the implementation of DispatcherBase. This class
provides the basic functionality for calling an Invoke method in the
control environment, transferring data to the execution environment,
scheduling threads in the execution environment, pulling data for each
calling of the worklet method, and actually calling the worklet.
The Fetch class is responsible for moving data in and out of some
collection in the execution environment. The Fetch class is templated
with a pair of tags (the type of fetch and the aspect) that control the
mechanism used for the fetch.
The Transport class is responsible for moving data from the control
environment to the execution environment. (Actually, it might be more
accurate to say it gets the execution environment associated with a
given control object.) The Transport class is templated with a tag that
controls the mechanism used for the transport.
Whenever creating a functor to be launched in the execution environment
using the device adapter Schedule algorithm, you had to also create a
couple of methods to handle error message buffers. For convenience, lots
of code started to just inherit from WorkletBase. Although this worked,
it was a misnomer (and might cause problems in the future if worklets
later require different things from its base). To get around this
problem, add a FunctorBase class that is intended to be used as the
superclass to functors called with Schedule.