This is only set while compiling device code, and is useful
for code that needs different implementations on devices (e.g.
they call CUDA device intrinsics, etc).
`vtkm::cont::testing` now initializes with logging enabled and support
for device being passed on the command line, `vtkm::testing` only
enables logging.
The purpose of the TestBuild infrastructure was to confirm that
VTK-m didn't have any lexical issues when it was a pure header
only project. As we now move to have more compiled components
the need for this form of testing is mitigated. Combined
with the issue of TestBuilds causing MSVC issues, we should
just remove this infrastructure.
Adds a variable `GlobalPointIndexStart` to `CellSetStructured`.
Adding this to the cell-set, instead of the coordinate system, enables this
feature for different types of datasets like uniform grid, rectilinear, etc.,
with this one change.
The extents can be computed using `GlobalPointIndexStart` and `PointDimensions`.
Did a bit of renaming of the support classes used for
WorkletPointNeighborhood. First, the OnBoundary tag is changed to
Boundary to match other tags and reflect some changes in the resulting
methods. Also moved the BoundaryState and Neighborhood classes from
vtkm::exec::arg to vtkm::exec to be more accessible. Finally, the
Neighborhood class name was changed to FieldNeighborhood to be more
specific on what role this class plays with neighborhood.
Previously, WorkletPointNeighborhood had a template argument to select
the size of the neighborhood. This change removes that template
argument. Instead, the vtkm::exec::arg::BoundaryState methods now take
in a size parameter when determining when it overlaps the boundary.
If in the future we want to add the ability to select the neighborhood
size at compile-time (for performance reasons), I suggest adding this
template argument to the OnBoundary tag for ExecutionSignature.
Previously, vtkm::exec::arg::BoundaryState only provided methods that
said whether or not the neighborhood extened past the boundary of a
mesh. That is fine for a 3x3x3 neighborhood, which can only extend over
the boundary by one. However, that is problematic for larger
neighborhoods where you may need to know how far neighborhood extends
over the boundary.
This changes allows you to query how far the neighborhood extends within
the constrains of the boundary.
2b0548739 Add ExecutionAndControlObjectBase
62d3b9f4f Correct warning about potential divide by zero
98a0a20fe Allow ArrayHandleTransform to work with ExecObject
Acked-by: Kitware Robot <kwrobot@kitware.com>
Acked-by: Robert Maynard <robert.maynard@kitware.com>
Merge-request: !1403
This is a subclass of ExecutionObject and a superset of its
functionality. In addition to having a PrepareForExecution method, it
also has a PrepareForControl method that gets an object appropriate for
the control environment. This is helpful for situations where you need
code to work in both environments, such as the functor in an
ArrayHandleTransform.
Also added several runtime checks for execution objects and execution
and cotnrol objects.
96ae94420 Simplified execution object creation for atomic array
0bd197af9 moved TwoLevelUniformGridExecutionObject to vtkm/exec/internal
6ce895be8 simplified how atomic arrays create execution objects
f1ee5b92a fix a rebase error
25d140361 fix bad rabse for wireframer
f892695f1 fixing so wierd merging issue
9bb00ec66 moved the execution object for TwoLevelUniform grid to vrkm::exec
db1c9bfee Change the namespacing of atomic array
...
Acked-by: Kitware Robot <kwrobot@kitware.com>
Acked-by: Robert Maynard <robert.maynard@kitware.com>
Merge-request: !1243
1. Have a per-thread pinned array for cuda errors
2. Check for errors before scheduling new tasks and at explicit sync points
3. Remove explicit synchronizations from most places
Addresses part 2 of #168
There was an error in TestingPointLocatorUniformGrid in which it was
creating arrays of type vtkm::Float32 and passing them to a worklet that
expected vtkm::FloatDefault. This is corrected.
- Reducing the stack allocation for CUDA for the BIH unit test
- Adding changes from Ken's review
- Suppress ptxas stack size warning for BoundingIntervalHierarchy
The vtkm::exec::PointLocatorUniformGrid has a default constructor. It
was "helpfully" declared as VTKM_EXEC_CONT, but apparently that is the
wrong thing to do for constructors that are set to default.
Previously when PointLocatorUniformGrid.h was compiled by the CUDA
compiler, the compiler would crash. Apparently during the ptxas
part of the compiler goes into a crazy recursion and runs out of
stack space. This appears to be a long-standing bug in CUDA
(been there for multiple releases) without a clear reason why it
sometimes rears its ugly head. (See for example
https://devtalk.nvidia.com/default/topic/1028825/cuda-programming-and-performance/-ptxas-died-with-status-0xc00000fd-stack_overflow-/)
The problem appears to be when having a doubly or triply nested
loop over a box of values to check in the uniform array. This
appears to fix the problem by converting that to a single for
loop with some index magic to convert that to 3D indices.
Vec class objects can now be constructed during compile-time
as constant expressions by calling Vec( T, ... ) constructors
or through brace-initialization.
Constant expression using fill constructor and nested vectors
of sizes greater than 4 are not supported yet.
Changes made to WrappedOperators.h for resolving overload
ambiguities in Vec construction and typecasting.
Appropriate test cases were added to UnitTestTypes.cxx.
Addresses issue #199.
Found via `codespell` and `grep`
more typos
includes source typo change and a typo that needs further review
follow-up typos
Follow-up typos
Revert a commit
183bcf109 Add initial version of an OpenMP backend.
7b5ad3e80 Expand device scheduler test to check for overlap.
e621b6ba3 Generalize the TBB radix sort implementation.
d60278434 Specialize swap for ArrayPortalValueReference types.
761f8986f Cache inputs to SSI::Unique benchmark.
Acked-by: Kitware Robot <kwrobot@kitware.com>
Acked-by: Robert Maynard <robert.maynard@kitware.com>
Merge-request: !1099
There was a direct edit to WorkletInvokeFunctorDetail.h that was not
reflected in WorkletInvokeFunctorDetail.h.in. This makes the two files
consistent so that future edits will not loose the changes.
This commit adds a worklet and a filter to compute
signed integral measures of cells.
It also begins the practic of adding a markdown
changelog entry in `doc/changelog`. See the
changelog entry for details.
While making changes to how execution objects work, we had agreed to
name the base object ExecutionObjectBase instead of its original name of
ExecutionObjectFactoryBase. Somehow that change did not make it through.
Rather than require all ExecutionObjectFactoryBase classes to declare a
templated ExecObjectType type, get the type of the execution object
directly from the result of the PrepareForExecution method.
In order to make the change from the current way execution obejcts are utilized to the new proposed executionObjectFactory process type checks now has to look for the new execution object factory class to check against.
When converting integer fields the interpolate code generates lots
of warnings that we are promoting into floating point space.
The quickest solution is to suppress these conversion warnings
all together for all interpolation functions.
adcabb03 VTK-m Vec<> constructors are now more conservative.
6267deb6 CellDerivativeFor3DCell has a better version for Vec of Vec fields.
Acked-by: Kitware Robot <kwrobot@kitware.com>
Acked-by: Kenneth Moreland <kmorel@sandia.gov>
Merge-request: !1170
This allows for easier host side logic when determining grid and block
sizes, and allows for a smaller library side by moving some logic
into compiled in functions.
Previously we would compute a 3x3 matrix where each element was a Vec. Using
the jacobain of a single component is sufficient so by using that we safe 2 to
3 times the memory space.
Additionally by moving to doing the derivative as a per component algorithm
we work around the issues found in bug
https://gitlab.kitware.com/vtk/vtk-m/issues/221. In effect when trying to
construct a Vec of Vec from a component of a different floating type e.g.:
Vec3d x(0.0, 1.0, 2.0);
Vec3f a = x;
Vec3x3f b = x;
Vec3x3f c(x, x, x);
Generates bad values vectors such as b which look like:
b: [[0,0,0],[1,1,1],[2,2,2]]
c: [[0,1,2],[0,1,2],[0,1,2]]
The new and improved vtkm::cont::ColorTable provides a more feature complete
color table implementation that is modeled after
vtkDiscretizableColorTransferFunction. This class therefore supports different
color spaces ( rgb, lab, hsv, diverging ) and supports execution across all
device adapters.
Due to recent changes to remove static arrays that are not supported on
some devices, the function to return all the local point indices on a
face was removed. That left no way to get the structure of cell faces
short of pulling an internal data structure.
This change resurrects a function to get point indices for a face. The
interface for this method has necessarily changed, so I also changed the
corresponding function for getting edge indices.
These changes now allow VTK-m to compile on CUDA 7.5 by using const arrays,
when compiling with CUDA 8+ support we upgrade to static const arrays, and
lastly when CUDA is disabled we fully elevate to static constexpr.
1. Add a 'FastVec' class that copies input vector types to an efficient
Vec type on the stack. Specializations avoid copies of already efficient
types.
2. Update 'WorldCoordinatesToParametricCoordinates' functions to utilize the
'FastVec' class. This should improve performance when the passed in
vectors are of slow types like 'vtkm::VecFromPortalPermute'.
3. Since most input Vec types will convert to the same 'FastVec' type this
also reduces the code generations. Some code refactoring was required for
this.
When the implementation for the cell derivative functions was made, the
special cases for creating the Jacobian for wedge and pyramid cell
shapes was not working. Instead, it just used the hexahedra case for a
degenerate cell. This fixes the issues with the special cases.
There were multiple issues to be fixed. There were some complaints
by the compiler about types. There was a mistake in the pyramid table.
But the biggest issue was a problem with macro expansions. It was
the classic tale of forgetting that you have to encase parameters
in parenthesis to make sure that operator precedence comes out as
expected.
Created split implementation. Parallel radix
sort calls moved to vtkm_cont library.
Added key value radix sorts. SortByKey will invoke
radix sort when the key is a fundamental C++ numeric
or character type.
Added fast path for vtkm::SortLess and vtkm::SortGreater
calls to Sort and SortByKey.
If a global static array is declared with VTKM_EXEC_CONSTANT and the code
is compiled by nvcc (for multibackend code) then the array is only accesible
on the GPU. If for some reason a worklet fails on the cuda backend and it is
re-executed on any of the CPU backends, it will continue to fail.
We couldn't find a simple way to declare the array once and have it available
on both CPU and GPU. The approach we are using here is to declare the arrays
as static inside some "Get" function which is marked as VTKM_EXEC_CONT.
1147edb1 MarchingCubes now uses Gradient fast paths when possible.
d7d5da4f More changes to Neighborhood code to make it more easy to use.
Acked-by: Kitware Robot <kwrobot@kitware.com>
Acked-by: Sujin Philip <sujin.philip@kitware.com>
Merge-request: !916
There are still some warnings left:
* Some text in markdown files are incorrectly picked up as
doxygen commands
* ArrayPortalTransform weirdly inherits from a specialized
version of itself. It's technically correct C++ code, but
gives doxygen fits.
Sandia National Laboratories recently changed management from the
Sandia Corporation to the National Technology & Engineering Solutions
of Sandia, LLC (NTESS). The copyright statements need to be updated
accordingly.
c5232e99 Simplify the implementation of vtkm::ForEach
6069c19f Brigand.hpp now works around CUDA 9 compiler issues.
6a4e91d5 ExecutionPolicy now handles CUDA9 removal of __CUDACC_VER__
Acked-by: Kitware Robot <kwrobot@kitware.com>
Merge-request: !902
1d5a4d47 Update ExternalFaces worklet to use hashes
fc7b90ac Add hash function
b1e6c1e3 Add method for getting a cononical edge id
8e72dc73 Add method for getting a cononical face id
Acked-by: Kitware Robot <kwrobot@kitware.com>
Merge-request: !900
Previously once an ArrayHandle was stolen it was placed in an invalid state
where it could not used again by VTK-m. Now instead after being stolen it
is placed into a state where it is identical to memory allocated outside
of VTK-m and passed in.
All types of cell sets should have a consistent interface. This is
tested (and fixed) for explicit and permutation cell sets. However, an
unfixed bug that has been identified is that permutation cell sets only
work for point to cell topologies. All others are not supported
correctly.
Redesigns the TBB and Serial backends and the vtkm::exec::Task concept so that
we can re-use the same launching logic for all Worklets, instead of generating
per worlet code. To keep the performance the same the TilingTask now is past
a range of indices to work on, rather than a single index.
Binary size reduction:
WorkletTests_SERIAL old - 19MB
WorkletTests_SERIAL new - 18MB
WorkletTests_TBB old - 39MB
WorkletTests_TBB new - 18MB
libvtkAcceleratorsVTKm old - 48MB
libvtkAcceleratorsVTKm new - 19MB
Previously when you permuted a structured cellset you would get a permuted
point coordinates. This would cause the Cell operations to take non
optimal paths.
Previously WorkletInvokeFunctor inherited from vtkm::exec::FunctorBase,
which is also the base class for all users Worklets and for all functors
based to DeviceAdapter::Schedule.
This is done for a few reasons. The first is that we reduce the
minimum size of user worklets. Previously the users worklet would hold
a reference to the error message, and so would the wrapper class added
when calling DeviceAdapter::Schedule. Now we only have the users worklet
holding a reference.
Second, by refactoring to have two base classes we can better improve
the documentation on what responsibilities FunctorBase.h has, compared
to TaskBase.
The current design for ArrayPortalVirtual makes it a requirement for all
array portals (that it wraps) to have Set defined. Thus, make sure Set is
defined for all ArrayPortal. Where Set is invalid, an assert is thrown if
something calls it at runtime.
Previously, ExternalFaces really only supported tetrahedral meshes that
have only triangular faces. These changes support all mixes of cells and
their faces.
Most functions in the CUDA runtime API return an error code that must be
checked to determine whether the operation completed successfully. Most
operations in VTK-m just called the function and assumed it completed
correctly, which could lead to further errors. This change wraps most
CUDA calls in a VTKM_CUDA_CALL macro that checks the error code and
throws an exception if the call fails.
The cell derivative/gradient functions were all designed with scalars in
mind. Although the field type is templated and you could pass in a
vector type for the field, many of these classes would perform the
computation incorrectly. These changes specifically support derivatives
of vector types.
The shape array in CellSetExplicit was changed a while ago to be of type
vtkm::UInt8. However, the shape types remained at vtkm::IdComponent. I
can think of no reason why they should not be the same type.
Change the VTKM_CONT_EXPORT to VTKM_CONT. (Likewise for EXEC and
EXEC_CONT.) Remove the inline from these macros so that they can be
applied to everything, including implementations in a library.
Because inline is not declared in these modifies, you have to add the
keyword to functions and methods where the implementation is not inlined
in the class.
8dadf560 Add in support for vector fields to Gradient worklet.
d53f43fb CellDerivaties now support computing the derivatives of vtkm::Vec fields.
b4378c85 Allow vtkm::Matrix to support T values which are vtkm::Vec.
9caabf97 vtkm::Vec now supports +=, -=, *=, and /=.
Acked-by: Kitware Robot <kwrobot@kitware.com>
Acked-by: Kenneth Moreland <kmorel@sandia.gov>
Merge-request: !608
This usage of raw_referenc_cast returns a const PortalValue which is than
assigned too. So to work around the problem we need to mark operator= on
the class as const.
There were many tests that created code paths for every base and Vec
type that VTK-m supports (up to 4 components). Although this is
admirable, it is also excessive, and our compile times for the tests are
very long.
To shorten compile times, remove the TryAllTypes method. Replace it with
a version of TryTypes that uses a default list of "exemplar" set of
integers, floats, and Vecs.
The UnitTestParametricCoordinates test uses a pseudorandom number
generator to create some random set of parametric coordinates, convert
to world coordinates, and then back to parametric coordinates.
This has been working fine except that the world coordinate to parametric
coordinate conversion is not extremely precise for some cell shapes. (It
would not be cost effective to make it more precise.) Because of this,
the test_equal for this comparison has a pretty high threshold.
While looking at a dashboard I happened across a failure for this test.
It turns out that one of the parametric coordinates created for the
pyramid test for seed 1465529014 was just outside of this threshold, but
otherwise correct. I raised the threshold a little to try to prevent
this error.
ThreadIndicies constructor was templated on the invocation type, which created
thousand's of versions of that symbol which all had the same behavior. So now
remove that and move that logic into a Worklet function since it requires
the invocation info.
These asserts are consolidated into the unified Assert.h. Also made some
minor edits to add asserts where appropriate and a little bit of
reconfiguring as found.
All the other math functions are in the vtkm package. This one was in
vtkm::exec because it uses a callback method. This can be problematic on
CUDA the the declaration of NewtonsMethod does not match the callback
method. However, we now have a VTKM_SUPPRESS_EXEC_WARNINGS macro that
allows a VTKM_EXEC_CONT_EXPORT function (like NewtonsMethod) to call
either a VTKM_EXEC_EXPORT or VTKM_CONT_EXPORT without a warning.
5b705a52 Fixing return value for void function
Acked-by: Kitware Robot <kwrobot@kitware.com>
Acked-by: Robert Maynard <robert.maynard@kitware.com>
Merge-request: !334
As part of the work to reduce the number of copies of array handles the CUDA
backend was broken. The transportation of stack allocated classes to CUDA
relies on all member variables being value based, not references/pointers.
This correct the issue of sending references to host side memory to CUDA, at
the cost of two copies of the Invocation object.
When we move to C++11 we need to revisit this work and see if std::move
can help reduce the cost of these copies.
Scatter in worklets
Add the functionality to perform a scatter operation from input to output in a worklet invocation. This allows you to, for example, specify a variable amount of outputs generated for each input.
See merge request !221
I noticed a failure in a dashboard run of UnitTestParametricCoordinates.
This test uses randomly generated numbers to test the behavior of some
cell shapes, and there was an instance that occured with seed 1447261681
that caused one of the comparisons to be just slightly larger than the
default tolerance but still within reasonable value.
I just increased the tolerance of that particular comparison. Hopefully
this will prevent all future failures.
The original workaround for inclusive_scan bugs in thrust 1.8 only solved the
issue for basic arithmetic types such as int, float, double. Now we go one
step further and fix the problem for all types.
The solution is to provide a proper implementation of destructive_accumulate_n
and make sure it exists before any includes of thrust occur.
Recent changes to algorithm implementations caused CellDerivative to be
called in a way such that it gave conversion warnings on some compilers.
Fix that.