how did any of this work?
match other CellSet file layouts.
???
compile in CUDA.
unit tests.
also only serial.
make error message accurate
Well, this compiles and works now.
Did it ever?
use CellShapeTagGeneric
UnitTest matches previous changes.
whoops
Fix linking problems.
Need the same interface
as other ThreadIndices.
add filter test
okay, let's try duplicating CellSetStructure.
okay
inching...
change to wedge in CellSetListTag
Means changing these to support it.
switch back to wedge from generic
compiles and runs
remove ExtrudedType
need vtkm_worklet
vtkm_worklet needs to be included
fix segment count for wedge specialization
need to actually save the index
for the other constructor.
specialize on Explicit
clean up warning
angled brackets not quotes.
formatting
401b12bd6 Merge branch 'master' of https://gitlab.kitware.com/vtk/vtk-m into add_polyLine
fea18190f Specialized cases for cell-edge functions on polylines.
d310ec3aa return type is void. Call vertex/line methods, then just return.
d6898b805 Fix cell deriv for polylines and remove print statements.
9157004ac Merge branch 'master' of https://gitlab.kitware.com/vtk/vtk-m into add_polyLine
d6e2e9588 Remove debugging print statements.
b9d109ab3 Fix for CellEdgeFace test. Case is identical to polygon.
d7e793861 Fix compiler warnings. Comment out std::cout usage for testing with cuda.
...
Acked-by: Kitware Robot <kwrobot@kitware.com>
Acked-by: Kenneth Moreland <kmorel@sandia.gov>
Merge-request: !1677
It is very easy to cause ODR violations with DeviceAdapterTagCuda.
If you include that header from a C++ file and a CUDA file inside
the same program we an ODR violation. The reasons is that the C++
versions will say the tag is invalid, and the CUDA will say the
tag is valid.
The solution to this is that any compilation unit that includes
DeviceAdapterTagCuda from a version of VTK-m that has CUDA enabled
must be invoked by the cuda compiler.
This adds an ExecutionSignature tag named Device that passes the
DeviceAdapterTag as an argument to the worklet's operator(). This allows
worklets to specialize their code based on the device.
CUDA devices have problems with recursive algorithms that have no well-
defined depth because the stack on a CUDA device tends to be pretty
short. Fix the problem for BoundingIntervalHierarchyExec by changing to
a state-machine based algorithm that follows the hierarchy up and down.
Mask objects allow you to specify which output values should be
generated when a worklet is run. That is, the Mask allows you to skip
the invocation of a worklet for any number of outputs.
The ArrayPortalValueReference is supposed to behave just like the value
it encapsulates and does so by automatically converting to the base type
when necessary. However, when it is possible to convert that to
something else, it is possible to get errors about ambiguous overloads.
To avoid these, add specialized versions of the operators to specify
which ones should be used.
Also consolidated the CUDA version of an ArrayPortalValueReference to the
standard one. The two implementations were equivalent and we would like
changes to apply to both.
This is only set while compiling device code, and is useful
for code that needs different implementations on devices (e.g.
they call CUDA device intrinsics, etc).
`vtkm::cont::testing` now initializes with logging enabled and support
for device being passed on the command line, `vtkm::testing` only
enables logging.
The purpose of the TestBuild infrastructure was to confirm that
VTK-m didn't have any lexical issues when it was a pure header
only project. As we now move to have more compiled components
the need for this form of testing is mitigated. Combined
with the issue of TestBuilds causing MSVC issues, we should
just remove this infrastructure.
Adds a variable `GlobalPointIndexStart` to `CellSetStructured`.
Adding this to the cell-set, instead of the coordinate system, enables this
feature for different types of datasets like uniform grid, rectilinear, etc.,
with this one change.
The extents can be computed using `GlobalPointIndexStart` and `PointDimensions`.
Did a bit of renaming of the support classes used for
WorkletPointNeighborhood. First, the OnBoundary tag is changed to
Boundary to match other tags and reflect some changes in the resulting
methods. Also moved the BoundaryState and Neighborhood classes from
vtkm::exec::arg to vtkm::exec to be more accessible. Finally, the
Neighborhood class name was changed to FieldNeighborhood to be more
specific on what role this class plays with neighborhood.
Previously, WorkletPointNeighborhood had a template argument to select
the size of the neighborhood. This change removes that template
argument. Instead, the vtkm::exec::arg::BoundaryState methods now take
in a size parameter when determining when it overlaps the boundary.
If in the future we want to add the ability to select the neighborhood
size at compile-time (for performance reasons), I suggest adding this
template argument to the OnBoundary tag for ExecutionSignature.
Previously, vtkm::exec::arg::BoundaryState only provided methods that
said whether or not the neighborhood extened past the boundary of a
mesh. That is fine for a 3x3x3 neighborhood, which can only extend over
the boundary by one. However, that is problematic for larger
neighborhoods where you may need to know how far neighborhood extends
over the boundary.
This changes allows you to query how far the neighborhood extends within
the constrains of the boundary.
2b0548739 Add ExecutionAndControlObjectBase
62d3b9f4f Correct warning about potential divide by zero
98a0a20fe Allow ArrayHandleTransform to work with ExecObject
Acked-by: Kitware Robot <kwrobot@kitware.com>
Acked-by: Robert Maynard <robert.maynard@kitware.com>
Merge-request: !1403
This is a subclass of ExecutionObject and a superset of its
functionality. In addition to having a PrepareForExecution method, it
also has a PrepareForControl method that gets an object appropriate for
the control environment. This is helpful for situations where you need
code to work in both environments, such as the functor in an
ArrayHandleTransform.
Also added several runtime checks for execution objects and execution
and cotnrol objects.
96ae94420 Simplified execution object creation for atomic array
0bd197af9 moved TwoLevelUniformGridExecutionObject to vtkm/exec/internal
6ce895be8 simplified how atomic arrays create execution objects
f1ee5b92a fix a rebase error
25d140361 fix bad rabse for wireframer
f892695f1 fixing so wierd merging issue
9bb00ec66 moved the execution object for TwoLevelUniform grid to vrkm::exec
db1c9bfee Change the namespacing of atomic array
...
Acked-by: Kitware Robot <kwrobot@kitware.com>
Acked-by: Robert Maynard <robert.maynard@kitware.com>
Merge-request: !1243
1. Have a per-thread pinned array for cuda errors
2. Check for errors before scheduling new tasks and at explicit sync points
3. Remove explicit synchronizations from most places
Addresses part 2 of #168
There was an error in TestingPointLocatorUniformGrid in which it was
creating arrays of type vtkm::Float32 and passing them to a worklet that
expected vtkm::FloatDefault. This is corrected.
- Reducing the stack allocation for CUDA for the BIH unit test
- Adding changes from Ken's review
- Suppress ptxas stack size warning for BoundingIntervalHierarchy
The vtkm::exec::PointLocatorUniformGrid has a default constructor. It
was "helpfully" declared as VTKM_EXEC_CONT, but apparently that is the
wrong thing to do for constructors that are set to default.
Previously when PointLocatorUniformGrid.h was compiled by the CUDA
compiler, the compiler would crash. Apparently during the ptxas
part of the compiler goes into a crazy recursion and runs out of
stack space. This appears to be a long-standing bug in CUDA
(been there for multiple releases) without a clear reason why it
sometimes rears its ugly head. (See for example
https://devtalk.nvidia.com/default/topic/1028825/cuda-programming-and-performance/-ptxas-died-with-status-0xc00000fd-stack_overflow-/)
The problem appears to be when having a doubly or triply nested
loop over a box of values to check in the uniform array. This
appears to fix the problem by converting that to a single for
loop with some index magic to convert that to 3D indices.
Vec class objects can now be constructed during compile-time
as constant expressions by calling Vec( T, ... ) constructors
or through brace-initialization.
Constant expression using fill constructor and nested vectors
of sizes greater than 4 are not supported yet.
Changes made to WrappedOperators.h for resolving overload
ambiguities in Vec construction and typecasting.
Appropriate test cases were added to UnitTestTypes.cxx.
Addresses issue #199.
Found via `codespell` and `grep`
more typos
includes source typo change and a typo that needs further review
follow-up typos
Follow-up typos
Revert a commit
183bcf109 Add initial version of an OpenMP backend.
7b5ad3e80 Expand device scheduler test to check for overlap.
e621b6ba3 Generalize the TBB radix sort implementation.
d60278434 Specialize swap for ArrayPortalValueReference types.
761f8986f Cache inputs to SSI::Unique benchmark.
Acked-by: Kitware Robot <kwrobot@kitware.com>
Acked-by: Robert Maynard <robert.maynard@kitware.com>
Merge-request: !1099
There was a direct edit to WorkletInvokeFunctorDetail.h that was not
reflected in WorkletInvokeFunctorDetail.h.in. This makes the two files
consistent so that future edits will not loose the changes.
This commit adds a worklet and a filter to compute
signed integral measures of cells.
It also begins the practic of adding a markdown
changelog entry in `doc/changelog`. See the
changelog entry for details.
While making changes to how execution objects work, we had agreed to
name the base object ExecutionObjectBase instead of its original name of
ExecutionObjectFactoryBase. Somehow that change did not make it through.
Rather than require all ExecutionObjectFactoryBase classes to declare a
templated ExecObjectType type, get the type of the execution object
directly from the result of the PrepareForExecution method.
In order to make the change from the current way execution obejcts are utilized to the new proposed executionObjectFactory process type checks now has to look for the new execution object factory class to check against.
When converting integer fields the interpolate code generates lots
of warnings that we are promoting into floating point space.
The quickest solution is to suppress these conversion warnings
all together for all interpolation functions.
adcabb03 VTK-m Vec<> constructors are now more conservative.
6267deb6 CellDerivativeFor3DCell has a better version for Vec of Vec fields.
Acked-by: Kitware Robot <kwrobot@kitware.com>
Acked-by: Kenneth Moreland <kmorel@sandia.gov>
Merge-request: !1170
This allows for easier host side logic when determining grid and block
sizes, and allows for a smaller library side by moving some logic
into compiled in functions.
Previously we would compute a 3x3 matrix where each element was a Vec. Using
the jacobain of a single component is sufficient so by using that we safe 2 to
3 times the memory space.
Additionally by moving to doing the derivative as a per component algorithm
we work around the issues found in bug
https://gitlab.kitware.com/vtk/vtk-m/issues/221. In effect when trying to
construct a Vec of Vec from a component of a different floating type e.g.:
Vec3d x(0.0, 1.0, 2.0);
Vec3f a = x;
Vec3x3f b = x;
Vec3x3f c(x, x, x);
Generates bad values vectors such as b which look like:
b: [[0,0,0],[1,1,1],[2,2,2]]
c: [[0,1,2],[0,1,2],[0,1,2]]
The new and improved vtkm::cont::ColorTable provides a more feature complete
color table implementation that is modeled after
vtkDiscretizableColorTransferFunction. This class therefore supports different
color spaces ( rgb, lab, hsv, diverging ) and supports execution across all
device adapters.
Due to recent changes to remove static arrays that are not supported on
some devices, the function to return all the local point indices on a
face was removed. That left no way to get the structure of cell faces
short of pulling an internal data structure.
This change resurrects a function to get point indices for a face. The
interface for this method has necessarily changed, so I also changed the
corresponding function for getting edge indices.
These changes now allow VTK-m to compile on CUDA 7.5 by using const arrays,
when compiling with CUDA 8+ support we upgrade to static const arrays, and
lastly when CUDA is disabled we fully elevate to static constexpr.
1. Add a 'FastVec' class that copies input vector types to an efficient
Vec type on the stack. Specializations avoid copies of already efficient
types.
2. Update 'WorldCoordinatesToParametricCoordinates' functions to utilize the
'FastVec' class. This should improve performance when the passed in
vectors are of slow types like 'vtkm::VecFromPortalPermute'.
3. Since most input Vec types will convert to the same 'FastVec' type this
also reduces the code generations. Some code refactoring was required for
this.
When the implementation for the cell derivative functions was made, the
special cases for creating the Jacobian for wedge and pyramid cell
shapes was not working. Instead, it just used the hexahedra case for a
degenerate cell. This fixes the issues with the special cases.
There were multiple issues to be fixed. There were some complaints
by the compiler about types. There was a mistake in the pyramid table.
But the biggest issue was a problem with macro expansions. It was
the classic tale of forgetting that you have to encase parameters
in parenthesis to make sure that operator precedence comes out as
expected.
Created split implementation. Parallel radix
sort calls moved to vtkm_cont library.
Added key value radix sorts. SortByKey will invoke
radix sort when the key is a fundamental C++ numeric
or character type.
Added fast path for vtkm::SortLess and vtkm::SortGreater
calls to Sort and SortByKey.
If a global static array is declared with VTKM_EXEC_CONSTANT and the code
is compiled by nvcc (for multibackend code) then the array is only accesible
on the GPU. If for some reason a worklet fails on the cuda backend and it is
re-executed on any of the CPU backends, it will continue to fail.
We couldn't find a simple way to declare the array once and have it available
on both CPU and GPU. The approach we are using here is to declare the arrays
as static inside some "Get" function which is marked as VTKM_EXEC_CONT.
1147edb1 MarchingCubes now uses Gradient fast paths when possible.
d7d5da4f More changes to Neighborhood code to make it more easy to use.
Acked-by: Kitware Robot <kwrobot@kitware.com>
Acked-by: Sujin Philip <sujin.philip@kitware.com>
Merge-request: !916
There are still some warnings left:
* Some text in markdown files are incorrectly picked up as
doxygen commands
* ArrayPortalTransform weirdly inherits from a specialized
version of itself. It's technically correct C++ code, but
gives doxygen fits.
Sandia National Laboratories recently changed management from the
Sandia Corporation to the National Technology & Engineering Solutions
of Sandia, LLC (NTESS). The copyright statements need to be updated
accordingly.
c5232e99 Simplify the implementation of vtkm::ForEach
6069c19f Brigand.hpp now works around CUDA 9 compiler issues.
6a4e91d5 ExecutionPolicy now handles CUDA9 removal of __CUDACC_VER__
Acked-by: Kitware Robot <kwrobot@kitware.com>
Merge-request: !902
1d5a4d47 Update ExternalFaces worklet to use hashes
fc7b90ac Add hash function
b1e6c1e3 Add method for getting a cononical edge id
8e72dc73 Add method for getting a cononical face id
Acked-by: Kitware Robot <kwrobot@kitware.com>
Merge-request: !900
Previously once an ArrayHandle was stolen it was placed in an invalid state
where it could not used again by VTK-m. Now instead after being stolen it
is placed into a state where it is identical to memory allocated outside
of VTK-m and passed in.
All types of cell sets should have a consistent interface. This is
tested (and fixed) for explicit and permutation cell sets. However, an
unfixed bug that has been identified is that permutation cell sets only
work for point to cell topologies. All others are not supported
correctly.