This refactor aims to increase the performance of
AddField / Getfield at the expense of memory usage and
some bits of performances when only few fields are used
Moving to non-contiguous memory will impact cpu cache benefits,
however, since each of the Field has again another pointer to
its actual data, this benefits where actually just small.
Also the choice of map v.s. unordered_map is about the number of
elements, very likely few rehashing happenings from until <100
This will also reduce the memory fragmentation caused by vectors.
Signed-off-by: Vicente Adolfo Bolea Sanchez <vicente.bolea@kitware.com>
Previously, the policy specified which field types the filter should
operate on. The filter could remove some types, but it was not able to
add any types.
This is backward. Instead, the filter should specify what types its
supports and the policy may cull out some of those.
This change is needed for being able to use different thread indices types
without changing Fetchs. Basically decoupling those two areas.
1. This commit removes concrete specialization instantiations of
ThreadIndicesTypes in all of the Fetch's specializations.
2. It also moves the ThreadIndicesType template parameter from the Fetch
struct to a template parameter in their methods Load/Store.
Signed-off-by: Vicente Adolfo Bolea Sanchez <vicente.bolea@kitware.com>
This fixes, which where triggered since in the new CI, one of the
docker runner set `OMP_NUM_THREADS=3`:
1. `UnitTestOpenMPDeviceAdapter`
2. `UnitTestMeshQualityFilter`
In the redution optimized implementation for _OpenMP_, it unrolls
the reduce loop in iterations of four elements. The last iteration
in the loop might overflow the loop end element (when it is not a
multiple of four).
This commit fixes this by setting the OpenMP unrolled reduce loop
end element to its previous closest multiple of four of the original end
element.
Signed-off-by: Vicente Adolfo Bolea Sanchez <vicente.bolea@kitware.com>
If you gave ReduceByKey a fancy output array that decorated another
array, you could get a runtime error for using an invalid array (if the
device adapter used the generic algorithm). The problem was that
ReduceByKey creates a temporary array, and that array was given the same
storage as the output array. That might not be valid for fancy arrays,
so instead use the default storage for the temporary array.
If you gave ScanInclusiveByKey a fancy output array that decorated
another array, you would get a runtime error for using an invalid array.
The problem was that ScanInclusiveByKey creates a temporary output array
and then copies the result to the actual output array. The problem was
that the temporary output array was given the same storage as the output
array, which won't work if the output array is fancy. Instead, make the
storage for the temporary array default.
One of the dashboard compilers (gcc 4.8 with cuda, I think) was giving
warnings about an unused parameter. Apparently, the compiler did not
like the parameters defined for `default` assignment operators.
ef49a71aa Merge branch 'philox' of gitlab.kitware.com:ollielo/vtk-m into philox
563642400 remove outdated comments on type-punning
be7c18544 tried to supress warning about type punning
c795f74b2 Merge branch 'master' into philox
2b43fcb25 change function signature for work with MSVC
13e7ac362 Merge branch 'philox' of gitlab.kitware.com:ollielo/vtk-m into philox
1e2fd540d add Doxygen documentation
f2ea17917 add Doxygen documentation
...
Acked-by: Kitware Robot <kwrobot@kitware.com>
Merge-request: !1990
The `ArrayHandleStreaming` class stems from an old research project
experimenting with bringing data from an `ArrayHandle` in parts and
overlapping device transfer and execution. It works, but only in very
limited contexts. Thus, it is not actually used today. Plus, the feature
requires global indexing to be permutated throughout the worklet
dispatching classes of VTK-m for no further reason.
Because it is not really used, there are other more promising approaches
on the horizon, and it makes further scheduling improvements difficult,
we are removing this functionality.
42bc9a393 Fix gaps in type support
dc112b516 Enable changing policy used for library compiles
76f870150 Type check input and output array arguments differently
Acked-by: Kitware Robot <kwrobot@kitware.com>
Acked-by: Robert Maynard <robert.maynard@kitware.com>
Merge-request: !1997
With recent changes to allow a configuration to change the default
types, storage, and cell sets, it is possible to feed filters and other
components types they were not previously expecting. Fix feature gaps
where these components were not accepting the types they should.
Previously, the PolicyDefault used to compile all the filters was hard-
coded. The problem was that if any external project that depends on VTK-
m needs a different policy, it had to recompile everything in its own
translation units with a custom policy.
This change allows an external project provide a simple header file that
changes the type lists used in the default policy. That allows VTK-m to
compile the filters exactly as specified by the external project.
Read-only arrays (usually) do not define Set methods. Thus, using one in
an Invoke argument that does output will result in compile errors. To
help avoid that, modify the type checks to differentiate input and
output arrays.
As a general C++ "rule of three," if one of a copy constructor, copy
assignment, or destructor is defined, all three should be defined. Some
compilers issue warnings if this rule of three is violated.
It is sometimes the case that we define a destructor simply because it
is only valid in the control environment. When doing so, add
implementations for copy constructor and assignment as well.
Previously, the ArrayPortalCheck wrapper did not allow access to the
superclass' Get for 3D indices. This solves that problem and also fixes
it for Set (assuming there is ever an instance of that).
After lots of experimenting, it appears that VS 2015 has problems when
you list a variate template argument before a normal template argument
in a specialization of a function, The compiler seems happy when the
variate argument is placed at the end of the template arguments.
The nvcc compiler was having problem resolving a template partial
specialization that contained a template param with its own number
list and another param that depended on that template.
Currently, VTK-m is using C++11. However, it is often useful to use
features in the `std` namespace that are defined for C++14 or later. We
can provide our own versions (sometimes), but it is preferable to use
the version provided by the compiler if available.
There were already some examples of defining portable versions of C++14
and C++17 classes in a `vtkmstd` namespace, but these were sprinkled
around the source code.
There is now a top level `vtkmstd` directory and in it are header files
that provide portable versions of these future C++ classes. In each
case, preprocessor macros are used to select which version of the class
to use.
Made a new vtkm::Tuple class to replace tao tuple.
This version of Tuple should hopefully compile faster. Having our own
implementation should also make it easier to port to new devices.
Previously, when a ReadPortal or a WritePortal was returned from an
ArrayHandle, it had wrapped in it a Token that was attached to the
ArrayHandle. This Token would prevent other reads and writes from the
ArrayHandle.
This added safety in the form of making sure that the ArrayPortal was
always valid. Unfortunately, it also made deadlocks very easy. They
happened when an ArrayPortal did not leave scope immediately after use
(which is not all that uncommon).
Now, the ArrayPortal no longer locks up the ArrayHandle. Instead, when
an access happens on the ArrayPortal, it checks to make sure that
nothing has happened to the data being accessed. If it has, a fatal
error is reported to the log.
This is a flag that functions in the execution environment can return to
report on the status of the operation. This way they can report an error
without forcing the entire invocation to shut down.
Cell operations like interpolate and finding parametric coordinates can
fail under certain conditions. Typically these call RaiseError on the
worklet. But that can make a worklet unstable, so provide paths where no
error is raised.
When `ArrayPortalToken::Detach` was called, the contained `Token` was
detached, but `ArrayPortal` being wrapped might also be holding its own
portal that it is decorating. To ensure that any dependent portals are
also detached, `ArrayPortalToken::Detach` resets itself.
This fixes an issue where getting a `ReadPortal` or a `WritePortal` from
an `ArrayHandleVirtual` could cause a deadlock from a held token even if
the returned portal was detached or destroyed.
The problem was that `ArrayHandleVirtual` was keeping a reference to the
`ArrayPortal` from the concrete array. This was because the returned
`ArrayPortalRef`, which was designed to work on both control and
execution environments, had no good way to destroy the portal. This
meant that the `ArrayHandleVirtual` was caching a copy of the concrete
array's portal. This was not a great idea before because the array could
get invalidated. It is worse now because it keeps the concrete array
locked.
Fixed the problem by subclassing `vtkm::ArrayPortalRef` to make a
control-specific version that will delete the concrete portal on its own
destruction.
This commit also:
- Removes a corner case not longer used at ArrayPortalGroupVecVariable::get
- Changes doc regarding the number of offset elements in the input
array handler of ConvertNumComponentsToOffsets.
- Updates invokation of make_ArrayGroupVectVariable in multiple files
- Adds its corresponding changelog entry
1f1688483 Initial infrastructure to allow WorkletMapField to have 3D scheduling
Acked-by: Kitware Robot <kwrobot@kitware.com>
Acked-by: Kenneth Moreland <kmorel@sandia.gov>
Merge-request: !1938
The ArrayPortalWrapper is used for both execution and control portals.
When it was wrapped around a control portal that does not work on CUDA
devices, we were getting ugly warnings even though the intention was
only to use it in the control environment.
With the new thread safety/token code, the vtkm_cont library now relies
on the pthreads library (or whatever threads library is used by std on
the system). Make sure this library gets added to vtkm_cont.
Apparently when future::get returns, it is not the case that all
resources of the future are cleaned up (only that the calling function
has returned). Do not rely on this resource cleanup for the test to
pass.
The type for PortalType was declared before the class from which the
type came from. Normally this was not a big deal since the template was
resolved later, but nvcc seemed to have a problem with it.
For an ArrayHandleTransform with no inverse functor, it really
only supports read-only portals. Thus, the non-const portal
for the execution environment was set to some unused control
portal. That was causing problems with CUDA, so make the
non-const portal valid (although without a Set).
Because ArrayPortalToken does not have an IteratorType,
ArrayPortalToIterators assumed it had to wrap it in an
IteratorFromArrayPortal object. Now it uses PortalSupportsIterators
(from ArrayPortalHelpers.h) to determine whether the iterators are
there. This does work with ArrayPortalToken.
To get a portal to access ArrayHandle values in the control
environment, you now use the ReadPortal and WritePortal methods.
The portals returned are wrapped in an ArrayPortalToken object
so that the data between the portal and the ArrayHandle are
guaranteed to be consistent.
This fixes an issue where moving a Token object left the original Token
in an invalid state because the poiner to the internals was NULL. Rather
than allocate a new one, just make the Token work correctly if the
internals are NULL.
It is questionable whether there is a point to having a token object
when transfering a virtual object to a device (since there is a handle
object that is managing it anyway). Back out of passing the token all
the way down unless there is an actual need for that.
Marked the old versions of PrepareFor* that do not use tokens as
deprecated and moved all of the code to use the new versions that
require a token. This makes the scope of the execution object more
explicit so that it will be kept while in use and can potentially be
reclaimed afterward.
When a single `ArrayHandle` is given to multiple arguments of a worklet
dispatch, the `PrepareFor*` methods will be called multiple times with
the same token. If one of them is a `PrepareForInPlace` or
`PrepareForOutput`, then the two requests will deadlock. To prevent
this, allow the `PrepareFor*` to happen if the same token was used
previously.
The old version of ExecutionObject (that only takes a device) is still
supported, but you will get a deprecated warning if that is what is
defined.
Supporing this also included sending vtkm::cont::Token through the
vtkm::cont::arg::Transport mechanism, which was a change that propogated
through a lot of code.
Duplicated the new versions of PrepareFor* methods from the basic
ArrayHandle that take a token in addition to the other arguments. The
ArrayHandle attaches itself to the token and will not allow operaitons
that make the returned portal invalid until the token goes out of scope.
Later the old versions will be deprecated.
Added new versions of PrepareFor* methods that take a token in addition
to the other arguments. The ArrayHandle attaches itself to the token and
will not allow operations that make the returned portal invalid until
the token goes out of scope.
Later the old versions will be deprecated.
1f61c500e Remove non-atomic ops from BitField unit test.
5565848d9 Use a dynamic strategy for openmp 1D scheduling.
Acked-by: Kitware Robot <kwrobot@kitware.com>
Acked-by: Robert Maynard <robert.maynard@kitware.com>
Merge-request: !1925
b9516c116 Correct CellSetStructured compile failures
00235874d Suppress more warning types from thirdparty includes
a52af2d13 Correct double to float warning in CellAspectFrobeniusMetric
cf5ebfb16 Suppress warning about extension use, since all compilers support it
27739660b Add missing constructors/assignment operators
123f8b01a Mark virtual destructors as override where applicable
54118dfca Use noexcept instead of throw() as it was deprecated in c++11
Acked-by: Kitware Robot <kwrobot@kitware.com>
Acked-by: Kenneth Moreland <kmorel@sandia.gov>
Merge-request: !1943
Having a custom assignment operator means that the compiler
isn't required to generate the implicit copy constructor.
This makes sure they are constructed.
Also discovered that many C++ compilers have trouble giving warnings
for partial specialization of classes marked as deprecated. Fix
the problem by instead deprecating the items in the class.
When expanding variadic parameter packs in ArrayHandleDecorator
implementations, make sure that the types used are appropriate.
Since std::declval is used to test whether or not a method with
specific arguments exists, it is important that the reference types
are correct to ensure that the detection works as expected.
Portals are always passed to implementation functions as rvalue
references, and array handles are always passed as lvalue refs.
The convenience functions `ArrayPortalToIteratorBegin()` and
`ArrayPortalToIteratorEnd()` wouldn't detect specializations of
`ArrayPortalToIterators<PortalType>` since the specializations aren't
visible when the `Begin`/`End` functions are declared.
Since the CUDA iterators rely on a specialization, the convenience
functions would not compile on CUDA.
Now, instead of specializing `ArrayPortalToIterators` to provide custom
iterators for a particular portal, the portal may advertise custom
iterators by defining `IteratorType`, `GetIteratorBegin()`, and
`GetIteratorEnd()`. `ArrayPortalToIterators` will detect such portals
and automatically switch to using the specialized portals.
This eliminates the need for the specializations to be visible to the
convenience functions and allows them to be usable on CUDA.
There were issues with the particle advection code where a small number
of work-heavy task invocations were needed. Since we were enforcing a
minimum of 1024 invocations per thread, this effectively serialized
scheduling.
Now the scheduler dynamically adjusts for small thread launches,
allowing finer scheduling.
4659d69c7 Remove some commented out code
aec75ab1a Suppress CUDA warning about device calling host
851864d0b Work around with Visual Studio 2015 issue
452a2e1c9 Suppress warnings about CUDA host/device mismatch
4fdefe9f1 Suppress some deprecated warnings in visual studio
5cfc14482 Implement old ListTag features with new ListTag implementations
d5fe4046c Remove instances of ListTag in favor of List
92db37623 Convert uses of ListTagBase to List
...
Acked-by: Kitware Robot <kwrobot@kitware.com>
Acked-by: Robert Maynard <robert.maynard@kitware.com>
Merge-request: !1918
The destructors of some control side objects (such as CellSet and
ArrayHandle) are defined. These destructors are obviously only compiled
for the control environment (i.e. for CUDA only for the host). However,
not all of the subclasses implemented their own destructors. In CUDA,
when a default destructor is used, it is compiled for both host and
device. This caused a problem as the superclass's destructor was only
compiled for the host and therefore caused a warning.
Fixed the problem by defining an empty destructor to any subclasses that
needed one.
It's weird that I ran into this problem while chaning the List TMP
class, but the solution seems fine.
A new header named TypeList.h and the type lists have been redefined in
this new file. All the types have been renamed from `TypeListTag*` to
`TypeList*`. TypeListTag.h has been gutted to provide deprecated
versions of the old type list names.
There were also some other type lists that were changed from using the
old `ListTagBase` to the new `List`.
The newer List operations should still work on the old ListTags, so make
those changes first to ensure that everything still works as expected if
given an old ListTag.
Next step is to deprecate ListTagBase itself and move all the lists to
the new types.
When it was originally created, it was assumed that the ArrayHandle
class would be used by a single thread. As the use of VTK-m expands,
that is no longer a safe assumption. To ensure that operations on
ArrayHandle happen correctly, add a mutex to the Internals of
ArrayHandle and require all operations on the Internals lock that mutex.
There is some behavior of GCC compilers before GCC 9.0 that is
incompatible with the specification of OpenMP 4.0. The workaround was
using the workaround any time a GCC compiler >= 9.0 was used. The proper
behavior is to only use the workaround when the GCC compiler is being
used and the version of the compiler is less than 9.0.
Also, switch to using VTKM_GCC to check for the GCC compiler instead of
__GNUC__. The problem with using __GNUC__ is that many other compilers
pretend to be GCC by defining this macro, but in cases like compiler
workarounds it is not accurate.
Previously when ApplyPolicyFieldOfType was used in cases where not all
of the types matched not all of the storages, those invalid arrays were
replaced with an ArrayHandleDiscard. This unnecessarily increased the
length of the type names used for the resulting ArrayHandleMultiplexer.
Now, ListTagRemoveIf is used to remove any invalid arrays so that the
resulting ArrayHandleMultiplexer only includes the valid arrays.