Commit Graph

412 Commits

Author SHA1 Message Date
Sujin Philip
452f61e290 Add Kokkos backend 2020-08-12 13:55:24 -04:00
Kenneth Moreland
18b5be92d6 Fix issue with CUDA and ArrayHandleMultiplexer
When you try to call the `Reduce` operation in the CUDA device adapter
with a sufficently complex interator type, you get a compile error
that says `error: cannot pass an argument with a user-provided
copy-constructor to a device-side kernel launch`.

This appears to be a bug in either nvcc or Thrust. I believe it is
related to the following reported issues:

* https://github.com/thrust/thrust/issues/928
* https://github.com/thrust/thrust/issues/1044

Work around this problem by making a special condition for calling
`Reduce` with an `ArrayHandleMultiplexer` that calls the generic
algorithm in `DeviceAdapterAlgorithmGeneral` instead of the algorithm in
Thrust.
2020-07-06 13:51:36 -06:00
Kenneth Moreland
a47fd42bc1 Pin user provided memory in ArrayHandle
Often when a user gives memory to an `ArrayHandle`, she wants data to be
written into the memory given to be used elsewhere. Previously, the
`Buffer` objects would delete the given buffer as soon as a write buffer
was created elsewhere. That was a problem if a user wants VTK-m to write
results right into a given buffer.

Instead, when a user provides memory, "pin" that memory so that the
`ArrayHandle` never deletes it.
2020-06-25 14:02:46 -06:00
Kenneth Moreland
56bec1dd7b Replace basic ArrayHandle implementation to use Buffers
This encapsulates a lot of the required memory management into the
Buffer object and related code.

Many now unneeded classes were deleted.
2020-06-25 14:02:26 -06:00
Kenneth Moreland
8f7b0d18be Add Buffer class
The buffer class encapsulates the movement of raw C arrays between
host and devices.

The `Buffer` class itself is not associated with any device. Instead,
`Buffer` is used in conjunction with a new templated class named
`DeviceAdapterMemoryManager` that can allocate data on a given
device and transfer data as necessary. `DeviceAdapterMemoryManager`
will eventually replace the more complicated device adapter classes
that manage data on a device.

The code in `DeviceAdapterMemoryManager` is actually enclosed in
virtual methods. This allows us to limit the number of classes that
need to be compiled for a device. Rather, the implementation of
`DeviceAdapterMemoryManager` is compiled once with whatever compiler
is necessary, and then the `RuntimeDeviceInformation` is used to
get the correct object instance.
2020-06-25 14:01:39 -06:00
Kenneth Moreland
4f9fa08fa1 Remove ArrayHandleStreaming capabilities
The `ArrayHandleStreaming` class stems from an old research project
experimenting with bringing data from an `ArrayHandle` in parts and
overlapping device transfer and execution. It works, but only in very
limited contexts. Thus, it is not actually used today. Plus, the feature
requires global indexing to be permutated throughout the worklet
dispatching classes of VTK-m for no further reason.

Because it is not really used, there are other more promising approaches
on the horizon, and it makes further scheduling improvements difficult,
we are removing this functionality.
2020-03-24 15:01:56 -06:00
Robert Maynard
8377806778 Merge topic 'introduce_mapfield_3d_scheduling'
1f1688483 Initial infrastructure to allow WorkletMapField to have 3D scheduling

Acked-by: Kitware Robot <kwrobot@kitware.com>
Acked-by: Kenneth Moreland <kmorel@sandia.gov>
Merge-request: !1938
2020-02-27 08:02:52 -05:00
Kenneth Moreland
ec34cb56c4 Use new ways to get array portal in control environment
Also fix deadlocks that occur when portals are not destroyed
in time.
2020-02-26 13:10:46 -07:00
Robert Maynard
1f1688483e Initial infrastructure to allow WorkletMapField to have 3D scheduling 2020-02-25 15:23:41 -05:00
Kenneth Moreland
3671cbe168 Fix token issues with CUDA 2020-02-25 09:39:30 -07:00
Kenneth Moreland
ad0a53af71 Convert execution preparation to use tokens
Marked the old versions of PrepareFor* that do not use tokens as
deprecated and moved all of the code to use the new versions that
require a token. This makes the scope of the execution object more
explicit so that it will be kept while in use and can potentially be
reclaimed afterward.
2020-02-25 09:39:19 -07:00
Kenneth Moreland
76ce9c87f0 Support using Token calling PrepareForExecution in ExecutionObject
The old version of ExecutionObject (that only takes a device) is still
supported, but you will get a deprecated warning if that is what is
defined.

Supporing this also included sending vtkm::cont::Token through the
vtkm::cont::arg::Transport mechanism, which was a change that propogated
through a lot of code.
2020-02-25 07:41:39 -07:00
Allison Vacanti
46b7155bdb Add 64-bit CUDA atomic store. 2020-01-08 10:58:51 -05:00
Allison Vacanti
539f6e5ad7 Port benchmarking framework to Google Benchmark. 2020-01-08 10:58:51 -05:00
Allison Vacanti
44c4f0838f Add vtkm/Algorithms.h header with device-friendly binary search algorithms. 2019-12-20 12:35:10 -05:00
Allison Vacanti
813f5a422f Fixup custom portal iterator logic.
The convenience functions `ArrayPortalToIteratorBegin()` and
`ArrayPortalToIteratorEnd()` wouldn't detect specializations of
`ArrayPortalToIterators<PortalType>` since the specializations aren't
visible when the `Begin`/`End` functions are declared.

Since the CUDA iterators rely on a specialization, the convenience
functions would not compile on CUDA.

Now, instead of specializing `ArrayPortalToIterators` to provide custom
iterators for a particular portal, the portal may advertise custom
iterators by defining `IteratorType`, `GetIteratorBegin()`, and
`GetIteratorEnd()`. `ArrayPortalToIterators` will detect such portals
and automatically switch to using the specialized portals.

This eliminates the need for the specializations to be visible to the
convenience functions and allows them to be usable on CUDA.
2019-12-17 15:39:51 -05:00
Kenneth Moreland
92db376236 Convert uses of ListTagBase to List 2019-12-06 15:37:46 -07:00
Kenneth Moreland
5ab0b5bb1d Access ArrayHandle internals in a critical section
Repeat the changes of the previous commit with the specialized
ArrayHandle for basic storage.
2019-11-20 14:42:58 -07:00
Robert Maynard
5c56ff945f Label tests which exercise a given Device Adapter
This allows developers an easy way to run all OpenMP tests
2019-09-13 15:52:40 -04:00
Allison Vacanti
b9affb7edc Disable copy for RAII helper. 2019-09-09 17:59:38 -04:00
Allison Vacanti
ea0bbfeefc Increase CUDA stack size for ParticleAdvection worklets.
Sometimes the CUDA runtime would not allocate sufficient stack
space for the particle advection code to run. This issue was exposed by
!1737 -- for some reason, once those changes to unrelated filters/worklets
are added to VTK, CUDA allocates less stack and the following tests would
fail:

UnitTestLagrangianFilterCUDA
UnitTestLagrangianStructuresFilterCUDA
UnitTestStreamlineFilterCUDA
UnitTestStreamSurfaceFilterCUDA

These were fixed by increasing the stack size in the particle advection
worklet Run(...) methods.

An RAII helper has been added that will restore the previous stack size
in case an exception is thrown, and the KDTree code has been updated
to use this helper when it adjusts the CUDA stack allocation.
2019-09-09 16:06:23 -04:00
Allison Vacanti
884616788a Simplify and extend AtomicArray implementation.
- Use AtomicInterface to implement device-specific atomic operations.
- Remove DeviceAdapterAtomicArrayImplementations.
- Extend supported atomic types to include unsigned 32/64-bit ints.
- Add a static_assert to check that AtomicArray type is supported.
- Add documentation for AtomicArrayExecutionObject, including a CAS
  example.
- Add a `T Get(idx)` method to AtomicArrayExecutionObject that does
  an atomic load, and update existing CAS usage to use this instead
  of `Add(idx, 0)`.
2019-08-23 15:40:37 -04:00
Allison Vacanti
0e728c8000 Update atomic interfaces to support Add/CAS for UInt32/64.
These will be used for the AtomicArray implementation.
2019-08-23 15:40:37 -04:00
Allison Vacanti
112024dae2 Fix CUDA shfl usage.
There was a bug in the implementations of CountSetBits and
BitFieldToUnorderedSet.
2019-08-01 10:57:57 -04:00
Kenneth Moreland
5e23853521 Create ArrayHandleMultiplexer 2019-07-22 08:36:28 -06:00
Mark Kim
8dbb1c4de3 Merge branch 'master' of gitlab.kitware.com:m-kim/vtk-m into advdatamodel 2019-06-26 19:37:47 -04:00
Allison Vacanti
920ef9b3b9 Merge topic 'bit_algorithms'
f370857c1 Add CountSetBits and Fill device algorithms.

Acked-by: Kitware Robot <kwrobot@kitware.com>
Acked-by: Robert Maynard <robert.maynard@kitware.com>
Merge-request: !1696
2019-06-25 15:42:16 -04:00
Allison Vacanti
f370857c15 Add CountSetBits and Fill device algorithms. 2019-06-25 11:30:39 -04:00
Mark Kim
699b57191f Merge branch 'master' of gitlab.kitware.com:m-kim/vtk-m into advdatamodel 2019-06-25 10:36:47 -04:00
Mark Kim
cffd3873fc Merge branch 'advdatamodel' 2019-06-20 22:20:44 -04:00
Mark Kim
6e1d3a84f0 First Extrude commit.
how did any of this work?

match other CellSet file layouts.

???

compile in CUDA.

unit tests.

also only serial.

make error message accurate

Well, this compiles and works now.

Did it ever?

use CellShapeTagGeneric

UnitTest matches previous changes.

whoops

Fix linking problems.

Need the same interface

as other ThreadIndices.

add filter test

okay, let's try duplicating CellSetStructure.

okay

inching...

change to wedge in CellSetListTag

Means changing these to support it.

switch back to wedge from generic

compiles and runs

remove ExtrudedType

need vtkm_worklet

vtkm_worklet needs to be included

fix segment count for wedge specialization

need to actually save the index

for the other constructor.

specialize on Explicit

clean up warning

angled brackets not quotes.

formatting
2019-06-20 22:17:24 -04:00
Robert Maynard
1ea386222e cuda copy functions don't launch on length zero arrays 2019-06-20 16:54:23 -04:00
Robert Maynard
8aaf922aa4 Introduce a log level that details kernel launch parameters 2019-06-18 15:01:07 -04:00
Kenneth Moreland
f11702ae92 Fix for rogue definition of PASCAL macro 2019-06-05 10:09:49 -06:00
Robert Maynard
4020f51988 RuntimeDeviceTracker can't be copied and is only accessible via reference.
As the RuntimeDeviceTracker is a per thread construct we now make
it explicit that you can only get a reference to the per-thread
version and can't copy it.
2019-05-20 11:43:05 -04:00
Robert Maynard
d1ce4a0bca Fix the default launch sizes for Tesla hardware.
The 8x8x8 is a better launch strategy for most VTK-m kernels.
The current problem is that a couple of VTK-m kernels use a
high number of registers and this number of threads combines to
require too many registers.

What we should do in the longer run is have more controls over
kernel launches on a per kernel basis. This will require VTK-m
to extract the number of registers being used by each kernel
2019-05-06 16:12:15 -04:00
Robert Maynard
770912f991 Correct compiler issues found with GCC 4.8.5 + CUDA 9.2 on summit 2019-05-02 10:27:48 -04:00
Robert Maynard
065d117838 Testing Device Adapter now uses ArrayHandle for all device transfers
The consistent API for control to execution memory transfers is
the ArrayHandle class. Previously the tests would verify memory
transfer by calling the ArrayManagerExecution class directly. This
is problematic as the class isn't used by ArrayHandle<T, StorageBasic>.
2019-04-30 13:50:08 -04:00
Robert Maynard
63c931e639 Correct location of ThrustPatches which clang formatter moved 2019-04-23 15:02:58 -04:00
Robert Maynard
ff687016ee For VTK-m libs all includes of DeviceAdapterTagCuda happen from cuda files
It is very easy to cause ODR violations with DeviceAdapterTagCuda.
If you include that header from a C++ file and a CUDA file inside
the same program we an ODR violation. The reasons is that the C++
versions will say the tag is invalid, and the CUDA will say the
tag is valid.

The solution to this is that any compilation unit that includes
DeviceAdapterTagCuda from a version of VTK-m that has CUDA enabled
must be invoked by the cuda compiler.
2019-04-22 10:39:54 -04:00
nadavi
fbcea82e78 conslidate the license statement 2019-04-17 10:57:13 -06:00
Robert Maynard
6c5c197a37 Merge topic 'support_cuda_scheduling_parameters_via_runtime'
047b64651 VTK-m now provides better scheduling parameters controls

Acked-by: Kitware Robot <kwrobot@kitware.com>
Acked-by: Kenneth Moreland <kmorel@sandia.gov>
Merge-request: !1643
2019-04-17 10:04:19 -04:00
Robert Maynard
047b646517 VTK-m now provides better scheduling parameters controls
VTK-m now offers a more GPU aware set of defaults for kernel scheduling.
When VTK-m first launches a kernel we do system introspection and determine
what GPU's are on the machine and than match this information to a preset
table of values. The implementation is designed in a way that allows for
VTK-m to offer both specific presets for a given GPU ( V100 ) or for
an entire generation of cards ( Pascal ).

Currently VTK-m offers preset tables for the following GPU's:
- Tesla V100
- Tesla P100

If the hardware doesn't match a specific GPU card we than try to find the
nearest know hardware generation and use those defaults. Currently we offer
defaults for
- Older than Pascal Hardware
- Pascal Hardware
- Volta+ Hardware

Some users have workloads that don't align with the defaults provided by
VTK-m. When that is the cause, it is possible to override the defaults
by binding a custom function to `vtkm::cont::cuda::InitScheduleParameters`.
As shown below:

```cpp
  ScheduleParameters CustomScheduleValues(char const* name,
                                          int major,
                                          int minor,
                                          int multiProcessorCount,
                                          int maxThreadsPerMultiProcessor,
                                          int maxThreadsPerBlock)
  {

    ScheduleParameters params  {
        64 * multiProcessorCount,  //1d blocks
        64,                        //1d threads per block
        64 * multiProcessorCount,  //2d blocks
        { 8, 8, 1 },               //2d threads per block
        64 * multiProcessorCount,  //3d blocks
        { 4, 4, 4 } };             //3d threads per block
    return params;
  }
  vtkm::cont::cuda::InitScheduleParameters(&CustomScheduleValues);
```
2019-04-17 08:32:16 -04:00
Robert Maynard
ff30684c8e Removes the default device macros from VTK-m
Fixes #116
2019-04-15 08:15:36 -04:00
Robert Maynard
a5dbe1ece3 Merge topic 'bitfields'
661fb64de AtomicInterfaceControl functions are marked with VTKM_SUPPRESS_EXEC_WARNINGS
0c70f9b9a Add BitFieldIn/Out/InOut worklet signature tags.
a66510e81 Add ArrayHandleBitField, a boolean-valued AH backed by a BitField.
56cc5c3d3 Add support for BitFields.
d01b97382 Allow VTKM_SUPPRESS_EXEC_WARNINGS to be used inside macros.
2f2ca9370 Add bit operations FindFirstSetBit and CountSetBits to Math.h.

Acked-by: Kitware Robot <kwrobot@kitware.com>
Merge-request: !1629
2019-04-11 12:32:03 -04:00
Allison Vacanti
56cc5c3d3a Add support for BitFields.
BitFields are:
- Stored in memory using a contiguous buffer of bits.
- Accessible via portals, a la ArrayHandle.
- Portals operate on individual bits or words.
- Operations may be atomic for safe use from concurrent kernels.

The new BitFieldToUnorderedSet device algorithm produces an ArrayHandle
containing the indices of all set bits, in no particular order.

The new AtomicInterface classes provide an abstraction into bitwise
atomic operations across control and execution environments and are used
to implement the BitPortals.
2019-04-11 08:27:17 -04:00
Robert Maynard
89ec4aae2f Reduction on CUDA handles different input and output types better
When reducing an input type that differs from the output type
you need to write a custom binary operator that also implements
how to do the unary transformation.
2019-04-10 14:44:44 -04:00
Robert Maynard
1d20ae4f7b Move DeviceAdapterTag to vtkm/cont 2019-04-04 11:58:51 -04:00
Robert Maynard
7f612502ac Merge topic 'remove_unneeded_cont_exec_markup'
f1056affa Move select functions to host only to remove host/device suppressions
4f2156dfa Thrust detail::aligned_reinterpret_cast doesn't warn now
f4840618c Make sure ThrustPatches is included before thrust.
b2bbd66e6 Merge branch 'upstream-taotuple' into update_taoo
4ec6fc812 taotuple 2019-04-03 (8e70fa8a)

Acked-by: Kitware Robot <kwrobot@kitware.com>
Acked-by: Allison Vacanti <allison.vacanti@kitware.com>
Merge-request: !1607
2019-04-04 09:32:04 -04:00
Sujin Philip
c6bead8388 Rename CellLocatorTwoLevelUniformGrid to CellLocatorUniformBins
Also make it a concrete sub-class of vtkm::cont::CellLocator
Fixes issue #251
2019-04-03 10:21:56 -04:00
Robert Maynard
f4840618cf Make sure ThrustPatches is included before thrust. 2019-04-03 08:51:05 -04:00
Robert Maynard
b9e0e541b8 VTK-m once again uses consistent include style 2019-03-28 14:12:08 -04:00
Robert Maynard
256e0c3c11 Merge topic 'rename_to_GetRuntimeDeviceTracker'
ae11e115a RuntimeDeviceTracker: Remove `Global` from names

Acked-by: Kitware Robot <kwrobot@kitware.com>
Acked-by: Kenneth Moreland <kmorel@sandia.gov>
Merge-request: !1592
2019-03-24 08:17:02 -04:00
Robert Maynard
ae11e115a0 RuntimeDeviceTracker: Remove Global from names 2019-03-22 08:53:26 -07:00
Robert Maynard
6cdf6cb672 Less aggressive defaults for VTK-m compared to summit.
Since we don't have per system checks currently built into
vtk-m we can't use the tuned values for Summit, as they
don't run on all our hardware.
2019-03-20 09:30:34 -07:00
Robert Maynard
3879479185 Improve VTK-m cuda scheduling based on Summit scaling study
When benchmarking the VTK-m algorithms on Summit I discovered
that our scheduling choices aren't optimal for the hardware.

This is a short term fix where we select good numbers for
Summit, and in the future make the defaults controllable
by the calling programming and/or environment variables.

Performance numbers can be found at:
  https://gitlab.kitware.com/snippets/755
2019-03-20 09:30:34 -07:00
Kenneth Moreland
4d9ce24888 Synchronize CUDA timer when stopping it
Previously, when Stop was called on a Cuda timer, it would record a stop
event but it would not synchronize it at that time. Instead, the
synchronize was only called when GetElapsedTime was called. The problem
is that the time of the event is only marked when synchronize is called.
Thus, if the event completed before GetElapsedTime was called, it would
record the time from when the event acutally happened to the time when
GetElapsedTime was called as part of the elapsed time, which is
incorrect.

Fix the problem by synchronizing when Stop is called. Although this
makes the Timer more invasive, generally using the Timer can cause
synchronization to happen. This behavior is consistent with the Timer
implementation for other devices.
2019-02-28 15:08:32 -07:00
Kenneth Moreland
85265a9c84 Add const correctness to Timer
It should be possible to query a vtkm::cont::Timer without modifying it.
As such, its query functions (such as Stopped and GetElapsedTime) should
be const.
2019-02-28 15:08:16 -07:00
Haocheng LIU
0696ae135e Merge topic 'asynchronize-timer'
415252c66 Introduce asynchronous and device independent timer

Acked-by: Kitware Robot <kwrobot@kitware.com>
Acked-by: Haocheng LIU <haocheng.liu@kitware.com>
Acked-by: Robert Maynard <robert.maynard@kitware.com>
Merge-request: !1530
2019-02-05 12:02:59 -05:00
Haocheng LIU
415252c662 Introduce asynchronous and device independent timer
The timer class now is asynchronous and device independent. it's using an
similiar API as vtkOpenGLRenderTimer with Start(), Stop(), Reset(), Ready(),
and GetElapsedTime() function. For convenience and backward compability, Each
Start() function call will call Reset() internally and each GetElapsedTime()
function call will call Stop() function if it hasn't been called yet for keeping
backward compatibility purpose.

Bascially it can be used in two modes:

* Create a Timer without any device info. vtkm::cont::Timer time;

  * It would enable timers for all enabled devices on the machine. Users can get a
specific elapsed time by passing a device id into the GetElapsedtime function.
If no device is provided, it would pick the maximum of all timer results - the
logic behind this decision is that if cuda is disabled, openmp, serial and tbb
roughly give the same results; if cuda is enabled it's safe to return the
maximum elapsed time since users are more interested in the device execution
time rather than the kernal launch time. The Ready function can be handy here
to query the status of the timer.

* Create a Timer with a device id. vtkm::cont::Timer time((vtkm::cont::DeviceAdapterTagCuda()));

  * It works as the old timer that times for a specific device id.
2019-02-05 12:01:56 -05:00
Robert Maynard
d0a70946b8 Simplify the DeviceAdapterRuntimeDetectorCuda to not do a kernel launch.
The kernel launch component of the runtime device adapter is fairly
pointless. If the hardware supports CUDA we should expect that
VTK-m has the correct kernel versions.

Plus in the original version if the CUDA device was being used
and the kernel launch returns cudaErrorDevicesUnavailable it
was never possible to restore CUDA support. Now what happens
is that the runtime tracker is marked as failed, but the
calling code can always go back and trying the device again.
2019-02-04 13:27:20 -05:00
Robert Maynard
5508d17c31 Merge topic 'correct_broken_install'
24e71d251 VTK-m yet again has properly installed headers.

Acked-by: Kitware Robot <kwrobot@kitware.com>
Acked-by: Kenneth Moreland <kmorel@sandia.gov>
Merge-request: !1525
2019-01-24 14:59:41 -05:00
Robert Maynard
24e71d251b VTK-m yet again has properly installed headers.
Fixes the install issues mentioned in #342
2019-01-24 14:26:40 -05:00
Allison Vacanti
03fc7b66d0 Add VTKM_CUDA_DEVICE_PASS preprocessing definition.
This is only set while compiling device code, and is useful
for code that needs different implementations on devices (e.g.
they call CUDA device intrinsics, etc).
2019-01-24 11:23:45 -05:00
Robert Maynard
d6f66d17a3 Testing run methods now take argc/argv to init logging/runtime device
`vtkm::cont::testing` now initializes with logging enabled and support
for device being passed on the command line, `vtkm::testing` only
enables logging.
2019-01-17 13:16:27 -06:00
Robert Maynard
4ec5bae02d Remove VTK-m TestBuild infrastructure
The purpose of the TestBuild infrastructure was to confirm that
VTK-m didn't have any lexical issues when it was a pure header
only project. As we now move to have more compiled components
the need for this form of testing is mitigated. Combined
with the issue of TestBuilds causing MSVC issues, we should
just remove this infrastructure.
2019-01-16 10:04:33 -06:00
Kenneth Moreland
2e426ad547 Run the update-control-signature-tags.sh script 2019-01-11 12:23:10 -07:00
Robert Maynard
628dce822e Merge topic 'require_cmake38'
f1e1a524e Require CMake 3.8 to build VTK-m.

Acked-by: Kitware Robot <kwrobot@kitware.com>
Acked-by: Kenneth Moreland <kmorel@sandia.gov>
Merge-request: !1514
2019-01-09 17:02:52 -05:00
Abhishek Yenpure
afd0409189 Merge topic 'code_sprint_locator_fixes'
9b56d41fe Fixing Rectilinear Grid Cell Locator
10e9d47dc Removing std::out print statement from test
34c7b57d8 Merge branch 'code_sprint_locator_fixes' of gitlab.kitware.com:ayenpure/vtk-m into code_sprint_locator_fixes
62ee1a2c8 Updates to the Cell Locators
7eb0de5b7 Merge branch 'code_sprint_locator_fixes' of gitlab.kitware.com:ayenpure/vtk-m into code_sprint_locator_fixes
866b0798d Resolving type warnings
c062f2e26 Merge branch 'master' of https://gitlab.kitware.com/vtk/vtk-m into code_sprint_locator_fixes
797c83891 Adding default constructor and removing wrong comment
...

Acked-by: Kitware Robot <kwrobot@kitware.com>
Acked-by: Kenneth Moreland <kmorel@sandia.gov>
Merge-request: !1395
2019-01-09 16:23:17 -05:00
Robert Maynard
f1e1a524e9 Require CMake 3.8 to build VTK-m. 2019-01-09 16:01:22 -05:00
ayenpure
62ee1a2c8a Updates to the Cell Locators
- Adding updates to uniform grid cell locator
  - adding OpenMP test, updating copyrights
- Adding rectilinear grid cell locator
  - adding unit tests for serial, tbb, OpenMP, and cuda
- Updating CMakeLists to honor the alphabetical ordering
2019-01-06 17:18:23 -08:00
Robert Maynard
718caaaeac CudaAllocator allows managed memory to be explicitly disabled 2018-12-28 11:30:29 -05:00
Robert Maynard
90bb23de6b CudaAllocator::Initialize correctly uses managed memory when possible
Previously the logic would always think managed memory wasn't supported
2018-12-20 17:21:55 -05:00
ayenpure
c062f2e26c Merge branch 'master' of https://gitlab.kitware.com/vtk/vtk-m into code_sprint_locator_fixes 2018-12-03 07:44:31 -08:00
Allison Vacanti
16c4dde2ee Merge topic 'cuda10_warning'
0e105eae6 cudaPointerAttributes::isManaged deprecated in CUDA 10.

Acked-by: Kitware Robot <kwrobot@kitware.com>
Acked-by: Robert Maynard <robert.maynard@kitware.com>
Merge-request: !1430
2018-10-10 15:05:57 -04:00
Allison Vacanti
0e105eae6d cudaPointerAttributes::isManaged deprecated in CUDA 10.
Update code to support both the old and new way of checking this.
2018-10-10 13:51:56 -04:00
Allison Vacanti
bd337854ec Initial implementation of general logging.
Addresses #291.
2018-10-02 11:37:55 -04:00
Kenneth Moreland
98a0a20feb Allow ArrayHandleTransform to work with ExecObject
This change allows you to set a subclass of
vtkm::cont::ExecutionObjectBase as a functor
used in ArrayHandleTransform. This latter class will then detect that
the functor is an ExecObject and will call PrepareForExecution with the
appropriate device to get the actual Functor object.

This change allows you to use virtual objects and other device dependent
objects as functors for ArrayHandleTransform without knowing a priori
what device the portal will be used on.
2018-09-05 13:11:04 -06:00
ayenpure
22ca8bce15 Fixing unit test 2018-08-30 10:19:00 -07:00
ayenpure
42e2bb7f9a Updating files with copyrights 2018-08-29 19:46:49 -07:00
ayenpure
594d1934d4 Adding CellLocatorUniformGrid
- Adding a cell locator to locate points in a uniform grid
- Adding unit tests for the new cell locator
2018-08-29 19:30:07 -07:00
Kenneth Moreland
d879188de0 Make DispatcherBase invoke using a TryExecute
Rather than force all dispatchers to be templated on a device adapter,
instead use a TryExecute internally within the invoke to select a device
adapter.

Because this removes the need to declare a device when invoking a
worklet, this commit also removes the need to declare a device in
several other areas of the code.
2018-08-29 19:18:54 -07:00
Allison Vacanti
024a75821d Make DeviceAdapterId constructor protected.
This forces users to use a defined tag, since they shouldn't need
to create their own.
2018-08-24 16:38:08 -04:00
Haocheng LIU
7d22132253 Merge topic 'allow-disabling/enabling-cuda-managed-memory'
e34301eca Allow disabling/enabling of CUDA managed memory via an env variable

Acked-by: Kitware Robot <kwrobot@kitware.com>
Acked-by: Robert Maynard <robert.maynard@kitware.com>
Merge-request: !1359
2018-08-17 13:14:02 -04:00
Haocheng LIU
e34301eca8 Allow disabling/enabling of CUDA managed memory via an env variable
By setting the environment variable "VTKM_MANAGEDMEMO_DISABLED" to be 1,
users are able to disable CUDA managed memory even though the hardware is
capable of doing so.
2018-08-17 11:10:15 -04:00
Sujin Philip
1212081de1 Support deferred freeing for CUDA memory
Calls to 'cudaFree' block execution on all cuda devices. Reduce the number of
times this happens by having a deferred free mechanism that frees a pool
of pointers together when a threshold is reached.

Especially helpful during virtual object transfers that requires a few small
allocations and frees.
2018-08-16 12:05:36 -04:00
Allison Vacanti
f6da092146 Use CUDA_ARCH instead of CUDACC to guard device-only code.
CUDACC is defined when compiling host code under nvcc, while
CUDA_ARCH is only defined for host code.
2018-08-09 11:57:05 -04:00
Allison Vacanti
2c079b96dd Make AtomicArrays work on CUDA 8.
CUDA 8.0 is erroring out in the cuda AtomicArray implementation:

https://open.cdash.org/viewBuildError.php?buildid=5489156

This patch fixes the error. See comments in source for more info.
2018-08-08 15:26:32 -04:00
Haocheng LIU
ce9cd8072a Use std::call_once to construct singeltons
By using `call_once` from C++11, we can simplify the logic in code
where we are querying same value variables from multiple threads.
2018-08-06 16:36:03 -04:00
Sujin Philip
259d670ab5 Merge topic 'cuda-per-thread-streams-2'
06dee259f Minimize cuda synchronizations

Acked-by: Kitware Robot <kwrobot@kitware.com>
Acked-by: Kenneth Moreland <kmorel@sandia.gov>
Merge-request: !1288
2018-07-25 15:07:39 -04:00
Robert Maynard
e031e64967 ExecutionArrayInterfaceBasic<T> explicitly construct DeviceAdapterId objects
Rather than implicitly presume the `VTKM_DEVICE_ADAPTER_` macros can
convert to DeviceAdapterId.
2018-07-25 12:04:30 -04:00
Robert Maynard
86b9ab9969 Refactor ExecutionArrayInterfaceBasic to use inheriting constructors 2018-07-25 12:03:48 -04:00
Robert Maynard
4240111dd8 Make sure VirtualObjectHandle tests include RuntimeDeviceTracker 2018-07-18 10:37:46 -04:00
Robert Maynard
8077b031a8 Merge topic 'uncomment_cuda_range_test'
1e478bbe6 Re-enable UnitTestCudaComputeRange

Acked-by: Kitware Robot <kwrobot@kitware.com>
Merge-request: !1321
2018-07-17 13:28:05 -04:00
Robert Maynard
1e478bbe63 Re-enable UnitTestCudaComputeRange 2018-07-17 11:43:19 -04:00
Robert Maynard
bf49575e00 Remove unneeded typeinfo includes 2018-07-17 11:41:53 -04:00
Robert Maynard
64958b014b VTK-m now supports passing pointers when invoking worklets.
The original design of invoke and the transport infrastructure
relied on the implementation behavior of vtkm::cont types
such as ArrayHandle that used an internal shared_ptr to managed
state. This allowed passing by value instead of passing by
non-const ref when needing to transfer information to the device.

As VTK-m adds support for classes that use virtuals the ability
to pass by base pointer type allows for us to invoke worklets
using a base type without the risk of type slicing.

Additional by moving over to a non-const ref Invocation we
can update all transports that have 'output' to now be
by ref and therefore support types that can't be copied while
being 'more' correct.
2018-07-06 14:27:36 -04:00
Robert Maynard
9238cedcab Merge topic 'ice_nvcc_on_renar'
5ced0da8f Try to ice the ubuntu 17.10 + cuda 9.1 compiler

Acked-by: Kitware Robot <kwrobot@kitware.com>
Acked-by: Kenneth Moreland <kmorel@sandia.gov>
Merge-request: !1305
2018-07-05 11:36:16 -04:00
Robert Maynard
5ced0da8f5 Try to ice the ubuntu 17.10 + cuda 9.1 compiler 2018-07-05 09:14:52 -04:00
Robert Maynard
e5090e1289 Make sure the PointLocatorUniform uses the correct runtime device 2018-07-03 17:42:57 -04:00