Commit Graph

106 Commits

Author SHA1 Message Date
Kenneth Moreland
1a538ca196 Merge branch 'scatter-worklets' into 'master'
Scatter in worklets

Add the functionality to perform a scatter operation from input to output in a worklet invocation. This allows you to, for example, specify a variable amount of outputs generated for each input.

See merge request !221
2015-11-11 13:09:47 -05:00
Robert Maynard
b3687c6f3c Workaround inclusive_scan issues in thrust 1.8.X for complex value types.
The original workaround for inclusive_scan bugs in thrust 1.8 only solved the
issue for basic arithmetic types such as int, float, double. Now we go one
step further and fix the problem for all types.

The solution is to provide a proper implementation of destructive_accumulate_n
and make sure it exists before any includes of thrust occur.
2015-11-09 17:14:30 -05:00
Kenneth Moreland
f7789f0ed7 Fix issue with const types in Thrust array management
Previously, there was a declaration ConstArrayPortalFromThrust<const T>
in ArrayManagerExecutionThrustDevice. This proved problematic because
values read from the array in the worklet were typed as const T rather
than simply T. Any Vec or Matrix built from that type would then fail
because they are not meant to work with a const value (which means they
have to be set on construction and never changed.

Instead, declare ConstArrayPortalFromThrust<T> and internally set all
the Thrust pointers to have type const T. Also declare other thrust
pointers used as method parameters to have const T rather than T. This
should work as conversion from T to const T should be fine, but not the
other way around.
2015-11-06 18:05:21 -07:00
Patricia Kroll Fasel - 090207
4757c0ae9e Merge branch 'master' of gitlab.kitware.com:Fasel/vtk-m into cell_to_point 2015-11-04 13:28:08 -07:00
Patricia Kroll Fasel - 090207
a5f1f823ae Set default device to cuda in unit test of DataSetExplicit to bypass
compiler errors in CellSetExplicit.
2015-11-04 10:16:44 -07:00
Patricia Kroll Fasel - 090207
480f0bd416 Merge branch 'master' of gitlab.kitware.com:Fasel/vtk-m into cell_to_point 2015-11-03 13:48:23 -07:00
Robert Maynard
1b30d6e6de Update Cuda so that SumExclusiveScan supports fancy iterators. 2015-11-03 13:28:07 -05:00
Robert Maynard
97550d5e2d Update Cuda so that UnaryPredictes work with fancy cuda array handles. 2015-11-03 13:28:07 -05:00
Patricia Kroll Fasel - 090207
02e16e7e25 Merge branch 'master' of gitlab.kitware.com:Fasel/vtk-m into cell_to_point 2015-11-03 11:15:26 -07:00
Sujin Philip
1b8fe17f1b Fix for several warnings 2015-11-03 09:11:38 -05:00
Sujin Philip
fd244c4142 Fix errors and warnings caused by recent changes to device adapter tag logic 2015-11-03 09:11:38 -05:00
Robert Maynard
71cf2a7d92 Fix return statement in DeviceAdapterAlgorithmThrust. 2015-11-02 16:46:02 -05:00
Patricia Kroll Fasel - 090207
d167c75596 Merge branch 'master' of gitlab.kitware.com:Fasel/vtk-m into cell_to_point 2015-11-02 11:09:07 -07:00
Patricia Kroll Fasel - 090207
ed0ecf284d Parallel CellToPoint initial code. 2015-10-30 13:59:36 -06:00
Robert Maynard
85d28667c2 Add a return statement to reduce to stop false positive warnings. 2015-10-30 14:55:13 -04:00
Kenneth Moreland
b861209a22 Fix nvcc warnings on MSVC
There is a strange nvcc warning in CUDA 7.5 that sometimes happens on MSVC
that causes it to emit a warning for an undefined method that is clearly
defined. The CUDA development team is aware of the problem and is going
to fix it, but these changes will work around the problem for now.

Thanks to Tom Fogal from NVIDIA for these fixes.
2015-10-21 08:33:15 -06:00
Robert Maynard
8de216c088 Propagate vtkm::Id3 scheduling down to the ThreadIndex classes.
This now allows for even more efficient construction of uniform point
coordinates when running under the 3d scheduler, since we don't need to go
from 3d index to flat index to 3d index, instead we stay in 3d index
2015-10-20 09:29:41 -04:00
T.J. Corona
b1665dcb32 Add an array handle for bare cuda device pointers.
Array handles for cuda device pointers have been implemented. The data for
these handles exists solely on the exec side (info such as length can be
queried from the cont side).
2015-10-09 12:41:33 -04:00
Robert Maynard
f38673f618 Replace ErrorControlOutOfMemory with ErrorControlBadAllocation. 2015-10-01 14:25:28 -04:00
Robert Maynard
056f69bf96 Remove unused variable and conversion warnings from cuda code. 2015-09-21 14:17:25 -04:00
Sujin Philip
1d2657f360 Make cuda DeviceAdapter valid only when using nvcc
Before it was valid even on a regular compiler, if cuda was available.
2015-09-18 12:21:11 -04:00
hschroot
20c1a04894 CopyInto function for ArrayHandles
ArrayHandles in DAX have a CopyInto function which allows the user to copy an array handle's data into a compatible STL type iterator. Originally this was fairly straight forward to implement since array handles in DAX are templated on the DeviceAdapterTag. In contrast, VTKm array handles use a polymorphic ArrayHandleExecutionManager under the hood allowing a single array handle to interface with multiple devices at runtime. To achieve this virtual functions are used. This makes implementing the CopyInto function difficult since it is templated on the IteratorType and virtual functions cannot be templated.

To work around this, I've implemented a concrete templated CopyInto function in the class derived from ArrayHandleExecutionManagerBase. In the ArrayHandle class, CopyInto dynamically casts the base class into the derived class, then calls the CopyInto function defined in the derived class.

The drawback to this approach is that, should the user define their own class that inherits from ArrayHandleExectionManagerBase, they are not forced to implement the CopyInto function, unlike the other virtual functions.
2015-09-17 14:26:19 -04:00
Robert Maynard
fd68521066 Always install all device headers even when device isn't enabled.
vtkm_declare_headers now is able to not test headers, by using the
TESTABLE keyword.
2015-09-17 09:28:21 -04:00
Robert Maynard
4d635d642b DeviceAdapter Tags now always exist, and contain if the device is valid.
Previously it was really hard to verify if a device adapter was valid. Since
you would have to check for the existence of the tag. Now the tag always
exists, but instead you query the traits of the DeviceAdapter to see if
it is a valid adapter.

This makes compiling with multiple backends alot easier.
2015-09-17 09:28:21 -04:00
Robert Maynard
6272fdcc54 Correct multiple signature compile issue. 2015-09-08 09:39:57 -04:00
Robert Maynard
aa7f5c34b9 Resolves Issue #42. Now all thrust API calls are in try/catch blocks. 2015-09-07 12:20:31 -04:00
Robert Maynard
72450e87f3 Make thrust use fast paths when doing sort and scan.
By introducing our own custom thrust execution policy we can make sure
to hit the fastest code paths in thrust for the sort operation. This makes
sure that for UInt32,Int32, and Float32 we use the radix sort from thrust
which offers a 2x to 3x speed improvement over the merge sort implementation.

Secondly by telling thrust that our BinaryOperators are commutative we
make sure that we get the fastest code paths when executing Inclusive
and Exclusive Scan

Benchmark 'Radix Sort on 1048576 random values vtkm::Int32' results:
  median = 0.0117049s
  median abs dev = 0.00324614s
  mean = 0.0167615s
  std dev = 0.00786269s
  min = 0.00845875s
  max = 0.0389063s
Benchmark 'Radix Sort on 1048576 random values vtkm::Float32' results:
  median = 0.0234463s
  median abs dev = 0.000317249s
  mean = 0.021452s
  std dev = 0.00470307s
  min = 0.011255s
  max = 0.0250643s
Benchmark 'Merge Sort on 1048576 random values vtkm::Int32' results:
  median = 0.0310486s
  median abs dev = 0.000182129s
  mean = 0.0286914s
  std dev = 0.00634102s
  min = 0.0116225s
  max = 0.0317379s
Benchmark 'Merge Sort on 1048576 random values vtkm::Float32' results:
  median = 0.0310617s
  median abs dev = 0.000193583s
  mean = 0.0295779s
  std dev = 0.00491531s
  min = 0.0147257s
  max = 0.032307s
2015-09-03 16:00:37 -04:00
Robert Maynard
8422108f28 Cuda copy from host to device can't use the cuda execution policy. 2015-09-02 09:53:00 -04:00
Robert Maynard
efc9f0c5cf All occurrences of thrust invocation uses an execution policy. 2015-09-01 19:32:49 -04:00
Sujin Philip
514ac54e59 Add custom operator and initial value support to ExclusiveScan 2015-08-28 09:56:04 -04:00
Robert Maynard
619103b202 Update cuda ScanExclusive to handle dereferencing device only arrays. 2015-08-26 11:51:02 -04:00
Robert Maynard
157d8efee4 Workaround thrust 1.8 inclusive scan issue.
Starting in thrust 1.8 the implementation of scan inclusive inside
thrust became highly optimized by using parallel task groups. This
new implementation has a bug that only exists when using custom
binary operators, large size arrays, release mode, and no
debugger or mem-checker attached.

While I have submitted the issue to thrust, we need to be able
to work around the existing issue. The solution I have chosen is
to mark all vtkm::exec::cuda::interal::WrappedBinaryOperators
as being commutative as far as thrust is concerened. To make
sure we don't get any unexpected behavior I have also had
to create WrappedBinaryPredicate so that we don't mark any
predicate as commutative.
2015-08-17 10:39:14 -04:00
Robert Maynard
ab59e34a2f Rename pragma header guard so it makes sense for tbb and thrust.
Boost is not the only thirdparty that we are supressing warnings for, so
make the name more generic.
2015-08-13 09:04:23 -04:00
Robert Maynard
8204db2f6a Use VTKM_BOOST_PRE_INCLUDE around thrust headers too. 2015-08-13 08:26:41 -04:00
Robert Maynard
bae6ff7f55 Merge branch 'introduce_binary_and_unary_operators' into 'master'
Introduce binary and unary operators

See merge request !94
2015-08-06 15:14:28 -04:00
Robert Maynard
d3fd571ef2 Add vtkm/UnaryPredicates header.
Currently includes the following predicates:
  - IsDefaultConstructor
  - NotDefaultConstructor
  - LogicalNot
2015-07-30 13:12:59 -04:00
Robert Maynard
d9fd702b1c Make detecting if we are cuda 3+ gpu running cuda 2 code faster.
The original implementing tried to run 2^31 kernels and detect a
launch failure to determine this use-case. The issue with this approach
is that on a cuda 3+ gpu, this would take multiple seconds and cause
the gpu to terminate the kernel when opengl was also loaded.
2015-07-28 17:04:24 -04:00
Robert Maynard
19aa6b8d62 Update the 3d scheduling benchmark code to use the new 1d scheduler 2015-07-23 16:31:41 -04:00
Robert Maynard
d0d11640ea Cuda now can schedule worklets that require more than 2B instances. 2015-07-23 16:31:28 -04:00
Sujin Philip
c7d3d0df5c Merge branch 'add-compute-bounds' into 'master'
Add compute bounds to Fields

See merge request !88
2015-07-22 14:26:48 -04:00
Sujin Philip
91b191bf83 Add compute bounds to Fields 2015-07-22 12:17:33 -04:00
Robert Maynard
780f3fea29 Merge branch 'correct_cuda_scheduling_over_8m' into 'master'
Correct cuda scheduling over 8m

See merge request !91
2015-07-20 15:01:05 -04:00
Robert Maynard
9f16669e3c Increase the robustness of 3d scheduling when X dim is very small. 2015-07-20 13:27:58 -04:00
Robert Maynard
7a21a08c46 Handle 1D scheduling with over 65k blocks on SM2.X arch.
The initial implementation forgot about the fact that SM2.X architectures
can only handle 65k blocks. Now we gracefully handle when compiling for
SM2.X.
2015-07-20 13:26:16 -04:00
Sujin Philip
1a9e8d1e3d Initial support for generating documentation using Doxygen 2015-07-17 15:35:59 -04:00
Robert Maynard
e74ded809a Defer more thrust iterator deduction logic to ArrayPortalToIterators. 2015-07-14 10:11:12 -04:00
Robert Maynard
4ba1f7c853 Remove the need for any portal to define an IteratorType. 2015-07-13 17:16:27 -04:00
Kenneth Moreland
4fc3626712 Fix compiler directives for icc
The Intel icc compiler tries to pretend it is gcc, but it sometimes
behaves differently. Add more explicit checks for what compiler is
being used.
2015-07-06 10:35:06 -06:00
Robert Maynard
d12375450f Make ChooseCudaDevice only consider SM3 and above equal. 2015-06-30 08:18:16 -04:00
Robert Maynard
6af949f488 Fix warnings in ChooseCudaDevice by moving performance metric to double. 2015-06-30 08:16:42 -04:00