Commit Graph

66 Commits

Author SHA1 Message Date
Robert Maynard
505e7aa1ec VTK-m now has defines for the CUDA version even when not using nvcc.
This is needed so so that the CUDA and C++ compiler generate the same code
when scanning a shared header but generating different translation units
2018-02-26 16:38:26 -05:00
Robert Maynard
41d968f68d Cuda ExecutionPolicy when using CUDA 7.5 is aware we use raw pointer now
The failure was caused by not updating CUDA 7.5 code paths when we
removed the usage of ::thrust::cuda::pointer.
2018-02-26 16:37:57 -05:00
Robert Maynard
22ea58335a iVTK-m CUDA backend doesn't use thrust::cuda::pointer any more.
This was removed as CUDA 9.0 on MSVC has issues where CUB/Thrust
would fail to compile when given these types.
2018-02-02 08:33:17 -05:00
Robert Maynard
93bc0198fe Suppress false positive warnings about calling host device functions. 2018-01-02 10:40:49 -05:00
Sujin Philip
5842da4921 Remove ArrayHandle CopyInto
Fixes #170
2017-10-27 17:28:59 -04:00
Kenneth Moreland
c3a3184d51 Update copyright for Sandia
Sandia National Laboratories recently changed management from the
Sandia Corporation to the National Technology & Engineering Solutions
of Sandia, LLC (NTESS). The copyright statements need to be updated
accordingly.
2017-09-20 15:33:44 -06:00
Robert Maynard
6a4e91d5d1 ExecutionPolicy now handles CUDA9 removal of __CUDACC_VER__ 2017-08-24 12:59:16 -04:00
Robert Maynard
b9e69217ae Merge topic 'typedef_to_using_round_4'
f6863594 Convert VTK-m over to use 'using' instead of 'typedef'

Acked-by: Kitware Robot <kwrobot@kitware.com>
Merge-request: !885
2017-08-17 16:38:49 -04:00
Sujin Philip
72a6cf4a21 Change cuda calls to use the per-thread stream. 2017-08-17 11:03:02 -04:00
Robert Maynard
f68635941e Convert VTK-m over to use 'using' instead of 'typedef' 2017-08-17 10:47:25 -04:00
Robert Maynard
5dd346007b Respect VTK-m convention of parameters all or nothing on a line
clang-format BinPack settings have been disabled to make sure that the
VTK-m style guideline is obeyed.
2017-05-26 13:53:28 -04:00
Robert Maynard
60a405ef65 Add TaskTiling1D/3D which use faux virtuals to reduce binary size.
Redesigns the TBB and Serial backends and the vtkm::exec::Task concept so that
we can re-use the same launching logic for all Worklets, instead of generating
per worlet code. To keep the performance the same the TilingTask now is past
a range of indices to work on, rather than a single index.

Binary size reduction:
WorkletTests_SERIAL old - 19MB
WorkletTests_SERIAL new - 18MB

WorkletTests_TBB old - 39MB
WorkletTests_TBB new - 18MB

libvtkAcceleratorsVTKm old - 48MB
libvtkAcceleratorsVTKm new - 19MB
2017-05-25 11:00:01 -04:00
Kitware Robot
4ade5f5770 clang-format: apply to the entire tree 2017-05-25 07:51:37 -04:00
Kitware Robot
efbde1d54b clang-format: sort include directives 2017-05-18 12:59:33 -04:00
Robert Maynard
93a5662a7b Merge topic 'correct_missing_cuda_exec_include'
cc08589d Make sure ExecutionPolicy.h includes all headers it uses

Acked-by: Kitware Robot <kwrobot@kitware.com>
Merge-request: !756
2017-04-27 13:00:44 -04:00
Robert Maynard
cc08589df6 Make sure ExecutionPolicy.h includes all headers it uses 2017-04-24 13:34:39 -04:00
Robert Maynard
355eea887c Get the vtkm cont cuda object to compile properly. 2017-04-05 15:45:10 -07:00
Robert Maynard
9148bea396 Corrects ignorable warnings with msvc and cuda enabled.
These constant value warnings are ignorable as we are trying
to throw an assert.
2017-02-02 10:09:34 -05:00
Kenneth Moreland
629271bceb Make sure all ArrayPortals have a Set method.
The current design for ArrayPortalVirtual makes it a requirement for all
array portals (that it wraps) to have Set defined. Thus, make sure Set is
defined for all ArrayPortal. Where Set is invalid, an assert is thrown if
something calls it at runtime.
2017-01-31 15:46:39 -05:00
Kenneth Moreland
55c159d6f0 Check error codes from CUDA functions
Most functions in the CUDA runtime API return an error code that must be
checked to determine whether the operation completed successfully. Most
operations in VTK-m just called the function and assumed it completed
correctly, which could lead to further errors. This change wraps most
CUDA calls in a VTKM_CUDA_CALL macro that checks the error code and
throws an exception if the call fails.
2016-12-14 10:43:44 -07:00
Kenneth Moreland
fdaccc22db Remove exports for header-only functions/methods
Change the VTKM_CONT_EXPORT to VTKM_CONT. (Likewise for EXEC and
EXEC_CONT.) Remove the inline from these macros so that they can be
applied to everything, including implementations in a library.

Because inline is not declared in these modifies, you have to add the
keyword to functions and methods where the implementation is not inlined
in the class.
2016-11-15 22:22:13 -07:00
Robert Maynard
632d2a5211 Thrust 1.8.3 uses raw_reference_cast instead of a direct assignment operator
This usage of raw_referenc_cast returns a const PortalValue which is than
assigned too. So to work around the problem we need to mark operator= on
the class as const.
2016-11-15 17:03:59 -05:00
Robert Maynard
12810165bb Switch over to c++11 type_traits. 2016-08-31 16:11:26 -04:00
Kenneth Moreland
51a35cb4fe Fix warnings about type conversions 2016-06-27 07:50:15 -06:00
Kenneth Moreland
7ff20c9230 Fix includes for CUDA builds
The CMake CUDA build targets do not respect the
target_include_directories (yet?). Instead, add the necessary includes
to cuda_include_directories().
2016-06-22 12:53:23 -06:00
Robert Maynard
ba0a0b096b Simplify the fix for maxwell reduce by key bug. 2016-05-17 10:11:52 -04:00
Robert Maynard
e5c3f9c42d Solve reduce by key bugs with cuda 7.5 + maxwell hardware.
The concern is now all architectures are doing a hardware sync on reduce_by_key.
This isn't a super serious concern, but it is a downside.
2016-05-12 13:24:59 -04:00
Matt Larsen
eeae5c1352 Adding fast path for radix sort sort_by_key 2016-05-03 08:53:42 -07:00
Robert Maynard
bb90493920 Resolves Issue 52, we now install all vtkm files correctly. 2016-02-22 14:20:35 -05:00
Robert Maynard
bd3d29577a Fix ArrayPortalFromThrust to re-enable texture memory fast path. 2016-01-26 14:30:25 -05:00
Robert Maynard
b2cd41d765 Fix ArrayPortalFromThrust to re-enable texture memory fast path. 2016-01-26 14:29:52 -05:00
Kenneth Moreland
1a538ca196 Merge branch 'scatter-worklets' into 'master'
Scatter in worklets

Add the functionality to perform a scatter operation from input to output in a worklet invocation. This allows you to, for example, specify a variable amount of outputs generated for each input.

See merge request !221
2015-11-11 13:09:47 -05:00
Robert Maynard
b3687c6f3c Workaround inclusive_scan issues in thrust 1.8.X for complex value types.
The original workaround for inclusive_scan bugs in thrust 1.8 only solved the
issue for basic arithmetic types such as int, float, double. Now we go one
step further and fix the problem for all types.

The solution is to provide a proper implementation of destructive_accumulate_n
and make sure it exists before any includes of thrust occur.
2015-11-09 17:14:30 -05:00
Kenneth Moreland
f7789f0ed7 Fix issue with const types in Thrust array management
Previously, there was a declaration ConstArrayPortalFromThrust<const T>
in ArrayManagerExecutionThrustDevice. This proved problematic because
values read from the array in the worklet were typed as const T rather
than simply T. Any Vec or Matrix built from that type would then fail
because they are not meant to work with a const value (which means they
have to be set on construction and never changed.

Instead, declare ConstArrayPortalFromThrust<T> and internally set all
the Thrust pointers to have type const T. Also declare other thrust
pointers used as method parameters to have const T rather than T. This
should work as conversion from T to const T should be fine, but not the
other way around.
2015-11-06 18:05:21 -07:00
Robert Maynard
97550d5e2d Update Cuda so that UnaryPredictes work with fancy cuda array handles. 2015-11-03 13:28:07 -05:00
T.J. Corona
829c1b1f7f Install missing cuda device backend header. 2015-11-02 16:44:19 -05:00
Robert Maynard
056f69bf96 Remove unused variable and conversion warnings from cuda code. 2015-09-21 14:17:25 -04:00
Robert Maynard
9b877ef49b Merge topic 'multiple_backend_example'
fd685210 Always install all device headers even when device isn't enabled.
b1663b24 Add an example of using multiple backends from a single translation unit.
fc0ff69d Methods with try/catch need to be host only.
4d635d64 DeviceAdapter Tags now always exist, and contain if the device is valid.
cf32b430 Teach Configure.h to store if TBB and CUDA are enabled.

Acked-by: Kitware Robot <kwrobot@kitware.com>
Acked-by: Kenneth Moreland <kmorel@sandia.gov>
Merge-request: !198
2015-09-17 09:49:49 -04:00
Robert Maynard
fd68521066 Always install all device headers even when device isn't enabled.
vtkm_declare_headers now is able to not test headers, by using the
TESTABLE keyword.
2015-09-17 09:28:21 -04:00
Robert Maynard
1d97f886e0 Remove the thrust pragma statements that are not needed. 2015-09-15 14:20:56 -04:00
Robert Maynard
5b8cc44ed4 Merge branch 'improve_sort_perf_on_thrust' into 'master'
Tell thrust to use fast code paths when using our predicates and operators.

See merge request !176
2015-09-07 10:38:17 -04:00
Robert Maynard
72450e87f3 Make thrust use fast paths when doing sort and scan.
By introducing our own custom thrust execution policy we can make sure
to hit the fastest code paths in thrust for the sort operation. This makes
sure that for UInt32,Int32, and Float32 we use the radix sort from thrust
which offers a 2x to 3x speed improvement over the merge sort implementation.

Secondly by telling thrust that our BinaryOperators are commutative we
make sure that we get the fastest code paths when executing Inclusive
and Exclusive Scan

Benchmark 'Radix Sort on 1048576 random values vtkm::Int32' results:
  median = 0.0117049s
  median abs dev = 0.00324614s
  mean = 0.0167615s
  std dev = 0.00786269s
  min = 0.00845875s
  max = 0.0389063s
Benchmark 'Radix Sort on 1048576 random values vtkm::Float32' results:
  median = 0.0234463s
  median abs dev = 0.000317249s
  mean = 0.021452s
  std dev = 0.00470307s
  min = 0.011255s
  max = 0.0250643s
Benchmark 'Merge Sort on 1048576 random values vtkm::Int32' results:
  median = 0.0310486s
  median abs dev = 0.000182129s
  mean = 0.0286914s
  std dev = 0.00634102s
  min = 0.0116225s
  max = 0.0317379s
Benchmark 'Merge Sort on 1048576 random values vtkm::Float32' results:
  median = 0.0310617s
  median abs dev = 0.000193583s
  mean = 0.0295779s
  std dev = 0.00491531s
  min = 0.0147257s
  max = 0.032307s
2015-09-03 16:00:37 -04:00
Robert Maynard
0d6dfb1e40 Make it possible to use Cuda TextureMemory from device/host method. 2015-09-03 11:52:40 -04:00
Robert Maynard
37403237c6 Allow us to still use __ldg texture load with the new VTKM_EXEC_CONT_EXPORT. 2015-09-02 11:34:36 -04:00
Robert Maynard
157d8efee4 Workaround thrust 1.8 inclusive scan issue.
Starting in thrust 1.8 the implementation of scan inclusive inside
thrust became highly optimized by using parallel task groups. This
new implementation has a bug that only exists when using custom
binary operators, large size arrays, release mode, and no
debugger or mem-checker attached.

While I have submitted the issue to thrust, we need to be able
to work around the existing issue. The solution I have chosen is
to mark all vtkm::exec::cuda::interal::WrappedBinaryOperators
as being commutative as far as thrust is concerened. To make
sure we don't get any unexpected behavior I have also had
to create WrappedBinaryPredicate so that we don't mark any
predicate as commutative.
2015-08-17 10:39:14 -04:00
Robert Maynard
ab59e34a2f Rename pragma header guard so it makes sense for tbb and thrust.
Boost is not the only thirdparty that we are supressing warnings for, so
make the name more generic.
2015-08-13 09:04:23 -04:00
Robert Maynard
8204db2f6a Use VTKM_BOOST_PRE_INCLUDE around thrust headers too. 2015-08-13 08:26:41 -04:00
Kenneth Moreland
21b3b318ba Always disable conversion warnings when including boost header files
On one of my compile platforms, GCC was giving conversion warnings from
any boost include that was not wrapped in pragmas to disable conversion
warnings. To make things easier and more robust, I created a pair of
macros, VTKM_BOOST_PRE_INCLUDE and VTKM_BOOST_POST_INCLUDE, that should
be wrapped around any #include of a boost header file.
2015-07-30 17:40:40 -06:00
Robert Maynard
bb582ae4ec Update the cuda IteratorFromArrayPortal to use ptrdiff_t.
This make the advance / distance_to function signatures constant no matter
if we are building with 32/64 bit ids.
2015-07-29 09:57:42 -04:00
Robert Maynard
e74ded809a Defer more thrust iterator deduction logic to ArrayPortalToIterators. 2015-07-14 10:11:12 -04:00