Commit Graph

412 Commits

Author SHA1 Message Date
Sujin Philip
8c242cef91 Switch from faux to true virtuals 2017-11-06 15:25:29 -05:00
Robert Maynard
a6eecbe9ac ExecutionArrayInterface now can hint at how allocated memory will be used.
Certain backends desire the ability to mark allocations as being used for
reading versus writing to improve performance.
2017-11-02 10:12:57 -04:00
Matthew Letter
24d0e7766e Merge remote-tracking branch 'remotes/origin/master' into cmake_refactor 2017-10-31 16:57:41 -06:00
Sujin Philip
5842da4921 Remove ArrayHandle CopyInto
Fixes #170
2017-10-27 17:28:59 -04:00
Robert Maynard
ed8f4111ef Update all the code to work with CMake 3.3
Obviously this does mean that CUDA is not supported with 3.3.
2017-10-27 15:30:14 -04:00
Robert Maynard
56c7362258 A thought on what CMake 3.9 would mean to VTK-m. 2017-10-27 15:29:51 -04:00
Li-Ta Lo
3acd7c37a1 Merge topic 'pointlocator'
ed3a64a5 Coding style improvment
7fa800b7 Update TestingPointLocatorUniformGrid.h
f1974cab Update TestingPointLocatorUniformGrid.h
508882fa PointLocatorUniformGrid

Acked-by: Kitware Robot <kwrobot@kitware.com>
Merge-request: !973
2017-10-25 10:42:07 -04:00
Allison Vacanti
5a99dd761b Only use cuda hints for CUDA 8.0+. 2017-10-24 11:55:07 -04:00
Li-Ta Lo
508882fa21 PointLocatorUniformGrid
Provide an accelerated neareast neighbor search of points in the
dataset using a one layer uniform grid.
2017-10-19 11:44:36 -06:00
Allison Vacanti
1018d981a0 Check for overlap in CopySubRange.
Some parallel copy implementations will not handle this sanely.
2017-10-11 16:52:32 -04:00
Sujin Philip
41679cb5f9 Add a CellLocator
Implements a two-level uniform grid cell locator
2017-10-10 14:01:41 -04:00
Robert Maynard
f8f1adc962 Forward decleare DeviceAdapterAlgorithm correctly as a struct 2017-10-06 09:50:12 -04:00
Allison Vacanti
75f88b4c46 Add versioning to VTKM installed include/share dirs. 2017-10-02 11:39:10 -04:00
Robert Maynard
311618a15f Enable highest level of warnings(W4) under MSVC
This will make VTK-m warning level match the one used by VTK. This commit
also resolves the first round of warnings that W4 exposes.
2017-09-22 13:04:28 -04:00
Kenneth Moreland
c3a3184d51 Update copyright for Sandia
Sandia National Laboratories recently changed management from the
Sandia Corporation to the National Technology & Engineering Solutions
of Sandia, LLC (NTESS). The copyright statements need to be updated
accordingly.
2017-09-20 15:33:44 -06:00
Allison Vacanti
00320e5dc0 Use vtkm::UInt64 for byte sizes.
vtkm::Id could be just 32bits, which limits the number of values we
would be able to store for large arrays.
2017-09-19 09:21:14 -04:00
Allison Vacanti
28ab480a40 Fix warnings on renar. 2017-09-18 15:33:02 -04:00
Robert Maynard
b9e69217ae Merge topic 'typedef_to_using_round_4'
f6863594 Convert VTK-m over to use 'using' instead of 'typedef'

Acked-by: Kitware Robot <kwrobot@kitware.com>
Merge-request: !885
2017-08-17 16:38:49 -04:00
Sujin Philip
72a6cf4a21 Change cuda calls to use the per-thread stream. 2017-08-17 11:03:02 -04:00
Robert Maynard
f68635941e Convert VTK-m over to use 'using' instead of 'typedef' 2017-08-17 10:47:25 -04:00
Robert Maynard
89f439999a Reduce the amount of typedef statements in DeviceAdapters
By using the auto keyword and decltype we can reduce the number of
complex typedefs that exist when writing device adapter algorithms.
The goal being that it is easier for developers to see the actual
algorithms being implemented, by reducing the amount of template
'noise'.
2017-08-16 14:23:21 -04:00
Allison Vacanti
326757b571 Remove ArrayHandleCuda.
!861 (b0dba9a1) adds this functionality to basic ArrayHandles.
2017-08-10 15:21:52 -04:00
Allison Vacanti
0a828189ad Reuse user-provided cuda device pointers when possible. 2017-08-03 17:21:13 -04:00
David C. Lonie
bd042ec567 Add CudaAllocator to encapsulate runtime managed memory logic.
Unified memory is now used when we detect that the hardware supports it.
2017-07-31 09:08:27 -04:00
Robert Maynard
76532264c3 Correct signed to unsigned warnings in ArrayManagerExecutionCuda. 2017-07-20 10:42:09 -04:00
David C. Lonie
379c3a0fad Use current device when allocating managed memory. 2017-07-13 12:55:22 -04:00
Li-Ta Lo
65910f139c put return value back to ScanInclusivePortal 2017-07-10 12:11:51 -06:00
Li-Ta Lo
16b61d8697 Make ScanInclusiveByKey and ScanInclusiveByKey void functions.
These two algorithms does not return meaningful return values. Generic interface and
implementation are both void. Remove erronous return type and statement for CUDA backend.
2017-07-07 15:11:42 -06:00
Robert Maynard
09a08fea8d Correct errors in TaskTuner found from the TestBuilds. 2017-07-07 08:40:03 -04:00
Robert Maynard
c11f29c093 Move the parameter sweeping code to a separate header.
The parameter sweeping code is only enabled when tuning for new GPU's
so we should move it to a separate header to make DeviceAdapterAlgorithmThrust
easier to read.
2017-07-05 14:58:19 -04:00
David C. Lonie
b2c3e41645 Refactor array transfer logic for basic storage.
The old templated array transfer mechanism generated a lot of code
that ended up doing a simple, type-agnostic memcpy for most devices.
This patch specialized array handles for basic storage and uses a
fast-path array transfer implementation. This reduces the size of the
vtkm_cont library by 27% on gcc (from 6.2MB to 4.5MB).
2017-06-29 13:18:44 -04:00
Robert Maynard
cfda0593be ArrayManagerExecutionThrustDevice stops generating casting warnings. 2017-06-16 08:50:32 -04:00
Robert Maynard
5dd346007b Respect VTK-m convention of parameters all or nothing on a line
clang-format BinPack settings have been disabled to make sure that the
VTK-m style guideline is obeyed.
2017-05-26 13:53:28 -04:00
Robert Maynard
60a405ef65 Add TaskTiling1D/3D which use faux virtuals to reduce binary size.
Redesigns the TBB and Serial backends and the vtkm::exec::Task concept so that
we can re-use the same launching logic for all Worklets, instead of generating
per worlet code. To keep the performance the same the TilingTask now is past
a range of indices to work on, rather than a single index.

Binary size reduction:
WorkletTests_SERIAL old - 19MB
WorkletTests_SERIAL new - 18MB

WorkletTests_TBB old - 39MB
WorkletTests_TBB new - 18MB

libvtkAcceleratorsVTKm old - 48MB
libvtkAcceleratorsVTKm new - 19MB
2017-05-25 11:00:01 -04:00
Kitware Robot
4ade5f5770 clang-format: apply to the entire tree 2017-05-25 07:51:37 -04:00
Kitware Robot
efbde1d54b clang-format: sort include directives 2017-05-18 12:59:33 -04:00
Robert Maynard
022c36fa4f Add vtkm::exec::TaskBase, and rename WorkletInvokeFunctor to TaskSingular
Previously WorkletInvokeFunctor inherited from vtkm::exec::FunctorBase,
which is also the base class for all users Worklets and for all functors
based to DeviceAdapter::Schedule.

This is done for a few reasons. The first is that we reduce the
minimum size of user worklets. Previously the users worklet would hold
a reference to the error message, and so would the wrapper class added
when calling DeviceAdapter::Schedule. Now we only have the users worklet
holding a reference.

Second, by refactoring to have two base classes we can better improve
the documentation on what responsibilities FunctorBase.h has, compared
to TaskBase.
2017-05-02 16:38:43 -04:00
Sujin Philip
e9898cc5cf Merge topic 'virtual-methods'
4049b5b2 Add ClipWithImplicitFunction Filter
82d02e46 Modify ImplicitFunctions to use Virtual Methods
968960c1 Add Virtual Methods Framework

Acked-by: Kitware Robot <kwrobot@kitware.com>
Acked-by: Kenneth Moreland <kmorel@sandia.gov>
Merge-request: !750
2017-05-02 16:12:04 -04:00
Sujin Philip
82d02e46ef Modify ImplicitFunctions to use Virtual Methods 2017-05-01 16:55:59 -04:00
Sujin Philip
968960c1a1 Add Virtual Methods Framework 2017-05-01 16:51:42 -04:00
Robert Maynard
80b9d74a23 Merge topic 'embed_more_into_vtkm_cont'
ec6589d3 Only enable -fPIC on component static libraries when necessary.
cbfe5fdd Fix up various issues with ArrayHandles in vtkm_cont.
355eea88 Get the vtkm cont cuda object to compile properly.
6ecc22bb First pass at compiling ArrayHandle into vtkm_cont.

Acked-by: Kitware Robot <kwrobot@kitware.com>
Merge-request: !715
2017-04-26 13:47:10 -04:00
Li-Ta Lo
0ba9784082 Merge topic 'scanbykey'
5c735a38 this should resovle all the type conversion warnings
d8b02329 Merge branch 'scanbykey' of gitlab.kitware.com:ollielo/vtk-m into scanbykey
58ef7c8d one more attemp to get the data type right
22d0e355 attempt to fix warning on type conversion
ded4583a attempt to fix warning on type conversion
897b2f0f add tests for 1, 2 and ARRAY_SIZE elements for both ScanInclusiveByKey and ScanExclusiveByKey
c05a2c32 Merge branch 'scanbykey' of gitlab.kitware.com:ollielo/vtk-m into scanbykey
0e97fcb9 handle the case of 0 or 1 element in the input for ScanExclusiveByKey
...

Acked-by: Kitware Robot <kwrobot@kitware.com>
Merge-request: !746
2017-04-25 15:54:25 -04:00
Li-Ta Lo
897b2f0f63 add tests for 1, 2 and ARRAY_SIZE elements for both ScanInclusiveByKey and ScanExclusiveByKey 2017-04-19 13:38:28 -06:00
Li-Ta Lo
a205f21043 make ScanExclusiveByKey return void, rearrange parameter ordering 2017-04-17 16:11:02 -06:00
Li-Ta Lo
7023266585 add both generic and Thrust ScanExclusiveByKey 2017-04-17 15:03:49 -06:00
Li-Ta Lo
e77f9fac6a add CUDA implementation of ScanInclusiveByKey using Thrust library 2017-04-14 11:25:25 -06:00
David C. Lonie
4807b3c472 Silence warnings about unavoidable weak vtables.
- Exception classes cannot be exported due to MSVC's design decisions.
  See http://stackoverflow.com/questions/24511376. We must leave these
  classes as header only and silence the warnings.
- TransferResource in BufferState.h must remain a header-only class since
  there is no vtkm_interop library to compile the class into.
- The VTKDataSetReader hierarchy must similarly remain header-only since
  there is no vtkm_io library.
- The OptionParser Action classes are part of a header-only utility and
  cannot be easily compiled into a library.
-
2017-04-13 14:06:33 -04:00
David C. Lonie
cbfe5fddd9 Fix up various issues with ArrayHandles in vtkm_cont. 2017-04-05 15:45:11 -07:00
Robert Maynard
355eea887c Get the vtkm cont cuda object to compile properly. 2017-04-05 15:45:10 -07:00
David C. Lonie
6ecc22bb8c First pass at compiling ArrayHandle into vtkm_cont. 2017-04-05 15:45:01 -07:00
Li-Ta Lo
2bdc0be5ca add cuda calls for memory advise as per Tom Fogel 2017-03-14 14:19:01 -06:00
Li-Ta Lo - 194699
6ce8a0135a Merge branch 'master' into unified-memory 2017-03-09 14:54:03 -07:00
Li-Ta Lo - 194699
b470175f98 new unified memory effort with the new Thrust device 2017-03-09 14:51:45 -07:00
Sujin Philip
9eddce6c99 Rename StreamCompact to CopyIf
Plus, removes the version that uses one array as both input and stencil.
2017-03-06 11:08:27 -05:00
Sujin Philip
8c4bbc39ad Use C++11 =delete keyword 2017-02-24 09:39:22 -05:00
Sujin Philip
a88807fd7e Catch all exceptions by reference 2017-02-23 13:25:01 -05:00
David Lonie
a0016456cc Merge topic 'default_device_to_public_header'
7a41621d Move default device selection out of private headers.

Acked-by: Kitware Robot <kwrobot@kitware.com>
Acked-by: Kenneth Moreland <kmorel@sandia.gov>
Merge-request: !693
2017-02-17 08:26:01 -05:00
David C. Lonie
7a41621d82 Move default device selection out of private headers.
This will make the librarification of vtk-m easier as we tread that
path.

Refs #120.
2017-02-16 13:40:35 -05:00
Tom Fogal
31e20859d4 Fix typo. 2017-02-14 10:29:00 -08:00
Li-Ta Lo - 194699
835073dae2 clean up with custom allocator 2017-02-13 11:45:17 -07:00
David C. Lonie
f601e38ba8 Simplify exception hierarchy.
Remove the ErrorControl class such that all subclasses now inherit from
error. Renamed all exception classes via s/ErrorControl/Error/.

See issue #57.
2017-02-07 15:42:38 -05:00
Robert Maynard
6e4e07b378 Merge topic 'all_array_portals_have_set'
629271bc Make sure all ArrayPortals have a Set method.

Acked-by: Kitware Robot <kwrobot@kitware.com>
Merge-request: !676
2017-02-01 09:08:51 -05:00
Kenneth Moreland
629271bceb Make sure all ArrayPortals have a Set method.
The current design for ArrayPortalVirtual makes it a requirement for all
array portals (that it wraps) to have Set defined. Thus, make sure Set is
defined for all ArrayPortal. Where Set is invalid, an assert is thrown if
something calls it at runtime.
2017-01-31 15:46:39 -05:00
David C. Lonie
575d74d143 Manage cuda device memory with cudaMalloc instead of thrust::vector. 2017-01-30 15:36:37 -05:00
Christopher Sewell
82c40a6374 First support for unified memory 2017-01-18 11:43:49 -07:00
Robert Maynard
bd1ff7a5ac Allow vtkm errors to properly work with shared libraries. 2017-01-16 09:17:38 -05:00
Robert Maynard
d8d6fd1741 StorageTags are now always exported to resolve future dynamic_cast issues.
Class that need to be passed across dynamic library boundaries such as
DynamicArrayHandle need to be properly export. One of 'tricks' of this
is that templated classes such as PolymorphicArrayHandleContainer need
the Type and Storage types to have public visibility.

This makes sure that all vtkm storage tags have public visibility so
in the future we can transfer dynamic array handles across libraries.
2017-01-16 09:17:38 -05:00
Kenneth Moreland
7ec8c03489 Remove warning about constant conditional in VTKM_ASSUME
It is common practice to swallow a semicolon in a macro using a do/
while(false) loop that only executes once. However, the Visual Studio
2013 compiler was stupidly warning about having a constant expression
for the loop. Get around the problem by creating our own swallow
semicolon macros that disable this warning as necessary.
2017-01-10 10:40:17 -07:00
Kenneth Moreland
98c8cb8657 Do not attempt to execute CUDA kernels with no blocks
I noticed that when I attempted to execute a CUDA kernel with 0 blocks,
it generated a CUDA error. The error did not really matter since nothing
was really supposed to run anyway. However, once we are careful about
checking CUDA errors, it will cause test failures and likely other
misdiagnoses.
2016-12-15 11:33:48 -07:00
Kenneth Moreland
55c159d6f0 Check error codes from CUDA functions
Most functions in the CUDA runtime API return an error code that must be
checked to determine whether the operation completed successfully. Most
operations in VTK-m just called the function and assumed it completed
correctly, which could lead to further errors. This change wraps most
CUDA calls in a VTKM_CUDA_CALL macro that checks the error code and
throws an exception if the call fails.
2016-12-14 10:43:44 -07:00
Robert Maynard
b97b4cc7ef Allow thrust::reduce to work when iterator and initial value types differ.
natively thrust::reduce is unable to handle the use case where the
iterator type and the initial value/return value are two different
types. We use array handle cast to work around this problem when
we detect this usecase.
2016-11-25 13:11:19 -05:00
Robert Maynard
2cfc9743e3 Reduce can support reduce to a T type that isn't the arrayhandles T type.
This has been done so that operations such as computing the Min/Max of
an array can be done in a single reduce step.
2016-11-25 11:40:46 -05:00
Kenneth Moreland
fdaccc22db Remove exports for header-only functions/methods
Change the VTKM_CONT_EXPORT to VTKM_CONT. (Likewise for EXEC and
EXEC_CONT.) Remove the inline from these macros so that they can be
applied to everything, including implementations in a library.

Because inline is not declared in these modifies, you have to add the
keyword to functions and methods where the implementation is not inlined
in the class.
2016-11-15 22:22:13 -07:00
Christopher Sewell
1ebf0c17b6 Attempt 10 to resolve Windows compiler warning with streaming storage 2016-10-21 13:10:49 -06:00
Christopher Sewell
05975a2325 Attempt 3 to resolve Windows compiler warning with streaming storage 2016-10-20 10:32:30 -06:00
Christopher Sewell
c6e15c1240 Merge remote-tracking branch 'upstream/master' into StreamingArray 2016-10-07 18:10:29 -06:00
Robert Maynard
0f58d6fc54 Add vtkm/cont/serial directory for the serial backend. 2016-09-28 14:22:53 -04:00
Robert Maynard
f7a9bbe0e7 All cuda / tbb unit tests follow the same naming pattern. 2016-09-28 14:22:53 -04:00
Christopher Sewell
d92f39df12 Merge branch 'master' into StreamingArray 2016-09-15 17:54:59 -06:00
Christopher Sewell
610f96a831 Adding streaming inclusive scan 2016-09-15 17:46:09 -06:00
Robert Maynard
f81c42b9b4 Replace NULL with nullptr where applicable. 2016-09-01 09:38:25 -04:00
Robert Maynard
51e50d2933 Add DeviceAdapter::CopySubRange to all device adapters.
This allows callers to copy a subsection of an array into another array,
without clearing the contents of the destination array if a resize
is required.
2016-08-24 15:42:51 -04:00
Christopher Sewell
767a5f6792 Clean up for streaming dispatcher 2016-08-03 19:47:12 -06:00
Christopher Sewell
504e6f55df Working streaming dispatcher 2016-08-03 18:48:31 -06:00
Christopher Sewell
ced9fd32db Improving streaming dispatcher 2016-08-03 15:34:43 -06:00
Christopher Sewell
2caf81a4af First commit for ArrayHandleStreaming 2016-08-02 16:54:12 -06:00
Robert Maynard
45ada6b55a Rework ArrayHandleCuda to make it stop generate warnings 2016-07-20 12:41:13 -04:00
Robert Maynard
4ca6ce2ad6 nvcc doesn't have troubles with boost shared_ptr optimizations 2016-07-19 13:26:23 -04:00
Kenneth Moreland
51a35cb4fe Fix warnings about type conversions 2016-06-27 07:50:15 -06:00
Kenneth Moreland
fd29c81bde Fix warnings about redefined macros
The build automatically sets some macros when building CUDA files. Some
of the CUDA sources were setting the same macros, which was causing
warnings. Change the code to be more careful about setting preprocessor
macros.
2016-06-27 07:50:13 -06:00
Kenneth Moreland
7ff20c9230 Fix includes for CUDA builds
The CMake CUDA build targets do not respect the
target_include_directories (yet?). Instead, add the necessary includes
to cuda_include_directories().
2016-06-22 12:53:23 -06:00
Kenneth Moreland
b5415169e2 Change Field and related methods to use Range and Bounds
First, be more explicit when we mean a range of values in a field or a
spacial bounds. Use the Range and Bounds structs in Field and
CoordinateSystem to make all of this more clear (and reduce a bit of
code as well).
2016-05-29 18:49:36 -06:00
Robert Maynard
e5c3f9c42d Solve reduce by key bugs with cuda 7.5 + maxwell hardware.
The concern is now all architectures are doing a hardware sync on reduce_by_key.
This isn't a super serious concern, but it is a downside.
2016-05-12 13:24:59 -04:00
Matt Larsen
eeae5c1352 Adding fast path for radix sort sort_by_key 2016-05-03 08:53:42 -07:00
Kenneth Moreland
cc497e6a1b Remove cont/Assert.h and exec/Assert.h
These asserts are consolidated into the unified Assert.h. Also made some
minor edits to add asserts where appropriate and a little bit of
reconfiguring as found.
2016-04-20 15:41:14 -06:00
Matt Larsen
e5c4aa3f78 Fixing cuda index error 2016-03-08 12:41:11 -08:00
Matt Larsen
249cce352b Adding type restrictions to serial atomics 2016-03-08 10:39:23 -08:00
Matt Larsen
40b6db7eee Inserted missing , 2016-03-08 09:51:50 -08:00
Matt Larsen
3b46706e1f Adding compare and swap and removing unsigned atomics 2016-03-08 09:41:02 -08:00
Matt Larsen
e8b08f2e00 Merge branch 'master' into feature/atomics 2016-03-04 08:03:33 -08:00
Robert Maynard
2f98cdf717 Resolves Issue #56: ChooseCudaDevice functions are in the proper namespace. 2016-02-26 13:51:28 -05:00
Matt Larsen
2baac9cd8b initial commit of atomic adds 2016-02-10 07:51:31 -08:00
Kenneth Moreland
3f446ad261 Add ErrorControlCuda for better CUDA error checking.
Add lots of checks to CUDA calls in the timer to try to identify any
problems that might be showing up on the dashboard.

Also adding some print statements around the sleep function in the
device adapter testing. For some reason the problem just went away with
them.
2016-01-12 15:19:54 -07:00
Kenneth Moreland
f582729803 Synchronize the CUDA timer on both the start and end events
Previously, the timer for CUDA devices only called cudaEventSynchronize
at the end event when asking for the elapsed time. This, however, could
allow time to pass from when the timer was reset to when the start event
happened that was not recorded in the timer. This added synchronization
should make sure that all time spent in CUDA is recorded.
2016-01-12 15:13:56 -07:00
Robert Maynard
c70da5fc23 Extend vtkm::DeviceAdapterTraits to include a unique numeric identifier.
Previously each device adapter only had a unique string name. This was
not the best when it came to developing data structures to track the status
of a given device at runtime.

This adds in a unique numeric identifier to each device adapter. This will
allow classes to easily create bitmasks / lookup tables for the validity of
devices.
2015-12-16 11:18:52 -05:00
Robert Maynard
a7127f0fc3 Adding vtkm::cont::RuntimeDeviceInformation.
The RuntimeDeviceInformation class allows developers to check if a given
device is supported on a machine at runtime. This allows developers to properly
check for CUDA support before running any worklets.
2015-12-15 17:25:27 -05:00
Kenneth Moreland
1a538ca196 Merge branch 'scatter-worklets' into 'master'
Scatter in worklets

Add the functionality to perform a scatter operation from input to output in a worklet invocation. This allows you to, for example, specify a variable amount of outputs generated for each input.

See merge request !221
2015-11-11 13:09:47 -05:00
Robert Maynard
b3687c6f3c Workaround inclusive_scan issues in thrust 1.8.X for complex value types.
The original workaround for inclusive_scan bugs in thrust 1.8 only solved the
issue for basic arithmetic types such as int, float, double. Now we go one
step further and fix the problem for all types.

The solution is to provide a proper implementation of destructive_accumulate_n
and make sure it exists before any includes of thrust occur.
2015-11-09 17:14:30 -05:00
Kenneth Moreland
f7789f0ed7 Fix issue with const types in Thrust array management
Previously, there was a declaration ConstArrayPortalFromThrust<const T>
in ArrayManagerExecutionThrustDevice. This proved problematic because
values read from the array in the worklet were typed as const T rather
than simply T. Any Vec or Matrix built from that type would then fail
because they are not meant to work with a const value (which means they
have to be set on construction and never changed.

Instead, declare ConstArrayPortalFromThrust<T> and internally set all
the Thrust pointers to have type const T. Also declare other thrust
pointers used as method parameters to have const T rather than T. This
should work as conversion from T to const T should be fine, but not the
other way around.
2015-11-06 18:05:21 -07:00
Patricia Kroll Fasel - 090207
4757c0ae9e Merge branch 'master' of gitlab.kitware.com:Fasel/vtk-m into cell_to_point 2015-11-04 13:28:08 -07:00
Patricia Kroll Fasel - 090207
a5f1f823ae Set default device to cuda in unit test of DataSetExplicit to bypass
compiler errors in CellSetExplicit.
2015-11-04 10:16:44 -07:00
Patricia Kroll Fasel - 090207
480f0bd416 Merge branch 'master' of gitlab.kitware.com:Fasel/vtk-m into cell_to_point 2015-11-03 13:48:23 -07:00
Robert Maynard
1b30d6e6de Update Cuda so that SumExclusiveScan supports fancy iterators. 2015-11-03 13:28:07 -05:00
Robert Maynard
97550d5e2d Update Cuda so that UnaryPredictes work with fancy cuda array handles. 2015-11-03 13:28:07 -05:00
Patricia Kroll Fasel - 090207
02e16e7e25 Merge branch 'master' of gitlab.kitware.com:Fasel/vtk-m into cell_to_point 2015-11-03 11:15:26 -07:00
Sujin Philip
1b8fe17f1b Fix for several warnings 2015-11-03 09:11:38 -05:00
Sujin Philip
fd244c4142 Fix errors and warnings caused by recent changes to device adapter tag logic 2015-11-03 09:11:38 -05:00
Robert Maynard
71cf2a7d92 Fix return statement in DeviceAdapterAlgorithmThrust. 2015-11-02 16:46:02 -05:00
Patricia Kroll Fasel - 090207
d167c75596 Merge branch 'master' of gitlab.kitware.com:Fasel/vtk-m into cell_to_point 2015-11-02 11:09:07 -07:00
Patricia Kroll Fasel - 090207
ed0ecf284d Parallel CellToPoint initial code. 2015-10-30 13:59:36 -06:00
Robert Maynard
85d28667c2 Add a return statement to reduce to stop false positive warnings. 2015-10-30 14:55:13 -04:00
Kenneth Moreland
b861209a22 Fix nvcc warnings on MSVC
There is a strange nvcc warning in CUDA 7.5 that sometimes happens on MSVC
that causes it to emit a warning for an undefined method that is clearly
defined. The CUDA development team is aware of the problem and is going
to fix it, but these changes will work around the problem for now.

Thanks to Tom Fogal from NVIDIA for these fixes.
2015-10-21 08:33:15 -06:00
Robert Maynard
8de216c088 Propagate vtkm::Id3 scheduling down to the ThreadIndex classes.
This now allows for even more efficient construction of uniform point
coordinates when running under the 3d scheduler, since we don't need to go
from 3d index to flat index to 3d index, instead we stay in 3d index
2015-10-20 09:29:41 -04:00
T.J. Corona
b1665dcb32 Add an array handle for bare cuda device pointers.
Array handles for cuda device pointers have been implemented. The data for
these handles exists solely on the exec side (info such as length can be
queried from the cont side).
2015-10-09 12:41:33 -04:00
Robert Maynard
f38673f618 Replace ErrorControlOutOfMemory with ErrorControlBadAllocation. 2015-10-01 14:25:28 -04:00
Robert Maynard
056f69bf96 Remove unused variable and conversion warnings from cuda code. 2015-09-21 14:17:25 -04:00
Sujin Philip
1d2657f360 Make cuda DeviceAdapter valid only when using nvcc
Before it was valid even on a regular compiler, if cuda was available.
2015-09-18 12:21:11 -04:00
hschroot
20c1a04894 CopyInto function for ArrayHandles
ArrayHandles in DAX have a CopyInto function which allows the user to copy an array handle's data into a compatible STL type iterator. Originally this was fairly straight forward to implement since array handles in DAX are templated on the DeviceAdapterTag. In contrast, VTKm array handles use a polymorphic ArrayHandleExecutionManager under the hood allowing a single array handle to interface with multiple devices at runtime. To achieve this virtual functions are used. This makes implementing the CopyInto function difficult since it is templated on the IteratorType and virtual functions cannot be templated.

To work around this, I've implemented a concrete templated CopyInto function in the class derived from ArrayHandleExecutionManagerBase. In the ArrayHandle class, CopyInto dynamically casts the base class into the derived class, then calls the CopyInto function defined in the derived class.

The drawback to this approach is that, should the user define their own class that inherits from ArrayHandleExectionManagerBase, they are not forced to implement the CopyInto function, unlike the other virtual functions.
2015-09-17 14:26:19 -04:00
Robert Maynard
fd68521066 Always install all device headers even when device isn't enabled.
vtkm_declare_headers now is able to not test headers, by using the
TESTABLE keyword.
2015-09-17 09:28:21 -04:00
Robert Maynard
4d635d642b DeviceAdapter Tags now always exist, and contain if the device is valid.
Previously it was really hard to verify if a device adapter was valid. Since
you would have to check for the existence of the tag. Now the tag always
exists, but instead you query the traits of the DeviceAdapter to see if
it is a valid adapter.

This makes compiling with multiple backends alot easier.
2015-09-17 09:28:21 -04:00
Robert Maynard
6272fdcc54 Correct multiple signature compile issue. 2015-09-08 09:39:57 -04:00
Robert Maynard
aa7f5c34b9 Resolves Issue #42. Now all thrust API calls are in try/catch blocks. 2015-09-07 12:20:31 -04:00
Robert Maynard
72450e87f3 Make thrust use fast paths when doing sort and scan.
By introducing our own custom thrust execution policy we can make sure
to hit the fastest code paths in thrust for the sort operation. This makes
sure that for UInt32,Int32, and Float32 we use the radix sort from thrust
which offers a 2x to 3x speed improvement over the merge sort implementation.

Secondly by telling thrust that our BinaryOperators are commutative we
make sure that we get the fastest code paths when executing Inclusive
and Exclusive Scan

Benchmark 'Radix Sort on 1048576 random values vtkm::Int32' results:
  median = 0.0117049s
  median abs dev = 0.00324614s
  mean = 0.0167615s
  std dev = 0.00786269s
  min = 0.00845875s
  max = 0.0389063s
Benchmark 'Radix Sort on 1048576 random values vtkm::Float32' results:
  median = 0.0234463s
  median abs dev = 0.000317249s
  mean = 0.021452s
  std dev = 0.00470307s
  min = 0.011255s
  max = 0.0250643s
Benchmark 'Merge Sort on 1048576 random values vtkm::Int32' results:
  median = 0.0310486s
  median abs dev = 0.000182129s
  mean = 0.0286914s
  std dev = 0.00634102s
  min = 0.0116225s
  max = 0.0317379s
Benchmark 'Merge Sort on 1048576 random values vtkm::Float32' results:
  median = 0.0310617s
  median abs dev = 0.000193583s
  mean = 0.0295779s
  std dev = 0.00491531s
  min = 0.0147257s
  max = 0.032307s
2015-09-03 16:00:37 -04:00
Robert Maynard
8422108f28 Cuda copy from host to device can't use the cuda execution policy. 2015-09-02 09:53:00 -04:00
Robert Maynard
efc9f0c5cf All occurrences of thrust invocation uses an execution policy. 2015-09-01 19:32:49 -04:00
Sujin Philip
514ac54e59 Add custom operator and initial value support to ExclusiveScan 2015-08-28 09:56:04 -04:00
Robert Maynard
619103b202 Update cuda ScanExclusive to handle dereferencing device only arrays. 2015-08-26 11:51:02 -04:00
Robert Maynard
157d8efee4 Workaround thrust 1.8 inclusive scan issue.
Starting in thrust 1.8 the implementation of scan inclusive inside
thrust became highly optimized by using parallel task groups. This
new implementation has a bug that only exists when using custom
binary operators, large size arrays, release mode, and no
debugger or mem-checker attached.

While I have submitted the issue to thrust, we need to be able
to work around the existing issue. The solution I have chosen is
to mark all vtkm::exec::cuda::interal::WrappedBinaryOperators
as being commutative as far as thrust is concerened. To make
sure we don't get any unexpected behavior I have also had
to create WrappedBinaryPredicate so that we don't mark any
predicate as commutative.
2015-08-17 10:39:14 -04:00
Robert Maynard
ab59e34a2f Rename pragma header guard so it makes sense for tbb and thrust.
Boost is not the only thirdparty that we are supressing warnings for, so
make the name more generic.
2015-08-13 09:04:23 -04:00
Robert Maynard
8204db2f6a Use VTKM_BOOST_PRE_INCLUDE around thrust headers too. 2015-08-13 08:26:41 -04:00
Robert Maynard
bae6ff7f55 Merge branch 'introduce_binary_and_unary_operators' into 'master'
Introduce binary and unary operators

See merge request !94
2015-08-06 15:14:28 -04:00
Robert Maynard
d3fd571ef2 Add vtkm/UnaryPredicates header.
Currently includes the following predicates:
  - IsDefaultConstructor
  - NotDefaultConstructor
  - LogicalNot
2015-07-30 13:12:59 -04:00
Robert Maynard
d9fd702b1c Make detecting if we are cuda 3+ gpu running cuda 2 code faster.
The original implementing tried to run 2^31 kernels and detect a
launch failure to determine this use-case. The issue with this approach
is that on a cuda 3+ gpu, this would take multiple seconds and cause
the gpu to terminate the kernel when opengl was also loaded.
2015-07-28 17:04:24 -04:00
Robert Maynard
19aa6b8d62 Update the 3d scheduling benchmark code to use the new 1d scheduler 2015-07-23 16:31:41 -04:00
Robert Maynard
d0d11640ea Cuda now can schedule worklets that require more than 2B instances. 2015-07-23 16:31:28 -04:00
Sujin Philip
c7d3d0df5c Merge branch 'add-compute-bounds' into 'master'
Add compute bounds to Fields

See merge request !88
2015-07-22 14:26:48 -04:00
Sujin Philip
91b191bf83 Add compute bounds to Fields 2015-07-22 12:17:33 -04:00
Robert Maynard
780f3fea29 Merge branch 'correct_cuda_scheduling_over_8m' into 'master'
Correct cuda scheduling over 8m

See merge request !91
2015-07-20 15:01:05 -04:00
Robert Maynard
9f16669e3c Increase the robustness of 3d scheduling when X dim is very small. 2015-07-20 13:27:58 -04:00
Robert Maynard
7a21a08c46 Handle 1D scheduling with over 65k blocks on SM2.X arch.
The initial implementation forgot about the fact that SM2.X architectures
can only handle 65k blocks. Now we gracefully handle when compiling for
SM2.X.
2015-07-20 13:26:16 -04:00
Sujin Philip
1a9e8d1e3d Initial support for generating documentation using Doxygen 2015-07-17 15:35:59 -04:00
Robert Maynard
e74ded809a Defer more thrust iterator deduction logic to ArrayPortalToIterators. 2015-07-14 10:11:12 -04:00
Robert Maynard
4ba1f7c853 Remove the need for any portal to define an IteratorType. 2015-07-13 17:16:27 -04:00
Kenneth Moreland
4fc3626712 Fix compiler directives for icc
The Intel icc compiler tries to pretend it is gcc, but it sometimes
behaves differently. Add more explicit checks for what compiler is
being used.
2015-07-06 10:35:06 -06:00
Robert Maynard
d12375450f Make ChooseCudaDevice only consider SM3 and above equal. 2015-06-30 08:18:16 -04:00
Robert Maynard
6af949f488 Fix warnings in ChooseCudaDevice by moving performance metric to double. 2015-06-30 08:16:42 -04:00
Kenneth Moreland
3b9bb5ffeb Merge branch 'math' into 'master'
Math

Adds a set of portable math functions, most of which are "expected" to be available and are typically useful. They include things like trigonometry (sin, cos, tan, etc.), floating point information (nan, finite, etc), and min/max.

See merge request !58
2015-06-27 20:29:49 -04:00
Kenneth Moreland
4a8d69ca66 Add CUDA math test
Also fix some issues that caused the compile to fail when trying to
run some of the math functions on a CUDA device. In particular, CUDA
is picky about using a global const on a device when the const type is
not one of the basic C types.
2015-06-25 13:54:24 -06:00
Robert Maynard
47f9874132 Extend the array handle testing to be done on a per device basis.
Now we test ArrayHandle with all the vtkm types to make sure all the
transfer logic / access logic works for all T types.
2015-06-25 08:21:49 -04:00
Robert Maynard
6b70893b96 Merge branch 'clarify_unary_and_binary_operator_names' into 'master'
Clarify unary and binary operator names

Fixes the naming issues brought up in Issue #6.

See merge request !54
2015-06-23 15:49:03 -04:00
Robert Maynard
459552d795 Update the name of the binary functor for UpperBounds 2015-06-23 09:19:09 -04:00
Robert Maynard
3cbac03810 Update the name of the binary functor Unique 2015-06-23 08:58:53 -04:00
Robert Maynard
e6cadef457 Update the name of the unary functor for StreamCompact. 2015-06-23 08:56:18 -04:00
Robert Maynard
7ee3084868 Update the name of the binary functor for Sort. 2015-06-23 08:36:42 -04:00
Robert Maynard
604289c1ce Update the name of the binary functor for ScanInclusive 2015-06-23 08:26:04 -04:00
Robert Maynard
8ba0dacc65 Update the name of the binary functor for Reduce. 2015-06-22 11:55:02 -04:00
Robert Maynard
0901e71574 Update the name of the binary comparison for LowerBounds. 2015-06-22 11:49:11 -04:00
Robert Maynard
2a66dec110 Correct a bug in the logic of determining the block size. 2015-06-22 08:51:25 -04:00
Robert Maynard
fa78610ea1 Remove inclusion of iostream header. 2015-06-19 13:03:56 -04:00
Robert Maynard
6c3f669398 Remove dead code for converting zip handles to iterators for cuda.
This code isn't needed anymore as the new iterator facade handles this
use case properly.
2015-06-18 13:53:02 -04:00
Robert Maynard
9d350fb023 Correct undefined behavior that was causing scan test failures.
We need call PrepareForInput on the input argument before invoking a function.
The order of execution of parameters of a function is undefined, so we need to
make sure input is called before output, or else in-place use case breaks.
2015-06-17 15:35:27 -04:00
Robert Maynard
2ce14c7760 DeviceAdapter's now properly init all value types.
This is needed when use vtkm::Vec, as it doesn't zero the memory it contains.
2015-06-16 08:35:04 -04:00
Robert Maynard
2c91cdfa3b Update cuda/thrust backend scan algorithms to work with vec types. 2015-06-16 08:28:31 -04:00
Chun-Ming Chen
360279c29a A more general Copy that allows copying into a diff. array type. Testing codes are added. 2015-06-12 12:42:43 -04:00
Robert Maynard
2a2159b1e1 Generalize the support for zip handles inside the cuda backend.
Instead of having a single specialization for sort and zip handles,
we know handle any fancy handles being passed to the cuda device adapter.
This was done by reworking how we represent fancy iterators inside thrust,
and instead of using a transform iterator + counting iterator we just use
a iterator_facade.
2015-06-12 11:56:46 -04:00
Robert Maynard
d26cfcf5cc Merge branch 'sort_inplace_zip_arrays' into 'master'
Allow in-place sorting of ArrayHandleZip

See merge request !29
2015-06-10 15:51:40 -04:00
Robert Maynard
c052400ffb CUDA now converts ArrayHandleZip to ::thrust::zip iterator.
This is required so that we can use ArrayHandleZip with Sort/Reduce and
custom comparison operators. ArrayZip and PortalValue don't combine
well together, when used with DeviceAlgorithm that has a custom operator.
The custom operator is actually passed the PortalValue instead of
the real values, and by that point we can't fix anything since we
don't know what the original operator is.
2015-06-10 14:15:40 -04:00
Robert Maynard
549a7dd7c7 Adding support to cuda backend to write to a ZipHandle. 2015-06-10 14:15:40 -04:00
Robert Maynard
040d7e761e Explicitly invoke all thrust algorithms using the cuda policy.
This is required if we ever want to use thrust from another device adapter.
2015-06-10 09:41:06 -04:00
Robert Maynard
3a532160c9 cuda device adapter now support different value types for keys and values. 2015-06-01 17:19:33 -04:00
Robert Maynard
07970cf476 Correct all signed / unsigned and narrowing warnings ( 64bit to 32bit ). 2015-05-28 09:05:17 -04:00
Robert Maynard
d54aee7eb5 Merge branch 'fix_typo_in_copyright' 2015-05-21 10:32:08 -04:00
Robert Maynard
6b8e7822be The Copyright statement now has all the periods in the correct location. 2015-05-21 10:30:11 -04:00
Robert Maynard
f2b47ffd4a Add ReduceByKey to the DeviceAdapter. 2015-05-21 08:23:28 -04:00
Robert Maynard
be193542ac Introduce StreamCompact with a custom stencil unary predicate. 2015-05-21 08:23:28 -04:00
Robert Maynard
e38caafe37 Adding Reduce with custom operator to the DeviceAdapterAlgorithm. 2015-05-14 15:16:49 -04:00
Robert Maynard
5d9f369d0c Adding ScanInclusive with custom binary operator to DeviceAdapterAlgorithm. 2015-05-14 15:16:49 -04:00
Robert Maynard
9519737b3a Adding Reduce to the DeviceAdapterAlgorithm. 2015-05-14 15:16:49 -04:00
Robert Maynard
fbee20fc03 Mark the local value for the result of GetNumberOfValues as const.
We aren't modifying the value, so no reason to not have it as const.
2015-05-14 15:16:48 -04:00
Robert Maynard
59618b3a50 Merge branch 'sort_by_key' 2015-05-11 08:04:09 -04:00
Kenneth Moreland
66aa10aab7 Fix issues with warnings about data loss from type conversion
Fix compile warnings that come up with the flags

  -Wconversion -Wno-sign-conversion

This catches several instances (mostly in the testing framework) where
types are implicitly converted. I expect these changes to fix some of
the warnings we are seeing in MSVC.

I was going to add these flags to the list of extra warning flags, but
unfortunately the Thrust library has several warnings of these types,
and I don't know a good way to turn on the warnings for our code but
turn them off for Thrust.
2015-05-07 16:34:06 -06:00
Robert Maynard
6564d7af1c Enable SortByKey Test on the Device Adapter. 2015-05-06 09:27:59 -04:00
Robert Maynard
c2f2c166c0 Test all Fancy ArrayHandles and Array Portals on every device adapter.
Fancy array handles being Counting, Implicit, Permutation, Transform, and Zip.
2015-05-05 14:25:48 -04:00
Kenneth Moreland
d15074153d Remove comments about textures from ArrayManagerExecutionThrustDevice
Per a discussion with Robert Maynard, we no longer have plans to bind
textures or use texture objects. Instead, we are now using the __ldg
command to load from global memory inside the execution portal, which
gives us most of the benefits of textures, but doesn't incur any
bookkeeping.
2015-05-04 14:18:00 -06:00
Kenneth Moreland
e14fc427f0 Change storage references to storage pointers.
Previously ArrayTransfer and ArrayManagerExecution received a reference
to a Storage class in their constructor and held the reference as an
ivar. In retrospect, this is just asking for trouble. First, it is way
too easy to pass by value when you mean to pass by reference. Second, if
there ever is a bug where the Storage goes out of scope before the
classes holding a reference, it is that much harder to debug.
2015-04-30 21:29:58 -06:00
Kenneth Moreland
ec0adf8b16 Change interface of ArrayTransfer to be more like ArrayHandle.
This includes changing methods like LoadDataForInput to PrepareForInput.
It also changed the interface a bit to save a reference to the storage
object. (Maybe it would be better to save a pointer?) These changes also
extend up to the ArrayManagerExecution class, so it can effect device
adapter implementations.
2015-04-30 21:07:36 -06:00
Kenneth Moreland
c5b831b726 Remove LoadDataForInput that takes portals.
This API change effects both ArrayTransfer and ArrayManagerExecution.
This is in preparation for a future change to make the API more
consistent with ArrayHandle.
2015-04-28 11:30:30 -04:00
Robert Maynard
30c5ee9cf4 Remove SetThrustForCuda as we explicitly use the thrust cuda system.
The only reason SetThrustForCuda was ever needed was that we used
::thrust::device_vector instead of ::thrust::system::cuda::vector.
2015-04-23 09:23:41 -04:00
Robert Maynard
c806348c94 Remove debug code from cuda schedule algorithm. 2015-04-16 08:56:08 -04:00
Robert Maynard
3a10a63e8c make sure the 3d block size makes sense. 2015-04-08 15:57:35 -04:00
Robert Maynard
fdac208acb use cuda scheduling versus thrust scheduling. 2015-04-03 10:18:05 -04:00
Robert Maynard
d340a573d9 don't clear ::thrust::vectors as that is not our intended behavior. 2015-03-23 09:01:42 -04:00
Robert Maynard
86dc8f1d38 Move back to thrust::cuda::vector to properly handle allocating uint8's
Our approach of using the underlying allocator inside thrust was a bad approach,
for some reason it fails to properly allocate uint8's or int8's on the correct
boundaries. I expect that this logic is somewhere else in the code and
instead we should use thrust::system::cuda::vector which does this properly.
2015-03-10 10:10:03 -04:00
Robert Maynard
63b1f03187 Simplify the implementation of loading through textures.
We don't need this super complicated system for texture loading.
2015-03-09 16:37:45 -04:00
Robert Maynard
9b49973621 Use __ldg instead of texture object. 2015-03-05 18:31:44 -05:00
Robert Maynard
3c8ce36666 Properly deallocate cuda memory when we are done with it. 2015-01-28 15:54:45 -05:00
Robert Maynard
46df484ca7 Redesign the way we implement when to choose texture portals.
The logic to select texture portals is easier to understand now.
2015-01-20 13:58:28 -05:00
Robert Maynard
e26040282b Properly map TexturePortals to Iterators.
Also at the same time disable the Texture support by default.
2015-01-20 13:58:28 -05:00
Robert Maynard
04bc41cad3 Fix multiple issue in the cuda array handle unit tests.
The namespaces need to be different for each test, or else only the first
implementation of the function will be used for all tests that call that
function.

Also updated the test to verify that we can count starting from a non zero
number.
2015-01-20 09:03:24 -05:00
Robert Maynard
1b5c5a6ce5 Add in initial support for texture binding of input arrays. 2014-12-19 13:47:28 -05:00
Robert Maynard
a509dae909 Use pinned memory for error reporting on the cuda backend.
This reduces the amount of explicit cuda Device to Host memory copies.
2014-12-19 13:47:28 -05:00
Robert Maynard
d9270e408d Adding a cuda device adapter to vtkm.
Porting the dax device adapter over to vtkm. Unlike the dax version, doesn't
use the thrust::device_vector, but instead uses thrust::system calls so that
we can support multiple thrust based backends.

Also this has Texture Memory support for input array handles. Some more work
will need to be done to ArrayHandle so that everything works when using an
ArrayHandle inplace with texture memory bindings.
2014-12-19 13:47:28 -05:00