vtk-m2

Author	SHA1	Message	Date
Kenneth Moreland	1a538ca196	Merge branch 'scatter-worklets' into 'master' Scatter in worklets Add the functionality to perform a scatter operation from input to output in a worklet invocation. This allows you to, for example, specify a variable amount of outputs generated for each input. See merge request !221	2015-11-11 13:09:47 -05:00
Robert Maynard	b3687c6f3c	Workaround inclusive_scan issues in thrust 1.8.X for complex value types. The original workaround for inclusive_scan bugs in thrust 1.8 only solved the issue for basic arithmetic types such as int, float, double. Now we go one step further and fix the problem for all types. The solution is to provide a proper implementation of destructive_accumulate_n and make sure it exists before any includes of thrust occur.	2015-11-09 17:14:30 -05:00
Kenneth Moreland	f7789f0ed7	Fix issue with const types in Thrust array management Previously, there was a declaration ConstArrayPortalFromThrust<const T> in ArrayManagerExecutionThrustDevice. This proved problematic because values read from the array in the worklet were typed as const T rather than simply T. Any Vec or Matrix built from that type would then fail because they are not meant to work with a const value (which means they have to be set on construction and never changed). Instead, declare ConstArrayPortalFromThrust<T> and internally set all the Thrust pointers to have type const T. Also declare other thrust pointers used as method parameters to have const T rather than T. This should work as conversion from T to const T should be fine, but not the other way around.	2015-11-06 18:05:21 -07:00
Patricia Kroll Fasel - 090207	4757c0ae9e	Merge branch 'master' of gitlab.kitware.com:Fasel/vtk-m into cell_to_point	2015-11-04 13:28:08 -07:00
Patricia Kroll Fasel - 090207	a5f1f823ae	Set default device to cuda in unit test of DataSetExplicit to bypass compiler errors in CellSetExplicit.	2015-11-04 10:16:44 -07:00
Patricia Kroll Fasel - 090207	480f0bd416	Merge branch 'master' of gitlab.kitware.com:Fasel/vtk-m into cell_to_point	2015-11-03 13:48:23 -07:00
Robert Maynard	1b30d6e6de	Update Cuda so that SumExclusiveScan supports fancy iterators.	2015-11-03 13:28:07 -05:00
Robert Maynard	97550d5e2d	Update Cuda so that UnaryPredicates work with fancy cuda array handles.	2015-11-03 13:28:07 -05:00
Patricia Kroll Fasel - 090207	02e16e7e25	Merge branch 'master' of gitlab.kitware.com:Fasel/vtk-m into cell_to_point	2015-11-03 11:15:26 -07:00
Sujin Philip	1b8fe17f1b	Fix for several warnings	2015-11-03 09:11:38 -05:00
Sujin Philip	fd244c4142	Fix errors and warnings caused by recent changes to device adapter tag logic	2015-11-03 09:11:38 -05:00
Robert Maynard	71cf2a7d92	Fix return statement in DeviceAdapterAlgorithmThrust.	2015-11-02 16:46:02 -05:00
Patricia Kroll Fasel - 090207	d167c75596	Merge branch 'master' of gitlab.kitware.com:Fasel/vtk-m into cell_to_point	2015-11-02 11:09:07 -07:00
Patricia Kroll Fasel - 090207	ed0ecf284d	Parallel CellToPoint initial code.	2015-10-30 13:59:36 -06:00
Robert Maynard	85d28667c2	Add a return statement to reduce to stop false positive warnings.	2015-10-30 14:55:13 -04:00
Kenneth Moreland	b861209a22	Fix nvcc warnings on MSVC There is a strange nvcc warning in CUDA 7.5 that sometimes happens on MSVC that causes it to emit a warning for an undefined method that is clearly defined. The CUDA development team is aware of the problem and is going to fix it, but these changes will work around the problem for now. Thanks to Tom Fogal from NVIDIA for these fixes.	2015-10-21 08:33:15 -06:00
Robert Maynard	8de216c088	Propagate vtkm::Id3 scheduling down to the ThreadIndex classes. This now allows for even more efficient construction of uniform point coordinates when running under the 3d scheduler, since we don't need to go from 3d index to flat index to 3d index; instead, we stay in 3d index.	2015-10-20 09:29:41 -04:00
T.J. Corona	b1665dcb32	Add an array handle for bare cuda device pointers. Array handles for cuda device pointers have been implemented. The data for these handles exists solely on the exec side (info such as length can be queried from the cont side).	2015-10-09 12:41:33 -04:00
Robert Maynard	f38673f618	Replace ErrorControlOutOfMemory with ErrorControlBadAllocation.	2015-10-01 14:25:28 -04:00
Robert Maynard	056f69bf96	Remove unused variable and conversion warnings from cuda code.	2015-09-21 14:17:25 -04:00
Sujin Philip	1d2657f360	Make cuda DeviceAdapter valid only when using nvcc Before it was valid even on a regular compiler, if cuda was available.	2015-09-18 12:21:11 -04:00
hschroot	20c1a04894	CopyInto function for ArrayHandles ArrayHandles in DAX have a CopyInto function which allows the user to copy an array handle's data into a compatible STL type iterator. Originally this was fairly straightforward to implement since array handles in DAX are templated on the DeviceAdapterTag. In contrast, VTKm array handles use a polymorphic ArrayHandleExecutionManager under the hood, allowing a single array handle to interface with multiple devices at runtime. To achieve this, virtual functions are used. This makes implementing the CopyInto function difficult since it is templated on the IteratorType and virtual functions cannot be templated. To work around this, I've implemented a concrete templated CopyInto function in the class derived from ArrayHandleExecutionManagerBase. In the ArrayHandle class, CopyInto dynamically casts the base class into the derived class, then calls the CopyInto function defined in the derived class. The drawback to this approach is that, should the user define their own class that inherits from ArrayHandleExecutionManagerBase, they are not forced to implement the CopyInto function, unlike the other virtual functions.	2015-09-17 14:26:19 -04:00
Robert Maynard	fd68521066	Always install all device headers even when device isn't enabled. vtkm_declare_headers now is able to not test headers, by using the TESTABLE keyword.	2015-09-17 09:28:21 -04:00
Robert Maynard	4d635d642b	DeviceAdapter Tags now always exist, and contain whether the device is valid. Previously it was really hard to verify if a device adapter was valid, since you would have to check for the existence of the tag. Now the tag always exists, and you instead query the traits of the DeviceAdapter to see if it is a valid adapter. This makes compiling with multiple backends a lot easier.	2015-09-17 09:28:21 -04:00
Robert Maynard	6272fdcc54	Correct multiple signature compile issue.	2015-09-08 09:39:57 -04:00
Robert Maynard	aa7f5c34b9	Resolves Issue #42. Now all thrust API calls are in try/catch blocks.	2015-09-07 12:20:31 -04:00
Robert Maynard	72450e87f3	Make thrust use fast paths when doing sort and scan. By introducing our own custom thrust execution policy we can make sure to hit the fastest code paths in thrust for the sort operation. This makes sure that for UInt32, Int32, and Float32 we use the radix sort from thrust, which offers a 2x to 3x speed improvement over the merge sort implementation. Secondly, by telling thrust that our BinaryOperators are commutative we make sure that we get the fastest code paths when executing Inclusive and Exclusive Scan. Benchmark 'Radix Sort on 1048576 random values vtkm::Int32' results: median = 0.0117049s, median abs dev = 0.00324614s, mean = 0.0167615s, std dev = 0.00786269s, min = 0.00845875s, max = 0.0389063s. Benchmark 'Radix Sort on 1048576 random values vtkm::Float32' results: median = 0.0234463s, median abs dev = 0.000317249s, mean = 0.021452s, std dev = 0.00470307s, min = 0.011255s, max = 0.0250643s. Benchmark 'Merge Sort on 1048576 random values vtkm::Int32' results: median = 0.0310486s, median abs dev = 0.000182129s, mean = 0.0286914s, std dev = 0.00634102s, min = 0.0116225s, max = 0.0317379s. Benchmark 'Merge Sort on 1048576 random values vtkm::Float32' results: median = 0.0310617s, median abs dev = 0.000193583s, mean = 0.0295779s, std dev = 0.00491531s, min = 0.0147257s, max = 0.032307s.	2015-09-03 16:00:37 -04:00
Robert Maynard	8422108f28	Cuda copy from host to device can't use the cuda execution policy.	2015-09-02 09:53:00 -04:00
Robert Maynard	efc9f0c5cf	All occurrences of thrust invocation use an execution policy.	2015-09-01 19:32:49 -04:00
Sujin Philip	514ac54e59	Add custom operator and initial value support to ExclusiveScan	2015-08-28 09:56:04 -04:00
Robert Maynard	619103b202	Update cuda ScanExclusive to handle dereferencing device only arrays.	2015-08-26 11:51:02 -04:00
Robert Maynard	157d8efee4	Workaround thrust 1.8 inclusive scan issue. Starting in thrust 1.8 the implementation of scan inclusive inside thrust became highly optimized by using parallel task groups. This new implementation has a bug that only exists when using custom binary operators, large size arrays, release mode, and no debugger or mem-checker attached. While I have submitted the issue to thrust, we need to be able to work around the existing issue. The solution I have chosen is to mark all vtkm::exec::cuda::internal::WrappedBinaryOperators as being commutative as far as thrust is concerned. To make sure we don't get any unexpected behavior I have also had to create WrappedBinaryPredicate so that we don't mark any predicate as commutative.	2015-08-17 10:39:14 -04:00
Robert Maynard	ab59e34a2f	Rename pragma header guard so it makes sense for tbb and thrust. Boost is not the only third party that we are suppressing warnings for, so make the name more generic.	2015-08-13 09:04:23 -04:00
Robert Maynard	8204db2f6a	Use VTKM_BOOST_PRE_INCLUDE around thrust headers too.	2015-08-13 08:26:41 -04:00
Robert Maynard	bae6ff7f55	Merge branch 'introduce_binary_and_unary_operators' into 'master' Introduce binary and unary operators See merge request !94	2015-08-06 15:14:28 -04:00
Robert Maynard	d3fd571ef2	Add vtkm/UnaryPredicates header. Currently includes the following predicates: - IsDefaultConstructor - NotDefaultConstructor - LogicalNot	2015-07-30 13:12:59 -04:00
Robert Maynard	d9fd702b1c	Make detecting if we are cuda 3+ gpu running cuda 2 code faster. The original implementation tried to run 2^31 kernels and detect a launch failure to determine this use-case. The issue with this approach is that on a cuda 3+ gpu, this would take multiple seconds and cause the gpu to terminate the kernel when opengl was also loaded.	2015-07-28 17:04:24 -04:00
Robert Maynard	19aa6b8d62	Update the 3d scheduling benchmark code to use the new 1d scheduler	2015-07-23 16:31:41 -04:00
Robert Maynard	d0d11640ea	Cuda now can schedule worklets that require more than 2B instances.	2015-07-23 16:31:28 -04:00
Sujin Philip	c7d3d0df5c	Merge branch 'add-compute-bounds' into 'master' Add compute bounds to Fields See merge request !88	2015-07-22 14:26:48 -04:00
Sujin Philip	91b191bf83	Add compute bounds to Fields	2015-07-22 12:17:33 -04:00
Robert Maynard	780f3fea29	Merge branch 'correct_cuda_scheduling_over_8m' into 'master' Correct cuda scheduling over 8m See merge request !91	2015-07-20 15:01:05 -04:00
Robert Maynard	9f16669e3c	Increase the robustness of 3d scheduling when X dim is very small.	2015-07-20 13:27:58 -04:00
Robert Maynard	7a21a08c46	Handle 1D scheduling with over 65k blocks on SM2.X arch. The initial implementation forgot about the fact that SM2.X architectures can only handle 65k blocks. Now we gracefully handle when compiling for SM2.X.	2015-07-20 13:26:16 -04:00
Sujin Philip	1a9e8d1e3d	Initial support for generating documentation using Doxygen	2015-07-17 15:35:59 -04:00
Robert Maynard	e74ded809a	Defer more thrust iterator deduction logic to ArrayPortalToIterators.	2015-07-14 10:11:12 -04:00
Robert Maynard	4ba1f7c853	Remove the need for any portal to define an IteratorType.	2015-07-13 17:16:27 -04:00
Kenneth Moreland	4fc3626712	Fix compiler directives for icc The Intel icc compiler tries to pretend it is gcc, but it sometimes behaves differently. Add more explicit checks for what compiler is being used.	2015-07-06 10:35:06 -06:00
Robert Maynard	d12375450f	Make ChooseCudaDevice only consider SM3 and above equal.	2015-06-30 08:18:16 -04:00
Robert Maynard	6af949f488	Fix warnings in ChooseCudaDevice by moving performance metric to double.	2015-06-30 08:16:42 -04:00


106 Commits