vtk-m2

Author	SHA1	Message	Date
Christopher Sewell	ced9fd32db	Improving streaming dispatcher	2016-08-03 15:34:43 -06:00
Christopher Sewell	2caf81a4af	First commit for ArrayHandleStreaming	2016-08-02 16:54:12 -06:00
Robert Maynard	c1560e2d3f	Perform fewer unnecessary copies when deducing a worklet's parameters.

One of the causes of the large library size and slow compile times has been that vtkm has been creating copies when not needed. When the objects being copied use shared_ptr this causes a bloom in library size. I presume this bloom is caused by the atomic increment/decrement that is required by shared_ptr.

For testing I used the following example:

```
struct ExampleFieldWorklet : public vtkm::worklet::WorkletMapField
{
  typedef void ControlSignature( FieldIn<>, FieldIn<>, FieldIn<>,
                                 FieldOut<>, FieldOut<>, FieldOut<> );
  typedef void ExecutionSignature( _1, _2, _3, _4, _5, _6 );

  template<typename T, typename U, typename V>
  VTKM_EXEC_EXPORT
  void operator()( const vtkm::Vec< T, 3 > & vec,
                   const U & scalar1,
                   const V& scalar2,
                   vtkm::Vec<T, 3>& out_vec,
                   U& out_scalar1,
                   V& out_scalar2 ) const
  {
    out_vec = vec * scalar1;
    out_scalar1 = scalar1 + scalar2;
    out_scalar2 = scalar2;
  }

  template<typename T, typename U, typename V, typename W, typename X, typename Y>
  VTKM_EXEC_EXPORT
  void operator()( const T & vec,
                   const U & scalar1,
                   const V& scalar2,
                   W& out_vec,
                   X& out_scalar,
                   Y& ) const
  {
    //no-op
  }
};

int main(int argc, char** argv)
{
  std::vector< vtkm::Vec<vtkm::Float32, 3> > inputVec;
  std::vector< vtkm::Int32 > inputScalar1;
  std::vector< vtkm::Float64 > inputScalar2;

  vtkm::cont::ArrayHandle< vtkm::Vec<vtkm::Float32, 3> > handleV =
    vtkm::cont::make_ArrayHandle(inputVec);
  vtkm::cont::ArrayHandle< vtkm::Vec<vtkm::Float32, 3> > handleS1 =
    vtkm::cont::make_ArrayHandle(inputVec);
  vtkm::cont::ArrayHandle< vtkm::Vec<vtkm::Float32, 3> > handleS2 =
    vtkm::cont::make_ArrayHandle(inputVec);

  vtkm::cont::ArrayHandle< vtkm::Vec<vtkm::Float32, 3> > handleOV;
  vtkm::cont::ArrayHandle< vtkm::Vec<vtkm::Float32, 3> > handleOS1;
  vtkm::cont::ArrayHandle< vtkm::Vec<vtkm::Float32, 3> > handleOS2;

  std::cout << "Making 3 output DynamicArrayHandles " << std::endl;
  vtkm::cont::DynamicArrayHandle out1(handleOV), out2(handleOS1), out3(handleOS2);

  typedef vtkm::worklet::DispatcherMapField<ExampleFieldWorklet> DispatcherType;

  std::cout << "Invoking ExampleFieldWorklet" << std::endl;
  DispatcherType dispatcher;
  dispatcher.Invoke(handleV, handleS1, handleS2, out1, out2, out3);
}
```

Original vtkm would generate a binary of size 4684kb and would perform 91 ArrayHandle copies or assignments. With this branch the binary size is reduced to 2392kb and will perform 36 copies or assignments.	2016-01-19 09:20:49 -05:00
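The copy counts quoted in the commit above can be illustrated with a minimal standalone sketch. It does not use VTK-m: `CountingHandle`, `SizeByValue`, and `SizeByReference` are hypothetical names invented for this example, and the handle only mimics the idea of an array handle that shares its storage through `shared_ptr`.

```cpp
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical stand-in for a handle that shares storage via shared_ptr.
// Each copy or assignment bumps a counter (and the atomic reference count).
struct CountingHandle
{
  static int Copies;
  std::shared_ptr<std::vector<float>> Data;

  CountingHandle() : Data(std::make_shared<std::vector<float>>()) {}
  CountingHandle(const CountingHandle& other) : Data(other.Data) { ++Copies; }
  CountingHandle& operator=(const CountingHandle& other)
  {
    this->Data = other.Data;
    ++Copies;
    return *this;
  }
};
int CountingHandle::Copies = 0;

// Passing by value copies the handle on every call with an lvalue argument.
inline std::size_t SizeByValue(CountingHandle handle)
{
  return handle.Data->size();
}

// Passing by const reference leaves the copy counter untouched.
inline std::size_t SizeByReference(const CountingHandle& handle)
{
  return handle.Data->size();
}
```

Calling `SizeByValue` with a named handle increments `CountingHandle::Copies`, while `SizeByReference` does not, which is the kind of difference the commit reports when it goes from 91 to 36 copies or assignments.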
Robert Maynard	05d397cbf7	Remove unnecessary template parameters from DispatcherMapField DispatcherMapField was templated on the device adapter but it actually doesn't need to be, only BasicInvoke and subsequent methods need to be templated on the device.	2015-10-22 12:22:59 -04:00
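As a rough illustration of the refactoring described in the commit above, the following sketch moves the device parameter from the class template to the method that needs it. This is not VTK-m's actual code: `DispatcherSketch`, `DeviceSerial`, `DeviceCuda`, and `DoubleIt` are hypothetical names, and only `BasicInvoke` is borrowed from the commit message.

```cpp
#include <string>

// Hypothetical device tags; real device adapters are far richer.
struct DeviceSerial { static std::string Name() { return "serial"; } };
struct DeviceCuda   { static std::string Name() { return "cuda"; } };

// The class is templated on the worklet only.
template <typename WorkletType>
class DispatcherSketch
{
public:
  explicit DispatcherSketch(const WorkletType& worklet = WorkletType())
    : Worklet(worklet)
  {
  }

  // Only the method that actually needs the device is templated on it.
  template <typename Device>
  std::string BasicInvoke(int value, Device) const
  {
    return Device::Name() + ":" + std::to_string(this->Worklet(value));
  }

private:
  WorkletType Worklet;
};

// Hypothetical worklet used to exercise the sketch.
struct DoubleIt
{
  int operator()(int x) const { return 2 * x; }
};
```

With this layout, one `DispatcherSketch<DoubleIt>` type can be invoked on either device tag, so the class is instantiated once per worklet rather than once per worklet and device pair.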
Kenneth Moreland	99ce66c6fe	Change Fetches to use ThreadIndices instead of Invocation. Previously, all Fetch objects received an Invocation object in their Load and Store methods. The point of this was that it allowed the Fetch to get data from any of the execution objects. However, every Fetch either just got data directly from its associated execution object or else used a secondary execution object (the input domain) to get indices into their own execution object. This left two potential areas for improvement. First, pulling data out of the Invocation object was unnecessarily complicated. It would be much nicer to get data directly from the associated execution object. Second, when getting index information from the input domain, it was often the case that extra computations were necessary (particularly on structured cell sets). There was no way to share the index information among Fetches, and therefore the computations were replicated. This change removes the Invocation from the Fetch Load and Store. Instead, it passes the associated execution object and a new object type called the ThreadIndices. The ThreadIndices are customized for the input domain and therefore have all the information needed for a redirected lookup. It is also a thread-local object so it can cache computed indices and save on computation time.	2015-10-07 17:01:42 -06:00
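The design described in the commit above can be sketched in plain C++ as follows. This is an illustration under stated assumptions, not the VTK-m implementation: `StructuredDomain`, `FetchDirect`, and `FetchLogicalIndex` are hypothetical names, a `std::vector` stands in for the execution object, and the `ThreadIndices` here is a simplified guess at the idea of caching indices once per thread.

```cpp
#include <array>
#include <cstddef>
#include <vector>

// Hypothetical structured input domain: a grid of dimX * dimY * dimZ elements.
struct StructuredDomain
{
  std::size_t dimX, dimY, dimZ;
};

// Thread-local index bundle. The logical (i, j, k) index is computed once
// in the constructor and then shared by every fetch for this thread.
struct ThreadIndices
{
  std::size_t flatIndex;
  std::array<std::size_t, 3> ijk;

  ThreadIndices(std::size_t flat, const StructuredDomain& domain)
    : flatIndex(flat)
    , ijk{ { flat % domain.dimX,
             (flat / domain.dimX) % domain.dimY,
             flat / (domain.dimX * domain.dimY) } }
  {
  }
};

// A fetch receives the thread indices plus only its own execution object,
// instead of digging through a whole invocation object.
struct FetchDirect
{
  template <typename T>
  T Load(const ThreadIndices& indices, const std::vector<T>& portal) const
  {
    return portal[indices.flatIndex];
  }

  template <typename T>
  void Store(const ThreadIndices& indices, std::vector<T>& portal, const T& value) const
  {
    portal[indices.flatIndex] = value;
  }
};

// A second fetch reuses the cached logical index without recomputing it.
struct FetchLogicalIndex
{
  std::array<std::size_t, 3> Load(const ThreadIndices& indices) const
  {
    return indices.ijk;
  }
};
```

Because both fetches read from the same `ThreadIndices` object, the division and modulo work for the structured index happens once per thread rather than once per fetch.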
Robert Maynard	6b8e7822be	The Copyright statement now has all the periods in the correct location.	2015-05-21 10:30:11 -04:00
Kenneth Moreland	dad18e1170	Improve 3D scheduling mechanism in DispatcherBase Previously, DispatcherBase had an ivar to determine whether to use the numInstances passed on the stack or to use a 3D range held in a different ivar. This change allows either a 1D range or 3D range to be passed on the stack, which I expect to be closer to how we handle this when 3D ranges are fully supported. This also fixes a bug introduced with commit fdac208acbfa47b613d899a36cefc32a01e8f0a8 where the Use3DSchedule ivar was not set correctly in UnitTestDispatcherBase.	2015-05-05 08:46:23 -06:00
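One way to picture passing either a 1D or a 3D range on the stack, as the commit above describes, is overload resolution on the range type. The sketch below is an assumption about the general idea, not the DispatcherBase code: `Range3D`, `Schedule`, and `CountVisits` are hypothetical names, and `BasicInvoke` here is a free function for brevity.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical 3D range; a plain std::size_t plays the role of a 1D range.
struct Range3D
{
  std::size_t X, Y, Z;
};

// 1D schedule: visit every flat index once.
template <typename Functor>
void Schedule(Functor& functor, std::size_t numInstances)
{
  for (std::size_t i = 0; i < numInstances; ++i)
  {
    functor(i);
  }
}

// 3D schedule: walk the range and flatten (i, j, k) to a single index.
template <typename Functor>
void Schedule(Functor& functor, const Range3D& range)
{
  for (std::size_t k = 0; k < range.Z; ++k)
    for (std::size_t j = 0; j < range.Y; ++j)
      for (std::size_t i = 0; i < range.X; ++i)
        functor(i + range.X * (j + range.Y * k));
}

// The range travels as an argument, so no member flag is needed to
// choose between 1D and 3D scheduling; the overload decides.
template <typename Functor, typename RangeType>
void BasicInvoke(Functor& functor, const RangeType& range)
{
  Schedule(functor, range);
}

// Hypothetical functor that records how often each index is visited.
struct CountVisits
{
  std::vector<int> Visits;
  explicit CountVisits(std::size_t n) : Visits(n, 0) {}
  void operator()(std::size_t index) { ++Visits[index]; }
};
```

Since the range type selects the schedule at the call site, there is no state comparable to the Use3DSchedule ivar that could be left unset.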
Robert Maynard	fdac208acb	use cuda scheduling versus thrust scheduling.	2015-04-03 10:18:05 -04:00
Kenneth Moreland	80809a8f0f	Add DispatcherMapField. This is a simple version of a dispatcher, but an important one. Note that there is an issue brought up with UnitTestWorkletMapField in that there needs to be better ways to specify worklet argument types.	2014-10-21 13:10:00 -06:00

9 Commits