Commit Graph

11 Commits

Author SHA1 Message Date
Kenneth Moreland
c3a3184d51 Update copyright for Sandia
Sandia National Laboratories recently changed management from the
Sandia Corporation to the National Technology & Engineering Solutions
of Sandia, LLC (NTESS). The copyright statements need to be updated
accordingly.
2017-09-20 15:33:44 -06:00
Robert Maynard
5dd346007b Respect VTK-m convention of parameters all or nothing on a line
clang-format BinPack settings have been disabled to make sure that the
VTK-m style guideline is obeyed.
2017-05-26 13:53:28 -04:00
Kitware Robot
4ade5f5770 clang-format: apply to the entire tree 2017-05-25 07:51:37 -04:00
Kenneth Moreland
fdaccc22db Remove exports for header-only functions/methods
Change the VTKM_CONT_EXPORT to VTKM_CONT. (Likewise for EXEC and
EXEC_CONT.) Remove the inline from these macros so that they can be
applied to everything, including implementations in a library.

Because inline is not declared in these modifies, you have to add the
keyword to functions and methods where the implementation is not inlined
in the class.
2016-11-15 22:22:13 -07:00
Robert Maynard
c1560e2d3f Perform less unnecessary copies when deducing a worklets parameters.
One of the causes of the large library size and slow compile times has been
that vtkm has been creating unnecessary copies when not needed. When the
objects being copied use shared_ptr this causes a bloom in library size. I
presume this bloom is caused by the atomic increment/decrement that is
required by shared_ptr.

For testing I used the following example:
```
struct ExampleFieldWorklet : public vtkm::worklet::WorkletMapField
{
  typedef void ControlSignature( FieldIn<>, FieldIn<>, FieldIn<>,
                                 FieldOut<>, FieldOut<>, FieldOut<> );
  typedef void ExecutionSignature( _1, _2, _3, _4, _5, _6 );

  template<typename T, typename U, typename V>
  VTKM_EXEC_EXPORT
  void operator()( const vtkm::Vec< T, 3 > & vec,
                   const U & scalar1,
                   const V& scalar2,
                   vtkm::Vec<T, 3>& out_vec,
                   U& out_scalar1,
                   V& out_scalar2 ) const
  {
    out_vec = vec * scalar1;
    out_scalar1 = scalar1 + scalar2;
    out_scalar2 = scalar2;
  }

  template<typename T, typename U, typename V, typename W, typename X, typename Y>
  VTKM_EXEC_EXPORT
  void operator()( const T & vec,
                   const U & scalar1,
                   const V& scalar2,
                   W& out_vec,
                   X& out_scalar,
                   Y& ) const
  {
  //no-op
  }
};

int main(int argc, char** argv)
{
  std::vector< vtkm::Vec<vtkm::Float32, 3> > inputVec;
  std::vector< vtkm::Int32 > inputScalar1;
  std::vector< vtkm::Float64 > inputScalar2;

  vtkm::cont::ArrayHandle< vtkm::Vec<vtkm::Float32, 3> > handleV =
    vtkm::cont::make_ArrayHandle(inputVec);

  vtkm::cont::ArrayHandle< vtkm::Vec<vtkm::Float32, 3> > handleS1 =
    vtkm::cont::make_ArrayHandle(inputVec);

  vtkm::cont::ArrayHandle< vtkm::Vec<vtkm::Float32, 3> > handleS2 =
    vtkm::cont::make_ArrayHandle(inputVec);

  vtkm::cont::ArrayHandle< vtkm::Vec<vtkm::Float32, 3> > handleOV;
  vtkm::cont::ArrayHandle< vtkm::Vec<vtkm::Float32, 3> > handleOS1;
  vtkm::cont::ArrayHandle< vtkm::Vec<vtkm::Float32, 3> > handleOS2;

  std::cout << "Making 3 output DynamicArrayHandles " << std::endl;
  vtkm::cont::DynamicArrayHandle out1(handleOV), out2(handleOS1), out3(handleOS2);

  typedef vtkm::worklet::DispatcherMapField<ExampleFieldWorklet> DispatcherType;

  std::cout << "Invoking ExampleFieldWorklet" << std::endl;
  DispatcherType dispatcher;

  dispatcher.Invoke(handleV, handleS1, handleS2, out1, out2, out3);

}
```

Original vtkm would generate a binary of size 4684kb and would perform 91
ArrayHandle copies or assignments. With this branch the binary size is
reduced to 2392kb and will perform 36 copies or assignments.
2016-01-19 09:20:49 -05:00
Robert Maynard
05d397cbf7 Remove unnecessary template parameters from DispatcherMapField
DispatcherMapField was templated on the device adapter but it
actually doesn't need to be, only BasicInvoke and subsequent
methods need to be templated on the device.
2015-10-22 12:22:59 -04:00
Kenneth Moreland
99ce66c6fe Change Fetches to use ThreadIndices instead of Invocation.
Previously, all Fetch objects received an Invocation object in their
Load and Store methods. The point of this was that it allowed the Fetch
to get data from any of the execution objects. However, every Fetch
either just got data directly from its associated execution object or
else used a secondary execution object (the input domain) to get indices
into their own execution object.

This left two potential areas for improvement. First, pulling data out
of the Invocation object was unnecessarily complicated. It would be much
nicer to get data directly from the associated execution object. Second,
when getting index information from the input domain, it was often the
case that extra computations were necessary (particularly on structured
cell sets). There was no way to share the index information among
Fetches, and therefore the computations were replicated.

This change removes the Invocation from the Fetch Load and Store.
Instead, it passes the associated execution object and a new object type
called the ThreadIndices. The ThreadIndices are customized for the input
domain and therefore have all the information needed for a redirected
lookup. It is also a thread-local object so it can cache computed
indices and save on computation time.
2015-10-07 17:01:42 -06:00
Robert Maynard
6b8e7822be The Copyright statement now has all the periods in the correct location. 2015-05-21 10:30:11 -04:00
Kenneth Moreland
dad18e1170 Improve 3D scheduling mechanism in DispatcherBase
Previously, DispatcherBase had an ivar to determine whether to use the
numInstances passed on the stack or to use a 3D range held in a
different ivar. This change allows either a 1D range or 3D range to be
passed on the stack, which I expect to be closer to how we we handle
this when 3D ranges are fully supported.

This also fixes a bug introduced with commit
fdac208acbfa47b613d899a36cefc32a01e8f0a8 where the Use3DSchedule ivar
was not set correctly in UnitTestDispatcherBase.
2015-05-05 08:46:23 -06:00
Robert Maynard
fdac208acb use cuda scheduling versus thrust scheduling. 2015-04-03 10:18:05 -04:00
Kenneth Moreland
80809a8f0f Add DispatcherMapField.
This is a simple version of a dispatcher, but an important one.

Note that there is an issue brought up with UnitTestWorkletMapField in
that there needs to be better ways to specify worklet argument types.
2014-10-21 13:10:00 -06:00