Commit Graph

160 Commits

Author SHA1 Message Date
Chuck Atkins
f74c0d3c88 Remove type conversion related warnings for GCC 2016-03-17 13:05:38 -04:00
Robert Maynard
8683240b85 vtkm::exec::FunctorBase now properly initializes ErrorMessageBuffer. 2016-03-14 16:57:35 -04:00
Matt Larsen
5ddade7a44 Adding some basic documentation on atomics. 2016-03-09 14:29:59 -05:00
Matt Larsen
43131ee02b Adding comments about CAS 2016-03-08 09:58:20 -08:00
Matt Larsen
3b46706e1f Adding compare and swap and removing unsigned atomics 2016-03-08 09:41:02 -08:00
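The compare-and-swap (CAS) technique these atomics commits build on follows a standard retry loop. A generic C++11 sketch using std::atomic (illustrative only, not the actual VTK-m device code):

```
#include <atomic>

// Build an atomic add out of compare-and-swap: re-read the current value,
// compute the desired result, and commit it only if no other thread
// modified the value in the meantime.
float AtomicAddViaCAS(std::atomic<float>& target, float value)
{
  float expected = target.load();
  float desired = expected + value;
  // On failure, compare_exchange_weak refreshes `expected` with the
  // current value, so each retry works with up-to-date data.
  while (!target.compare_exchange_weak(expected, desired))
  {
    desired = expected + value;
  }
  return desired;
}
```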
Matt Larsen
e8b08f2e00 Merge branch 'master' into feature/atomics 2016-03-04 08:03:33 -08:00
Robert Maynard
bb90493920 Resolves Issue 52, we now install all vtkm files correctly. 2016-02-22 14:20:35 -05:00
Matt Larsen
2baac9cd8b initial commit of atomic adds 2016-02-10 07:51:31 -08:00
Robert Maynard
07d299209e Merge topic 'fix/ExecutionWholeArray'
5b705a52 Fixing return value for void function

Acked-by: Kitware Robot <kwrobot@kitware.com>
Acked-by: Robert Maynard <robert.maynard@kitware.com>
Merge-request: !334
2016-01-27 15:57:40 -05:00
mclarsen
5b705a5239 Fixing return value for void function 2016-01-27 08:07:20 -08:00
Robert Maynard
821096cfd7 Perform necessary copies when deducing a worklet's parameters.
As part of the work to reduce the number of copies of array handles, the CUDA
backend was broken. The transportation of stack-allocated classes to CUDA
relies on all member variables being value-based, not references/pointers.
This corrects the issue of sending references to host-side memory to CUDA, at
the cost of two copies of the Invocation object.

When we move to C++11 we need to revisit this work and see if std::move
can help reduce the cost of these copies.
2016-01-26 15:08:46 -05:00
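The std::move idea mentioned above would look roughly like this; a hedged C++11 sketch with a hypothetical Invocation-like type, showing why moving a shared_ptr-holding object avoids the cost a copy incurs:

```
#include <memory>
#include <utility>

// Hypothetical stand-in for an Invocation-like object whose members hold
// shared_ptr-backed state, as array handles do.
struct InvocationLike
{
  std::shared_ptr<int> Storage; // copying bumps an atomic reference count
};

// Hypothetical consumer that takes its argument by value.
void Schedule(InvocationLike inv) { (void)inv; }

void Example()
{
  InvocationLike inv{ std::make_shared<int>(42) };
  // Copying would atomically increment (and later decrement) the reference
  // count; moving transfers ownership without touching it.
  Schedule(std::move(inv));
}
```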
Robert Maynard
bd3d29577a Fix ArrayPortalFromThrust to re-enable texture memory fast path. 2016-01-26 14:30:25 -05:00
Robert Maynard
b2cd41d765 Fix ArrayPortalFromThrust to re-enable texture memory fast path. 2016-01-26 14:29:52 -05:00
Robert Maynard
dd85fc1366 Document why certain classes' member variables need to be const ref. 2016-01-19 09:29:55 -05:00
Robert Maynard
c1560e2d3f Perform fewer unnecessary copies when deducing a worklet's parameters.
One of the causes of the large library size and slow compile times has been
that vtkm has been creating unnecessary copies of objects. When the
objects being copied use shared_ptr this causes bloat in library size. I
presume this bloat is caused by the atomic increment/decrement that is
required by shared_ptr.

For testing I used the following example:
```
#include <iostream>
#include <vector>

#include <vtkm/cont/ArrayHandle.h>
#include <vtkm/cont/DynamicArrayHandle.h>
#include <vtkm/worklet/DispatcherMapField.h>
#include <vtkm/worklet/WorkletMapField.h>

struct ExampleFieldWorklet : public vtkm::worklet::WorkletMapField
{
  typedef void ControlSignature( FieldIn<>, FieldIn<>, FieldIn<>,
                                 FieldOut<>, FieldOut<>, FieldOut<> );
  typedef void ExecutionSignature( _1, _2, _3, _4, _5, _6 );

  template<typename T, typename U, typename V>
  VTKM_EXEC_EXPORT
  void operator()( const vtkm::Vec< T, 3 > & vec,
                   const U & scalar1,
                   const V& scalar2,
                   vtkm::Vec<T, 3>& out_vec,
                   U& out_scalar1,
                   V& out_scalar2 ) const
  {
    out_vec = vec * scalar1;
    out_scalar1 = scalar1 + scalar2;
    out_scalar2 = scalar2;
  }

  template<typename T, typename U, typename V, typename W, typename X, typename Y>
  VTKM_EXEC_EXPORT
  void operator()( const T & vec,
                   const U & scalar1,
                   const V& scalar2,
                   W& out_vec,
                   X& out_scalar,
                   Y& ) const
  {
  //no-op
  }
};

int main(int argc, char** argv)
{
  std::vector< vtkm::Vec<vtkm::Float32, 3> > inputVec;
  std::vector< vtkm::Int32 > inputScalar1;
  std::vector< vtkm::Float64 > inputScalar2;

  vtkm::cont::ArrayHandle< vtkm::Vec<vtkm::Float32, 3> > handleV =
    vtkm::cont::make_ArrayHandle(inputVec);

  vtkm::cont::ArrayHandle< vtkm::Int32 > handleS1 =
    vtkm::cont::make_ArrayHandle(inputScalar1);

  vtkm::cont::ArrayHandle< vtkm::Float64 > handleS2 =
    vtkm::cont::make_ArrayHandle(inputScalar2);

  vtkm::cont::ArrayHandle< vtkm::Vec<vtkm::Float32, 3> > handleOV;
  vtkm::cont::ArrayHandle< vtkm::Int32 > handleOS1;
  vtkm::cont::ArrayHandle< vtkm::Float64 > handleOS2;

  std::cout << "Making 3 output DynamicArrayHandles " << std::endl;
  vtkm::cont::DynamicArrayHandle out1(handleOV), out2(handleOS1), out3(handleOS2);

  typedef vtkm::worklet::DispatcherMapField<ExampleFieldWorklet> DispatcherType;

  std::cout << "Invoking ExampleFieldWorklet" << std::endl;
  DispatcherType dispatcher;

  dispatcher.Invoke(handleV, handleS1, handleS2, out1, out2, out3);

}
```

The original vtkm would generate a binary of 4684 KB and perform 91
ArrayHandle copies or assignments. With this branch, the binary size is
reduced to 2392 KB and 36 copies or assignments are performed.
2016-01-19 09:20:49 -05:00
Kenneth Moreland
1a538ca196 Merge branch 'scatter-worklets' into 'master'
Scatter in worklets

Add the functionality to perform a scatter operation from input to output in a worklet invocation. This allows you to, for example, specify a variable number of outputs generated for each input.

See merge request !221
2015-11-11 13:09:47 -05:00
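A hedged sketch of what the mechanism looks like from a worklet author's point of view (class and tag names taken from related commits in this log; details may differ from the merged code):

```
#include <vtkm/worklet/DispatcherMapField.h>
#include <vtkm/worklet/ScatterCounting.h>
#include <vtkm/worklet/WorkletMapField.h>

// A worklet that produces a variable number of outputs per input.
// ScatterCounting replicates each input according to a counts array;
// VisitIndex reports which copy (0..count-1) is being produced.
struct Duplicate : public vtkm::worklet::WorkletMapField
{
  typedef void ControlSignature( FieldIn<>, FieldOut<> );
  typedef void ExecutionSignature( _1, _2, VisitIndex );
  typedef vtkm::worklet::ScatterCounting ScatterType;

  template<typename T>
  VTKM_EXEC_EXPORT
  void operator()( const T& in, T& out, vtkm::IdComponent visitIndex ) const
  {
    out = in + T(visitIndex); // later copies are offset from the input
  }
};
```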
Kenneth Moreland
7b05604a66 Add more tolerance to UnitTestParametricCoordinates
I noticed a failure in a dashboard run of UnitTestParametricCoordinates.
This test uses randomly generated numbers to test the behavior of some
cell shapes, and there was an instance that occurred with seed 1447261681
that caused one of the comparisons to be just slightly larger than the
default tolerance but still within reasonable value.

I just increased the tolerance of that particular comparison. Hopefully
this will prevent all future failures.
2015-11-11 10:38:17 -07:00
Robert Maynard
b3687c6f3c Workaround inclusive_scan issues in thrust 1.8.X for complex value types.
The original workaround for inclusive_scan bugs in thrust 1.8 only solved the
issue for basic arithmetic types such as int, float, double. Now we go one
step further and fix the problem for all types.

The solution is to provide a proper implementation of destructive_accumulate_n
and make sure it exists before any includes of thrust occur.
2015-11-09 17:14:30 -05:00
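destructive_accumulate_n is an internal thrust helper used by its scan implementation. A rough, hedged sketch of the declare-before-include pattern the fix relies on (the real helper also writes partial results back into the input range, which this simplified fold omits, and it lives in a thrust detail namespace omitted here):

```
// Must be visible before any thrust header is included so that this
// definition is the one found when the scan code instantiates it.
template <typename InputIterator, typename Size, typename T, typename BinaryFunction>
T destructive_accumulate_n(InputIterator first, Size n, T init, BinaryFunction binary_op)
{
  for (Size i = 0; i < n; ++i, ++first)
  {
    init = binary_op(init, *first);
  }
  return init;
}
```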
Kenneth Moreland
8ef0a4ee50 Fix conversion warnings.
Recent changes to algorithm implementations caused CellDerivative to be
called in a way such that it gave conversion warnings on some compilers.
Fix that.
2015-11-07 13:13:17 -07:00
Kenneth Moreland
342a57efcd Fix double reference compile error. 2015-11-07 11:23:03 -07:00
Kenneth Moreland
0d394db0ce Fix conversion warnings when using double precision.
There were some conversion warnings issued when the default float was set
to 64-bit. Fixed these (on clang).
2015-11-07 06:35:24 -07:00
Kenneth Moreland
bf03243516 Add ability to multiply any Vec by vtkm::Float64.
This has been requested on the mailing list to make it easier to
interpolate integer vectors.

There are a couple of downsides to this addition. First, it implicitly
casts doubles back to whatever the vector type is, which can cause a
loss of precision. Second, it makes it more likely to get overload
errors when multiplying with Vec. In particular, the operator to cast
Vec of size 1 to the component class had to be removed.
2015-11-07 06:33:50 -07:00
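A short usage example of the new overload (a minimal sketch; the precision caveat is the one noted in the commit message):

```
#include <vtkm/Types.h>

// Midpoint of two integer vectors via the new Vec * vtkm::Float64 overload.
// The result is implicitly cast back to the Int32 component type, so
// fractional parts are lost.
vtkm::Vec<vtkm::Int32, 3> MidPoint( const vtkm::Vec<vtkm::Int32, 3>& a,
                                    const vtkm::Vec<vtkm::Int32, 3>& b )
{
  return a * 0.5 + b * 0.5;
}
```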
Kenneth Moreland
d44860c3cf Change tetrahedralize filters to use new Scatter mechanism
The tetrahedralize algorithms have been changed to use the Scatter
classes to build indices rather than build them on their own.

To implement this efficiently with structured grids, a new ScatterUniform
class was made. I also added a new execution argument tag that allows
you to get the thread indices object from within the worklet.
2015-11-07 04:57:16 -07:00
Kenneth Moreland
45abbb5c75 Share from indices vector.
Previously, each VecFromPortalPermute (the type that held the from field
values) held its own copy of the indices. For point to cell on
structured grids, this was a lot of repeated data values, which has the
potential to fill up cache and registers. Instead, just use pointer
references.
2015-11-06 18:05:21 -07:00
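The change amounts to the following pattern (a hedged sketch with hypothetical names, not the actual VecFromPortalPermute code):

```
// Before: every instance carried its own copy of the index vector,
// filling cache and registers with repeated values.
template <typename IndexVecType, typename PortalType>
struct VecFromPermuteByCopy
{
  IndexVecType Indices; // duplicated per thread
  PortalType Portal;
};

// After: instances share a single index vector through a pointer.
template <typename IndexVecType, typename PortalType>
struct VecFromPermuteByPointer
{
  const IndexVecType* Indices; // one shared copy, cheap to pass around
  PortalType Portal;
};
```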
Kenneth Moreland
f7789f0ed7 Fix issue with const types in Thrust array management
Previously, there was a declaration ConstArrayPortalFromThrust<const T>
in ArrayManagerExecutionThrustDevice. This proved problematic because
values read from the array in the worklet were typed as const T rather
than simply T. Any Vec or Matrix built from that type would then fail
because they are not meant to work with a const value (which means they
have to be set on construction and never changed).

Instead, declare ConstArrayPortalFromThrust<T> and internally set all
the Thrust pointers to have type const T. Also declare other thrust
pointers used as method parameters to have const T rather than T. This
should work, since conversion from T to const T is fine, but not the
other way around.
2015-11-06 18:05:21 -07:00
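The conversion rule relied on here is the usual one-way const qualification; a minimal illustration:

```
void ConstQualificationExample(float* data)
{
  const float* readOnly = data; // fine: T* converts implicitly to const T*
  // float* writable = readOnly; // error: const T* does not convert to T*
  (void)readOnly;
}
```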
Kenneth Moreland
7b6e6e4a66 Enable output to input map in fetch mechanism.
This changes the interface to the ThreadIndices classes to have both
input and output indices. It also adds a visit index to ThreadIndices.

Also added the VisitIndex execution signature tag, which relies on this
behavior.
2015-11-06 18:05:20 -07:00
Kenneth Moreland
b0c5a32611 Add Scatter parameters to Invocation.
We are passing in execution objects with the Invocation when the Worklet
is scheduled, but we are not using it yet.
2015-11-06 18:05:20 -07:00
Robert Maynard
97550d5e2d Update Cuda so that UnaryPredicates work with fancy cuda array handles. 2015-11-03 13:28:07 -05:00
T.J. Corona
829c1b1f7f Install missing cuda device backend header. 2015-11-02 16:44:19 -05:00
Robert Maynard
8de216c088 Propagate vtkm::Id3 scheduling down to the ThreadIndex classes.
This now allows for even more efficient construction of uniform point
coordinates when running under the 3D scheduler, since we no longer need
to go from a 3D index to a flat index and back; instead we stay in 3D
index space.
2015-10-20 09:29:41 -04:00
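For reference, the conversion the 3D scheduler now avoids is the usual row-major mapping (a minimal sketch):

```
#include <vtkm/Types.h>

// Flat index from a 3D index (row-major: x varies fastest).
vtkm::Id FlatIndex(const vtkm::Id3& idx, const vtkm::Id3& dims)
{
  return idx[0] + dims[0] * (idx[1] + dims[1] * idx[2]);
}

// 3D index recovered from a flat index; these divisions and modulos are
// exactly the per-thread work that Id3 scheduling eliminates.
vtkm::Id3 LogicalIndex(vtkm::Id flat, const vtkm::Id3& dims)
{
  return vtkm::Id3(flat % dims[0],
                   (flat / dims[0]) % dims[1],
                   flat / (dims[0] * dims[1]));
}
```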
Kenneth Moreland
99ce66c6fe Change Fetches to use ThreadIndices instead of Invocation.
Previously, all Fetch objects received an Invocation object in their
Load and Store methods. The point of this was that it allowed the Fetch
to get data from any of the execution objects. However, every Fetch
either just got data directly from its associated execution object or
else used a secondary execution object (the input domain) to get indices
into their own execution object.

This left two potential areas for improvement. First, pulling data out
of the Invocation object was unnecessarily complicated. It would be much
nicer to get data directly from the associated execution object. Second,
when getting index information from the input domain, it was often the
case that extra computations were necessary (particularly on structured
cell sets). There was no way to share the index information among
Fetches, and therefore the computations were replicated.

This change removes the Invocation from the Fetch Load and Store.
Instead, it passes the associated execution object and a new object type
called the ThreadIndices. The ThreadIndices are customized for the input
domain and therefore have all the information needed for a redirected
lookup. It is also a thread-local object so it can cache computed
indices and save on computation time.
2015-10-07 17:01:42 -06:00
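Schematically, the interface change has this shape (a rough sketch only; the exact template parameters and signatures differ):

```
#include <vtkm/Types.h>

// Before, Load/Store received the whole Invocation and dug out what they
// needed; now they receive thread-local indices plus only their own
// associated execution object.
template <typename ThreadIndicesType, typename ExecObjectType>
struct FetchSketch
{
  typedef vtkm::Float64 ValueType; // placeholder value type for the sketch

  ValueType Load(const ThreadIndicesType& indices,
                 const ExecObjectType& execObject) const;

  void Store(const ThreadIndicesType& indices,
             const ExecObjectType& execObject,
             const ValueType& value) const;
};
```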
Robert Maynard
9a8809f933 Add CellSetPermutation which allows custom iteration over a cell set.
When you create a CellSetPermutation you provide an array of the cell ids that
you want to iterate. This allows the user to do custom blanking of a data set,
or to iterate over a set of cells multiple times.
2015-10-01 09:23:10 -04:00
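A hedged usage sketch (header name, template parameter order, and constructor arguments are assumed from the description above and may not match the merged signature exactly):

```
#include <vector>

#include <vtkm/cont/ArrayHandle.h>
#include <vtkm/cont/CellSetPermutation.h>
#include <vtkm/cont/CellSetStructured.h>

// Visit only cells 2, 5, and 7 of an existing structured cell set.
void MakeSubset(const vtkm::cont::CellSetStructured<3>& originalCells)
{
  std::vector<vtkm::Id> ids;
  ids.push_back(2);
  ids.push_back(5);
  ids.push_back(7);
  vtkm::cont::ArrayHandle<vtkm::Id> cellIds = vtkm::cont::make_ArrayHandle(ids);

  typedef vtkm::cont::CellSetPermutation<
      vtkm::cont::ArrayHandle<vtkm::Id>,    // which cells, in which order
      vtkm::cont::CellSetStructured<3> > PermutedCellSet;

  PermutedCellSet subset(cellIds, originalCells);
}
```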
Robert Maynard
9965977f47 Merge topic 'FetchTagTopologyIn_return_shape_type'
a1f5bc9f FetchTagTopologyIn updated to properly return CellShape.

Acked-by: Kitware Robot <kwrobot@kitware.com>
Merge-request: !209
2015-09-30 10:18:17 -04:00
Robert Maynard
fc79055f76 Add suppression pragmas to exec::Fetch classes 2015-09-24 10:39:48 -04:00
Robert Maynard
a1f5bc9f0a FetchTagTopologyIn updated to properly return CellShape. 2015-09-23 10:45:06 -04:00
Robert Maynard
056f69bf96 Remove unused variable and conversion warnings from cuda code. 2015-09-21 14:17:25 -04:00
Kenneth Moreland
fd21a12f4a Merge branch 'xcode-7-warnings' into 'master'
Xcode 7 warnings

The XCode 7 compiler has a new warning for unused typedefs. The Boost code we use has some instances where this warning gets issued. Suppress these warnings.

See merge request !199
2015-09-17 18:12:31 -04:00
Kenneth Moreland
b15940c1e3 Declare new VTKM_STATIC_ASSERT
This is to be used in place of BOOST_STATIC_ASSERT so that we can
control its implementation.

The implementation is designed to fix the issue where the latest XCode
clang compiler gives a warning about a unused typedefs when the boost
static assert is used within a function. (This warning also happens when
using the C++11 static_assert keyword.) You can suppress this warning
with _Pragma commands, but _Pragma commands inside a block is not
supported in GCC. The implementation of VTKM_STATIC_ASSERT handles all
current cases.
2015-09-17 14:40:39 -06:00
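Usage mirrors the Boost macro it replaces; a trivial example (header name assumed):

```
#include <vtkm/StaticAssert.h>
#include <vtkm/Types.h>

void StaticAssertExample()
{
  // Works inside a function body, which is exactly the case where
  // BOOST_STATIC_ASSERT triggered the unused-typedef warning.
  VTKM_STATIC_ASSERT(sizeof(vtkm::Float64) == 8);
}
```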
Robert Maynard
9b877ef49b Merge topic 'multiple_backend_example'
fd685210 Always install all device headers even when device isn't enabled.
b1663b24 Add an example of using multiple backends from a single translation unit.
fc0ff69d Methods with try/catch need to be host only.
4d635d64 DeviceAdapter Tags now always exist, and contain if the device is valid.
cf32b430 Teach Configure.h to store if TBB and CUDA are enabled.

Acked-by: Kitware Robot <kwrobot@kitware.com>
Acked-by: Kenneth Moreland <kmorel@sandia.gov>
Merge-request: !198
2015-09-17 09:49:49 -04:00
Robert Maynard
fd68521066 Always install all device headers even when device isn't enabled.
vtkm_declare_headers is now able to skip testing headers by using the
TESTABLE keyword.
2015-09-17 09:28:21 -04:00
Kenneth Moreland
2ff6576c65 Add third party wrappers around boost macros.
The boost assert macros seem to have an issue where they define an
unused typedef. This is causing the XCode 7 compiler to issue a warning.
Since the offending code is in a macro, the warning is identified with
the VTK-m header even though the code is in boost. To get around this,
wrap all uses of the boost assert that is causing the warning in the
third party pre/post macros to disable the warning.
2015-09-16 23:34:49 -06:00
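The wrapping pattern looks like the following (macro names assumed from their use elsewhere in VTK-m):

```
// Suppress warnings that originate inside third-party headers/macros
// without silencing them for VTK-m's own code.
VTKM_THIRDPARTY_PRE_INCLUDE
#include <boost/static_assert.hpp>
VTKM_THIRDPARTY_POST_INCLUDE
```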
Robert Maynard
1d97f886e0 Remove the thrust pragma statements that are not needed. 2015-09-15 14:20:56 -04:00
Kenneth Moreland
13d4087657 Change ExecutionWholeArray interface to match expected for ArrayPortal
When ExecutionWholeArray is passed to a worklet, it is expected to
behave like an array portal. However, it was missing the
GetNumberOfValues method and the ValueType typedef. These are now added.
2015-09-09 13:30:12 -06:00
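The array-portal contract being matched is small; a minimal sketch of its shape (not the actual ExecutionWholeArray code):

```
#include <vtkm/Types.h>

template <typename T>
class PortalLike
{
public:
  typedef T ValueType;                 // the typedef that was missing

  vtkm::Id GetNumberOfValues() const;  // the method that was missing
  ValueType Get(vtkm::Id index) const;
  void Set(vtkm::Id index, const ValueType& value) const;
};
```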
Robert Maynard
5b8cc44ed4 Merge branch 'improve_sort_perf_on_thrust' into 'master'
Tell thrust to use fast code paths when using our predicates and operators.

See merge request !176
2015-09-07 10:38:17 -04:00
Hendrik Schroots
801d4dd1e5 Merge topic 'make_cont_export_macro_be_device_host'
0d6dfb1e Make it possible to use Cuda TextureMemory from device/host method.

Acked-by: Kitware Robot <kwrobot@kitware.com>
Merge-request: !181
2015-09-04 13:50:13 -04:00
Robert Maynard
72450e87f3 Make thrust use fast paths when doing sort and scan.
By introducing our own custom thrust execution policy, we can make sure
to hit the fastest code paths in thrust for the sort operation. This makes
sure that for UInt32, Int32, and Float32 we use the radix sort from thrust,
which offers a 2x to 3x speed improvement over the merge sort implementation.

Second, by telling thrust that our BinaryOperators are commutative, we
make sure that we get the fastest code paths when executing Inclusive
and Exclusive Scan.

Benchmark 'Radix Sort on 1048576 random values vtkm::Int32' results:
  median = 0.0117049s
  median abs dev = 0.00324614s
  mean = 0.0167615s
  std dev = 0.00786269s
  min = 0.00845875s
  max = 0.0389063s
Benchmark 'Radix Sort on 1048576 random values vtkm::Float32' results:
  median = 0.0234463s
  median abs dev = 0.000317249s
  mean = 0.021452s
  std dev = 0.00470307s
  min = 0.011255s
  max = 0.0250643s
Benchmark 'Merge Sort on 1048576 random values vtkm::Int32' results:
  median = 0.0310486s
  median abs dev = 0.000182129s
  mean = 0.0286914s
  std dev = 0.00634102s
  min = 0.0116225s
  max = 0.0317379s
Benchmark 'Merge Sort on 1048576 random values vtkm::Float32' results:
  median = 0.0310617s
  median abs dev = 0.000193583s
  mean = 0.0295779s
  std dev = 0.00491531s
  min = 0.0147257s
  max = 0.032307s
2015-09-03 16:00:37 -04:00
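The mechanism here is thrust's execution-policy dispatch: algorithms invoked with a user-defined policy pick up any overloads declared for that policy. A hedged sketch (type name hypothetical):

```
#include <thrust/execution_policy.h>

// Thrust algorithms called with an instance of this policy dispatch to
// overloads declared for it, which is how 32-bit integer/float sorts can
// be routed to radix sort and operators flagged as commutative for scans.
struct vtkm_cuda_policy : thrust::device_execution_policy<vtkm_cuda_policy> {};
```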
Robert Maynard
0d6dfb1e40 Make it possible to use Cuda TextureMemory from device/host method. 2015-09-03 11:52:40 -04:00
Kenneth Moreland
20c5819397 Remove unused typedef
A typedef in a method was left over from a copy/paste. Although
harmless, it was causing a (valid) warning on some compilers.
2015-09-02 13:54:54 -07:00
Kenneth Moreland
08f9c04fab Add specialization of topology map fetch for regular point coords
In the special case where you are loading the point coordinates for a
structured grid in a point to cell map (an important use case), create a
VecRectilinearPointCoordinates rather than build a Vec of the values.
This will activate the cell specializations in previous commits.

These changes also added some flat-to-logical index conversion and vice
versa in ConnectivityStructuredInternals. This change also fixed a bug
in getting cells attached to points in 2D grids. (Actually, technically
someone else fixed it and checked it in first. The changes were merged
during a rebase.)

I also added a specialization to Vec for 1D that implicitly converts
between the 1D Vec and the component. This can be convenient when
templating on the Vec length.
2015-09-02 13:54:51 -07:00
Kenneth Moreland
b58543297a Special implementation of cell derivative for rectilinear cells 2015-09-02 13:50:31 -07:00