3086 Commits

Author SHA1 Message Date
Allison Vacanti
4cd791932b Ensure that Pair and Vec are trivial classes.
For std::copy to optimize a copy to memcpy, the value type must be both
trivially constructible and trivially copyable.

The new copy benchmarks showed that std::copy of Pairs and Vecs was
not being optimized to memcpy. For a 256 MiB buffer on my laptop w/ GCC,
the serial copy speeds were:

UInt8:                 10.10 GiB/s
Vec<UInt8, 2>           3.12 GiB/s
Pair<UInt32, Float32>   6.92 GiB/s

After this patch, the optimization kicks in and a bitwise copy is used:

UInt8:                 10.12 GiB/s
Vec<UInt8, 2>           9.66 GiB/s
Pair<UInt32, Float32>   9.88 GiB/s

Checks were also added to the Vec and Pair unit tests to ensure that
these classes continue to be trivial.

The ArrayHandleSwizzle test was refactored a bit to eliminate a new
'possibly uninitialized memory' warning introduced with the default
Vec ctors.
2017-10-18 14:58:35 -04:00
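A minimal sketch of the kind of triviality check this commit describes. `SimpleVec` and `InitializedVec` are hypothetical stand-ins, not VTK-m's `Vec`; the sketch only shows that a user-provided default constructor removes triviality and that a `static_assert` can catch this at compile time.

```cpp
#include <algorithm>
#include <cstdint>
#include <type_traits>
#include <vector>

// Hypothetical fixed-size vector with a defaulted constructor.
// Defaulting the constructor keeps the type trivial.
template <typename T, int N>
struct SimpleVec
{
  SimpleVec() = default;
  T Components[N];
};

// Hypothetical counter-example: the user-provided constructor
// makes the type non-trivial even though its layout is identical.
template <typename T, int N>
struct InitializedVec
{
  InitializedVec()
  {
    for (int i = 0; i < N; ++i)
    {
      Components[i] = T();
    }
  }
  T Components[N];
};

// Compile-time checks comparable to those a unit test could carry.
static_assert(std::is_trivial<SimpleVec<std::uint8_t, 2>>::value,
              "SimpleVec should be trivial");
static_assert(std::is_trivially_copyable<SimpleVec<std::uint8_t, 2>>::value,
              "SimpleVec should be trivially copyable");
static_assert(!std::is_trivial<InitializedVec<std::uint8_t, 2>>::value,
              "InitializedVec is not trivial");

// Copies a buffer with std::copy; whether this lowers to memcpy
// depends on the compiler and standard library.
template <typename T>
std::vector<T> CopyBuffer(const std::vector<T>& src)
{
  std::vector<T> dst(src.size());
  std::copy(src.begin(), src.end(), dst.begin());
  return dst;
}
```

Whether a given build actually emits a memcpy is best confirmed by inspecting the disassembly or by benchmarking.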
Allison Vacanti
d465d03047 Add benchmark to print copy speeds. 2017-10-18 10:53:30 -04:00
Allison Vacanti
b582b07983 Modify Benchmarker to expose samples and reduce iterations. 2017-10-18 10:42:50 -04:00
Allison Vacanti
d0fa70deb5 Reduce overhead and fix bugs in device adapter benchmarks. 2017-10-18 10:14:09 -04:00
Allison Vacanti
05419719d5 Add more options to the device adapter algorithm benchmark. 2017-10-17 16:21:17 -04:00
Sujin Philip
d6ce8000f4 Workaround an Intel compiler bug
Fixes a linker error about not finding 'LinearBVH::ConstructOnDevice'
2017-10-16 12:17:04 -04:00
Sujin Philip
800bcf3124 Merge topic 'fix-intel-link-bug'
ecb99acb Workaround intel compiler bug

Acked-by: Kitware Robot <kwrobot@kitware.com>
Acked-by: Matt Larsen <mlarsen@cs.uoregon.edu>
Merge-request: !969
2017-10-12 16:46:33 -04:00
Sujin Philip
ecb99acb5e Workaround intel compiler bug
Fixes issue #179
2017-10-12 13:32:39 -04:00
Allison Vacanti
7b66dece45 Add equality operators that handle different handle types.
In generic code, it's a pain to use the equality operators since they
require the ValueType and Storage to match; otherwise the operator is undefined.
This commit adds operators for such comparisons, as well as a unit test.
2017-10-11 17:25:13 -04:00
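One way such mixed-type comparisons could look, as a sketch only. `Handle`, `StorageBasic`, and `StorageOther` are hypothetical names, and the choice that handles of different types never compare equal is an assumption made for illustration, not something the commit message states.

```cpp
#include <memory>
#include <vector>

// Hypothetical storage tags.
struct StorageBasic {};
struct StorageOther {};

// Hypothetical handle that shares ownership of a buffer.
template <typename T, typename StorageTag = StorageBasic>
class Handle
{
public:
  Handle() : Data(std::make_shared<std::vector<T>>()) {}

  // Same ValueType and Storage: equal when both refer to the same buffer.
  bool operator==(const Handle& rhs) const { return this->Data == rhs.Data; }
  bool operator!=(const Handle& rhs) const { return !(*this == rhs); }

  // Different ValueType or Storage (assumption): such handles cannot
  // share a buffer, so they are never equal.
  template <typename U, typename S>
  bool operator==(const Handle<U, S>&) const
  {
    return false;
  }
  template <typename U, typename S>
  bool operator!=(const Handle<U, S>&) const
  {
    return true;
  }

private:
  std::shared_ptr<std::vector<T>> Data;
};
```

With both overload sets defined, generic code can write `a == b` without first checking that the two handle types match.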
Allison Vacanti
1653f20e7c Add missing typedef to portal. 2017-10-11 17:24:05 -04:00
Allison Vacanti
6c2f22b5ce Overcome narrowing warning on MSVC. 2017-10-11 17:24:04 -04:00
Allison Vacanti
1018d981a0 Check for overlap in CopySubRange.
Some parallel copy implementations will not handle this sanely.
2017-10-11 16:52:32 -04:00
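A sketch of an overlap check for a sub-range copy within a single array. `RangesOverlap` and this `CopySubRange` are hypothetical helpers built on `std::vector`; rejecting the overlapping case by returning `false` is an assumption, since the commit message does not say how the overlap is handled.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// True when [inputStart, inputStart + count) and
// [outputStart, outputStart + count) share at least one index.
inline bool RangesOverlap(std::size_t inputStart,
                          std::size_t outputStart,
                          std::size_t count)
{
  if (count == 0)
  {
    return false;
  }
  return inputStart < outputStart + count && outputStart < inputStart + count;
}

// Copies count values from input[inputStart...] to output[outputStart...].
// Returns false when the request is out of bounds, or when input and
// output are the same array and the two ranges overlap.
template <typename T>
bool CopySubRange(const std::vector<T>& input,
                  std::size_t inputStart,
                  std::size_t count,
                  std::vector<T>& output,
                  std::size_t outputStart)
{
  if (inputStart > input.size() || count > input.size() - inputStart)
  {
    return false;
  }
  const bool sameArray = (&input == &output);
  if (sameArray && RangesOverlap(inputStart, outputStart, count))
  {
    return false;
  }
  if (output.size() < outputStart + count)
  {
    output.resize(outputStart + count);
  }
  // Iterators are taken after the resize so they stay valid even when
  // input and output refer to the same vector.
  const auto first = input.begin() + static_cast<std::ptrdiff_t>(inputStart);
  const auto last = first + static_cast<std::ptrdiff_t>(count);
  std::copy(first, last, output.begin() + static_cast<std::ptrdiff_t>(outputStart));
  return true;
}
```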
Allison Vacanti
374321e027 Use std::copy in TBB copy routines. 2017-10-11 16:52:32 -04:00
Allison Vacanti
825f351d04 Use std::copy in serial Copy implementation.
I had assumed that the compiler would be clever enough to turn the
iterative implementation of Copy into a memcpy, but inspecting the
disassembly on a release GCC build shows that this is not the case,
likely because it can't assume that the memory ranges do not overlap.

Replacing the loop with std::copy speeds things up (about 30-50%) for
most data types, though there is a slight (usually < 5%) slowdown for
Vec types. The uint8 copy improved by a factor of about 7.

Comparison:
| Speedup | iteration            | std::copy            | Benchmark (Type) |
|---------|----------------------|----------------------|------------------|
|   1.363 | 0.001590 +- 0.000087 | 0.001166 +- 0.000049 | Copy 2097152 values (vtkm::Float32) |
|   1.487 | 0.003429 +- 0.000185 | 0.002305 +- 0.000146 | Copy 2097152 values (vtkm::Float64) |
|   1.379 | 0.001568 +- 0.000072 | 0.001137 +- 0.000093 | Copy 2097152 values (vtkm::Int32) |
|   1.420 | 0.003410 +- 0.000173 | 0.002402 +- 0.000101 | Copy 2097152 values (vtkm::Int64) |
|   1.303 | 0.001564 +- 0.000083 | 0.001201 +- 0.000078 | Copy 2097152 values (vtkm::UInt32) |
|   7.204 | 0.002441 +- 0.000104 | 0.000339 +- 0.000029 | Copy 2097152 values (vtkm::UInt8) |
|   0.987 | 0.006602 +- 0.000266 | 0.006688 +- 0.000291 | Copy 2097152 values (vtkm::Vec< vtkm::Float32, 4 >) |
|   0.965 | 0.010065 +- 0.000528 | 0.010427 +- 0.000617 | Copy 2097152 values (vtkm::Vec< vtkm::Float64, 3 >) |
|   0.979 | 0.003327 +- 0.000191 | 0.003398 +- 0.000142 | Copy 2097152 values (vtkm::Vec< vtkm::Int32, 2 >) |
|   0.851 | 0.001579 +- 0.000090 | 0.001856 +- 0.000098 | Copy 2097152 values (vtkm::Vec< vtkm::UInt8, 4 >) |
2017-10-11 16:52:32 -04:00
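The two variants compared in the table can be sketched as follows. `CopyLoop` and `CopyStd` are hypothetical names, not the functions in the serial device adapter; both produce the same result and differ only in what the compiler and standard library can do with them.

```cpp
#include <algorithm>
#include <cstddef>

// Element-by-element copy. The compiler may not turn this into a
// memcpy because it cannot assume src and dst do not overlap.
template <typename T>
void CopyLoop(const T* src, T* dst, std::size_t n)
{
  for (std::size_t i = 0; i < n; ++i)
  {
    dst[i] = src[i];
  }
}

// Same copy through std::copy, which lets the standard library pick
// a bulk memory copy for suitable value types.
template <typename T>
void CopyStd(const T* src, T* dst, std::size_t n)
{
  std::copy(src, src + n, dst);
}
```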
Allison Vacanti
b396716f86 Merge topic 'vertexclustering-reducepoints'
8fabece1 Use median point from cluster as representative vertex.
c7bf0c95 Compute PointIdMap while reducing cluster ids.
5dee7c6a Select input point from cluster rather than averaging.
28e76ddb Update vertex clustering benchmarking code.
e3c9e7bb Optimize cell map computation.
d7669650 Use requested grid in VertexClustering worklet.
0472dc11 Fix warning on Cuda.
3f4e17e2 Add field mapping to VertexClustering.
...

Acked-by: Kitware Robot <kwrobot@kitware.com>
Acked-by: Robert Maynard <robert.maynard@kitware.com>
Merge-request: !960
2017-10-11 16:25:30 -04:00
Sujin Philip
4253d12062 Merge topic 'cell-locator'
41679cb5 Add a CellLocator
02f48cfa Fix multiple declaration of DistributeCellData
9e0650ad Update Newton's Method to return solution status

Acked-by: Kitware Robot <kwrobot@kitware.com>
Merge-request: !957
2017-10-11 09:37:51 -04:00
Robert Maynard
6c695a1b6e Merge topic 'fixes_184_empty_reduce'
34361dd1 DeviceAdapterAlgorithmSerial ReduceByKey handles zero size key/values

Acked-by: Kitware Robot <kwrobot@kitware.com>
Acked-by: Kenneth Moreland <kmorel@sandia.gov>
Merge-request: !965
2017-10-11 08:55:09 -04:00
Sujin Philip
41679cb5f9 Add a CellLocator
Implements a two-level uniform grid cell locator
2017-10-10 14:01:41 -04:00
Sujin Philip
02f48cfaaa Fix multiple declaration of DistributeCellData 2017-10-10 14:01:41 -04:00
Sujin Philip
9e0650adf2 Update Newton's Method to return solution status 2017-10-10 14:01:41 -04:00
Sujin Philip
730fa4390a Update cuda 9 workaround for cuda 9 final release 2017-10-10 11:34:29 -04:00
Allison Vacanti
8fabece187 Use median point from cluster as representative vertex.
Also uses the recently added StableSortIndices worklet to produce the
cell map.
2017-10-10 10:28:51 -04:00
Allison Vacanti
c7bf0c95f4 Compute PointIdMap while reducing cluster ids. 2017-10-10 10:28:51 -04:00
Allison Vacanti
5dee7c6a5d Select input point from cluster rather than averaging.
This is a bit counterintuitive, but choosing a random point from each
cluster rather than averaging them gives a better visual result. The
averages poorly represent a surface that runs through the grid block and
tend to bias the output points towards the center of each block, creating
very noticeable grid artifacts that look blocky.
2017-10-10 10:28:51 -04:00
Allison Vacanti
28e76ddbfd Update vertex clustering benchmarking code.
Reset the timer after each print to make the output more usable.
2017-10-10 10:28:51 -04:00
Allison Vacanti
e3c9e7bbce Optimize cell map computation.
Rather than sorting and reducing the list by key, sorting and uniquing
an index array with an indirection functor is faster.
2017-10-10 10:28:51 -04:00
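A serial sketch of sorting and uniquing an index array through an indirection functor, using the standard library rather than VTK-m device algorithms. `UniqueKeyIndices` is a hypothetical name, and the `int` key type is chosen only for illustration.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <vector>

// Returns, for each distinct key, the index of its first occurrence,
// ordered by key value. The keys themselves are never moved: both the
// sort and the unique step look them up through the index array.
inline std::vector<std::size_t> UniqueKeyIndices(const std::vector<int>& keys)
{
  std::vector<std::size_t> indices(keys.size());
  std::iota(indices.begin(), indices.end(), std::size_t{ 0 });

  // Indirection functors: compare indices by the keys they refer to.
  auto less = [&keys](std::size_t a, std::size_t b) { return keys[a] < keys[b]; };
  auto equal = [&keys](std::size_t a, std::size_t b) { return keys[a] == keys[b]; };

  std::stable_sort(indices.begin(), indices.end(), less);
  indices.erase(std::unique(indices.begin(), indices.end(), equal), indices.end());
  return indices;
}
```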
Allison Vacanti
d766965000 Use requested grid in VertexClustering worklet.
This is to match the default behavior of vtkQuadricClustering. If we
want to add this functionality back, it should go into the filter as
an option that adjusts nDivisions before calling the worklet.
2017-10-10 10:28:51 -04:00
Allison Vacanti
0472dc1198 Fix warning on Cuda.
assert(false && ""); emitted a

"warning : controlling expression is constant"

Replace the assertion with an exception, which is more appropriate here
anyway.
2017-10-10 10:28:51 -04:00
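A sketch of replacing an always-failing assertion with an exception. `CellShape` and `PointsPerCell` are hypothetical and do not come from the VertexClustering code; the exception type is also an assumption.

```cpp
#include <stdexcept>

// Hypothetical enumeration used only for this illustration.
enum class CellShape
{
  Triangle,
  Quad,
  Unknown
};

// The unsupported branch throws instead of calling assert(false && "..."),
// so there is no constant controlling expression for the compiler to
// warn about, and the error is still reported in release builds.
inline int PointsPerCell(CellShape shape)
{
  switch (shape)
  {
    case CellShape::Triangle:
      return 3;
    case CellShape::Quad:
      return 4;
    default:
      throw std::invalid_argument("Unsupported cell shape.");
  }
}
```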
Allison Vacanti
3f4e17e2a2 Add field mapping to VertexClustering. 2017-10-10 10:28:51 -04:00
Allison Vacanti
9c332674ae Remove unused comparison operator. 2017-10-10 10:28:51 -04:00
Allison Vacanti
5420368ae0 Add fields to the cow nose testing dataset. 2017-10-10 10:28:51 -04:00
Allison Vacanti
c2a7e4faba Adapt ReduceByKey to handle ArrayHandleDiscard for output keys.
Often times we don't care about the output keys, and it's useful to
be able to pass an ArrayHandleDiscard into the algorithm to save
memory in these cases.
2017-10-10 10:28:51 -04:00
Allison Vacanti
9fe7cb4542 Add struct to simplify detection of ArrayHandleDiscards.
This makes it easier to adapt algorithms to avoid reading from
discard arrays.
2017-10-10 10:28:51 -04:00
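A detection struct of this kind is commonly written as a type trait with a partial specialization. The sketch below assumes hypothetical `DiscardArray` and `BasicArray` templates and a hypothetical `IsDiscardArray` trait; the names in VTK-m may differ.

```cpp
#include <type_traits>

// Hypothetical array types used only for this illustration.
template <typename T>
struct BasicArray
{
};
template <typename T>
struct DiscardArray
{
};

// Primary template: an arbitrary array type is not a discard array.
template <typename ArrayType>
struct IsDiscardArray : std::false_type
{
};

// Partial specialization: any DiscardArray<T> is detected.
template <typename T>
struct IsDiscardArray<DiscardArray<T>> : std::true_type
{
};

// Example query an algorithm could branch on: output keys only need
// to be written when the output array actually stores them.
template <typename KeyArray>
constexpr bool NeedsKeyOutput()
{
  return !IsDiscardArray<KeyArray>::value;
}
```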
Robert Maynard
34361dd15a DeviceAdapterAlgorithmSerial ReduceByKey handles zero size key/values 2017-10-10 10:12:59 -04:00
Kenneth Moreland
82c97e0d14 Merge topic 'array-copy-regression'
e149dcf1 Specify device adapter for array copy in worklet tests

Acked-by: Kitware Robot <kwrobot@kitware.com>
Acked-by: Robert Maynard <robert.maynard@kitware.com>
Merge-request: !961
2017-10-10 09:30:56 -04:00
Robert Maynard
189b668e54 The interop/cuda headers aren't testable.
The headers in this location presume you are linking/including GL which
isn't done as part of the header tests.
2017-10-09 09:40:52 -04:00
Robert Maynard
e20b64151c Merge topic 'consistent_forward_declares'
f8f1adc9 Forward declare DeviceAdapterAlgorithm correctly as a struct

Acked-by: Kitware Robot <kwrobot@kitware.com>
Acked-by: Allison Vacanti <allison.vacanti@kitware.com>
Merge-request: !962
2017-10-09 08:42:38 -04:00
Matt Larsen
86b2757505 Merge topic 'simplify_triangulator'
2326dca0 simplify triangulator

Acked-by: Kitware Robot <kwrobot@kitware.com>
Merge-request: !963
2017-10-06 23:30:18 -04:00
Matt Larsen
2326dca0a2 simplify triangulator 2017-10-06 16:08:29 -07:00
Robert Maynard
f8f1adc962 Forward declare DeviceAdapterAlgorithm correctly as a struct 2017-10-06 09:50:12 -04:00
Allison Vacanti
5ec29128de Merge topic 'install_hooks'
c0a73159 Clean up install/VTK issues in VTKmConfig.cmake.
30f4151b Add missing headers to examples.
a2a55eda Install cuda interop headers.
85062a3a Add version info to installed cmake config files.
75f88b4c Add versioning to VTKM installed include/share dirs.
b3852e8d Add versioning to VTKM libraries.

Acked-by: Kitware Robot <kwrobot@kitware.com>
Acked-by: Robert Maynard <robert.maynard@kitware.com>
Merge-request: !958
2017-10-04 10:55:36 -04:00
Kenneth Moreland
e149dcf1c1 Specify device adapter for array copy in worklet tests
Rob noticed a degradation in performance in some worklet tests when
ArrayCopy was added. I hypothesize that this slowdown comes from doing the array
copy with TBB instead of serial in the serial tests. (There have been
some checks in the existing code to suggest that some operations in TBB
can be slower than serial.) This change forces the array copy to be on
the device for which we are testing.
2017-10-03 14:18:39 -07:00
Robert Maynard
fe028d828a Merge topic 'marching_cubes_faster_structured_normals'
1147edb1 MarchingCubes now uses Gradient fast paths when possible.
d7d5da4f More changes to Neighborhood code to make it easier to use.

Acked-by: Kitware Robot <kwrobot@kitware.com>
Acked-by: Sujin Philip <sujin.philip@kitware.com>
Merge-request: !916
2017-10-03 15:44:16 -04:00
Allison Vacanti
a934ab82f8 Merge topic 'stable_sort_keys'
fd311f57 Update worklet::Keys to stable sort keys and not modify input.
d6b2896a Add StableSortIndices worklet.

Acked-by: Kitware Robot <kwrobot@kitware.com>
Acked-by: Kenneth Moreland <kmorel@sandia.gov>
Merge-request: !954
2017-10-03 14:04:53 -04:00
Allison Vacanti
a2a55edaff Install cuda interop headers.
Looks like a missing subdirectory call. Also disabled testbuilds since
these need external OpenGL setup.
2017-10-02 12:33:30 -04:00
Allison Vacanti
75f88b4c46 Add versioning to VTKM installed include/share dirs. 2017-10-02 11:39:10 -04:00
Allison Vacanti
fd311f5716 Update worklet::Keys to stable sort keys and not modify input.
This ensures that the order of the values presented to the
WorkletReduceByKey functor is consistent.

After this change, the key array used to build the worklet::Keys object
is no longer modified. The sorted keys can be obtained by permuting
the input keys with Keys::GetSortedValuesMap().
2017-09-28 13:02:33 -04:00
Allison Vacanti
d6b2896ad9 Add StableSortIndices worklet.
This worklet produces an index array that permutes a key array into a
stable sorted order.
2017-09-28 13:02:33 -04:00
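The idea of an index array that permutes keys into stable sorted order can be sketched serially with the standard library. `StableSortIndicesSketch` and `Permute` are hypothetical helpers, not the worklet's interface, and the `int` key type is chosen only for illustration.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <vector>

// Builds an index array such that keys[indices[0]], keys[indices[1]], ...
// is sorted. Equal keys keep their original relative order, and the
// key array itself is left unmodified.
inline std::vector<std::size_t> StableSortIndicesSketch(const std::vector<int>& keys)
{
  std::vector<std::size_t> indices(keys.size());
  std::iota(indices.begin(), indices.end(), std::size_t{ 0 });
  std::stable_sort(indices.begin(),
                   indices.end(),
                   [&keys](std::size_t a, std::size_t b) { return keys[a] < keys[b]; });
  return indices;
}

// Gathers values through an index array: result[i] = values[indices[i]].
template <typename T>
std::vector<T> Permute(const std::vector<T>& values,
                       const std::vector<std::size_t>& indices)
{
  std::vector<T> result;
  result.reserve(indices.size());
  for (std::size_t index : indices)
  {
    result.push_back(values[index]);
  }
  return result;
}
```

Permuting the original keys with the returned indices yields the sorted keys, and the same indices can be applied to any value array that is parallel to the keys.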
Dave Pugmire
d6edab8e50 Fix comment for streamline filter. 2017-09-28 08:09:57 -04:00
Dave Pugmire
f303c52f20 fix copyright. 2017-09-28 07:30:38 -04:00