Commit Graph

11 Commits

Author SHA1 Message Date
Robert Maynard
022898072c Update Benchmark code to properly verify all algorithms. 2015-10-28 14:20:06 -04:00
Robert Maynard
f38673f618 Replace ErrorControlOutOfMemory with ErrorControlBadAllocation. 2015-10-01 14:25:28 -04:00
Robert Maynard
72450e87f3 Make thrust use fast paths when doing sort and scan.
By introducing our own custom thrust execution policy we can make sure
to hit the fastest code paths in thrust for the sort operation. This makes
sure that for UInt32, Int32, and Float32 we use the radix sort from thrust
which offers a 2x to 3x speed improvement over the merge sort implementation.

Secondly, by telling thrust that our BinaryOperators are commutative we
make sure that we get the fastest code paths when executing Inclusive
and Exclusive Scan.

Benchmark 'Radix Sort on 1048576 random values vtkm::Int32' results:
  median = 0.0117049s
  median abs dev = 0.00324614s
  mean = 0.0167615s
  std dev = 0.00786269s
  min = 0.00845875s
  max = 0.0389063s
Benchmark 'Radix Sort on 1048576 random values vtkm::Float32' results:
  median = 0.0234463s
  median abs dev = 0.000317249s
  mean = 0.021452s
  std dev = 0.00470307s
  min = 0.011255s
  max = 0.0250643s
Benchmark 'Merge Sort on 1048576 random values vtkm::Int32' results:
  median = 0.0310486s
  median abs dev = 0.000182129s
  mean = 0.0286914s
  std dev = 0.00634102s
  min = 0.0116225s
  max = 0.0317379s
Benchmark 'Merge Sort on 1048576 random values vtkm::Float32' results:
  median = 0.0310617s
  median abs dev = 0.000193583s
  mean = 0.0295779s
  std dev = 0.00491531s
  min = 0.0147257s
  max = 0.032307s
2015-09-03 16:00:37 -04:00
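
As a rough reading of the numbers reported in this commit message, the sketch below computes the ratio of median run times between the two sort implementations. The function name `speedup` and the constant names are hypothetical and introduced only for illustration; the median values are copied from the benchmark output above.

```cpp
#include <cassert>

// Ratio of two median run times. A value above 1 means the first
// implementation was slower than the second by that factor.
inline double speedup(double slower_median_s, double faster_median_s)
{
  assert(faster_median_s > 0.0);
  return slower_median_s / faster_median_s;
}

// Medians in seconds, copied from the benchmark results above.
const double kRadixInt32Median = 0.0117049;
const double kRadixFloat32Median = 0.0234463;
const double kMergeInt32Median = 0.0310486;
const double kMergeFloat32Median = 0.0310617;

inline double int32_speedup()
{
  return speedup(kMergeInt32Median, kRadixInt32Median);
}

inline double float32_speedup()
{
  return speedup(kMergeFloat32Median, kRadixFloat32Median);
}
```

By these medians the Int32 ratio is about 2.65 and the Float32 ratio about 1.32. The reported minimum and mean times give a different picture, so a single run like this is only a rough guide.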
Robert Maynard
ab59e34a2f Rename pragma header guard so it makes sense for tbb and thrust.
Boost is not the only third party library for which we are suppressing
warnings, so make the name more generic.
2015-08-13 09:04:23 -04:00
Kenneth Moreland
c637bf94b1 The use of is_sorted in Benchmarker.h was ambiguous
Benchmarker provides its own implementation of is_sorted since this
method was not introduced until C++11 and not all compilers necessarily
support it. However, for those that did, the system is_sorted conflicted
with the provided is_sorted. To get around the problem, specify the full
namespace of the is_sorted being used (which is standard practice in
VTK-m anyway).
2015-08-12 09:16:54 -06:00
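
The ambiguity described in this commit can be sketched as follows. The namespace `bench` and the helper `check` are hypothetical stand-ins, not the actual contents of Benchmarker.h; the point is only that qualifying the call selects the hand-written version.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

namespace bench // hypothetical namespace standing in for the benchmarking code
{

// Hand-written replacement for C++11 std::is_sorted.
template <typename ForwardIt>
bool is_sorted(ForwardIt first, ForwardIt last)
{
  if (first == last)
  {
    return true;
  }
  ForwardIt next = first;
  for (++next; next != last; ++first, ++next)
  {
    if (*next < *first)
    {
      return false;
    }
  }
  return true;
}

inline bool check(const std::vector<double>& values)
{
  // An unqualified is_sorted(values.begin(), values.end()) can be ambiguous:
  // argument-dependent lookup may also find std::is_sorted when the iterator
  // type lives in namespace std. The qualified name picks ours explicitly.
  return bench::is_sorted(values.begin(), values.end());
}

} // namespace bench
```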
Will Usher
046cd2d2b9 Change StorageBasic to use an aligned allocator.
The storage used will now be aligned to `VTKM_CACHE_LINE_SIZE` bytes,
resulting in slightly better cache usage and load/store performance.
This define is set in `StorageBasic.h`. We also now detect if POSIX is
available in Configure.h and will define VTKM_POSIX with _POSIX_VERSION
if it's available.

The AlignedAllocator used by StorageBasic is also STL compatible
and can be used in STL containers, so users can use it in their
std::vector and pass aligned user memory to the storage.
2015-08-11 13:42:55 -06:00
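
A minimal sketch of an STL-compatible aligned allocator in the spirit of this commit is shown below. It is not the actual AlignedAllocator from StorageBasic: the names `ExampleAlignedAllocator`, `EXAMPLE_CACHE_LINE_SIZE` and `is_aligned` are hypothetical, the 64-byte line size is an assumption, and the sketch requires a POSIX system for `posix_memalign`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <new>
#include <stdlib.h> // posix_memalign, free (POSIX)
#include <vector>

// Assumed value for illustration; the commit message does not state the
// actual value of VTKM_CACHE_LINE_SIZE.
const std::size_t EXAMPLE_CACHE_LINE_SIZE = 64;

// Hypothetical minimal C++11 allocator returning aligned memory.
template <typename T, std::size_t Alignment>
struct ExampleAlignedAllocator
{
  typedef T value_type;

  ExampleAlignedAllocator() {}
  template <typename U>
  ExampleAlignedAllocator(const ExampleAlignedAllocator<U, Alignment>&) {}

  // Explicit rebind is needed because Alignment is a non-type template
  // parameter, which the default allocator_traits rebind cannot handle.
  template <typename U>
  struct rebind
  {
    typedef ExampleAlignedAllocator<U, Alignment> other;
  };

  T* allocate(std::size_t n)
  {
    if (n > static_cast<std::size_t>(-1) / sizeof(T))
    {
      throw std::bad_alloc();
    }
    void* mem = nullptr;
    if (posix_memalign(&mem, Alignment, n * sizeof(T)) != 0)
    {
      throw std::bad_alloc();
    }
    return static_cast<T*>(mem);
  }

  void deallocate(T* p, std::size_t) { free(p); }
};

// All instances are interchangeable, so they always compare equal.
template <typename T, typename U, std::size_t A>
bool operator==(const ExampleAlignedAllocator<T, A>&, const ExampleAlignedAllocator<U, A>&)
{
  return true;
}

template <typename T, typename U, std::size_t A>
bool operator!=(const ExampleAlignedAllocator<T, A>&, const ExampleAlignedAllocator<U, A>&)
{
  return false;
}

// Helper to check the alignment of a pointer.
inline bool is_aligned(const void* p, std::size_t alignment)
{
  return reinterpret_cast<std::uintptr_t>(p) % alignment == 0;
}
```

With this in place, a `std::vector<float, ExampleAlignedAllocator<float, EXAMPLE_CACHE_LINE_SIZE> >` stores its elements in a buffer that starts on the assumed cache line boundary.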
Will Usher
1ea6f73297 Add our own version of is_sorted to check the assert
Also caught a bug where I incorrectly assumed abs_deviations would be sorted
in MedianAbsDeviation
2015-08-03 10:56:59 -06:00
Will Usher
311b5dcc6b Remove C++11 feature is_sorted and increase time allotted to run bench 2015-07-31 16:33:04 -06:00
Kenneth Moreland
21b3b318ba Always disable conversion warnings when including boost header files
On one of my compile platforms, GCC was giving conversion warnings from
any boost include that was not wrapped in pragmas to disable conversion
warnings. To make things easier and more robust, I created a pair of
macros, VTKM_BOOST_PRE_INCLUDE and VTKM_BOOST_POST_INCLUDE, that should
be wrapped around any #include of a boost header file.
2015-07-30 17:40:40 -06:00
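
One way such a pair of macros could be written is sketched below. The names `EXAMPLE_PRE_INCLUDE` and `EXAMPLE_POST_INCLUDE` are hypothetical, and the definitions are an assumption about the general technique (diagnostic push/pop pragmas), not the actual definitions of VTKM_BOOST_PRE_INCLUDE and VTKM_BOOST_POST_INCLUDE. A small function stands in for the contents of a third-party header.

```cpp
#include <cassert>

// Hypothetical macro pair: save the diagnostic state and silence
// conversion warnings, then restore the previous state afterwards.
#if defined(__GNUC__) || defined(__clang__)
#define EXAMPLE_PRE_INCLUDE \
  _Pragma("GCC diagnostic push") \
  _Pragma("GCC diagnostic ignored \"-Wconversion\"")
#define EXAMPLE_POST_INCLUDE _Pragma("GCC diagnostic pop")
#else
// Other compilers: expand to nothing in this sketch.
#define EXAMPLE_PRE_INCLUDE
#define EXAMPLE_POST_INCLUDE
#endif

EXAMPLE_PRE_INCLUDE
// In real use, an #include of the third-party header would go here.
// This stand-in performs an implicit double-to-int conversion, which
// would normally trigger a warning under -Wconversion.
inline int third_party_truncate(double x)
{
  int result = x;
  return result;
}
EXAMPLE_POST_INCLUDE
```

Code after `EXAMPLE_POST_INCLUDE` is compiled with the original warning settings again, so the project's own sources are still checked.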
Will Usher
e982ebe41e Measurement and general improvements to the benchmark suite
- A warm up run is done and not timed to allow for any allocation of
  room for output data without accounting for it in the run times.
  Previously this time spent allocating memory would be included in the
  time we measured for the benchmark.

- Benchmarks are run multiple times and we then compute some statistics
  about the run time of the benchmark to give a better picture of the
  expected run time of the function. To this end we run the benchmark
  either 500 times or for 1.5s, whichever comes sooner (though these are
  easily changeable). We then perform outlier limiting by Winsorising the
  data (similar to how Rust's benchmarking library works) and print out
  the median, mean, min and max run times along with the median absolute
  deviation and standard deviation.

- Because benchmarks are run many times they can now perform some
  initial setup in the constructor, e.g. to fill some test input data
  array with values to let the main benchmark loop run faster.

- To allow for benchmarks to have members of the data type being
  benchmarked, the struct must now be templated on this type, leading to
  a bit of awkwardness. I've worked around this by adding the
  `VTKM_MAKE_BENCHMARK` and `VTKM_RUN_BENCHMARK` macros. The make
  benchmark macro generates a struct that has an `operator()` templated on
  the value type, which will construct and return the benchmark functor
  templated on that type. The run macro will then use this generated
  struct to run the benchmark functor on the type list passed. You can
  also pass arguments to the benchmark functor's constructor through the
  make macro; however, this makes things more awkward because the name of
  the MakeBench struct must be different for each variation of constructor
  arguments (for example see `BenchLowerBounds`).

- Added a short comment on how to add benchmarks in
  `vtkm/benchmarking/Benchmarker.h` as the new system is a bit different
  from how the tests work.

- You can now pass an extra argument when running the benchmark suite to
  only benchmark specific functions, e.g.
  `Benchmarks_TBB BenchmarkDeviceAdapter ScanInclusive Sort` will only
  benchmark ScanInclusive and Sort. Running without any extra arguments
  will run all the benchmarks as before.
2015-07-28 11:03:28 -06:00
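
The outlier limiting and robust statistics described in this commit can be sketched as follows. This is not the code from Benchmarker.h: the function names are hypothetical, and the Winsorising rule used here (clamp the lowest and highest `percent` of samples to the nearest remaining value) is one common definition chosen as an assumption.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Median of the samples. Takes a copy so the caller's order is untouched
// and no assumption is made about the input already being sorted.
inline double median(std::vector<double> samples)
{
  assert(!samples.empty());
  std::sort(samples.begin(), samples.end());
  const std::size_t n = samples.size();
  if (n % 2 == 1)
  {
    return samples[n / 2];
  }
  return 0.5 * (samples[n / 2 - 1] + samples[n / 2]);
}

// Median absolute deviation: the median of |x - median(samples)|.
// The deviations are not sorted, so median() must sort them itself.
inline double median_abs_deviation(const std::vector<double>& samples)
{
  const double med = median(samples);
  std::vector<double> deviations;
  deviations.reserve(samples.size());
  for (std::size_t i = 0; i < samples.size(); ++i)
  {
    deviations.push_back(std::fabs(samples[i] - med));
  }
  return median(deviations);
}

// Winsorise in place: values among the lowest or highest `percent` of the
// samples are clamped to the nearest value that is kept.
inline void winsorise(std::vector<double>& samples, unsigned int percent)
{
  assert(!samples.empty());
  assert(percent < 50);
  std::vector<double> sorted(samples);
  std::sort(sorted.begin(), sorted.end());
  const std::size_t n = sorted.size();
  const std::size_t cut = n * percent / 100;
  const double low = sorted[cut];
  const double high = sorted[n - 1 - cut];
  for (std::size_t i = 0; i < n; ++i)
  {
    samples[i] = std::min(std::max(samples[i], low), high);
  }
}
```

Winsorising keeps the sample count unchanged, unlike trimming, so a single slow run (for example one hit by a page fault) is pulled toward the rest of the data instead of being discarded.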
Will Usher
238d4fa759 Adding micro benchmark suite 2015-07-09 13:56:06 -06:00