b05bd33d3c
- It also adds Google's benchmarch compare.py script - It is installed to the build directory. - It add a wrapper script called compare-benchmarks.py which: - Let you run each of the benchmarks with different devices - It adds a README.md explaining how to run the benchmarks - BenchmarkDeviceAdapter input size range parametrized at compile time Signed-off-by: Vicente Adolfo Bolea Sanchez <vicente.bolea@kitware.com>
121 lines
5.3 KiB
Markdown
121 lines
5.3 KiB
Markdown
# BENCHMARKING VTK-m
|
|
|
|
## TL;DR
|
|
|
|
When configuring _VTM-m_ with _CMake_ pass the flag `-DVTKm_ENABLE_BENCHMARKS=1`
|
|
. In the build directory you will see the following binaries:
|
|
|
|
$ ls bin/Benchmark*
|
|
bin/BenchmarkArrayTransfer* bin/BenchmarkCopySpeeds* bin/BenchmarkFieldAlgorithms*
|
|
bin/BenchmarkRayTracing* bin/BenchmarkAtomicArray* bin/BenchmarkDeviceAdapter*
|
|
bin/BenchmarkFilters* bin/BenchmarkTopologyAlgorithms*
|
|
|
|
Taking as an example `BenchmarkArrayTransfer`, we can run it as:
|
|
|
|
$ bin/BenchmarkArrayTransfer -d Any
|
|
|
|
---
|
|
|
|
## Parts of this Documents
|
|
|
|
0. [TL;DR](#TL;DR)
|
|
1. [Devices](#choosing-devices)
|
|
2. [Filters](#run-a-subset-of-your-benchmarks)
|
|
4. [Compare with baseline](#compare-with-baseline)
|
|
5. [Installing compare.py](#installing-compare-benchmarkspy)
|
|
|
|
---
|
|
|
|
## Choosing devices
|
|
|
|
Taking as an example `BenchmarkArrayTransfer`, we can determine in which
|
|
device we can run it by simply:
|
|
|
|
$ bin/BenchmarkArrayTransfer
|
|
...
|
|
Valid devices: "Any" "Serial"
|
|
...
|
|
|
|
Upon the _Valid devices_ you can chose in which device to run the benchmark by:
|
|
|
|
$ bin/BenchmarkArrayTransfer -d Serial
|
|
|
|
|
|
## Run a subset of your benchmarks
|
|
|
|
_VTK-m_ benchmarks uses [Google Benchmarks] which allows you to choose a subset
|
|
of benchmaks by using the flag `--benchmark_filter=REGEX`
|
|
|
|
For instance, if you want to run all the benchmarks that writes something you
|
|
would run:
|
|
|
|
$ bin/BenchmarkArrayTransfer -d Serial --benchmark_filter='Write'
|
|
|
|
Note you can list all of the available benchmarks with the option:
|
|
`--benchmark_list_tests`.
|
|
|
|
## Compare with baseline
|
|
|
|
_VTM-m_ ships with a helper script based in [Google Benchmarks] `compare.py`
|
|
named `compare-benchmarks.py` which lets you compare benchmarks using different
|
|
devices, filters, and binaries. After building `VTM-m` it must appear on the
|
|
`bin` directory within your `build` directory.
|
|
|
|
When running `compare-benchmarks.py`:
|
|
- You can specify the baseline benchmark binary path and its arguments in
|
|
`--benchmark1=`
|
|
- The contender benchmark binary path and its arguments in `--benchmark2=`
|
|
- Extra options to be passed to `compare.py` must come after `--`
|
|
|
|
### Compare between filters
|
|
|
|
When comparing filters, we only can use one benchmark binary with a single device
|
|
as shown in the following example:
|
|
|
|
```sh
|
|
$ ./compare-benchmarks.py --benchmark1='./BenchmarkArrayTransfer -d Any
|
|
--benchmark_filter=1024' --filter1='Read' --filter2=Write -- filters
|
|
|
|
# It will output something like this:
|
|
|
|
Benchmark Time CPU Time Old Time New CPU Old CPU New
|
|
---------------------------------------------------------------------------------------------------------------------------------------------------------------
|
|
BenchContToExec[Read vs. Write]<F32>/Bytes:1024/manual_time +0.2694 +0.2655 18521 23511 18766 23749
|
|
BenchExecToCont[Read vs. Write]<F32>/Bytes:1024/manual_time +0.0212 +0.0209 25910 26460 26152 26698
|
|
```
|
|
|
|
### Compare between devices
|
|
|
|
When comparing two benchmarks using two devices use the _option_ `benchmark`
|
|
after `--` and call `./compare-benchmarks.py` as follows:
|
|
|
|
```sh
|
|
$ ./compare-benchmarks.py --benchmark1='./BenchmarkArrayTransfer -d Serial
|
|
--benchmark_filter=1024' --benchmark2='./BenchmarkArrayTransfer -d Cuda
|
|
--benchmark_filter=1024' -- benchmarks
|
|
|
|
|
|
# It will output something like this:
|
|
|
|
Benchmark Time CPU Time Old Time New CPU Old CPU New
|
|
---------------------------------------------------------------------------------------------------------------------------------------------------
|
|
BenchContToExecRead<F32>/Bytes:1024/manual_time +0.0127 +0.0120 18388 18622 18632 18856
|
|
BenchContToExecWrite<F32>/Bytes:1024/manual_time +0.0010 +0.0006 23471 23496 23712 23726
|
|
BenchContToExecReadWrite<F32>/Bytes:1024/manual_time -0.0034 -0.0041 26363 26274 26611 26502
|
|
BenchRoundTripRead<F32>/Bytes:1024/manual_time +0.0055 +0.0056 20635 20748 21172 21291
|
|
BenchRoundTripReadWrite<F32>/Bytes:1024/manual_time +0.0084 +0.0082 29288 29535 29662 29905
|
|
BenchExecToContRead<F32>/Bytes:1024/manual_time +0.0025 +0.0021 25883 25947 26122 26178
|
|
BenchExecToContWrite<F32>/Bytes:1024/manual_time -0.0027 -0.0038 26375 26305 26622 26522
|
|
BenchExecToContReadWrite<F32>/Bytes:1024/manual_time +0.0041 +0.0039 25639 25745 25871 25972
|
|
```
|
|
|
|
## Installing compare-benchmarks.py
|
|
|
|
`compare-benchmarks.py` relies on `compare.py` from Google Benchmarks which also
|
|
relies in `SciPy`, you can find instructions [here][SciPy] regarding its
|
|
installation.
|
|
|
|
[Google Benchmarks]: https://github.com/google/benchmark
|
|
[Compare.py]: https://github.com/google/benchmark/blob/master/tools/compare.py
|
|
[SciPy]: https://www.scipy.org/install.html
|