Make detecting if we are cuda 3+ gpu running cuda 2 code faster.
The original implementing tried to run 2^31 kernels and detect a
launch failure to determine this use-case. The issue with this approach
is that on a cuda 3+ gpu, this would take multiple seconds and cause
the gpu to terminate the kernel when opengl was also loaded.
See merge request !104
The Invoke of the topology dispatcher is also changed to expect a
concrete cell set (which the DynamicCellSet is automatically cast to)
rather than a connectivity structure. The dispatcher calls the
GetNodeToCellConnectivity method for you. (That is currently the only
one supported.)
The original implementing tried to run 2^31 kernels and detect a
launch failure to determine this use-case. The issue with this approach
is that on a cuda 3+ gpu, this would take multiple seconds and cause
the gpu to terminate the kernel when opengl was also loaded.
Previously, IteratorFromArrayPortal was declaring its difference_type
to be vtkm::Id. Although this is allowed, there is code that assumes
that iterators have a difference_type that is ptrdiff_t or something
similar. This change makes the difference_type the default for the
boost iterator facade, which should be the type other code that\
neglects to check expects.
- A warm up run is done and not timed to allow for any allocation of
room for output data without accounting for it in the run times.
Previously this time spent allocating memory would be included in the
time we measured for the benchmark.
- Benchmarks are run multiple times and we then compute some statistics
about the run time of the benchmark to give a better picture of the
expected run time of the function. To this end we run the benchmark
either 500 times or for 1.5s, whichever comes sooner (though these are
easily changeable). We then perform outlier limiting by Winsorising the
data (similar to how Rust's benchmarking library works) and print out
the median, mean, min and max run times along with the median absolute
deviation and standard deviation.
- Because benchmarks are run many times they can now perform some
initial setup in the constructor, eg. to fill some test input data
array with values to let the main benchmark loop run faster.
- To allow for benchmarks to have members of the data type being
benchmarked the struct must now be templated on this type, leading to
a bit of awkwardness. I've worked around this by adding the
`VTKM_MAKE_BENCHMARK` and `VTKM_RUN_BENCHMARK` macros, the make
benchmark macro generates a struct that has an `operator()` templated on
the value type which will construct and return the benchmark functor
templated on that type. The run macro will then use this generated
struct to run the benchmark functor on the type list passed. You can
also pass arguments to the benchmark functor's constructor through the
make macro however this makes things more awkward because the name of
the MakeBench struct must be different for each variation of constructor
arguments (for example see `BenchLowerBounds`).
- Added a short comment on how to add benchmarks in
`vtkm/benchmarking/Benchmarker.h` as the new system is a bit different
from how the tests work.
- You can now pass an extra argument when running the benchmark suite to
only benchmark specific functions, eg. `Benchmarks_TBB
BenchmarkDeviceAdapter ScanInclusive Sort` will only benchmark
ScanInclusive and Sort. Running without any extra arguments will run all
the benchmarks as before.
The test_equal method compares the ratio of the two values to decide if
they are close enough. Although there is a previous check to make sure
that neither value is too close to zero, the MSVC sometimes gives a
warning because it cannot trace the flow of the check. Add another
conditional (that will never actually be executed) to check a second time
that we never divide by 0.
The DynamicCellSet will be used in place of the pointer to a CellSet
in a DataSet. This will prevent us from having to cast it all the time
and also remove reliance on boost smart_ptr.
MSVC ArrayHandle fail
Fix the fact that UnitTestArrayHandle is failing on the Windows dashboards. Also fix some of the MSVC warnings.
See merge request !101
The test was creating a large array on the stack, and this caused a
problem on Windows for some reason. Instead of putting the array on
the stack, use an std::vector. Also reduced the size of the array
used. It seemed unnecessarily large.
Also re-enabled the tests where VTK-m allocates its own ArrayHandle
data.
Conform DataSet classes to coding practices better
The most common changes were making class members uppercase and spelled
out, adding "this->" whenever a class member is used, and declare
functions and members with export macros. Also fixed some uses of int
(instead of vtkm::Id or something similar) and a bit of indentation. I
also sprinkled some const goodness over the code.
It should be noted that I had about a week delay between first making
these changes and checking them in. In the mean time Sujin also made
some similar changes, so there might be some repetative changes.
See merge request !100
ArrayHandle::GetPortalControl should only be used when the array is
going to be modified. Using it on a const ArrayHandle should be a
compiler error (although some compilers are not always giving an error).
Also, GetPortalControl has a side effect where any data in the excution
environment gets wiped out (because it might change), which can lead to
inefficiencies when used unnecessarily.
The most common changes were making class members uppercase and spelled
out, adding "this->" whenever a class member is used, and declare
functions and members with export macros. Also fixed some uses of int
(instead of vtkm::Id or something similar) and a bit of indentation. I
also sprinkled some const goodness over the code.
It should be noted that I had about a week delay between first making
these changes and checking them in. In the mean time Sujin also made
some similar changes, so there might be some repetative changes.
Fix 64bit and doubles
The CMake flag and define differ in their capitalization of the 'm' in
VTKm so I've made CMake variables that match those used in Configure.h.in,
thereby keeping the original naming of VTKm in CMake code, and VTKM in the
C++ code.
This fix also revealed some areas in CellSet and CellSetExplicit where ints
where used instead of vtkm::Ids which caused errors with child classes who
override the methods and returned a vtkm::Id instead of an int.
Also fixed issues that appeared in TestOutOfMemory which got out of date due
to not being compiled since the `VTKM_USE_64BIT_IDS` flag would never be set.
The test now runs and passes when 64bit ids are enabled.
See merge request !99
The CMake flag and define differ in their capitalization of the 'm' in
VTKm so I've made CMake variables that match those used in Configure.h.in,
thereby keeping the original naming of VTKm in CMake code, and VTKM in the
C++ code.
This fix also revealed some areas in CellSet and CellSetExplicit where ints
where used instead of vtkm::Ids which caused errors with child classes who
override the methods and returned a vtkm::Id instead of an int.
Also fixed issues that appeared in TestOutOfMemory which got out of date due
to not being compiled since the `VTKM_USE_64BIT_IDS` flag would never be set.
The test now runs and passes when 64bit ids are enabled.
Component Min/Max for Vec classes
Changes the <code>vtkm::Min</code> and <code>vtkm::Max</code> methods in Math.h to correctly handle <code>vtkm::Vec</code> and similar classes by doing a per component operation.
See merge request !96
Remove the usage of enable_if as return type to reduce binary size.
Using enable_if/disable_if as a return type has a negative impact on
binary size compared to use a boost::true/false_type as a method parameter.
For comparison the WorkletTests_TBB sees a 6-7% reduction in binary size when
compiled with O3.
Origin WorkletTests_TBB size details:
__TEXT __DATA __OBJC others dec hex
2363392 49152 0 4297793536 4300206080 1004ff000
Updated WorkletTests_TBB size details:
__TEXT __DATA __OBJC others dec hex
2215936 49152 0 4297568256 4299833344 1004a4000
See merge request !90
Using enable_if/disable_if as a return type has a negative impact on
binary size compared to use a boost::true/false_type as a method parameter.
For comparison the WorkletTests_TBB sees a 6-7% reduction in binary size when
compiled with O3.
Origin WorkletTests_TBB size details:
__TEXT __DATA __OBJC others dec hex
2363392 49152 0 4297793536 4300206080 1004ff000
Updated WorkletTests_TBB size details:
__TEXT __DATA __OBJC others dec hex
2215936 49152 0 4297568256 4299833344 1004a4000
Reduce the length of generated TestBuild files.
VisualStudio 2010 has issues with files with extremely long names, and CMake
will generate warnings when it detects problematic files.
See merge request !93
The initial implementation forgot about the fact that SM2.X architectures
can only handle 65k blocks. Now we gracefully handle when compiling for
SM2.X.
Correct issues compiling on OSX 10.8 with CUDA 6.5
Removed the MultiParam test as we have PointElevation which does the exact
same thing.
See merge request !87