The 8x8x8 block shape is a better launch strategy for most VTK-m kernels.
The current problem is that a couple of VTK-m kernels use a
high number of registers, and combined with this many threads per
block they require too many registers.
What we should do in the longer run is have finer control over
kernel launches on a per-kernel basis. This will require VTK-m
to extract the number of registers being used by each kernel
41b8236a2 For GCC 4.8.4 'half' shadows a global variable with that name
770912f99 Correct compiler issues found with GCC 4.8.5 + CUDA 9.2 on summit
b248b2c93 Correct unused-parameter warnings from defaulted methods.
Acked-by: Kitware Robot <kwrobot@kitware.com>
Acked-by: Allison Vacanti <allison.vacanti@kitware.com>
Merge-request: !1666
The consistent API for control-to-execution memory transfers is
the ArrayHandle class. Previously, the tests verified memory
transfers by calling the ArrayManagerExecution class directly. This
is problematic because that class isn't used by ArrayHandle<T, StorageBasic>.
When TransferInfo is given memory from VirtualObjectTransferShareWithControl,
it doesn't have a bound function pointer for destruction. In those cases
we need to make sure the HostCopyOfDevice is properly deleted; otherwise
we leak memory.
bdabfbe11 Make sure ArrayPortalUniformPointCoordinates constructor is explicit
b3d951b50 vtkm::Range Include function now requires half as many min/max calls
ddaa0df26 ArrayHandleVirtualCoordinates now calls the proper parent constructor
61e800379 Make sure all execution side CellLocator objects have explicit destructors
307898ff6 Cleanup the CellLocatorBoundingIntervalHierarchy.cxx style.
0f31c69f3 Remove unnecessary constructor from ParameterContainer
Acked-by: Kitware Robot <kwrobot@kitware.com>
Merge-request: !1657
Previously, it called the ArrayHandle<T,StorageTagVirtual>
constructor rather than the ArrayHandleVirtual constructor, which
generated a warning with some compilers.
There is a small section in the code generated from Math.h.in that is
subject to clang-format. A recent change reformatted that bit of
Math.h, so we need to update Math.h.in accordingly.
ff687016e For VTK-m libs all includes of DeviceAdapterTagCuda happen from cuda files
Acked-by: Kitware Robot <kwrobot@kitware.com>
Acked-by: Kenneth Moreland <kmorel@sandia.gov>
Merge-request: !1648
It is very easy to cause ODR violations with DeviceAdapterTagCuda.
If you include that header from both a C++ file and a CUDA file in
the same program, you get an ODR violation. The reason is that the C++
version will say the tag is invalid, while the CUDA version will say
the tag is valid.
The solution is that any compilation unit that includes
DeviceAdapterTagCuda from a build of VTK-m that has CUDA enabled
must be compiled by the CUDA compiler.
d8cc067ca Remove DeviceAdapterError as it isn't needed any more.
Acked-by: Kitware Robot <kwrobot@kitware.com>
Acked-by: Kenneth Moreland <kmorel@sandia.gov>
Merge-request: !1649
Fixes #277
DeviceAdapterError existed to make sure that the default device adapter
template was being handled properly. Since the default device adapter no longer
exists and nothing is templated over it, we can now remove DeviceAdapterError.
9c2920072 UnitTestBoundingIntervalHierarchy handles systems under load better
671c1df5c Timer logs the proper device name when called with an invalid device
d3d66a331 GameOfLife example always uses the proper device adapter
Acked-by: Kitware Robot <kwrobot@kitware.com>
Merge-request: !1645
The UnitTestBoundingIntervalHierarchy has historically had problems
when the machine is already under load when the algorithm is executed.
By limiting the number of OpenMP threads the test uses, we can
reduce the amount of CPU time slicing that this test causes.
VTK-m now offers a more GPU-aware set of defaults for kernel scheduling.
When VTK-m first launches a kernel, we do system introspection to determine
which GPUs are on the machine and then match this information against a preset
table of values. The implementation is designed so that VTK-m can offer
presets both for a specific GPU (V100) and for an entire generation of
cards (Pascal).
Currently VTK-m offers preset tables for the following GPUs:
- Tesla V100
- Tesla P100
If the hardware doesn't match a specific GPU card, we then try to find the
nearest known hardware generation and use those defaults. Currently we offer
defaults for:
- Older than Pascal Hardware
- Pascal Hardware
- Volta+ Hardware
Some users have workloads that don't align with the defaults provided by
VTK-m. When that is the case, it is possible to override the defaults
by binding a custom function to `vtkm::cont::cuda::InitScheduleParameters`,
as shown below:
```cpp
ScheduleParameters CustomScheduleValues(char const* name,
                                        int major,
                                        int minor,
                                        int multiProcessorCount,
                                        int maxThreadsPerMultiProcessor,
                                        int maxThreadsPerBlock)
{
  // Only multiProcessorCount is used here; the remaining arguments describe
  // the device and can be used to make more specific choices.
  ScheduleParameters params{
    64 * multiProcessorCount, // 1d blocks
    64,                       // 1d threads per block
    64 * multiProcessorCount, // 2d blocks
    { 8, 8, 1 },              // 2d threads per block
    64 * multiProcessorCount, // 3d blocks
    { 4, 4, 4 }               // 3d threads per block
  };
  return params;
}

// Bind the custom function from application setup code (e.g. inside main()).
vtkm::cont::cuda::InitScheduleParameters(&CustomScheduleValues);
```