There was a known issue where PointLocatorUniformGrid would quickly quit
once it found a point. Instead, look at one more level of bins just in
case there is a closer one near the boundary. (Still not guaranteed, but
likely.)
Also, fix a typo that caused some bins in the y and z direction to not
be searched.
The vtkm::exec::PointLocatorUniformGrid has a default constructor. It
was "helpfully" declared as VTKM_EXEC_CONT, but apparently that is the
wrong thing to do for constructors that are set to default.
Previously when PointLocatorUniformGrid.h was compiled by the CUDA
compiler, the compiler would crash. Apparently during the ptxas
part of the compiler goes into a crazy recursion and runs out of
stack space. This appears to be a long-standing bug in CUDA
(been there for multiple releases) without a clear reason why it
sometimes rears its ugly head. (See for example
https://devtalk.nvidia.com/default/topic/1028825/cuda-programming-and-performance/-ptxas-died-with-status-0xc00000fd-stack_overflow-/)
The problem appears to be when having a doubly or triply nested
loop over a box of values to check in the uniform array. This
appears to fix the problem by converting that to a single for
loop with some index magic to convert that to 3D indices.