All the changes are mainly giving explicit tips on inlining functions,
so they match how inlining worked with previous toolkit.
This make kernel compiled by CUDA 8 render in average with same speed
as previous kernels. Some scenes are somewhat faster, some of them are
somewhat slower. But slowdown is within 1% so far.
On a positive side it allows us to enable newer generation cards on
buildbots (so GTX 10x0 will be officially supported soon).
While they prevent legit write past the array boundary error
those fixes introduced regression in behavior when having exact
max_hits transparent intersections and nothing else.
Previous code would have considered such case a totally opaque,
but it's not correct.
Fixes T48941: Some materials don't get transparent shadows anymore
It was possible to miss bounces termination criteria in this functions,
mainly when max_hits was set to 0.
Made the check more robust in traversal functions (which should not
affect performance, it's an operation of same complexity AFAIK).
Also avoid doing ray-scene intersection from shadow_blocked when
limit of transparent bounces was already reached.
This way restrict can be used for CUDA and OpenCL as well.
From quick tests in areas i've been testing this it might give some
barely measurable %% of speedup, but it increases registers pressure.
So use of this qualifier is still really limited.
Using camel case for variables is something what didn't came from our original
code, but rather from third party libraries. Let's avoid those as much as possible.
Matches better naming of volume traversal files, where we've got
optimized versions of a single step of volume intersection and
traversal which will gather all volume intersections.
BVH traversal is not really that much a geometry and we've got
quite some traversals now. Makes sense to keep them separate in
the name of source structure clarity.