All the changes are mainly giving explicit tips on inlining functions,
so they match how inlining worked with previous toolkit.
This make kernel compiled by CUDA 8 render in average with same speed
as previous kernels. Some scenes are somewhat faster, some of them are
somewhat slower. But slowdown is within 1% so far.
On a positive side it allows us to enable newer generation cards on
buildbots (so GTX 10x0 will be officially supported soon).
These are complex nodes, and it's conceivable they may end up constant
in some circumstances within node groups, so folding support is useful.
Reviewed By: brecht
Differential Revision: https://developer.blender.org/D2084