This also updates the configurations to build kernels for compute capability
5.0 cards, when using and older CUDA toolkit version this will be skipped.
Also includes tweaks to improve performance with this version:
* Increase max registers on sm_30, sm_35 and sm_50
* No longer use texture storage on sm_30
The formula was not consistent across Blender and behaved strangely, now it is
a simple linear blend between color1 and min(color1, color2).
Reviewed By: brecht
Differential Revision: https://developer.blender.org/D489
This caused a couple of fireflies in koro_final.blend. The wrong normal would
cause the shading point to be set as backfacing, which triggered another bug
with hair BSDFs on the backface of hair curves. That one is not fixed yet but
there's a comment in the code about it now.
Old algorithm:
Raytrace from one transparent surface to the next step by step. To minimize
overhead in cases where we don't need transparent shadows, we first trace a
regular shadow ray. We check if the hit primitive was potentially transparent,
and only in that case start marching. this gives extra ray cast for the cases
were we do want transparency.
New algorithm:
We trace a single ray. If it hits any opaque surface, or more than a given
number of transparent surfaces is hit, then we consider the geometry to be
entirely blocked. If not, all transparent surfaces will be recorded and we
will shade them one by one to determine how much light is blocked. This all
happens in one scene intersection function.
Recording all hits works well in some cases but may be slower in others. If
we have many semi-transparent hairs, one intersection may be faster because
you'd be reinteresecting the same hairs a lot with each step otherwise. If
however there is mostly binary transparency then we may be recording many
unnecessary intersections when one of the first surfaces blocks all light.
We found that this helps quite nicely in some scenes, on koro.blend this can
give a 50% reduction in render time, on the pabellon barcelona scene and a
forest scene with transparent leaves it was 30%. Some other files rendered
maybe 1% or 2% slower, but this seems a reasonable tradeoff.
Differential Revision: https://developer.blender.org/D473
This was the original code to get things working on old GPUs, but now it is no
longer in use and various features in fact depend on this to work correctly to
the point that enabling this code is too buggy to be useful.
This can for example be useful if you want to manually terminate the path at
some point and use a color other than black.
Reviewed By: brecht
Differential Revision: https://developer.blender.org/D454
for one of the input shaders is zero.
This gives about 5% speedup for koro_final.blend. In general this is important
so you can design shaders that run faster for shadows, diffuse bounces, etc, for
example by skipping procedural textures or even using a single fixed color.
Otherwise devices used for display will lock up the UI too much. This means
you might still get 100% CPU for the display device, but for others CPU usage
should be low still.
The check to see if a device is used for display may not be entirely reliable,
it checks if there is a watchdog timeout on the device, but I'm not entirely
sure that always exists for display devices or is disabled for non-display
devices, though some tools like cuda-gdb seem to make the same assumption.
Ref T39559
This makes it easier to have per kernel number of registers. Also, all the
tunable parameters for this are now in kernel.cu, rather than spread over cmake,
scons and device_cuda.cpp.
The problem here was that animation buffers got initialized with zeros in the beginning for unknown parts. Now it gets initialized with the first known value.
The bug's result was that the animation of the pitch started with 0 on first playback and thus any seeking while the pitch is zero resulted in seeking to the beginning.
Nice little mistake, since the invalid mem access only happened once (the first time),
was close to valid mem, and was only used to read, it would not crash often...
Idea and code by Brecht, many thanks!
Reviewers: brecht
Reviewed By: brecht
CC: campbellbarton, dingto
Differential Revision: https://developer.blender.org/D369
This fixes the ptaxs "ACCESS_VIOLATION" error and should allow our Linux and Windows build bots to compile again.
Unfortunately this comes with a performance penalty on sm_2x cards, so this is only a workaround for now. Branched Path is still globally disabled on GPU.