One problem is that it was always using __mm_blendv_ps emulation even if the
instruction was supported. The other that the emulation function was wrong.
Thanks a lot to Ray Molenkamp for tracking this one down.
Would be nice to be able to catch this with assert as well, will see what would
be the best way to do this/.\
Need to verify with Mai that this solves crash for her and maybe consider
porting this to 2.79.
Fishy cat benchmark was rendering with wrong shadows. Cause is unclear,
adding printf or rearranging code seems to avoid this issue, possibly a
compiler bug. This reverts the fix and solves the OSL bug elsewhere.
This was needed when we accessed OSL closure memory after shader evaluation,
which could get overwritten by another shader evaluation. But all closures
are immediatley converted to ShaderClosure now, so no longer needed.
While unlikely to have had any serious effects because of limited use, the
previous implementation was not actually atomic due to a data race and
incorrectly coded CAS loop. We also had duplicates of this code in a few
places, it's now been moved to a single location with all other atomic
operations.
We need to make sure we can store all volume closures for all objects in volume
stack. This is a bit tricky to detect what would be the "nestness" level of
volumes so for now use maximum possible stack depth. Might cause some slowdown,
but better to give reliable render output than to fail quickly.
Should be safe for 2.79 after extra eyes.
We should only early out with any hit in BVH traversal if the only visibility
bits used are opaque shadow. Not when opaque shadow is one of multiple bits.
Also pass by value and don't write back now that it is just a hash for seeding
and no longer an LCG state. Together this makes CUDA a tiny bit faster in my
tests, but mainly simplifies code.
This implements Arvo's "Stratified sampling of spherical triangles". Similar to how we sample rectangular area lights, this is sampling triangles over their solid angle. It does significantly improve sampling close to the triangle, but doesn't do much for more distant triangles. So I added a simple heuristic to switch between the two methods. Unfortunately, I expect this to add render time in any case, even when it does not make any difference whatsoever. It'll take some benchmarking with various scenes and hardware to estimate how severe the impact is and if it is worth the change.
Reviewers: #cycles, brecht
Reviewed By: #cycles, brecht
Subscribers: Vega-core, brecht, SteffenD
Tags: #cycles
Differential Revision: https://developer.blender.org/D2730
This patch adds "Pixel Size" to the performance options, which allows to render
in a smaller resolution, which is especially useful for displays with high DPI.
Reviewers: Severin, dingto, sergey, brecht
Reviewed By: brecht
Subscribers: Severin, venomgfx, eyecandy, brecht
Differential Revision: https://developer.blender.org/D1619
Basically, make re-alloc and memcpy from the same lock, otherwise one
thread might be re-allocating thread while another one is trying to
copy data there.
Reported by Mohamed Sakr in IRC, thanks!