Root of the issue goes back to the on-fly normals commit and the
latest fix for it wasn't actually correct. I've mixed two fixes
in there.
So the idea here goes back to storing negative scaled object flag
and flip runtime-calculated normal if this flag is set, which is
pretty much the same as the original fix for the issue from me.
The issue with motion blur wasn't caused by the rumtime normals
patch and it had issues before, because it already did runtime
normals calculation. Now made it so motion triangles takes the
negative scale flag into account.
This actually makes code more clean imo and avoids rather confusing
flipping code in mesh.cpp.
We can't really make CPU and GPU results look the same in all possible
circumstances, but here we can make them look close enough to each other
by making it so sobol pattern for bounce number is the smae for both
CPU and GPU.
This makes CPU and GPU render results look the same with low number of
samples, high number of samples was never an issue.
Fix T41115: Motion Blur renders Objects Black - But not in Viewport Preview
This actually extends previous fix to normals and makes it all much nicer now.
Worth doing some intense testing, quick one worked just fine but there always
could be some corner cases.
Fix T41079: Solid black render of object with negative scale and smooth shading
In both cases the issue was caused by negative scaled objects with single mesh
users for which scale gets applied when using static BVH.
Since the on-fly normals calculation land normals for such cases weren't flipped
leading them to point to a wrong direction.
Added a special object flag for this, which is a bit of a bummer because now
we've got less bits for real useful things, but this is the only way to get
proper normals without adding more complexity in the on-fly calculations.
* malloc() is used now, which is supported since sm_20: http://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#dynamic-global-memory-allocation-and-operations The performance of this needs to be tested on various cards still.
* This also works for Heterogeneous Decoupled Ray Marching, but in this case I get sporadic "Illegal Address" errors on my Geforce 540, therefore I did not remove the GPU check in kernel_volume_use_decoupled() yet.
I would appreciate some tests from people who compile themselves, enable Volumetrics in kernel_types.h.
* CUDA can be compiled with Volume support again, change line 78 kernel_types.h for that.
Volumes are still fragile on GPU though, got some Memory/Address CUDA errors in tests.. needs to be investigated more deeply.
Volume scatter might happen before path termination, so
need to check transparent bounces and consider shadow an
opaque when max transparent bounces are reached.
TODO: CPU code seems to have different branching in conditions
which made me thinking it does different things with volume
attenuation, but from the render results it seems the same
exact things are happening there. Worth looking into making
simplifying code a bit here to improve readability.
It turns out that the new Beckmann sampling function doesn't work well with
Quasi Monte Carlo sampling, mainly near normal incidence where it can be worse
than the previous sampler. In the new sampler the random number pattern gets
split in two, warped and overlapped, which hurts the stratification, see the
visualization in the differential revision.
Now we use a precomputed table, which is much better behaved. GGX does not seem
to benefit from using a precomputed table.
Disadvantage is that this table adds 1MB of memory usage and 0.03s startup time
to every render (on my quad core CPU).
Differential Revision: https://developer.blender.org/D614
* Anisotropic BSDF now supports GGX and Beckmann distributions, Ward has been
removed because other distributions are superior.
* GGX is now the default distribution for all glossy and anisotropic nodes,
since it looks good, has low noise and is fast to evaluate.
* Ashikhmin-Shirley is now available in the Glossy BSDF.
* Ashikhmin-Shirley anisotropic BSDF was added as closure
* Anisotropic BSDF node now has two distributions
Reviewers: brecht, dingto
Differential Revision: https://developer.blender.org/D549
Samples render slower than before, but hopefully this is made up for with
reduced noise in most cases. The main slowdown comes from samples that would
previously be wasted and turn out black, which are now continued.
GGX sampling is about the same speed as before, while for Beckmann it is slower
still. Perhaps optimizations are still possible there, but didn't find anything
easy.
Code from this paper, which comes with sample code:
Importance Sampling Microfacet-Based BSDFs using the Distribution of Visible Normals.
E. Heitz and E. d'Eon, EGSR 2014
Differential Revision: https://developer.blender.org/D572
This gives you "Multiple Importance", "Distance" and "Equiangular" choices.
What multiple importance sampling does is make things more robust to certain
types of noise at the cost of a bit more noise in cases where the individual
strategies are always better.
So if you've got a pretty dense volume that's lit from far away then distance
sampling is usually more efficient. If you've got a light inside or near the
volume then equiangular sampling is better. If you have a combination of both,
then the multiple importance sampling will be better.
* Volume multiple importace sampling support to combine equiangular and distance
sampling, for both homogeneous and heterogeneous volumes.
* Branched path "Sample All Direct Lights" and "Sample All Indirect Lights" now
apply to volumes as well as surfaces.
Implementation note:
For simplicity this is all done with decoupled ray marching, the only case we do
not use decoupled is for distance only sampling with one light sample. The
homogeneous case should still compile on the GPU because it only requires fixed
size storage, but the heterogeneous case will be trickier to get working.
* Added support for uchar4 attributes to Cycles' attribute system.
* This is used for Vertex Colors now, which saves some memory (4 unsigned characters, instead of 4 floats).
* GPU Texture Limit on sm_20 and sm_21 decreased from 95 to 94, because we need a new texture for the uchar4 attributes. This is no problem for sm_30 or newer.
Part of my GSoC 2014.
This kernel is compiled with AVX2, FMA3, and BMI compiler flags. At the moment only Intel Haswell benefits from this, but future AMD CPUs will have these instructions as well.
Makes rendering on Haswell CPUs a few percent faster, only benchmarked with clang on OS X though.
Part of my GSoC 2014.
Instead of pre-calculation and storage, we now calculate the face normal during render.
This gives a small slowdown (~1%) but decreases memory usage, which is especially important for GPUs,
where you have limited VRAM.
Part of my GSoC 2014.