This commit changes the way how we pass bounce information to the Light
Path node. Instead of manualy copying the bounces into ShaderData, we now
directly pass PathState. This reduces the arguments that we need to pass
around and also makes it easier to extend the feature.
This commit also exposes the Transmission Bounce Depth to the Light Path
node. It works similar to the Transparent Depth Output: Replace a
Transmission lightpath after X bounces with another shader, e.g a Diffuse
one. This can be used to avoid black surfaces, due to low amount of max
bounces.
Reviewed by Sergey and Brecht, thanks for some hlp with this.
I tested compilation and usage on CPU (SVM and OSL), CUDA, OpenCL Split
and Mega kernel. Hopefully this covers all devices. :)
This replaces sequential ray moving followed with scene intersection with
single BVH traversal, which gives us all possible intersections.
Only implemented for CPU, due to qsort and a bigger memory usage on GPU
which we rather avoid. GPU still uses the regular bvh volume intersection code, while CPU now uses the new code.
This improves render performance for scenes with:
a) Camera inside volume mesh
b) SSS mesh intersecting a volume mesh/domain
In simple volume files (not much geometry) performance is roughly the same
(slightly faster). In files with a lot of geometry, the performance
increase is larger. bmps.blend with a volume shader and camera inside the
mesh, it renders ~10% faster here.
Patch by Sergey and myself.
Differential Revision: https://developer.blender.org/D1264
This inconsistency drove me totally crazy, it's really confusing
when it's inconsistent especially when you work on both Cycles and
Blender sides.
Shouldn;t cause merge PITA, it's whitespace changes only, Git should
be able to merge it nicely.
This merges consecutive empty steps in the decoupled record function,
which can lead to fewer iterations in the scatter functions.
Only helps slightly though (1%), but doesn't hurt to have this.
Differential Revision: https://developer.blender.org/D873
It could have happened with really long rays and small steps.
Step size will be adjusted to the clamped number of steps in order
to preserve render result compatibility as much as possible.
We should probably reformulate this a bit, so it will give the
same looking results without step tweaks. But this new behavior
should already be much better that it was before.
The issue was caused by the way how we shoot the ray to see which rays we're
inside which might start bouncing back-n-forth between two close to parallel
intersecting faces.
Real solution would be to record all the intersections when shooting the ray,
but it's kinda tricky on GPU because of needed sorting and uncertainty of
how huge intersection array should be.
For now we'll just limit number of steps in the check so in worst case we'll
have some samples not being correct which will be compensated with further
sampling. Shouldn't be an issue since probability of such a lock is quite
small actually.
The idea is to avoid memory allocation when only one segment step is to be allocated.
This gives some speedup which is difficult to measure on this trashcan from hell, but
it's about from 7% to 10% in the extreme case with single volume filling the whole of
the viewport. This seems to depends on the phase of the bug-o-meter in the studio.
On the linux boxes it's not that spectacular speedup, it's about 2% on my laptop and
about 3% on the studio desktop. This is likely because of the awesomeness of jemalloc.
Ray actually should have infinite length, so we can detect camera in a volume
which is bigger that the far clipping of the camera.
This might also give some speedup (wouldn't expect much tho) because we don't
need to re-calculate ray direction and length after every bounce now.
The idea is to only count intersections with objects which has volumetric shader
and ignore all other objects.
This is probably as fast as we can go without involving some forth level magic.
Basically the title says it all, volume stack initialization now is aware that
camera might be inside of the volume. This gives quite noticeable render time
regressions in cases camera is in the volume (didn't measure them yet) because
this requires quite a few of ray-casting per camera ray in order to check which
objects we're inside. Not quite sure if this might be optimized.
But the good thing is that we can do quite a good job on detecting whether
camera is outside of any of the volumes and in this case there should be no
time penalty at all (apart from some extra checks during the sync state).
For now we're only doing rather simple AABB checks between the viewplane and
volume objects. This could give some false-positives, but this should be good
starting point.
Need to mention panoramic cameras here, for them it's only check for whether
there are volumes in the scene, which would lead to speed regressions even if
the camera is outside of the volumes. Would need to figure out proper check
for such cameras.
There are still quite a few of TODOs in the code, but the patch is good enough
to start playing around with it checking whether there are some obvious mistakes
somewhere.
Currently the feature is only available in the Experimental feature sey, need
to solve some of the TODOs and look into making things faster before considering
the feature is ready for the official feature set. This would still likely
happen in current release cycle.
Reviewers: brecht, juicyfruit, dingto
Differential Revision: https://developer.blender.org/D794
* __VOLUME__ is basic volume support with Emission and Absorption.
* __VOLUME_SCATTER__ enables volume Scattering support.
* __VOLUME_DECOUPLED__ enables Decoupled Ray Marching.
* Don't compute expf() for every step, instead sum the intermediate values and calculate it every N (8 for now) steps. This helps a few percent (~5% on a cube with wave texture) in my tests here.
* malloc() is used now, which is supported since sm_20: http://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#dynamic-global-memory-allocation-and-operations The performance of this needs to be tested on various cards still.
* This also works for Heterogeneous Decoupled Ray Marching, but in this case I get sporadic "Illegal Address" errors on my Geforce 540, therefore I did not remove the GPU check in kernel_volume_use_decoupled() yet.
I would appreciate some tests from people who compile themselves, enable Volumetrics in kernel_types.h.
* CUDA can be compiled with Volume support again, change line 78 kernel_types.h for that.
Volumes are still fragile on GPU though, got some Memory/Address CUDA errors in tests.. needs to be investigated more deeply.
This gives you "Multiple Importance", "Distance" and "Equiangular" choices.
What multiple importance sampling does is make things more robust to certain
types of noise at the cost of a bit more noise in cases where the individual
strategies are always better.
So if you've got a pretty dense volume that's lit from far away then distance
sampling is usually more efficient. If you've got a light inside or near the
volume then equiangular sampling is better. If you have a combination of both,
then the multiple importance sampling will be better.
* Volume multiple importace sampling support to combine equiangular and distance
sampling, for both homogeneous and heterogeneous volumes.
* Branched path "Sample All Direct Lights" and "Sample All Indirect Lights" now
apply to volumes as well as surfaces.
Implementation note:
For simplicity this is all done with decoupled ray marching, the only case we do
not use decoupled is for distance only sampling with one light sample. The
homogeneous case should still compile on the GPU because it only requires fixed
size storage, but the heterogeneous case will be trickier to get working.
Transparent objects could become subtly visible by the different sampling
patterns for pixels covered and not covered by the object. It still converged
to the right solution but that can take a while. Now we try to use the same
sampling pattern here.
This can for example be useful if you want to manually terminate the path at
some point and use a color other than black.
Reviewed By: brecht
Differential Revision: https://developer.blender.org/D454
This now uses decoupled ray marching, and removes the probalistic scattering.
What this means is that each AA sample will be slower but contain less noise,
hopefully giving less render time to reach the same noise levels.
For those following along, there's still a bunch of volume sampling improvements
to do: all-light sampling, multiple importance sampling, transmittance threshold,
better indirect light handling, multiple scatter approximation.
This basically records all volumes steps, which can then later be used multiple
time to take scattering samples, without having to step through the volume
again. From the paper:
"Importance Sampling Techniques for Path Tracing in Participating Media"
This works only on the CPU, due to usage of malloc/free.
Similar to surfaces, this will now always scatter rather than probabilistically
scattering or not depending on the transmittance.
This also makes calculation of branched path throughput non-probalistic, which
makes thing slower too. That's to be solved by decoupled ray marching later.
This removes a few optimizations to avoid exp() calls and division, they will be
added back later, at the moment it's more important to make the code easier to
understand and refactor.
Rather use random point in each step instead of giving the steps random sizes.
Gives a bit more accurate results with large step sizes, but also convenient
convention for later changes.