Commit Graph

75 Commits

Author SHA1 Message Date
Lukas Stockner
aae2cea28d Cycles: Also support the constant emission speedup for mesh lights
Reviewers: brecht, sergey, dingto, juicyfruit

Differential Revision: https://developer.blender.org/D2220
2016-09-14 18:53:35 +02:00
Sergey Sharybin
29c733e6f2 Fix T49078: Cycles tries to render volume from another render layer when camera is in volume 2016-08-25 10:55:59 +02:00
Sergey Sharybin
6353ecb996 Cycles: Tweaks to support CUDA 8 toolkit
All the changes are mainly giving explicit tips on inlining functions,
so they match how inlining worked with previous toolkit.

This make kernel compiled by CUDA 8 render in average with same speed
as previous kernels. Some scenes are somewhat faster, some of them are
somewhat slower. But slowdown is within 1% so far.

On a positive side it allows us to enable newer generation cards on
buildbots (so GTX 10x0 will be officially supported soon).
2016-08-01 15:54:29 +02:00
Sergey Sharybin
f31f740bd0 Cycles: Proper fix for buffer overflow in volume intersect all 2016-07-26 17:16:23 +02:00
Lukas Stockner
654019fa01 Cycles: Fix two numerical issues in the volume code
This hopefully fixes T48383 by avoiding two numerical problems that I found in the volume code.

Reviewers: sergey, dingto, brecht

Reviewed By: sergey, dingto, brecht

Maniphest Tasks: T48383

Differential Revision: https://developer.blender.org/D2051
2016-06-08 03:17:19 +02:00
Sergey Sharybin
14f9a5aa1d Fix T48571: Cycles/GPU - A lot of fireflies on SSS+Volume
Was some accumulated precision error happening.
2016-06-06 15:56:22 +02:00
999d5a6785 Cycles CUDA: reduce stack memory by reusing ShaderData.
57% less for path and 48% less for branched path.
2016-05-23 22:29:24 +02:00
ca03eddfcc Cleanup: remove Cycles layer bits checking in the kernel.
At some point the idea was that we could have an optimization where we could
render multiple render layers without re-exporting the scene, by just updating
the layer bits. We are not doing this now and in practice with the available
render layer control like exclude layers it's not always possible anyway.

This makes it easier to support an arbitrary number of layers in the future
(hopefully this summer), and frees up some useful bits in the kernel.

Reviewed By: sergey, dingto

Differential Revision: https://developer.blender.org/D2020
2016-05-22 17:36:38 +02:00
Sergey Sharybin
792e147e2c Cycles: Fix compilation error of CUDA kernels after recent volume commit
Apparently the code path with malloc() was enabled for CUDA.
2016-05-18 11:15:28 +02:00
Sergey Sharybin
7b356a8565 Cycles: Reduce amount of malloc() calls from the kernel
This commit makes it so malloc() is only happening once per volume and
once per transparent shadow query (per thread), improving scalability of
the code to multiple CPU cores.

Hard to measure this with a low-bottom i7 here currently, but from quick
tests seems volume sampling gave about 3-5% speedup.

The idea is to store allocated memory in kernel globals, which are per
thread on CPU already.

Reviewers: dingto, juicyfruit, lukasstockner97, maiself, brecht

Reviewed By: brecht

Subscribers: Blendify, nutel

Differential Revision: https://developer.blender.org/D1996
2016-05-18 10:14:24 +02:00
Sergey Sharybin
b9d9d93ff9 Fix T48162: GPU render gives wrong results in certain volume setups
ideally this part of code should be de-duplicated across __VOLUME_INTERSECT_ALL
and regular code.
2016-04-20 13:49:54 +02:00
Sergey Sharybin
b8892cac19 Cycles: Yet another fix for camera in volume
Was an embarrassing glitch in original optimization policy,
the for-loops can't be de-duplicated here.
2016-04-14 17:20:17 +02:00
Sergey Sharybin
65f279b770 Cycles: Fix wrong camera in volume check when domain is only visible to camera rays 2016-04-04 19:30:38 +02:00
Sergey Sharybin
ac8f4ba530 Cycles: Fix regression caused by recent camera-in-volume commit
Stupid me forgot that we don't have stop-element in the stack yet.
2016-04-04 18:24:40 +02:00
Sergey Sharybin
ce44ffd74f Cycles: Fix wrong camera-in-volume stack when camera ray hits volume domain twice 2016-04-01 18:03:58 +02:00
Sergey Sharybin
700722f686 Cycles: Cleanup, indent nested preprocessor directives
Quite straightforward, main trick is happening in path_source_replace_includes().

Reviewers: brecht, dingto, lukasstockner97, juicyfruit

Differential Revision: https://developer.blender.org/D1794
2016-03-25 13:55:42 +01:00
Thomas Dinges
35c3e7b522 Cleanup: Remove outdated comment in volume code.
Thanks to jesterking for finding this one.
2016-01-24 12:31:36 +01:00
Thomas Dinges
83e73a2100 Cycles: Refactor how we pass bounce info to light path node.
This commit changes the way how we pass bounce information to the Light
Path node. Instead of manualy copying the bounces into ShaderData, we now
directly pass PathState. This reduces the arguments that we need to pass
around and also makes it easier to extend the feature.

This commit also exposes the Transmission Bounce Depth to the Light Path
node. It works similar to the Transparent Depth Output: Replace a
Transmission lightpath after X bounces with another shader, e.g a Diffuse
one. This can be used to avoid black surfaces, due to low amount of max
bounces.

Reviewed by Sergey and Brecht, thanks for some hlp with this.

I tested compilation and usage on CPU (SVM and OSL), CUDA, OpenCL Split
and Mega kernel. Hopefully this covers all devices. :)
2016-01-06 23:43:29 +01:00
Thomas Dinges
26bad9e8f0 Cleanup: Fix some typos in volume code comments. 2015-08-31 18:14:51 +02:00
Thomas Dinges
8d15cad449 Cleanup: Typo in comment. 2015-07-04 13:17:29 +02:00
Thomas Dinges
b3def11f5b Cycles: Record all possible volume intersections for SSS and camera checks
This replaces sequential ray moving followed with scene intersection with
single BVH traversal, which gives us all possible intersections.

Only implemented for CPU, due to qsort and a bigger memory usage on GPU
which we rather avoid. GPU still uses the regular bvh volume intersection code, while CPU now uses the new code.

This improves render performance for scenes with:
a) Camera inside volume mesh
b) SSS mesh intersecting a volume mesh/domain

In simple volume files (not much geometry) performance is roughly the same
(slightly faster). In files with a lot of geometry, the performance
increase is larger. bmps.blend with a volume shader and camera inside the
mesh, it renders ~10% faster here.

Patch by Sergey and myself.

Differential Revision: https://developer.blender.org/D1264
2015-04-29 23:31:06 +02:00
Sergey Sharybin
7aab5c6ca9 Cycles: Fix wrong termination criteria in SSS volume stack update
Another issue spotted with Thomas.
2015-04-30 01:20:17 +05:00
Thomas Dinges
5e423775da Cleanup: Move Cycles volume stack update for subsurface into kernel_volume.h. 2015-04-28 11:20:27 +02:00
Sergey Sharybin
5ff132182d Cycles: Code cleanup, spaces around keywords
This inconsistency drove me totally crazy, it's really confusing
when it's inconsistent especially when you work on both Cycles and
Blender sides.

Shouldn;t cause merge PITA, it's whitespace changes only, Git should
be able to merge it nicely.
2015-03-28 00:15:15 +05:00
Thomas Dinges
064fa4baae Cycles / Decoupled Ray Marching: Skip consecutive empty steps.
This merges consecutive empty steps in the decoupled record function,
which can lead to fewer iterations in the scatter functions.

Only helps slightly though (1%), but doesn't hurt to have this.

Differential Revision: https://developer.blender.org/D873
2015-03-12 13:50:12 +01:00
Sergey Sharybin
ef11be0e77 Cycles: Avoid over-allocation in decouple ray marching
It could have happened with really long rays and small steps.

Step size will be adjusted to the clamped number of steps in order
to preserve render result compatibility as much as possible.

We should probably reformulate this a bit, so it will give the
same looking results without step tweaks. But this new behavior
should already be much better that it was before.
2015-02-18 02:26:24 +05:00
Sergey Sharybin
25f33e058a Fix T43562: Cycles gets stuck with camera in volume in certain setup
The issue was caused by the way how we shoot the ray to see which rays we're
inside which might start bouncing back-n-forth between two close to parallel
intersecting faces.

Real solution would be to record all the intersections when shooting the ray,
but it's kinda tricky on GPU because of needed sorting and uncertainty of
how huge intersection array should be.

For now we'll just limit number of steps in the check so in worst case we'll
have some samples not being correct which will be compensated with further
sampling. Shouldn't be an issue since probability of such a lock is quite
small actually.
2015-02-05 16:10:50 +05:00
Thomas Dinges
ee36e75b85 Cleanup: Fix Cycles Apache header.
This was already mixed a bit, but the dot belongs there.
2014-12-25 02:50:24 +01:00
Thomas Dinges
2e2c24bec1 Cycles: Update some comments in volume code. 2014-11-13 11:30:18 +01:00
Thomas Dinges
727e9dd1bb Cleanup, typo fixes. 2014-11-12 09:14:07 +01:00
Sergey Sharybin
157067acbd Cycles: Speedup for homogenous volumes in decoupled volume sampling
The idea is to avoid memory allocation when only one segment step is to be allocated.
This gives some speedup which is difficult to measure on this trashcan from hell, but
it's about from 7% to 10% in the extreme case with single volume filling the whole of
the viewport. This seems to depends on the phase of the bug-o-meter in the studio.

On the linux boxes it's not that spectacular speedup, it's about 2% on my laptop and
about 3% on the studio desktop. This is likely because of the awesomeness of jemalloc.
2014-11-10 18:47:28 +01:00
Sergey Sharybin
1073c6ae8e Cycles: Don't check shader for volume when checking if camera is inside volume
Intersection code already ignores objects without volume closure so checking it
afterwards is not needed.
2014-11-04 19:57:15 +01:00
Campbell Barton
eaaeae4699 Cleanup: spelling 2014-10-23 10:38:38 +02:00
Campbell Barton
4c60aae66c Cleanup: warnings 2014-10-06 23:19:07 +02:00
Sergey Sharybin
939fa6759c Cycles: Fix for camera-in-volume detection
Ray actually should have infinite length, so we can detect camera in a volume
which is bigger that the far clipping of the camera.

This might also give some speedup (wouldn't expect much tho) because we don't
need to re-calculate ray direction and length after every bounce now.
2014-10-06 12:36:46 +02:00
Sergey Sharybin
7dabfb2048 Cycles: Speedup of kernel side camera-in-volume detection
The idea is to only count intersections with objects which has volumetric shader
and ignore all other objects.

This is probably as fast as we can go without involving some forth level magic.
2014-10-03 12:55:31 +06:00
Sergey Sharybin
21825c4359 Cycles: Avoid temp variable in camera-in-volume check
Was a left-over from some experiments, no need it with the current
implementation, and likely wouldn't need in the future.
2014-09-28 02:35:37 +06:00
Sergey Sharybin
fe731686fb Cycles: Add support for cameras inside volume
Basically the title says it all, volume stack initialization now is aware that
camera might be inside of the volume. This gives quite noticeable render time
regressions in cases camera is in the volume (didn't measure them yet) because
this requires quite a few of ray-casting per camera ray in order to check which
objects we're inside. Not quite sure if this might be optimized.

But the good thing is that we can do quite a good job on detecting whether
camera is outside of any of the volumes and in this case there should be no
time penalty at all (apart from some extra checks during the sync state).

For now we're only doing rather simple AABB checks between the viewplane and
volume objects. This could give some false-positives, but this should be good
starting point.

Need to mention panoramic cameras here, for them it's only check for whether
there are volumes in the scene, which would lead to speed regressions even if
the camera is outside of the volumes. Would need to figure out proper check
for such cameras.

There are still quite a few of TODOs in the code, but the patch is good enough
to start playing around with it checking whether there are some obvious mistakes
somewhere.

Currently the feature is only available in the Experimental feature sey, need
to solve some of the TODOs and look into making things faster before considering
the feature is ready for the official feature set. This would still likely
happen in current release cycle.

Reviewers: brecht, juicyfruit, dingto

Differential Revision: https://developer.blender.org/D794
2014-09-25 23:28:01 +06:00
Thomas Dinges
031620aba2 Cycles: Avoid redundant call to volume_stack_is_heterogeneous() for Distance Sampling. 2014-08-24 16:15:57 +02:00
Thomas Dinges
187d77612b Code refactor: Split __VOLUME__ defines in Cycles.
* __VOLUME__ is basic volume support with Emission and Absorption.
* __VOLUME_SCATTER__ enables volume Scattering support.
* __VOLUME_DECOUPLED__ enables Decoupled Ray Marching.
2014-08-20 23:15:30 +02:00
Thomas Dinges
075f6eff74 Cycles: Further tweak for Decoupled Ray Marching
Avoid some if checks when probalistic_scatter is false.

Differential Revision: https://developer.blender.org/D743
2014-08-20 22:59:08 +02:00
Thomas Dinges
5bdea81319 Cycles: Don't check closure flag in kernel_volume_decoupled_scatter(), we check this before the function already. 2014-08-14 21:25:52 +02:00
Thomas Dinges
32a5313b41 Cycles: Optimize Equi-Angular sampling using binary range search.
Patch by Lukas Tönne and myself.
2014-08-14 20:21:36 +02:00
Thomas Dinges
5af00a3d12 Cycles: Optimization for Heterogeneous Volume Shadows.
* Don't compute expf() for every step, instead sum the intermediate values and calculate it every N (8 for now) steps. This helps a few percent (~5% on a cube with wave texture) in my tests here.
2014-08-14 20:09:25 +02:00
Thomas Dinges
5fefc84783 Cycles: Equi-Angular and MIS Volume sampling work on GPU now.
* malloc() is used now, which is supported since sm_20: http://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#dynamic-global-memory-allocation-and-operations The performance of this needs to be tested on various cards still.
* This also works for Heterogeneous Decoupled Ray Marching, but in this case I get sporadic "Illegal Address" errors on my Geforce 540, therefore I did not remove the GPU check in kernel_volume_use_decoupled() yet.

I would appreciate some tests from people who compile themselves, enable Volumetrics in kernel_types.h.
2014-07-06 14:00:11 +02:00
Thomas Dinges
5aec61f849 Cycles: Compile fixes for CUDA Volumetrics.
* CUDA can be compiled with Volume support again, change line 78 kernel_types.h for that.

Volumes are still fragile on GPU though, got some Memory/Address CUDA errors in tests.. needs to be investigated more deeply.
2014-07-05 02:04:07 +02:00
4b209f063c Fix T40695: world surface shader incorrectly visible with world volume. 2014-06-24 11:35:48 +02:00
5fa68133c9 Cycles: volume sampling method can now be set per material/world.
This gives you "Multiple Importance", "Distance" and "Equiangular" choices.

What multiple importance sampling does is make things more robust to certain
types of noise at the cost of a bit more noise in cases where the individual
strategies are always better.

So if you've got a pretty dense volume that's lit from far away then distance
sampling is usually more efficient. If you've got a light inside or near the
volume then equiangular sampling is better. If you have a combination of both,
then the multiple importance sampling will be better.
2014-06-14 13:49:56 +02:00
a29807cd63 Cycles: volume light sampling
* Volume multiple importace sampling support to combine equiangular and distance
  sampling, for both homogeneous and heterogeneous volumes.

* Branched path "Sample All Direct Lights" and "Sample All Indirect Lights" now
  apply to volumes as well as surfaces.

Implementation note:

For simplicity this is all done with decoupled ray marching, the only case we do
not use decoupled is for distance only sampling with one light sample. The
homogeneous case should still compile on the GPU because it only requires fixed
size storage, but the heterogeneous case will be trickier to get working.
2014-06-14 13:49:56 +02:00
0780f5915b Fix T39804: cycles smoke domain visible in rendering.
Transparent objects could become subtly visible by the different sampling
patterns for pixels covered and not covered by the object. It still converged
to the right solution but that can take a while. Now we try to use the same
sampling pattern here.
2014-05-29 14:51:02 +02:00