Commit Graph

18 Commits

Author SHA1 Message Date
Mai Lavelle
915766f42d Cycles: Branched path tracing for the split kernel
This implements branched path tracing for the split kernel.

General approach is to store the ray state at a branch point, trace the
branched ray as normal, then restore the state as necessary before iterating
to the next part of the path. A state machine is used to advance the indirect
loop state, which avoids the need to add any new kernels. Each iteration the
state machine recreates as much state as possible from the stored ray to keep
overall storage down.

Its kind of hard to keep all the different integration loops in sync, so this
needs lots of testing to make sure everything is working correctly. We should
probably start trying to deduplicate the integration loops more now.

Nonbranched BMW is ~2% slower, while classroom is ~2% faster, other scenes
could use more testing still.

Reviewers: sergey, nirved

Reviewed By: nirved

Subscribers: Blendify, bliblubli

Differential Revision: https://developer.blender.org/D2611
2017-05-02 14:26:46 -04:00
Hristo Gueorguiev
e07ffcbd1c Cycles: Add OpenCL support for shadow catcher feature
The title says it all actually.
2017-03-27 10:46:59 +02:00
Hristo Gueorguiev
8ada7f7397 Cycles: Remove ccl_addr_space from RNG passed to functions
Simplifies code quite a bit, making it shorter and easier to extend.
Currently no functional changes for users, but is required for the
upcoming work of shadow catcher support with OpenCL.
2017-03-27 10:46:28 +02:00
Sergey Sharybin
1cad64900e Cycles: Define ccl_local variables in kernel functions
Declaring ccl_local in a device function is not supported
by certain compilers.
2017-03-16 11:27:17 +01:00
Sergey Sharybin
76acaefdd7 Cycles: Cleanup, wipe obviously outdated parts of split kernel comments 2017-03-13 17:16:16 +01:00
Mai Lavelle
352ee7c3ef Cycles: Remove ccl_fetch and SOA 2017-03-08 00:52:41 -05:00
Mai Lavelle
230c00d872 Cycles: OpenCL split kernel refactor
This does a few things at once:

- Refactors host side split kernel logic into a new device
  agnostic class `DeviceSplitKernel`.
- Removes tile splitting, a new work pool implementation takes its place and
  allows as many threads as will fit in memory regardless of tile size, which
  can give performance gains.
- Refactors split state buffers into one buffer, as well as reduces the
  number of arguments passed to kernels. Means there's less code to deal
  with overall.
- Moves kernel logic out of OpenCL kernel files so they can later be used by
  other device types.
- Replaced OpenCL specific APIs with new generic versions
- Tiles can now be seen updating during rendering
2017-03-08 00:52:41 -05:00
Lukas Stockner
26bf230920 Cycles: Add optional probabilistic termination of light samples based on their expected contribution
In scenes with many lights, some of them might have a very small contribution to some pixels, but the shadow rays are traced anyways.
To avoid that, this patch adds probabilistic termination to light samples - if the contribution before checking for shadowing is below a user-defined threshold, the sample will be discarded with probability (1 - (contribution / threshold)) and otherwise kept, but weighted more to remain unbiased.
This is the same approach that's also used in path termination based on length.

Note that the rendering remains unbiased with this option, it just adds a bit of noise - but if the setting is used moderately, the speedup gained easily outweighs the additional noise.

Reviewers: #cycles

Subscribers: sergey, brecht

Differential Revision: https://developer.blender.org/D2217
2016-10-30 11:31:28 +01:00
Lukas Stockner
b459d9f46c Cycles: Stop lamp sampling if the lamp isn't visible
Both spot and area light have large areas where they're not visible.
Therefore, this patch stops the light sampling code when one of these cases (outside of the spotlight cone or behind the area light) occurs, before the lamp shader is evaluated.
In the case of the area light, the solid angle sampling can also be skipped.

In a test scene with Sample All Lights and 18 Area lamps and 9 Spot lamps that all point away from the area that the camera sees, render time drops from 12sec to 5sec.

Reviewers: brecht, sergey, dingto, juicyfruit

Differential Revision: https://developer.blender.org/D2216
2016-09-14 19:45:12 +02:00
999d5a6785 Cycles CUDA: reduce stack memory by reusing ShaderData.
57% less for path and 48% less for branched path.
2016-05-23 22:29:24 +02:00
Sergey Sharybin
e2161ca854 Cycles: Remove few function arguments needed only for the split kernel
Use KernelGlobals to access all the global arrays for the intermediate
storage instead of passing all this storage things explicitly.

Tested here with Intel OpenCL, NVIDIA GTX580 and AMD Fiji, didn't see
any artifacts, so guess it's all good.

Reviewers: juicyfruit, dingto, lukasstockner97

Differential Revision: https://developer.blender.org/D1736
2016-01-28 18:59:27 +01:00
Thomas Dinges
83e73a2100 Cycles: Refactor how we pass bounce info to light path node.
This commit changes the way how we pass bounce information to the Light
Path node. Instead of manualy copying the bounces into ShaderData, we now
directly pass PathState. This reduces the arguments that we need to pass
around and also makes it easier to extend the feature.

This commit also exposes the Transmission Bounce Depth to the Light Path
node. It works similar to the Transparent Depth Output: Replace a
Transmission lightpath after X bounces with another shader, e.g a Diffuse
one. This can be used to avoid black surfaces, due to low amount of max
bounces.

Reviewed by Sergey and Brecht, thanks for some hlp with this.

I tested compilation and usage on CPU (SVM and OSL), CUDA, OpenCL Split
and Mega kernel. Hopefully this covers all devices. :)
2016-01-06 23:43:29 +01:00
Sergey Sharybin
dc9e0b819b Cycles: Remove unused argument from the split kernel functions
Should be no functional changes, just simplifies operation with kernels.
2015-11-01 17:22:42 +05:00
Sergey Sharybin
4ca688a963 Cycles: OpenCL split kernel cleanup, move casts from .h files to .cl files
Ideally we shouldn't use char* at all, but for now we have to, so at least
let's assume common .h files are free from pointer magic.
2015-10-29 21:52:56 +05:00
Sergey Sharybin
92022218c2 Cycles: Code cleanup, split kernel 2015-05-27 13:08:17 +05:00
Sergey Sharybin
84ad20acef Fix T44833: Can't use ccl_local space in non-kernel functions
This commit re-shuffles code in split kernel once again and makes it so common
parts which is in the headers is only responsible to making all the work needed
for specified ray index. Getting ray index, checking for it's validity and
enqueuing tasks are now happening in the device specified part of the kernel.

This actually makes sense because enqueuing is indeed device-specified and i.e.
with CUDA we'll want to enqueue kernels from kernel and avoid CPU roundtrip.

TODO:
- Kernel comments are still placed in the common header files, but since queue
  related stuff is not passed to those functions those comments might need to
  be split as well.

  Just currently read them considering that they're also covering the way how
  all devices are invoking the common code path.

- Arguments might need to be wrapped into KernelGlobals, so we don't ened to
  pass all them around as function arguments.
2015-05-26 22:54:02 +05:00
Thomas Dinges
a3ef51bba5 Fix T44833, OpenCL compile error on AMD.
This was broken after the kernel file restructure.
Variables allocated in the __local address space can only be defined
inside a __kernel function.

We probably need to solve this a bit differently once we do the CUDA
kernel split, but this fix shoud be good enough until then.
2015-05-25 01:02:06 +02:00
Sergey Sharybin
2c503d8303 Cycles: Restructure kernel files organization
Since the kernel split work we're now having quite a few of new files, majority
of which are related on the kernel entry points. Keeping those files in the
root kernel folder will eventually make it really hard to follow which files are
actual implementation of Cycles kernel.

Those files are now moved to kernel/kernels/<device_type>. This way adding extra
entry points will be less noisy. It is also nice to have all device-specific
files grouped together.

Another change is in the way how split kernel invokes logic. Previously all the
logic was implemented directly in the .cl files, which makes it a bit tricky to
re-use the logic across other devices. Since we'll likely be looking into doing
same split work for CUDA devices eventually it makes sense to move logic from
.cl files to header files. Those files are stored in kernel/split. This does not
mean the header files will not give error messages when tried to be included
from other devices and their arguments will likely be changed, but having such
separation is a good start anyway.

There should be no functional changes.

Reviewers: juicyfruit, dingto

Differential Revision: https://developer.blender.org/D1314
2015-05-22 16:31:34 +05:00