Commit Graph

25 Commits

Author SHA1 Message Date
Campbell Barton
e12c08e8d1 ClangFormat: apply to source, most of intern
Apply clang format as proposed in T53211.

For details on usage and instructions for migrating branches
without conflicts, see:

https://wiki.blender.org/wiki/Tools/ClangFormat
2019-04-17 06:21:24 +02:00
f77cdd1d59 Code cleanup: deduplicate some branched and split kernel code.
Benchmarks peformance on GTX 1080 and RX 480 on Linux is the same for
bmw27, classroom, pabellon, and about 2% faster on fishy_cat and koro.
2017-09-13 15:24:14 +02:00
Mathieu Menuet
659ba012b0 Cycles: change AO bounces approximation to do more glossy and transmission.
Rather than treating all ray types equally, we now always render 1 glossy
bounce and unlimited transmission bounces. This makes it possible to get
good looking results with low AO bounces settings, making it useful to
speed up interior renders for example.

Reviewed By: brecht

Differential Revision: https://developer.blender.org/D2818
2017-09-12 15:37:35 +02:00
cfa8b762e2 Code cleanup: move rng into path state.
Also pass by value and don't write back now that it is just a hash for seeding
and no longer an LCG state. Together this makes CUDA a tiny bit faster in my
tests, but mainly simplifies code.
2017-08-19 18:14:16 +02:00
7542282c06 Code cleanup: make DebugData part of PathRadiance. 2017-08-13 01:19:07 +02:00
Mai Lavelle
6238214159 Cycles: Faster split branched path tracing by sharing samples with inactive threads
Unlike regular path tracing, branched path tracing is usually used with lower
sample counts, at least for primary rays. This means that are less samples for
the GPU to work on in parallel and rendering is slower. As there is less work
overall there is also more inactive threads during rendering with BPT. This
patch makes use of those inactive rays to render branched samples in parallel
with other samples.

Each thread that is preparing for a branched sample will attempt to find an
inactive thread and if one is found the state for the sample is copied to that
thread. Potentially, if there are enough inactive threads, 100s of branched
samples could be generated from the same originating thread and ran in
parallel giving large speed ups.

Gives 70% faster render for pavillion midday scene. 20-60% faster on BMW
with car paint replaced with SSS/volumes.
2017-06-10 04:08:49 -04:00
Lukas Stockner
ef816f9cff Cycles: Fix the AO replacement option in the split kernel
Currently the code for it was inside the hair-specific part, so it wouldn't be enabled in hairless renders.
2017-04-11 01:07:49 +02:00
Hristo Gueorguiev
8ada7f7397 Cycles: Remove ccl_addr_space from RNG passed to functions
Simplifies code quite a bit, making it shorter and easier to extend.
Currently no functional changes for users, but is required for the
upcoming work of shadow catcher support with OpenCL.
2017-03-27 10:46:28 +02:00
Sergey Sharybin
26620f3f87 Cycles: Avoid some ccl_local in various kernels 2017-03-16 11:27:17 +01:00
Sergey Sharybin
76acaefdd7 Cycles: Cleanup, wipe obviously outdated parts of split kernel comments 2017-03-13 17:16:16 +01:00
Hristo Gueorguiev
f169ff8b88 Fix T50925: Add AO approximation to split kernel 2017-03-13 11:15:58 +01:00
Hristo Gueorguiev
57e26627c4 Cycles: SSS and Volume rendering in split kernel
Decoupled ray marching is not supported yet.

Transparent shadows are always enabled for volume rendering.

Changes in kernel/bvh and kernel/geom are from Sergey.
This simiplifies code significantly, and prepares it for
record-all transparent shadow function in split kernel.
2017-03-09 17:09:37 +01:00
Mai Lavelle
230c00d872 Cycles: OpenCL split kernel refactor
This does a few things at once:

- Refactors host side split kernel logic into a new device
  agnostic class `DeviceSplitKernel`.
- Removes tile splitting, a new work pool implementation takes its place and
  allows as many threads as will fit in memory regardless of tile size, which
  can give performance gains.
- Refactors split state buffers into one buffer, as well as reduces the
  number of arguments passed to kernels. Means there's less code to deal
  with overall.
- Moves kernel logic out of OpenCL kernel files so they can later be used by
  other device types.
- Replaced OpenCL specific APIs with new generic versions
- Tiles can now be seen updating during rendering
2017-03-08 00:52:41 -05:00
Sergey Sharybin
53fa389802 Cycles: Use dedicated debug passes for traversed nodes and intersection tests
This way it's more clear whether some issue is caused by lots of geometry in
the node or by lots of "transparent" BVH nodes.
2017-01-12 13:44:35 +01:00
Lukas Stockner
2dccf5a6e8 Cycles: Fix OpenCL split kernel compilation after recent CUDA 8 performance fix 2016-10-07 18:50:43 +02:00
Sergey Sharybin
25aea19323 Cycles: Remove some unused variables from split kernel function 2016-01-29 18:54:46 +01:00
Sergey Sharybin
dc9e0b819b Cycles: Remove unused argument from the split kernel functions
Should be no functional changes, just simplifies operation with kernels.
2015-11-01 17:22:42 +05:00
Sergey Sharybin
4ca688a963 Cycles: OpenCL split kernel cleanup, move casts from .h files to .cl files
Ideally we shouldn't use char* at all, but for now we have to, so at least
let's assume common .h files are free from pointer magic.
2015-10-29 21:52:56 +05:00
Sergey Sharybin
b9f89b1647 Cycles: Code cleanup in split kernel, whitespaces 2015-07-03 11:03:56 +02:00
Sergey Sharybin
596eadf0e1 Cycles: Add debug pass which shows number of instance pushes during camera ray intersection
TODO: We might want to refactor debug passes into PASS_DEBUG and some
debug_type (similar to Blender's side passes) to avoid issue of running
out of bits.
2015-06-12 00:12:03 +02:00
Sergey Sharybin
2bd6de5bbb Cycles: Add debug pass showing average number of ray bounces per pixel
Quite straightforward implementation, but still needs some work for the split
kernel. Includes both regular and split kernel implementation for that.

The pass is not exposed to the interface yet because it's currently not really
easy to have same pass listed in the menu multiple times.
2015-06-11 14:53:15 +02:00
Sergey Sharybin
84ad20acef Fix T44833: Can't use ccl_local space in non-kernel functions
This commit re-shuffles code in split kernel once again and makes it so common
parts which is in the headers is only responsible to making all the work needed
for specified ray index. Getting ray index, checking for it's validity and
enqueuing tasks are now happening in the device specified part of the kernel.

This actually makes sense because enqueuing is indeed device-specified and i.e.
with CUDA we'll want to enqueue kernels from kernel and avoid CPU roundtrip.

TODO:
- Kernel comments are still placed in the common header files, but since queue
  related stuff is not passed to those functions those comments might need to
  be split as well.

  Just currently read them considering that they're also covering the way how
  all devices are invoking the common code path.

- Arguments might need to be wrapped into KernelGlobals, so we don't ened to
  pass all them around as function arguments.
2015-05-26 22:54:02 +05:00
Campbell Barton
2c3c477223 Cleanup: warning, spelling 2015-05-26 16:46:33 +10:00
Thomas Dinges
a3ef51bba5 Fix T44833, OpenCL compile error on AMD.
This was broken after the kernel file restructure.
Variables allocated in the __local address space can only be defined
inside a __kernel function.

We probably need to solve this a bit differently once we do the CUDA
kernel split, but this fix shoud be good enough until then.
2015-05-25 01:02:06 +02:00
Sergey Sharybin
2c503d8303 Cycles: Restructure kernel files organization
Since the kernel split work we're now having quite a few of new files, majority
of which are related on the kernel entry points. Keeping those files in the
root kernel folder will eventually make it really hard to follow which files are
actual implementation of Cycles kernel.

Those files are now moved to kernel/kernels/<device_type>. This way adding extra
entry points will be less noisy. It is also nice to have all device-specific
files grouped together.

Another change is in the way how split kernel invokes logic. Previously all the
logic was implemented directly in the .cl files, which makes it a bit tricky to
re-use the logic across other devices. Since we'll likely be looking into doing
same split work for CUDA devices eventually it makes sense to move logic from
.cl files to header files. Those files are stored in kernel/split. This does not
mean the header files will not give error messages when tried to be included
from other devices and their arguments will likely be changed, but having such
separation is a good start anyway.

There should be no functional changes.

Reviewers: juicyfruit, dingto

Differential Revision: https://developer.blender.org/D1314
2015-05-22 16:31:34 +05:00