Commit Graph

421 Commits

Author SHA1 Message Date
Mai Lavelle
d097c72f81 Cycles: Only calculate global size of split kernel once to avoid changes
Global size depends on memory usage which might change during rendering.
Havent seen it happen but seems possible that this could cause the global
size to be different than what was used for allocating buffers.
2017-04-11 03:26:18 -04:00
Mai Lavelle
1e6038a426 Cycles: Implement automatic global size for CUDA split kernel
Not sure this is the best way to do things for CUDA but its much better than
being unimplemented.
2017-04-11 03:11:18 -04:00
Sergey Sharybin
9539cfacca Cycles: Apparently board name could be an empty string 2017-04-10 15:31:21 +02:00
Sergey Sharybin
867d311307 Cycles: Fix warning with MSVC 2017-04-07 18:28:38 +02:00
Mai Lavelle
91b9db0724 Cycles: Change work pool and global size of split CPU for easier debugging 2017-04-07 06:06:08 -04:00
Mai Lavelle
5b45fff136 Cycles: Add missing flush 2017-04-07 06:06:08 -04:00
Mai Lavelle
d66ffaebef Cycles: Check ray state properly to avoid endless loop
The state mask wasnt applied before comparison giving false results. It
shouldnt really happen that a ray state contains any flags that need to
be masked away, but if it does happen its better to not get stuck.
2017-04-07 06:06:08 -04:00
Mai Lavelle
4b7d95290f Cycles: More fixes after include changes 2017-03-31 10:12:13 +02:00
Sergey Sharybin
a88801b99b Cycles: Fix missing kernel re-compilation after recent changes
Reported by Mai in IRC, thanks!
2017-03-30 11:45:30 +02:00
Sergey Sharybin
5af4e1ca15 Cycles: Only use CUDA 8.0 as officially supported one
This deprecates CUDA 7.5.
2017-03-29 15:06:47 +02:00
Sergey Sharybin
0579eaae1f Cycles: Make all #include statements relative to cycles source directory
The idea is to make include statements more explicit and obvious where the
file is coming from, additionally reducing chance of wrong header being
picked up.

For example, it was not obvious whether bvh.h was refferring to builder
or traversal, whenter node.h is a generic graph node or a shader node
and cases like that.

Surely this might look obvious for the active developers, but after some
time of not touching the code it becomes less obvious where file is coming
from.

This was briefly mentioned in T50824 and seems @brecht is fine with such
explicitness, but need to agree with all active developers before committing
this.

Please note that this patch is lacking changes related on GPU/OpenCL
support. This will be solved if/when we all agree this is a good idea to move
forward.

Reviewers: brecht, lukasstockner97, maiself, nirved, dingto, juicyfruit, swerner

Reviewed By: lukasstockner97, maiself, nirved, dingto

Subscribers: brecht

Differential Revision: https://developer.blender.org/D2586
2017-03-29 13:41:11 +02:00
Thomas Dinges
6a5e92c022 Cleanup: Use upper case consistently in adaptive feature compile logging. 2017-03-27 22:52:33 +02:00
Sergey Sharybin
8d48ea0233 Cycles: Make shadow catcher an optional feature for OpenCL
Solves majority of speed regression on AMD OpenCL.
2017-03-27 10:47:14 +02:00
Mai Lavelle
4d82d525f8 Cycles: Fix building for some compilers 2017-03-23 00:14:48 -04:00
Sergey Sharybin
a0f16e12a0 Cycles: Use more friendly GPU device name for AMD cards
For example, for RX480 you'll no longer see "Ellesmere" but will see
"AMD Radeon RX 480 Graphics" which makes more sense and allows to easily
distinguish which exact card it is when having multiple different cards
of Ellesmere codenames (i.e. RX480 and WX7100) in the same machine.
2017-03-21 12:01:11 +01:00
Sergey Sharybin
7780a108b3 Cycles: Simplify some extra OpenCL query code 2017-03-21 12:01:03 +01:00
Sergey Sharybin
fceb1d0781 Cycles: Cleanup, add some utility functions to shorten access to low level API
Should be no functional changes.
2017-03-21 12:01:03 +01:00
Sergey Sharybin
3c4df13924 Fix T50268: Cycles allows to select un supported GPUs for OpenCL 2017-03-20 15:37:27 +01:00
Sergey Sharybin
439a277aa5 Cycles: Silence strict compiler warning 2017-03-17 09:56:44 +01:00
Mai Lavelle
2cae58524c Cycles: Improve memory usage of CPU split kernel by using smaller global size 2017-03-17 01:54:10 -04:00
Mai Lavelle
4833a71621 Cycles: Adjust global size for OpenCL CPU devices to make them faster 2017-03-16 06:11:42 -04:00
Sergey Sharybin
5ba51de84a Cycles: Cleanup, indentation 2017-03-14 16:54:16 +01:00
Mai Lavelle
8dd0355c21 Cycles: Try to avoid infinite loops by catching invalid ray states 2017-03-14 06:22:57 -04:00
Mai Lavelle
96868a3941 Fix T50888: Numeric overflow in split kernel state buffer size calculation
Overflow led to the state buffer being too small and the split kernel to
get stuck doing nothing forever.
2017-03-11 05:39:28 -05:00
Hristo Gueorguiev
9de9f25b24 Cycles: add single program debug option for split kernel
Single program generally compiles kernels faster (2-3 times), loads faster,
takes less drive space (2-3 times), and reduces the number of cached kernels.
2017-03-09 17:09:37 +01:00
Hristo Gueorguiev
06c051363b Cycles: split kernel_shadow_blocked to AO & DL parts
Reduces memory allocation for split kernel.

This allows for faster rendering due to bigger global size,
specially when GPU memory is limited.

Perfromance results:

                         R9 290 total render time
                        Before    After   Change
BMW                      4:37      4:34   -1.1 %
Classroom               14:43     14:30   -1.5 %
Fishy Cat               11:20     11:04   -2.4 %
Koro                    12:11     12:04   -1.0 %
Pabellon Barcelona      22:01     20:44   -5.8 %
Pabellon Barcelona(*)   15:32     15:09   -2.5 %

(*) without glossy connected to volume
2017-03-09 17:09:37 +01:00
Hristo Gueorguiev
57e26627c4 Cycles: SSS and Volume rendering in split kernel
Decoupled ray marching is not supported yet.

Transparent shadows are always enabled for volume rendering.

Changes in kernel/bvh and kernel/geom are from Sergey.
This simiplifies code significantly, and prepares it for
record-all transparent shadow function in split kernel.
2017-03-09 17:09:37 +01:00
Sergey Sharybin
97c4c2689f Cycles: Make it more obvious message which initialization failed 2017-03-08 13:57:21 +01:00
Sergey Sharybin
ecfbfe478b Cycles: Log which device kernels are being loaded for 2017-03-08 12:33:51 +01:00
Sergey Sharybin
712f7c3640 Cycles: Make it possible to access KernelGlobals from split data initialization function 2017-03-08 11:02:54 +01:00
Sergey Sharybin
ef7c36f5ed Cycles: Cleanup, remove residue of previous split kernel data
This is all in split data state array.
2017-03-08 10:26:29 +01:00
Mai Lavelle
64751552f7 Cycles: Fix indentation 2017-03-08 01:31:32 -05:00
Mai Lavelle
306034790f Cycles: Calculate size of split state buffer kernel side
By calculating the size of the state buffer in the kernel rather than the host
less code is needed and the size actually reflects the requested features.

Will also be a little faster in some cases because of larger global work size.
2017-03-08 01:31:30 -05:00
Mai Lavelle
997e345bd2 Cycles: Fix crash after failed kernel build
Pointers to kernels were uninitialized leading to freeing of random memory
addresses. Another reason it would be good to use smart pointers.
2017-03-08 01:31:09 -05:00
Mai Lavelle
18e50927f7 Cycles: Faster building of split kernel
Simple change to make it so that only kernels that have been modified are
rebuilt. Might only be useful during development.
2017-03-08 01:31:09 -05:00
Mai Lavelle
cd7d5669d1 Cycles: Remove sum_all_radiance kernel
This was only needed for the previous implementation of parallel samples. As
we don't have that any more it can be removed.

Real reason for removal tho is this: `per_sample_output_buffers` was being
calculated too small and artifacts resulted. The tile buffer is already
the correct size and calculating the size for `per_sample_output_buffers`
is a bit difficult with the current layout of the code. As
`per_sample_output_buffers` was only needed for `sum_all_radiance`,
removing that kernel and writing output to the tile buffer directly
fixes the artifacts.
2017-03-08 01:31:07 -05:00
Mai Lavelle
4cf501b835 Cycles: Split path initialization into own kernel
This makes it easier to initialize things correctly in the data_init kernel
before they are needed by path tracing.
2017-03-08 01:30:43 -05:00
Mai Lavelle
b78e543af9 Cycles: Add names to buffer allocations
This is to help debug and track memory usage for generic buffers. We
have similar for textures already since those require a name, but for
buffers the name is only for debugging proposes.
2017-03-08 01:24:55 -05:00
Mai Lavelle
817873cc83 Cycles: CUDA implementation of split kernel 2017-03-08 01:24:53 -05:00
Mai Lavelle
0892352bfe Cycles: CPU implementation of split kernel 2017-03-08 00:52:41 -05:00
Sergey Sharybin
a87766416f Cycles: Report device maximum allocation and detected global size 2017-03-08 00:52:41 -05:00
Mai Lavelle
365a4239c5 Cycles: Workaround for driver hangs
Simple workaround for some issues we've been having with AMD drivers hanging
and rendering systems unresponsive. Unfortunately this makes things a bit
slower, but its better than having to do hard reboots. Will be removed when
drivers have been fixed.

Define CYCLES_DISABLE_DRIVER_WORKAROUNDS to disable for testing purposes.
2017-03-08 00:52:41 -05:00
Mai Lavelle
230c00d872 Cycles: OpenCL split kernel refactor
This does a few things at once:

- Refactors host side split kernel logic into a new device
  agnostic class `DeviceSplitKernel`.
- Removes tile splitting, a new work pool implementation takes its place and
  allows as many threads as will fit in memory regardless of tile size, which
  can give performance gains.
- Refactors split state buffers into one buffer, as well as reduces the
  number of arguments passed to kernels. Means there's less code to deal
  with overall.
- Moves kernel logic out of OpenCL kernel files so they can later be used by
  other device types.
- Replaced OpenCL specific APIs with new generic versions
- Tiles can now be seen updating during rendering
2017-03-08 00:52:41 -05:00
Mai Lavelle
520b53364c Cycles: Add OpenCL kernel for zeroing memory buffers
Transferring memory to the device was very slow and there's really no
need when only zeroing a buffer.
2017-03-08 00:52:41 -05:00
Mai Lavelle
bc652766e8 Cycles: Expose passes size to device tasks
This is needed so devices can know the size of a tile buffer before any
tiles are acquired.
2017-03-08 00:52:41 -05:00
Mai Lavelle
0f56f7a811 Cycles: Allow device_memory to be used directly
This is useful for when theres no host side memory attched to the buffer
2017-03-08 00:52:41 -05:00
Sergey Sharybin
5acac13eb4 Cycles: Fix compilation error on vanilla Ubuntu 16.10
Patch by @swerner, thanks!
2017-02-27 15:22:51 +01:00
Sergey Sharybin
2c30fd83f1 Cycles: Additionally report all OpenCL cflags
This way we can control exact spaces and such added to the cflags
which is crucial to troubleshoot certain drivers.
2017-02-22 10:06:02 +01:00
Sergey Sharybin
333dc8d60f Fix T50719: Memory usage won't reset to zero while re-rendering on two video cards
Was only visible with Persistent Images option ON.
2017-02-20 11:02:19 +01:00
Aaron Carlisle
e5d8c2a67f Use new manual URL 2017-01-23 19:10:37 -05:00