Commit Graph

396 Commits

Author SHA1 Message Date
Hristo Gueorguiev
06c051363b Cycles: split kernel_shadow_blocked to AO & DL parts
Reduces memory allocation for split kernel.

This allows for faster rendering due to bigger global size,
specially when GPU memory is limited.

Perfromance results:

                         R9 290 total render time
                        Before    After   Change
BMW                      4:37      4:34   -1.1 %
Classroom               14:43     14:30   -1.5 %
Fishy Cat               11:20     11:04   -2.4 %
Koro                    12:11     12:04   -1.0 %
Pabellon Barcelona      22:01     20:44   -5.8 %
Pabellon Barcelona(*)   15:32     15:09   -2.5 %

(*) without glossy connected to volume
2017-03-09 17:09:37 +01:00
Hristo Gueorguiev
57e26627c4 Cycles: SSS and Volume rendering in split kernel
Decoupled ray marching is not supported yet.

Transparent shadows are always enabled for volume rendering.

Changes in kernel/bvh and kernel/geom are from Sergey.
This simiplifies code significantly, and prepares it for
record-all transparent shadow function in split kernel.
2017-03-09 17:09:37 +01:00
Sergey Sharybin
97c4c2689f Cycles: Make it more obvious message which initialization failed 2017-03-08 13:57:21 +01:00
Sergey Sharybin
ecfbfe478b Cycles: Log which device kernels are being loaded for 2017-03-08 12:33:51 +01:00
Sergey Sharybin
712f7c3640 Cycles: Make it possible to access KernelGlobals from split data initialization function 2017-03-08 11:02:54 +01:00
Sergey Sharybin
ef7c36f5ed Cycles: Cleanup, remove residue of previous split kernel data
This is all in split data state array.
2017-03-08 10:26:29 +01:00
Mai Lavelle
64751552f7 Cycles: Fix indentation 2017-03-08 01:31:32 -05:00
Mai Lavelle
306034790f Cycles: Calculate size of split state buffer kernel side
By calculating the size of the state buffer in the kernel rather than the host
less code is needed and the size actually reflects the requested features.

Will also be a little faster in some cases because of larger global work size.
2017-03-08 01:31:30 -05:00
Mai Lavelle
997e345bd2 Cycles: Fix crash after failed kernel build
Pointers to kernels were uninitialized leading to freeing of random memory
addresses. Another reason it would be good to use smart pointers.
2017-03-08 01:31:09 -05:00
Mai Lavelle
18e50927f7 Cycles: Faster building of split kernel
Simple change to make it so that only kernels that have been modified are
rebuilt. Might only be useful during development.
2017-03-08 01:31:09 -05:00
Mai Lavelle
cd7d5669d1 Cycles: Remove sum_all_radiance kernel
This was only needed for the previous implementation of parallel samples. As
we don't have that any more it can be removed.

Real reason for removal tho is this: `per_sample_output_buffers` was being
calculated too small and artifacts resulted. The tile buffer is already
the correct size and calculating the size for `per_sample_output_buffers`
is a bit difficult with the current layout of the code. As
`per_sample_output_buffers` was only needed for `sum_all_radiance`,
removing that kernel and writing output to the tile buffer directly
fixes the artifacts.
2017-03-08 01:31:07 -05:00
Mai Lavelle
4cf501b835 Cycles: Split path initialization into own kernel
This makes it easier to initialize things correctly in the data_init kernel
before they are needed by path tracing.
2017-03-08 01:30:43 -05:00
Mai Lavelle
b78e543af9 Cycles: Add names to buffer allocations
This is to help debug and track memory usage for generic buffers. We
have similar for textures already since those require a name, but for
buffers the name is only for debugging proposes.
2017-03-08 01:24:55 -05:00
Mai Lavelle
817873cc83 Cycles: CUDA implementation of split kernel 2017-03-08 01:24:53 -05:00
Mai Lavelle
0892352bfe Cycles: CPU implementation of split kernel 2017-03-08 00:52:41 -05:00
Sergey Sharybin
a87766416f Cycles: Report device maximum allocation and detected global size 2017-03-08 00:52:41 -05:00
Mai Lavelle
365a4239c5 Cycles: Workaround for driver hangs
Simple workaround for some issues we've been having with AMD drivers hanging
and rendering systems unresponsive. Unfortunately this makes things a bit
slower, but its better than having to do hard reboots. Will be removed when
drivers have been fixed.

Define CYCLES_DISABLE_DRIVER_WORKAROUNDS to disable for testing purposes.
2017-03-08 00:52:41 -05:00
Mai Lavelle
230c00d872 Cycles: OpenCL split kernel refactor
This does a few things at once:

- Refactors host side split kernel logic into a new device
  agnostic class `DeviceSplitKernel`.
- Removes tile splitting, a new work pool implementation takes its place and
  allows as many threads as will fit in memory regardless of tile size, which
  can give performance gains.
- Refactors split state buffers into one buffer, as well as reduces the
  number of arguments passed to kernels. Means there's less code to deal
  with overall.
- Moves kernel logic out of OpenCL kernel files so they can later be used by
  other device types.
- Replaced OpenCL specific APIs with new generic versions
- Tiles can now be seen updating during rendering
2017-03-08 00:52:41 -05:00
Mai Lavelle
520b53364c Cycles: Add OpenCL kernel for zeroing memory buffers
Transferring memory to the device was very slow and there's really no
need when only zeroing a buffer.
2017-03-08 00:52:41 -05:00
Mai Lavelle
bc652766e8 Cycles: Expose passes size to device tasks
This is needed so devices can know the size of a tile buffer before any
tiles are acquired.
2017-03-08 00:52:41 -05:00
Mai Lavelle
0f56f7a811 Cycles: Allow device_memory to be used directly
This is useful for when theres no host side memory attched to the buffer
2017-03-08 00:52:41 -05:00
Sergey Sharybin
5acac13eb4 Cycles: Fix compilation error on vanilla Ubuntu 16.10
Patch by @swerner, thanks!
2017-02-27 15:22:51 +01:00
Sergey Sharybin
2c30fd83f1 Cycles: Additionally report all OpenCL cflags
This way we can control exact spaces and such added to the cflags
which is crucial to troubleshoot certain drivers.
2017-02-22 10:06:02 +01:00
Sergey Sharybin
333dc8d60f Fix T50719: Memory usage won't reset to zero while re-rendering on two video cards
Was only visible with Persistent Images option ON.
2017-02-20 11:02:19 +01:00
Aaron Carlisle
e5d8c2a67f Use new manual URL 2017-01-23 19:10:37 -05:00
lazydodo
5a8b5a0377 Land D2339 by bliblu bli 2016-12-09 08:28:04 -07:00
Lukas Stockner
a2ebc5268f Cycles: Refactor Progress system to provide better estimates
The Progress system in Cycles had two limitations so far:
 - It just counted tiles, but ignored their size. For example, when rendering a 600x500 image with 512x512 tiles, the right 88x500 tile would count for 50% of the progress, although it only covers 15% of the image.
 - Scene update time was incorrectly counted as rendering time - therefore, the remaining time started very long and gradually decreased.

This patch fixes both problems:
First of all, the Progress now has a function to ignore time spans, and that is used to ignore scene update time.
The larger change is the tile size: Instead of counting samples per tile, so that the final value is num_samples*num_tiles, the code now counts every sample for every pixel, so that the final value is num_samples*num_pixels.

Along with that, some unused variables were removed from the Progress and Session classes.

Reviewers: brecht, sergey, #cycles

Subscribers: brecht, candreacchio, sergey

Differential Revision: https://developer.blender.org/D2214
2016-12-03 05:02:21 +01:00
Sergey Sharybin
9aa8d1bc45 Cycles: Fix strict compilation warnings
Should be no functional changes.
2016-11-22 16:39:03 +01:00
Sergey Sharybin
af7343ae22 Cycles: Attempt to fix compilation error on ppc64el
There is some define conflict between system headers and clew,
so delay include of clew.h as much as possible.]

This is something which needed to be done in the code before
the refactor, hopefully such change will still work.
2016-11-21 13:32:41 +01:00
Lukas Stockner
dd921238d9 Cycles: Refactor Device selection to allow individual GPU compute device selection
Previously, it was only possible to choose a single GPU or all of that type (CUDA or OpenCL).
Now, a toggle button is displayed for every device.
These settings are tied to the PCI Bus ID of the devices, so they're consistent across hardware addition and removal (but not when swapping/moving cards).

From the code perspective, the more important change is that now, the compute device properties are stored in the Addon preferences of the Cycles addon, instead of directly in the User Preferences.
This allows for a cleaner implementation, removing the Cycles C API functions that were called by the RNA code to specify the enum items.

Note that this change is neither backwards- nor forwards-compatible, but since it's only a User Preference no existing files are broken.

Reviewers: #cycles, brecht

Reviewed By: #cycles, brecht

Subscribers: brecht, juicyfruit, mib2berlin, Blendify

Differential Revision: https://developer.blender.org/D2338
2016-11-07 03:19:29 +01:00
Martijn Berger
c02cce7b75 cycles, cuDeviceComputeCapability is deprecated as of cuda 5.0 2016-11-04 14:49:54 +01:00
Martijn Berger
4fdf68271c Cycles standalone, compile fix UINT_MAX is not defined in device_cuda.cpp 2016-11-02 10:56:16 +01:00
Sergey Sharybin
80a6e5beb5 Cycles: Remove explicit std:: from types where possible
We have our own abstraction level on top of the STL's implementation.
This commit will guarantee our tweaks are used for all cases.
2016-10-24 12:31:11 +02:00
Sergey Sharybin
48997d2e40 Cycles: Cleanup, style 2016-10-24 12:26:12 +02:00
Lukas Stockner
f7ce482385 Cycles: Fix another OpenCL logging issue
Previously an error message would be printed whenever the OpenCL build produced output.
However, some frameworks seem to print extra information even if the build succeeded, so now the actual returned error is checked as well.
When --debug-cycles is activated, the build output will always be printed, otherwise it only gets printed if there was an error.
2016-10-21 02:49:00 +02:00
Lukas Stockner
cd843409d3 Fix T49630: Cycles: Swapped shader and bake kernels
The problem here was, as the title says, that the two kernels were swapped.
Since shader evaluation is only used for building the samling map when World MIS is enabled, rendering without it would still work fine, although baking also was broken.
2016-10-17 12:28:01 +02:00
Lukas Stockner
d5dd12e56c Cycles: Improve OpenCL kernel compilation logging
The previous refactor changed the code to use a separate logging mechanism to support multithreaded compilation.
However, since that's not supported by any frameworks yes, it just resulted in bad logging behaviour.
So, this commit changes the logging to go diectly to stdout/stderr once again by default.
2016-10-17 11:51:18 +02:00
Lukas Stockner
9ea71bc674 Cycles: Split device_opencl.cpp into multiple files for easier maintenance
There are no user-visible changes, just some internal restructuring.

Differential Revision: https://developer.blender.org/D2231
2016-10-09 15:49:50 +02:00
Sergey Sharybin
80837d06de Cycles: Support earlier tile rendering termination on cancel
It will discard the whole tile, but it's still kind of more friendly than
fully locked interface (sort of) for until tile is fully sampled.

Sorry if it causes PITA to merge for the opencl split work, but this issue
bothering a lot when collecting benchmarks.
2016-09-29 16:00:25 +02:00
Sergey Sharybin
333366dbcf Cycles: Fix typo in shader cancel routines 2016-09-29 15:48:10 +02:00
Sergey Sharybin
2372e67dd6 Cycles: Don't sum up memory usage of all devices together for the stats 2016-09-23 12:43:23 +02:00
Sergey Sharybin
91e0a16f2f Cycles: Use XDG's .cache folder for cached kernels
Basically just moves cached kernels from ~/.config/blender/BLENDER_VERSION to
~/.cache/cycles/kernels. This has following benefits:

- Follows XDG specification more closely,
  not as if it's totally crucial or measurable by users, but still nice.

- Prevents unexpected sizes of config folder, makes disk space used in more
  predictable for users way.

- Allows to share kernels across multiple Blender versions,
  which makes it easier debugging at the times close to release.

- "Copy Previous Settings" operator will no longer be copying possibly
  gigabytes of cached kernels, which used to lead to really nast disk usage
  and annoying delays of copying settings.

- In the future we can have some smart logic to clear old unused cached
  kernels.

Currently only done for Linux and OSX. Windows still follows old "cache"
folder logic, but it's not really important for now because we don't
support kernel compilation on this platform yet.

Reviewers: dingto, juicyfruit, brecht

Reviewed By: brecht

Differential Revision: https://developer.blender.org/D2197
2016-09-12 09:39:05 +02:00
Mai Lavelle
76b6c77f2c Cycles microdisplacement: Allow kernels to be built without patch evaluation
Kernels can now be built without patch evaluation when not needed by the
scene (Catmull-Clark subdivision not in use), giving a performance boost
for some devices.
2016-08-15 11:13:18 -04:00
Thomas Dinges
9d236ac06c Cycles: Enable half float support (4 channels and 1 channel) on CUDA.
Atm OpenEXR half files benefit from this and will use only 1/2 of the memory now. More space for HDRs!

Part of my GSoC 2016.
2016-08-11 22:47:53 +02:00
Thomas Dinges
c2a7317d1f CUDA: We don't support Toolkits < 7.5, update error message. 2016-08-09 11:41:25 +02:00
Sergey Sharybin
29dc04d9bb Cycles: Report human-readable string of compilation error code
It is possible that compilation will fail without giving anything in the
log buffer. For this cases giving a tip about error code will be really
handy.

Patch by @Ilia, thanks!
2016-08-04 12:14:43 +02:00
Sergey Sharybin
b416168d85 Cycles: Cleanup, trailing whitespace 2016-08-02 14:09:34 +02:00
Sergey Sharybin
7b8b16a18c Cycles: Some cleanup in CUDA device file 2016-08-02 14:09:34 +02:00
Sergey Sharybin
ad48f13099 Cycles: Include NVCC compiler flags into md5 hash
This way we can easily switch between toolkits without worrying
whether some kernel was compiled with old or new CUDA toolkit.

It's also now possible to switch machine architecture and have
proper cached kernel detected. Not as if it happens every day,
but i did such a bitness switch back in the days :)
2016-08-02 14:09:34 +02:00
Sergey Sharybin
6353ecb996 Cycles: Tweaks to support CUDA 8 toolkit
All the changes are mainly giving explicit tips on inlining functions,
so they match how inlining worked with previous toolkit.

This make kernel compiled by CUDA 8 render in average with same speed
as previous kernels. Some scenes are somewhat faster, some of them are
somewhat slower. But slowdown is within 1% so far.

On a positive side it allows us to enable newer generation cards on
buildbots (so GTX 10x0 will be officially supported soon).
2016-08-01 15:54:29 +02:00