Commit Graph

22 Commits

Author SHA1 Message Date
George Kyriazis
7f4479da42 Cycles: OpenCL kernel split
This commit contains all the work related on the AMD megakernel split work
which was mainly done by Varun Sundar, George Kyriazis and Lenny Wang, plus
some help from Sergey Sharybin, Martijn Berger, Thomas Dinges and likely
someone else which we're forgetting to mention.

Currently only AMD cards are enabled for the new split kernel, but it is
possible to force split opencl kernel to be used by setting the following
environment variable: CYCLES_OPENCL_SPLIT_KERNEL_TEST=1.

Not all the features are supported yet, and that being said no motion blur,
camera blur, SSS and volumetrics for now. Also transparent shadows are
disabled on AMD device because of some compiler bug.

This kernel is also only implements regular path tracing and supporting
branched one will take a bit. Branched path tracing is exposed to the
interface still, which is a bit misleading and will be hidden there soon.

More feature will be enabled once they're ported to the split kernel and
tested.

Neither regular CPU nor CUDA has any difference, they're generating the
same exact code, which means no regressions/improvements there.

Based on the research paper:

  https://research.nvidia.com/sites/default/files/publications/laine2013hpg_paper.pdf

Here's the documentation:

  https://docs.google.com/document/d/1LuXW-CV-sVJkQaEGZlMJ86jZ8FmoPfecaMdR-oiWbUY/edit

Design discussion of the patch:

  https://developer.blender.org/T44197

Differential Revision: https://developer.blender.org/D1200
2015-05-09 19:52:40 +05:00
Sergey Sharybin
6fc1669679 Cycles: Initial work towards selective nodes support compilation
The goal is to be able to compile kernel with nodes which are actually needed
to render current scene, hence improving performance of the kernel,

The idea is:

- Have few node groups, starting with a group which contains nodes are used
  really often, and then couple of groups which will be extension of this one.

- Have feature-based nodes disabling, so it's possible to disable nodes related
  to features which are not used with the currently used nodes group.

This commit only lays down needed routines for this approach, actual split will
happen later after gathering statistics from bunch of production scenes.
2015-05-09 19:22:16 +05:00
Thomas Dinges
ee36e75b85 Cleanup: Fix Cycles Apache header.
This was already mixed a bit, but the dot belongs there.
2014-12-25 02:50:24 +01:00
Campbell Barton
e2522b4a29 Cycles: correct math wrappers
include the parens around value before cast,
in some cases was causing double/float promotion by only casting the left value.
2014-10-08 00:13:26 +02:00
741f17f05b Cycles CUDA: make CUDA toolkit 6.0 the official supported version.
This also updates the configurations to build kernels for compute capability
5.0 cards, when using and older CUDA toolkit version this will be skipped.

Also includes tweaks to improve performance with this version:
* Increase max registers on sm_30, sm_35 and sm_50
* No longer use texture storage on sm_30
2014-04-30 16:07:27 +02:00
d9e52ac98b Code cleanup: move half float functions to separate header file. 2014-01-15 15:29:22 +01:00
c18712e868 Cycles: change __device and similar qualifiers to ccl_device in kernel code.
This to avoids build conflicts with libc++ on FreeBSD, these __ prefixed values
are reserved for compilers. I apologize to anyone who has patches or branches
and has to go through the pain of merging this change, it may be easiest to do
these same replacements in your code and then apply/merge the patch.

Ref T37477.
2013-11-18 08:48:15 +01:00
Brecht Van Lommel
fa352bb749 Fix #35684: cycles unable to use full 6GB of memory on NVidia Titan GPU. We now
use arrays instead of textures for general storage on this card (image textures
are still stored as texture). Textures were found to be faster on older cards,
but the limits on 1D texture size have not increased along with the memory size,
which meant that the full 6 GB could not be used.

The performance actually seems to be slightly better with arrays in some tests
on Titan. For older cards there seems to be a bit of a mix, some are better and
others not. We may change those to use arrays too, but more testing is needed,
only Titan and Tesla K20 (sm_35) is changed for now.

The fact that arrays are faster is a bit surprising, as others found textures
to be faster on Kepler. However even if they were, the memory limitation is
more important to solve anyway.
https://research.nvidia.com/publication/understanding-efficiency-ray-traversal-gpus-kepler-and-fermi-addendum
2013-09-27 19:09:31 +00:00
Brecht Van Lommel
29f6616d60 Cycles: viewport render now takes scene color management settings into account,
except for curves, that's still missing from the OpenColorIO GLSL shader.

The pixels are stored in a half float texture, converterd from full float with
native GPU instructions and SIMD on the CPU, so it should be pretty quick.
Using a GLSL shader is useful for GPU render because it avoids a copy through
CPU memory.
2013-08-30 23:49:38 +00:00
Brecht Van Lommel
b9ce231060 Cycles: relicense GNU GPL source code to Apache version 2.0.
More information in this post:
http://code.blender.org/

Thanks to all contributes for giving their permission!
2013-08-18 14:16:15 +00:00
Brecht Van Lommel
de9dffc61e Cycles: initial subsurface multiple scattering support. It's not working as
well as I would like, but it works, just add a subsurface scattering node and
you can use it like any other BSDF.

It is using fully raytraced sampling compatible with progressive rendering
and other more advanced rendering algorithms we might used in the future, and
it uses no extra memory so it's suitable for complex scenes.

Disadvantage is that it can be quite noisy and slow. Two limitations that will
be solved are that it does not work with bump mapping yet, and that the falloff
function used is a simple cubic function, it's not using the real BSSRDF
falloff function yet.

The node has a color input, along with a scattering radius for each RGB color
channel along with an overall scale factor for the radii.

There is also no GPU support yet, will test if I can get that working later.

Node Documentation:
http://wiki.blender.org/index.php/Doc:2.6/Manual/Render/Cycles/Nodes/Shaders#BSSRDF

Implementation notes:
http://wiki.blender.org/index.php/Dev:2.6/Source/Render/Cycles/Subsurface_Scattering
2013-04-01 20:26:52 +00:00
Brecht Van Lommel
40b05d364e Cycles: code refactoring to add generic lookup table memory. 2013-04-01 20:26:43 +00:00
Brecht Van Lommel
12117a8187 Fix cycles aliasing warnings caused by motion blur transforms. 2012-12-21 10:26:48 +00:00
Thomas Dinges
a7c68afc72 Cycles / CUDA:
* Remove double declaration of cosf.
2012-05-28 23:51:06 +00:00
Brecht Van Lommel
131de4352b Cycles: fixes to make CUDA 4.2 work, compiling gave errors in shadows and
other places, was mainly due to instancing not working, but also found
issues in procedural textures.

The problem was with --use_fast_math, this seems to now have way lower
precision for some operations. Disabled this flag and selectively use
fast math functions. Did not find performance regression on GTX 460 after
doing this.
2012-05-28 19:21:13 +00:00
Brecht Van Lommel
dd9c1b7fbf Cycles: OpenCL image texture support, fix an attribute node issue and refactor
feature enabling #defines a bit.
2012-05-13 12:32:44 +00:00
Brecht Van Lommel
5873301257 Sample as Lamp option for world shaders, to enable multiple importance sampling.
By default lighting from the world is computed solely with indirect light
sampling. However for more complex environment maps this can be too noisy, as
sampling the BSDF may not easily find the highlights in the environment map
image. By enabling this option, the world background will be sampled as a lamp,
with lighter parts automatically given more samples.

Map Resolution specifies the size of the importance map (res x res). Before
rendering starts, an importance map is generated by "baking" a grayscale image
from the world shader. This will then be used to determine which parts of the
background are light and so should receive more samples than darker parts.
Higher resolutions will result in more accurate sampling but take more setup
time and memory.

Patch by Mike Farnsworth, thanks!
2012-01-20 17:49:17 +00:00
Brecht Van Lommel
47853bf6f6 Cycles: OpenCL tweaks
* Reduce kernel arguments size, helps compile for apple nvidia.
* Fix use of unitialized variable in displace kernel.
* Use build flags in opencl kernel md5 hash.
* Reorganize code for kernel feature #defines a bit.
2011-11-22 13:15:19 +00:00
Brecht Van Lommel
cfbd6cf154 Cycles:
* OpenCL now only uses GPU/Accelerator devices, it's only confusing if CPU
  device is used, easy to enable in the code for debugging.
* OpenCL kernel binaries are now cached for faster startup after the first
  time compiling.
* CUDA kernels can now be compiled and cached at runtime if the CUDA toolkit
  is installed. This means that even if the build does not have CUDA enabled,
  it's still possible to use it as long as you install the toolkit.
2011-09-09 12:04:39 +00:00
Brecht Van Lommel
bae896691a Cycles:
* Add alpha pass output, to use set Transparent option in Film panel.
* Add Holdout closure (OSL terminology), this is like the Sky option in the
  internal renderer, objects with this closure show the background / zero
  alpha.
* Add option to use Gaussian instead of Box pixel filter in the UI.
* Remove camera response curves for now, they don't really belong here in
  the pipeline, should be moved to compositor.

* Output full float values for rendering now, previously was only byte precision.
* Add a patch from Thomas to get a preview passes option, but still disabled
  because it isn't quite working right yet.
* CUDA: don't compile shader graph evaluation inline.
* Convert tabs to spaces in python files.
2011-08-28 13:55:59 +00:00
Brecht Van Lommel
63d4bafff5 Cycles: some steps to getting OpenCL backend to compile. 2011-05-20 12:26:01 +00:00
Ton Roosendaal
da376e0237 Cycles render engine, initial commit. This is the engine itself, blender modifications and build instructions will follow later.
Cycles uses code from some great open source projects, many thanks them:

* BVH building and traversal code from NVidia's "Understanding the Efficiency of Ray Traversal on GPUs":
http://code.google.com/p/understanding-the-efficiency-of-ray-traversal-on-gpus/
* Open Shading Language for a large part of the shading system:
http://code.google.com/p/openshadinglanguage/
* Blender for procedural textures and a few other nodes.
* Approximate Catmull Clark subdivision from NVidia Mesh tools:
http://code.google.com/p/nvidia-mesh-tools/
* Sobol direction vectors from:
http://web.maths.unsw.edu.au/~fkuo/sobol/
* Film response functions from:
http://www.cs.columbia.edu/CAVE/software/softlib/dorf.php
2011-04-27 11:58:34 +00:00