Commit Graph

40 Commits

Author SHA1 Message Date
Sergey Sharybin
45b5bf034b Cycles; Make baking a feature-specific option
This means render devices now might skip building baking kernels in cases when
only actual render-related functionality is used.

For now it's only implemented for OpenCL split kernel device and mainly needed
to work around some compiler-specific bugs which crashes on building the kernel.

Using OpenCL for baking might still crash the driver, but at least there is now
higher probability of that GPU will be usable to render the scene.

Real fix should actually be done in the driver side.
2015-07-18 16:02:08 +02:00
Sergey Sharybin
27ed75271c Cycles: Make hair, object and motion blur selective compiled into OpenCL
This features are now based on the scene settings, so scenes without those features
used are rendered even faster.

This gives about 30% speedup on the AMD A10 APU here, but at the same time it does
not mean such an improvement will happen on all the hardware. That being said, the
Tonga device here seems to have no measurable difference.

In any case it seems handy to have for the future, when we'll want to support SSS
in the kernel or to port selective compilation/split kernel to CUDA devices.
2015-06-08 11:15:39 +02:00
Antony Riakiotakis
4fc3188112 Cycles: Get rid of one more OpenGL matrix manipulation/push/pop. 2015-05-11 16:41:18 +02:00
Antony Riakiotakis
e38f914421 Cycles: use vertex buffers when possible to draw tiles on the screen.
Not terribly necessary in this case, since we are just drawing a quad,
but makes blender overall more GL 3.x core ready.
2015-05-11 16:28:41 +02:00
George Kyriazis
7f4479da42 Cycles: OpenCL kernel split
This commit contains all the work related on the AMD megakernel split work
which was mainly done by Varun Sundar, George Kyriazis and Lenny Wang, plus
some help from Sergey Sharybin, Martijn Berger, Thomas Dinges and likely
someone else which we're forgetting to mention.

Currently only AMD cards are enabled for the new split kernel, but it is
possible to force split opencl kernel to be used by setting the following
environment variable: CYCLES_OPENCL_SPLIT_KERNEL_TEST=1.

Not all the features are supported yet, and that being said no motion blur,
camera blur, SSS and volumetrics for now. Also transparent shadows are
disabled on AMD device because of some compiler bug.

This kernel is also only implements regular path tracing and supporting
branched one will take a bit. Branched path tracing is exposed to the
interface still, which is a bit misleading and will be hidden there soon.

More feature will be enabled once they're ported to the split kernel and
tested.

Neither regular CPU nor CUDA has any difference, they're generating the
same exact code, which means no regressions/improvements there.

Based on the research paper:

  https://research.nvidia.com/sites/default/files/publications/laine2013hpg_paper.pdf

Here's the documentation:

  https://docs.google.com/document/d/1LuXW-CV-sVJkQaEGZlMJ86jZ8FmoPfecaMdR-oiWbUY/edit

Design discussion of the patch:

  https://developer.blender.org/T44197

Differential Revision: https://developer.blender.org/D1200
2015-05-09 19:52:40 +05:00
Sergey Sharybin
f680c1b54a Cycles: Communicate number of closures and nodes feature set to the device
This way device can actually make a decision of how it can optimize the kernel
in order to make it most efficient.
2015-05-09 19:28:00 +05:00
Sergey Sharybin
0e4ddaadd4 Cycles: Change the way how we pass requested capabilities to the device
Previously we only had experimental flag passed to device's load_kernel() which
was all fine. But since we're gonna to have some extra parameters passed there
it makes sense to wrap them into a single struct, which will make it easier to
pass stuff around.
2015-05-09 19:05:49 +05:00
Martijn Berger
f01456aaa4 Optionally use c++11 stuff instead of boost in cycles where possible. We do and continue to depend on boost though
Reviewers: dingto, sergey

Reviewed By: sergey

Subscribers: #cycles

Differential Revision: https://developer.blender.org/D1185
2015-03-29 22:12:40 +02:00
Sergey Sharybin
585dd26120 Cycles: Code cleanup, prepare for strict C++ flags 2015-03-27 18:23:31 +05:00
Sergey Sharybin
a922be9270 Cycles: Repot CPU and CUDA capabilities to system info operator
For CPU it gives available instructions set (SSE, AVX and so).

For GPU CUDA it reports most of the attribute values returned by
cuDeviceGetAttribute(). Ideally we need to only use set of those
which are driver-specific (so we don't clutter system info with
values which we can get from GPU specifications and be sure they
stay the same because driver can't affect on them).
2015-01-06 14:13:21 +05:00
Thomas Dinges
ee36e75b85 Cleanup: Fix Cycles Apache header.
This was already mixed a bit, but the dot belongs there.
2014-12-25 02:50:24 +01:00
Dalai Felinto
fc55c41bba Cycles Bake: show progress bar during bake
Baking progress preview is not possible, in parts due to the way the API
was designed. But at least you get to see the progress bar while baking.

Reviewers: sergey

Differential Revision: https://developer.blender.org/D656
2014-07-25 11:42:53 -03:00
Thomas Dinges
c08c931fb6 Cycles / CUDA: Increase maximum image textures on GPU.
Instead of 95, we can use 145 images now. This only affects Kepler and above (sm30, sm_35 and sm_50).

This can be increased further if needed, but let's first test if this does not come with a performance impact.

Originally developed during my GSoC 2013.
2014-05-11 03:38:39 +02:00
Sergey Sharybin
74518b2826 Fix T39420: Cycles viewport/preview flickers, when moving mouse across editors
Issue was caused by the wrong usage of OCIO GLSL binding API. To make it
work properly on pre-GLSL-1.3 drivers shader is to be enabled after the
texture is binded to the opengl context. Otherwise it wouldn't know the
proper texture size.

This is actually a regression in 2.70 and to be ported to 'a'.
2014-03-26 15:58:53 +06:00
Martijn Berger
dd2dca2f7e Add support for multiple interpolation modes on cycles image textures
All textures are sampled bi-linear currently with the exception of OSL there texture sampling is fixed and set to smart bi-cubic.

This patch adds user control to this setting.

Added:
- bits to DNA / RNA in the form of an enum for supporting multiple interpolations types
- changes to the image texture node drawing code ( add enum)
- to ImageManager (this needs to know to allocate second texture when interpolation type is different)
- to node compiler (pass on interpolation type)
- to device tex_alloc this also needs to get the concept of multiple interpolation types
- implementation for doing non interpolated lookup for cuda and cpu
- implementation where we pass this along to osl ( this makes OSL also do linear untill I add smartcubic to the interface / DNA/ RNA)

Reviewers: brecht, dingto

Reviewed By: brecht

CC: dingto, venomgfx

Differential Revision: https://developer.blender.org/D317
2014-03-07 23:16:33 +01:00
Martijn Berger
85a0c5d4e1 Cycles: network render code updated for latest changes and improved
This actually works somewhat now, although viewport rendering is broken and any
kind of network error or connection failure will kill Blender.

* Experimental WITH_CYCLES_NETWORK cmake option
* Networked Device is shown as an option next to CPU and GPU Compute
* Various updates to work with the latest Cycles code
* Locks and thread safety for RPC calls and tiles
* Refactored pointer mapping code
* Fix error in CPU brand string retrieval code

This includes work by Doug Gale, Martijn Berger and Brecht Van Lommel.

Reviewers: brecht

Differential Revision: http://developer.blender.org/D36
2013-12-07 12:26:58 +01:00
Brecht Van Lommel
b9ce231060 Cycles: relicense GNU GPL source code to Apache version 2.0.
More information in this post:
http://code.blender.org/

Thanks to all contributes for giving their permission!
2013-08-18 14:16:15 +00:00
Brecht Van Lommel
35c0b821a5 Cycles: deal a bit better with errors when CUDA runs out of memory, try to avoid crashes. 2012-12-23 12:53:58 +00:00
Brecht Van Lommel
204113b791 Fix #33107: cycles fixed threads 1 was still having two cores do work,
because main thread works as well.
2012-11-07 21:00:49 +00:00
Sergey Sharybin
6eec49ed20 Cycles: memory usage report
This commit adds memory usage information while rendering.

It reports memory used by device, meaning:

- For CPU it'll report real memory consumption
- For GPU rendering it'll report GPU memory consumption, but it'll
  also mean the same memory is used from host side.

This information displays information about memory requested by Cycles,
not memory really allocated on a device. Real memory usage might be
higher because of memory fragmentation or optimistic memory allocator.

There's really nothing we can do against this.

Also in contrast with blender internal's render cycles memory usage
does not include memory used by scene, only memory needed by cycles
itself will be displayed. So don't freak out if memory usage reported
by cycles would be much lower than blender internal's.

This commit also adds RenderEngine.update_memory_stats callback which
is used to tell memory consumption from external engine to blender.
This information is used to generate information line after rendering
is finished.
2012-11-05 08:04:57 +00:00
Lukas Toenne
efaf512406 Revert r50528: "Performance fix for Cycles: Don't wait in the main UI thread when resetting devices."
This commit leads to random freezes in Cycles rendering:
https://projects.blender.org/tracker/index.php?func=detail&aid=32545&group_id=9&atid=498

The goal of this commit was to remove UI lag for OSL, but since that is not officially supported yet, better revert it until a proper fix can be implemented in 2.65.
2012-09-17 12:07:06 +00:00
Lukas Toenne
31ed71cb6b Performance fix for Cycles: Don't wait in the main UI thread when resetting devices.
When the scene is updated Cycles resets the renderer device, cancelling
all existing tasks. The main thread would wait for all running tasks to
finish before continuing. This is ok when tasks can actually cancel in a
timely fashion. For OSL however, this does not work, since the OSL
shader group optimization takes quite a bit of time and can not be
easily be cancelled once running (on my crappy machine in full debug
mode: ~0.12 seconds for simple node trees). This would lead to very
laggy UI behavior and make it difficult to accurately control elements
such as sliders.

This patch removes the wait condition from the device->task_cancel
method. Instead it just sets the do_cancel flag and returns. To avoid
backlog in the task pool of the device it will return early from the
BlenderSession::sync function while the reset is going on (tested in
Session::resetting). Once all existing tasks have finished the do_cancel
flag is finally cleared again (checked in TaskPool::num_decrease).

Care has to be taken to avoid race conditions on the do_cancel flag,
since it can now be modified outside the TaskPool::cancel function
itself. For this purpose the scope of the TaskPool::num_mutex locks has
been extended, in most cases the mutex is now locked by the TaskPool
itself before calling TaskScheduler methods, instead of only locking
inside the num_increase/num_decrease functions themselves. The only
occurrence of a lock outside of the TaskPool methods is in
TaskScheduler::thread_run.

This patch is most useful in combination with the OSL renderer mode, so
it can probably wait until after the 2.64 release. SVM tasks tend to be
cancelled quickly, so the effect is less noticeable.
2012-09-11 11:41:51 +00:00
Brecht Van Lommel
adea12cb01 Cycles: merge of changes from tomato branch.
Regular rendering now works tiled, and supports save buffers to save memory
during render and cache render results.

Brick texture node by Thomas.
http://wiki.blender.org/index.php/Doc:2.6/Manual/Render/Cycles/Nodes/Textures#Brick_Texture

Image texture Blended Box Mapping.
http://wiki.blender.org/index.php/Doc:2.6/Manual/Render/Cycles/Nodes/Textures#Image_Texture
http://mango.blender.org/production/blended_box/

Various bug fixes by Sergey and Campbell.
* Fix for reading freed memory in some node setups.
* Fix incorrect memory read when synchronizing mesh motion.
* Fix crash appearing when direct light usage is different on different layers.
* Fix for vector pass gives wrong result in some circumstances.
* Fix for wrong resolution used for rendering Render Layer node.
* Option to cancel rendering when doing initial synchronization.
* No more texture limit when using CPU render.
* Many fixes for new tiled rendering.
2012-09-04 13:29:07 +00:00
Brecht Van Lommel
dd9c1b7fbf Cycles: OpenCL image texture support, fix an attribute node issue and refactor
feature enabling #defines a bit.
2012-05-13 12:32:44 +00:00
Brecht Van Lommel
07b2241fb1 Cycles: merging features from tomato branch.
=== BVH build time optimizations ===

* BVH building was multithreaded. Not all building is multithreaded, packing
  and the initial bounding/splitting is still single threaded, but recursive
  splitting is, which was the main bottleneck.

* Object splitting now uses binning rather than sorting of all elements, using
  code from the Embree raytracer from Intel.
  http://software.intel.com/en-us/articles/embree-photo-realistic-ray-tracing-kernels/

* Other small changes to avoid allocations, pack memory more tightly, avoid
  some unnecessary operations, ...

These optimizations do not work yet when Spatial Splits are enabled, for that
more work is needed. There's also other optimizations still needed, in
particular for the case of many low poly objects, the packing step and node
memory allocation.

BVH raytracing time should remain about the same, but BVH build time should be
significantly reduced, test here show speedup of about 5x to 10x on a dual core
and 5x to 25x on an 8-core machine, depending on the scene.

=== Threads ===

Centralized task scheduler for multithreading, which is basically the
CPU device threading code wrapped into something reusable.

Basic idea is that there is a single TaskScheduler that keeps a pool of threads,
one for each core. Other places in the code can then create a TaskPool that they
can drop Tasks in to be executed by the scheduler, and wait for them to complete
or cancel them early.

=== Normal ====

Added a Normal output to the texture coordinate node. This currently
gives the object space normal, which is the same under object animation.

In the future this might become a "generated" normal so it's also stable for
deforming objects, but for now it's already useful for non-deforming objects.

=== Render Layers ===

Per render layer Samples control, leaving it to 0 will use the common scene
setting.

Environment pass will now render environment even if film is set to transparent.

Exclude Layers" added. Scene layers (all object that influence the render,
directly or indirectly) are shared between all render layers. However sometimes
it's useful to leave out some object influence for a particular render layer.
That's what this option allows you to do.

=== Filter Glossy ===

When using a value higher than 0.0, this will blur glossy reflections after
blurry bounces, to reduce noise at the cost of accuracy. 1.0 is a good
starting value to tweak.

Some light paths have a low probability of being found while contributing much
light to the pixel. As a result these light paths will be found in some pixels
and not in others, causing fireflies. An example of such a difficult path might
be a small light that is causing a small specular highlight on a sharp glossy
material, which we are seeing through a rough glossy material. With path tracing
it is difficult to find the specular highlight, but if we increase the roughness
on the material the highlight gets bigger and softer, and so easier to find.

Often this blurring will be hardly noticeable, because we are seeing it through
a blurry material anyway, but there are also cases where this will lead to a
loss of detail in lighting.
2012-04-28 08:53:59 +00:00
Brecht Van Lommel
803286dde8 Cycles: render passes for CUDA cards with compute model >= 2.x. 2012-01-26 19:07:01 +00:00
Brecht Van Lommel
d7932ceea8 Cycles: multi GPU rendering support.
The rendering device is now set in User Preferences > System, where you can
choose between OpenCL/CUDA and devices. Per scene you can then still choose
to use CPU or GPU rendering.

Load balancing still needs to be improved, now it just splits the entire
render in two, that will be done in a separate commit.
2012-01-09 16:58:01 +00:00
Brecht Van Lommel
049ab98469 Cycles: device code refactoring, no functional changes. 2012-01-04 18:06:32 +00:00
Brecht Van Lommel
b5595298d3 Cycles code refactoring: change displace kernel into more generic shader
evaluate kernel, added background shader evaluate.
2011-12-31 15:18:13 +00:00
Brecht Van Lommel
690de79580 Cycles: some tweaks for apple opencl with ATI cards, to get it working up to
the level of ambient occlusion render, shaders still fail. Fixes found with
much help from Jens and Dalai.
2011-12-20 17:36:56 +00:00
Brecht Van Lommel
72d2d05770 Cycles: border rendering support, includes some refactoring in how pixels are
accessed on devices.
2011-12-20 12:25:37 +00:00
Brecht Van Lommel
9e01abf777 Cycles: require Experimental to be set to enable CUDA on cards with shader model
lower than 1.3, since we're not officially supporting these. We're already not
providing CUDA binaries for these, so better make it clear when compiling from
source too.
2011-12-12 22:51:35 +00:00
Brecht Van Lommel
086e4ed825 Cycles: improve error reporting for opencl and cuda, showing error messages in
viewport instead of only console.
2011-11-22 20:49:33 +00:00
Brecht Van Lommel
5fd67a3ba5 Cycles: enable multi closure sampling and transparent shadows only on CPU and
CUDA cards with shader model >= 2 for now (GTX 4xx, 5xx, ..). The CUDA compiler
can't handle the increased kernel size currently.
2011-10-16 18:54:27 +00:00
Brecht Van Lommel
60bc63c7b8 Cycles: enable improved closure sampling, this should give less noise for mix, add
and glass shaders. How well this will work on non-fermi GPU's is unclear still, it's
a bit heavy on register usage.
2011-10-16 17:40:47 +00:00
Brecht Van Lommel
66b1dfae89 Cycles: tweaks to properties and nodes
* Passes renamed to samples
* Camera lens radius renamed to aperature size/blades/rotation
* Glass and fresnel nodes input is now index of refraction
* Glossy and velvet fresnel socket removed
* Mix/add closure node renamed to mix/add shader node
* Blend weight node added for shader mixing weights

There is some version patching code for reading existing files, but it's not
perfect, so shaders may work a bit different.
2011-09-16 13:14:02 +00:00
Brecht Van Lommel
3c7dcd7a47 Cycles: compile opencl kernels in non-blocking thread, and don't crash on
build failure but show error message in status text.
2011-09-02 00:10:03 +00:00
Brecht Van Lommel
bae896691a Cycles:
* Add alpha pass output, to use set Transparent option in Film panel.
* Add Holdout closure (OSL terminology), this is like the Sky option in the
  internal renderer, objects with this closure show the background / zero
  alpha.
* Add option to use Gaussian instead of Box pixel filter in the UI.
* Remove camera response curves for now, they don't really belong here in
  the pipeline, should be moved to compositor.

* Output full float values for rendering now, previously was only byte precision.
* Add a patch from Thomas to get a preview passes option, but still disabled
  because it isn't quite working right yet.
* CUDA: don't compile shader graph evaluation inline.
* Convert tabs to spaces in python files.
2011-08-28 13:55:59 +00:00
Brecht Van Lommel
48b4de3152 Cycles:
* auto/fixed threads option is used now, patch by Thomas.
* remove unused CUDA_LIBRARIES, library is dynamically loaded
* fix mesh XML export operator for API update
2011-08-24 10:44:04 +00:00
Ton Roosendaal
da376e0237 Cycles render engine, initial commit. This is the engine itself, blender modifications and build instructions will follow later.
Cycles uses code from some great open source projects, many thanks them:

* BVH building and traversal code from NVidia's "Understanding the Efficiency of Ray Traversal on GPUs":
http://code.google.com/p/understanding-the-efficiency-of-ray-traversal-on-gpus/
* Open Shading Language for a large part of the shading system:
http://code.google.com/p/openshadinglanguage/
* Blender for procedural textures and a few other nodes.
* Approximate Catmull Clark subdivision from NVidia Mesh tools:
http://code.google.com/p/nvidia-mesh-tools/
* Sobol direction vectors from:
http://web.maths.unsw.edu.au/~fkuo/sobol/
* Film response functions from:
http://www.cs.columbia.edu/CAVE/software/softlib/dorf.php
2011-04-27 11:58:34 +00:00