Commit Graph

52 Commits

Author SHA1 Message Date
Sergey Sharybin
a922be9270 Cycles: Repot CPU and CUDA capabilities to system info operator
For CPU it gives available instructions set (SSE, AVX and so).

For GPU CUDA it reports most of the attribute values returned by
cuDeviceGetAttribute(). Ideally we need to only use set of those
which are driver-specific (so we don't clutter system info with
values which we can get from GPU specifications and be sure they
stay the same because driver can't affect on them).
2015-01-06 14:13:21 +05:00
Thomas Dinges
ee36e75b85 Cleanup: Fix Cycles Apache header.
This was already mixed a bit, but the dot belongs there.
2014-12-25 02:50:24 +01:00
Bastien Montagne
c14d34322b Fix typo breaking compilation with SSE2.
Spotted by sybrenstuvel (Sybren Stüvel), thanks!
2014-11-02 23:01:09 +01:00
Martijn Berger
4b33667b93 Deduplicate some code by using a function pointer to the real kernel
This has no performance impact what so ever and is already used in the adaptive sampling patch
2014-10-30 10:23:44 +01:00
Sergey Sharybin
cd6129d1ff Cycles: Workaround dead-slow expf() on 64bit linux
Single precision exponent on 64bit linux tends to be order of magnitude slower
than double precision version even with single<->double precision conversion.

Some feedback in the mailing lists also suggests that logf() is also slow, but
this i didn't confirm here in the studio yet.

Depending on the shader setup it gives ~3% with the secret agent shot and up to
around 15% with the bmw scene here.
2014-10-06 12:36:46 +02:00
Sergey Sharybin
fbed2047c8 Fix wrong track of the memory when doing device vector resize before freeing it
This is rather legit case which happens i.e. when having persistent images enabled
and session is updating the lookup tables.

Now device_memory keeps track of amount of memory being allocated on the device,
which makes freeing using the proper allocated size, not the CPU side buffer
size.
2014-09-04 17:25:12 +06:00
Dalai Felinto
8d3cc431d7 Fix T41471 Cycles Bake: Setting small tile size results in wrong bake with stripes rather than the expected noise pattern
This problem was introduced in 983cbafd1877f8dbaae60b064a14e27b5b640f18
Basically the issue is that we were not getting a unique index in the
baking routine for the RNG (random number generator).

Reviewers: sergey

Differential Revision: https://developer.blender.org/D749
2014-08-19 11:40:33 +02:00
Dalai Felinto
fc55c41bba Cycles Bake: show progress bar during bake
Baking progress preview is not possible, in parts due to the way the API
was designed. But at least you get to see the progress bar while baking.

Reviewers: sergey

Differential Revision: https://developer.blender.org/D656
2014-07-25 11:42:53 -03:00
Thomas Dinges
866c7fb6e6 Cycles: Add an AVX2 CPU kernel.
This kernel is compiled with AVX2, FMA3, and BMI compiler flags. At the moment only Intel Haswell benefits from this, but future AMD CPUs will have these instructions as well.

Makes rendering on Haswell CPUs a few percent faster, only benchmarked with clang on OS X though.

Part of my GSoC 2014.
2014-06-13 22:26:20 +02:00
e4e58d4612 Fix T40370: cycles CUDA baking timeout with high number of AA samples.
Now baking does one AA sample at a time, just like final render. There is
also some code for shader antialiasing that solves T40369 but it is disabled
for now because there may be unpredictable side effects.
2014-06-06 15:39:04 +02:00
0075efc4d2 Fix T40306: cycles baking not distributing work among CPU cores well. 2014-05-26 13:51:11 +02:00
Dalai Felinto
eec3eaba08 Cycles Bake
Expand Cycles to use the new baking API in Blender.

It works on the selected object, and the panel can be accessed in the Render panel (similar to where it is for the Blender Internal).

It bakes for the active texture of each material of the object. The active texture is currently defined as the active Image Texture node present in the material nodetree. If you don't want the baking to override an existent material, make sure the active Image Texture node is not connected to the nodetree. The active texture is also the texture shown in the viewport in the rendered mode.

Remember to save your images after the baking is complete.

Note: Bake currently only works in the CPU
Note: This is not supported by Cycles standalone because a lot of the work is done in Blender as part of the operator only, not the engine (Cycles).

Documentation:
http://wiki.blender.org/index.php/Doc:2.6/Manual/Render/Cycles/Bake

Supported Passes:
-----------------
Data Passes
 * Normal
 * UV
 * Diffuse/Glossy/Transmission/Subsurface/Emit Color

Light Passes
 * AO
 * Combined
 * Shadow
 * Diffuse/Glossy/Transmission/Subsurface/Emit Direct/Indirect
 * Environment

Review: D421
Reviewed by: Campbell Barton, Brecht van Lommel, Sergey Sharybin, Thomas Dinge

Original design by Brecht van Lommel.

The entire commit history can be found on the branch: bake-cycles
2014-05-02 21:19:09 -03:00
a2e4ebd36a Cycles code internals: add CPU kernel support for 3D image textures. 2014-03-29 13:03:48 +01:00
Martijn Berger
dd2dca2f7e Add support for multiple interpolation modes on cycles image textures
All textures are sampled bi-linear currently with the exception of OSL there texture sampling is fixed and set to smart bi-cubic.

This patch adds user control to this setting.

Added:
- bits to DNA / RNA in the form of an enum for supporting multiple interpolations types
- changes to the image texture node drawing code ( add enum)
- to ImageManager (this needs to know to allocate second texture when interpolation type is different)
- to node compiler (pass on interpolation type)
- to device tex_alloc this also needs to get the concept of multiple interpolation types
- implementation for doing non interpolated lookup for cuda and cpu
- implementation where we pass this along to osl ( this makes OSL also do linear untill I add smartcubic to the interface / DNA/ RNA)

Reviewers: brecht, dingto

Reviewed By: brecht

CC: dingto, venomgfx

Differential Revision: https://developer.blender.org/D317
2014-03-07 23:16:33 +01:00
Thomas Dinges
de28a4d4b2 Cycles: Add an AVX kernel for CPU rendering.
* AVX is available on Intel Sandy Bridge and newer and AMD Bulldozer and newer.
* We don't use dedicated AVX intrinsics yet, but gcc auto vectorization gives a 3% performance improvement for Caminandes. Tested on an i5-3570, Linux x64.
* No change for Windows yet, MSVC 2008 does not support AVX.

Reviewed by: brecht
Differential Revision: https://developer.blender.org/D216
2014-01-16 17:04:11 +01:00
Thomas Dinges
9351ac0d85 Cycles: Skip the compilation of the dedicated SSE2 kernel on x86-64, we can assume SSE2 here, so just re-use the regular one. Saves 500kb in the blender binary.
Reviewed by: brecht
Differential Revision: https://developer.blender.org/D199
2014-01-14 20:39:54 +01:00
Thomas Dinges
ce6dce3b13 Code cleanup / Cycles: else/if for SSE41 kernel functions. 2014-01-06 03:22:14 +01:00
Martijn Berger
85a0c5d4e1 Cycles: network render code updated for latest changes and improved
This actually works somewhat now, although viewport rendering is broken and any
kind of network error or connection failure will kill Blender.

* Experimental WITH_CYCLES_NETWORK cmake option
* Networked Device is shown as an option next to CPU and GPU Compute
* Various updates to work with the latest Cycles code
* Locks and thread safety for RPC calls and tiles
* Refactored pointer mapping code
* Fix error in CPU brand string retrieval code

This includes work by Doug Gale, Martijn Berger and Brecht Van Lommel.

Reviewers: brecht

Differential Revision: http://developer.blender.org/D36
2013-12-07 12:26:58 +01:00
Martijn Berger
e3a79258d1 Cycles: test code for sse 4.1 kernel and alignment for some vector types.
This is mostly work towards enabling the __KERNEL_SSE__ option to start using
SIMD operations for vector math operations. This 4.1 kernel performes about 8%
faster with that option but overall is still slower than without the option.

WITH_CYCLES_OPTIMIZED_KERNEL_SSE41 is the cmake flag for testing this kernel.

Alignment of int3, int4, float3, float4 to 16 bytes seems to give a slight 1-2%
speedup on tested systems with the current kernel already, so is enabled now.
2013-11-22 14:42:41 +01:00
Campbell Barton
48c1e0c0fc spelling: use American spelling for canceled 2013-10-26 01:06:19 +00:00
Brecht Van Lommel
29f6616d60 Cycles: viewport render now takes scene color management settings into account,
except for curves, that's still missing from the OpenColorIO GLSL shader.

The pixels are stored in a half float texture, converterd from full float with
native GPU instructions and SIMD on the CPU, so it should be pretty quick.
Using a GLSL shader is useful for GPU render because it avoids a copy through
CPU memory.
2013-08-30 23:49:38 +00:00
Brecht Van Lommel
b9ce231060 Cycles: relicense GNU GPL source code to Apache version 2.0.
More information in this post:
http://code.blender.org/

Thanks to all contributes for giving their permission!
2013-08-18 14:16:15 +00:00
Thomas Dinges
9732c6283e Cycles / CPU Rendering:
* "Auto Detect" now again uses the umber of cores, instead number of cores + 1.

This was added before we had Tile rendering and benchmarks on several systems showed that there is no gain with this now. There might be some slight difference (0.5% or so) slower/faster depending on the scene, but this is negligible.
2013-07-20 00:40:03 +00:00
Thomas Dinges
11707119de Cycles:
* Code cleanup, remove unused "resolution" variable from the DeviceTask class, was never used.
2013-05-14 21:18:20 +00:00
Thomas Dinges
a239700f43 Cycles:
* Code cleanup, remove deprecated support_advanced_shading() functions. Left over from r43734.
2013-02-21 17:10:14 +00:00
Brecht Van Lommel
d095bcc8aa Fix cycles not using SSE3 kernel after recent, order with SSE2 should be switched,
pointed out by Chad Fraleigh.
2013-02-12 14:58:46 +00:00
Brecht Van Lommel
7c9d993347 Fix cycles intersection issue with overlapping faces on windows 32 bit and CPU
without SSE3 support, due to 80 bit precision float register being used for one
bounding box but not the one next to it.
2013-02-04 16:12:37 +00:00
Brecht Van Lommel
7c0a0bae79 Fix #33375: OSL geom:trianglevertices gave wrong coordinates for static BVH.
Also some simple OSL optimization, passing thread data pointer directly instead
of via thread local storage, and creating ustrings for attribute lookup.
2012-12-01 19:15:05 +00:00
Brecht Van Lommel
204113b791 Fix #33107: cycles fixed threads 1 was still having two cores do work,
because main thread works as well.
2012-11-07 21:00:49 +00:00
Sergey Sharybin
6eec49ed20 Cycles: memory usage report
This commit adds memory usage information while rendering.

It reports memory used by device, meaning:

- For CPU it'll report real memory consumption
- For GPU rendering it'll report GPU memory consumption, but it'll
  also mean the same memory is used from host side.

This information displays information about memory requested by Cycles,
not memory really allocated on a device. Real memory usage might be
higher because of memory fragmentation or optimistic memory allocator.

There's really nothing we can do against this.

Also in contrast with blender internal's render cycles memory usage
does not include memory used by scene, only memory needed by cycles
itself will be displayed. So don't freak out if memory usage reported
by cycles would be much lower than blender internal's.

This commit also adds RenderEngine.update_memory_stats callback which
is used to tell memory consumption from external engine to blender.
This information is used to generate information line after rendering
is finished.
2012-11-05 08:04:57 +00:00
Sergey Sharybin
3b88a29abf Cycles: progressive refine option
Just makes progressive refine :)

This means the whole image would be refined gradually using as much
threads as it's set in performance settings. Having enough tiles is
required to have this option working as it's expected.

Technically it's implemented by repeatedly computing next sample for
all the tiles before switching to next sample.

This works around 7-12% slower than regular tile-based rendering, so
use this option only if you really need it.

This commit also fixes progressive update of image when Save Buffers
option is enabled.

And one more thing this commit fixes is handling display buffer with
Save Buffers option enabled. If this option is enabled image buffer
wouldn't have neither byte nor float buffer until image is fully
rendered which could backfire in missing image while rendering in
cases color management cache became full.

This issue solved by allocating byte buffer for image buffer from
tile update callback.

Patch was reviewed by Brecht. He also made some minor edits to
original version to patch. Thanks, man!
2012-10-13 12:38:32 +00:00
Lukas Toenne
efaf512406 Revert r50528: "Performance fix for Cycles: Don't wait in the main UI thread when resetting devices."
This commit leads to random freezes in Cycles rendering:
https://projects.blender.org/tracker/index.php?func=detail&aid=32545&group_id=9&atid=498

The goal of this commit was to remove UI lag for OSL, but since that is not officially supported yet, better revert it until a proper fix can be implemented in 2.65.
2012-09-17 12:07:06 +00:00
Lukas Toenne
31ed71cb6b Performance fix for Cycles: Don't wait in the main UI thread when resetting devices.
When the scene is updated Cycles resets the renderer device, cancelling
all existing tasks. The main thread would wait for all running tasks to
finish before continuing. This is ok when tasks can actually cancel in a
timely fashion. For OSL however, this does not work, since the OSL
shader group optimization takes quite a bit of time and can not be
easily be cancelled once running (on my crappy machine in full debug
mode: ~0.12 seconds for simple node trees). This would lead to very
laggy UI behavior and make it difficult to accurately control elements
such as sliders.

This patch removes the wait condition from the device->task_cancel
method. Instead it just sets the do_cancel flag and returns. To avoid
backlog in the task pool of the device it will return early from the
BlenderSession::sync function while the reset is going on (tested in
Session::resetting). Once all existing tasks have finished the do_cancel
flag is finally cleared again (checked in TaskPool::num_decrease).

Care has to be taken to avoid race conditions on the do_cancel flag,
since it can now be modified outside the TaskPool::cancel function
itself. For this purpose the scope of the TaskPool::num_mutex locks has
been extended, in most cases the mutex is now locked by the TaskPool
itself before calling TaskScheduler methods, instead of only locking
inside the num_increase/num_decrease functions themselves. The only
occurrence of a lock outside of the TaskPool methods is in
TaskScheduler::thread_run.

This patch is most useful in combination with the OSL renderer mode, so
it can probably wait until after the 2.64 release. SVM tasks tend to be
cancelled quickly, so the effect is less noticeable.
2012-09-11 11:41:51 +00:00
Brecht Van Lommel
adea12cb01 Cycles: merge of changes from tomato branch.
Regular rendering now works tiled, and supports save buffers to save memory
during render and cache render results.

Brick texture node by Thomas.
http://wiki.blender.org/index.php/Doc:2.6/Manual/Render/Cycles/Nodes/Textures#Brick_Texture

Image texture Blended Box Mapping.
http://wiki.blender.org/index.php/Doc:2.6/Manual/Render/Cycles/Nodes/Textures#Image_Texture
http://mango.blender.org/production/blended_box/

Various bug fixes by Sergey and Campbell.
* Fix for reading freed memory in some node setups.
* Fix incorrect memory read when synchronizing mesh motion.
* Fix crash appearing when direct light usage is different on different layers.
* Fix for vector pass gives wrong result in some circumstances.
* Fix for wrong resolution used for rendering Render Layer node.
* Option to cancel rendering when doing initial synchronization.
* No more texture limit when using CPU render.
* Many fixes for new tiled rendering.
2012-09-04 13:29:07 +00:00
Campbell Barton
0fbb6bff27 style cleanup: block comments 2012-06-09 17:22:52 +00:00
Brecht Van Lommel
dd9c1b7fbf Cycles: OpenCL image texture support, fix an attribute node issue and refactor
feature enabling #defines a bit.
2012-05-13 12:32:44 +00:00
Brecht Van Lommel
8103381ded Cycles: threading optimizations
* Multithreaded image loading, each thread can load a separate image.
* Better multithreading for multiple instanced meshes, different threads can now
  build BVH's for different meshes, rather than all cooperating on the same mesh.
  Especially noticeable for dynamic BVH building for the viewport, gave about
  2x faster build on 8 core in fairly complex scene with many objects.
* The main thread waiting for worker threads can now also work itself, so
  (num_cores + 1) threads will be working, this supposedly gives better
  performance on some operating systems, but did not measure performance for
  this very detailed yet.
2012-05-05 19:44:33 +00:00
Brecht Van Lommel
07b2241fb1 Cycles: merging features from tomato branch.
=== BVH build time optimizations ===

* BVH building was multithreaded. Not all building is multithreaded, packing
  and the initial bounding/splitting is still single threaded, but recursive
  splitting is, which was the main bottleneck.

* Object splitting now uses binning rather than sorting of all elements, using
  code from the Embree raytracer from Intel.
  http://software.intel.com/en-us/articles/embree-photo-realistic-ray-tracing-kernels/

* Other small changes to avoid allocations, pack memory more tightly, avoid
  some unnecessary operations, ...

These optimizations do not work yet when Spatial Splits are enabled, for that
more work is needed. There's also other optimizations still needed, in
particular for the case of many low poly objects, the packing step and node
memory allocation.

BVH raytracing time should remain about the same, but BVH build time should be
significantly reduced, test here show speedup of about 5x to 10x on a dual core
and 5x to 25x on an 8-core machine, depending on the scene.

=== Threads ===

Centralized task scheduler for multithreading, which is basically the
CPU device threading code wrapped into something reusable.

Basic idea is that there is a single TaskScheduler that keeps a pool of threads,
one for each core. Other places in the code can then create a TaskPool that they
can drop Tasks in to be executed by the scheduler, and wait for them to complete
or cancel them early.

=== Normal ====

Added a Normal output to the texture coordinate node. This currently
gives the object space normal, which is the same under object animation.

In the future this might become a "generated" normal so it's also stable for
deforming objects, but for now it's already useful for non-deforming objects.

=== Render Layers ===

Per render layer Samples control, leaving it to 0 will use the common scene
setting.

Environment pass will now render environment even if film is set to transparent.

Exclude Layers" added. Scene layers (all object that influence the render,
directly or indirectly) are shared between all render layers. However sometimes
it's useful to leave out some object influence for a particular render layer.
That's what this option allows you to do.

=== Filter Glossy ===

When using a value higher than 0.0, this will blur glossy reflections after
blurry bounces, to reduce noise at the cost of accuracy. 1.0 is a good
starting value to tweak.

Some light paths have a low probability of being found while contributing much
light to the pixel. As a result these light paths will be found in some pixels
and not in others, causing fireflies. An example of such a difficult path might
be a small light that is causing a small specular highlight on a sharp glossy
material, which we are seeing through a rough glossy material. With path tracing
it is difficult to find the specular highlight, but if we increase the roughness
on the material the highlight gets bigger and softer, and so easier to find.

Often this blurring will be hardly noticeable, because we are seeing it through
a blurry material anyway, but there are also cases where this will lead to a
loss of detail in lighting.
2012-04-28 08:53:59 +00:00
Brecht Van Lommel
803286dde8 Cycles: render passes for CUDA cards with compute model >= 2.x. 2012-01-26 19:07:01 +00:00
Brecht Van Lommel
f99343d3b8 Cycles: Render Passes
Currently supported passes:
* Combined, Z, Normal, Object Index, Material Index, Emission, Environment,
  Diffuse/Glossy/Transmission x Direct/Indirect/Color

Not supported yet:
* UV, Vector, Mist

Only enabled for CPU devices at the moment, will do GPU tweaks tommorrow,
also for environment importance sampling.

Documentation:
http://wiki.blender.org/index.php/Doc:2.6/Manual/Render/Cycles/Passes
2012-01-25 17:23:52 +00:00
Brecht Van Lommel
5873301257 Sample as Lamp option for world shaders, to enable multiple importance sampling.
By default lighting from the world is computed solely with indirect light
sampling. However for more complex environment maps this can be too noisy, as
sampling the BSDF may not easily find the highlights in the environment map
image. By enabling this option, the world background will be sampled as a lamp,
with lighter parts automatically given more samples.

Map Resolution specifies the size of the importance map (res x res). Before
rendering starts, an importance map is generated by "baking" a grayscale image
from the world shader. This will then be used to determine which parts of the
background are light and so should receive more samples than darker parts.
Higher resolutions will result in more accurate sampling but take more setup
time and memory.

Patch by Mike Farnsworth, thanks!
2012-01-20 17:49:17 +00:00
Brecht Van Lommel
778774ba16 Fix: cycles CPU device not being used when it should be on some multi-GPU
configurations.
2012-01-11 13:18:06 +00:00
Brecht Van Lommel
d7932ceea8 Cycles: multi GPU rendering support.
The rendering device is now set in User Preferences > System, where you can
choose between OpenCL/CUDA and devices. Per scene you can then still choose
to use CPU or GPU rendering.

Load balancing still needs to be improved, now it just splits the entire
render in two, that will be done in a separate commit.
2012-01-09 16:58:01 +00:00
Brecht Van Lommel
049ab98469 Cycles: device code refactoring, no functional changes. 2012-01-04 18:06:32 +00:00
Brecht Van Lommel
b5595298d3 Cycles code refactoring: change displace kernel into more generic shader
evaluate kernel, added background shader evaluate.
2011-12-31 15:18:13 +00:00
Brecht Van Lommel
72d2d05770 Cycles: border rendering support, includes some refactoring in how pixels are
accessed on devices.
2011-12-20 12:25:37 +00:00
Brecht Van Lommel
db8024f4b5 Fix #29259: cycles issues on certain processors. Now two versions of the kernel
are compiled, one SSE optimized and the other not, and it will choose between
them at runtime.
2011-11-15 15:13:38 +00:00
Brecht Van Lommel
5fd67a3ba5 Cycles: enable multi closure sampling and transparent shadows only on CPU and
CUDA cards with shader model >= 2 for now (GTX 4xx, 5xx, ..). The CUDA compiler
can't handle the increased kernel size currently.
2011-10-16 18:54:27 +00:00
Brecht Van Lommel
66b1dfae89 Cycles: tweaks to properties and nodes
* Passes renamed to samples
* Camera lens radius renamed to aperature size/blades/rotation
* Glass and fresnel nodes input is now index of refraction
* Glossy and velvet fresnel socket removed
* Mix/add closure node renamed to mix/add shader node
* Blend weight node added for shader mixing weights

There is some version patching code for reading existing files, but it's not
perfect, so shaders may work a bit different.
2011-09-16 13:14:02 +00:00
Brecht Van Lommel
9b31cba74e Cycles: some warning fixes, cpu device task tweaks, avoid unnecessary
tonemap in non-viewport render, and some utility functions.
2011-09-08 18:58:07 +00:00