Commit Graph

69 Commits

Author SHA1 Message Date
Mai Lavelle
4388b29e98 Cycles: Add human readable sizes to debug output
Some of these values can get quite large and are hard to read, adding this
makes it easy to read them at a glance.

Reviewed By: sergey

Differential Revision: https://developer.blender.org/D2039
2016-05-31 06:13:54 -04:00
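A minimal sketch of what such size formatting could look like. The function name `human_readable_size`, the 1024-based units and the two-decimal output are assumptions for illustration, not the helper the commit actually added.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical helper: format a byte count with a binary unit suffix.
std::string human_readable_size(std::size_t bytes)
{
    static const char *units[] = {"B", "K", "M", "G", "T"};
    const int num_units = sizeof(units) / sizeof(units[0]);

    double value = static_cast<double>(bytes);
    int unit = 0;
    // Divide down until the value fits the current unit or units run out.
    while (value >= 1024.0 && unit < num_units - 1) {
        value /= 1024.0;
        ++unit;
    }
    assert(unit < num_units);

    char buffer[32];
    std::snprintf(buffer, sizeof(buffer), "%.2f%s", value, units[unit]);
    return std::string(buffer);
}
```

Logging both the raw number and this string keeps exact values available while making large ones readable at a glance.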
Sergey Sharybin
7b356a8565 Cycles: Reduce amount of malloc() calls from the kernel
This commit makes it so malloc() is only happening once per volume and
once per transparent shadow query (per thread), improving scalability of
the code to multiple CPU cores.

Hard to measure this with a low-bottom i7 here currently, but from quick
tests seems volume sampling gave about 3-5% speedup.

The idea is to store allocated memory in kernel globals, which are per
thread on CPU already.

Reviewers: dingto, juicyfruit, lukasstockner97, maiself, brecht

Reviewed By: brecht

Subscribers: Blendify, nutel

Differential Revision: https://developer.blender.org/D1996
2016-05-18 10:14:24 +02:00
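One way to read "store allocated memory in kernel globals" is a scratch buffer that each thread owns and reuses across queries. The sketch below assumes that reading; `ThreadGlobals`, `scratch_get` and the `num_allocs` counter are hypothetical names, not Cycles code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Hypothetical stand-in for per-thread kernel globals.
struct ThreadGlobals {
    void *scratch;
    std::size_t scratch_size;
    int num_allocs;  // only here to make the reuse visible

    ThreadGlobals() : scratch(nullptr), scratch_size(0), num_allocs(0) {}
    ~ThreadGlobals() { std::free(scratch); }

    // The struct owns the buffer, so copying is disabled.
    ThreadGlobals(const ThreadGlobals &) = delete;
    ThreadGlobals &operator=(const ThreadGlobals &) = delete;
};

// Return a buffer of at least `size` bytes. malloc() only runs when the
// buffer kept in the globals is too small; otherwise it is reused.
void *scratch_get(ThreadGlobals &kg, std::size_t size)
{
    if (kg.scratch_size < size) {
        std::free(kg.scratch);
        kg.scratch = std::malloc(size);
        assert(kg.scratch != nullptr);
        kg.scratch_size = size;
        ++kg.num_allocs;
    }
    return kg.scratch;
}
```

Because each thread has its own globals, no locking is needed around the buffer, which is where the scalability gain would come from.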
Thomas Dinges
4a4f043bc4 Cycles: Add support for single channel float textures on CPU.
Until now, single channel textures were packed into a float4, wasting 3 floats per pixel. Memory usage of such textures is now reduced by 3/4.
Voxel Attributes such as density, flame and heat benefit from this, but also Bumpmaps with one channel.
This commit also includes some cleanup and code deduplication for image loading.

Example Smoke render from Cosmos Laundromat: http://www.pasteall.org/pic/show.php?id=102972
Memory here went down from ~600MB to ~300MB.

Reviewers: #cycles, brecht

Differential Revision: https://developer.blender.org/D1981
2016-05-11 21:58:34 +02:00
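The 3/4 figure follows from per-pixel storage: a float4 holds four floats where a single-channel texture needs one. A small sketch of that arithmetic, using a hypothetical `float4` struct and helper names rather than the Cycles types:

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical four-component pixel, standing in for the packed layout.
struct float4 {
    float x, y, z, w;
};

static_assert(sizeof(float4) == 4 * sizeof(float), "float4 holds four floats");

// Bytes used when every single-channel pixel is padded to a float4.
constexpr std::size_t packed_bytes(std::size_t num_pixels)
{
    return num_pixels * sizeof(float4);
}

// Bytes used when each pixel is stored as one float.
constexpr std::size_t single_channel_bytes(std::size_t num_pixels)
{
    return num_pixels * sizeof(float);
}
```

For any pixel count the single-channel layout is a quarter of the packed one, so the texture itself shrinks by 3/4; the ~600MB to ~300MB figure above is for the whole render, not the texture alone.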
Thomas Dinges
d6555d936c Cleanup: Avoid duplicative defines for CPU textures, use the ones from util_texture.h
Also includes some further byte -> byte4 renaming, missed that in last commit.
2016-05-09 09:16:41 +02:00
Sergey Sharybin
f25f7c8030 Cycles: Re-implement some utilities to avoid use of boost
As the title says, the idea is to make Cycles require Boost
only via 3rd party dependencies like OIIO and OSL.

So now there are only a few places which still use Boost:

- Foreach, function bindings and threading primitives.

  Those we can easily get rid of with a C++11 bump (which seems
  inevitable sooner or later if we want to use a newer
  LLVM for OSL).

- Networking devices

  There's no quick solution for those currently, but there
  are some patches around which improve serialization.

Reviewers: juicyfruit, mont29, campbellbarton, brecht, dingto

Reviewed By: brecht, dingto

Differential Revision: https://developer.blender.org/D1764
2016-02-06 19:19:20 +01:00
Dalai Felinto
9a76354585 Cycles-Bake: Custom Baking passes
The combined pass is built with the contributions the user finds fit.

It is useful for lightmap baking, as well as non-view dependent effects
baking.

The manual will be updated once we get closer to the 2.77 release.
Meanwhile the new page can be found here:

http://dalaifelinto.com/blender-manual/render/cycles/baking.html

Reviewers: sergey, brecht

Differential Revision: https://developer.blender.org/D1674
2016-01-15 13:00:56 -02:00
Sergey Sharybin
ac7aefd7c2 Cycles: Use special debug panel to fine-tune debug flags
This panel is only visible when debug_value is set to 256 and has no
effect in other cases. However, if debug value is not set to this
value, environment variables will be used to control which features
are enabled, so there are in fact no visible changes for anyone.

There are some changes needed to prevent devices re-enumeration on
every Cycles session create.

Reviewers: juicyfruit, lukasstockner97, dingto, brecht

Reviewed By: lukasstockner97, dingto

Differential Revision: https://developer.blender.org/D1720
2016-01-12 16:21:30 +05:00
Sergey Sharybin
944b6322e6 Cycles: Log which optimizations are used for CPU kernels
Not fully thread-safe, but is rather harmless. Just some messages
might be logged several times.
2016-01-06 20:25:19 +05:00
Sergey Sharybin
e2846c999a Cycles: Fix stupid mistake which was assigning kernel function in a loop 2016-01-06 20:05:33 +05:00
Sergey Sharybin
3918c8b9a5 Cycles: Optionally output luminance from the shader evaluation kernel
This makes it possible to move some parts of evaluation from host to the device
and hopefully reduce memory usage by avoiding a full RGBA buffer on the host.

Reviewers: juicyfruit, lukasstockner97, brecht

Reviewed By: lukasstockner97, brecht

Differential Revision: https://developer.blender.org/D1702
2015-12-30 19:04:04 +05:00
Sergey Sharybin
3fba620858 Cycles: Prepare for more image extension types support
Basically just replace boolean periodic flag with extension type enum in the
device API.
2015-07-28 14:14:24 +02:00
Sergey Sharybin
f2c54df625 Cycles: Expose image extension mapping to the image manager
Currently only two mappings are supported by the API: Repeat (old behavior)
and the new Clip behavior. Internally this extension is converted to the periodic
flag, which was already supported but wasn't exposed.

There's no support for OpenCL yet because of the way how we pack images into a
single texture.

Those settings are not exposed to UI or anywhere else and there should be no
functional changes so far.
2015-07-21 21:58:19 +02:00
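The mapping from extension type to the older periodic flag might look roughly like this. The enum and function names are hypothetical, and treating Clip as "no texel outside the image" is an assumption about its behavior, not something the commit message states.

```cpp
#include <cassert>

// Hypothetical names; the commit only says Repeat and Clip mappings exist.
enum class ExtensionType { Repeat, Clip };

// The extension type collapses to the older boolean periodic flag.
inline bool extension_is_periodic(ExtensionType extension)
{
    return extension == ExtensionType::Repeat;
}

// Map an integer texel coordinate into [0, width). Returns false when the
// coordinate falls outside a non-periodic image (assumed Clip behavior).
inline bool resolve_texel(int x, int width, ExtensionType extension, int *r_index)
{
    assert(width > 0);
    if (extension_is_periodic(extension)) {
        const int m = x % width;
        *r_index = (m < 0) ? m + width : m;  // wrap negatives as well
        return true;
    }
    if (x < 0 || x >= width) {
        return false;
    }
    *r_index = x;
    return true;
}
```

Keeping an enum in the API and converting it at the last moment leaves room for further extension types without changing callers again.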
Sergey Sharybin
35812e65f4 Cycles: Fix compilation error on windows after recent logging changes 2015-04-10 22:35:10 +05:00
Sergey Sharybin
2f5dd83759 Cycles: Add some statistics logging
Covers number of entities in the scene (objects, meshes etc), also reports
sizes of textures being allocated.
2015-04-10 15:37:49 +05:00
Sergey Sharybin
5ff132182d Cycles: Code cleanup, spaces around keywords
This inconsistency drove me totally crazy, it's really confusing
when it's inconsistent especially when you work on both Cycles and
Blender sides.

Shouldn't cause merge PITA, it's whitespace changes only, Git should
be able to merge it nicely.
2015-03-28 00:15:15 +05:00
Sergey Sharybin
585dd26120 Cycles: Code cleanup, prepare for strict C++ flags 2015-03-27 18:23:31 +05:00
Sergey Sharybin
7f406a53c7 Cycles: Cleanup for indentation in device_cpu.cpp
Perhaps became broken after rather recent change about which entry point
to kernel to use.
2015-02-19 19:05:04 +05:00
Sergey Sharybin
a922be9270 Cycles: Report CPU and CUDA capabilities to system info operator
For CPU it gives the available instruction sets (SSE, AVX and so on).

For GPU CUDA it reports most of the attribute values returned by
cuDeviceGetAttribute(). Ideally we need to only use the set of those
which are driver-specific (so we don't clutter system info with
values which we can get from GPU specifications and be sure they
stay the same because the driver can't affect them).
2015-01-06 14:13:21 +05:00
Thomas Dinges
ee36e75b85 Cleanup: Fix Cycles Apache header.
This was already mixed a bit, but the dot belongs there.
2014-12-25 02:50:24 +01:00
Bastien Montagne
c14d34322b Fix typo breaking compilation with SSE2.
Spotted by sybrenstuvel (Sybren Stüvel), thanks!
2014-11-02 23:01:09 +01:00
Martijn Berger
4b33667b93 Deduplicate some code by using a function pointer to the real kernel
This has no performance impact whatsoever and is already used in the adaptive sampling patch
2014-10-30 10:23:44 +01:00
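The deduplication idea can be sketched as picking one function pointer up front and running a single shared loop through it. The kernel signature, the `CPUCaps` struct and the function names below are hypothetical and much simpler than the real kernel entry points.

```cpp
#include <cassert>

// Hypothetical kernel signature: one work item in, one value out.
typedef int (*KernelFunc)(int);

// Stand-ins for the same kernel compiled with different instruction sets.
int kernel_generic(int x) { return x + 1; }
int kernel_sse2(int x) { return x + 1; }

// Hypothetical capability query result.
struct CPUCaps {
    bool sse2;
};

// Choose the kernel once, outside the work loop.
KernelFunc select_kernel(const CPUCaps &caps)
{
    return caps.sse2 ? kernel_sse2 : kernel_generic;
}

// A single loop shared by all variants instead of one copy per variant.
int run_tile(KernelFunc kernel, int begin, int end)
{
    assert(kernel != nullptr);
    int sum = 0;
    for (int i = begin; i < end; ++i) {
        sum += kernel(i);
    }
    return sum;
}
```

The indirect call costs little compared to the work done per item, which fits the commit's claim of no measurable performance impact.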
Sergey Sharybin
cd6129d1ff Cycles: Workaround dead-slow expf() on 64bit linux
Single precision exponent on 64bit linux tends to be an order of magnitude slower
than the double precision version, even with single<->double precision conversion.

Some feedback on the mailing lists suggests that logf() is also slow, but
I haven't confirmed this here in the studio yet.

Depending on the shader setup, the workaround gives a ~3% speedup with the secret agent shot and up to
around 15% with the bmw scene here.
2014-10-06 12:36:46 +02:00
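The workaround described amounts to routing the single precision call through the double precision function. A minimal sketch, with `expf_via_double` as a hypothetical name (how the commit wires this in is not shown here):

```cpp
#include <cassert>
#include <cmath>

// Compute exp() in double precision and convert the result back to float.
// The two conversions are cheap compared to a slow expf() implementation.
inline float expf_via_double(float x)
{
    return static_cast<float>(std::exp(static_cast<double>(x)));
}
```

Whether this is faster depends entirely on the platform's math library, so it only makes sense where expf() has been measured to be slow.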
Sergey Sharybin
fbed2047c8 Fix wrong track of the memory when doing device vector resize before freeing it
This is a rather legitimate case which happens e.g. when persistent images are enabled
and the session is updating the lookup tables.

Now device_memory keeps track of the amount of memory allocated on the device,
so freeing uses the actually allocated size, not the CPU side buffer
size.
2014-09-04 17:25:12 +06:00
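The bookkeeping problem can be illustrated with a toy model: if the host buffer is resized between allocation and free, subtracting the host size corrupts the usage counter. The struct and function names below are hypothetical and only loosely modelled on the description above.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical memory descriptor.
struct DeviceMemory {
    std::size_t host_size = 0;    // current size of the CPU side buffer
    std::size_t device_size = 0;  // bytes actually allocated on the device
};

// Hypothetical per-device statistics.
struct DeviceStats {
    std::size_t mem_used = 0;
};

void mem_alloc(DeviceStats &stats, DeviceMemory &mem)
{
    // Remember how much was really allocated at this point in time.
    mem.device_size = mem.host_size;
    stats.mem_used += mem.device_size;
}

void mem_free(DeviceStats &stats, DeviceMemory &mem)
{
    // Subtract the recorded device size, not the (possibly resized) host size.
    assert(stats.mem_used >= mem.device_size);
    stats.mem_used -= mem.device_size;
    mem.device_size = 0;
}
```

With this, resizing the host vector before freeing no longer skews the reported memory usage.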
Dalai Felinto
8d3cc431d7 Fix T41471 Cycles Bake: Setting small tile size results in wrong bake with stripes rather than the expected noise pattern
This problem was introduced in 983cbafd1877f8dbaae60b064a14e27b5b640f18
Basically the issue is that we were not getting a unique index in the
baking routine for the RNG (random number generator).

Reviewers: sergey

Differential Revision: https://developer.blender.org/D749
2014-08-19 11:40:33 +02:00
Dalai Felinto
fc55c41bba Cycles Bake: show progress bar during bake
Baking progress preview is not possible, in parts due to the way the API
was designed. But at least you get to see the progress bar while baking.

Reviewers: sergey

Differential Revision: https://developer.blender.org/D656
2014-07-25 11:42:53 -03:00
Thomas Dinges
866c7fb6e6 Cycles: Add an AVX2 CPU kernel.
This kernel is compiled with AVX2, FMA3, and BMI compiler flags. At the moment only Intel Haswell benefits from this, but future AMD CPUs will have these instructions as well.

Makes rendering on Haswell CPUs a few percent faster, only benchmarked with clang on OS X though.

Part of my GSoC 2014.
2014-06-13 22:26:20 +02:00
e4e58d4612 Fix T40370: cycles CUDA baking timeout with high number of AA samples.
Now baking does one AA sample at a time, just like final render. There is
also some code for shader antialiasing that solves T40369 but it is disabled
for now because there may be unpredictable side effects.
2014-06-06 15:39:04 +02:00
0075efc4d2 Fix T40306: cycles baking not distributing work among CPU cores well. 2014-05-26 13:51:11 +02:00
Dalai Felinto
eec3eaba08 Cycles Bake
Expand Cycles to use the new baking API in Blender.

It works on the selected object, and the panel can be accessed in the Render panel (similar to where it is for the Blender Internal).

It bakes for the active texture of each material of the object. The active texture is currently defined as the active Image Texture node present in the material nodetree. If you don't want the baking to override an existent material, make sure the active Image Texture node is not connected to the nodetree. The active texture is also the texture shown in the viewport in the rendered mode.

Remember to save your images after the baking is complete.

Note: Bake currently only works on the CPU
Note: This is not supported by Cycles standalone because a lot of the work is done in Blender as part of the operator only, not the engine (Cycles).

Documentation:
http://wiki.blender.org/index.php/Doc:2.6/Manual/Render/Cycles/Bake

Supported Passes:
-----------------
Data Passes
 * Normal
 * UV
 * Diffuse/Glossy/Transmission/Subsurface/Emit Color

Light Passes
 * AO
 * Combined
 * Shadow
 * Diffuse/Glossy/Transmission/Subsurface/Emit Direct/Indirect
 * Environment

Review: D421
Reviewed by: Campbell Barton, Brecht van Lommel, Sergey Sharybin, Thomas Dinges

Original design by Brecht van Lommel.

The entire commit history can be found on the branch: bake-cycles
2014-05-02 21:19:09 -03:00
a2e4ebd36a Cycles code internals: add CPU kernel support for 3D image textures. 2014-03-29 13:03:48 +01:00
Martijn Berger
dd2dca2f7e Add support for multiple interpolation modes on cycles image textures
All textures are currently sampled bi-linearly, with the exception of OSL, where texture sampling is fixed and set to smart bi-cubic.

This patch adds user control to this setting.

Added:
- bits to DNA / RNA in the form of an enum for supporting multiple interpolations types
- changes to the image texture node drawing code ( add enum)
- to ImageManager (this needs to know to allocate second texture when interpolation type is different)
- to node compiler (pass on interpolation type)
- to device tex_alloc this also needs to get the concept of multiple interpolation types
- implementation for doing non interpolated lookup for cuda and cpu
- implementation where we pass this along to OSL (this makes OSL also do linear until I add smartcubic to the interface / DNA / RNA)

Reviewers: brecht, dingto

Reviewed By: brecht

CC: dingto, venomgfx

Differential Revision: https://developer.blender.org/D317
2014-03-07 23:16:33 +01:00
Thomas Dinges
de28a4d4b2 Cycles: Add an AVX kernel for CPU rendering.
* AVX is available on Intel Sandy Bridge and newer and AMD Bulldozer and newer.
* We don't use dedicated AVX intrinsics yet, but gcc auto vectorization gives a 3% performance improvement for Caminandes. Tested on an i5-3570, Linux x64.
* No change for Windows yet, MSVC 2008 does not support AVX.

Reviewed by: brecht
Differential Revision: https://developer.blender.org/D216
2014-01-16 17:04:11 +01:00
Thomas Dinges
9351ac0d85 Cycles: Skip the compilation of the dedicated SSE2 kernel on x86-64, we can assume SSE2 here, so just re-use the regular one. Saves 500kb in the blender binary.
Reviewed by: brecht
Differential Revision: https://developer.blender.org/D199
2014-01-14 20:39:54 +01:00
Thomas Dinges
ce6dce3b13 Code cleanup / Cycles: else/if for SSE41 kernel functions. 2014-01-06 03:22:14 +01:00
Martijn Berger
85a0c5d4e1 Cycles: network render code updated for latest changes and improved
This actually works somewhat now, although viewport rendering is broken and any
kind of network error or connection failure will kill Blender.

* Experimental WITH_CYCLES_NETWORK cmake option
* Networked Device is shown as an option next to CPU and GPU Compute
* Various updates to work with the latest Cycles code
* Locks and thread safety for RPC calls and tiles
* Refactored pointer mapping code
* Fix error in CPU brand string retrieval code

This includes work by Doug Gale, Martijn Berger and Brecht Van Lommel.

Reviewers: brecht

Differential Revision: http://developer.blender.org/D36
2013-12-07 12:26:58 +01:00
Martijn Berger
e3a79258d1 Cycles: test code for sse 4.1 kernel and alignment for some vector types.
This is mostly work towards enabling the __KERNEL_SSE__ option to start using
SIMD operations for vector math operations. This 4.1 kernel performs about 8%
faster with that option but overall is still slower than without the option.

WITH_CYCLES_OPTIMIZED_KERNEL_SSE41 is the cmake flag for testing this kernel.

Alignment of int3, int4, float3, float4 to 16 bytes seems to give a slight 1-2%
speedup on tested systems with the current kernel already, so is enabled now.
2013-11-22 14:42:41 +01:00
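A sketch of 16-byte alignment for vector types, written with the C++11 `alignas` specifier. Using `alignas` is an assumption made for brevity (the code base of that time would have needed compiler-specific attributes), and the type names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical vector types forced to 16-byte alignment.
struct alignas(16) float3_aligned {
    float x, y, z;
};

struct alignas(16) float4_aligned {
    float x, y, z, w;
};

static_assert(alignof(float4_aligned) == 16, "float4 is 16-byte aligned");
// Alignment pads the three-component type up to a full 16 bytes.
static_assert(sizeof(float3_aligned) == 16, "float3 is padded to 16 bytes");

// Check whether an address is a multiple of 16.
inline bool is_aligned_16(const void *pointer)
{
    return reinterpret_cast<std::uintptr_t>(pointer) % 16 == 0;
}
```

The trade-off is that three-component vectors grow from 12 to 16 bytes, in exchange for loads and stores that match SSE register width.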
Campbell Barton
48c1e0c0fc spelling: use American spelling for canceled 2013-10-26 01:06:19 +00:00
Brecht Van Lommel
29f6616d60 Cycles: viewport render now takes scene color management settings into account,
except for curves, that's still missing from the OpenColorIO GLSL shader.

The pixels are stored in a half float texture, converted from full float with
native GPU instructions and SIMD on the CPU, so it should be pretty quick.
Using a GLSL shader is useful for GPU render because it avoids a copy through
CPU memory.
2013-08-30 23:49:38 +00:00
Brecht Van Lommel
b9ce231060 Cycles: relicense GNU GPL source code to Apache version 2.0.
More information in this post:
http://code.blender.org/

Thanks to all contributors for giving their permission!
2013-08-18 14:16:15 +00:00
Thomas Dinges
9732c6283e Cycles / CPU Rendering:
* "Auto Detect" now again uses the number of cores, instead of the number of cores + 1.

This was added before we had Tile rendering and benchmarks on several systems showed that there is no gain with this now. There might be some slight difference (0.5% or so) slower/faster depending on the scene, but this is negligible.
2013-07-20 00:40:03 +00:00
Thomas Dinges
11707119de Cycles:
* Code cleanup, remove unused "resolution" variable from the DeviceTask class, was never used.
2013-05-14 21:18:20 +00:00
Thomas Dinges
a239700f43 Cycles:
* Code cleanup, remove deprecated support_advanced_shading() functions. Left over from r43734.
2013-02-21 17:10:14 +00:00
Brecht Van Lommel
d095bcc8aa Fix cycles not using SSE3 kernel after recent, order with SSE2 should be switched,
pointed out by Chad Fraleigh.
2013-02-12 14:58:46 +00:00
Brecht Van Lommel
7c9d993347 Fix cycles intersection issue with overlapping faces on windows 32 bit and CPU
without SSE3 support, due to 80 bit precision float register being used for one
bounding box but not the one next to it.
2013-02-04 16:12:37 +00:00
Brecht Van Lommel
7c0a0bae79 Fix #33375: OSL geom:trianglevertices gave wrong coordinates for static BVH.
Also some simple OSL optimization, passing thread data pointer directly instead
of via thread local storage, and creating ustrings for attribute lookup.
2012-12-01 19:15:05 +00:00
Brecht Van Lommel
204113b791 Fix #33107: cycles fixed threads 1 was still having two cores do work,
because main thread works as well.
2012-11-07 21:00:49 +00:00
Sergey Sharybin
6eec49ed20 Cycles: memory usage report
This commit adds memory usage information while rendering.

It reports memory used by device, meaning:

- For CPU it'll report real memory consumption
- For GPU rendering it'll report GPU memory consumption, but it'll
  also mean the same memory is used from host side.

This displays information about memory requested by Cycles,
not memory really allocated on a device. Real memory usage might be
higher because of memory fragmentation or optimistic memory allocator.

There's really nothing we can do against this.

Also, in contrast with Blender Internal, Cycles memory usage
does not include memory used by the scene; only memory needed by Cycles
itself is displayed. So don't freak out if memory usage reported
by Cycles is much lower than Blender Internal's.

This commit also adds RenderEngine.update_memory_stats callback which
is used to tell memory consumption from external engine to blender.
This information is used to generate information line after rendering
is finished.
2012-11-05 08:04:57 +00:00
Sergey Sharybin
3b88a29abf Cycles: progressive refine option
Just makes progressive refine :)

This means the whole image is refined gradually, using as many
threads as are set in the performance settings. Having enough tiles is
required for this option to work as expected.

Technically it's implemented by repeatedly computing next sample for
all the tiles before switching to next sample.

This works around 7-12% slower than regular tile-based rendering, so
use this option only if you really need it.

This commit also fixes progressive update of image when Save Buffers
option is enabled.

One more thing this commit fixes is handling of the display buffer with
the Save Buffers option enabled. If this option is enabled, the image buffer
has neither a byte nor a float buffer until the image is fully
rendered, which could result in a missing image while rendering in
case the color management cache became full.

This issue is solved by allocating a byte buffer for the image buffer from
the tile update callback.

Patch was reviewed by Brecht. He also made some minor edits to
the original version of the patch. Thanks, man!
2012-10-13 12:38:32 +00:00
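The difference between regular tile-based rendering and progressive refine comes down to loop order. The sketch below only records the order in which (tile, sample) pairs would be computed; the function names and the `TileSample` pair are hypothetical, and real scheduling across threads is left out.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// (tile index, sample index)
typedef std::pair<int, int> TileSample;

// Regular rendering: finish all samples of a tile before moving on.
std::vector<TileSample> tile_based_order(int num_tiles, int num_samples)
{
    assert(num_tiles >= 0 && num_samples >= 0);
    std::vector<TileSample> order;
    for (int tile = 0; tile < num_tiles; ++tile) {
        for (int sample = 0; sample < num_samples; ++sample) {
            order.push_back(TileSample(tile, sample));
        }
    }
    return order;
}

// Progressive refine: compute the next sample for all tiles, then repeat.
std::vector<TileSample> progressive_refine_order(int num_tiles, int num_samples)
{
    assert(num_tiles >= 0 && num_samples >= 0);
    std::vector<TileSample> order;
    for (int sample = 0; sample < num_samples; ++sample) {
        for (int tile = 0; tile < num_tiles; ++tile) {
            order.push_back(TileSample(tile, sample));
        }
    }
    return order;
}
```

Both orders do the same total work; the progressive one revisits every tile once per sample, which is a plausible source of the overhead quoted above.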
Lukas Toenne
efaf512406 Revert r50528: "Performance fix for Cycles: Don't wait in the main UI thread when resetting devices."
This commit leads to random freezes in Cycles rendering:
https://projects.blender.org/tracker/index.php?func=detail&aid=32545&group_id=9&atid=498

The goal of this commit was to remove UI lag for OSL, but since that is not officially supported yet, better revert it until a proper fix can be implemented in 2.65.
2012-09-17 12:07:06 +00:00
Lukas Toenne
31ed71cb6b Performance fix for Cycles: Don't wait in the main UI thread when resetting devices.
When the scene is updated Cycles resets the renderer device, cancelling
all existing tasks. The main thread would wait for all running tasks to
finish before continuing. This is ok when tasks can actually cancel in a
timely fashion. For OSL however, this does not work, since the OSL
shader group optimization takes quite a bit of time and cannot
easily be cancelled once running (on my crappy machine in full debug
mode: ~0.12 seconds for simple node trees). This would lead to very
laggy UI behavior and make it difficult to accurately control elements
such as sliders.

This patch removes the wait condition from the device->task_cancel
method. Instead it just sets the do_cancel flag and returns. To avoid
backlog in the task pool of the device it will return early from the
BlenderSession::sync function while the reset is going on (tested in
Session::resetting). Once all existing tasks have finished the do_cancel
flag is finally cleared again (checked in TaskPool::num_decrease).

Care has to be taken to avoid race conditions on the do_cancel flag,
since it can now be modified outside the TaskPool::cancel function
itself. For this purpose the scope of the TaskPool::num_mutex locks has
been extended, in most cases the mutex is now locked by the TaskPool
itself before calling TaskScheduler methods, instead of only locking
inside the num_increase/num_decrease functions themselves. The only
occurrence of a lock outside of the TaskPool methods is in
TaskScheduler::thread_run.

This patch is most useful in combination with the OSL renderer mode, so
it can probably wait until after the 2.64 release. SVM tasks tend to be
cancelled quickly, so the effect is less noticeable.
2012-09-11 11:41:51 +00:00
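The non-blocking cancel described in this message can be sketched as a flag guarded by the same mutex as the task counter. This is a simplified model with a hypothetical class name; only `do_cancel`, `num_mutex`, `num_increase` and `num_decrease` echo names from the message. Setting the flag only while tasks are outstanding is an assumption, and since the change was reverted (see the entry above it in this list), the sketch illustrates the idea rather than working Cycles behavior.

```cpp
#include <cassert>
#include <mutex>

// Simplified model of a task pool with a non-blocking cancel.
class TaskPoolSketch {
public:
    void num_increase()
    {
        std::lock_guard<std::mutex> lock(num_mutex);
        ++num;
    }

    // Called when a task finishes; the last one clears the cancel flag.
    void num_decrease()
    {
        std::lock_guard<std::mutex> lock(num_mutex);
        assert(num > 0);
        if (--num == 0) {
            do_cancel = false;
        }
    }

    // Sets the flag and returns immediately instead of waiting for tasks.
    void cancel()
    {
        std::lock_guard<std::mutex> lock(num_mutex);
        if (num > 0) {
            do_cancel = true;
        }
    }

    // Tasks and callers poll this to see whether a cancel is in progress.
    bool canceled()
    {
        std::lock_guard<std::mutex> lock(num_mutex);
        return do_cancel;
    }

private:
    std::mutex num_mutex;  // guards both the counter and the flag
    int num = 0;
    bool do_cancel = false;
};
```

Guarding the counter and the flag with one mutex is what avoids the race the message warns about: the flag can never be cleared between a task finishing and the counter being updated.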