The previous fix didn't work well enough because on Windows Python has a
different environment than Blender, and setting variables there had no effect
from Blender's point of view.
Currently only two mappings are supported by the API: Repeat (the old behavior)
and the new Clip behavior. Internally this extension is converted to the
periodic flag, which was already supported but wasn't exposed.
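As a rough illustration (types and names below are made up for the example,
not the actual Cycles API), the conversion boils down to treating only Repeat
as periodic:

  /* Hypothetical sketch of mapping the extension setting onto the
   * pre-existing periodic flag; names are illustrative. */
  enum ExtensionType {
    EXTENSION_REPEAT,  /* old behavior: tile the image */
    EXTENSION_CLIP,    /* new behavior: black outside the [0, 1] range */
  };

  inline bool extension_to_periodic(ExtensionType extension)
  {
    /* Only Repeat maps to the periodic sampling that was already supported. */
    return extension == EXTENSION_REPEAT;
  }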
There's no support for OpenCL yet because of the way we pack images into a
single texture.
Those settings are not exposed in the UI or anywhere else, so there should be
no functional changes so far.
This means render devices may now skip building baking kernels when only
actual render-related functionality is used.
For now it's only implemented for the OpenCL split kernel device and is mainly
needed to work around compiler-specific bugs which crash when building the
kernel.
Using OpenCL for baking might still crash the driver, but at least there is
now a higher probability that the GPU will be usable to render the scene.
The real fix should actually be done on the driver side.
The idea is to make all kernels as small as possible to work around possible
issues with buggy drivers which might fail to build feature-complete kernels.
It's indeed just a workaround to get at least simple test scenes rendering on
OpenCL. The real fix should happen on the driver side.
Requires the latest OSX El Capitan beta 3 due to some crucial fixes made in the
compiler. Supports the same features as NVidia OpenCL apart from CMJ (there's no
experimental feature set support in the megakernel yet).
Uses the megakernel internally, which works much better than the split kernel.
The split kernel is still not supported on OSX; this needs to be investigated.
Some more details can be found here:
http://wiki.blender.org/index.php/Dev:2.6/Source/Render/Cycles/OpenCL#AMD_on_OSX
This is not really supported by OpenCL but might happen in certain
configurations. There might be some remaining cases where this happens,
but so far I cannot find any.
It is the same issue as described in the previous commit: the original changes
in this area were wrong and only worked on a buggy Optimus driver which
simply appeared to work by coincidence and in fact used the wrong device.
There was annoying copy-paste across the OpenCL device constructor, device
enumeration and split kernel checks. Now those areas use a utility
function which returns pairs of platform and device IDs for devices which are
supported by Cycles, and enumeration happens within that list.
This makes filtering happen in a single place, so there's no need to keep
3 different functions in sync.
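A minimal sketch of what such a utility could look like (OpenCL host API
only; the actual filtering criteria in Cycles are more involved):

  #include <utility>
  #include <vector>
  #include <CL/cl.h>

  /* Collect (platform, device) pairs once, so the device constructor,
   * enumeration and split kernel checks all iterate the same list. */
  std::vector<std::pair<cl_platform_id, cl_device_id> > opencl_usable_devices()
  {
    std::vector<std::pair<cl_platform_id, cl_device_id> > usable;
    cl_uint num_platforms = 0;
    if (clGetPlatformIDs(0, NULL, &num_platforms) != CL_SUCCESS || num_platforms == 0)
      return usable;
    std::vector<cl_platform_id> platforms(num_platforms);
    clGetPlatformIDs(num_platforms, &platforms[0], NULL);
    for (size_t p = 0; p < platforms.size(); p++) {
      cl_uint num_devices = 0;
      if (clGetDeviceIDs(platforms[p], CL_DEVICE_TYPE_GPU, 0, NULL, &num_devices) != CL_SUCCESS ||
          num_devices == 0)
        continue;
      std::vector<cl_device_id> devices(num_devices);
      clGetDeviceIDs(platforms[p], CL_DEVICE_TYPE_GPU, num_devices, &devices[0], NULL);
      for (size_t d = 0; d < devices.size(); d++) {
        /* Real code would additionally check device and driver capabilities here. */
        usable.push_back(std::make_pair(platforms[p], devices[d]));
      }
    }
    return usable;
  }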
This commit also fixes a bug with wrong enumeration of devices caused by recent
fixes. Those fixes were in fact wrong and only appeared to work on a laptop
with an Optimus card on Linux. The root of those issues is in fact a bad
Linux driver for Optimus cards.
They'll be checked for the version later and that check will fail anyway,
so it's better not to let the user see unsupported devices in the list.
Also corrected one more issue with device enumeration: skipped devices were
not reflected in the device number, which might result in bad array indices.
This might also resolve T45037, and needs to be ported to a release branch.
The round-up was only enabled for viewport render, which was for a long time
hardcoded to use 64 closures. This was done in order to avoid unnecessary
kernel re-compilations when tweaking the shader tree.
We could enable selective closure compilation in the viewport later if it
gives measurable speed improvements, but even then the round-up should happen
outside of the device level.
This commit also removes the early output which happened when the max closure
count did not change. It was wrong because other requested kernel features
might have changed.
These features are now based on the scene settings, so scenes which don't use
those features render even faster.
This gives about a 30% speedup on the AMD A10 APU here, but that does not mean
such an improvement will happen on all hardware. That being said, the
Tonga device here shows no measurable difference.
In any case it seems handy to have for the future, when we'll want to support SSS
in the kernel or to port selective compilation/split kernel to CUDA devices.
For now it's reported to stdout, matching the CUDA behavior.
In the future we can hide this behind GLog logging once the kernels
are all considered stable.
Since the kernel split work we now have quite a few new files, the majority
of which are related to the kernel entry points. Keeping those files in the
root kernel folder will eventually make it really hard to tell which files are
the actual implementation of the Cycles kernel.
Those files are now moved to kernel/kernels/<device_type>. This way adding extra
entry points will be less noisy. It is also nice to have all device-specific
files grouped together.
Another change is in the way the split kernel invokes its logic. Previously all
the logic was implemented directly in the .cl files, which made it a bit tricky
to re-use across other devices. Since we'll likely be looking into doing the
same split work for CUDA devices eventually, it makes sense to move the logic
from the .cl files to header files. Those files are stored in kernel/split.
This does not mean the header files won't give error messages when included
from other devices, and their arguments will likely change, but having such
separation is a good start anyway.
There should be no functional changes.
Reviewers: juicyfruit, dingto
Differential Revision: https://developer.blender.org/D1314
They were lost during the simplification of kernel loading but might be rather
crucial for performance.
Also made the cflags shared across kernels. Surely it might lead to some
unwanted kernel re-compilation, but otherwise they might easily run out of
sync with changes in the kernel.
The experimental feature set is currently unavailable for the megakernel; it'll
require some changes to the cache system to distinguish cached regular
kernels from cached experimental kernels.
Currently unused, but some features will be enabled soon.
This required allocating some memory related to the object transform needed
by ShaderData, and currently it is done for all platforms. Since we're
targeting fully feature-complete platforms this is rather acceptable at
this point, and in the future we'll do selective NO_HAIR/NO_SSS/NO_BLUR
kernels.
This is still experimental, and in fact there are some major issues on the
NVidia platform; it's not really clear if it's a compiler bug,
an uninitialized variable or some other kind of issue.
Some trivial fixes like spaces around operators and a missing semicolon,
plus a fix for wrong detection of the ShaderData SOA size. That was harmless
since there's only one closure array, but still better to fix.
This file was actually checking for features enabled on the CPU, and surely all
of them were enabled, so removing them makes no difference.
Ideally we'll need to do runtime feature detection and just pass some stuff
as NULL to the kernel, or maybe also have variadic kernel entry points, which
is also quite easy to do.
No need to store them in the class; they're unlikely to change,
and if they do change we're in big trouble anyway.
A more appropriate approach would then be to typedef these things in
kernel_types.h, but still use an inlined sizeof().
Only those are a priority for now; all the rest are still testable
if the CYCLES_OPENCL_TEST or CYCLES_OPENCL_SPLIT_KERNEL_TEST environment
variables are set.
This commit contains all the work related to the AMD split kernel, which was
mainly done by Varun Sundar, George Kyriazis and Lenny Wang, plus
some help from Sergey Sharybin, Martijn Berger, Thomas Dinges and likely
someone else whom we're forgetting to mention.
Currently only AMD cards are enabled for the new split kernel, but it is
possible to force the split OpenCL kernel to be used by setting the following
environment variable: CYCLES_OPENCL_SPLIT_KERNEL_TEST=1.
Not all features are supported yet; specifically there is no motion blur,
camera blur, SSS or volumetrics for now. Also, transparent shadows are
disabled on AMD devices because of a compiler bug.
This kernel also only implements regular path tracing; supporting the
branched one will take a bit. Branched path tracing is still exposed in the
interface, which is a bit misleading and will be hidden soon.
More features will be enabled once they're ported to the split kernel and
tested.
Neither regular CPU nor CUDA shows any difference; they generate the
exact same code, which means no regressions/improvements there.
Based on the research paper:
https://research.nvidia.com/sites/default/files/publications/laine2013hpg_paper.pdf
Here's the documentation:
https://docs.google.com/document/d/1LuXW-CV-sVJkQaEGZlMJ86jZ8FmoPfecaMdR-oiWbUY/edit
Design discussion of the patch:
https://developer.blender.org/T44197
Differential Revision: https://developer.blender.org/D1200
This is currently unused but crucial for things like calculating the amount of
device memory required to deal with the tasks.
Maybe not really the best place to store it, but consider it good enough for now.
Previously we only had an experimental flag passed to the device's
load_kernel(), which was all fine. But since we're going to have some extra
parameters passed there, it makes sense to wrap them into a single struct,
which will make it easier to pass stuff around.
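A sketch of the direction this takes (field names below are illustrative,
not necessarily the exact struct in the code):

  /* Requested kernel features wrapped into one struct, so that adding a new
   * parameter doesn't change every load_kernels() signature again. */
  class DeviceRequestedFeatures {
   public:
    bool experimental;  /* enable the experimental feature set */
    int max_closure;    /* maximum number of closures in ShaderData */
    bool use_baking;    /* whether baking kernels are needed at all */

    DeviceRequestedFeatures() : experimental(false), max_closure(64), use_baking(false) {}
  };

  /* Devices would then accept the whole struct:
   *   bool load_kernels(const DeviceRequestedFeatures& requested_features);
   */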
This inconsistency drove me totally crazy; it's really confusing,
especially when you work on both the Cycles and Blender sides.
Shouldn't cause merge PITA; it's whitespace changes only, and Git should
be able to merge it nicely.
For CPU it gives the available instruction sets (SSE, AVX and so on).
For CUDA GPUs it reports most of the attribute values returned by
cuDeviceGetAttribute(). Ideally we should only use the subset of those
which are driver-specific (so we don't clutter the system info with
values which we can get from GPU specifications and be sure they
stay the same because the driver can't affect them).
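For example, a couple of the driver-reported values could be queried like
this (attribute choice is illustrative, error handling trimmed):

  #include <cstdio>
  #include <cuda.h>

  void report_cuda_attributes(CUdevice device)
  {
    int max_threads = 0, clock_khz = 0;
    cuDeviceGetAttribute(&max_threads, CU_DEVICE_ATTRIBUTE_MAX_THREADS_PER_BLOCK, device);
    cuDeviceGetAttribute(&clock_khz, CU_DEVICE_ATTRIBUTE_CLOCK_RATE, device);
    printf("Max threads per block: %d\n", max_threads);
    printf("Clock rate: %d kHz\n", clock_khz);
  }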
This is what came in handy when troubleshooting issues in the studio,
plus it is exactly the same thing which would be helpful
when solving issues with paths to compiled shaders and cubins
for the standalone repository.
The single precision exponent on 64bit Linux tends to be an order of magnitude
slower than the double precision version, even with the single<->double
precision conversion.
Some feedback on the mailing lists suggests that logf() is also slow, but
I didn't confirm this here in the studio yet.
Depending on the shader setup it gives ~3% with the secret agent shot and up to
around 15% with the bmw scene here.
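The change essentially boils down to a wrapper along these lines (function
name is illustrative):

  #include <cmath>

  /* Route single precision exp() through the double precision implementation,
   * which measured faster here despite the float<->double conversions. */
  inline float fast_expf(float x)
  {
    return (float)exp((double)x);
  }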
Quite a straightforward change; the only annoying thing is that we can't use
indentation for include directives because of the way header inlining
works for OpenCL.
Might do a smarter job in path_source_replace_includes() but don't want to
spend time on this yet.
This adds an AABB collision check for objects with volumes; if a collision is
detected then the object gets the SD_OBJECT_INTERSECTS_VOLUME flag.
This solves a speed regression introduced by the fix for T39823 by skipping
the volume stack update in cases where no volume intersects the current SSS
object.
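The collision test itself is a standard per-axis AABB overlap check, roughly
as follows (the BoundBox type here is a stand-in for the real one):

  struct BoundBox {
    float min[3];
    float max[3];
  };

  /* Objects whose bounds overlap a volume's bounds would get the
   * SD_OBJECT_INTERSECTS_VOLUME flag; others skip the volume stack update. */
  inline bool bounds_intersect(const BoundBox& a, const BoundBox& b)
  {
    for (int axis = 0; axis < 3; axis++) {
      /* Separation along any axis means no collision. */
      if (a.max[axis] < b.min[axis] || b.max[axis] < a.min[axis])
        return false;
    }
    return true;
  }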
This is a rather legit case which happens e.g. when persistent images are
enabled and the session is updating the lookup tables.
Now device_memory keeps track of the amount of memory allocated on the device,
so freeing uses the proper allocated size, not the CPU-side buffer
size.
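In other words, the bookkeeping is roughly the following (member names are
illustrative):

  #include <cstddef>

  /* Sketch: remember how much was actually allocated on the device so the
   * free path uses that size instead of the CPU-side buffer size. */
  struct device_memory_sketch {
    size_t memory_size;  /* CPU-side buffer size */
    size_t device_size;  /* size actually allocated on the device */
  };

  /* On alloc: device_size = actual (possibly padded) device allocation size.
   * On free:  subtract device_size, not memory_size, from the usage stats. */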
Now we build 2 .cubins per architecture (e.g. kernel_sm_21.cubin, kernel_experimental_sm_21.cubin).
The experimental kernel can be used by switching to the Experimental Feature Set: http://wiki.blender.org/index.php/Doc:2.6/Manual/Render/Cycles/Experimental_Features
This enables Subsurface Scattering and Correlated Multi Jitter Sampling on GPU, while keeping the stability and performance of the regular kernel.
Differential Revision: https://developer.blender.org/D762
Patch by Sergey and myself.
Developer / Builder Note:
CUDA Toolkit 6.5 is highly recommended for this, also note that building the experimental kernel requires a lot of system memory (~7-8GB).
This problem was introduced in 983cbafd1877f8dbaae60b064a14e27b5b640f18
Basically the issue is that we were not getting a unique index in the
baking routine for the RNG (random number generator).
Reviewers: sergey
Differential Revision: https://developer.blender.org/D749
In collaboration with Sergey Sharybin.
Also thanks to Wolfgang Faehnle (mib2berlin) for help testing the
solutions.
Reviewers: sergey
Differential Revision: https://developer.blender.org/D690
For now it was mainly about the OpenCL wrangler being duplicated
between Cycles and the Compositor, but with the OpenSubdiv work those
wranglers were going to be duplicated once again.
This commit makes Cycles and the Compositor use wranglers
from these repositories:
- https://github.com/CudaWrangler/cuew
- https://github.com/OpenCLWrangler/clew
These repositories are based on the wranglers we used before,
and they will likely continue to be maintained by us plus some
more players in the market.
A pretty much straightforward change, with some tricks in
CMake/SCons to make these libs be passed to the linker
after all other libraries, so that OpenSubdiv can be linked
against those wranglers in the future.
For those who're worrying about Cycles becoming less standalone:
that's not true, it's rather more flexible now, and in the future
different wranglers might be used in Cycles. For now it just
means those libs would need to be put into the Cycles repository
together with some other libs from Blender, such as mikktspace.
This is mainly a platform maintenance commit; there should not be any
changes to user space.
Reviewers: juicyfruit, dingto, campbellbarton
Reviewed By: juicyfruit, dingto, campbellbarton
Differential Revision: https://developer.blender.org/D707
Baking progress preview is not possible, in part due to the way the API
was designed. But at least you get to see the progress bar while baking.
Reviewers: sergey
Differential Revision: https://developer.blender.org/D656
This kernel is compiled with AVX2, FMA3, and BMI compiler flags. At the moment only Intel Haswell benefits from this, but future AMD CPUs will have these instructions as well.
Makes rendering on Haswell CPUs a few percent faster, only benchmarked with clang on OS X though.
Part of my GSoC 2014.
Now baking does one AA sample at a time, just like final render. There is
also some code for shader antialiasing that solves T40369 but it is disabled
for now because there may be unpredictable side effects.
The kernel for baking the world texture was the same as the one used for
baking. Now that's separate which allows the kernel to reserve much less
memory.
Fixes T40027. This means we get more CPU usage again when using multiple CUDA
devices, but the impact on performance is too big a problem with the current
code.
Instead of 95, we can use 145 images now. This only affects Kepler and above (sm_30, sm_35 and sm_50).
This can be increased further if needed, but let's first test if this does not come with a performance impact.
Originally developed during my GSoC 2013.
Expand Cycles to use the new baking API in Blender.
It works on the selected object, and the panel can be accessed in the Render panel (similar to where it is for the Blender Internal).
It bakes for the active texture of each material of the object. The active texture is currently defined as the active Image Texture node present in the material nodetree. If you don't want the baking to override an existing material, make sure the active Image Texture node is not connected to the nodetree. The active texture is also the texture shown in the viewport in the rendered mode.
Remember to save your images after the baking is complete.
Note: Bake currently only works on the CPU
Note: This is not supported by Cycles standalone because a lot of the work is done in Blender as part of the operator only, not the engine (Cycles).
Documentation:
http://wiki.blender.org/index.php/Doc:2.6/Manual/Render/Cycles/Bake
Supported Passes:
-----------------
Data Passes
* Normal
* UV
* Diffuse/Glossy/Transmission/Subsurface/Emit Color
Light Passes
* AO
* Combined
* Shadow
* Diffuse/Glossy/Transmission/Subsurface/Emit Direct/Indirect
* Environment
Review: D421
Reviewed by: Campbell Barton, Brecht van Lommel, Sergey Sharybin, Thomas Dinges
Original design by Brecht van Lommel.
The entire commit history can be found on the branch: bake-cycles
This also updates the configurations to build kernels for compute capability
5.0 cards; when using an older CUDA toolkit version this will be skipped.
Also includes tweaks to improve performance with this version:
* Increase max registers on sm_30, sm_35 and sm_50
* No longer use texture storage on sm_30
Otherwise devices used for display will lock up the UI too much. This means
you might still get 100% CPU for the display device, but for other devices CPU
usage should still be low.
The check to see if a device is used for display may not be entirely reliable:
it checks whether there is a watchdog timeout on the device, but I'm not
entirely sure that always exists for display devices or is disabled for
non-display devices, though some tools like cuda-gdb seem to make the same
assumption.
Ref T39559
This makes it easier to have per kernel number of registers. Also, all the
tunable parameters for this are now in kernel.cu, rather than spread over cmake,
scons and device_cuda.cpp.
This fixes the ptxas "ACCESS_VIOLATION" error and should allow our Linux and Windows build bots to compile again.
Unfortunately this comes with a performance penalty on sm_2x cards, so this is only a workaround for now. Branched Path is still globally disabled on GPU.
The issue was caused by wrong usage of the OCIO GLSL binding API. To make it
work properly on pre-GLSL-1.3 drivers, the shader is to be enabled after the
texture is bound to the OpenGL context. Otherwise it wouldn't know the
proper texture size.
This is actually a regression in 2.70 and is to be ported to 'a'.
All textures are currently sampled bi-linearly, with the exception of OSL, where texture sampling is fixed and set to smart bi-cubic.
This patch adds user control for this setting.
Added (a rough sketch follows this list):
- bits to DNA / RNA in the form of an enum to support multiple interpolation types
- changes to the image texture node drawing code (add enum)
- to ImageManager (this needs to know to allocate a second texture when the interpolation type is different)
- to the node compiler (pass on the interpolation type)
- to device tex_alloc, which also needs to get the concept of multiple interpolation types
- implementation of non-interpolated lookup for CUDA and CPU
- implementation where we pass this along to OSL (this makes OSL also do linear until I add smart cubic to the interface / DNA / RNA)
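A rough sketch of the idea (enum values and the tex_alloc signature below are
assumptions for illustration, not the exact Blender/Cycles definitions):

  /* Interpolation mode carried from the Image Texture node down to the device. */
  enum InterpolationType {
    INTERPOLATION_NONE,    /* closest / non-interpolated lookup */
    INTERPOLATION_LINEAR,  /* current default bi-linear sampling */
    INTERPOLATION_CUBIC,   /* smart bi-cubic, as OSL uses */
  };

  /* A device would then receive the requested mode along with the pixels:
   *   void tex_alloc(const char* name,
   *                  device_memory& mem,
   *                  InterpolationType interpolation,
   *                  bool periodic);
   */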
Reviewers: brecht, dingto
Reviewed By: brecht
CC: dingto, venomgfx
Differential Revision: https://developer.blender.org/D317
This switches API usage for CUDA towards using more of the async calls.
Updating only once every second is sufficiently cheap that I don't think it is worth doing it less often.
Reviewed By: brecht
Differential Revision: https://developer.blender.org/D262
Cycles GPU Performance Regression
From my testing this (what I should have done in the first place) reduces the regression a lot.
Let's hope it is enough, or we'll have to go back to busy waiting.
This patch adds a network_error() function, more in line with how other devices handle errors.
- It adds a check for errors in load_kernels to make sure we do not crash when rendering without a server.
- It uses the non-throwing variant of boost::asio::read.
Reviewers: brecht
Reviewed By: brecht
CC: brecht
Differential Revision: https://developer.blender.org/D86
This is my first stab at this and is based on this IRC conversation:
<mib2berlin> brecht: this is meaning as reminder only, I know you have other things to do > http://openvidia.sourceforge.net/index.php/Optimization_Notes#avoiding_busy_waits
<brecht> mib2berlin: thanks, bookmarked
Only tested on Ubuntu 14.04 / CUDA 5.0, but I'll do some more testing tomorrow.
Also unsure about the placement and lifetime of the stream and the event, but creating / deleting these seems to incur a non-trivial cost.
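A sketch of the polling pattern under discussion (the driver API calls are
real, but this is not the actual patch; as noted above, creating/destroying
the event repeatedly is not free, so real code would keep it alive longer):

  #include <chrono>
  #include <thread>
  #include <cuda.h>

  void wait_for_kernel(CUstream stream)
  {
    CUevent event;
    cuEventCreate(&event, CU_EVENT_DISABLE_TIMING);
    cuEventRecord(event, stream);
    /* CUDA_ERROR_NOT_READY means work is still running on the device,
     * so sleep briefly instead of busy waiting on a synchronous call. */
    while (cuEventQuery(event) == CUDA_ERROR_NOT_READY) {
      std::this_thread::sleep_for(std::chrono::milliseconds(1));
    }
    cuEventDestroy(event);
  }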
Reviewers: brecht
Reviewed By: brecht
CC: mib2berlin, dingto
Differential Revision: https://developer.blender.org/D262
* AVX is available on Intel Sandy Bridge and newer and AMD Bulldozer and newer.
* We don't use dedicated AVX intrinsics yet, but gcc auto vectorization gives a 3% performance improvement for Caminandes. Tested on an i5-3570, Linux x64.
* No change for Windows yet, MSVC 2008 does not support AVX.
Reviewed by: brecht
Differential Revision: https://developer.blender.org/D216
After updating to Mac OS X 10.9.1, OpenCL now works on my Intel CPU in the 2013 MacBook Pro (even the entire kernel).
The Intel Iris Pro GPU still segfaults here though, even when all flags are disabled (building a "clay like" kernel only).
Maybe we still need -no-missing-prototypes for AMD hardware, but I couldn't find a way to distinguish here.
This actually works somewhat now, although viewport rendering is broken and any
kind of network error or connection failure will kill Blender.
* Experimental WITH_CYCLES_NETWORK cmake option
* Networked Device is shown as an option next to CPU and GPU Compute
* Various updates to work with the latest Cycles code
* Locks and thread safety for RPC calls and tiles
* Refactored pointer mapping code
* Fix error in CPU brand string retrieval code
This includes work by Doug Gale, Martijn Berger and Brecht Van Lommel.
Reviewers: brecht
Differential Revision: http://developer.blender.org/D36
This is mostly work towards enabling the __KERNEL_SSE__ option to start using
SIMD operations for vector math operations. This SSE 4.1 kernel performs about
8% faster with that option, but overall is still slower than without the option.
WITH_CYCLES_OPTIMIZED_KERNEL_SSE41 is the cmake flag for testing this kernel.
Alignment of int3, int4, float3 and float4 to 16 bytes seems to give a slight
1-2% speedup on tested systems with the current kernel already, so it is
enabled now.
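The alignment itself amounts to something like this simplified stand-in for
the kernel's vector types:

  /* Sketch: 16 byte alignment so float4 maps cleanly onto an SSE register. */
  struct alignas(16) float4_sketch {
    float x, y, z, w;
  };

  static_assert(sizeof(float4_sketch) == 16, "float4 expected to be 16 bytes");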
* Remove support for CUDA Toolkit 4.x, only Toolkit 5.0 and above are supported now.
* Remove support for sm_1x cards (< Fermi) for good. We didn't officially support those cards for a few releases already, now remove some special code that was still there.
Use arrays instead of textures for general storage on this card (image textures
are still stored as textures). Textures were found to be faster on older cards,
but the limits on 1D texture size have not increased along with the memory
size, which meant that the full 6 GB could not be used.
The performance actually seems to be slightly better with arrays in some tests
on Titan. For older cards there seems to be a bit of a mix, some being better
and others not. We may change those to use arrays too, but more testing is
needed; only Titan and Tesla K20 (sm_35) are changed for now.
The fact that arrays are faster is a bit surprising, as others found textures
to be faster on Kepler. However even if they were, the memory limitation is
more important to solve anyway.
https://research.nvidia.com/publication/understanding-efficiency-ray-traversal-gpus-kepler-and-fermi-addendum
except for curves, that's still missing from the OpenColorIO GLSL shader.
The pixels are stored in a half float texture, converted from full float with
native GPU instructions and SIMD on the CPU, so it should be pretty quick.
Using a GLSL shader is useful for GPU render because it avoids a copy through
CPU memory.
* GPU kernel can now be compiled without __NON_PROGRESSIVE__ again, was broken after my last commit. Also add a check for have_error(), in case the GPU kernel comes without Non-Progressive, to avoid a crash.
* Don't compile progressive kernel twice on CPU, if __NON_PROGRESSIVE__ would be disabled there.
* Non-Progressive integrator is now available on the GPU (CUDA, sm_20 and above).
Implementation details:
* kernel_path_trace() has been split up into two functions:
kernel_path_trace_non_progressive() and kernel_path_trace_progressive().
* We compile two CUDA kernel entry functions (in kernel.cu) for the two integrators, they are still inside one .cubin file but due to the kernel separation there should be no performance problem. I tested with the BMW file on my Geforce 540M and the render times were the same for 100 samples (1.57 min in my case).
This is part of my GSoC project, SVN merge of r59032 + manual merge of UI changes for this from my branch.
* "Auto Detect" now again uses the umber of cores, instead number of cores + 1.
This was added before we had Tile rendering and benchmarks on several systems showed that there is no gain with this now. There might be some slight difference (0.5% or so) slower/faster depending on the scene, but this is negligible.
* Reshuffle SSE #ifdefs to try to avoid compilation errors enabling SSE on 32 bit.
* Remove CUDA kernel launch size exception on Mac, is not needed.
* Make OSL file compilation quiet like c/cpp files.
* Add CUDA compiler version detection to cmake/scons/runtime
* Remove noinline in kernel_shader.h and reenable --use_fast_math if CUDA 5.x
is used, these were workarounds for CUDA 4.2 bugs
* Change max number of registers to 32 for sm 2.x (based on performance tests
from Martijn Berger and confirmed here), and also for NVidia OpenCL.
Overall it seems that with these changes and the latest CUDA 5.0 download,
performance is as good as or better than the 2.67b release with the scenes and
graphics cards I tested.
the second time, as for example Intel CPU startup time is 9 seconds.
* Adds a cache for contexts and programs for each platform and device pair,
which also ensures that no two threads try to compile and write the binary
cache file at the same time (see the sketch after this list).
* Change clFinish to clFlush so we don't block until the result is done, instead
it will block at the moment we copy back memory.
* Fix error in Cycles time_sleep implementation, does not affect any active code
though.
* Adds some (disabled) debugging code in the task scheduler.
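The cache mentioned in the first point could be structured roughly like this
(illustrative only, not the actual implementation):

  #include <map>
  #include <mutex>
  #include <utility>
  #include <CL/cl.h>

  /* One slot per (platform, device) pair, guarded by a mutex so two threads
   * never compile and write the binary cache file at the same time. */
  struct OpenCLCacheSketch {
    struct Slot {
      cl_context context;
      cl_program program;
      Slot() : context(NULL), program(NULL) {}
    };

    std::mutex mutex;
    std::map<std::pair<cl_platform_id, cl_device_id>, Slot> slots;

    Slot& get_slot(cl_platform_id platform, cl_device_id device)
    {
      std::lock_guard<std::mutex> lock(mutex);
      return slots[std::make_pair(platform, device)];
    }
  };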
Patch #35559 by Doug Gale.
* Support using devices from all OpenCL platforms, so that you can use e.g. both
Intel and NVidia OpenCL implementations if you have them installed.
* Fix compile error due to missing fmodf after recent math node change.
* Enable advanced shading for Intel OpenCL.
* CYCLES_OPENCL_DEBUG environment variable for generating debug symbols so you
can debug with gdb. This crashes the compiler with Intel OpenCL on Linux though.
To make this work the preprocessed kernel source code is written out, as gdb
needs this.
* Show OpenCL compiler warnings even if the build succeeded.
* Some small fixes to initialize cdDevice to NULL, add missing NULL check when
creating buffer and add missing space at end of build options for Apple OpenCL.
* Fix crash with multi device + opencl, now e.g. CPU + GPU render should work.
I did a few tweaks to the code and also:
* Fix viewport render failing sometimes with Apple CPU OpenCL, was not taking
workgroup size limits into account properly.
* Add compile error when advanced shading in the Blender binary and OpenCL kernel
are not in sync.
for Apple OpenCL on OS X 10.8 and simple AO render.
Also, the environment variable CYCLES_OPENCL_TEST can now be set to CPU, GPU,
ACCELERATOR, DEFAULT or ALL to test particular devices.
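How those values map onto OpenCL device type flags could look roughly like
this (the exact mapping in the code is an assumption here):

  #include <cstdlib>
  #include <cstring>
  #include <CL/cl.h>

  cl_device_type opencl_test_device_type()
  {
    const char* value = getenv("CYCLES_OPENCL_TEST");
    if (value == NULL)
      return 0;  /* variable not set, keep the default restrictions */
    if (strcmp(value, "CPU") == 0)
      return CL_DEVICE_TYPE_CPU;
    if (strcmp(value, "GPU") == 0)
      return CL_DEVICE_TYPE_GPU;
    if (strcmp(value, "ACCELERATOR") == 0)
      return CL_DEVICE_TYPE_ACCELERATOR;
    if (strcmp(value, "DEFAULT") == 0)
      return CL_DEVICE_TYPE_DEFAULT;
    if (strcmp(value, "ALL") == 0)
      return CL_DEVICE_TYPE_ALL;
    return 0;
  }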
* Deprecate computing capability 1.3 (sm_13)
This commit disables auto build of sm_13 CUDA platform, which means that starting with Blender 2.67, we don't support sm_13 devices anymore. It has become difficult to support that and it was already feature incomplete (no render-passes, AO, Multi Closure etc).
It's still possible to manually enable sm_13 for own tests, but building might break in the future.
* CUDA: Make it more clear that sm_12 and below is not supported.
* OpenCL: __KERNEL_SHADING__ was declared twice for nvidia opencl device.
* Some reshuffle of defines in kernel_types.h. No functional changes.
precompiled cubins instead,
The logic here is now as follows (sketched below):
- If there are precompiled cubins, assume CUDA compute is available;
  otherwise
- If a CUDA toolkit is found, assume CUDA compute is available
- In all other cases CUDA compute is not available
For Windows there is still a check for precompiled binaries only;
no runtime compilation is allowed.
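Condensed into code, the decision is roughly the following (the helper flags
are hypothetical inputs):

  bool cuda_compute_available(bool have_precompiled_cubins,
                              bool have_cuda_toolkit,
                              bool is_windows)
  {
    /* Precompiled cubins always count as available. */
    if (have_precompiled_cubins)
      return true;
    /* Runtime compilation via the toolkit is only allowed off Windows. */
    if (have_cuda_toolkit && !is_windows)
      return true;
    return false;
  }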
Ended up with this decision after discussion with Brecht. The thing
is, if we support runtime compilation on Windows we'll end up
having lots of reports about different aspects of something not
working (you need a particular toolkit version, MSVC installed, environment
variables set properly and so on), and giving feedback on such reports
will waste time.
Also some simple OSL optimization, passing thread data pointer directly instead
of via thread local storage, and creating ustrings for attribute lookup.
This commit adds memory usage information while rendering.
It reports memory used by the device, meaning:
- For CPU it'll report real memory consumption
- For GPU rendering it'll report GPU memory consumption, which also
  means the same amount of memory is used on the host side.
This displays information about memory requested by Cycles,
not memory actually allocated on a device. Real memory usage might be
higher because of memory fragmentation or an optimistic memory allocator.
There's really nothing we can do about this.
Also, in contrast with Blender Internal's render, Cycles memory usage
does not include memory used by the scene; only memory needed by Cycles
itself is displayed. So don't freak out if the memory usage reported
by Cycles is much lower than Blender Internal's.
This commit also adds a RenderEngine.update_memory_stats callback which
is used to report memory consumption from an external engine to Blender.
This information is used to generate the information line after rendering
is finished.
- move object_iterators.c --> view3d_iterators. (ED_object.h had to include ED_view3d.h which isn't so nice)
- move projection functions from view3d_view.c --> view3d_project.c (view3d_view was becoming a mishmash of utility functions and operators).
- mark some cmake includes as system-includes.
Just makes progressive refine :)
This means the whole image is refined gradually, using as many
threads as set in the performance settings. Having enough tiles is
required for this option to work as expected.
Technically it's implemented by repeatedly computing the next sample for
all the tiles before switching to the next sample.
This is around 7-12% slower than regular tile-based rendering, so
use this option only if you really need it.
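The scheduling is essentially the following loop (types and helpers are
placeholders for the real session/tile code):

  /* All tiles advance by one sample before any tile moves on to the next,
   * so the whole image refines gradually instead of finishing tile by tile. */
  void render_progressive_refine(int num_samples, int num_tiles)
  {
    for (int sample = 0; sample < num_samples; sample++) {
      for (int tile = 0; tile < num_tiles; tile++) {
        /* render_tile_sample(tile, sample); -- placeholder for the real work */
      }
      /* update_display(); -- whole image updated once per sample */
    }
  }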
This commit also fixes progressive update of the image when the Save Buffers
option is enabled.
One more thing this commit fixes is handling of the display buffer with the
Save Buffers option enabled. With this option enabled, the image buffer
would have neither a byte nor a float buffer until the image is fully
rendered, which could backfire as a missing image while rendering in
cases where the color management cache became full.
This issue is solved by allocating a byte buffer for the image buffer from
the tile update callback.
The patch was reviewed by Brecht. He also made some minor edits to the
original version of the patch. Thanks, man!