This is an attempt to gracefully handle out-of-memory events
and stop rendering with an error message instead of a crash.
It uses bad_alloc exception, and usually i'm not really fond
of exceptions, but for such limited use for errors from which
we can't recover it should be fine.
Ideally we'll need to stop full Cycles Session, so viewport
render and persistent images frees all the memory, but that
we can support later, since it'll mainly related on telling
Blender what to do.
General rules are:
- Use as less exception handles as possible, try to find a
most geenric pace where to handle those.
For example, ccl::Session.
- Threads needs own handling, exception trap from one thread
will not catch exceptions from other threads.
That's why BVH build needs own thing.
Reviewers: brecht, juicyfruit, dingto, lukasstockner97
Differential Revision: https://developer.blender.org/D1898
This mimics behavior of default allocators in STL and allows all the routines
to catch out-of-memory exceptions and hopefully recover from that situation/
Instead of treating Fermi GPU limits as default,
and overriding them for other devices,
we now nicely set them for each platform.
* Due to setting values for all platforms,
we don't have to offset the slot id for OpenCL anymore,
as the image manager wont add float images for OpenCL now.
* Bugfix: TEX_NUM_FLOAT_IMAGES was always 5, even for CPU,
so the code in svm_image.h clamped float textures with alpha on CPU after the 5th slot.
Reviewers: #cycles, brecht
Reviewed By: #cycles, brecht
Subscribers: brecht
Differential Revision: https://developer.blender.org/D1925
Seems particular CUDA implementations has some precision issues,
which made integer coordinate (which was expected to always be
positive) to go negative.
Couple of things:
- No need to use string streams to format the version string,
we can do it at compile time and don't bother with anything
at runtime.
- Function declaration was wring and would have caused linking
conflicts in cases when util_version.h was included from
multiple places.
We should have an utility function to get Cycles version so
applications which are linked to Cycles dynamically can query
the version, but that can't be done as an inlined function in
header and would need to be a function properly exported to a
global symbol table (aka, be implemented in a .cpp file).
Now Cycles has its own versioning, that is mainly interesting for external projects, which integrate the engine.
We start with version 1.7.0. Reasons for that:
* The engine is too mature for a 1.0 release.
* We assume that Cycles inside of Blender 2.61 was version 0.1. We count upwards in 0.1 steps, therefore Cycles inside of Blender 2.77 would be 1.7.
We use a common versioning scheme here, with 3 decimals for the major, minor and patch level.
At the moment cycles --version can be used to display the version, easy to parse for external projects. The info will be added to the UI later aswell.
When the Moon is full it was possible to have a dead-lock in task
scheduler's exit() method.
Similar problem was fixed in Blender's task scheduler 3 years ago
in bae2a2c.
Policy here is a bit more complicated, if tree becomes too deep we're
forced to create a leaf node and size of that leaf wouldn't be so well
predicted, which means it's quite tricky to use single stack array for
that.
Made it more official feature that StackAllocator will fall-back to
heap when running out of stack memory.
It's still much better than always using heap allocator.
We had per-tree statistics already, but it's a bit tricky to see overall
time because trees could be building in parallel.
In fact, we can now print statistics for any TaskPool.
Main use case of this ID will be to emulate TLS which otherwise
would require having some platform-specific implementations which
is not always really optimal.
See notes about the argument in util_task.h.
Currently unused, but will be handy for an upcoming changes.
It'll also be nice to be able to do scoped_lock() for both
Mutex and Spin, but currently it's not really easy to do,
need some changes in typedefs and such, will happen as a
separate commit.
This seems quite useful for the development, so you don't need to wait
all the kernels to be re-compiled when working on a new feature, which
speeds up re-iteration.
Marked as an advanced option, so if it doesn't work so well in practice
it's safe to revert anyway.
This is a mix of regression and old unsupported configuration.
Regression was caused by some checks added on Blender side which was
checking whether python function returned error or not. This made it
impossible to enable Cycles when running from a file path which can't
be encoded with MBCS codepage.
Non-regression issue was that it wasn't possible to use pre-compiled
CUDA kernels when running from a path with non-ascii multi-byte
characters.
This commit fixes regression and CUDA parts, but OSL still can't be
used from a non-ascii location because it uses non-widechar API to
work with file paths by the looks of it. Not sure we can solve this
just from our side by using some codepage trick (UTF-16?) since even
oslc fails to compile shader when there are non-ascii characters in
the path.
It's not really handy to silence something unused hoping for it'll be
used in the future. We can end up with quite some silencing then.
Also made this flag which i find rather useless to NOT cause -Werror
in Cycles code.
The issue was caused by static vectors allocating some internal
data using rebound element allocator for them, which was causing
access to a non-initialized statistics objects and was failing a
lot when switching Blender to a fully guarded allocation.
Additionally, we were not able to free that internal memory before
Blender exits, which was causing false-positive memory leak prints.
Now we're not using GuardedAllocator for those proxy containers.
Ideally this should be done as a GuardedAllocator::rebind, but
it didn't work for vector<bool> because it seems some internal
parts are converting bool to char32_t, which either makes it so
we can't use GuardedAllocator for those vectors or the compiler
get's confused when we're trying explicitly allow GuardedAllocator
for rebind<char32_t>.
This with current approach we should be fine for the release.
Was caused by some safety things of making sure we've for NULL
terminator for the buffer when doing mbs<->wcs conversion, but
it turns out this simply confuses str::string and it can no
longer have proper .size(). Let's assume behavior of string
allocation is same all over the std, and we can avoid having
that extra null-terminator allocated.
C++ requires specific alignment of the allocations which was not an
issue when using GCC but uncovered issue when using Clang on OSX.
Perhaps some versions of Clang might show errors on other platforms
as well.
We don't have vectors re-allocation happening multiple times from inside
a loop anymore, so we can safely switch to a memory guarded allocator for
vectors and keep track on the memory usage at various stages of rendering.
Additionally, when building from inside Blender repository, Cycles will
use Blender's guarded allocator, so actual memory usage will be displayed
in the Space Info header.
There are couple of tricky aspects of the patch:
- TaskScheduler::exit() now explicitly frees memory used by `threads`.
This is needed because `threads` is a static member which destructor
isn't getting called on Blender's exit which caused memory leak print
to happen.
This shouldn't give any measurable speed issues, reallocation of that
vector is only one of fewzillion other allocations happening during
synchronization.
- Use regular guarded malloc (not aligned one). No idea why it was
made to be aligned in the first place. Perhaps some corner case tests
or so. Vector was never expected to be aligned anyway. Let's see if
we'll have actual bugs with this.
Reviewers: dingto, lukasstockner97, juicyfruit, brecht
Reviewed By: brecht
Differential Revision: https://developer.blender.org/D1774
Basically the idea is to make code robust against extending
enum options in the future by falling back to a known safe
default setting when RNA is set to something unknown.
While this approach solves the issues similar to T47377,
but it wouldn't really help when/if any of the RNA values
gets ever deprecated and removed. There'll be no simple
solution to that apart from defining explicit mapping from
RNA value to Cycles one.
Another part which isn't so great actually is that we now
have to have some enum guards and give some explicit values
to the enum items, but we can live with that perhaps.
Reviewers: dingto, juicyfruit, lukasstockner97, brecht
Reviewed By: brecht
Differential Revision: https://developer.blender.org/D1785
The title says it all actually, the idea is to make Cycles
only requiring Boost via 3rd party dependencies like OIIO
and OSL.
So now there are only few places which still uses Boost:
- Foreach, function bindings and threading primitives.
Those we can easily get rid with C++11 bump (which seems
inevitable sooner or later if we'll want ot use newer
LLVM for OSL),
- Networking devices
There's no quick solution for those currently, but there
are some patches around which improves serialization.
Reviewers: juicyfruit, mont29, campbellbarton, brecht, dingto
Reviewed By: brecht, dingto
Differential Revision: https://developer.blender.org/D1764
We've upgraded to Boost-1.60 and MSVC2013 since the workaround
was originally committed. After checks with current compiler and
libraries the original bug is no longer happening.
This will make string comparison much faster in Windows, solving
synchronization bottlenecks of fewzillion objects.
Thanks Martin Felke (aka scorpion81) for the tests!
This panel is only visible when debug_value is set to 256 and has no
affect in other cases. However, if debug value is not set to this
value, environment variables will be used to control which features
are enabled, so there's no visible changes to anyone in fact.
There are some changes needed to prevent devices re-enumeration on
every Cycles session create.
Reviewers: juicyfruit, lukasstockner97, dingto, brecht
Reviewed By: lukasstockner97, dingto
Differential Revision: https://developer.blender.org/D1720
Although the code made it impossible to use time_start_ uninitialized, at least GCC did
still produce multiple warnings about it.
Since time_dt() is an extremely cheap operation and functionality does not change in any way when
removing the check in the constructor, this commit removes the check and therefore the warning.
This patch adds the "Hilbert Spiral", a custom-designed continuous space-filling curve, as a tile order for rendering in Cycles.
It essentially works by dividing the tiles into tile blocks which are processed in a spiral outwards from the center. Inside each
block, the tiles are processed in a regular Hilbert curve pattern. By rotating that pattern according to the spiral direction,
a continuous curve is obtained, which helps with cache coherency and therefore rendering speed.
The curve is a compromise between the faster-rendering Bottom-to-Top etc. orders and the Center order, which is a bit slower,
but starts with the more important areas. The Hilbert Spiral also starts in the center (unless huge tiles are used) and is still
marginally slower than Bottom-to-Top, but noticeably faster than Center.
Reviewers: sergey, #cycles, dingto
Reviewed By: #cycles, dingto
Subscribers: iscream, gregzaal, sergey, mib2berlin
Differential Revision: https://developer.blender.org/D1166
This reduces stress on the the stack memory which could be really handy
on certain operation systems which applies strict limits on the stack.
Reviewers: brecht, juicyfruit, dingto
Reviewed By: brecht, juicyfruit, dingto
Differential Revision: https://developer.blender.org/D1656
Previously shutter was instantly opening, staying opened for the shutter time
period of time and then instantly closing. This isn't quite how real cameras
are working, where shutter is opening with some curve. Now it is possible to
define user curve for how much shutter is opened across the sampling period
of time.
This could be used for example to make motion blur trails softer.
Currently only image loading benefits of this and will give magenta color
when image manager detects it's running out of memory.
This isn't ideal solution and can't handle all cases. For example, OOM
killer might kill process before it realized it run out of memory, but
in other cases this could prevent some crashes.
Reviewers: juicyfruit, dingto
Differential Revision: https://developer.blender.org/D1502
Recent changes to kernel broke compilation of the kernels again, need some
other kind of solution for this issue.
Don't have much time for this currently, but will be addressed before the
release.
Meanwhile it's better to have some buildbot builds instead of totally failing
one.
Technically it was all wrong and it should have been called Extend instead
of Clip. Got confused by the naming in different libraries.
More options are still to come.
Currently only two mappings are supported by API, which is Repeat (old behavior)
and new Clip behavior. Internally this extension is being converted to periodic
flag which was already supported but wasn't exposed.
There's no support for OpenCL yet because of the way how we pack images into a
single texture.
Those settings are not exposed to UI or anywhere else and there should be no
functional changes so far.
Works totally similar to camera motion blur and majority of the changes are
related on just passing extra arguments to sync() functions.
Couple of things still to look into:
- Motion pass will not include motion caused by the zoom.
- Only perspective cameras are supported currently.
- Motion is being interpolated on projected coordinates, which might give
different results from constructing projection matrix from interpolated
field of view.
This could be good enough for us, but we need to consider improving this
at some point.
Reviewers: juicyfruit, dingto
Reviewed By: dingto
Differential Revision: https://developer.blender.org/D1383
This way we solve possible issues caused by regular allocator not being aware of
some classes preferring 16 bytes alignment needed for SSE to work properly. This
caused random crashes during rendering.
Now we always use aligned allocation in GuardedAllocator which shouldn't be any
measurable performance impact and the code is only used by developers after
defining special symbol, so there is no impact on release builds at all.
This makes OCIO viewport color correction a little bit faster (about -0.5s for 100 samples)
Also set max half float value to 65504.0 to conform with IEEE 754.
This more a workaround for CUDA optimizer which can't optimize clamp(x, 0, 1)
into a single instruction and uses 4 instructions instead.
Original patch by @lockal with own modification:
Don't make changes outside of the kernel. They don't make any difference
anyway and term saturate() has a bit different meaning outside of kernel.
This gives around 2% of speedup in Barcelona file, but in more complex shader
setups with lots of math nodes with clamping speedup could be much nicer.
Subscribers: dingto
Projects: #cycles
Differential Revision: https://developer.blender.org/D1224
Our own implementation is in fact the same performance as in fast_math from
OpenShadingLanguage, but implementation from fast_math is using explicit madd
function, which increases chance of compiler deciding to use intrinsics.
This inconsistency drove me totally crazy, it's really confusing
when it's inconsistent especially when you work on both Cycles and
Blender sides.
Shouldn;t cause merge PITA, it's whitespace changes only, Git should
be able to merge it nicely.
This commit makes some preliminary fixes and tweaks aimed to make blender
compilable with C++11 feature set. This includes:
- Build system attribute to enable C++11 featureset.
It's for sure default OFF, but easy to enable to have a play around with
it and make sure all the stuff is compilable before we go C++11 for real.
- Changes in Compositor to use non-named cl_int structure fields.
This is because __STRICT_ANSI__ is defined by default by GCC and OpenCL
does not use named fields in this case.
- Changes to TYPE_CHECK() related on lack of typeof() in C++11
This uses decltype() instead with some trickery to make sure returned type
is not a reference.
- Changes for auto_ptr in Freestyle
This actually conditionally switches between auto_ptr and unique_ptr since
auto_ptr is deprecated in C++11. Seems to be not strictly needed but still
nice to be ready for such an update anyway/
This all based on changes form depsgraph_refactor branch apart from the weird
changes which were made in order to support MinGW compilation. Those parts of
change would need to be carefully reviewed again after official move to gcc49
in MinGW.
Tested on Linux with GCC-4.7 and Clang-3.5, other platforms are not tested and
likely needs some more tweaks.
Reviewers: campbellbarton, juicyfruit, mont29, lukastoenne, psy-fi, kjym3
Differential Revision: https://developer.blender.org/D1089
OpenCL doesn't let you to get address of vector components, which
is kinda annoying. On the other hand, maybe now compiler will have
more chances to optimize something out.
Title pretty says it all actually. Can only briefly mention that we're
indeed entering that state when after applying some WIP patches having
much fuller statistics about memory usage would help giving exact memory
benefit.
It's actually a bad level call, but it's inside ifdef block and disabled by
default and only intended to be used for development purposes.
Main idea of this change is to combine statistics coming from Cycles and
Blender during scene synchronization step, to see if further changes are
actually reducing memory footprint.
The method is called vector::free_memory(). Use with care since it'll invalidate
all the pointers to vector memory, all iterators and so on.
Currently unused, but might become handy when clearing unused data.
The commit implements a guarded allocator which can be used by STL classes
such as vectors, maps and so on. This allocator will keep track of current
and peak memory usage which then can be queried.
New code for allocator is only active when building Cycles with debug flag
(WITH_CYCLES_DEBUG) and doesn't distort regular builds too much.
Additionally now we're using own subclass of std::vector which allows us
to implement shrink_to_fit() method which would ensure capacity of the
vector is as big as it should be (without this making vector smaller will
still use all previous memory allocated).
This replaces our own implementation of aligned malloc with system calls,
which depends on which operation system you're on.
This is probably really minor noticeable change, but in the same time it
might reduce amount of wasted memory.
Purely developers-only feature which allows to disable some of the CPU
capabilities. This way it's easier to test different kernels on the
same machine.
Root of the issue goes to the fact that since the very beginning Cycles was
using ZYX euler rotation for mapping shader node but blender was always
using XYZ euler rotation.
This commit switches Cycles to use XYZ euler order and adds versioning code
to preserve backward compatibility.
There was no really nice solution here because either we're ending up with
versioning code or we'll need to deal with all sort of exceptions from blender
side in order to support ZYX order for the mapping node. The latest one is
also creepy from the other render engines points of view -- that might break
compatibility with existing bindings or introduce some extra headache for them
in the future.
This could also become a PITA for us with need of supporting all sort of weird
and wonderful exceptions in the refactored viewport project.
NOTE: This commit breaks forward compatibility, meaning opening new files in
older blender might not give proper result if Mapping node was used.
Also, libraries are to be re-saved separately from the scene file, otherwise
versioning code for them wouldn't run if scene file was re-saved with new
version of blender.
Reviewers: brecht, juicyfruit, campbellbarton
Reviewed By: campbellbarton
Differential Revision: https://developer.blender.org/D973
This way Cycles finally becomes feature-full on image projections
compared to Blender Internal and Gooseberry Project Team could
finally finish the movie.
This patch makes Cycles ignore the time spent in BVH construction etc. when
estimating the remaining time. Considering that the remaining time is calculated
based on the average time per tile so far, as far as I understand it makes no
sense to include the preprocessing time.
Reviewers: sergey, #cycles
Reviewed By: sergey, #cycles
Subscribers: sergey
Projects: #cycles
Differential Revision: https://developer.blender.org/D895
OpenCL apparently does not support templates, so the idea of generic
function for swapping is a bit of a failure. Now it is either inlined
into the code (in triangle intersection) or has specific implementation
for QBVH.
This is probably even better, because we can't create QBVH-specific
function in util_math anyway.
This commit contains all the tweaks which were missing in initial patch
re-integration from the standalone Cycles repository.
This commit also contains an utility cmake macro to help linking targets
with different libraries for release/debug builds, the name currently is
target_link_libraries_decoupled
it gets a target and list of libraries and makes sure debug builds are
using libraries with "_d" suffix.
After all this changes it'll hopefully be easier to interchange patches
between blender and standalone repositories, because they're now quite
identical.
This way it is now possible to use gflags >= 2.1, where all the
functions were moved from google to gflags namespace.
This isn't currently used in blender, but for standalone repository
this change is essential.
Basic idea is to check whether OIIO is compiled with embedded PugiXML parser
and if so use PugiXML from OIIO, otherwise find a standalone PugiXML library.
Most of them are not currently used but are essential for the further work.
- CPU kernels with SSE2 support will now have sse3b, sse3f and sse3i
- Added templatedversions of min4, max4 which are handy to use with register
variables.
- Added util_swap function which gets arguments by pointers.
So hopefully it'll be a portable version of std::swap.
Using this paper: Sven Woop, Watertight Ray/Triangle Intersection
http://jcgt.org/published/0002/01/05/paper.pdf
This change is expected to address quite reasonable amount of reports from the
bug tracker, plus it might help reducing the noise in some scenes.
Unfortunately, it's currently about 7% slower than the previous solution with
pre-computed triangle plane equations, but maybe with some smart tweaks to the
code (tests reshuffle, using SIMD in a nice way or so) we can avoid the speed
regression.
But perhaps smartest thing to do here would be to change single triangle / ray
intersection with multiple triangles / ray intersections. That's how Embree does
this and it's watertight single ray intersection is not any faster that this.
Currently only triangle intersection is modified accordingly to the paper, in
the future we would also want to modify the node / ray intersection.
Reviewers: brecht, juicyfruit
Subscribers: dingto, ton
Differential Revision: https://developer.blender.org/D819
Currently it acts the same as set_cancel(), but this way we're able to
distinguish situations when rendering was aborted by user demand (for
example pressing Esc in standalone renderer) or if something went horribly
wrong (for example out of VRAM error).
It appears it's not really needed for convenient debugging when
using proper flags passed to the compiler. Basically, it is -g3
and set breakpoint to a function as if it's not in the namespace.
Not as if a code was any wrong, just it's possible to have more
clear solution for the issue i've tried to solve in the past.
Quite annoying, the same thing we do from the blender side, But as a positive
side we can get rid of some utf8/utf16 conversions.
Hopefully it all work fine now, at leats works on mu russki windoze laptop.
It works around strange shading bug when building with MSVC.
If such weirdeness continues, we perhaps would need to use
proper inline flags all the time.
Anyway, lets see how things will behave now.
Basically the title says it all, volume stack initialization now is aware that
camera might be inside of the volume. This gives quite noticeable render time
regressions in cases camera is in the volume (didn't measure them yet) because
this requires quite a few of ray-casting per camera ray in order to check which
objects we're inside. Not quite sure if this might be optimized.
But the good thing is that we can do quite a good job on detecting whether
camera is outside of any of the volumes and in this case there should be no
time penalty at all (apart from some extra checks during the sync state).
For now we're only doing rather simple AABB checks between the viewplane and
volume objects. This could give some false-positives, but this should be good
starting point.
Need to mention panoramic cameras here, for them it's only check for whether
there are volumes in the scene, which would lead to speed regressions even if
the camera is outside of the volumes. Would need to figure out proper check
for such cameras.
There are still quite a few of TODOs in the code, but the patch is good enough
to start playing around with it checking whether there are some obvious mistakes
somewhere.
Currently the feature is only available in the Experimental feature sey, need
to solve some of the TODOs and look into making things faster before considering
the feature is ready for the official feature set. This would still likely
happen in current release cycle.
Reviewers: brecht, juicyfruit, dingto
Differential Revision: https://developer.blender.org/D794
This commit makes it possible to use Glog library for the debug logging.
For now only possible when using CMake and in order to use the logging
the WITH_CYCLES_LOGGING configuration variable is to be enabled.
When this option is not enabled or when using Scons there's no difference
in Cycles behavior at all, when using logging and no output to the console
impact is gonna to be minimal.
This is done in order to make it possible to have debug logging persistent
in code (without need to add it when troubleshooting some bug and removing
it afterwards).
For now actual logging is not placed yet, only all the functions needed for
the logging are written and so.
This is rather legit case which happens i.e. when having persistent images enabled
and session is updating the lookup tables.
Now device_memory keeps track of amount of memory being allocated on the device,
which makes freeing using the proper allocated size, not the CPU side buffer
size.
Before this Cycles used to try using the cache even so it knew for the
fact that reading it from the disk failed. This change doesn't make it
more stable if someone will try to trick Cycles and give malformed data
but it solves general cases when Blender crashed during the cache write
and will preserve rendering from crashing when trying to use that partial
cache.
For now it was mainly about OpenCL wrangler being duplicated
between Cycles and Compositor, but with OpenSubdiv work those
wranglers were gonna to be duplicated just once again.
This commit makes it so Cycles and Compositor uses wranglers
from this repositories:
- https://github.com/CudaWrangler/cuew
- https://github.com/OpenCLWrangler/clew
This repositories are based on the wranglers we used before
and they'll be likely continued maintaining by us plus some
more players in the market.
Pretty much straightforward change with some tricks in the
CMake/SCons to make this libs being passed to the linker
after all other libraries in order to make OpenSubdiv linked
against those wranglers in the future.
For those who're worrying about Cycles being less standalone,
it's not truth, it's rather more flexible now and in the future
different wranglers might be used in Cycles. For now it'll
just mean those libs would need to be put into Cycles repository
together with some other libs from Blender such as mikkspace.
This is mainly platform maintenance commit, should not be any
changes to the user space.
Reviewers: juicyfruit, dingto, campbellbarton
Reviewed By: juicyfruit, dingto, campbellbarton
Differential Revision: https://developer.blender.org/D707
Baking progress preview is not possible, in parts due to the way the API
was designed. But at least you get to see the progress bar while baking.
Reviewers: sergey
Differential Revision: https://developer.blender.org/D656
This way util_simd.cpp would not require modifications
if/when SSE2 is suddenly supported on 32bit platforms.
This also allowed to unleash some issues with util_simd.h
related on the fact that there size_t and int are actually
the same types.
* Added support for uchar4 attributes to Cycles' attribute system.
* This is used for Vertex Colors now, which saves some memory (4 unsigned characters, instead of 4 floats).
* GPU Texture Limit on sm_20 and sm_21 decreased from 95 to 94, because we need a new texture for the uchar4 attributes. This is no problem for sm_30 or newer.
Part of my GSoC 2014.
This kernel is compiled with AVX2, FMA3, and BMI compiler flags. At the moment only Intel Haswell benefits from this, but future AMD CPUs will have these instructions as well.
Makes rendering on Haswell CPUs a few percent faster, only benchmarked with clang on OS X though.
Part of my GSoC 2014.
This makes the code a bit easier to understand, and might come in handy
if we want to reuse more Embree code.
Differential Revision: https://developer.blender.org/D482
Code by Brecht, with fixes by Lockal, Sergey and myself.
This basically records all volumes steps, which can then later be used multiple
time to take scattering samples, without having to step through the volume
again. From the paper:
"Importance Sampling Techniques for Path Tracing in Participating Media"
This works only on the CPU, due to usage of malloc/free.