Summary:
Mainly addressed to solve old TODO with color managed fallback
to CPU mode when displaying render result during rendering.
That fallback was caused by the fact that partial image update
was always acquiring image buffer for composite output and was
only modifying display buffer directly.
This was a big issue for Cycles rendering which renders layers
one by one and wanted to display progress of each individual
layer. This led to situations where the display buffer was based on
what Cycles passes via RenderResult and didn't take layer/pass
from image editor header into account.
Now made it so image buffer which partial update is operating
with always corresponds to what is set in image editor header.
To make Cycles display progress of all the layers one by one,
made it so image_rect_update switches image editor user to
newly rendering render layer. It happens only once when render
engine starts rendering next render layer, so should not be
annoying for navigation during rendering.
Additional change to render engines was done to make it so
they're able to merge composite output to final result
without marking tile as done. This is done via do_merge_result
argument to end_result() callback. This argument is optional
so should not break script compatibility.
Additional changes:
- Partial display update for Blender Internal now happens from
the same thread as tile rendering. This makes it so display
conversion (which could be pretty heavy actually) is done in
separate threads. Also gives better UI feedback when rendering
easy scene with small tiles.
- Avoid freeing/allocating byte buffer for render result
if it's owned by the image buffer. Only mark it as invalid
for color management.
Saves loads of buffer re-allocations in cases when having
several image editors opened with render result. This change
in conjunction with the rest of the patch gave around
50%-100% speedup of render time when displaying non-combined
pass during rendering on my laptop.
- Partial display buffer update was wrong for buffers with number
of channels different from 4.
- Remove unused window from RenderJob.
- Made image_buffer_rect_update static since it's only used
in single file.
Reviewers: brecht
Reviewed By: brecht
CC: dingto
Differential Revision: http://developer.blender.org/D98
After update to Mac OS X 10.9.1, OpenCL works now on my Intel CPU in the 2013 Macbook Pro (even the entire kernel).
The Intel Iris Pro GPU still segfaults here though, even when all flags are disabled (building "clay like" kernel only).
Maybe we need the -no-missing-prototypes for AMD hardware still, but I couldn't find a way to distinguish here.
Summary:
Version of those libraries might be useful to know.
- OIIO and OCIO are exposed via bpy.app.oiio and bpy.app.ocio.
There're "supported", "version" and "version_string" defined
in those modules.
- OSL is available as _cycles.osl_version and _cycles.osl_version_string.
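A minimal sketch of how a script might read these fields. It only assumes the "supported" and "version_string" attributes named above; the describe_library helper and the stand-in object are hypothetical, so the snippet also runs outside Blender.

```python
from types import SimpleNamespace


def describe_library(name, info):
    """Format the "supported" and "version_string" fields of a library module.

    `info` is expected to look like bpy.app.oiio or bpy.app.ocio, i.e. to
    carry "supported", "version" and "version_string" attributes.
    """
    if not info.supported:
        return f"{name}: not supported in this build"
    return f"{name}: {info.version_string}"


# Hypothetical stand-in so the sketch runs without Blender; inside Blender
# one would pass bpy.app.oiio or bpy.app.ocio instead. The version values
# here are made up for illustration.
fake_oiio = SimpleNamespace(supported=True, version=(1, 1, 7), version_string="1.1.7")
fake_ocio = SimpleNamespace(supported=False, version=(0, 0, 0), version_string="")

print(describe_library("OIIO", fake_oiio))
print(describe_library("OCIO", fake_ocio))
```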
Reviewers: campbellbarton
Reviewed By: campbellbarton
CC: dingto
Differential Revision: http://developer.blender.org/D79
This actually works somewhat now, although viewport rendering is broken and any
kind of network error or connection failure will kill Blender.
* Experimental WITH_CYCLES_NETWORK cmake option
* Networked Device is shown as an option next to CPU and GPU Compute
* Various updates to work with the latest Cycles code
* Locks and thread safety for RPC calls and tiles
* Refactored pointer mapping code
* Fix error in CPU brand string retrieval code
This includes work by Doug Gale, Martijn Berger and Brecht Van Lommel.
Reviewers: brecht
Differential Revision: http://developer.blender.org/D36
It's a simple estimate, not very precise, but a precise one isn't always possible.
For progressive render it will become more accurate the longer you render.
Reviewed By: brecht
Differential Revision: http://developer.blender.org/D67
This code can't actually be enabled for building and is incomplete, but it's
here because we know we want to support this at some point and there's not much
reason to have it in a separate branch if a simple #ifdef can disable it.
Not the most memory efficient way to store these things but it's simple and
implementing it better requires some work to natively support subd grids as
a primitive in some way.
It was never fully implemented and will be replaced by OpenSubdiv. Only linear
subdivision remains now. Also includes some refactoring in the split/dice code,
adding a SubdParams struct to pass around parameters more easily.
This is mostly work towards enabling the __KERNEL_SSE__ option to start using
SIMD operations for vector math operations. This 4.1 kernel performs about 8%
faster with that option but overall is still slower than without the option.
WITH_CYCLES_OPTIMIZED_KERNEL_SSE41 is the cmake flag for testing this kernel.
Alignment of int3, int4, float3, float4 to 16 bytes seems to give a slight 1-2%
speedup on tested systems with the current kernel already, so is enabled now.
This avoids build conflicts with libc++ on FreeBSD, these __ prefixed values
are reserved for compilers. I apologize to anyone who has patches or branches
and has to go through the pain of merging this change, it may be easiest to do
these same replacements in your code and then apply/merge the patch.
Ref T37477.
* Add a "Normal" Input to the Fresnel node.
* Fix for the Fresnel GLSL code (normalize the Incoming vector).
Patch #37384 by Philipp Oeser (lichtwerk), thanks!
* Change the default Light Path settings.
* Diffuse/Glossy bounces are now set to 4, to give a bit faster renders in default scenes. More bounces are often not needed (especially in animation).
* Transmission bounces have been increased to 12, to not run into problems with dark glass too quickly.
* Max/Min bounces are now 12/3.
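The new defaults can be summarised as a small table. This is only a sketch: the attribute names are assumed to match the Cycles scene settings, and the settings object is a hypothetical stand-in so the snippet runs without Blender.

```python
from types import SimpleNamespace

# Values taken from the list above; attribute names are an assumption.
NEW_DEFAULTS = {
    "diffuse_bounces": 4,
    "glossy_bounces": 4,
    "transmission_bounces": 12,
    "max_bounces": 12,
    "min_bounces": 3,
}


def apply_light_path_defaults(settings):
    """Copy the new default bounce values onto a settings-like object."""
    for name, value in NEW_DEFAULTS.items():
        setattr(settings, name, value)
    return settings


# Hypothetical stand-in for the Cycles scene settings.
settings = apply_light_path_defaults(SimpleNamespace())
print(settings.diffuse_bounces, settings.transmission_bounces, settings.min_bounces)
```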
to standard nodes where the Blender socket names can differ from associated Cycles names and may require additional indices to make them unique. Script node sockets are already unique and exact due to
being generated from the script function parameters.
this range due to sampling noise.
Side note: I looked into the mist pass because it was apparently not calculating
mist correctly on characters with transparent hair. Turns out this is just
sampling noise that goes away with more samples.
This noise is because the ray will randomly go to the next transparency layer or
get reflected, the path tracing integrator will not branch the path and only pick
one of the two directions each time.
Branched path tracing however will shade all transparent layers for each AA
sample, which means this source of noise is eliminated.
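A toy model of the difference, not the actual integrator code: a single transparent layer with opacity alpha in front of a background, each with its own mist value. All names and numbers here are made up for illustration.

```python
import random


def mist_single_path(alpha, mist_front, mist_back, rng):
    """Path tracing style: pick ONE of the two directions at random."""
    return mist_front if rng.random() < alpha else mist_back


def mist_branched(alpha, mist_front, mist_back):
    """Branched style: shade both layers and weight them by opacity."""
    return alpha * mist_front + (1.0 - alpha) * mist_back


def average(samples):
    return sum(samples) / len(samples)


rng = random.Random(0)
alpha, front, back = 0.3, 0.2, 0.9  # hypothetical hair opacity and mist values

# The random estimate is noisy with few samples and converges with many.
few = average([mist_single_path(alpha, front, back, rng) for _ in range(4)])
many = average([mist_single_path(alpha, front, back, rng) for _ in range(20000)])

# The branched estimate has no noise from this source at all.
exact = mist_branched(alpha, front, back)
print(few, many, exact)
```

With only a handful of samples the random estimate can land far from the weighted value, which is the noise described above.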
* Remove the compatible falloff SSS implementation. We shouldn't support two implementations in the long term, and 2.7x is a good release number to break some compatibility as well.
* Version patch added, so files with Compatible falloff will automatically use Cubic now.
It was already mentioned in the manual, that Compatible is deprecated.
http://wiki.blender.org/index.php/Doc:2.6/Manual/Render/Cycles/Nodes/Shaders#BSSRDF
* Remove support for CUDA Toolkit 4.x, only Toolkit 5.0 and above are supported now.
* Remove support for sm_1x cards (< Fermi) for good. We didn't officially support those cards for a few releases already, now remove some special code that was still there.
* 32 bit GCC builds now have the SSE BVH optimizations turned off, but still
compile with SSE flags for better performance.
* White color when rendering on Windows seems to have been unrelated to SSE,
rather it was a graphics driver not supporting half float textures, added a
check for that now.
There is some sort of problem with the SSE2 code path, but I couldn't find
the cause, maybe a compiler bug due to the large amount of inlining? For
now I've disabled SSE2 optimizations in 32 bit GCC builds.
* Keep the Mapping node default type as Point for now, instead of Texture. The
latter is a better default, but this is breaking API compatibility and it's
too close to release to expect addons to be fixed in time.
* Vector Transform and Mapping nodes had properties with name "type" to set the
type of vector, but this conflicts with the node type property, so renamed to
vector_type now.
use arrays instead of textures for general storage on this card (image textures
are still stored as texture). Textures were found to be faster on older cards,
but the limits on 1D texture size have not increased along with the memory size,
which meant that the full 6 GB could not be used.
The performance actually seems to be slightly better with arrays in some tests
on Titan. For older cards there seems to be a bit of a mix, some are better and
others not. We may change those to use arrays too, but more testing is needed,
only Titan and Tesla K20 (sm_35) are changed for now.
The fact that arrays are faster is a bit surprising, as others found textures
to be faster on Kepler. However even if they were, the memory limitation is
more important to solve anyway.
https://research.nvidia.com/publication/understanding-efficiency-ray-traversal-gpus-kepler-and-fermi-addendum
scale and rotation in mapping node, there would be shearing, and the only way
to avoid that was to add 2 mapping nodes. This is because to transform the
texture, the inverse transform needs to be done on the texture coordinate
Now the mapping node has Texture/Point/Vector/Normal types to transform the
vector for a particular purpose. Point is the existing behavior, Texture is
the new default that behaves more like you might expect.
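A toy 1D illustration of why the inverse is needed, not the node's actual code: to make a pattern appear twice as large, the lookup coordinate has to be divided by two rather than multiplied. The function names are hypothetical.

```python
import math


def stripes(x):
    """Toy texture: alternating 0/1 bands, each one unit wide."""
    return math.floor(x) % 2


def sample_point(x, scale):
    """Point-style mapping: the coordinate itself is scaled."""
    return stripes(x * scale)


def sample_texture(x, scale):
    """Texture-style mapping: the inverse is applied to the coordinate,
    so the pattern itself appears scaled by `scale`."""
    return stripes(x / scale)


# With scale 2, the first band ends at x = 0.5 for Point (pattern shrinks)
# and at x = 2.0 for Texture (pattern grows, as the user would expect).
for x in (0.25, 0.75, 1.5, 2.5):
    print(x, sample_point(x, 2.0), sample_texture(x, 2.0))
```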
normal and point parameter types of OSL shaders are creating SOCK_VECTOR sockets in the script node. When these sockets are in turn used to define the fixed input values for these parameters they get
converted as OSL vector always, losing the distinction of vector/normal/point. To prevent OSL rejecting the value due to type mismatch, explicitly define the parameter defaults in the OSL script node
compiler function as vector, normal and point (unused types will simply be ignored).
A new hair bsdf node, with two closure options, is added. These closures allow the generation of the reflective and transmission components of hair. The node allows control of the highlight colour, roughness and angular shift.
Limitations include:
- No glint or fresnel adjustments.
- The 'offset' is unused when triangle primitives are used.
- add missing headers from cmake (own omission)
- quiet rna_test.c unused define warnings.
- minor style edits
- spelling corrections and ignore all uppercase words with spell checking script.
* Avoid special code, when Subsurface is enabled.
Ideally we should only use the function, and get rid of the extra duplicate, but this is slower on CUDA.
give a result more similar to the Compatible falloff option. The scale is x2
though to keep the perceived scatter radius roughly the same while changing the
sharpness. Difference with compatible will be mainly on non-flat geometry.
Instead of having ifdef __GNUC__ all over the headers
to use special compiler hints, use a special file where
all things like this are concentrated.
Makes code easier to follow and allows managing special
attributes in a more efficient way.
Thanks Campbell for review!
* More build fixes, 2 link errors remain. http://www.pasteall.org/45279
Note: Probably those paths should only be added for Windows and Linux, as "OPENIMAGEIO_LIBPATH" already inherit them for Mac OS. Also "OPENIMAGEIO_LIBRARIES" inherits the libs for Linux already. Is that intended or a lack of consistency?
* Fix some link errors on Windows, still missing png, zlib, jpeg and tiff.
I couldn't yet figure out the correct flags to pass on here, and the 2300 lines huge main CMakeLists file doesn't help with it...
except for curves, that's still missing from the OpenColorIO GLSL shader.
The pixels are stored in a half float texture, converted from full float with
native GPU instructions and SIMD on the CPU, so it should be pretty quick.
Using a GLSL shader is useful for GPU render because it avoids a copy through
CPU memory.
* Clamp theta sky coordinates, to prevent a negative solarElevation.
Note: This means that you cannot get absolute night with the new model, but this is not supported anyway. So when you reach the maximum sunset, use the World Strength to further decrease the light.
* Added a new sky model by Hosek and Wilkie: "An Analytic Model for Full Spectral Sky-Dome Radiance" http://cgg.mff.cuni.cz/projects/SkylightModelling/
Example render:
http://archive.dingto.org/2013/blender/code/new_sky_model.png
Documentation:
http://wiki.blender.org/index.php/Doc:2.6/Manual/Render/Cycles/Nodes/Textures#Sky_Texture
Details:
* User can choose between the older Preetham and the new Hosek / Wilkie model via a dropdown. For older files, backwards compatibility is preserved. When we add a new Sky texture, it defaults to the new model though.
* For the new model, you can specify the ground albedo (see documentation for details).
* Turbidity now has a UI soft range between 1 and 10, higher values (up to 30) are still possible, but can result in weird colors or black.
* Removed the limitation of 1 sky texture per SVM stack. (Patch by Lukas Tönne, thanks!)
Thanks to Brecht for code review and some help!
This is part of my GSoC 2013 project, SVN merge of r59214, r59220, r59251 and r59601.
Notes:
* Made those edits by fully checking the py files, so I should have spotted most needed edits, yet it remains quite probable I missed a few, we'll fix them if/when someone notices...
* Also made some cleanup "on the road"!
and "Branched Path Tracing", to try to make it more clear that this is not
related to progressive refinement, non-progressive was always a bad name anyway.
* Add a "Total Samples" info at the bottom of the panel.
This makes understanding the Non-Progressive integrator easier, as it displays how many samples are used for the different ray types.
* Rename Squared Samples to Square samples, to indicate that the action is not already done. The new Total Samples info should make this easier to understand now as well. Also added back for Progressive integrator, for consistency.
Screenshot:
http://www.pasteall.org/pic/show.php?id=57980
These are not animatable! Note this is the case of most (all?) render settings, maybe we should go over both Cycles and internal ones, there are still quite a bunch of them that are marked as animatable... :/
* OSL rendered Black with Compatible Falloff option, fixed.
Note: OSL uses compatible scattering when "Compatible" or "Bicubic" is selected. I guess compatible will be removed later? If not we need to fix this properly.
New features:
* Bump mapping now works with SSS
* Texture Blur factor for SSS, see the documentation for details:
http://wiki.blender.org/index.php/Doc:2.6/Manual/Render/Cycles/Nodes/Shaders#Subsurface_Scattering
Work in progress for feedback:
Initial implementation of the "BSSRDF Importance Sampling" paper, which uses
a different importance sampling method. It gives better quality results in
many ways, with the availability of both Cubic and Gaussian falloff functions,
but also tends to be more noisy when using the progressive integrator and does
not give great results with some geometry. It works quite well for the
non-progressive integrator and is often less noisy there.
This code may still change a lot, so unless you're testing it may be best to
stick to the Compatible falloff function.
Skin test render and file that takes advantage of the gaussian falloff:
http://www.pasteall.org/pic/show.php?id=57661
http://www.pasteall.org/pic/show.php?id=57662
http://www.pasteall.org/blend/23501
- Removed the cycles subdivision and interpolation of hairkeys.
- Removed the parent settings.
- Removed all of the advanced settings and presets.
- This simplifies the UI to a few settings for the primitive type and a shape mode.
* Replaced the Preetham model with the newer Hosek / Wilkie model:
"An Analytic Model for Full Spectral Sky-Dome Radiance" http://cgg.mff.cuni.cz/projects/SkylightModelling/
* We use the sample code data, which comes with the paper, but removed some unnecessary parts, we only need the xyz version.
* New "Albedo" UI parameter, to control the ground albedo (between 0 and 1).
* Works with SVM only atm (CPU and CUDA).
Example render:
http://www.pasteall.org/pic/show.php?id=57635
ToDo / Open Questions:
* OSL still uses the old model, will be done later. In the meantime it's useful to compare the two models this way.
* The new model needs a much weaker Strength value (0.01), otherwise it's white. Can this be fixed?
* Code cleanup.
* Added a new panel "Settings" to the object tab.
* Motion blur can now be enabled/disabled on a per object basis, so we can disable motion blur for certain objects.
* Also added some code for the Motion Multiplier, to weaken/strengthen the motion effect per object, but that is still disabled and hidden from the UI.
* Remove code for the unused Wave texture variations.
We have quite some unused code in the texture area, I guess it doesn't harm to clean a bit up here.
We can always get the code back from SVN if we need something.
* GPU kernel can now be compiled without __NON_PROGRESSIVE__ again, was broken after my last commit. Also add a check for have_error(), in case the GPU kernel comes without Non-Progressive, to avoid a crash.
* Don't compile progressive kernel twice on CPU, if __NON_PROGRESSIVE__ would be disabled there.
* Non-Progressive integrator is now available on the GPU (CUDA, sm_20 and above).
Implementation details:
* kernel_path_trace() has been split up into two functions:
kernel_path_trace_non_progressive() and kernel_path_trace_progressive().
* We compile two CUDA kernel entry functions (in kernel.cu) for the two integrators, they are still inside one .cubin file but due to the kernel separation there should be no performance problem. I tested with the BMW file on my Geforce 540M and the render times were the same for 100 samples (1.57 min in my case).
This is part of my GSoC project, SVN merge of r59032 + manual merge of UI changes for this from my branch.
* Code refactor to split the GPU kernel into two, one for each integrator.
This way we can enable Non-Progressive integrator on GPU in trunk without a performance drop.
Thanks to Brecht for some help and review!
for texture system in advance. Patch by Martijn Berger, with some tweaks.
There was about a 10% performance improvement on OS X in my tests with the
images.blend test file. This may be less on other platforms because OS X has
particularly slow mutex locks.
* Render Passes are now available for Subsurface Scattering (Direct, Indirect and Color pass).
This is part of my GSoC project, SVN merge of r58587, r58828 and r58835.
* After some feedback decided to remove this option from the Progressive integrator, it only makes sense for Non-Progressive where we have different values for the sample types.
* Added a node to convert a temperature in Kelvin to an RGB color. This can be used e.g. for lights, to easily find the right color temperature.
= Some common temperatures =
Candle light: 1500 Kelvin
Sunset/Sunrise: 1850 Kelvin
Studio lamps: 3200 Kelvin
Horizon daylight: 5000 Kelvin
Documentation: http://wiki.blender.org/index.php/Doc:2.6/Manual/Render/Cycles/Nodes/More#Blackbody
Thanks to Philipp Oeser (lichtwerk), who contributed substantially to this with a patch! :)
This is part of my GSoC 2013 project. SVN merge of r57424, r57487, r57507, r57525, r58253 and r58774
* Added a Ray Depth output to the Light Path node, which gives the user access to the current bounce.
This can be used to limit the maximum ray bounce on a per shader basis. Another use case is to restrict light influence with this, to have a lamp only contribute to the direct lighting.
http://wiki.blender.org/index.php/Doc:2.6/Manual/Render/Cycles/Nodes/More#Light_Path
This is part of my GSoC 2013 project. SVN merge of r58091 and r58772 from soc-2013-dingto.
* Code cleanup to avoid duplicated enum code.
* Added a third type for conversion next to Point and Vector: Normal. This is basically the same result as with the Vector type, but normalizes the vector at the end.
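A minimal sketch of the difference between the Vector and Normal types as described above, assuming a plain 3x3 matrix given as a list of rows. The helper names are hypothetical and this is not the node's actual code.

```python
import math


def transform_direction(matrix, v):
    """Vector type: apply a 3x3 matrix (list of rows) to a direction."""
    return tuple(sum(row[i] * v[i] for i in range(3)) for row in matrix)


def normalize(v):
    """Scale a vector to unit length; zero vectors are returned unchanged."""
    length = math.sqrt(sum(c * c for c in v))
    return tuple(c / length for c in v) if length > 0.0 else v


def transform_normal(matrix, v):
    """Normal type: same as the Vector type, then normalize at the end."""
    return normalize(transform_direction(matrix, v))


uniform_scale = [[2.0, 0.0, 0.0], [0.0, 2.0, 0.0], [0.0, 0.0, 2.0]]
print(transform_direction(uniform_scale, (0.0, 0.0, 1.0)))
print(transform_normal(uniform_scale, (0.0, 0.0, 1.0)))
```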
Thanks to Brecht for code review!
* Fix some things which came up in code review. Includes some fixes for background lights and changes to variables, to avoid some castings.
Thanks to Brecht for code review! :)
* Avoid check for !LABEL_TRANSPARENT in "kernel_path_non_progressive_lighting", transparency is either handled in the outer loop or in the "kernel_path_indirect" function, but not here.
* Increase the maximum amount of closures per shader from 16 to 64, so more complex closure trees can be rendered.
I measured performance on CPU and GPU (Geforce 540M) and couldn't find a performance impact, but if someone encounters a noticeable impact on his system, please report.
* First step toward Subsurface Scattering render passes (Color, Direct and Indirect).
* Added UI, DNA and RNA for the new Passes on the Blender side.
* Basic Cycles integration.
* Only the SSS Color Pass works so far.
ToDo: Direct and Indirect Pass.
Should "subsurface" be a part of BsdfEval and "path_subsurface" of PathRadiance or is that the wrong way? Should it be integrated more like the AO render pass? Some input from Brecht or Stuart would be nice. :)
* "Auto Detect" now again uses the number of cores, instead of number of cores + 1.
This was added before we had Tile rendering and benchmarks on several systems showed that there is no gain with this now. There might be some slight difference (0.5% or so) slower/faster depending on the scene, but this is negligible.
* Add Presets for Sampling. This comes with a simple Preview and Final preset, but as this is varying a lot depending on the scene, they should just be a starting point. The user can add own presets here.
* Some UI layout changes to match the settings a bit better.
* Add a "Squared Samples" option to the UI, to use squared values for ease of use. This can make it easier from an artist point of view, to tweak settings.
With this enabled, all Sample values will be squared. So 10 Samples become 100 Samples.
For the Non-Progressive integrator: 4 AA Samples * 5 Diffuse Samples would become 16 AA Samples * 25 Diffuse = 400 in total.
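The arithmetic above, written out as a small sketch. The function name is hypothetical and only AA and Diffuse samples are covered.

```python
def effective_samples(aa, diffuse, squared):
    """Return (aa, diffuse, total) after optionally squaring the inputs."""
    if squared:
        aa, diffuse = aa * aa, diffuse * diffuse
    return aa, diffuse, aa * diffuse


print(effective_samples(4, 5, squared=False))
print(effective_samples(4, 5, squared=True))
```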
Patch by Matt Heimlich, with some minor edits by myself. Thanks!
* If Preview Samples are set to 0 (unlimited) it now assumes 65536 instead of INT_MAX.
This doesn't affect regular sampling, you can still enter fixed values of 100k or whatever.
* Fix the weird results with 800-804.3 Kelvin in SVM. This was an offset issue with the lookup table, made the table slightly larger now (from 954 to 956) which gives a small gap between the R/G/B components.
* Use Luminance also for values below 800 Kelvin, for consistency.
* Make it more clear for the user what affects 3D View and Final render.
* Static / Dynamic BVH only affects viewport, BVH Cache only final. (see BlenderSync::get_scene_params)
buffers option, it requires specific tile sizes and if they don't match what
OpenEXR expects file saving can get stuck.
Now I've made support for this optional, with a bl_use_save_buffers property for
RenderEngine, set to False by default.
* Added a Ray Depth output to the Light Path node, which returns the current ray bounce (0, 1, 2, 3...)
* This can be used to use different shaders for direct and indirect lighting and artificial effects.
Examples:
* http://www.pasteall.org/pic/show.php?id=55158 Here we use the output to apply a different shader to the third bounce. As in this example, you can use Math Nodes (Greater Than / Less Than) if you want to use values outside of the 0/1 range.
* http://www.pasteall.org/pic/show.php?id=55159 Here we restrict the maximum bounce on a per shader basis for the left sphere. This way it looks like we would only have 1 max bounce set in the scene "Light paths" panel.
This can be used to e.g. improve performance for objects far from the camera, which do not need full GI.
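The per-shader bounce limit from the second example can be sketched as plain arithmetic. This is an illustration of the node logic only, with hypothetical helper names and a single float standing in for a shader result.

```python
def greater_than(a, b):
    """Mimics a Math node set to Greater Than: outputs 1.0 or 0.0."""
    return 1.0 if a > b else 0.0


def mix(factor, first, second):
    """Mimics a mix between two shader results."""
    return (1.0 - factor) * first + factor * second


def limited_shader(ray_depth, max_depth, surface_value):
    """Use the surface result up to max_depth, black beyond it."""
    factor = greater_than(ray_depth, max_depth)
    return mix(factor, surface_value, 0.0)


for depth in range(4):
    print(depth, limited_shader(depth, 1, 0.8))
```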
Technical notes:
* Implemented for both integrators and SVM/OSL.
* This is done by passing state.bounce to the shader_setup_from_* functions.
* Note: We don't pass state.bounce to kernel_shader_evaluate() and therefore shader_setup_from_displacement() method doesn't set the value, this is outside the path trace loop. Maybe a ToDo?
RGB color components gave non-grey results when you might not expect it.
What happens is that some of the color channels are zero in the direct light
pass because their channel is zero in the color pass. The direct light pass is
defined as lighting divided by the color pass, and we can't divide by zero. We
do a division after all samples are added together to ensure that multiplication
in the compositor gives the exact combined pass even with antialiasing, DoF, ..
Found a simple tweak here, instead of setting such channels to zero it will set
it to the average of other non-zero color channels, which makes the results look
like the expected grey.
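A minimal sketch of the tweak, assuming per-pixel RGB tuples that were already summed over all samples. The function name is hypothetical and this is not the actual kernel code.

```python
def divide_by_color(light, color):
    """Divide accumulated lighting by the color pass, channel by channel.

    Channels whose color is zero cannot be divided; they receive the
    average of the channels that could be divided.
    """
    result = [lum / col if col != 0.0 else None for lum, col in zip(light, color)]
    valid = [r for r in result if r is not None]
    fallback = sum(valid) / len(valid) if valid else 0.0
    return tuple(fallback if r is None else r for r in result)


# Green is zero in the color pass, so it takes the average of red and blue.
print(divide_by_color((0.4, 0.0, 0.2), (0.8, 0.0, 0.4)))
```

Multiplying the result by the color pass still gives back the combined value, because the substituted channel is multiplied by zero.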
Issue is caused by missing SSE flags for Clang compilers,
these flags were only set for GNU C compilers.
Added an if branch for Clang now, which contains the same
flags apart from -mfpmath=sse, because Clang was
reporting it as an unused argument.
Probably OSX would need some further checks since it's
also using Clang. I've got no idea why it could have
worked for OSX before..
* After some more thinking, solved the remaining ToDos. :)
* Added is_object check to check if we have a valid object.
* If we operate on the world, and try to convert from/to object space, we now assume world space instead, same as OSL.
* Implementation of the node for SVM. This covers all possible transformations: World <> Object <> Camera space.
As far as I can tell, it also works fine with Motion Blur enabled.
ToDo:
* SVM differs from OSL, when the node is used on the world.
* Reshuffle SSE #ifdefs to try to avoid compilation errors enabling SSE on 32 bit.
* Remove CUDA kernel launch size exception on Mac, is not needed.
* Make OSL file compilation quiet like c/cpp files.
texture coordinate that should automatically use the default normal or texture
coordinate appropriate for that node, rather than some fixed value specified by
the user.