- add missing headers from cmake (own omission)
- quiet rna_test.c unused define warnings.
- minor style edits
- spelling corrections and ignore all uppercase words with spell checking script.
* Avoid special code, when Subsurface is enabled.
Ideally we should only use the function, and get rid of the extra duplicate, but this is slower on CUDA.
give a result more similar to the Compatible falloff option. The scale is x2
though to keep the perceived scatter radius roughly the same while changing the
sharpness. Difference with compatible will be mainly on non-flat geometry.
Instead of having ifdef __GNUC__ all over the headers
to use special compiler's hints use a special file where
all things like this are concentrated.
Makes code easier to follow and allows to manage special
attributes in more efficient way.
Thanks Campbell for review!
- replace numbers with defines for allocation increments and default array size.
- move array reallocation into a static function (deduplicate 2x).
also fix own mistake with uninitialized slop-space var in memory printing statistics.
This commit attempts to fix the following error:
intern\guardedalloc\intern\mallocn.c: In function 'rem_memblock':
intern\guardedalloc\intern\mallocn.c:977:48: error: conversion to 'intptr_t' from 'size_t' may change the sign of the result [-Werror=sign-conversion]
From the references I've managed to find, it appears that
the second arg to munmap() should be size_t not intptr_t.
Fortunately though, we don't use this arg anyways atm, so
this should be quite harmless...
* More build fixes, 2 link errors remain. http://www.pasteall.org/45279
Note: Probably those paths should only be added for Windows and Linux, as "OPENIMAGEIO_LIBPATH" already inherit them for Mac OS. Also "OPENIMAGEIO_LIBRARIES" inherits the libs for Linux already. Is that intended or a lack of consistency?
* Fix some link errors on Windows, still missing png, zlib, jpeg and tiff.
I couldn't yet figure out the correct flags to pass on here, and the 2300 lines huge main CMakeLists file doesn't help with it...
except for curves, that's still missing from the OpenColorIO GLSL shader.
The pixels are stored in a half float texture, converterd from full float with
native GPU instructions and SIMD on the CPU, so it should be pretty quick.
Using a GLSL shader is useful for GPU render because it avoids a copy through
CPU memory.
shader for converting colors from linear to display space, based on the scene
color management settings.
if engine.support_display_space_shader(scene): # test graphics card support
engine.bind_display_space_shader(scene)
# draw pixels ..
engine.unbind_display_space_shader()
* Clamp theta sky coordinates, to prevent a negative solarElevation.
Note: This means that you cannot get absolute night with the new model, but this is not supported anyway. So when you reach the maximum sunset, use the World Strength to further decrease the light.
* Added a new sky model by Hosek and Wilkie: "An Analytic Model for Full Spectral Sky-Dome Radiance" http://cgg.mff.cuni.cz/projects/SkylightModelling/
Example render:
http://archive.dingto.org/2013/blender/code/new_sky_model.png
Documentation:
http://wiki.blender.org/index.php/Doc:2.6/Manual/Render/Cycles/Nodes/Textures#Sky_Texture
Details:
* User can choose between the older Preetham and the new Hosek / Wilkie model via a dropdown. For older files, backwards compatibility is preserved. When we add a new Sky texture, it defaults to the new model though.
* For the new model, you can specify the ground albedo (see documentation for details).
* Turbidity now has a UI soft range between 1 and 10, higher values (up to 30) are still possible, but can result in weird colors or black.
* Removed the limitation of 1 sky texture per SVM stack. (Patch by Lukas Tönne, thanks!)
Thanks to Brecht for code review and some help!
This is part of my GSoC 2013 project, SVN merge of r59214, r59220, r59251 and r59601.
Notes:
* Made those edits by full checking of py files, so I should have spoted most needed edits, yet it remains quite probable I missed a few ones, we'll fix if/when someone notice it...
* Also made some cleanup "on the road"!
and "Branched Path Tracing", to try to make it more clear that this is not
related to progressive refinement, non-progressive was always a bad name anyway.
* Add a "Total Samples" info at the bottom of the panel.
This makes understanding the Non-Progressive integrator easier, as it displays how many samples are used for the different ray types.
* Rename Squared Samples to Square samples, to indicate that the action is not already done. The new Total Samples info should make this easier to understand now as well. Also added back for Progressive integrator, for consistency.
Screenshot:
http://www.pasteall.org/pic/show.php?id=57980
The reason of this is because PATH_MAX is not guaranteed
to be defined on all platforms and Hurd doesn't define it.
So either we need to support arbitrary long file path or
we need to define own maximum path length.
The rule here would be:
- If it's not big trouble to support arbitrary long paths
(i.e. in ghost by using std::string instead of char*)
then arbitrary long path shall be implemented.
- For other cases to use PATH_MAX please include BLI_fileops.h
which takes care of making sure PATH_MAX is defined.
Additional change: get rid of own changes made yesterday
which were supposed to make storage.c work fine in cases
PATH_MAX is not define, but on the second though it lead
to unneeded complication of the code.
Thanks Campbell for review!
For now assume sizeof(int) == 4 for all supported platforms, could be
changed in the future.
Added an assert to functions which depends on this this, so we'll
easily notice bad things happening.
These are not animatable! Note this is the case of most (all?) render settings, maybe we should go over both Cycles and internal ones, there are still quite a bunch of them that are marked as animatable... :/
- Re-arrange locks, so no actual memory allocation
(which is relatively slow) happens from inside
the lock. operation system will take care of locks
which might be needed there on it's own.
- Use spin lock instead of mutex, since it's just
list operations happens from inside lock, no need
in mutex here.
- Use atomic operations for memory in use and total
used blocks counters.
This makes guarded allocator almost the same speed
as non-guarded one in files from Tube project.
There're still MemHead/MemTail overhead which might
be bad for CPU cache utilization
* OSL rendered Black with Compatible Fallof option, fixed.
Note: OSL uses compatible scattering when "Compatible" or "Bicubic" is selected. I guess compatible will be removed later? If not we need to fix this properly.
New features:
* Bump mapping now works with SSS
* Texture Blur factor for SSS, see the documentation for details:
http://wiki.blender.org/index.php/Doc:2.6/Manual/Render/Cycles/Nodes/Shaders#Subsurface_Scattering
Work in progress for feedback:
Initial implementation of the "BSSRDF Importance Sampling" paper, which uses
a different importance sampling method. It gives better quality results in
many ways, with the availability of both Cubic and Gaussian falloff functions,
but also tends to be more noisy when using the progressive integrator and does
not give great results with some geometry. It works quite well for the
non-progressive integrator and is often less noisy there.
This code may still change a lot, so unless you're testing it may be best to
stick to the Compatible falloff function.
Skin test render and file that takes advantage of the gaussian falloff:
http://www.pasteall.org/pic/show.php?id=57661http://www.pasteall.org/pic/show.php?id=57662http://www.pasteall.org/blend/23501
- Removed the cycles subdivision and interpolation of hairkeys.
- Removed the parent settings.
- Removed all of the advanced settings and presets.
- This simplifies the UI to a few settings for the primitive type and a shape mode.
* Replaced the Preetham model with the newer Hosek / Wilkie model:
"An Analytic Model for Full Spectral Sky-Dome Radiance" http://cgg.mff.cuni.cz/projects/SkylightModelling/
* We use the sample code data, which comes with the paper, but removed some unnecessary parts, we only need the xyz version.
* New "Albedo" UI paraemeter, to control the ground albedo (between 0 and 1).
* Works with SVM only atm (CPU and CUDA).
Example render:
http://www.pasteall.org/pic/show.php?id=57635
ToDo / Open Questions:
* OSL still uses the old model, will be done later. In the meantime it's useful to compare the two models this way.
* The new model needs a much weaker Strength value (0.01), otherwise it's white. Can this be fixed?
* Code cleanup.
* Added a new panel "Settings" to the object tab.
* Motion blur can now be enabled/disabled on a per object basis, so we can disable motion blur for certain objects.
* Also added some code for the Motion Multiplier, to weaken/strengthen the motion effect per object, but that is still disabled and hidden from the UI.
Now sounds that stopped playing but are still kept in the device can be differentiated from paused sounds with this state.
This should also fix the performance issues mentioned in [#36466] End of SequencerEntrys not set correctly.
Please test if sound pausing, resuming and stopping works fine in the BGE and sequencer, my tests all worked fine, but there might be a use case that needs some fixing.
* Remove code for the unused Wave texture variations.
We have quite some unused code in the texture area, I guess it doesn't harm to clean a bit up here.
We can always get the code back from SVN if we need something.
* GPU kernel can now be compiled without __NON_PROGRESSIVE__ again, was broken after my last commit. Also add a check for have_error(), in case the GPU kernel comes without Non-Progressive, to avoid a crash.
* Don't compile progressive kernel twice on CPU, if __NON_PROGRESSIVE__ would be disabled there.
* Non-Progressive integrator is now available on the GPU (CUDA, sm_20 and above).
Implementation details:
* kernel_path_trace() has been split up into two functions:
kernel_path_trace_non_progressive() and kernel_path_trace_progressive().
* We compile two CUDA kernel entry functions (in kernel.cu) for the two integrators, they are still inside one .cubin file but due to the kernel separation there should be no performance problem. I tested with the BMW file on my Geforce 540M and the render times were the same for 100 samples (1.57 min in my case).
This is part of my GSoC project, SVN merge of r59032 + manual merge of UI changes for this from my branch.
* Code refactor to split the GPU kernel into two, one for each integrator.
This way we can enable Non-Progressive integrator on GPU in trunk without a performance drop.
Thanks to Brecht for some help and review!
for texture system in advance. Patch by Martijn Berger, with some tweaks.
There was about a 10% performance improvement on OS X in my tests with the
images.blend test file. This may be less on other platforms because OS X has
particularly slow mutex locks.
* Render Passes are now available for Subsurface Scattering (Direct, Indirect and Color pass).
This is part of my GSoC project, SVN merge of r58587, r58828 and r58835.
* After some feedback decided to remove this option from the Progressive integrator, it only makes sense for Non-Progressive where we have different values for the sample types.
* Added a node to convert a temperature in Kelvin to an RGB color. This can be used e.g. for lights, to easily find the right color temperature.
= Some common temperatures =
Candle light: 1500 Kelvin
Sunset/Sunrise: 1850 Kelvin
Studio lamps: 3200 Kelvin
Horizon daylight: 5000 Kelvin
Documentation: http://wiki.blender.org/index.php/Doc:2.6/Manual/Render/Cycles/Nodes/More#Blackbody
Thanks to Philipp Oeser (lichtwerk), who essentially contributed to this with a patch! :)
This is part of my GSoC 2013 project. SVN merge of r57424, r57487, r57507, r57525, r58253 and r58774
* Added a Ray Depth output to the Light Path node, which gives the user access to the current bounce.
This can be used to limit the maximum ray bounce on a per shader basis. Another use case is to restrict light influence with this, to have a lamp only contribute to the direct lighting.
http://wiki.blender.org/index.php/Doc:2.6/Manual/Render/Cycles/Nodes/More#Light_Path
This is part of my GSoC 2013 project. SVN merge of r58091 and r58772 from soc-2013-dingto.
* Code cleanup to avoid duplicated enum code.
* Added a third type for conversion next to Point and Vector: Normal. This is basically the same result as with the Vector type, but normalizes the vector at the end.
Thanks to Brecht for code review!
* Fix some things which came up in code review. Includes some fixes for background lights and changes to variables, to avoid some castings.
Thanks to Brecht for code review! :)
* Avoid check for !LABEL_TRANSPARENT in "kernel_path_non_progressive_lighting", transparency is either handled in the outer loop or in the "kernel_path_indirect" function, but not here.
* Increase the maximum amount of closures per shader from 16 to 64, so more complex closure trees can be rendered.
I measured performance on CPU and GPU (Geforce 540M) and couldn't find a performance impact, but if someone encounters a noticeable impact on his system, please report.
* First step toward Subsurface Scattering render passes (Color, Direct and Indirect).
* Added UI, DNA and RNA for the new Passes on the Blender side.
* Basic Cycles integration.
* Only the SSS Color Pass works so far.
ToDo: Direct and Indirect Pass.
Should "subsurface" be a part of BsdfEval and "path_subsurface" of PathRadiance or is that the wrong way? Should it be integrated more like the AO render pass? Some input from Brecht or Stuart would be nice. :)
* "Auto Detect" now again uses the umber of cores, instead number of cores + 1.
This was added before we had Tile rendering and benchmarks on several systems showed that there is no gain with this now. There might be some slight difference (0.5% or so) slower/faster depending on the scene, but this is negligible.
* Add Presets for Sampling. This comes with a simple Preview and Final preset, but as this is varying a lot depending on the scene, they should just be a starting point. The user can add own presets here.
* Some UI layout changes to match the settings a bit better.
* Add a "Squared Samples" option to the UI, to use squared values for ease of use. This can make it easier from an artist point of view, to weak settings.
With this enabled, all Sample values will be squared. So 10 Samples become 100 Samples.
For the Non-Progressive integrator: 4 AA Samples * 5 Diffuse Samples would become 16 AA Samples * 25 Diffuse = 400 in total.
Patch by Matt Heimlich, with some minor edits by myself. Thanks!
* If Preview Samples are set to 0 (unlimited) it now assumes 65536 instead of INT_MAX.
This doesn't affect regular sampling, you can still enter fixed values of 100k or whatever.
* Fix the weird results with 800-804.3 Kelvin in SVM. This was an offset issue with the lookup table, made the table slightly larger now (from 954 to 956) which gives a small gap between the R/G/B components.
* Use Luminance also for values below 800 Kelvin, for consistency.
* Make it more clear for the user what affects 3D View and Final render.
* Static / Dynamic BVH only affects viewport, BVH Cache only final. (see BlenderSync::get_scene_params)
buffers option, it requires specific tile sizes and if they don't match what
OpenEXR expects file saving can get stuck.
Now I've made support for his optional, with a bl_use_save_buffers property for
RenderEngine, set to False by default.
* Added a Ray Depth output to the Light Path node, which returns the current ray bounce (0, 1, 2, 3...)
* This can be used to use different shaders for direct and indirect lighting and artificial effects.
Examples:
* http://www.pasteall.org/pic/show.php?id=55158 Here we use the output to apply a different shader to the third bounce. As in this example, you can use Math Nodes (Greater Than / Less Than) if you want to use values outside of the 0/1 range.
* http://www.pasteall.org/pic/show.php?id=55159 Here we restrict the maximum bounce on a per shader basis for the left sphere. This way it looks like we would only have 1 max bounce set in the scene "Light paths" panel.
This can be used to e.g. improve performance for objects far from the camera, which do not need full GI.
Technical notes:
* Implemented for both integrators and SVM/OSL.
* This is done by passing state.bounce to the shader_setup_from_* functions.
* Note: We don't pass state.bounce to kernel_shader_evaluate() and therefore shader_setup_from_displacement() method doesn't set the value, this is outside the path trace loop. Maybe a ToDo?
RGB color components gave non-grey results when you might no expect it.
What happens is that some of the color channels are zero in the direct light
pass because their channel is zero in the color pass. The direct light pass is
defined as lighting divided by the color pass, and we can't divide by zero. We
do a division after all samples are added together to ensure that multiplication
in the compositor gives the exact combined pass even with antialiasing, DoF, ..
Found a simple tweak here, instead of setting such channels to zero it will set
it to the average of other non-zero color channels, which makes the results look
like the expected grey.
Issue is caused by missing sse flags for Clang compilers,
this flags only was set for GNU C compilers.
Added if branch for Clang now, which contains the same
flags apart from -mfpmath=sse, This is because Clang was
claiming it's unused argument.
Probably OSX would need some further checks since it's
also using Clang. I've got no idea why it could have
worked for OSX before..
* After some more thinking, solved the remaining ToDos. :)
* Added is_object check to check if we have a valid object.
* If we operate on the world, and try to convert from/to object space, we now assume world space instead, same as OSL.
* Implementation of the node for SVM. This covers all possible transformations: World <> Object <> Camera space.
As far as I can tell, it also works fine with Motion Blur enabled.
ToDo:
* SVM differs from OSL, when the node is used on the world.
* Reshuffle SSE #ifdefs to try to avoid compilation errors enabling SSE on 32 bit.
* Remove CUDA kernel launch size exception on Mac, is not needed.
* Make OSL file compilation quiet like c/cpp files.
texture coordinate that should automatically use the default normal or texture
coordinate appropriate for that node, rather than some fixed value specified by
the user.
* On nvidia Kepler GPUs (sm_30 and above), there are now 145 byte images available, instead of 95.
We could extend this to about 200 if needed.
Could not test this, as I don't have a Kepler GPU, so feedback on this would be appreciated.
Thanks to Brecht for review and some fixes. :)
* Add CUDA compiler version detection to cmake/scons/runtime
* Remove noinline in kernel_shader.h and reenable --use_fast_math if CUDA 5.x
is used, these were workarounds for CUDA 4.2 bugs
* Change max number of registers to 32 for sm 2.x (based on performance tests
from Martijn Berger and confirmed here), and also for NVidia OpenCL.
Overall it seems that with these changes and the latest CUDA 5.0 download, that
performance is as good as or better than the 2.67b release with the scenes and
graphics cards I tested.
On the BMW scene, this gives roughly a 10% speedup overall with clang/gcc, and 30%
speedup with visual studio (2008). It turns out visual studio was optimizing the
existing code quite poorly compared to pretty good autovectorization by clang/gcc,
but hand written SSE code also gives a smaller speed boost there.
This code isn't enabled when using the hair minimum width feature yet, need to
make that work with the SSE code still.
* Enable the Non-Progressive integrator on GPU (CUDA) for testing.
In order to compile the CUDA kernel with it, you need at least 6GB of system memory and CUDA Toolkit 5.0 or 5.5.
It should also work with CUDA Toolkit 4.2, but in this case you should have 12GB of RAM.
In case any problems arise, just change line 65 of kernel_types.h to disable Non-Progressive again.
-- #define __NON_PROGRESSIVE__
++ //#define __NON_PROGRESSIVE__
* Replaced the Brute Force version with a nice lookup table, this speeds it up a lot.
Patch by Philipp Oeser (lichtwerk) with some cleanup and changes by myself. Thanks!
ToDo:
* Temperature values between 800 and 804 Kelvin are wrong in SVM, check on this.
* First (brute force) implementation for SVM. This works and delivers the same result as OSL, but it's slow.
* Code inside svm_blackbody.h inspired by a patch by Philipp Oeser (#35698), thanks.
Ideas:
* Use a lookup table to perform the calculations on render/ level.
* Implement it as a RNA property only, and do the calculation like Sun/Sky precompute.
and sm_30 cards, so hopefully it should all work now.
Also includes some warnings fixes related to nvcc compiler arguments, should make
no difference otherwise.
Apparently, it's bad idea to rely on compiler to cast NULL
which is (void*)0 to int -- and in fact if i was a compiler
would also generate an error.
Further, couldn't see why we need to pass NULL or 0 th add_node,
argument value is defautl to 0 already.
* Added a node to convert wavelength (in nanometers, from 380nm to 780nm) to RGB values. This can be useful to match real world colors easier.
* Code cleanup:
** Moved color functions (xyz and hsv) into dedicated utility files.
** Remove svm_lerp(), use interp() instead.
Documentation:
http://wiki.blender.org/index.php/Doc:2.6/Manual/Render/Cycles/Nodes/More#Wavelength
Example render:
http://www.pasteall.org/pic/show.php?id=53202
This is part of my GSoC 2013. (revisions 57322, 57326, 57335 and 57367 from soc-2013-dingto).
to be done in cycles itself to keep compatibility for bytecode too.
Also fix broken button to compile OSL from the text editors, this got broken after
recent change to disable editing of library linked nodes.