The issue was noticed with gcc-4.7 (used by the release build environment),
which didn't generate sufficiently optimized code for the BVH reference swap.
It seems the compiler looked up the assignment operator for each of the
reference structure's members, even though nothing special was required for
the assignment.
Forcing the compiler to use a simple memory copy gives a speedup of roughly
2-3x.
The issue doesn't happen with OSX's Clang or the newer gcc-4.9, but since
we're going to stick with gcc-4.7 for official releases for quite some time
still, it's nice to have performance issues resolved for all the compilers.
Didn't put the code into an #ifdef, so that if some issue with alignment, or
with an assignment that needs to happen as an operator, appears in the
future, we notice it earlier.
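
For reference, a minimal sketch of the trick (the structure here is a
simplified stand-in, not the actual BVHReference layout):

#include <cstring>  /* memcpy */

/* Simplified stand-in for the real reference structure. */
struct BVHReference {
  float bounds_min[4];
  float bounds_max[4];
  int prim_index;
  int prim_object;
};

/* Swap via raw memory copies instead of member-wise assignment.
 * Only valid because the structure is trivially copyable, which is
 * exactly the assumption the note above is about. */
inline void bvh_reference_swap(BVHReference &a, BVHReference &b)
{
  BVHReference tmp;
  memcpy(&tmp, &a, sizeof(BVHReference));
  memcpy(&a, &b, sizeof(BVHReference));
  memcpy(&b, &tmp, sizeof(BVHReference));
}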
* Alpha Property was removed (Fix T42690)
* Some tweaks to make the panel look better again.
* Use the abbreviated form "Multiple Importance" everywhere, for consistency.
So now the vector pass is all correct in cases when an object has both hair
motion blur and deformation motion blur.
We could get rid of the flag in the future; still need to look deeper into
all the areas, trying to find a clearer solution.
The issue was caused by a mismatch in the pre/post transform matrix spaces
for mesh and curve vectors. This happened because of the current way static
transform apply works: it only stores the post/pre transforms in world space
if triangle motion exists. This led to a situation where no triangle motion
was happening but hair motion was.
After a long time of trying to solve it in a nice way, ended up solving it in
a bit of a slow way -- the pre/post transforms are still stored in the same
spaces as they used to be, and the hair pre/post positions are simply
converted to world space in the kernel.
This is because currently it's not so clear how to deal with cases when curve
and mesh motion need different spaces for the pre/post transform (which
happens when only one of the motions exists).
Will think of some magic; meanwhile artists can be happy with proper
render results.
It appears it's not really needed for convenient debugging when proper
flags are passed to the compiler. Basically, it is -g3 plus setting a
breakpoint on a function as if it were not in the namespace.
Not that the code was in any way wrong; it's just possible to have a
clearer solution for the issue I tried to solve in the past.
Create a unique flag for output shaders with displacement data and use it
to calculate the transformed normal. Implementation suggested by Brecht Van
Lommel.
Reviewers: brecht
Differential Revision: https://developer.blender.org/D890
The idea is to avoid memory allocation when only one segment step is to be allocated.
This gives some speedup which is difficult to measure on this trashcan from hell, but
it's about 7% to 10% in the extreme case of a single volume filling the whole
viewport. This seems to depend on the phase of the bug-o-meter in the studio.
On the Linux boxes the speedup is not that spectacular: about 2% on my laptop and
about 3% on the studio desktop. This is likely because of the awesomeness of jemalloc.
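
A minimal sketch of the allocation strategy (illustrative, not the actual
Cycles data structure): keep inline storage for the single-step case and
only hit the allocator when more steps are needed.

template<typename T> class StepStorage {
 public:
  explicit StepStorage(int num_steps)
      : data_(num_steps <= 1 ? &single_step_ : new T[num_steps]) {}
  StepStorage(const StepStorage &) = delete;
  StepStorage &operator=(const StepStorage &) = delete;
  ~StepStorage()
  {
    /* Only free when we actually went through the allocator. */
    if (data_ != &single_step_) {
      delete[] data_;
    }
  }
  T &operator[](int i) { return data_[i]; }

 private:
  T single_step_;  /* used when only one step is needed */
  T *data_;        /* points at single_step_ or a heap array */
};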
Add a compile-time check for the particular glibc version which fixed the issue.
This makes it so self-compiled Blender is the fastest in the world, and the
only remaining question is what we should do for release builds.
After some discussion with Campbell we decided to keep it as is for now,
because the slowdown is not that noticeable. We'll disable this workaround
for release builds when the majority of the distros have switched to the
new version of glibc.
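
A sketch of the kind of guard involved (the exact version number is an
assumption for illustration; the real check targets whichever glibc release
fixed expf()):

#include <math.h>

/* Only route expf() through double precision on glibc builds that
 * predate the fix; builds on newer distros get the fast native
 * expf() again. */
#if defined(__linux__) && defined(__GLIBC__)
#  include <features.h>
#  if !__GLIBC_PREREQ(2, 17)  /* version illustrative */
#    define fast_expf(x) ((float)exp((double)(x)))
#  endif
#endif
#ifndef fast_expf
#  define fast_expf(x) expf(x)
#endif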
That code was mainly needed for the transition period; now we've got all
platforms updated to the new OSL.
Plus there are some crucial fixes baking in the current upstream
sources which we'll need to have for the next Blender release.
Even though it's not 100% clear when we'll switch to OSL-1.6, we'd better
start preparing for it early, so we don't spend time on this later.
Plus this code helps troubleshooting some OSL issues, which requires
testing with the latest versions of OSL.
The issue was caused by GLEW MX being enabled in SCons by default, so
basically the previous commit already fixed the crash. But we need
to be safe here.
For now the fix is simple and not that clean: just check if there's an
OpenGL context available, and if there isn't, we don't do any GLSL magic.
This is to be cleaned up after some discussion with the viewport
project guys.
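
A rough sketch of the shape of the check (the function names here are
hypothetical, not the actual Blender/Cycles API):

/* Hypothetical context query; the real code asks the windowing/GPU
 * layer whether an OpenGL context is currently bound. */
extern bool opengl_context_is_active(void);

void sync_glsl_material(void)
{
  if (!opengl_context_is_active()) {
    /* e.g. background render from the command line:
     * skip all the GLSL magic. */
    return;
  }
  /* ... GLSL shader setup goes here ... */
}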
This mainly happens when over-saturating an already saturated color.
After some discussion with Campbell and loads of tests we decided
to clamp the resulting RGB color. As an alternative we might want to
clamp the corrected HSV values instead, but that would lead to
larger changes in the render results.
TODO: The same is to be done for the compositor nodes.
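
A minimal sketch of the clamp described above (standalone, assuming the
hue/saturation correction has already produced the RGB result):

/* Over-saturating an already saturated color can push individual
 * channels negative; clamp them away rather than clamping the
 * corrected HSV values. */
static void clamp_rgb_result(float rgb[3])
{
  for (int i = 0; i < 3; i++) {
    if (rgb[i] < 0.0f) {
      rgb[i] = 0.0f;
    }
  }
}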
This is basically just a wrapper class, which maps the generic call from the OSL spec to our closures.
Example usage:
shader microfacet_osl(
    color Color = color(0.8),
    int Distribution = 0,
    normal Normal = N,
    vector Tangent = normalize(dPdu),
    float RoughnessU = 0.0,
    float RoughnessV = 0.0,
    float IOR = 1.4,
    int Refract = 0,
    output closure color BSDF = 0)
{
    if (Distribution == 0)
        BSDF = Color * microfacet("ggx", Normal, Tangent, RoughnessU,
                                  RoughnessV, IOR, Refract);
    else
        BSDF = Color * microfacet("beckmann", Normal, Tangent, RoughnessU,
                                  RoughnessV, IOR, Refract);
}
As it appears, we can't really use the Mitchell filter together with the
current filter importance sampling.
This reverts commit 742911314322e5dae3a07469d0ca53b61427f978.
It's the same filter which is used by default by the Blender Internal renderer,
and it gives crisper edges than the Gaussian filter.
The default filter for Cycles is unchanged, because it's unclear whether the
new filter gives more noise or not. After some further real production tests
we can consider making the Mitchell filter the default for Cycles as well.
This way it's easier to extend the bitfields and to see when we start running
out of free bits.
Plus added a brief description of what the SD_VOLUME_CUBIC flag means.
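
A sketch of the layout style (the flag names besides SD_VOLUME_CUBIC and
all the bit positions are illustrative):

enum ShaderDataFlag {
  /* Explicit shifts make it obvious which bits are taken
   * and how many remain free. */
  SD_FLAG_EXAMPLE_A = (1 << 0),   /* illustrative */
  SD_FLAG_EXAMPLE_B = (1 << 1),   /* illustrative */
  /* ... */
  /* Use cubic (instead of linear) interpolation when sampling
   * voxel volume data. */
  SD_VOLUME_CUBIC   = (1 << 10),  /* bit position illustrative */
};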
It is a per-material setting which can be found under the Volume settings
in the material and world context buttons.
There could still be some code-wise improvements, like using a variable-size
macro for interp3d instead of having interp3d_ex to which you can pass the
interpolation method.
This is the first step towards supporting cubic interpolation for voxel
data (such as smoke and fire). It is not exposed to the interface at all
yet; this is to be done soon after this change.
This is the so-called GPU limitation boundary being hit: told the compiler
to NOT inline the volume bound function, otherwise some really weird things
used to happen.
We actually might want to do the same for the CPU; inlining everything is
not the way to get the fastest code.
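
A sketch of the kind of annotation involved (the macro and function here
are illustrative, not the actual kernel code; the qualifier spelling
differs between CUDA and the CPU compilers):

#ifdef __CUDACC__
#  define device_noinline __device__ __noinline__
#else
#  define device_noinline __attribute__((noinline))
#endif

/* Kept out of line so the GPU kernel doesn't blow up when
 * everything else is force-inlined. */
device_noinline bool volume_in_bounds(float t, float t_near, float t_far)
{
  return t >= t_near && t <= t_far;
}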
Quite annoying, it's the same thing we do from the Blender side. But as a
positive side effect we can get rid of some utf8/utf16 conversions.
Hopefully it all works fine now; at least it works on my russki windoze laptop.
Also reduce the number of branches and multiplications a bit by inlining the branches.
This gives a barely measurable speedup, which in the case of BMW is about 2% here.
Not that it gives noticeable render-time changes; it's just weird to
convert float4 to float3 just to access the individual x/y/z components.
Plus some compilers might be more stupid than GCC and not optimize this
out well.
After some discussion with Campbell here we decided it's better to choose an
arbitrary side of the box (in this case the X-axis) and use the image from it.
That's better than rendering blackness.
P.S. This is literally a corner case anyway.
The ray actually should have infinite length, so we can detect a camera
inside a volume which is bigger than the far clipping distance of the camera.
This might also give some speedup (wouldn't expect much though) because we
don't need to re-calculate the ray direction and length after every bounce now.
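
A minimal sketch of the idea (standalone Ray stand-in, not the kernel's
actual type):

#include <cfloat>  /* FLT_MAX */

struct Ray {
  float P[3];  /* origin */
  float D[3];  /* direction */
  float t;     /* maximum intersection distance */
};

/* Practically infinite length: a volume that extends beyond the
 * camera's far clip still registers as containing the camera. */
static void volume_stack_ray_init(Ray &ray)
{
  ray.t = FLT_MAX;
}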
Basically we skip all non-volume objects now in the volume stack function.
Depending on the show it might give some percent of speedup.
Most of the speedup is gained in scenes which have an SSS object
intersecting the volume and taking a reasonable amount of frame space.
The single precision exponent on 64-bit Linux tends to be an order of
magnitude slower than the double precision version, even with the
single<->double precision conversions.
Some feedback on the mailing lists also suggests that logf() is slow, but
this I didn't confirm here in the studio yet.
Depending on the shader setup it gives ~3% with the secret agent shot and up
to around 15% with the BMW scene here.
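
The workaround itself is tiny; a sketch of its general shape (not the exact
Cycles define):

#include <math.h>

/* On affected glibc versions this is faster than calling expf()
 * directly, even with the float<->double conversions. */
static inline float fast_expf(float x)
{
  return (float)exp((double)x);
}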
Quite a straightforward change; the only annoying thing is that we can't use
indentation for the include directives, just because of the way header
inlining works for OpenCL.
Might do a smarter job in path_source_replace_includes(), but don't want to
spend time on this yet.
* On sm_30 and above there is no change (it was not inlined before already); this just fixes a speed regression from yesterday's commit 6359c36ba407.
* On sm_2x (tested with sm_21), I get a nice 8% speedup in the bmw scene with this. As a bonus, cubin compilation time and memory usage is significantly reduced. Regular cubin size went from 2.5MB to 2.0MB, Experimental one from 3.8MB to 2.5MB.
Currently only the summed number of traversal steps and intersections used by
the camera ray intersection pass is implemented, but in the future we will
support more debug passes which will help checking what makes the scene slow.
Examples of such extra passes could be the number of bounces, time spent on
shader tree evaluation and so on.
The implementation on the Cycles side is pretty much straightforward; worth
mentioning here is that it's a build-time option disabled by default.
On the Blender side it's implemented as a PASS_DEBUG with several possible
subtypes. This way we don't need to create an extra DNA pass type for each of
the debug passes, saving us a few bits.
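
A rough sketch of how such counters end up looking (names are illustrative,
not the actual implementation), guarded by the build-time option so regular
kernels pay nothing:

#ifdef WITH_CYCLES_DEBUG_PASSES  /* illustrative option name */
typedef struct DebugData {
  int num_bvh_traversal_steps;
  int num_ray_intersections;
} DebugData;

/* Sprinkled through the BVH traversal loop. */
#  define DEBUG_COUNTER_INC(counter) ((counter) += 1)
#else
/* Compiles away entirely when the option is off. */
#  define DEBUG_COUNTER_INC(counter) ((void)0)
#endif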
Reviewers: campbellbarton
Reviewed By: campbellbarton
Differential Revision: https://developer.blender.org/D813
Was hooked up last year for testing purposes, as we already had some code for it, but the closure itself is neither really good nor really useful, so let's remove it.
This way there are much fewer cross-references between the object and mesh
device update functions.
The only thing remaining is the object bounds calculation, which is needed
by the BVH update. This could also be decoupled, but it's not that crucial
yet because it's how things have been for ages now.
This adds an AABB collision check for objects with volumes, and if a
collision is detected then the object will have the SD_OBJECT_INTERSECTS_VOLUME
flag set.
This solves a speed regression introduced by the fix for T39823, by skipping
the volume stack update in cases where no volume intersects the current SSS object.
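
For reference, the classic AABB overlap test this is based on (a standalone
sketch with a simplified bounds type): two boxes intersect iff their ranges
overlap on all three axes.

struct BoundBox {
  float min[3];
  float max[3];
};

static bool bounds_intersect(const BoundBox &a, const BoundBox &b)
{
  for (int axis = 0; axis < 3; axis++) {
    /* Separated on any axis means no collision. */
    if (a.max[axis] < b.min[axis] || b.max[axis] < a.min[axis]) {
      return false;
    }
  }
  return true;
}

Objects whose bounds pass this test against some volume object get the
SD_OBJECT_INTERSECTS_VOLUME flag; everything else skips the volume stack
update.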
This means it's no longer needed to enable the Experimental feature set in
order to have proper camera-in-volume support. And this also means that if
there's something wrong going on, or if there's a speed regression in cases
when the camera is obviously not in the volume, these issues are to be
reported and handled in the regular manner.
Happy blending!
The idea is to only count intersections with objects which have a volumetric
shader and to ignore all other objects.
This is probably as fast as we can go without involving some fourth-level magic.
Now we do a much better preliminary check of whether the panoramic camera is
inside the volume object's bounds.
Also we're now caching has_volume in the mesh, which avoids unneeded
iterations over each object's shaders.
Should be no functional changes, just faster sync and panoramic-in-volume
rendering.
It works around a strange shading bug when building with MSVC.
If such weirdness continues, we'd perhaps need to use
proper inline flags all the time.
Anyway, let's see how things behave now.
Instead of having a label which basically duplicated the information
about the Experimental feature set being used (which had a bug, because
it claimed the experimental GPU kernel was used even if the compute device
was the CPU, btw), now we've got an enum item icon.
So once you've switched to the Experimental feature set you'll see an
exclamation mark icon in the enum, so you know something might be
unstable or slow.
In order to compile the new kernel you need to specify sm_52 in SCons / CMake, and use CUDA Toolkit 6.5.19, from here: https://developer.nvidia.com/cuda-downloads-geforce-gtx9xx
Note: sm_52 is not enabled by default yet, so it won't be bundled with the Buildbot builds. That will be addressed later.
Basically the title says it all: volume stack initialization now is aware that
the camera might be inside a volume. This gives quite noticeable render time
regressions in cases when the camera is in a volume (didn't measure them yet),
because this requires quite a few ray casts per camera ray in order to check
which objects we're inside of. Not quite sure if this can be optimized.
But the good thing is that we can do quite a good job of detecting whether the
camera is outside of all the volumes, and in this case there should be no
time penalty at all (apart from some extra checks during the sync state).
For now we're only doing rather simple AABB checks between the viewplane and
the volume objects. This could give some false positives, but it should be a
good starting point.
Need to mention panoramic cameras here: for them we only check whether
there are volumes in the scene at all, which leads to speed regressions even
if the camera is outside of the volumes. Would need to figure out a proper
check for such cameras.
There are still quite a few TODOs in the code, but the patch is good enough
to start playing around with, checking whether there are some obvious mistakes
somewhere.
Currently the feature is only available in the Experimental feature set; need
to solve some of the TODOs and look into making things faster before considering
the feature ready for the official feature set. This should still likely
happen in the current release cycle.
Reviewers: brecht, juicyfruit, dingto
Differential Revision: https://developer.blender.org/D794
Basically the title says it all: we need to update the volume stack when doing
ray scattering for SSS. This leads to speed regressions in cases when the scene
has both volumes and SSS (performance in cases with no SSS or no volumes should
be the same).
We might try optimizing kernel_path_subsurface_update_volume_stack() a bit, by
either recording all intersections or using some more appropriate visibility
flags.
Reviewers: brecht, juicyfruit, dingto
Differential Revision: https://developer.blender.org/D795
This is rather useful to see how well optimization went and so on.
Currently it uses quite simple notation: shader nodes are nodes on the
graph, and connections between graph nodes are named by the socket names,
so e.g. a connection between BSDF and Mix would be named bsdf:closure1.
Could be improved in the future to draw a fancier graph, but it's good
enough already.
Use in the following way:
- To create the graphviz file, call graph->dump_graph("graph.dot")
- To visualize the graph, call: dot -Tpng graph.dot -o graph.png
This commit makes it possible to use the Glog library for debug logging.
For now this is only possible when using CMake, and in order to use the
logging the WITH_CYCLES_LOGGING configuration variable is to be enabled.
When this option is not enabled, or when using SCons, there's no difference
in Cycles behavior at all; when using logging with no output going to the
console the impact is going to be minimal.
This is done in order to make it possible to have debug logging persistent
in the code (without the need to add it when troubleshooting some bug and
remove it afterwards).
For now no actual logging is placed yet; only the functions needed for
the logging are written and so on.
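
A minimal sketch of what persistent Glog-based logging looks like at a call
site (the messages are illustrative, not the actual Cycles ones):

#include <glog/logging.h>

int main(int argc, char **argv)
{
  google::InitGoogleLogging(argv[0]);

  LOG(INFO) << "Cycles session created";
  /* Verbose messages stay in the code permanently and are only
   * emitted when the verbosity level is raised at runtime. */
  VLOG(1) << "BVH build finished";
  return 0;
}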
tri_shader no longer needs to be a float.
Reviewers: dingto, sergey
Reviewed By: dingto, sergey
Subscribers: dingto
Projects: #cycles
Differential Revision: https://developer.blender.org/D789
Basically the same as AC2c58e96685e8, but for Mix RGB Shaders, in case we use the Mix type. This way the node can be used as a texture switch for example, setting the Factor to 0.0 or 1.0, without wasting extra memory / render time.
This adds a fresnel conductive OSL preset to the Text Editor. Based on a patch by Lukas Stockner.
Differential Revision: https://developer.blender.org/D145
See the differential for details.
* Fix caustic properties, which were not updated.
* Remove wrong items, leftovers from panel splitting.
* Add missing items. Even if the bundled presets do not set those, a user expects that all properties inside the panel are taken into account, when adding a new preset.
This is a rather legit case which happens e.g. when persistent images are
enabled and the session is updating the lookup tables.
Now device_memory keeps track of the amount of memory allocated on the device,
which makes freeing use the proper allocated size, not the CPU-side buffer
size.
Before this, Cycles used to try using the cache even though it knew for a
fact that reading it from the disk had failed. This change doesn't make
things more stable if someone tries to trick Cycles with malformed data,
but it solves the general case when Blender crashed during the cache write,
and prevents rendering from crashing when trying to use that partial
cache.
I don't see a reason not to do this, and this also fixes update problems when 3D View rendering is running (no volume shader), and then a volume shader gets added.
There were several issues involved in triangle ribbon hair:
- Even for viewport rendering the Blender scene camera was
  used for orientation. This made hair triangles oriented towards
  the scene camera, not the viewport camera.
- Triangle orientation actually assumed the camera is
  perspective. Triangles weren't oriented properly for the
  orthographic camera, resulting in different hair width across
  the hair's length.
These issues are solved now, but there are some related TODOs:
- Rotating the viewport doesn't re-orient the triangles, so after
  viewport navigation hair might not look correct. However,
  with this fix, toggling viewport render (to force a hair sync)
  makes the viewport render correct.
  This isn't such a trivial fix; it would require making the BVH
  aware of the dynamic triangle orientation, so the triangles get
  properly oriented without a full hair re-sync.
- Panorama camera behavior didn't change, but it looks like it
  should; however, not really sure atm what the right thing
  to do here is.
The issue was caused by the changed defaults on the Cycles side.
Because those properties were saved as an IDProp and not saved
to the file, every change to the defaults would ruin someone's
day by updating the values.
Added a bpy.app.handlers.version_update which is run after the regular
do_versions() are done and can be used by scripts to apply
versioning code to their settings.
Reviewers: campbellbarton
Reviewed By: campbellbarton
Differential Revision: https://developer.blender.org/D761
The issue was caused by precision problems which made the division come out
1 under its actual value. We could try some eps magic, but from tests
on a laptop and desktop, doing integer division is not slower than using
floats here.
Thanks to Aldo Zang for the help with the fix for the panorama/fisheye
depth of field calculation and the overall math.
Reviewed By: sergey, dingto
Subscribers: juicyfruit, gregzaal, #cycles, dingto, matray
Differential Revision: https://developer.blender.org/D753
Now we build 2 .cubins per architecture (e.g. kernel_sm_21.cubin, kernel_experimental_sm_21.cubin).
The experimental kernel can be used by switching to the Experimental Feature Set: http://wiki.blender.org/index.php/Doc:2.6/Manual/Render/Cycles/Experimental_Features
This enables Subsurface Scattering and Correlated Multi Jitter Sampling on GPU, while keeping the stability and performance of the regular kernel.
Differential Revision: https://developer.blender.org/D762
Patch by Sergey and myself.
Developer / Builder Note:
CUDA Toolkit 6.5 is highly recommended for this, also note that building the experimental kernel requires a lot of system memory (~7-8GB).
It can be helpful in some cases and it works properly, so no need to hide it behind the experimental flag anymore. It's only enabled for the CPU though.
Limitations:
* Smoke/Fire rendering is *not* supported on GPU yet, that is also documented here: http://wiki.blender.org/index.php/Doc:2.6/Manual/Render/Cycles/Materials/Volume
* Decoupled Ray Marching is also not supported yet, so no Equi-Angular and MIS sampling yet.
Note for Builders and Developers:
* Make sure to use the CUDA Toolkit 6.5 from now on. 6.0 might still work, but can cause slower renders.
This was originally caused by a6ae12a, where I didn't foresee that the unclear
distinction between empty and non-synced meshes would give issues for the
viewport. They're the same for final rendering, but for the viewport we need
to be accurate here.
That was only needed in the beginning, when we did not have support for tangents yet. It's time to clean some of the defines up; it's getting a bit too much.
* __VOLUME__ is basic volume support with Emission and Absorption.
* __VOLUME_SCATTER__ enables volume Scattering support.
* __VOLUME_DECOUPLED__ enables Decoupled Ray Marching.
This problem was introduced in 983cbafd1877f8dbaae60b064a14e27b5b640f18
Basically the issue is that we were not getting a unique index in the
baking routine for the RNG (random number generator).
Reviewers: sergey
Differential Revision: https://developer.blender.org/D749
Operators' poll functions might be called from anywhere in Blender, so they
should not make any assumptions about the available context. Material, lamp
and world are specific to the context of the Properties space...
It now uses the tile size to split the job. For CPU this may add
overhead, but for GPU this is highly needed.
Reviewers: sergey
Differential Revision: https://developer.blender.org/D690
* Don't compute expf() for every step; instead sum the intermediate values and calculate it every N (8 for now) steps. This helps a few percent (~5% on a cube with a wave texture) in my tests here. A sketch of the idea follows.
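
A standalone sketch of that batching (illustrative, not the actual kernel
code):

#include <math.h>

#define EXP_BATCH 8

/* Transmittance over a ray march with fixed step size: accumulate
 * the optical depth and flush it through expf() once per batch
 * instead of once per step; the product of the batched exponentials
 * equals the exponential of the total sum. */
static float march_transmittance(const float *sigma_t, int num_steps, float dt)
{
  float tp = 1.0f;   /* accumulated transmittance */
  float sum = 0.0f;  /* optical depth since last expf() */
  for (int i = 0; i < num_steps; i++) {
    sum += sigma_t[i] * dt;
    if ((i & (EXP_BATCH - 1)) == (EXP_BATCH - 1)) {
      tp *= expf(-sum);
      sum = 0.0f;
    }
  }
  return tp * expf(-sum);  /* flush the remainder */
}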