The idea is to avoid memory allocation when only one segment step is to be allocated.
This gives some speedup which is difficult to measure on this trashcan from hell, but
it's about from 7% to 10% in the extreme case with single volume filling the whole of
the viewport. This seems to depends on the phase of the bug-o-meter in the studio.
On the linux boxes it's not that spectacular speedup, it's about 2% on my laptop and
about 3% on the studio desktop. This is likely because of the awesomeness of jemalloc.
Add compile-time check for particular glibc version which fixed the issue.
This makes it so own-compiled blender is the fastest in the world, and the
only issue remains what should we do for release builds.
After some discussion with Campbell we decided to keep it as is for now
because slowdown is not that much noticeable. We'll disable this workaround
for release builds when all the majority of the distros will switch to the
new version of glibc.
That code was mainly needed for the transition period, now we've
got all platforms updated to new OSL.
Plus there are some crucial fixes baking in the current upstream
sources which we'll need to have for the next Blender release.
Even tho it's not 100% clear when we'll switch to OSL-1.6 we'd better
start preparing earlier for this, so we don't spend time on this later.
Plus this code helps troubleshooting some OSL issues, which requires
testing with latest versions of OSL.
Ghost depends on glew-mx, so glew-mx should be passed to linker after the ghost.
We're also using spaces for indentation in python, including scons rules.
The issue was caused by GLEW MX enabled in SCons by default so
basically previous commit already fixed the crash. But we need
to be safe here.
For now the fix is simple and not that clean, just check if
there's an OpenGL context available and if not we don't do any
GLSL magic.
This is to be cleaned up after some discussion with the viewport
project guys.
This mainly happens when over-saturating already saturated color.
After some discussion with Campbell and loads of tests we decided
to clamp the result RGB color. As an alternative we might want to
clamp corrected HSV values instead, but that would lead to some
larger changes in the render results.
TODO: The same is to be done for compositor nodes.
This is basically just a wrapper class, which maps the generic call from the OSL spec to our closures.
Example usage:
shader microfacet_osl(
color Color = color(0.8),
int Distribution = 0,
normal Normal = N,
vector Tangent = normalize(dPdu),
float RoughnessU = 0.0,
float RoughnessV = 0.0,
float IOR = 1.4,
int Refract = 0,
output closure color BSDF = 0)
{
if (Distribution == 0)
BSDF = Color * microfacet("ggx", Normal, Tangent, RoughnessU, RoughnessV, IOR, Refract);
else
BSDF = Color * microfacet("beckmann", Normal, Tangent, RoughnessU, RoughnessV, IOR, Refract);
}
The ones in extern/glew-es have been changed to NOTE instead of XXX
GHOST_ContextEGL.cpp: It really does seem that it is not possible to query the swap interval using EGL
GHOST_WidnowCocoa.h: The comment referring to Carbon is clearly out of date, so I removed it.
math_geom.c: The node about not using tmax again is correct, but the code is kept for a future maintainer who will need to know how to compute it if they modify that code.
paint_image_proj.c (2698): The question about integer truncation does not appear to have been resolved. It still seems to be an incorrectly implementation of rounding (I'd suggest using the round function instead of this hack).
As it appears we can't really use mitchell filter together with the
current filter importance sampling,
This reverts commit 742911314322e5dae3a07469d0ca53b61427f978.
It's the same filter which is used by default by Blender Internal renderer
and it gives crispier edges than gaussian filter.
Default filter for Cycles is unchanged because it's unclear if new filter
gives more noise or not. After some further real production tests we can
consider making Mitchell filter default for Cycles as well.
This way it's easier to extend bitfields and see when we start running
out of free bits.
Plus added brief description of what SD_VOLUME_CUBIC flag means.
It is per-material setting which could be found under the Volume settings
in the material and world context buttons.
There could still be some code-wise improvements, like using variable-size
macro for interp3d instead of having interp3d_ex to which you can pass the
interpolation method.
This is the first step towards supporting cubic interpolation for voxel
data (such as smoke and fire). It is not epxosed to the interface at all
yet, this is to be done soon after this change.
This is so-called GPU limitation boundary hit, told compiler to NOT include
volume bound function, otherwise some real weird things used to happen.
We actually might want to do the same for CPU, inlining everything is not
the way to get fastest code.
Quite annoying, the same thing we do from the blender side, But as a positive
side we can get rid of some utf8/utf16 conversions.
Hopefully it all work fine now, at leats works on mu russki windoze laptop.
Also reduce number of branching and multiplications a bit by inlining the branches.
This gives an unmeasurable speedup, which is in case of BMW is about 2% here.
This gives more precise information about memory usage which might be real handy
when doing memory optimization.
It works good here for as long as i can tell but if for some reason you'll be
experiencing some weird slowdown please let me know.
Not as if it gives noticeable changes render-time, but it's just weird to
convert float4 to float 3 to just access individual x/y/z components.
Plus some compilers might be more stupid than GCC and don't optimize this
out well.
We queried the wrong value when looking for the bound 2D texture. This
is not totally robust because currently bound texture may not be a 2D
one, but this should work for now.
After discussion with cambo here we decided it's better to choose arbitrary side of the box
(in this case it's X-axis) and use image from it. That's better than doing a blackness.
P.S. This is literally a corner case anyway.
Ray actually should have infinite length, so we can detect camera in a volume
which is bigger that the far clipping of the camera.
This might also give some speedup (wouldn't expect much tho) because we don't
need to re-calculate ray direction and length after every bounce now.
basically we skip all non-volume objects now in the volume stack function.
Depending on the show it might give some percent of speedup.
Most of the speedup would be gained in the scenes when having SSS object
intersecting the volume and taking a reasonable amount of frame space.
Single precision exponent on 64bit linux tends to be order of magnitude slower
than double precision version even with single<->double precision conversion.
Some feedback in the mailing lists also suggests that logf() is also slow, but
this i didn't confirm here in the studio yet.
Depending on the shader setup it gives ~3% with the secret agent shot and up to
around 15% with the bmw scene here.
Quite straightforward change, the only annoying thing is that we can't use
indentation for include directive just because of the way headers inlineing
works for OpenCL.
Might do smarter job in path_source_replace_includes() but don't want to
spend time on this yet.
* On sm_30 and above there is no change (was not inlined already before), this just fixes a speed regression from yesterday. 6359c36ba407
* On sm_2x (tested with sm_21), I get a nice 8% speedup in the bmw scene with this. As a bonus, cubin compilation time and memory usage is significantly reduced. Regular cubin size went from 2.5MB to 2.0MB, Experimental one from 3.8MB to 2.5MB.
Currently only summed number of traversal steps and intersections used by the
camera ray intersection pass is implemented, but in the future we will support
more debug passes which would help checking what things makes the scene slow.
Example of such extra passes could be number of bounces, time spent on the
shader tree evaluation and so.
Implementation from the Cycles side is pretty much straightforward, could only
mention here that it's a build-time option disabled by default.
From the blender side it's implemented as a PASS_DEBUG with several subtypes
possible. This way we don't need to create an extra DNA pass type for each of
the debug passes, saving us a bits.
Reviewers: campbellbarton
Reviewed By: campbellbarton
Differential Revision: https://developer.blender.org/D813
Was hooked up last year for testing purposes, as we already had some code for it, but the closure itself is not really good nor really useful, so let's remove it.
This way there's much less cross-references between objects and meshes
device update functions.
The only thing remained s the object bounds calculation which is needed
by bvh update. This could also be decoupled, but it's not that crucial
yet because its's how it used to be for ages now.
This adds an AABB collision check for objects with volumes and if there's a
collision detected then the object will have SD_OBJECT_INTERSECTS_VOLUME flag.
This solves a speed regression introduced by the fix for T39823 by skipping
volume stack update in cases no volumes intersects the current SSS object.
This means it's no longer needed to enable experimental feature set in order to
have proper camera in volume support. And this also means if there's something
wrong going on, or if there's speed regression for cases when camera is obviously
not in the volume -- this issues are to be reported and handled in the regular
matter.
Happy blending!
The idea is to only count intersections with objects which has volumetric shader
and ignore all other objects.
This is probably as fast as we can go without involving some forth level magic.
Now we do much better preliminary check for panoramic camera is inside the
volume object boundings.
Also we're now cacheing the has_volume in the mesh, which makes it unneeded
iterations for each object's shaders.
Should be no functional changes, just faster sync and panoramic-in-volume
rendering.
This helps to improve the accuracy of UV unwrapping and laplacian deform for
high poly meshes, which could get warped quite badly. It's not much slower,
doubles are pretty fast on modern CPUs, but it does double memory usage. This
seems acceptable as otherwise high poly meshes would not work correctly anyway.
Fixes T39004.
It works around strange shading bug when building with MSVC.
If such weirdeness continues, we perhaps would need to use
proper inline flags all the time.
Anyway, lets see how things will behave now.
Instead of having a label which basically duplicated the information
about experimental feature set being used (which had a bug because
it claimed experimental GPU kernel is used even if compute device is
CPU btw) now we've got an enum item icon.
So once you switched to experimental feature set you'll see an
exclamation mark icon in the enum, so you know something might be
unstable or slow.
Updating maximum requires a bit of a cycle which usually does 1 iteration only,
sometimes needs a bit more but seems there's no speed regressions.
For now the code is commented out. This way it's easier for others to verify
there's no speed regressions.
Reviewers: campbellbarton
Differential Revision: https://developer.blender.org/D626
In order to compile the new kernel you need to specify sm_52 in SCons / CMake, and use CUDA Toolkit 6.5.19, from here: https://developer.nvidia.com/cuda-downloads-geforce-gtx9xx
Note: sm_52 is not enabled per default yet, so it won't be bundled with the Buildbot builds. That will be addressed later.
Basically the title says it all, volume stack initialization now is aware that
camera might be inside of the volume. This gives quite noticeable render time
regressions in cases camera is in the volume (didn't measure them yet) because
this requires quite a few of ray-casting per camera ray in order to check which
objects we're inside. Not quite sure if this might be optimized.
But the good thing is that we can do quite a good job on detecting whether
camera is outside of any of the volumes and in this case there should be no
time penalty at all (apart from some extra checks during the sync state).
For now we're only doing rather simple AABB checks between the viewplane and
volume objects. This could give some false-positives, but this should be good
starting point.
Need to mention panoramic cameras here, for them it's only check for whether
there are volumes in the scene, which would lead to speed regressions even if
the camera is outside of the volumes. Would need to figure out proper check
for such cameras.
There are still quite a few of TODOs in the code, but the patch is good enough
to start playing around with it checking whether there are some obvious mistakes
somewhere.
Currently the feature is only available in the Experimental feature sey, need
to solve some of the TODOs and look into making things faster before considering
the feature is ready for the official feature set. This would still likely
happen in current release cycle.
Reviewers: brecht, juicyfruit, dingto
Differential Revision: https://developer.blender.org/D794
Basically the title says it all, we need to update volume stack when doing ray
scatter for SSS. This leads to speed regressions in cases scene does have both
volume and SSS (performance in case there's no SSS or no volume should be the
same).
We might try optimizing kernel_path_subsurface_update_volume_stack() a bit by
either recording all intersections or using some more appropriate visibility
flags.
Reviewers: brecht, juicyfruit, dingto
Differential Revision: https://developer.blender.org/D795
This is rather useful to see how good optimization went and so.
Currently uses quite simple notation: shader nodes are nodes on the
graph, connects between graph nodes are named by the sockets names,
so i.e. connection between BSDF and Mix would be named bsdf:closure1.
Could be improved in the feature to draw fancier graph, but it's good
enough already.
Use in the following way:
- To create graphix file call graph->dump_graph("graph.dot")
- To visualize the grapf call: dot -Tpng graph.dot -o graph.png
This commit makes it possible to use Glog library for the debug logging.
For now only possible when using CMake and in order to use the logging
the WITH_CYCLES_LOGGING configuration variable is to be enabled.
When this option is not enabled or when using Scons there's no difference
in Cycles behavior at all, when using logging and no output to the console
impact is gonna to be minimal.
This is done in order to make it possible to have debug logging persistent
in code (without need to add it when troubleshooting some bug and removing
it afterwards).
For now actual logging is not placed yet, only all the functions needed for
the logging are written and so.
tri_shader does no longer need to a float.
Reviewers: dingto, sergey
Reviewed By: dingto, sergey
Subscribers: dingto
Projects: #cycles
Differential Revision: https://developer.blender.org/D789
Basically the same as AC2c58e96685e8, but for Mix RGB Shaders, in case we use the Mix type. This way the node can be used as texture switch for example, setting the Factor to 0.0 or 1.0, without wasting extra memory / render time.
This adds a fresnel conductive OSL preset to the Text Editor. Based on a patch by Lukas Stockner.
Differential revision: https://developer.blender.org/D145
See the differential for details.
* Fix caustic properties, was not updated.
* Remove wrong items, leftovers from panel splitting.
* Add missing items. Even if the bundled presets do not set those, a user expects that all properties inside the panel are taken into account, when adding a new preset.
This is rather legit case which happens i.e. when having persistent images enabled
and session is updating the lookup tables.
Now device_memory keeps track of amount of memory being allocated on the device,
which makes freeing using the proper allocated size, not the CPU side buffer
size.
Before this Cycles used to try using the cache even so it knew for the
fact that reading it from the disk failed. This change doesn't make it
more stable if someone will try to trick Cycles and give malformed data
but it solves general cases when Blender crashed during the cache write
and will preserve rendering from crashing when trying to use that partial
cache.
I don't see a reason not to do this, and this also fixes update problems when 3D View rendering is running (no volume shader), and then a volume shader gets added.
There were several issues involved into triangle ribbons hair:
- Even for the viewport rendering the blender scene camera was
used for orientation. This made hair triangles oriented to
the scene camera, not to the viewport camera.
- Triangle orientation was actually supposing the camera is
perspective. Triangles weren't oriented properly for the
orthographic camera resulting in different hair width across
it's length.
This issues are solved now, but there are some related TODOs:
- Rotating viewport doesn't re-orient the triangles, so after
viewport navigation hair might not look correct. However,
with this fix toggling viewport render (to force hair sync)
makes viewport render correct.
This isn't so much trivial fix, would require making BVH
aware of the dynamic triangle orientation, so they get
properly oriented without full hair re-sync.
- Panorama camera behavior didn't change but looks like it
should, however not really sure atm what's the right thing
to do here.
The issue was caused by the changed defaults from the Cycles side.
Because of those properties being saved as an IDProp and not being
saved to the file, every change to the defaults would ruin someone's
day updating the values.
Added a bpy.app.handler.version_update which is run after the regular
do_versions() are done and could be sued by the scripts to apply
versioning code on their settings.
Reviewers: campbellbarton
Reviewed By: campbellbarton
Differential Revision: https://developer.blender.org/D761
Issue was caused by the precision issues which made sdivm by 1 under
it's actual value. We can try to do some eps magic, but from the tests
on laptop and desktop doing integer division is not slower than using
floats here.
Thanks for Aldo Zang for the help with the fix for the panorama/fisheye
depth of field calculation and the overall math.
Reviewed By: sergey, dingto
Subscribers: juicyfruit, gregzaal, #cycles, dingto, matray
Differential Revision: https://developer.blender.org/D753
Now we build 2 .cubins per architecture (e.g. kernel_sm_21.cubin, kernel_experimental_sm_21.cubin).
The experimental kernel can be used by switching to the Experimental Feature Set: http://wiki.blender.org/index.php/Doc:2.6/Manual/Render/Cycles/Experimental_Features
This enables Subsurface Scattering and Correlated Multi Jitter Sampling on GPU, while keeping the stability and performance of the regular kernel.
Differential Revision: https://developer.blender.org/D762
Patch by Sergey and myself.
Developer / Builder Note:
CUDA Toolkit 6.5 is highly recommended for this, also note that building the experimental kernel requires a lot of system memory (~7-8GB).
It can be helpful in some cases and it works properly, so no need to hide it behind the experimental flag anymore. It's only enabled for the CPU though.
Limitations:
* Smoke/Fire rendering is *not* supported on GPU yet, that is also documented here: http://wiki.blender.org/index.php/Doc:2.6/Manual/Render/Cycles/Materials/Volume
* Decoupled Ray Marching is also not supported yet, so no Equi-Angular and MIS sampling yet.
Note for Builders and Developers:
* Make sure to use the CUDA Toolkit 6.5 from now on. 6.0 might still work, but can cause slower renders.
This was originally caused by a6ae12a where i didn't foresee unclear distinguishing
between empty and non-synced meshes will give issues for the viewport. They're the
same for final rendering, but for viewport we need to be accurate here.
That was only needed in the beginning, when we did not had support for tangents. It's time to clean some of the defines up, it's getting a bit too much.
* __VOLUME__ is basic volume support with Emission and Absorption.
* __VOLUME_SCATTER__ enables volume Scattering support.
* __VOLUME_DECOUPLED__ enables Decoupled Ray Marching.
This problem was introduced in 983cbafd1877f8dbaae60b064a14e27b5b640f18
Basically the issue is that we were not getting a unique index in the
baking routine for the RNG (random number generator).
Reviewers: sergey
Differential Revision: https://developer.blender.org/D749
Operators' poll func might be called from anywhere in Blender, so they should
not make any assumption about available context. material, lamp and world
are specific to context from Properties space...
It now uses the tile size to split the job. For CPU this may add
overhead, but for GPU this is highly needed.
Reviewers: sergey
Differential Revision: https://developer.blender.org/D690
* Don't compute expf() for every step, instead sum the intermediate values and calculate it every N (8 for now) steps. This helps a few percent (~5% on a cube with wave texture) in my tests here.
Root of the issue goes back to the on-fly normals commit and the
latest fix for it wasn't actually correct. I've mixed two fixes
in there.
So the idea here goes back to storing negative scaled object flag
and flip runtime-calculated normal if this flag is set, which is
pretty much the same as the original fix for the issue from me.
The issue with motion blur wasn't caused by the rumtime normals
patch and it had issues before, because it already did runtime
normals calculation. Now made it so motion triangles takes the
negative scale flag into account.
This actually makes code more clean imo and avoids rather confusing
flipping code in mesh.cpp.
This is rather workaround solution for now, which seems to
work and it's not that huge to maintain (one liner apart from
the comment).
Idea is to make sure PeekMessage peeks the message when window
proc receives WM_MOUSEWHEEL (some touchpad drivers seems to
swallow the messages making it so PeekMessage doesn't get
anything).
We can't really make CPU and GPU results look the same in all possible
circumstances, but here we can make them look close enough to each other
by making it so sobol pattern for bounce number is the smae for both
CPU and GPU.
This makes CPU and GPU render results look the same with low number of
samples, high number of samples was never an issue.
This commit merges the code in the pie-menu branch.
As per decisions taken the last few days, there are no pie menus
included and there will be an official add-on including overrides of
some keys with pie menus. However, people will now be able to use the
new code in python.
Full Documentation is in http://wiki.blender.org/index.php/Dev:Ref/
Thanks:
Campbell Barton, Dalai Felinto and Ton Roosendaal for the code review
and design comments
Jonathan Williamson, Pawel Lyczkowski, Pablo Vazquez among others for
suggestions during the development.
Special Thanks to Sean Olson, for his support, suggestions, testing and
merciless bugging so that I would finish the pie menu code. Without him
we wouldn't be here. Also to the rest of the developers of the original
python add-on, Patrick Moore and Dan Eicher and finally to Matt Ebb, who
did the research and first implementation and whose code I used to get
started.
- Forgot to handle command line arguments
- Because of the fact we need to be able to
use stdout and stderr we need to use regular
console application for the wrapper.
- Because of using regular application for the
wrapper we need to check forparent PID in the
isStartedFromCommandPrompt().
I really hope it's not gonna to become any more
complicated.
In collaboration with Sergey Sharybin.
Also thanks to Wolfgang Faehnle (mib2berlin) for help testing the
solutions.
Reviewers: sergey
Differential Revision: https://developer.blender.org/D690
For now it was mainly about OpenCL wrangler being duplicated
between Cycles and Compositor, but with OpenSubdiv work those
wranglers were gonna to be duplicated just once again.
This commit makes it so Cycles and Compositor uses wranglers
from this repositories:
- https://github.com/CudaWrangler/cuew
- https://github.com/OpenCLWrangler/clew
This repositories are based on the wranglers we used before
and they'll be likely continued maintaining by us plus some
more players in the market.
Pretty much straightforward change with some tricks in the
CMake/SCons to make this libs being passed to the linker
after all other libraries in order to make OpenSubdiv linked
against those wranglers in the future.
For those who're worrying about Cycles being less standalone,
it's not truth, it's rather more flexible now and in the future
different wranglers might be used in Cycles. For now it'll
just mean those libs would need to be put into Cycles repository
together with some other libs from Blender such as mikkspace.
This is mainly platform maintenance commit, should not be any
changes to the user space.
Reviewers: juicyfruit, dingto, campbellbarton
Reviewed By: juicyfruit, dingto, campbellbarton
Differential Revision: https://developer.blender.org/D707
The issue was caused by the render engine loading edit mesh, which re-allocates
mesh array which might be referenced by other object's derived meshed.
Worst thing about this is that updating render engine happens from the end of
scene update function, after all the objects are updated and so. This is needed
so render engine gets the update objects which is correct.
The only proper way to solve the issue is to make it so viewport engine does not
leave objects in inconsistent state, meaning nobody will reference to freed data.
In order to reach this we do edit mesh loading before running objects update so
all the objects which uses that mesh will have proper references in the derived
mesh.
This also solves old creepyness which happened before when having single object
in edit mode. tweaking it will calculate derived mesh as a part of scene update,
then this derived mesh will be freed by edit mesh loading and viewport will be
creating derived mesh again.
Now render engine is expected to do nothing with meshes which are in edit mode,
but they still need to load edit data for non0meshes. It's not really easy to
do from the BKE level because needed functions are implemented in the editor.
Thanks Campbell for the review!
Differential Revision: https://developer.blender.org/D697
(original patch by Sergey Sharybin)
Note: RNA API can't use size_t at the moment. Once it does this patch
can be tweaked a bit to fully benefit from size_t larger dimensions.
(right now num_pixels is passed as int)
Reviewed By: sergey, campbellbarton
Differential Revision: https://developer.blender.org/D688
uses old behavior again. OSX menu and CTL-CMD-F still work as lion fullscreen as well as right-upper corner fs window-icon
- We must investigate here why double promotion happens from op calls ( dispatchEvents on redraw cause duplicated calls here )
- The actual op calls cause fs to be in a wrong state, so also mousehandles fail and CTX_wm_window(C) is not valid.
- similar problem is with quit op, which does not close the app right ( totblocks )
- i would prefer to try getting direct os function call here rather
Baking progress preview is not possible, in parts due to the way the API
was designed. But at least you get to see the progress bar while baking.
Reviewers: sergey
Differential Revision: https://developer.blender.org/D656