This way we'll be sure (in debug builds) that regular BVH traversal is not used
for QBVH tree (could happen because of mismatch of logic in kernel and render).
The reason for this is that we don't sue SSE optimization for 32bit platforms
because of T36316.
Things to look into:
- Nail the root of the issue of that report
- Implement non-SSE traversal code for QBVH
Seems the parent check didn't go deep enough and only checked single parent.
Now it checks the chain of parents which seems to be correct but requires
much more intense testing.
This commit implements heuristic which allows to skip nodes pushed to the stack
from intersection if distance to them is larger than the distance to the current
intersection.
This should solve speed regression which i didn't notice in the original QBVH
commit (which could have because i had WIP version of this patch applied in my
local branch).
From quick tests speed seems to be much closer to what is was with regular BVH.
There's still some possible code cleanup, but they'll need a bit of assembly
code check and now i want to make it so artists can happily use Cycles over the
holidays.
basically shadow rays were totally broken and most of the time did not record
any intersections, leading to really ad rendering artifacts.
This commit makes it so regardless of enabled optimization level render result
would be the same.
This issue doesn't happen with 6.5.12 and there's slight piece of hope it'll be
fixed in next toolkit releases..
For now we're forcing CUDA to not inline ray precalculation. This could lead to
some speed regression, but wouldn't expect it to be huge -- this code does not
run that often comparing to actual triangle intersection.
We might remove this again in the future, but for testing purposes
during the release cycle, this will be useful.
The setting defaults to QBVH, and can be found in the Performance panel.
This is harmless for now because tail of the node is zero in there, but better
to fix it early so in the case of extending BVH nodes this code doesn't give
issues.
This commit enables QBVH optimization structure automatically if rendering
with CPU and SSE2 support is detected.
This brings render time of agent shot back to the speed it used to be before
the watertight intersections commit, single koro and sponza scenes are about
7% faster here.
This commit implements traversal for QBVH tree, which is based on the old loop
code for traversal itself and Embree for node intersection.
This commit also does some changes to the loop inspired by Embree:
- Visibility flags are only checked for primitives.
Doing visibility check for every node cost quite reasonable amount of time
and in most cases those checks are true-positive.
Other idea here would be to do visibility checks for leaf nodes only, but
this would need to be investigated further.
- For minimum hair width we extend all the nodes' bounding boxes.
Again doing curve visibility check is quite costly for each of the nodes and
those checks returns truth for most of the hierarchy anyway.
There are number of possible optimization still, but current state is good
enough in terms it makes rendering faster a little bit after recent watertight
commit.
Currently QBVH is only implemented for CPU with SSE2 support at least. All
other devices would need to be supported later (if that'd make sense from
performance point of view).
The code is enabled for compilation in kernel. but blender wouldn't use it
still.
Most of them are not currently used but are essential for the further work.
- CPU kernels with SSE2 support will now have sse3b, sse3f and sse3i
- Added templatedversions of min4, max4 which are handy to use with register
variables.
- Added util_swap function which gets arguments by pointers.
So hopefully it'll be a portable version of std::swap.
Using this paper: Sven Woop, Watertight Ray/Triangle Intersection
http://jcgt.org/published/0002/01/05/paper.pdf
This change is expected to address quite reasonable amount of reports from the
bug tracker, plus it might help reducing the noise in some scenes.
Unfortunately, it's currently about 7% slower than the previous solution with
pre-computed triangle plane equations, but maybe with some smart tweaks to the
code (tests reshuffle, using SIMD in a nice way or so) we can avoid the speed
regression.
But perhaps smartest thing to do here would be to change single triangle / ray
intersection with multiple triangles / ray intersections. That's how Embree does
this and it's watertight single ray intersection is not any faster that this.
Currently only triangle intersection is modified accordingly to the paper, in
the future we would also want to modify the node / ray intersection.
Reviewers: brecht, juicyfruit
Subscribers: dingto, ton
Differential Revision: https://developer.blender.org/D819
The idea is to store visibility flags for leaf nodes only since visibility check
for inner nodes costs too much for QBVH hence it is not optimal to perform.
Leaf QBVH nodes have plenty of space to store all sort of flags, so we can make
nodes one element smaller, saving noticeable amount of memory.
Previously offsets were calculated based on the BVH node size,
which is wrong and real PITA in cases when some extra data is
to be added into (or removed from) the node.
Now use offsets which are not calculated form the node size.
This solves quite an over-allocation in BVH instances packing code,
unfortunately, it's not a magic bullet to solve memory bump caused
by the recent QBVH changes.
For that we'll likely need to decouple storage for leaf and inner
nodes. However, it's not really clear for now if it's something
important since that'd still be just a fraction of memory comparing
to all the hi-res textures.
Title says it all, quite straightforward implementation.
Would only mention that there's a bit of code duplication around packing node
into pack.nodes. Trying to de-duplicate it ends up in quite hairy code (like
functions with loads of arguments some of which could be NULL in certain
circumstances etc..). Leaving solving this duplication for later.
Before all the nodes were counted and allocated, leading to situations when
bunch of allocated memory is not used because reasonable amount of nodes are
simply ignored.
Visibility flags are set to all visibility anyway, So there was no reason
to perform that test.
TODO: We need to investigate if having primitive intersection functions
which doesn't do visibility check gives any speedup here as well.
This way extending intersection routines with some pre-calculation step wouldn't
explode the single file size, hopefully keeping them all in a nice maintainable
state.
These nodes were assuming sRGB input/output which is for sure wrong for the
shader pipeline which works in the linear space.
So now conversion to/from linear space happens in these nodes which makes them
making sence in the shader context but which might change look and feel of
existing scenes.
The issue was caused by the way how RNA pointer was created for the bMain:
namely Cycles was using RNA_id_pointer_create to create the pointer, which
would then try to refine the poniter based on the ID type.
This is just wrong and worked so far just because of co-incident, with the
file path from the bug report first letters in the ID name happened to be
NT which corresponds to NodeTree, and for sure refining such pointer will
fail.
Simple solution -- use proper way to create RNA pointer for non-ID block.
As discussed in rB983c71931b1886d4, we should print a warning in case of building on non-Windows and WITH_BF_IME enabled. We also terminate build in this case, so the warning isn't scrolled away. Was worked out together with @sergey.
This was caused by some internal optimization which evaluated SSS with
size of zero as BSDF but used different ID so the evaluation result
didn't appear in regular diffuse pass.
This lead to situation when SSS data was nowhere stored if the
size was zero.
Now SSS with zero size and close-to-zero sizes will be handled in the
same way from the passes point of view.
Original patch by @random (D765) with some minor work done by @campbell
and me.
At this place, I'd like call out a number of people who were involved and
deserve a big "Thank you!":
* At the first place @randon who developed and submitted the patch
* The Blendercn community which helped a lot with testing - espacially
* @yuzukyo, @leon_cheung and @kjym3
* @campbellbarton, @mont29 and @sergey for their help and advises during
* review
* @ton who realized the importance of this early on and asked me for
* reviewing
We are still not finished, as this is only the first part of the
implementaion, but there's more to come!
SpaceMouse Wireless
SpaceMouse Pro Wireless
Device info is from user reports. I don’t yet have the new devices, so
these are untested but likely to work :D
This way CUDA errors are visible in the image info line,
which makes things to behave the same across viewport and
final rendering.
That's right, we've got error reported via reports and info
line now. This is based on the feedback from our gooseberry
team.
This way when something goes wrong in Cycles (for example out of VRAM, timelimit
launching the kernel etc) we'll have a nice report in the Info space header.
Sure it'll be nice to have mention of error in the image editor's information
line, but that's for the future.
This fixes T42747: "CUDA error" appears only momentarily, then disappears
Currently it acts the same as set_cancel(), but this way we're able to
distinguish situations when rendering was aborted by user demand (for
example pressing Esc in standalone renderer) or if something went horribly
wrong (for example out of VRAM error).
This way for example we wouldn't wait a fortune while BVH is building after
GPU run out of memory when loading images just to see the render failure
message.
Forbid OSL from polluting current conext with obscure stuff from
windows.h, it's not useful and unhealthy anyway.
Maybe we sohuld also forbid using abbreviated Glog constants as
well tho.
Since the aligned allocation of shader closures in OSL memory pool
this workaround is no longer needed.
Also put a comment which describes the desired layout of the structure
so array of shader closures is all nicely aligned.
This solves bugs like T42210 which are caused by compiler being
smart and using some SSE instructions to operate with closure
classes, which was failing because those classes are not allocated
by the regular allocator but allocated in memory pool in OSL.
With newer versions of OSL it is now possible to force closure
classes being aligned to a given boundary and this commit uses
this new functionality.
Unfortunately, it means we're no longer compatible with older
versions of OSL, only latest git version from upstream and our
branch at github are supported:
https://github.com/Nazg-Gul/OpenShadingLanguage/tree/blender-fixes
For OSX and Windows it's not an issue because libraries are
already updated there, Linux users would need to run install_deps
script.
Fixes the problem that for big sequences too many file handles were open at the same time.
Changes the playback handles that the audio sequencing code manages to be closed and reopened when needed. The metric used is the current playback position in relation to the strip. If the strip is more than 10 seconds (configurable) away from the playback cursor, the handle is released and reopened when needed.
See D915.
This patch includes the work done in the terrible consequencer branch
that hasn't been merged to master minus a few controversial and WIP
stuff, like strip parenting, new sequence data structs and cuddly
widgets.
What is included:
* Strip extensions only when slipping. It can very easily be made an
option but with a few strips with overlapping durations it makes view
too crowded and difficult to make out.
* Threaded waveform loading + code that restores waveforms on undo (not
used though, since sound_load recreates everything. There's a patch for
review D876)
* Toggle to enable backdrop in the strip sequence editor
* Toggle to easily turn on/off waveform display
* Snapping during transform on sequence boundaries. Snapping to start or
end of selection depends on position of mouse when invoking the operator
* Snapping of timeline indicator in sequencer to strip boundaries. To
use just press and hold ctrl while dragging.
Reviewers: campbellbarton
Differential Revision: https://developer.blender.org/D904
The issue was noticed with gcc-4.7 (used by the release build environment)
which didn't generate optimal enough code for BVH references swap. Seems it
looked up for the assign operator for each of the reference structure members
even though nothing special was required for assignment.
Forcing compiler to use simple memory copy gives speedup of like 2-3 times.
The issue doesn't happen with OSX's clang and new gcc-4.9, but since we're
gonna to stick to gcc-4.7 for official releases for quite some time still it's
nice to have performance issues resolved for all the compilers.
Didn't put the code into #ifdef so if in the future some issues appears with
alignment or assignment which need to happen as an operator we notice this
earlier.
* Alpha Property was removed (Fix T42690)
* Some tweaks to make the panel look better again.
* Use abreviated form "Multiple Importance" everywhere, for consistency.
So now cases when object has both hair motion blur and deformation motion blur
vector pass is all correct.
We could get rid of the flag in the future, still need to look deeper into all
the areas trying to find a more clear solution.
Issue was caused by mismatch in pre/post transform matrix spaces for mesh and
curve vectors. This happened because of current way how static transform apply
works: it only stores post/pre in the world space if there's triangle motion
exists. This lead to situation when there's no triangle motion happening but
was hair motion happening.
After long time of trying to solve it in a nice way, ended up solving it in
a bit slow way -- pre/post transform is still storing in the same spaces as
they used to be stored and just convert hair pre/post position to a world
space in the kernel.
This is because currently it's not so clear how to deal with cases when curve
and mesh motion needs different space of pre/post transform (which happens in
cases when only one of the motions exists).
Would think of some magic, and meanwhile artists could be happy with proper
render results.
This is mainly to address old issue when one need to have SDL library installed
in order to use our official builds. Some hip distros already installs SDL,
but it's not quite the same across all the variety of the distros.
We also now switching to SDL-2.0, most of the distros have it in repositories
already, so it shouldn't be huge deal to install it if needed.
Reviewers: campbellbarton
Reviewed By: campbellbarton
Differential Revision: https://developer.blender.org/D878
It appears it's not really needed for convenient debugging when
using proper flags passed to the compiler. Basically, it is -g3
and set breakpoint to a function as if it's not in the namespace.
Not as if a code was any wrong, just it's possible to have more
clear solution for the issue i've tried to solve in the past.
Create unique flag for output shaders with displacement data and use it
to calculate transformed normal. Implementation suggested by Brecht Van
Lommel.
Reviewers: brecht
Differential Revision: https://developer.blender.org/D890
The idea is to avoid memory allocation when only one segment step is to be allocated.
This gives some speedup which is difficult to measure on this trashcan from hell, but
it's about from 7% to 10% in the extreme case with single volume filling the whole of
the viewport. This seems to depends on the phase of the bug-o-meter in the studio.
On the linux boxes it's not that spectacular speedup, it's about 2% on my laptop and
about 3% on the studio desktop. This is likely because of the awesomeness of jemalloc.
Add compile-time check for particular glibc version which fixed the issue.
This makes it so own-compiled blender is the fastest in the world, and the
only issue remains what should we do for release builds.
After some discussion with Campbell we decided to keep it as is for now
because slowdown is not that much noticeable. We'll disable this workaround
for release builds when all the majority of the distros will switch to the
new version of glibc.
That code was mainly needed for the transition period, now we've
got all platforms updated to new OSL.
Plus there are some crucial fixes baking in the current upstream
sources which we'll need to have for the next Blender release.
Even tho it's not 100% clear when we'll switch to OSL-1.6 we'd better
start preparing earlier for this, so we don't spend time on this later.
Plus this code helps troubleshooting some OSL issues, which requires
testing with latest versions of OSL.
Ghost depends on glew-mx, so glew-mx should be passed to linker after the ghost.
We're also using spaces for indentation in python, including scons rules.
The issue was caused by GLEW MX enabled in SCons by default so
basically previous commit already fixed the crash. But we need
to be safe here.
For now the fix is simple and not that clean, just check if
there's an OpenGL context available and if not we don't do any
GLSL magic.
This is to be cleaned up after some discussion with the viewport
project guys.
This mainly happens when over-saturating already saturated color.
After some discussion with Campbell and loads of tests we decided
to clamp the result RGB color. As an alternative we might want to
clamp corrected HSV values instead, but that would lead to some
larger changes in the render results.
TODO: The same is to be done for compositor nodes.
This is basically just a wrapper class, which maps the generic call from the OSL spec to our closures.
Example usage:
shader microfacet_osl(
color Color = color(0.8),
int Distribution = 0,
normal Normal = N,
vector Tangent = normalize(dPdu),
float RoughnessU = 0.0,
float RoughnessV = 0.0,
float IOR = 1.4,
int Refract = 0,
output closure color BSDF = 0)
{
if (Distribution == 0)
BSDF = Color * microfacet("ggx", Normal, Tangent, RoughnessU, RoughnessV, IOR, Refract);
else
BSDF = Color * microfacet("beckmann", Normal, Tangent, RoughnessU, RoughnessV, IOR, Refract);
}
The ones in extern/glew-es have been changed to NOTE instead of XXX
GHOST_ContextEGL.cpp: It really does seem that it is not possible to query the swap interval using EGL
GHOST_WidnowCocoa.h: The comment referring to Carbon is clearly out of date, so I removed it.
math_geom.c: The node about not using tmax again is correct, but the code is kept for a future maintainer who will need to know how to compute it if they modify that code.
paint_image_proj.c (2698): The question about integer truncation does not appear to have been resolved. It still seems to be an incorrectly implementation of rounding (I'd suggest using the round function instead of this hack).
As it appears we can't really use mitchell filter together with the
current filter importance sampling,
This reverts commit 742911314322e5dae3a07469d0ca53b61427f978.
It's the same filter which is used by default by Blender Internal renderer
and it gives crispier edges than gaussian filter.
Default filter for Cycles is unchanged because it's unclear if new filter
gives more noise or not. After some further real production tests we can
consider making Mitchell filter default for Cycles as well.
This way it's easier to extend bitfields and see when we start running
out of free bits.
Plus added brief description of what SD_VOLUME_CUBIC flag means.
It is per-material setting which could be found under the Volume settings
in the material and world context buttons.
There could still be some code-wise improvements, like using variable-size
macro for interp3d instead of having interp3d_ex to which you can pass the
interpolation method.
This is the first step towards supporting cubic interpolation for voxel
data (such as smoke and fire). It is not epxosed to the interface at all
yet, this is to be done soon after this change.
This is so-called GPU limitation boundary hit, told compiler to NOT include
volume bound function, otherwise some real weird things used to happen.
We actually might want to do the same for CPU, inlining everything is not
the way to get fastest code.