The issue was caused bu the optimization in surface attributes for cases when
there's only a volume shader used. Some attributes doesn't make sense in that
case and were skipped from calculation.
However, it is possible that kernel would still try to access them (because of
the shader setup etc). Prevented an infinite loop in the kernel now, which
should not have much affect on regular renders.
It is possible that ray distance will be zero which would make intersection
refinement return NaN as the refined position which would later lead to all
sort of mathematical issues.
Don't think there are ways to improve intersection accuracy for such rays
so just return original intersection coordinate.
This should fix T43475.
TODO: Need to look into possible issues in Ashikhmin BSDF which might return
zero-length reflected/transmitted ray?
This commit enables BVH leaf nodes split by the primitive type and makes it
so BVH traversal code is now aware and benefits from this.
As was mentioned in original commit, this change is crucial to be able to do
single ray to multiple triangle intersection. But it also appears to give
barely visible speedup in some scene.
In any case there should be no noticeable slowdown, and this change is what
we need to have anyway.
OpenCL apparently does not support templates, so the idea of generic
function for swapping is a bit of a failure. Now it is either inlined
into the code (in triangle intersection) or has specific implementation
for QBVH.
This is probably even better, because we can't create QBVH-specific
function in util_math anyway.
This way we'll be sure (in debug builds) that regular BVH traversal is not used
for QBVH tree (could happen because of mismatch of logic in kernel and render).
This commit implements heuristic which allows to skip nodes pushed to the stack
from intersection if distance to them is larger than the distance to the current
intersection.
This should solve speed regression which i didn't notice in the original QBVH
commit (which could have because i had WIP version of this patch applied in my
local branch).
From quick tests speed seems to be much closer to what is was with regular BVH.
There's still some possible code cleanup, but they'll need a bit of assembly
code check and now i want to make it so artists can happily use Cycles over the
holidays.
basically shadow rays were totally broken and most of the time did not record
any intersections, leading to really ad rendering artifacts.
This commit makes it so regardless of enabled optimization level render result
would be the same.
This issue doesn't happen with 6.5.12 and there's slight piece of hope it'll be
fixed in next toolkit releases..
For now we're forcing CUDA to not inline ray precalculation. This could lead to
some speed regression, but wouldn't expect it to be huge -- this code does not
run that often comparing to actual triangle intersection.
This commit implements traversal for QBVH tree, which is based on the old loop
code for traversal itself and Embree for node intersection.
This commit also does some changes to the loop inspired by Embree:
- Visibility flags are only checked for primitives.
Doing visibility check for every node cost quite reasonable amount of time
and in most cases those checks are true-positive.
Other idea here would be to do visibility checks for leaf nodes only, but
this would need to be investigated further.
- For minimum hair width we extend all the nodes' bounding boxes.
Again doing curve visibility check is quite costly for each of the nodes and
those checks returns truth for most of the hierarchy anyway.
There are number of possible optimization still, but current state is good
enough in terms it makes rendering faster a little bit after recent watertight
commit.
Currently QBVH is only implemented for CPU with SSE2 support at least. All
other devices would need to be supported later (if that'd make sense from
performance point of view).
The code is enabled for compilation in kernel. but blender wouldn't use it
still.
Most of them are not currently used but are essential for the further work.
- CPU kernels with SSE2 support will now have sse3b, sse3f and sse3i
- Added templatedversions of min4, max4 which are handy to use with register
variables.
- Added util_swap function which gets arguments by pointers.
So hopefully it'll be a portable version of std::swap.
Using this paper: Sven Woop, Watertight Ray/Triangle Intersection
http://jcgt.org/published/0002/01/05/paper.pdf
This change is expected to address quite reasonable amount of reports from the
bug tracker, plus it might help reducing the noise in some scenes.
Unfortunately, it's currently about 7% slower than the previous solution with
pre-computed triangle plane equations, but maybe with some smart tweaks to the
code (tests reshuffle, using SIMD in a nice way or so) we can avoid the speed
regression.
But perhaps smartest thing to do here would be to change single triangle / ray
intersection with multiple triangles / ray intersections. That's how Embree does
this and it's watertight single ray intersection is not any faster that this.
Currently only triangle intersection is modified accordingly to the paper, in
the future we would also want to modify the node / ray intersection.
Reviewers: brecht, juicyfruit
Subscribers: dingto, ton
Differential Revision: https://developer.blender.org/D819
Previously offsets were calculated based on the BVH node size,
which is wrong and real PITA in cases when some extra data is
to be added into (or removed from) the node.
Now use offsets which are not calculated form the node size.
Visibility flags are set to all visibility anyway, So there was no reason
to perform that test.
TODO: We need to investigate if having primitive intersection functions
which doesn't do visibility check gives any speedup here as well.
This way extending intersection routines with some pre-calculation step wouldn't
explode the single file size, hopefully keeping them all in a nice maintainable
state.
So now cases when object has both hair motion blur and deformation motion blur
vector pass is all correct.
We could get rid of the flag in the future, still need to look deeper into all
the areas trying to find a more clear solution.
Issue was caused by mismatch in pre/post transform matrix spaces for mesh and
curve vectors. This happened because of current way how static transform apply
works: it only stores post/pre in the world space if there's triangle motion
exists. This lead to situation when there's no triangle motion happening but
was hair motion happening.
After long time of trying to solve it in a nice way, ended up solving it in
a bit slow way -- pre/post transform is still storing in the same spaces as
they used to be stored and just convert hair pre/post position to a world
space in the kernel.
This is because currently it's not so clear how to deal with cases when curve
and mesh motion needs different space of pre/post transform (which happens in
cases when only one of the motions exists).
Would think of some magic, and meanwhile artists could be happy with proper
render results.
It is per-material setting which could be found under the Volume settings
in the material and world context buttons.
There could still be some code-wise improvements, like using variable-size
macro for interp3d instead of having interp3d_ex to which you can pass the
interpolation method.
Not as if it gives noticeable changes render-time, but it's just weird to
convert float4 to float 3 to just access individual x/y/z components.
Plus some compilers might be more stupid than GCC and don't optimize this
out well.
* On sm_30 and above there is no change (was not inlined already before), this just fixes a speed regression from yesterday. 6359c36ba407
* On sm_2x (tested with sm_21), I get a nice 8% speedup in the bmw scene with this. As a bonus, cubin compilation time and memory usage is significantly reduced. Regular cubin size went from 2.5MB to 2.0MB, Experimental one from 3.8MB to 2.5MB.
Currently only summed number of traversal steps and intersections used by the
camera ray intersection pass is implemented, but in the future we will support
more debug passes which would help checking what things makes the scene slow.
Example of such extra passes could be number of bounces, time spent on the
shader tree evaluation and so.
Implementation from the Cycles side is pretty much straightforward, could only
mention here that it's a build-time option disabled by default.
From the blender side it's implemented as a PASS_DEBUG with several subtypes
possible. This way we don't need to create an extra DNA pass type for each of
the debug passes, saving us a bits.
Reviewers: campbellbarton
Reviewed By: campbellbarton
Differential Revision: https://developer.blender.org/D813
The idea is to only count intersections with objects which has volumetric shader
and ignore all other objects.
This is probably as fast as we can go without involving some forth level magic.
tri_shader does no longer need to a float.
Reviewers: dingto, sergey
Reviewed By: dingto, sergey
Subscribers: dingto
Projects: #cycles
Differential Revision: https://developer.blender.org/D789
Root of the issue goes back to the on-fly normals commit and the
latest fix for it wasn't actually correct. I've mixed two fixes
in there.
So the idea here goes back to storing negative scaled object flag
and flip runtime-calculated normal if this flag is set, which is
pretty much the same as the original fix for the issue from me.
The issue with motion blur wasn't caused by the rumtime normals
patch and it had issues before, because it already did runtime
normals calculation. Now made it so motion triangles takes the
negative scale flag into account.
This actually makes code more clean imo and avoids rather confusing
flipping code in mesh.cpp.
Fix T41115: Motion Blur renders Objects Black - But not in Viewport Preview
This actually extends previous fix to normals and makes it all much nicer now.
Worth doing some intense testing, quick one worked just fine but there always
could be some corner cases.
Fix T41079: Solid black render of object with negative scale and smooth shading
In both cases the issue was caused by negative scaled objects with single mesh
users for which scale gets applied when using static BVH.
Since the on-fly normals calculation land normals for such cases weren't flipped
leading them to point to a wrong direction.
Added a special object flag for this, which is a bit of a bummer because now
we've got less bits for real useful things, but this is the only way to get
proper normals without adding more complexity in the on-fly calculations.
* CUDA can be compiled with Volume support again, change line 78 kernel_types.h for that.
Volumes are still fragile on GPU though, got some Memory/Address CUDA errors in tests.. needs to be investigated more deeply.
* Added support for uchar4 attributes to Cycles' attribute system.
* This is used for Vertex Colors now, which saves some memory (4 unsigned characters, instead of 4 floats).
* GPU Texture Limit on sm_20 and sm_21 decreased from 95 to 94, because we need a new texture for the uchar4 attributes. This is no problem for sm_30 or newer.
Part of my GSoC 2014.
Instead of pre-calculation and storage, we now calculate the face normal during render.
This gives a small slowdown (~1%) but decreases memory usage, which is especially important for GPUs,
where you have limited VRAM.
Part of my GSoC 2014.