This way, connecting Value or RGB node to e.g. a Math node will still allow folding.
Note: The same should be done for the ConvertNode, but I leave that for another day.
Previously RGB Curves node will clamp input to 0..1 which is rather useless
when one wants to use HDR image textures and do bit of correction on them.
Now kernel code supports extrapolation of baked LUT based on first/last two
table points and performs linear extrapolation.
The only tricky part is to guess the range to bake the LUT for. Currently
it's using simple approach -- minmax of the input curves. While this behaves
ok for the simple cases it's easy to trick the system up causing incorrect
results.
Not sure we can solve those issues in a general case and since the new code
is giving more expected results it's not that bad actually. In the worst
case artist migh always create explicit point to make sure LUT is created
for the needed HDR range.
Reviewers: brecht, juicyfruit
Subscribers: sebastian_k
Differential Revision: https://developer.blender.org/D1658
This must have happened months ago, but as I did not `make clean` any build folder since then,
so only noted that today.
Issue is same as dirty patch we have to apply to ODL sources before building it in install_deps.sh - for
some mysterious reason, it has become impossible to compoile .osl files into .oso ones without
giving explicit output file name (otherwise it just produces `.oso` file - utterly stupid and useless).
We could probably fix that in own OSL source, but think being explicit here does not hurt anyway, so...
Let's go the easy way.
This reduces stress on the the stack memory which could be really handy
on certain operation systems which applies strict limits on the stack.
Reviewers: brecht, juicyfruit, dingto
Reviewed By: brecht, juicyfruit, dingto
Differential Revision: https://developer.blender.org/D1656
There were multiple issues which are solved now:
- It was possible that ray wouldn't be bounced off the BSSRDF, for example
when PDF or shader eval is zero. In this case PathState might have been
left in pre-bounced state which would have been gave incorrect shading
results.
This is solved by having separate PathState for each of the hits.
- Path radiance summing wasn't happening correct as well, indirect rays
were using wrong path radiance in the case when there were more than
one hit recorded.
This is now using a bit trickier state machine which calculates path
radiance for just SSS (both direct and indirect) and then sums it back
to the final radiance.
- Previous commit wasn't totally correct either and was an induced bug
due to wrong path state left from the "un-happened" ray bounce.
There should be no special case happening here, BSSRDFs will be replaced
with diffuse ones due to PATH_RAY_DIFFUSE_ANCESTOR flag.
- Merged back codebases for "delayed" and "immediate" indirect SSS ray
tracing, hopefully making it easier to maintain the codebase.
Sure this changes brings memory usage back by about 4-5%, but overall
it's still about 2x memory reduction for the experimental kernel here.
Thanks Brecht for the review!
This is actually how it was intended to work, just didn't notice it wasn't
really happening in the main ray loop.
Solves some memory issues reported in T46880.
There are some issues to be solved with the recent optimization we did for
the indirect rays for the SSS. Those issues will take a bit of a time to
be fully solved still and we need to unlock Caminandes team now, so let's
revert some changes back.
CUDA will still use delayed indirect rays since it's an experimental
feature.
For the details about what's to be done still please refer to T46880.
This wasn't really a complete fix and only worked if there was a single scatter
event recorded only. Proper fix requires some more thoughts to make it correct
without memory use increase.
This reverts commit bf9e88bfbebaf5c6228363560970fa526e779c8b.
Radiance sum and reset was happening in different order after 26f1c51.
This is a quick fix to unlock Caminandes team, perhaps we can avoid having
separate variable to detect when radiance is to be sum.
Previously render nodes will be always created with just a VECTOR socket
type and then those sockets will try to be set as all point, vector and
normal to work around lack of such a subtype distinguishing in blender.
This change makes it so subtype is being queried from OSL itself and
proper subtupe is being used for socket.
It's still not in use for the official builds because it requires changes
applied recently on the 1.7 branch of OSL:
https://github.com/imageworks/OpenShadingLanguage/commit/f70e58f
This solves artists confusion reported in T46117.
Reviewers: #cycles, juicyfruit
Reviewed By: #cycles, juicyfruit
Subscribers: juicyfruit
Differential Revision: https://developer.blender.org/D1627
Graph::disconnect() actually modifies links, needs to create a copy to iterate
if disconnect happens form inside the loop.
Question tho whether we can control this somehow..
Reported by BzztPloink in IRC, thanks!
The issue was caused by possible use of object->derivedFinal from the render
thread, The patch tries to eliminate (or at least minimize, huh) amount of
access to the derivedFinal of a source object. It's still possible that in
the case of particle source derived mesh will be still unsafely used, but
with the patch applied we can easily change runtime part of the code and
cache derived mesh on the preparation stage.
Some ideas for the future:
- Check whether cache() was called on the point density node when calling
calc().
- Cache derivedMesh in the runtime part of point density node to avoid
possible remained thread conflicts.
- NULL the runtime part of the node on .blend load
Reviewers: campbellbarton, plasmasolutions
Reviewed By: plasmasolutions
Differential Revision: https://developer.blender.org/D1614
Seems set_intersection() requires passing explicit comparator if non-default
one is used for the sets. A bit weird, but can't really find another explanation
here about whats' going on here.
The issue was caused by not really optimal graph traversal for gathering nodes
dependencies which could have exponential complexity with a long tree branches
connected with multiple connections between them.
Now we optimize the depth traversal and perform early output if the node was
already traversed.
Please note that this adds some limitations to the use of SVM compiler's
find_dependencies() in the cases when skip_node is not NULL and one wants to
perform dependencies find sequentially with the same set. This doesn't happen
in the code, but one should be aware of this.
The issue was than nodes dependencies were stored as set<ShaderNode*> which
is actually a so called "strict weak ordered", meaning order of nodes in
the set is strictly defined, but based on the ShaderNode pointer. This means
that between different render invokations order of original nodes could be
different due to different pointers allocated for ShaderNode.
This commit makes it so dependencies and maps used for ShaderNodes are based
on the node->id which has much more predictable order. It's still possible
to trick the system by doing some crazy edits during viewport rendfer and
cause difference between viewport and final render stacks.
Reviewers: brecht
Reviewed By: brecht
Subscribers: LazyDodo
Differential Revision: https://developer.blender.org/D1630
This gives much lower stack usage on GPU and reduces kernel memory size to
around 448MB on GTX560Ti (comparing to 652MB with previous commit and 946MB
with official release). There's also a barely measurable speedup of around
5%, but this is to be confirmed still.
At this stage we're using only ~3% for the experimental kernel and SSS
rendering seems to be faster by 40% and after some further testing we might
consider making SSS and CMJ official features and remove experimental
precompiled kernels.
The idea is to delay shooting indirect rays for the SSS sampling and
trace them after the main integration loop was finished.
This reduces GPU stack usage even further and brings it down to around
652MB (comparing to 722MB before the change and 946MB with previous
stable release).
This also solves the speed regression happened in the previous commit
and now simple SSS scene (SSS suzanne on the floor) renders in 0:50
(comparing to 1:16 with previous commit and 1:03 with official release).
This commit introduces a SSS-oriented intersection structure which is replacing
old logic of having separate arrays for just intersections and shader data and
encapsulates all the data needed for SSS evaluation.
This giver a huge stack memory saving on GPU. In own experiments it gave 25%
memory usage reduction on GTX560Ti (722MB vs. 946MB).
Unfortunately, this gave some performance loss of 20% which only happens on GPU.
This is perhaps due to different memory access pattern. Will be solved in the
future, hopefully.
Famous saying: won in memory - lost in time (which is also valid in other way
around).
This is sort of extension of existing Use Environment option which now allows to
disable AO on the render layer basis.
Useful in cases like disabling AO for the background because it might make it
too flat and so.
Reviewers: juicyfruit, dingto, brecht
Reviewed By: brecht
Subscribers: eyecandy, venomgfx
Differential Revision: https://developer.blender.org/D1633
This gives few percent extra memory saving for the CUDA kernel when
using regular path tracing.
Still more like an experiment, but will be handy in the future.
Summary: By calculating the Camera-to-Screen-Matrix first, one inversion can be saved in the Camera sync.
It won't really improve speed and/or precision, it's mainly a small cleanup.
Reviewers: sergey, dingto
Subscribers:
Gives few percent of memory improvement for regular feature set kernel
and could give significant memory improvement for Experimental kernel.
It could also give some degree of performance improvement, but this I
didn't really measure reliably yet.
Code is ifdef-ed for now, since it's only working on Linux and requires
CUDA toolkit to be installed (other platform only use precompiled
kernels).
This is just an experiment for now and a base for the proper feature
support in the future (with runtime compilation using CUDA 7?).
This just replaces internal argument `experimental` with `requested_features`
making it possible to access particular requested settings when building
kernels.
Basically we can not use sharp closure as a substitude when filter glossy is
used. This is because we can not blur sharp reflection/refraction.
This is quite quick and not really clean implementation. Not really happy
with manual handling of original settings, but this is as good as we can do
in the quick patch. It's a good acknowledgment and we now can re-consider
some aspects of graph simplification to make such cases more natively
supported.
P.S. This failure would have been shown by our regression tests, so please,
bother a bit to run Cycles's test sweep before doing such optimizations.
This commit adds the Blackman-Harris windows function as a pixel filter to Cycles. On some cases, such as wireframes or high-frequency textures,
Blackman-Harris can give subtle but noticable improvements over the Gaussian window.
Also, the gaussian window was truncated too early, which degraded quality a bit, therefore the evaluation region is now three times as wide.
To avoid artifacts caused by the wider curve, the filter table size is increased to 1024.
Reviewers: #cycles
Differential Revision: https://developer.blender.org/D1453