No need to store them in the class, they're unlikely to be changed
and if they do change we're in big trouble anyway.
More appropriate approach would be then to typedef this things in
kernel_types.h, but still use inlined sizeof(),
This makes OCIO viewport color correction a little bit faster (about -0.5s for 100 samples)
Also set max half float value to 65504.0 to conform with IEEE 754.
Only those ones are priority for now, all the rest are still testable
if CYCLES_OPENCL_TEST or CYCLES_OPENCL_SPLIT_KERNEL_TEST environment
variables are set.
This commit contains all the work related on the AMD megakernel split work
which was mainly done by Varun Sundar, George Kyriazis and Lenny Wang, plus
some help from Sergey Sharybin, Martijn Berger, Thomas Dinges and likely
someone else which we're forgetting to mention.
Currently only AMD cards are enabled for the new split kernel, but it is
possible to force split opencl kernel to be used by setting the following
environment variable: CYCLES_OPENCL_SPLIT_KERNEL_TEST=1.
Not all the features are supported yet, and that being said no motion blur,
camera blur, SSS and volumetrics for now. Also transparent shadows are
disabled on AMD device because of some compiler bug.
This kernel is also only implements regular path tracing and supporting
branched one will take a bit. Branched path tracing is exposed to the
interface still, which is a bit misleading and will be hidden there soon.
More feature will be enabled once they're ported to the split kernel and
tested.
Neither regular CPU nor CUDA has any difference, they're generating the
same exact code, which means no regressions/improvements there.
Based on the research paper:
https://research.nvidia.com/sites/default/files/publications/laine2013hpg_paper.pdf
Here's the documentation:
https://docs.google.com/document/d/1LuXW-CV-sVJkQaEGZlMJ86jZ8FmoPfecaMdR-oiWbUY/edit
Design discussion of the patch:
https://developer.blender.org/T44197
Differential Revision: https://developer.blender.org/D1200
The goal is to be able to compile kernel with nodes which are actually needed
to render current scene, hence improving performance of the kernel,
The idea is:
- Have few node groups, starting with a group which contains nodes are used
really often, and then couple of groups which will be extension of this one.
- Have feature-based nodes disabling, so it's possible to disable nodes related
to features which are not used with the currently used nodes group.
This commit only lays down needed routines for this approach, actual split will
happen later after gathering statistics from bunch of production scenes.
This will be used by split kernel in order to compile most optimal kernel.
Maximum number of closures is actually being cached in the session, so viewport
rendering will not trigger kernel re-loading when number of closures goes down.
This is currently unused but crucial for things like calculating amount of
device memory required to deal with the tasks.
Maybe not really best place to store it, but consider it good enough for now.
Previously we only had experimental flag passed to device's load_kernel() which
was all fine. But since we're gonna to have some extra parameters passed there
it makes sense to wrap them into a single struct, which will make it easier to
pass stuff around.
This header was already included into some of the implementation files already,
and this change is needed for some upcoming changes in the way how kernel_types.h
works.
Previously it was only set at compilation time which is all fine but does
not let us to check which closure the node corresponds to prior to the
compilation.
Now we calculate color in range 800..12000 using an approximation a/x+bx+c for R and G and ((at + b)t + c)t + d) for B.
Max absolute error for RGB for non-lut function is less than 0.0001, which is enough to get the same 8 bit/channel color as for OSL with a noticeable performance difference.
However there is a slight visible difference between previous non-OSL implementation because of lookup table interpolation and offset-by-one mistake.
The previous implementation gave black color outside of soft range (t > 12000), now it gives the same color as for 12000.
Also blackbody node without input connected is being converted to value input at shader compile time.
Reviewers: dingto, sergey
Reviewed By: dingto
Subscribers: nutel, brecht, juicyfruit
Differential Revision: https://developer.blender.org/D1280
This way it is possible to have viewport simplification bumped all the way up,
making viewport really responsive but still have final render to use highest
subdivision possible.
Reviewers: lukastoenne, campbellbarton, dingto
Reviewed By: campbellbarton, dingto
Subscribers: dingto, nutel, eyecandy, venomgfx
Differential Revision: https://developer.blender.org/D1273
This replaces sequential ray moving followed with scene intersection with
single BVH traversal, which gives us all possible intersections.
Only implemented for CPU, due to qsort and a bigger memory usage on GPU
which we rather avoid. GPU still uses the regular bvh volume intersection code, while CPU now uses the new code.
This improves render performance for scenes with:
a) Camera inside volume mesh
b) SSS mesh intersecting a volume mesh/domain
In simple volume files (not much geometry) performance is roughly the same
(slightly faster). In files with a lot of geometry, the performance
increase is larger. bmps.blend with a volume shader and camera inside the
mesh, it renders ~10% faster here.
Patch by Sergey and myself.
Differential Revision: https://developer.blender.org/D1264
Object flags are depending on bounding box which is only available after
mesh synchronization.
This was broken since 7fd4c44 which happened quite close to the release
and oddly enough was not sopped by anyone. Render test is coming for this.
Was spotted by Thomas Dinges while working on another patch.