This is an initial implementation of the BVH8 optimization structure
and packed triangle intersection. The aim is to get faster ray
to scene intersection checks.
Scene                BVH4      BVH8
barbershop_interior  10:24.94  10:10.74
bmw27                02:41.25  02:38.83
classroom            08:16.49  07:56.15
fishy_cat            04:24.56  04:17.29
koro                 06:03.06  06:01.45
pavillon_barcelona   09:21.26  09:02.98
victor               23:39.65  22:53.71
As far as memory goes, peak usage rises by about 4.7% in complex
scenes.
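The timings listed above can be compared numerically. This is a minimal sketch, assuming the values are formatted as minutes:seconds; the function names are hypothetical and not part of Cycles.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Convert a "MM:SS.ss" render time string into seconds.
// Assumption: the text always contains exactly one ':' separator.
double render_time_to_seconds(const std::string &text)
{
    const std::size_t colon = text.find(':');
    assert(colon != std::string::npos);
    const double minutes = std::stod(text.substr(0, colon));
    const double seconds = std::stod(text.substr(colon + 1));
    return minutes * 60.0 + seconds;
}

// Render time saved by BVH8 relative to BVH4, in percent.
double time_saved_percent(const std::string &bvh4, const std::string &bvh8)
{
    const double t4 = render_time_to_seconds(bvh4);
    const double t8 = render_time_to_seconds(bvh8);
    return (t4 - t8) / t4 * 100.0;
}
```

For example, time_saved_percent("08:16.49", "07:56.15") gives the relative saving for the classroom scene, which comes out at roughly 4%.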
Note that BVH8 is disabled when using OSL. This is because the OSL
kernel does not get per-microarchitecture optimizations and
hence always assumes BVH3 is used.
Original BVH8 patch from Anton Gavrikov.
Batched triangles intersection from Victoria Zhislina.
Extra work and tests and fixes from Maxym Dmytrychenko.
* Use common TextureInfo struct for all devices, except CUDA fermi.
* Move image sampling code to kernels/*/kernel_*_image.h files.
* Use arrays for data textures on Fermi too, so device_vector<Struct> works.
There were the following issues with ccl_restrict_ptr:
- We already had ccl_restrict for all platforms.
- It was secretly adding a `const` qualifier to the declaration,
which is quite weird since a non-const pointer can also be
declared as restricted.
- We never use foo_ptr or FooPtr type definitions in Blender,
so not sure why we should introduce such a thing here.
- It is absolutely wrong from a semantic point of view to put the pointer
into the restrict macro -- const is a part of the type, not part of
a hint for the compiler that some pointer is never aliased.
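The separation of `const` and restrict argued for above can be sketched as follows. The macro name example_restrict is hypothetical (it only stands in for a restrict macro), and `__restrict` is a compiler extension supported by GCC, Clang and MSVC rather than standard C++.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a restrict macro. It expands to the aliasing
// hint only and says nothing about constness.
#define example_restrict __restrict

// `out` is restricted but writable, `in` is restricted and const.
// The const qualifier is written as part of the type, next to the hint.
void scale_array(float *example_restrict out,
                 const float *example_restrict in,
                 std::size_t n,
                 float factor)
{
    // The restrict hint promises that the two ranges do not alias.
    assert(out != in || n == 0);
    for (std::size_t i = 0; i < n; i++) {
        out[i] = in[i] * factor;
    }
}
```

With this layout a non-const pointer can be declared restricted, which a macro that silently adds `const` would not allow.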
This commit contains the first part of the new Cycles denoising option,
which filters the resulting image using information gathered during rendering
to get rid of noise while preserving visual features as well as possible.
To use the option, enable it in the render layer options. The default settings
fit a wide range of scenes, but the user can tweak individual settings to
control the tradeoff between a noise-free image, image details, and calculation
time.
Note that the denoiser may still change in the future and that some features
are not implemented yet. The most important missing feature is animation
denoising, which uses information from multiple frames at once to produce a
flicker-free and smoother result. These features will be added in the future.
Finally, thanks to all the people who supported this project:
- Google (through the GSoC) and Theory Studios for sponsoring the development
- The authors of the papers I used for implementing the denoiser (more details
on them will be included in the technical docs)
- The other Cycles devs for feedback on the code, especially Sergey for
mentoring the GSoC project and Brecht for the code review!
- And of course the users who helped with testing, reported bugs and things
that could and/or should work better!
The issue was caused by a recent change in the inline policy.
There is some sort of memory corruption happening here; ASAN suggests
it's a stack overflow issue. Not quite sure why it is happening though, and
I was not able to solve anything here in the past hours.
Committing a fix which works, with a big TODO note.
The issue is visible on an AVX2 machine when rendering cycles_reports_test.
Ever since we merged the extra texture types (half etc.) and the split kernel, the compile time for cycles_kernel has been getting out of control.
It's currently sitting at a cool 1295.762 seconds with our standard compiler (2013/x64/release).
I'm not entirely sure why MSVC gets upset with it, but the inlining of the matrix near the bottom of the tri-cubic 3D interpolator is the source of the issue; this patch excludes it from being inlined.
This patch brings it back down to a manageable 186 seconds (7x faster!!).
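Excluding a function from inlining can be sketched like this. The macro and function names are hypothetical and the interpolator is a simplified 1D stand-in, not the actual tri-cubic code; `__declspec(noinline)` and `__attribute__((noinline))` are the MSVC and GCC/Clang spellings.

```cpp
#include <cassert>

// Hypothetical never-inline macro covering MSVC and GCC/Clang.
#if defined(_MSC_VER)
#  define example_never_inline __declspec(noinline)
#else
#  define example_never_inline __attribute__((noinline))
#endif

// Simplified stand-in for a heavy helper: cubic B-spline interpolation
// of four samples. Keeping it out of line means the compiler emits it
// once instead of expanding it at every call site.
example_never_inline float example_cubic_interp(const float samples[4], float t)
{
    assert(t >= 0.0f && t <= 1.0f);
    const float t2 = t * t;
    const float t3 = t2 * t;
    const float s = 1.0f - t;
    // Uniform cubic B-spline basis weights; they sum to one.
    const float w0 = s * s * s / 6.0f;
    const float w1 = (3.0f * t3 - 6.0f * t2 + 4.0f) / 6.0f;
    const float w2 = (-3.0f * t3 + 3.0f * t2 + 3.0f * t + 1.0f) / 6.0f;
    const float w3 = t3 / 6.0f;
    return w0 * samples[0] + w1 * samples[1] + w2 * samples[2] + w3 * samples[3];
}
```

Whether this helps compile time depends on the compiler; the numbers quoted above are for the MSVC 2013 build only.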
With the attached bzzt.blend that @sergey kindly provided I got the following results with builds with identical hashes:
58:51.73 buildbot
58:04.23 Patched
It's really close. The slight speedup could be explained by the switch replacing multiple ifs (a switch can generate more optimal code than a chain of if/else statements), but in all honesty it might just have been pure luck (dev box, very polluted, bad for benchmarks). Regardless, this patch doesn't seem to slow anything down in my limited testing.
{F532336}
{F532337}
Reviewers: brecht, lukasstockner97, juicyfruit, dingto, sergey
Reviewed By: brecht, dingto, sergey
Subscribers: InsigMathK, sergey
Tags: #cycles
Differential Revision: https://developer.blender.org/D2595
The idea is to make include statements more explicit, so it is obvious where a
file is coming from, additionally reducing the chance of the wrong header being
picked up.
For example, it was not obvious whether bvh.h was referring to the builder
or the traversal, whether node.h is a generic graph node or a shader node,
and cases like that.
Surely this might look obvious to the active developers, but after some
time of not touching the code it becomes less obvious where a file is coming
from.
This was briefly mentioned in T50824 and it seems @brecht is fine with such
explicitness, but we need to agree with all active developers before committing
this.
Please note that this patch is lacking changes related to GPU/OpenCL
support. This will be solved if/when we all agree this is a good idea to move
forward with.
Reviewers: brecht, lukasstockner97, maiself, nirved, dingto, juicyfruit, swerner
Reviewed By: lukasstockner97, maiself, nirved, dingto
Subscribers: brecht
Differential Revision: https://developer.blender.org/D2586
Based on existing ssef data type and to my knowledge it's also what happens in
Embree nowadays.
Inspired by Maxym Dmytrychenko and required for the upcoming triangle
intersection commit.
Hopefully the copyright message is correct.
This way OpenCL devices can also benefit from a smaller memory footprint, when using e.g. bumpmaps (greyscale, 1 channel).
Additional target for my GSoC 2016.
This is an initial commit for half texture support in Cycles.
It adds the basic infrastructure inside of the ImageManager and support for these textures on CPU.
Supported:
* Half Float OpenEXR images (can be used for e.g. HDRs or normal maps) now use half the memory when loaded from disk (OIIO).
ToDo:
Various things like support for inbuilt half textures, GPU... will come later, step by step.
Part of my GSoC 2016.
Compile time per kernel increased a lot after recent image commits; re-shuffle some code to fix this.
Patch by "LazyDodo".
Differential Revision: https://developer.blender.org/D2012
This way, we also save 3/4 of the memory for single channel byte textures (e.g. Bump Maps).
Note: In order for this to work, the texture *must* have 1 channel only.
In Gimp you can e.g. do that via the menu: Image -> Mode -> Grayscale
Until now, single channel textures were packed into a float4, wasting 3 floats per pixel. Memory usage of such textures is now reduced by 3/4.
Voxel Attributes such as density, flame and heat benefit from this, but also Bumpmaps with one channel.
This commit also includes some cleanup and code deduplication for image loading.
Example Smoke render from Cosmos Laundromat: http://www.pasteall.org/pic/show.php?id=102972
Memory here went down from ~600MB to ~300MB.
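The per-texture saving can be checked with simple arithmetic. This is a sketch with a hypothetical helper, assuming tightly packed pixels and 4-byte floats; it covers a single texture only, not the whole-scene figures quoted above.

```cpp
#include <cassert>
#include <cstddef>

// Bytes needed for a tightly packed texture or voxel grid.
// Assumption: no padding or alignment overhead between pixels.
std::size_t texture_bytes(std::size_t width,
                          std::size_t height,
                          std::size_t depth,
                          std::size_t channels,
                          std::size_t bytes_per_channel)
{
    assert(channels >= 1 && channels <= 4);
    return width * height * depth * channels * bytes_per_channel;
}
```

For a 256x256x256 density grid, texture_bytes(256, 256, 256, 4, sizeof(float)) is four times texture_bytes(256, 256, 256, 1, sizeof(float)), which is the 3/4 reduction described above.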
Reviewers: #cycles, brecht
Differential Revision: https://developer.blender.org/D1981
Basically the idea is to make code robust against extending
enum options in the future by falling back to a known safe
default setting when RNA is set to something unknown.
While this approach solves issues similar to T47377,
it wouldn't really help when/if any of the RNA values
ever gets deprecated and removed. There'll be no simple
solution to that apart from defining an explicit mapping from
the RNA value to the Cycles one.
Another part which isn't so great actually is that we now
have to have some enum guards and give some explicit values
to the enum items, but we can live with that perhaps.
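A minimal sketch of such a guarded lookup, assuming a hypothetical enum and helper (these are not the actual Cycles or RNA names): the enum items get explicit values, a trailing guard item gives the number of known values, and anything outside that range falls back to the default.

```cpp
#include <cassert>

// Hypothetical enum with explicit values and a trailing guard item.
enum ExampleFilterType {
    EXAMPLE_FILTER_BOX = 0,
    EXAMPLE_FILTER_GAUSSIAN = 1,
    EXAMPLE_FILTER_BLACKMAN_HARRIS = 2,

    EXAMPLE_FILTER_NUM_TYPES,  // guard: number of known values
};

// Convert a raw integer (e.g. read from a property) into an enum value,
// falling back to a known safe default for anything unknown.
template<typename T>
T example_enum_from_int(int value, int num_values, T default_value)
{
    assert(static_cast<int>(default_value) >= 0 &&
           static_cast<int>(default_value) < num_values);
    if (value < 0 || value >= num_values) {
        return default_value;
    }
    return static_cast<T>(value);
}
```

This only works while the integer values stay contiguous, which is why explicit values and the guard item are needed.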
Reviewers: dingto, juicyfruit, lukasstockner97, brecht
Reviewed By: brecht
Differential Revision: https://developer.blender.org/D1785
Fix T45769: Image Texture Node clipping bug
Simple mistakes in the normalized/pixel-space coordinates handling.
Render tests for this feature are coming.
Clipping wasn't working entirely correctly; we need to check the original
coordinates, not the integer ones.
Now CPU gives exactly the same results for both SVM and OSL; CUDA is still doing
something crazy with edges.
Currently only two mappings are supported by the API: Repeat (old behavior)
and the new Clip behavior. Internally this extension is converted to the periodic
flag, which was already supported but wasn't exposed.
There's no support for OpenCL yet because of the way we pack images into a
single texture.
Those settings are not exposed to UI or anywhere else and there should be no
functional changes so far.
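The difference between the Repeat and Clip mappings can be sketched with a closest-pixel lookup. All names here are hypothetical and this is not the Cycles image sampling code; it only illustrates why the clip test is done on the original normalized coordinates. A truncating integer conversion maps slightly negative values such as -0.2 to pixel 0, so a test on the integer coordinate would treat them as inside the image.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

enum ExampleExtension { EXAMPLE_EXTENSION_REPEAT, EXAMPLE_EXTENSION_CLIP };

// Closest-pixel lookup in a single channel image with normalized (u, v).
float example_sample_closest(const std::vector<float> &pixels,
                             int width,
                             int height,
                             float u,
                             float v,
                             ExampleExtension extension)
{
    assert(width > 0 && height > 0);
    assert(pixels.size() == static_cast<std::size_t>(width) * height);

    if (extension == EXAMPLE_EXTENSION_CLIP) {
        // Test the original coordinates, before any integer conversion.
        if (u < 0.0f || v < 0.0f || u >= 1.0f || v >= 1.0f) {
            return 0.0f;
        }
    }

    int ix = static_cast<int>(std::floor(u * width));
    int iy = static_cast<int>(std::floor(v * height));

    if (extension == EXAMPLE_EXTENSION_REPEAT) {
        // Periodic wrap that also handles negative coordinates.
        ix = ((ix % width) + width) % width;
        iy = ((iy % height) + height) % height;
    }
    else {
        // Guard against rounding at the upper edge.
        ix = ix < width ? ix : width - 1;
        iy = iy < height ? iy : height - 1;
    }
    return pixels[static_cast<std::size_t>(iy) * width + ix];
}
```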
This commit contains all the work related to the AMD megakernel split work,
which was mainly done by Varun Sundar, George Kyriazis and Lenny Wang, plus
some help from Sergey Sharybin, Martijn Berger, Thomas Dinges and likely
someone else whom we're forgetting to mention.
Currently only AMD cards are enabled for the new split kernel, but it is
possible to force split opencl kernel to be used by setting the following
environment variable: CYCLES_OPENCL_SPLIT_KERNEL_TEST=1.
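Reading such a switch can be sketched as below. The helper names are hypothetical and the exact parsing Cycles applies to the value is an assumption; only the variable name comes from the text above. `std::getenv` returns a null pointer when the variable is not set.

```cpp
#include <cstdlib>
#include <cstring>

// Assumption: only the exact value "1" enables the flag.
bool example_flag_enabled(const char *value)
{
    return value != nullptr && std::strcmp(value, "1") == 0;
}

// Check whether the split kernel test was requested through the environment.
bool example_split_kernel_test_requested()
{
    return example_flag_enabled(std::getenv("CYCLES_OPENCL_SPLIT_KERNEL_TEST"));
}
```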
Not all the features are supported yet: there is no motion blur,
camera blur, SSS or volumetrics for now. Also, transparent shadows are
disabled on AMD devices because of a compiler bug.
This kernel also only implements regular path tracing, and supporting
the branched one will take a bit. Branched path tracing is still exposed in the
interface, which is a bit misleading, and it will be hidden there soon.
More features will be enabled once they're ported to the split kernel and
tested.
Neither regular CPU nor CUDA has any difference, they're generating the
same exact code, which means no regressions/improvements there.
Based on the research paper:
https://research.nvidia.com/sites/default/files/publications/laine2013hpg_paper.pdf
Here's the documentation:
https://docs.google.com/document/d/1LuXW-CV-sVJkQaEGZlMJ86jZ8FmoPfecaMdR-oiWbUY/edit
Design discussion of the patch:
https://developer.blender.org/T44197
Differential Revision: https://developer.blender.org/D1200
The goal is to be able to compile the kernel with only the nodes which are actually
needed to render the current scene, hence improving performance of the kernel.
The idea is:
- Have a few node groups, starting with a group which contains nodes that are used
really often, and then a couple of groups which will be extensions of this one.
- Have feature-based node disabling, so it's possible to disable nodes related
to features which are not used with the currently used node group.
This commit only lays down the routines needed for this approach; the actual split
will happen later, after gathering statistics from a bunch of production scenes.
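The two-level filtering described above could look roughly like this. All names are hypothetical, and the check is written as a runtime function for illustration; in a kernel the same decision would more likely be made at compile time.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical node groups: level 0 holds the most commonly used nodes,
// each higher level extends the previous one.
enum ExampleNodeGroup {
    EXAMPLE_NODE_GROUP_LEVEL_0 = 0,
    EXAMPLE_NODE_GROUP_LEVEL_1 = 1,
    EXAMPLE_NODE_GROUP_LEVEL_2 = 2,
};

// Hypothetical feature bits a node may depend on.
enum ExampleNodeFeature : std::uint32_t {
    EXAMPLE_NODE_FEATURE_NONE = 0,
    EXAMPLE_NODE_FEATURE_VOLUME = 1u << 0,
    EXAMPLE_NODE_FEATURE_HAIR = 1u << 1,
    EXAMPLE_NODE_FEATURE_BUMP = 1u << 2,
};

struct ExampleNodeInfo {
    int group;               // lowest group that contains the node
    std::uint32_t features;  // features the node requires
};

struct ExampleKernelConfig {
    int max_group;                   // highest group compiled into the kernel
    std::uint32_t enabled_features;  // features compiled into the kernel
};

// A node is included only if its group is enabled and every feature
// it requires is enabled as well.
bool example_node_is_compiled(const ExampleNodeInfo &node,
                              const ExampleKernelConfig &config)
{
    if (node.group > config.max_group) {
        return false;
    }
    return (node.features & ~config.enabled_features) == 0;
}
```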
This commit implements traversal for QBVH tree, which is based on the old loop
code for traversal itself and Embree for node intersection.
This commit also does some changes to the loop inspired by Embree:
- Visibility flags are only checked for primitives.
Doing a visibility check for every node costs a considerable amount of time,
and in most cases those checks pass anyway.
Another idea here would be to do visibility checks for leaf nodes only, but
this would need to be investigated further.
- For minimum hair width we extend all the nodes' bounding boxes.
Again, doing a curve visibility check for each of the nodes is quite costly, and
those checks return true for most of the hierarchy anyway.
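The bounding box extension from the second item can be sketched with a scalar slab test. This is an illustration with hypothetical names, not the SSE node intersection from the commit: every box is grown by a fixed amount on all sides instead of testing per node whether it contains curves.

```cpp
#include <algorithm>
#include <cassert>

struct ExampleBox {
    float min[3];
    float max[3];
};

// Slab test of a ray against a box grown by `extend` on every side.
// P is the ray origin, idir the per-axis inverse of the ray direction,
// tmax the maximum ray distance.
bool example_ray_hits_extended_box(const float P[3],
                                   const float idir[3],
                                   float tmax,
                                   const ExampleBox &box,
                                   float extend)
{
    assert(extend >= 0.0f);
    float tnear = 0.0f;
    float tfar = tmax;
    for (int axis = 0; axis < 3; axis++) {
        const float t0 = (box.min[axis] - extend - P[axis]) * idir[axis];
        const float t1 = (box.max[axis] + extend - P[axis]) * idir[axis];
        tnear = std::max(tnear, std::min(t0, t1));
        tfar = std::min(tfar, std::max(t0, t1));
    }
    return tnear <= tfar;
}
```

A ray that narrowly misses the exact box can still hit the extended one, so traversal descends into nodes that may contain curves wider than their control points suggest.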
There are a number of possible optimizations still, but the current state is good
enough in that it makes rendering a little bit faster after the recent watertight
commit.
Currently QBVH is only implemented for CPUs with at least SSE2 support. All
other devices would need to be supported later (if that'd make sense from
a performance point of view).
The code is enabled for compilation in the kernel, but Blender doesn't use it
yet.
Most of them are not currently used but are essential for the further work.
- CPU kernels with SSE2 support will now have sse3b, sse3f and sse3i
- Added templated versions of min4, max4 which are handy to use with register
variables.
- Added util_swap function which gets arguments by pointers.
So hopefully it'll be a portable version of std::swap.
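A minimal sketch of what such helpers might look like, with hypothetical names and scalar types only (the actual versions are meant to be used with register variables):

```cpp
#include <algorithm>
#include <cassert>

// Swap two values passed by pointer.
template<typename T>
void example_swap(T *a, T *b)
{
    assert(a != nullptr && b != nullptr);
    T tmp = *a;
    *a = *b;
    *b = tmp;
}

// Minimum and maximum of four values of the same type.
template<typename T>
T example_min4(const T &a, const T &b, const T &c, const T &d)
{
    return std::min(std::min(a, b), std::min(c, d));
}

template<typename T>
T example_max4(const T &a, const T &b, const T &c, const T &d)
{
    return std::max(std::max(a, b), std::max(c, d));
}
```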