Old code was working quite unreliable in combination with fast math
flag, especially when compiling with Clang. It seems we were hitting
result of the following bug submitted to Clang [1].
Basically, it was happening so that (int)sqrtf(64) was 7 when Cycles
is built with Clang but was correct 8 when built with GCC.
This commit works this around. Annoying, but don't see other way to
keep sampling pattern the same for Clang and GCC.
[1] https://bugs.llvm.org//show_bug.cgi?id=24063
The issue was happening when display device is set to None, which makes it so
all the color transformation is a no-op which does not really require any LUT.
This is something we can not know from Blender side easily, because LUT sampling
and related logic is fully done in OCIO library itself. The following happens:
- OCIO sees that no LUT is needed and uses simple pass-through logic in the
color conversion function.
- GLSL compiles sees that uniform used for LUT is unused in the GLSL code and
strips it out.
We can not know this from Blender side because technically any conversion to
the same space might be a no-op and that we wouldn't know without some tricky
parse of the OCIO configuration.
So for now we simply avoid crash but are disabling checks for existence of the
uniform.
Ideally would be nice to have some GLSL-code parses which gets the uniforms
from the code itself, so we can distinguish between typo in the uniform name
and uniform being optimized out.
This commit contains the first part of the new Cycles denoising option,
which filters the resulting image using information gathered during rendering
to get rid of noise while preserving visual features as well as possible.
To use the option, enable it in the render layer options. The default settings
fit a wide range of scenes, but the user can tweak individual settings to
control the tradeoff between a noise-free image, image details, and calculation
time.
Note that the denoiser may still change in the future and that some features
are not implemented yet. The most important missing feature is animation
denoising, which uses information from multiple frames at once to produce a
flicker-free and smoother result. These features will be added in the future.
Finally, thanks to all the people who supported this project:
- Google (through the GSoC) and Theory Studios for sponsoring the development
- The authors of the papers I used for implementing the denoiser (more details
on them will be included in the technical docs)
- The other Cycles devs for feedback on the code, especially Sergey for
mentoring the GSoC project and Brecht for the code review!
- And of course the users who helped with testing, reported bugs and things
that could and/or should work better!
Adds 2 static unsigned int to Gawain and GPUTexture to roughly keep track of the memory stats on the GPU.
Stats can be view at the bottom of the GPU stats with debug value > 20.
The issue was caused by unlimited textures commit, root of the issue is that
displacement code updates some of the image slots directly, so it needs to
ensure device vectors are all proper size.
Previously, every RenderPass would have a bitfield that specified its type. That limits the number of passes to 32, which was reached a while ago.
However, most of the code already supported arbitrary RenderPasses since they were also used to store Multilayer EXR images.
Therefore, this commit completely removes the passflag from RenderPass and changes all code to use the unique pass name for identification.
Since Blender Internal relies on hardcoded passes and to preserve compatibility, 32 pass names are reserved for the old hardcoded passes.
To support these arbitrary passes, the Render Result compositor node now adds dynamic sockets. For compatibility, the old hardcoded sockets are always stored and just hidden when the corresponding pass isn't available.
To use these changes, the Render Engine API now includes a function that allows render engines to add arbitrary passes to the render result. To be able to add options for these passes, addons can now add their own properties to SceneRenderLayers.
To keep the compositor input node updated, render engine plugins have to implement a callback that registers all the passes that will be generated.
From a user perspective, nothing should change with this commit.
Differential Revision: https://developer.blender.org/D2443
Differential Revision: https://developer.blender.org/D2444
Reduce thread divergence in kernel_shader_eval.
Rays are sorted in blocks of 2048 according to shader->id.
On R9 290 Classroom is ~30% faster, and Pabellon Barcelone is ~8% faster.
No sorting for CUDA split kernel.
Reviewers: sergey, maiself
Reviewed By: maiself
Differential Revision: https://developer.blender.org/D2598
Previously the logic was different for duplis and regular objects: regular objects
were using render visibility when Render Layer option is enabled which duplis were
always using viewport visibility when rendering from the viewport.
This was quite confusing because caused different results in viewport and render
when artists were expecting them to match 1:1.
This implements branched path tracing for the split kernel.
General approach is to store the ray state at a branch point, trace the
branched ray as normal, then restore the state as necessary before iterating
to the next part of the path. A state machine is used to advance the indirect
loop state, which avoids the need to add any new kernels. Each iteration the
state machine recreates as much state as possible from the stored ray to keep
overall storage down.
Its kind of hard to keep all the different integration loops in sync, so this
needs lots of testing to make sure everything is working correctly. We should
probably start trying to deduplicate the integration loops more now.
Nonbranched BMW is ~2% slower, while classroom is ~2% faster, other scenes
could use more testing still.
Reviewers: sergey, nirved
Reviewed By: nirved
Subscribers: Blendify, bliblubli
Differential Revision: https://developer.blender.org/D2611
We can now use object and other modes on top of Cycles.
Since we are now always on "render_to_view" (old Rendered mode), the
pause button is always visible.