The Progress system in Cycles had two limitations so far:
- It just counted tiles, but ignored their size. For example, when rendering a 600x500 image with 512x512 tiles, the right 88x500 tile would count for 50% of the progress, although it only covers 15% of the image.
- Scene update time was incorrectly counted as rendering time - therefore, the remaining time started very long and gradually decreased.
This patch fixes both problems:
First of all, the Progress now has a function to ignore time spans, and that is used to ignore scene update time.
The larger change is the tile size: Instead of counting samples per tile, so that the final value is num_samples*num_tiles, the code now counts every sample for every pixel, so that the final value is num_samples*num_pixels.
Along with that, some unused variables were removed from the Progress and Session classes.
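As a rough illustration of the tile-size change, the sketch below compares the old tile-count weighting with the new pixel-based weighting for the 600x500 example above. The `Tile` struct and all function names are hypothetical and are not taken from the actual Progress class. Every pixel is assumed to receive the same number of samples, so num_samples cancels out of the ratio.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical tile record, for illustration only.
struct Tile {
    int width;
    int height;
    bool done;
};

// Split an image into tiles of at most tile_size x tile_size pixels.
std::vector<Tile> make_tiles(int image_w, int image_h, int tile_size)
{
    assert(image_w > 0 && image_h > 0 && tile_size > 0);
    std::vector<Tile> tiles;
    for (int y = 0; y < image_h; y += tile_size) {
        for (int x = 0; x < image_w; x += tile_size) {
            tiles.push_back({std::min(tile_size, image_w - x),
                             std::min(tile_size, image_h - y),
                             false});
        }
    }
    return tiles;
}

// Old scheme: every finished tile counts the same, regardless of size.
double progress_by_tiles(const std::vector<Tile> &tiles)
{
    if (tiles.empty()) {
        return 0.0;
    }
    std::size_t done = 0;
    for (const Tile &tile : tiles) {
        if (tile.done) {
            done++;
        }
    }
    return double(done) / double(tiles.size());
}

// New scheme: every finished tile is weighted by its pixel count.
double progress_by_pixels(const std::vector<Tile> &tiles)
{
    long long done = 0, total = 0;
    for (const Tile &tile : tiles) {
        const long long pixels = (long long)tile.width * tile.height;
        total += pixels;
        if (tile.done) {
            done += pixels;
        }
    }
    return total > 0 ? double(done) / double(total) : 0.0;
}
```

With only the right 88x500 tile finished, `progress_by_tiles()` reports 0.5 while `progress_by_pixels()` reports roughly 0.147.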
Reviewers: brecht, sergey, #cycles
Subscribers: brecht, candreacchio, sergey
Differential Revision: https://developer.blender.org/D2214
They are defined for MSVC but seem to be missing in GCC and Clang 3.8.
Maybe some further tweaks to the policy of when to define those functions are
needed, but this should be fine for now.
I can no longer reproduce the crash with either of the files where
it was originally visible. This is something where other
changes (light threshold, sampling) had an effect and made the code
work as it is supposed to. Could have been an optimizer issue
or something like that.
Let's see if we hit the same issue again.
The main intention is to give a quick way to control a scene's memory
usage by clamping textures which are too big. This is really handy
in the early production stages, when you first create really nice
looking hi-res textures and only start investing time in optimizing
your scene once it all works and is approved.
This is a new option in the Scene Simplify panel and it acts as
follows: when the texture size is bigger than the given value, it is
scaled down by half repeatedly until it fits into the given limit.
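The halving rule described above can be sketched as follows. This is only an illustration: `Size` and `clamp_texture_size` are hypothetical names, and the sketch assumes the limit is compared against both width and height, which the commit message does not spell out.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical 2D texture size, for illustration only.
struct Size {
    std::size_t width;
    std::size_t height;
};

// Halve both dimensions until neither exceeds the limit.
// Assumption: the limit applies to the larger of the two dimensions.
Size clamp_texture_size(Size size, std::size_t limit)
{
    assert(limit > 0);
    while (size.width > limit || size.height > limit) {
        // Never go below one pixel in either dimension.
        size.width = std::max<std::size_t>(1, size.width / 2);
        size.height = std::max<std::size_t>(1, size.height / 2);
    }
    return size;
}
```

For example, an 8192x4096 texture with a limit of 2048 is halved twice and ends up at 2048x1024, while a texture already within the limit is returned unchanged.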
There are various possible improvements, such as:
- Use threaded scaling using our own task manager.
This is actually one of the main reasons why image resize is
manually implemented instead of using OIIO's resize. The other
reason is that the API seems too limited to construct a 3D texture
description easily.
- Vectorization of uchar4/float4/half4 textures.
- Use something smarter than box filter.
Was playing with some other filters, but not sure they are
really better: they tend to cause fuzzier edges.
Even with such TODOs in the code, the option is already quite
useful.
Reviewers: brecht
Reviewed By: brecht
Subscribers: jtheninja, Blendify, gregzaal, venomgfx
Differential Revision: https://developer.blender.org/D2362
This saves a memory copy, which will be particularly useful for network rendering.
Reviewers: sergey, brecht, dingto, juicyfruit, maiself
Differential Revision: https://developer.blender.org/D2323
In scenes with many lights, some of them might have a very small contribution to some pixels, but the shadow rays are traced anyway.
To avoid that, this patch adds probabilistic termination to light samples - if the contribution before checking for shadowing is below a user-defined threshold, the sample will be discarded with probability (1 - (contribution / threshold)) and otherwise kept, but weighted more to remain unbiased.
This is the same approach that's also used in path termination based on length.
Note that the rendering remains unbiased with this option, it just adds a bit of noise - but if the setting is used moderately, the speedup gained easily outweighs the additional noise.
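A minimal sketch of the probabilistic termination described above, assuming the contribution has already been reduced to a single non-negative scalar. The function name and parameters are hypothetical and not taken from the kernel code. `rand` stands for a uniform random number in [0, 1).

```cpp
#include <cassert>

// Returns the weight to apply to a light sample before tracing its shadow ray:
// 0 means the sample is discarded, anything else is the compensation factor.
float light_sample_weight(float contribution, float threshold, float rand)
{
    assert(contribution >= 0.0f && threshold >= 0.0f);
    if (contribution >= threshold) {
        // Strong enough: always keep, no reweighting (also covers threshold == 0).
        return 1.0f;
    }
    const float keep_probability = contribution / threshold;
    if (rand >= keep_probability) {
        // Discarded with probability 1 - contribution / threshold.
        return 0.0f;
    }
    // Kept with probability p, so weight by 1 / p to stay unbiased.
    return 1.0f / keep_probability;
}
```

The expected weight is p * (1 / p) = 1, which is why the result stays unbiased while skipping most shadow rays for weak samples.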
Reviewers: #cycles
Subscribers: sergey, brecht
Differential Revision: https://developer.blender.org/D2217
When using the Normal output of the Texture Coordinate node on Point and Spot lamps, the coordinates now depend on the rotation of the lamp.
On Area lamps, the Parametric output of the Geometry node now returns UV coordinates on the area lamp.
Credit for the Area lamp part goes to Stefan Werner (from D1995).
Oh man, is it a compiler bug? Is it something we do stupid?
For now, more crap to prevent crashes. During the conference I will talk to
Maxym about how we can troubleshoot such weird issues.
Basically, don't use rcp() in areas which seem to be critical after a
second look. Also disabled some multiplication operators, not sure
yet why they might be a problem.
Tomorrow I will be setting up a full test on our farm with all the cases
which were buggy, to see if this fix is complete.
There are some precision issues for large-magnitude coordinates which started
to cause weird behavior in release builds. There is some weird memory usage in BVH
which is tricky to nail down because it only happens in release builds and GDB
reports all variables as optimized out when trying to use RelWithDebInfo.
There are two things in this commit:
- Attempt to make the vectorized code closer to the original one, hoping that it'll
eliminate the precision issue.
This seems to work for transform_point().
- A similar trick did not work for transform_direction() even though the absolute
error here is much smaller. For now that function is disabled; it needs a more
careful look.
Several ideas here:
- Optimize calculation of near_{x,y,z} in a way that does not require
3 if() statements per update, which avoids the negative effect of branch
misprediction.
- Optimization of direction clamping for BVH.
- Optimization of point/direction transform.
Brings ~1.5% speedup again, depending on the scene (unfortunately, this
speedup can't be summed across all previous commits because the speedup of
each of the changes varies from scene to scene, but it still seems to
be a nice solid speedup of a few percent on Linux, and a bigger speedup was
reported on Windows).
Once again, thanks Maxym for the inspiration!
Still TODO: We have multiple places where we need to calculate the near
x,y,z indices in BVH; for now it's only done for the main BVH traversal.
Will try to move this calculation to a utility function and see if
that can be easily reused across all the BVH flavors.
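One way such a utility function could compute a near/far index without an if() is sketched below. This is an assumption about the general shape of the optimization, not the code from the commit. `NearFar` and both function names are made up for the example, and it handles a single axis of the inverse ray direction.

```cpp
#include <cassert>

// Hypothetical result type: which child slot is nearer along one axis.
struct NearFar {
    int near_index;
    int far_index;
};

// Branchy reference: one if() per axis.
inline NearFar near_far_branchy(float idir)
{
    NearFar result;
    if (idir >= 0.0f) {
        result.near_index = 0;
        result.far_index = 1;
    }
    else {
        result.near_index = 1;
        result.far_index = 0;
    }
    return result;
}

// Branch-free variant: turn the comparison result into an index directly.
inline NearFar near_far_branchless(float idir)
{
    // 0 when idir >= 0, 1 otherwise (same result as the branchy version).
    const int is_negative = !(idir >= 0.0f);
    return NearFar{is_negative, is_negative ^ 1};
}
```

Both variants return the same indices for every input. Whether the second one is actually faster depends on the compiler and target, since compilers may already turn the first form into a conditional move.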
The idea here is to avoid if statements which could cause branch
mispredictions.
Gives a small but measurable speedup of up to ~1%. Still nice :)
Inspired by Maxym Dmytrychenko, thanks!
Initialization order of global stats and node types was not strictly
defined and it was possible to have node types initialized first and
stats after that. This will zero out memory which was allocated from
the statistics causing assert failure when de-initializing node types.
This was giving some speedup but made intersection tests fail
from a watertightness point of view.
Needs deeper investigation, but it needs to be fixed quickly for
the studio.
This gives about 5% speedup on AVX2 kernels (other kernels still
have SSE disabled for math operations) and this solves the slowdown
of the koro scene mentioned in the previous commit.
The title says it all actually. This commit also contains
changes to pass float3 as const reference in affected functions.
This should make MSVC happier without breaking OpenCL because it's
only done in areas which are ifdef-ed for non-OpenCL.
Another patch based on inspiration from Maxym Dmytrychenko, thanks!
Based on the existing ssef data type; to my knowledge it's also what happens in
Embree nowadays.
Inspired by Maxym Dmytrychenko and required for the upcoming triangle
intersection commit.
Hopefully the copyright message is correct.
Mostly this is making inlining match CUDA 7.5 in a few performance critical
places. The end result is that performance is now better than before, possibly
due to less register spilling or other CUDA 8.0 compiler improvements.
On benchmark scenes, there are 3% to 35% render time reductions. Stack memory
usage is reduced a little too.
Reviewed By: sergey
Differential Revision: https://developer.blender.org/D2269
Previously it was falling back to just a path after an #include
statement was finished. Now we fall back to the proper current
file name after dealing with the preprocessor statement.
Basically just moves cached kernels from ~/.config/blender/BLENDER_VERSION to
~/.cache/cycles/kernels. This has the following benefits:
- Follows XDG specification more closely,
not as if it's totally crucial or measurable by users, but still nice.
- Prevents unexpected sizes of the config folder, making disk usage more
predictable for users.
- Allows sharing kernels across multiple Blender versions,
which makes debugging easier at times close to a release.
- The "Copy Previous Settings" operator will no longer copy possibly
gigabytes of cached kernels, which used to lead to really nasty disk usage
and annoying delays when copying settings.
- In the future we can have some smart logic to clear old unused cached
kernels.
Currently only done for Linux and OSX. Windows still follows old "cache"
folder logic, but it's not really important for now because we don't
support kernel compilation on this platform yet.
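For reference, resolving the new cache location in an XDG-style way could look like the sketch below. The helper name is hypothetical, and honoring XDG_CACHE_HOME is an assumption: the commit message only states that kernels move to ~/.cache/cycles/kernels.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>

// Hypothetical helper: build the kernel cache directory from the two
// environment values. Returns an empty string if no base can be determined.
std::string kernel_cache_dir(const char *xdg_cache_home, const char *home)
{
    std::string base;
    if (xdg_cache_home && xdg_cache_home[0] == '/') {
        // The XDG spec asks for relative paths to be ignored,
        // so only an absolute XDG_CACHE_HOME is accepted here.
        base = xdg_cache_home;
    }
    else if (home && home[0] != '\0') {
        // XDG default when XDG_CACHE_HOME is unset or invalid.
        base = std::string(home) + "/.cache";
    }
    else {
        return std::string();
    }
    return base + "/cycles/kernels";
}

// Convenience overload that reads the real environment.
std::string kernel_cache_dir()
{
    return kernel_cache_dir(std::getenv("XDG_CACHE_HOME"), std::getenv("HOME"));
}
```

Keeping the environment lookup in a separate overload makes the path logic easy to test without touching the process environment.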
Reviewers: dingto, juicyfruit, brecht
Reviewed By: brecht
Differential Revision: https://developer.blender.org/D2197
Weirdly enough, this version of Xcode seems to have static_assert()
even when NOT using C++11. This is totally weird and counterintuitive
since static_assert() is supposed to be a C++11-only feature.
Can Xcode stop using the future, please? :)
This way OpenCL devices can also benefit from a smaller memory footprint, when using e.g. bumpmaps (greyscale, 1 channel).
Additional target for my GSoC 2016.
Now we have the 4 component ones first (float4, byte4, half4) followed by the 1 component ones (float, byte, half).
Makes the code a bit more consistent and also reduces the amount of code needed when enabling half support on GPU in the next commit.
This also exposed a typo in half CPU images for 3D textures, which wasn't used yet, but good to have that one fixed anyway.
Enables Catmull-Clark subdivision meshes with support for creases and attribute
subdivision. Still waiting on OpenSubdiv to fully support face-varying
interpolation for subdividing UV coordinates, though. Also, there may be some
inconsistencies with Blender's subdivision which will be resolved at a
later time.
Code for reading patch tables and creating patch maps is borrowed
from OpenSubdiv.
Reviewed By: brecht
Differential Revision: https://developer.blender.org/D2111
Currently Cycles cannot correctly render motion blur for objects that appear or
disappear during the shutter window. Until that can be fixed properly, it may be
better to hide such particles rather than let them render as if they were
stationary for half of the frame.
Reviewed By: brecht
Differential Revision: https://developer.blender.org/D2125
All the changes mainly give explicit hints on inlining functions,
so they match how inlining worked with the previous toolkit.
This makes kernels compiled with CUDA 8 render on average at the same speed
as the previous kernels. Some scenes are somewhat faster, some of them are
somewhat slower, but the slowdown is within 1% so far.
On the positive side, it allows us to enable newer generation cards on
the buildbots (so GTX 10x0 will be officially supported soon).
This adds support for ngons and attributes on subdivision meshes. Ngons are
needed for proper attribute interpolation as well as correct Catmull-Clark
subdivision. Several changes are made to achieve this:
- new primitive `SubdFace` added to `Mesh`
- 3 more textures are used to store info on patches from subd meshes
- Blender export uses loop interface instead of tessface for subd meshes
- `Attribute` class is updated with a simplified way to pass primitive counts
around and to support ngons.
- extra points for ngons are generated for O(1) attribute interpolation
- curves are temporarily disabled on subd meshes to avoid various bugs with
the implementation
- old unneeded code is removed from `subd/`
- various fixes and improvements
Reviewed By: brecht
Differential Revision: https://developer.blender.org/D2108
- In fresnel_dielectric, the differentials calculation sometimes divided by zero.
- When the normal map was (0.5, 0.5, 0.5), the code would try to normalize a zero vector. Now, it just uses the regular normal as a fallback.
- The approximate error function used in Beckmann sampling sometimes overflowed to inf while calculating r^16. The final value is 1 - 1/r^16, however, so now it just returns 1 if the computation would otherwise overflow.
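The overflow guard from the last item can be sketched like this. The function name, the `t >= 1` precondition and the exact cutoff are assumptions made for the illustration, not the values used in the kernel. The cutoff follows from 256^16 = 2^128, which is just above the largest finite float.

```cpp
#include <cassert>

// Evaluates 1 - 1/t^16 by repeated squaring, without ever producing inf.
// Assumption for this sketch: t >= 1.
float one_minus_inverse_pow16(float t)
{
    assert(t >= 1.0f);
    // 256^16 == 2^128 does not fit into a float, so t^16 would overflow.
    const float overflow_limit = 256.0f;
    if (t >= overflow_limit) {
        // 1/t^16 is far below float precision here, the result rounds to 1.
        return 1.0f;
    }
    float r = t * t;  // t^2
    r *= r;           // t^4
    r *= r;           // t^8
    r *= r;           // t^16, finite because t < 256
    return 1.0f - 1.0f / r;
}
```

Returning 1 early gives the same value the full computation would round to, but avoids relying on inf arithmetic, which may behave differently under aggressive floating-point optimizations.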
This way restrict can be used for CUDA and OpenCL as well.
From quick tests in the areas I've been testing, it might give some
barely measurable speedup, but it increases register pressure.
So use of this qualifier is still really limited.
This is a special builder type which is allowed to orient nodes along the
strand direction, hence minimizing their surface area in comparison
with axis-aligned nodes. Such nodes are much more efficient for hair
rendering.
The implementation of the BVH builder is based on Embree, and the general idea
there is to calculate the axis-aligned SAH and the oriented SAH, and if the SAH
of the oriented node is smaller than the axis-aligned SAH we create an unaligned
node.
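A heavily simplified sketch of that decision is shown below. All type and function names are hypothetical. The cost is an unnormalized SAH (area-weighted primitive counts of the two children) that leaves out the traversal and intersection constants a real builder would include.

```cpp
#include <cassert>

// Side lengths of a bounding box, measured in its own (possibly rotated) frame.
struct Extent {
    float x, y, z;
};

// Half of the box surface area, sufficient for comparing costs.
inline float half_area(const Extent &e)
{
    return e.x * e.y + e.y * e.z + e.z * e.x;
}

// Hypothetical split candidate: bounds and primitive count of both children.
struct SplitCandidate {
    Extent left_extent;
    int left_count;
    Extent right_extent;
    int right_count;
};

// Simplified, unnormalized SAH cost of a split.
inline float sah_cost(const SplitCandidate &split)
{
    return half_area(split.left_extent) * split.left_count +
           half_area(split.right_extent) * split.right_count;
}

// Create an unaligned node only when the oriented split is strictly cheaper.
inline bool prefer_unaligned_node(const SplitCandidate &aligned,
                                  const SplitCandidate &oriented)
{
    return sah_cost(oriented) < sah_cost(aligned);
}
```

For a thin diagonal strand, an axis-aligned box of roughly 10 x 10 x 0.1 has a far larger area than an oriented box of roughly 14.2 x 0.1 x 0.1 around the same curve, so the oriented split wins.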
We store both aligned and unaligned nodes in the same tree (which
seems to be different from what Embree is doing) so we don't need
any extra calculations to set up the hair ray for BVH
traversal, hence avoiding any possible negative effect of this new
BVH node type.
This new builder is currently not in use; we still need to make the BVH
traversal code aware of unaligned nodes.