Doesn't currently change anything, but will be needed for some future
work here.
It uses existing padding in the kernel BVH structure, so nothing changes
memory-wise.
The issue here was mainly coming from the minimal pixel width feature,
which is quite commonly enabled in production shots.
This feature uses a probabilistic heuristic in the curve intersection
function to decide whether to return an intersection or not. This
probability is calculated for every intersection check.
Now, when we use multiple BVH nodes for curve primitives, we increase the
probability of that primitive being considered a good intersection for us.
This is similar to increasing the minimal width of the curve.
What is worse here is that the change in intersection probability fully
depends on the exact layout of the BVH, meaning the probability might
change differently depending on the view angle, the way the builder binned
the primitives and so on. This makes it impossible to do a simple check
like dividing the probability by the number of BVH steps.
Another solution might have been to split the BVH into fully independent
trees, but that would increase memory usage of all the static objects in
the scene, which is not desirable either.
For now we use the simplest but most robust approach: store the BVH
primitive time and test it in the curve intersection functions (see the
sketch after the list below). This solves the regression, but has two
downsides:
- Uses more memory.
  This isn't surprising, and ANY solution to this problem will use more
  memory. What we still have to do is avoid this memory increase for
  cases when we don't use BVH motion steps.
- Reduces the number of maximum available textures on pre-Kepler cards.
  There is not much we can do here: hardware gets old, but we need to
  move forward on more modern hardware.
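The sketch mentioned above, with hypothetical storage and names (the actual
kernel arrays and layout differ): reject a curve primitive when the ray
time falls outside that primitive's BVH-step time range, so the
probabilistic minimum-width test effectively runs once per primitive no
matter how many BVH steps reference it.

    #include <utility>
    #include <vector>

    struct Ray {
      float time; /* motion blur time in [0, 1] */
    };

    /* Per-primitive time range, stored next to the other BVH primitive arrays. */
    static std::vector<std::pair<float, float>> prim_time;

    bool curve_prim_time_valid(const Ray &ray, int prim_index)
    {
      const std::pair<float, float> &range = prim_time[prim_index];
      return ray.time >= range.first && ray.time <= range.second;
    }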
Quite simple fix for now which only deals with this case. Maybe we want to
do some "clipping" at image load time so that regular textures wouldn't
give NaN either.
This allows us to use faster math and still have reliable
isnan/isfinite tests.
Only do it for the host side; kernels stay unchanged.
Thanks Lukas Stockner for the tip!
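For reference, a small sketch of the general technique (not the exact
helpers added here): reinterpret the float bits and test the exponent and
mantissa directly, which keeps working even when the compiler is allowed to
assume no NaNs are present.

    #include <cstdint>
    #include <cstring>

    static inline uint32_t float_as_uint(float f)
    {
      uint32_t u;
      std::memcpy(&u, &f, sizeof(u)); /* bit-exact reinterpretation */
      return u;
    }

    /* NaN: exponent all ones and a non-zero mantissa. */
    static inline bool isnan_safe(float f)
    {
      return (float_as_uint(f) & 0x7fffffffu) > 0x7f800000u;
    }

    /* Finite: exponent not all ones (excludes both inf and NaN). */
    static inline bool isfinite_safe(float f)
    {
      return (float_as_uint(f) & 0x7f800000u) != 0x7f800000u;
    }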
Seems CUDA failed to de-duplicate the array across multiple inlined
versions of shadow_blocked(). Helped it a bit with that now.
Gives about a 100MB memory improvement on the scenes after the previous
commit and brings the memory "regression" down to only 100MB compared to
the master branch now.
This commit enables the record-all behavior of transparent shadow rays.
Render time differences go as follows:

  Scene                 GTX 1080 render time
  BMW                    -0.5%
  Fishy Cat              -0.0%
  Pabellon Barcelona    -11.6%
  Classroom              +1.2%
  Koro                  -58.6%
The kernel will now use some extra VRAM to store the intersection array
(200MB on my configuration). This we can optimize out with some further
commits.
The idea is to record all possible transparent intersections when shooting
a transparent ray on the GPU (similar to what we were already doing on the
CPU).
This avoids the need to do whole ray-to-scene intersection queries for each
intersection and speeds up cases like transparent hair a lot, at the cost
of extra memory.
This commit is the groundwork for now, and the feature is kept disabled
until some further tweaks.
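A rough sketch of the record-all idea with simplified, hypothetical types
(the kernel works on a fixed-size intersection array in VRAM rather than
std::vector): gather all transparent hits along the shadow ray in one
traversal, then attenuate the throughput hit by hit instead of re-shooting
a full scene query after every transparent surface.

    #include <algorithm>
    #include <vector>

    struct Isect {
      float t;     /* distance along the shadow ray */
      float alpha; /* opacity of the surface that was hit */
    };

    /* Returns the light attenuation: 1 = unoccluded, 0 = fully blocked. */
    float shadow_attenuation(std::vector<Isect> hits)
    {
      std::sort(hits.begin(), hits.end(),
                [](const Isect &a, const Isect &b) { return a.t < b.t; });

      float throughput = 1.0f;
      for (const Isect &hit : hits) {
        throughput *= 1.0f - hit.alpha; /* transparent surface attenuates light */
        if (throughput == 0.0f) {
          break; /* opaque hit, fully blocked */
        }
      }
      return throughput;
    }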
Now we break the traversal cycle and then perform volume attenuation and
the zero-throughput check. Not sure it makes any measurable difference at
this moment, but in the future it might help de-duplicate some extra logic
here.
The order was wrong from the semantic point of view, caused by some legacy
workarounds in Libmv. Didn't realize it was not how things were expected to
be used.
This is a speed-up option which is mainly useful for the viewport. Gives a
nice 2x speedup in the barbershop scene when replacing GI with AO after the
2nd bounce, without losing too much detail.
Reviewers: brecht
Subscribers: eyecandy, venomgfx
Differential Revision: https://developer.blender.org/D2383
Currently the tests don't run on Windows for the following reasons:
1) render_graph_finalize has a linking issue due to missing a bunch of libraries (not sure why this is not an issue for Linux).
2) This one is more interesting: in test/python/cmakelists.txt, ${TEST_BLENDER_EXE_BARE} and ${TEST_BLENDER_EXE} are flat out wrong, but for some reason this doesn't matter for most tests, because ctest will actually go out and look for the executable and fix the path for you *BUT* only for the command; if you use them in any of the parameters it'll happily pass on the wrong path.
3) On Linux you can just run a .py file; Windows is not as awesome and needs to be told to run it with python.
4) Had to use the NAME/COMMAND long form of add_test, otherwise $<TARGET_FILE:blender> doesn't get expanded. Why? Beats me.
5) idiff.exe is missing for msvc2015/x64 in the libs folder.
This patch addresses 1-4, but given I have no working Linux build environment, I'm unsure whether it'll break anything there.
5 has been fixed in rBL61751.
Reviewers: juicyfruit, brecht, sergey
Reviewed By: sergey
Subscribers: Blendify
Tags: #cycles, #automated_testing
Differential Revision: https://developer.blender.org/D2367
Blender's baking system currently doesn't support the topology used by
adaptive subdivision, and primitive ids will be wrong or out of range,
leading to crashes. Updating the baking system to support other topologies
would be a bit involved, so for now we simply disable subdivision while
baking to avoid crashes.
We started to run out of bits there, so now we separate the flags which
came from __object_flags from those which are either runtime or coming
from __shader_flags.
The rule now is: SD_OBJECT_* flags are to be tested against the new
object_flags field of ShaderData; all the rest of the flags are to be
tested against the flags field of ShaderData.
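A small illustration of that rule, with the field names as described above
and example flag values (not the full flag sets):

    #include <cstdint>

    enum : uint32_t {
      SD_OBJECT_MOTION = 1u << 0, /* example bit coming from __object_flags */
      SD_TRANSPARENT = 1u << 0,   /* example runtime/shader bit */
    };

    struct ShaderData {
      uint32_t flags;        /* runtime + __shader_flags bits */
      uint32_t object_flags; /* bits coming from __object_flags */
    };

    inline bool object_has_motion(const ShaderData &sd)
    {
      /* SD_OBJECT_* is tested against object_flags, not flags. */
      return (sd.object_flags & SD_OBJECT_MOTION) != 0;
    }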
There should be no user-visible changes, and the time difference should be
minimal. In fact, from the tests here I can only see a hardly measurable
difference, and sometimes the new code is somewhat faster (all within the
noise floor, so hard to tell for sure).
Reviewers: brecht, dingto, juicyfruit, lukasstockner97, maiself
Differential Revision: https://developer.blender.org/D2428
The Cycles add-on did not actually support reloading correctly.
When you want to correctly reload sub-modules (i.e. modules of an add-on
which is a package), you need to use importlib; a mere import will do
nothing with already loaded modules (RNA classes are sort of
pre-registered when they are evaluated, through the meta-class system).
This way we can stop traversing BVH nodes early on.
Gives about a 2-2.5x render time improvement with 3 BVH steps.
Hopefully this gives no measurable performance loss for scenes with a
single BVH step.
Traversal is currently only implemented for QBVH, meaning old CPUs and
GPUs do not benefit from this change.
Similar to the previous commit, the statistics go as follows:
  BVH Steps    Render time (sec)    Memory usage (MB)
      0                 46                 260
      1                 27                 373
      2                 18                 598
      3                 15                 826
The scene used for the tests is the agent's body from one of the barber
shop scenes (no textures or anything, just a diffuse material).
Once again this is limited to regular (non-spatial-split) BVH; support of
spatial split for this feature will come later.
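A simplified sketch of the early-out, assuming each node stores the time
range of the motion step it covers (the actual QBVH node layout differs):

    struct NodeTimeRange {
      float time_from;
      float time_to;
    };

    /* Skip the whole subtree when the ray time is outside the node's motion
     * step interval, before any of its primitives are tested. */
    inline bool bvh_node_time_visible(const NodeTimeRange &node, float ray_time)
    {
      return ray_time >= node.time_from && ray_time <= node.time_to;
    }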
The idea is to create several smaller BVH nodes for each of the motion
curve primitives. This acts as a forced spatial split for the single
primitive.
This gives a render time speedup for motion blurred hair at the cost of
extra memory usage. The numbers go as follows:
  BVH Steps    Render time (sec)    Memory usage (MB)
      0                258                 191
      1                123                 278
      2                 69                 453
      3                 43                 627
The scene used for the tests is the agent's hair from one of the barber
shop scenes.
Currently it is limited to scenes without spatial splits enabled, since the
spatial split builder requires some changes to work properly with motion
step coordinates.
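A sketch of the forced spatial split with hypothetical builder types:
instead of one reference whose bounds cover the whole shutter interval,
emit one reference per motion step, each bounding only its own
sub-interval of time.

    #include <algorithm>
    #include <vector>

    struct BoundBox {
      float min[3], max[3];
    };

    struct BVHReference {
      BoundBox bounds;
      int prim_index;
      float time_from, time_to;
    };

    /* step_bounds holds the curve bounds at each motion key (num_steps + 1
     * of them); each reference covers one interval between adjacent keys. */
    std::vector<BVHReference> motion_curve_references(
        int prim_index, const std::vector<BoundBox> &step_bounds)
    {
      std::vector<BVHReference> refs;
      const int num_steps = int(step_bounds.size()) - 1;
      for (int step = 0; step < num_steps; step++) {
        BoundBox b = step_bounds[step];
        for (int i = 0; i < 3; i++) {
          /* Grow to also cover the next key so the interval is conservative. */
          b.min[i] = std::min(b.min[i], step_bounds[step + 1].min[i]);
          b.max[i] = std::max(b.max[i], step_bounds[step + 1].max[i]);
        }
        refs.push_back({b, prim_index,
                        float(step) / float(num_steps),
                        float(step + 1) / float(num_steps)});
      }
      return refs;
    }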
Also fixed some issues with motion key calculation:
- Clamp the lower and upper limits of curves so we can safely call those
  functions for the very first and very last curve segments.
- Fixed wrong indexing of the curve radius array.
- Fixed wrong motion attribute offset calculation.
Mimics how regular triangles work and makes it more clear where the stuff
is located in the kernel.
Needed some forward declarations because of the current placement of things
in the kernel.
Was a bit confusing to have transparent and translucent depth
exposed but no diffuse or glossy.
Reviewers: brecht
Subscribers: eyecandy
Differential Revision: https://developer.blender.org/D2399
This is important for the reliable behavior of the isnan/isfinite/min/max
functions when working with NaN and non-finite values. Some of the issues
with fast math are possible to work around, but I didn't find a way to have
a reliable min/max implementation yet.
Please NEVER EVER use such a statement; it only causes HUGE issues.
What is even worse: it's not always possible to immediately see what hell
is coming from such a statement.
There are still some such statements in the existing code; will leave those
for a later cleanup.
This avoids intersecting the AABBs of different curve primitives, which
results in fewer ray-to-primitive intersections.
This gives about a 30% speedup of hair rendering in the barber shop scenes
here. There is still some work to be done on those files to solve major
speed issues on certain frames.
This way we can have different limits for regular and motion curves, which
we'll do in one of the upcoming commits in order to gain some percent of
speedup.
The reasoning here is that motion curves usually intersect lots of other
bounding boxes, which makes it inefficient to have a single primitive in
the leaf node.
The maximal number of elements is supposed to be inclusive. That is what it
always meant in this file and what @brecht still considered the case in
6974b69c6172.
In fact, the commit message of that change mentions that we allowed up to
2 curve primitives per leaf, while in fact it was doing up to 1 curve
primitive.
Making it a real 2 primitives at most gives about a 5% slowdown for the
koro.blend scene. This is the reason why BVHParams.max_curve_leaf_size was
changed to 1 by this change.
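In other words (names simplified, just an illustration of the off-by-one):
the builder's leaf test is meant to be inclusive, while the previous
behavior effectively acted like a strict '<'.

    /* A node may become a leaf when it holds exactly max_curve_leaf_size
     * primitives. */
    inline bool can_make_curve_leaf(int num_curves, int max_curve_leaf_size)
    {
      return num_curves <= max_curve_leaf_size;
    }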
Since the beginning of time, hair settings in Cycles were global for the
whole scene but were located in the particle context. This caused quite
some trickery to get shots set up for the movies here in the studio,
forcing artists to create a dummy particle system to change the hair
settings for a shot.
While ideally these settings should properly become per-particle-system,
for the time being it will save sweat and blood to move the settings to
the scene context.
Reviewers: brecht
Subscribers: jtheninja, eyecandy, venomgfx, Blendify
Differential Revision: https://developer.blender.org/D2287
The issue was that we used to compare the number of vertices of the mesh
after auto smooth was applied (at the center of the shutter time) with the
number of vertices prior to auto smooth being applied. This caused
false-positive consideration of a mesh as changing topology.
Now we do the auto split as early as possible and do it from the Blender
side, so Cycles does not need to re-implement splitting on its side.
This matches the behavior of Multiscatter GGX and could become handy later
on when/if we decide it would be beneficial to replace one closure with
another.
Reviewers: lukasstockner97, brecht
Reviewed By: brecht
Differential Revision: https://developer.blender.org/D2413
There were 16 bits reserved for the primitive type, while we only need 4.
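An illustration of what the tighter packing buys (bit counts from the
description above, constant and function names hypothetical):

    #include <cstdint>

    constexpr uint32_t PRIMITIVE_BITS = 4; /* was 16 */
    constexpr uint32_t PRIMITIVE_MASK = (1u << PRIMITIVE_BITS) - 1u;

    /* Pack flags and primitive type into one 32-bit word; shrinking the
     * type field from 16 to 4 bits frees 12 bits for future flags. */
    inline uint32_t pack(uint32_t flags, uint32_t prim_type)
    {
      return (flags << PRIMITIVE_BITS) | (prim_type & PRIMITIVE_MASK);
    }

    inline uint32_t unpack_prim_type(uint32_t packed)
    {
      return packed & PRIMITIVE_MASK;
    }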
Reviewers: brecht
Reviewed By: brecht
Differential Revision: https://developer.blender.org/D2401
Discard the whole volume stack on the last bounce (but keep the world
volume if present). A sketch of the stack cleanup follows the list of
cases below.
Volumes are expected to be closed manifold meshes, meaning that if a ray
entered a volume there should be an intersection event of the ray exiting
that volume. The case when the ray hits nothing and there are still
non-world volumes in the stack can happen in any of the following cases:
1. The mesh is not a closed manifold.
   Such configurations are not really supported anyway and should not be
   used.
   The previous code would have considered the infinite length of the ray
   to sample across, so the render result wasn't really correct anyway.
2. The exit intersection is farther away than the camera's far clip
   distance.
   This case will also behave differently now, but previously it wasn't
   really correct either, so it's not like we're breaking something which
   was working as expected.
3. We missed the exit event due to intersection precision issues.
   This is exactly the case which this patch fixes and for which it avoids
   fireflies.
4. The volume has Camera-only visibility (all the rest of the visibility
   options are set to off).
   This is what could be considered a regression, but it could be solved
   quite easily by checking the volume stack's object flags and keeping
   entries which don't have Volume Scatter visibility (or even better:
   ensuring Volume Scatter visibility for objects with a volume closure).
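The sketch mentioned above, with simplified types (the kernel uses a
fixed-size stack rather than std::vector): on the last bounce drop every
entry except the world volume.

    #include <algorithm>
    #include <vector>

    constexpr int OBJECT_NONE = -1; /* convention: world volume has no object */

    struct VolumeStackEntry {
      int object;
      int shader;
    };

    /* Keep only the world/background volume, if present. */
    void volume_stack_clean(std::vector<VolumeStackEntry> &stack)
    {
      stack.erase(std::remove_if(stack.begin(), stack.end(),
                                 [](const VolumeStackEntry &e) {
                                   return e.object != OBJECT_NONE;
                                 }),
                  stack.end());
    }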
Fixes T46108: Cycles - Overlapping emissive volumes generates unexpected bright hotspots around the intersection.
Also fixes fireflies appearing on the edges of a cube with an emissive
volume.
Reviewers: juicyfruit, brecht
Reviewed By: brecht
Maniphest Tasks: T46108
Differential Revision: https://developer.blender.org/D2212
The Progress system in Cycles had two limitations so far:
- It just counted tiles, but ignored their size. For example, when rendering a 600x500 image with 512x512 tiles, the right 88x500 tile would count for 50% of the progress, although it only covers 15% of the image.
- Scene update time was incorrectly counted as rendering time - therefore, the estimated remaining time started out very high and gradually decreased.
This patch fixes both problems:
First of all, the Progress now has a function to ignore time spans, and that is used to ignore scene update time.
The larger change is the tile size: Instead of counting samples per tile, so that the final value is num_samples*num_tiles, the code now counts every sample for every pixel, so that the final value is num_samples*num_pixels.
Along with that, some unused variables were removed from the Progress and Session classes.
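A simplified version of the new accounting (the real Progress class has
more state): finished work is reported in pixel samples against a total of
num_samples * num_pixels, so a small border tile contributes in proportion
to its area.

    #include <cstdint>

    struct ProgressSketch {
      uint64_t total_pixel_samples = 0; /* num_samples * num_pixels */
      uint64_t done_pixel_samples = 0;

      void add_finished_tile(int tile_w, int tile_h, int samples)
      {
        done_pixel_samples += uint64_t(tile_w) * tile_h * samples;
      }

      double fraction() const
      {
        return total_pixel_samples
                   ? double(done_pixel_samples) / double(total_pixel_samples)
                   : 0.0;
      }
    };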
Reviewers: brecht, sergey, #cycles
Subscribers: brecht, candreacchio, sergey
Differential Revision: https://developer.blender.org/D2214
They are defined for MSVC but seem to be missing in GCC and Clang 3.8.
Maybe some further tweaks to the policy of when to define those functions
are needed, but this should be fine for now.
I can no longer reproduce the crash with either of the files where the
crash was originally visible. This is something where other changes (light
threshold, sampling) had an effect and made the code work as it is
supposed to. Could have been an optimizer issue or something like that.
Let's see if we hit the same issue again.
There were two cases where correlation issues were obvious:
- The file from T38710 was giving issues in 2.78a again.
- The file from T50116 was having totally different shadows between
  sample 1 and sample 32.
Use a somewhat simplified version of the CMJ hash which seems to give a
nicely randomized value that solves the correlation.
This commit will break all unit test files, but it's a bug fix, so perhaps
it is OK to commit.
This also fixes T41143: Sobol gives nonuniform noise.
A proper science paper about the hash function is coming.
Reviewers: brecht
Reviewed By: brecht
Subscribers: lukasstockner97
Differential Revision: https://developer.blender.org/D2385
This is a way to avoid possible memory corruption when render threads work
in parallel with the UI thread.
Not guaranteed to be completely safe, but it makes things easier to check
anyway.
Was giving huge artifacts in the barber shop file here in the studio.
Maybe not a fully optimal solution, but committing it for now to have a
closer look later.
The main intention is to give some quick way to control the scene's memory
usage by clamping textures which are too big. This is really handy in the
early production stages when you first create really nice-looking hi-res
textures, and only when it all works and is approved start investing time
in optimizing your scene.
This is a new option in the Scene Simplify panel and it acts as follows:
when the texture size is bigger than the given value, it'll be scaled down
by half until it fits into the given limit.
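The clamping rule itself is simple; a sketch (the actual downscale then
uses the box filter mentioned in the list below):

    #include <algorithm>

    /* Halve the resolution until every dimension fits under the Simplify
     * texture size limit; the image is then downscaled to this size. */
    void clamp_texture_size(int limit, int &width, int &height)
    {
      while (std::max(width, height) > limit) {
        width = std::max(width / 2, 1);
        height = std::max(height / 2, 1);
      }
    }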
There are various possible improvements, such as:
- Use threaded scaling using our own task manager.
  This is actually one of the main reasons why the image resize is
  manually implemented instead of using OIIO's resize. The other reason
  here is that the API seems limited for constructing a 3D texture
  description easily.
- Vectorization of uchar4/float4/half4 textures.
- Use something smarter than a box filter.
  Was playing with some other filters, but not sure they are really
  better: they kind of cause fuzzier edges.
Even with such TODOs in the code the option is already quite useful.
Reviewers: brecht
Reviewed By: brecht
Subscribers: jtheninja, Blendify, gregzaal, venomgfx
Differential Revision: https://developer.blender.org/D2362
There is some define conflict between system headers and clew, so delay
the include of clew.h as much as possible.
This is something which needed to be done in the code before the refactor;
hopefully such a change will still work.
For the multi-GPU case users still have to reconfigure the devices they want to use.
Based on patch from Lukas Stockner.
Differential Revision: https://developer.blender.org/D2347
This can be used together with camera culling to keep nearby objects visible in
reflections, using a minimum distance within which objects are visible. It is
also useful to cull small objects far from the camera.
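Conceptually, a hedged sketch of the distance test alone (names are
illustrative, and how it combines with camera culling is left out): an
object is culled only when its whole bounding sphere lies farther from the
camera than the configured distance.

    #include <cmath>

    struct float3 {
      float x, y, z;
    };

    inline float distance(const float3 &a, const float3 &b)
    {
      const float dx = a.x - b.x, dy = a.y - b.y, dz = a.z - b.z;
      return std::sqrt(dx * dx + dy * dy + dz * dz);
    }

    /* Cull the object only when its whole bounding sphere is beyond the
     * user-configured distance from the camera. */
    inline bool distance_culled(const float3 &camera, const float3 &center,
                                float radius, float cull_distance)
    {
      return distance(camera, center) - radius > cull_distance;
    }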
Reviewed By: brecht
Differential Revision: https://developer.blender.org/D2332
The code was templated already, so I don't see a big reason to have 3
versions of the templated functions. It was giving some extra code to
maintain and in fact already had divergence in the support of huge image
resolutions (a missing size_t cast in byte image loading).
There should be no changes visible to artists.
`kernel_path.h` and `kernel_path_branched.h` have a lot of conditional
code and it was kind of hard to tell which code belonged to which
directive. Should be easier to read now.
Add subframe to the animated seed hash calculation.
There should be no difference for regular files, only for cases when the
scene is rendered from the sequencer with a speed effect, which is not
really a common thing.
This is a late follow-up commit to the light sample threshold changes,
which caused a difference in rendering of all existing .blend files, which
is not something we are happy about: it is fine to use new optimized
defaults for new files, but existing ones should always render the same
way they used to.
Sorry for the inconvenience, but such a thing should have been done to
begin with.
If this setting was modified it will not be reset to zero.
Now all render tests should be passing again.
P.S. It is also really annoying to bump the subversion for such reasons,
but currently we don't have a better way to achieve what we want.
After the CUDA dynload changes, having the CUDA toolkit installed became
required in order to compile Cycles. This only happened due to a wrong
default value for the option.
Previously, it was only possible to choose a single GPU or all of that type (CUDA or OpenCL).
Now, a toggle button is displayed for every device.
These settings are tied to the PCI Bus ID of the devices, so they're consistent across hardware addition and removal (but not when swapping/moving cards).
From the code perspective, the more important change is that now, the compute device properties are stored in the Addon preferences of the Cycles addon, instead of directly in the User Preferences.
This allows for a cleaner implementation, removing the Cycles C API functions that were called by the RNA code to specify the enum items.
Note that this change is neither backwards- nor forwards-compatible, but since it's only a User Preference no existing files are broken.
Reviewers: #cycles, brecht
Reviewed By: #cycles, brecht
Subscribers: brecht, juicyfruit, mib2berlin, Blendify
Differential Revision: https://developer.blender.org/D2338
With this fix, using a MIS map resolution equal to the image size for
closest interpolation, or twice the size for linear interpolation, gets
rid of all fireflies.
Previously, a much higher resolution was needed to get acceptable noise
levels.