Previously, the code would only update the status string if the main status changed.
However, the main status did not include the remaining time, and therefore it wasn't updated until the amount of rendered tiles (which is part of the main status) changed.
This commit therefore makes the BlenderSession remember the time of the last status update and forces a status update if the last one was more than a second ago.
Reviewers: sergey
Differential Revision: https://developer.blender.org/D2465
Pass globals as a bare pointer, same as it sued to be prior to split kernel rework.
AMD CPU platform and Intel OpenCL were complaining about this.
Perhaps we shouldn't pass globals as pointer at all, this isn't something what is
really portable and can cause issues on 32 bit perhaps.
The range is controlled using the following command line arguments:
--cycles-resumable-start-chunk
--cycles-resumable-end-chunk
Those are 1-based index of range for rendering.
While this compiler is not officially supported yet, getting it to work is
a nice thing because more and more AMD cards will fall under MESA driver.
It's also nice to use explicit comparison with NULL, which makes it more
clear whether variable is a boolean or pointer. Even Rust enforces this!
Patch by Ian Bruce with own modifications.
Single program generally compiles kernels faster (2-3 times), loads faster,
takes less drive space (2-3 times), and reduces the number of cached kernels.
Reduces memory allocation for split kernel.
This allows for faster rendering due to bigger global size,
specially when GPU memory is limited.
Perfromance results:
R9 290 total render time
Before After Change
BMW 4:37 4:34 -1.1 %
Classroom 14:43 14:30 -1.5 %
Fishy Cat 11:20 11:04 -2.4 %
Koro 12:11 12:04 -1.0 %
Pabellon Barcelona 22:01 20:44 -5.8 %
Pabellon Barcelona(*) 15:32 15:09 -2.5 %
(*) without glossy connected to volume
Decoupled ray marching is not supported yet.
Transparent shadows are always enabled for volume rendering.
Changes in kernel/bvh and kernel/geom are from Sergey.
This simiplifies code significantly, and prepares it for
record-all transparent shadow function in split kernel.
By calculating the size of the state buffer in the kernel rather than the host
less code is needed and the size actually reflects the requested features.
Will also be a little faster in some cases because of larger global work size.
Because the split kernel can render multiple samples in parallel it is
necessary to have everything initialized before rendering of any samples
begins. The code that normally handles initialization of
`rng_state` (`kernel_path_trace_setup()`) only does so for the first sample,
which was causing artifacts in the split kernel due to uninitialized
`rng_state` for some samples.
Note that because the split kernel can render samples in parallel this
means that the split kernel is incompatible with the LCG.
This was only needed for the previous implementation of parallel samples. As
we don't have that any more it can be removed.
Real reason for removal tho is this: `per_sample_output_buffers` was being
calculated too small and artifacts resulted. The tile buffer is already
the correct size and calculating the size for `per_sample_output_buffers`
is a bit difficult with the current layout of the code. As
`per_sample_output_buffers` was only needed for `sum_all_radiance`,
removing that kernel and writing output to the tile buffer directly
fixes the artifacts.
This is to help debug and track memory usage for generic buffers. We
have similar for textures already since those require a name, but for
buffers the name is only for debugging proposes.
Simple workaround for some issues we've been having with AMD drivers hanging
and rendering systems unresponsive. Unfortunately this makes things a bit
slower, but its better than having to do hard reboots. Will be removed when
drivers have been fixed.
Define CYCLES_DISABLE_DRIVER_WORKAROUNDS to disable for testing purposes.
This does a few things at once:
- Refactors host side split kernel logic into a new device
agnostic class `DeviceSplitKernel`.
- Removes tile splitting, a new work pool implementation takes its place and
allows as many threads as will fit in memory regardless of tile size, which
can give performance gains.
- Refactors split state buffers into one buffer, as well as reduces the
number of arguments passed to kernels. Means there's less code to deal
with overall.
- Moves kernel logic out of OpenCL kernel files so they can later be used by
other device types.
- Replaced OpenCL specific APIs with new generic versions
- Tiles can now be seen updating during rendering
There was a bug in the intended code behaviour to always seek with a
pitch of 1.0 regardless of pitch/pitch animation/doppler effects.
Check the bug report for a more detailed explanation of problems
concerning pitch and seeking.
The idea is to make it simpler to remove noise from scenes when some prop uses
Sharp glossy closure and causes noise in certain cases. Previously Sharp Glossy
was not affected by Filter Glossy at all, which was quite confusing.
Here is a file which demonstrates the issue: {F417797}
After applying the patch all the noise from the scene is gone.
This change also solves fireflies reported in T50700.
Reviewers: brecht, lukasstockner97
Differential Revision: https://developer.blender.org/D2416
New logic of split_faces was leaving mesh in a proper state
from Blender's point of view, but Cycles wanted loop normals
to be "flushed" to vertex normals.
Now we do such a flush from Cycles side again, so we don't
leave bad meshes behind.
Thanks Bastien for assistance here!
The issue seems to be caused by vertex normal being re-calculated
to something else than loop normal, which also caused wrong loop
normals after re-calculation.
For now issue is solved by preserving CD_NORMAL for loops after
split_faces() is finished, so render engine can access original
proper value.
Doesn't currently change anything, but would need for some future
work here.
It uses existing padding in kernel BVH structure, so there is
nothing changed memory-wise.
The issue here was mainly coming from minimal pixel width feature
which is quite commonly enabled in production shots.
This feature will use some probabilistic heuristic in the curve
intersection function to check whether we need to return intersection
or not. This probability is calculated for every intersection check.
Now, when we use multiple BVH nodes for curve primitives we increase
probability of that primitive to be considered a good intersection
for us. This is similar to increasing minimal width of curve.
What is worst here is that change in the intersection probability
fully depends on exact layout of BVH, meaning probability might
change differently depending on a view angle, the way how builder
binned the primitives and such. This makes it impossible to do
simple check like dividing probability by number of BVH steps.
Other solution might have been to split BVH into fully independent
trees, but that will increase memory usage of all the static
objects in the scenes, which is also not something desirable.
For now used most simple but robust approach: store BVH primitives
time and test it in curve intersection functions. This solves the
regression, but has two downsides:
- Uses more memory.
which isn't surprising, and ANY solution to this problem will
use more memory.
What we still have to do is to avoid this memory increase for
cases when we don't use BVH motion steps.
- Reduces number of maximum available textures on pre-kepler cards.
There is not much we can do here, hardware gets old but we need
to move forward on more modern hardware..
Quite simple fix for now which only deals with this case. Maybe we want to do
some "clipping" on image load time so regular textures wouldn't give NaN as
well.
This allows us to use faster math and still have reliable
isnan/isfinite tests.
Only do it for host side, kernels stays unchanged.
Thanks Lukas Stockner for the tip!
Seems CUDA failed to de-duplicate the array across multiple inlined
versions of the shadow_blocked(). Helped it a bit with that now.
Gives about 100MB memory improvement on a scenes after previous
commit and brings up memory "regression" to only 100MB comparing to
the master branch now.
This commit enables record-all behavior of transparent shadows
rays.
Render times difference goes as following:
GTX 1080 render time
BMW -0.5%
Fishy Cat -0.0%
Pabellon Barcelona -11.6%
Classroom +1.2%
Koro -58.6%
Kernel will now use some extra VRAM memory to store the intersection
array (200MB on my configuration). This we can optimize out with some
further commits.
The idea is to record all possible transparent intersections when
shooting transparent ray on GPU (similar to what we were doing on
CPU already).
This avoids need of doing whole ray-to-scene intersections queries
for each intersection and speeds up a lot cases like transparent
hair in the cost of extra memory.
This commit is a base ground for now and this feature is kept
disabled for until some further tweaks.
Now we break the traversal cycle and then perform volume attenuation
and check with zero throughput. Not sure it makes any measurable sense
at this moment, but in the future it might help de-duplicating some
extra logic here.
It is not necessary to add MOTO library dependency when we use
WITH_IK_SOLVER (now it uses Eigen) or we use WITH_MOD_BOOLEAN (it was
used by bsp intern library some time ago but it is not present in the
code anymore).
Reviewers: mont29, sergey
Subscribers: mont29, sergey
Differential Revision: https://developer.blender.org/D2477
We (the Microsoft C++ team) use the Blender project as part of our "Real world code" tests.
I noticed a place in WIN32 specific code (dvpapi.cpp:85) where a string literal is losing
its const-ness when being passed to BLI_dynlib_open(). This is not permitted when using the
/permissive- conformance compiler switch (see our blog
https://blogs.msdn.microsoft.com/vcblog/2016/11/16/permissive-switch/)
My suggested fix is to add const and propagate it where needed. Another possible fix would be
to explicitly cast away the const.
Reviewers: mont29, sergey, LazyDodo
Subscribers: Blendify, sergey, mont29, LazyDodo
Tags: #platform:_windows
Differential Revision: https://developer.blender.org/D2495
The order was wrong from the semantic point of view, caused
by some legacy workarounds in Libmv. Didn't realize it's was
not how things were expected to be used.
This is a speed up option which is mainly useful for viewport. Gives nice speedup in
the barbershop scene of 2x when replacing GI with AO after 2nd bounce without loosing
too much details.
Reviewers: brecht
Subscribers: eyecandy, venomgfx
Differential Revision: https://developer.blender.org/D2383
Currently the tests don't run on windows for the following reasons
1) render_graph_finalize has an linking issue due missing a bunch of libraries (not sure why this is not an issue for linux)
2) This one is more interesting, in test/python/cmakelists.txt ${TEST_BLENDER_EXE_BARE} and ${TEST_BLENDER_EXE} are flat out wrong, but for some reason this doesn't matter for most tests, cause ctest will actually go out and look for the executable and fix the path for you *BUT* only for the command, if you use them in any of the parameters it'll happily pass on the wrong path.
3) on linux you can just run a .py file, windows is not as awesome and needs to be told to run it with pyton.
4) had to use the NAME/COMMAND long form of add_test otherwise $<TARGET_FILE:blender> doesn't get expanded, why? beats me.
5) missing idiff.exe for msvc2015/x64 in the libs folder.
This patch addresses 1-4 , but given I have no working Linux build environment, I'm unsure if it'll break anything there
5 has been fixed in rBL61751
Reviewers: juicyfruit, brecht, sergey
Reviewed By: sergey
Subscribers: Blendify
Tags: #cycles, #automated_testing
Differential Revision: https://developer.blender.org/D2367
Blenders baking system currently doesn't support the topology used by
adaptive subdivision and primitive ids will be wrong or out of range
leading to crashes. Updating the baking system to support other
topologies would be a bit involved, so for now we simply disable
subdivision while baking to avoid crashes.
We started to run out of bits there, so now we separate flags
which came from __object_flags and which are either runtime or
coming from __shader_flags.
Rule now is: SD_OBJECT_* flags are to be tested against new
object_flags field of ShaderData, all the rest flags are to
be tested against flags field of ShaderData.
There should be no user-visible changes, and time difference
should be minimal. In fact, from tests here can only see hardly
measurable difference and sometimes the new code is somewhat
faster (all within a noise floor, so hard to tell for sure).
Reviewers: brecht, dingto, juicyfruit, lukasstockner97, maiself
Differential Revision: https://developer.blender.org/D2428
Cycles add-on did not actually support reloading correctly.
When you want to correctly reload sub-modules (i.e. modules of an add-on
which is a package), you need to use importlib, a mere import will do
nothing with already loaded modules (RNA classes are sort of
pre-registered when they are evaluated, through the meta-class system).
This way we can stop traversing BVH node early on.
Gives about 2-2.5x times render time improvement with 3 BVH steps.
Hopefully this gives no measurable performance loss for scenes with
single BVH step.
Traversal is currently only implemented for QBVH, meaning old CPUs
and GPU do not benefit from this change.
Similar to the previous commit, the statistics goes as:
BVH Steps Render time (sec) Memory usage (MB)
0 46 260
1 27 373
2 18 598
3 15 826
Scene used for the tests is the agent's body from one of the barber
shop scenes (no textures or anything, just a diffuse material).
Once again this is limited to regular (non-spatial split) BVH,
Support of spatial split to this feature will come later.
The idea is to create several smaller BVH nodes for each of the motion
curve primitives. This acts as a forced spatial split for the single
primitive.
This gives up render time speedup of motion blurred hair in the cost
of extra memory usage. The numbers goes as:
BVH Steps Render time (sec) Memory usage (MB)
0 258 191
1 123 278
2 69 453
3 43 627
Scene used for the tests is the agent's hair from one of the barber
shop scenes.
Currently it's only limited to scenes without spatial split enabled,
since the spatial split builder requires some changes to work properly
with motion steps coordinates.
Also fixed some issues with motion keys calculation:
- Clamp lower and upper limits of curves so we can safely call those
functions for the very first and very last curve segment.
- Fixed wrong indexing for the curve radius array.
- Fixed wrong motion attribute offset calculation.
Mimics how regular triangles are working and makes it more clear where
the stuff is located in the kernel.
Needed to have some forward declarations because of the current placement
of things in the kernel.