`RootPChanMap` will be nullptr when building DEG object-level
constraints (as opposed to bone constraints, where this map is built
beforehand), and in that case we don't need to check for the common chains in
`bone_target_opcode`.
Thx @sergey for additional confirmation
Pull Request: https://projects.blender.org/blender/blender/pulls/118745
On lower-end hardware the film accumulation performs badly, sometimes
taking up to 10ms. This PR improves the performance somewhat by adding
specialization constants for the render passes that are actually needed for
rendering, the number of samples, and whether reprojection is enabled.
`enabled_categories`: Based on the enabled render passes, the outer loops that
handle specific render passes are enabled or disabled. This improves performance
as no memory will be reserved for branches that are never accessed.
`samples_len` & `use_reprojection`: GPU compilers tend to optimize texture fetches
by moving them to the outer loop. This is only possible when the inner loop can be
unrolled. In the film accumulation shader the inner loop couldn't be unrolled.
Adding a specialization constant allows the inner loop to be unrolled.
On old or low-end devices the improvement is around 40%. On newer devices
the improvement is 50+%. Performance of this shader is now similar to
that of Godot.
| GPU | Before | New |
|----------------------|--------|-------|
| NVIDIA GTX 760 | 3.5ms | 2.4ms |
| GFX1036 (RDNA2 iGPU) | 9.9ms | 6.2ms |
| AMD Radeon Pro W7500 | 2.1ms | 0.9ms |
Pull Request: https://projects.blender.org/blender/blender/pulls/118385
While implementing the film accumulation specialization constants we came
across a missing implementation for uint as a specialization constant type.
This is split off from the original patch to add support for uint.
When using it, it is important to compile with asserts enabled: a uint can
be cast to int without you noticing. There are assert mechanisms that point
you to these cases.
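The pitfall can be illustrated with a minimal standalone sketch. It is not the GPU module's actual API: `SpecConstant`, `ConstantType`, `type_matches`, `set_int` and `set_uint` are hypothetical names, used only to show how a typed setter with an assert can catch a uint value going through the int path.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

/* Hypothetical stand-in for a specialization constant slot. */
enum class ConstantType { Int, Uint };

struct SpecConstant {
  ConstantType type;
  std::uint32_t bits; /* Raw 32-bit payload, interpreted according to `type`. */
};

inline bool type_matches(const SpecConstant &constant, ConstantType expected)
{
  return constant.type == expected;
}

/* Asserts (in builds with asserts enabled) that the slot was declared as uint. */
inline void set_uint(SpecConstant &constant, std::uint32_t value)
{
  assert(type_matches(constant, ConstantType::Uint));
  constant.bits = value;
}

/* Asserts (in builds with asserts enabled) that the slot was declared as int. */
inline void set_int(SpecConstant &constant, std::int32_t value)
{
  assert(type_matches(constant, ConstantType::Int));
  std::memcpy(&constant.bits, &value, sizeof(value));
}
```

With asserts disabled, passing a uint such as `3000000000u` to `set_int` would compile and convert silently, even though the value does not fit in an int. With asserts enabled, a call on a slot of the wrong type stops at the mismatch.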
Pull Request: https://projects.blender.org/blender/blender/pulls/118750
Currently there are two vertex buffers that contain mesh normals. First, the
normals are extracted and stored interleaved with positions. Then there is
a second vertex buffer for just normals. Interleaving them makes some
sense, since they change together, but it fights with the contiguous storage
benefits of `Mesh` and generally makes code more difficult to optimize.
This PR removes the normals interleaved with the positions and changes
the code for extracting positions and normals from meshes to be simpler
and faster, mainly by not using the "extract iterators" as described by the
#116901 design task. That moves most of the branching outside of hot
loops, so we don't do the same work for every mesh element. This also
gives us the option of not calculating or extracting normals in more
situations like wireframe display in the future.
This is only a small part of the work for #116901, so the state of the code
after this PR will have more design inconsistencies. I'll keep working to
resolve those in the future.
In general I observed at least a 5-40% improvement in FPS in playback
of files with large meshes.
Pull Request: https://projects.blender.org/blender/blender/pulls/116902
When building a non-portable build or when not using the precompiled
libraries, do not enable the CPU checker.
Make the CMake configure step error out when building with
WITH_STRICT_BUILD_OPTIONS if the LIBDIR cannot be found.
Pull Request: https://projects.blender.org/blender/blender/pulls/118519
This is a migration of the current Line Art modifier to GPv3.
Note:
- The modifier uses the exact same DNA structure as the old one, re-defined under a different name. For now, all the variable names and their placement after the `ModifierData` part should stay exactly the same until we do proper versioning of the modifier data and completely remove GPv2 support.
- The vertex weight transfer feature no longer supports matching by name prefix ("group" used to match "group1", "group2", etc.). It now only transfers vertex weights from source vertex groups that have exactly the specified name.
Pull Request: https://projects.blender.org/blender/blender/pulls/117028
The Viewport Compositor crashes when there are many nodes that are not
connected to the compositor or viewer outputs.
That's because their sockets were wrongly added to the shader operation,
even though they are never used, which exceeds the limit on the
maximum number of image units per shader.
This adds support by just reusing the GGX reflection LTC
look-up table. This avoids the memory cost of another
table.
This is quite a hack and has no real physical grounding.
We already have a roughness remapping function for
reusing sphere-probe for refraction and matching the
blur level. We can reuse this function and use it
for sampling the reflection LUT.
Then getting the theta LUT parameter is done by
computing the angle between the refraction direction
and the reversed normal.
This works because the table is parametrized using the
angle between the view vector and the normal. This angle
is the same as the angle between the reflection vector
and the normal. So to get the equivalent lobe in the
refraction direction we get the angle between the
refraction direction and the reversed normal.
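As a rough illustration of the angle computation only, here is a standalone sketch. It assumes normalized vectors; `Vec3`, `lut_cos_theta_reflection` and `lut_cos_theta_refraction` are hypothetical names, not the shader's actual functions, and the roughness remapping is left out.

```cpp
#include <cassert>
#include <cmath>

struct Vec3 {
  float x, y, z;
};

inline float dot(Vec3 a, Vec3 b)
{
  return a.x * b.x + a.y * b.y + a.z * b.z;
}

inline bool is_normalized(Vec3 v)
{
  return std::fabs(dot(v, v) - 1.0f) < 1e-4f;
}

/* Reflection case: the table is parametrized by the angle between the
 * view vector V and the normal N. */
inline float lut_cos_theta_reflection(Vec3 V, Vec3 N)
{
  assert(is_normalized(V) && is_normalized(N));
  return dot(V, N);
}

/* Refraction case (hypothetical helper): use the angle between the
 * refraction direction R and the reversed normal -N instead. */
inline float lut_cos_theta_refraction(Vec3 R, Vec3 N)
{
  assert(is_normalized(R) && is_normalized(N));
  const Vec3 reversed_normal = {-N.x, -N.y, -N.z};
  return dot(R, reversed_normal);
}
```

For a ray refracted straight through the surface (`R` equal to `-N`), the cosine is 1, which corresponds to the same table entry as looking at a reflective surface head-on.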
Note: This has issues with shadow-map tagging, but that should
be fixed separately.
Pull Request: https://projects.blender.org/blender/blender/pulls/118589
Unit tests were assuming that creating a catalog from a path would not
create catalogs for the parent path elements if missing. I'd argue this
should not be unit tested since it's internal behavior that isn't
visible to API users. But for now I'll keep the test working as is, also
to avoid indirect recursive calls of `create_missing_catalogs()`.
Metal uses a union to store `gl_WorkGroupSize`, so the union needs to
be unpacked. We first unpack to uvec3 in order to work around an
NVIDIA driver bug.
Issue introduced by: e3ac2ac93e6ffdf704adad93a9f59ec2c716989f
Pull Request: https://projects.blender.org/blender/blender/pulls/118749
So far this would include commits committed by the given user, but
authored by someone else. Unfortunately we can't use email addresses to
filter these out, since we can't get the email addresses associated with
an account from Gitea, or do a user lookup by email. In my testing the
commit author email and the publicly visible account email would
mismatch in most cases.
NVIDIA fails with a segmentation fault when compiling shaders due to recent changes.
This PR tweaks the shader code to work around the segmentation fault.
Issue introduced by: 7f43699ebf50a7c1f4d8432529ea3cfe2abb78f9
Pull Request: https://projects.blender.org/blender/blender/pulls/118744
Rebuilding the tree immediately after changes could cause the tree to be
rebuilt multiple times. More importantly, it made it harder to reason
about thread safety, since we would touch the tree within a whole bunch
of API functions. Now tree building is simplified and managed in a
single place, so making the tree building thread safe can be done
trivially in a follow-up.
Note, this means the initial catalog tree building doesn't happen in a
background thread together with loading the asset library and catalogs
anymore. But since we would already do all further rebuilds on the main
thread anyway, this shouldn't have any notable impact.
Line art used to skip edges where both ends are outside the image
frame, which led to missing edges in some cases where the model is
scaled up considerably. Now it ensures those edges are still preserved.
Pull Request: https://projects.blender.org/blender/blender/pulls/118448
Line art shadow projection will cut lines indefinitely when it
encounters an edge segment with 0 length. In the case of #118547, it was
caused by the combination of the bevel modifier and the view angle. This fix
ensures that no such edge is worked on further.
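The idea of excluding such edges can be sketched as follows. This is a standalone illustration rather than the Line Art code itself: `Segment`, `is_zero_length`, `drop_zero_length` and the epsilon value are assumptions made for the example.

```cpp
#include <cassert>
#include <vector>

struct Point2 {
  double x, y;
};

struct Segment {
  Point2 a, b;
};

/* Hypothetical check: true when the segment is too short to be cut.
 * The epsilon is an assumed tolerance, not a value from the fix. */
inline bool is_zero_length(const Segment &segment, double epsilon = 1e-9)
{
  assert(epsilon >= 0.0);
  const double dx = segment.b.x - segment.a.x;
  const double dy = segment.b.y - segment.a.y;
  return dx * dx + dy * dy <= epsilon * epsilon;
}

/* Keep only the segments that are safe to pass on to the cutting step. */
inline std::vector<Segment> drop_zero_length(const std::vector<Segment> &segments)
{
  std::vector<Segment> result;
  for (const Segment &segment : segments) {
    if (!is_zero_length(segment)) {
      result.push_back(segment);
    }
  }
  return result;
}
```

Filtering before the cutting loop means the loop never sees a segment it cannot make progress on.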
Pull Request: https://projects.blender.org/blender/blender/pulls/118613
The objective is to be able to create your own GLSL shaders in Blender.
This improves the workflow since all shader programming can be done
directly in Blender. In addition, GLSL is a very popular language in
the video game industry and beyond.
Ref !116793
Co-authored-by: Clément Foucault <foucault.clem@gmail.com>
This optimizes a few loops that become significant bottlenecks during
viewport rendering of scenes with large numbers of curves.
To render a curves object, Blender needs to generate a potentially
very large (but trivial) index buffer. As previously implemented,
this index buffer is generated in an extremely inefficient manner,
with a single-threaded loop and an explicit function call per entry.
The buffer then needs to be pushed onto the GPU, which is also a fairly
slow task.
This PR generates the index buffer directly on the GPU with a compute
shader.
Pull Request: https://projects.blender.org/blender/blender/pulls/116617
The standard `threading::parallel_for` function tries to split the range into
uniformly sized subranges. This is great if each element takes approximately
the same amount of time to compute.
However, there are also situations where the time required to do the work for
a single index differs significantly between different indices. In such a case,
it's better to split the tasks into segments while taking the size of each task into
account.
This patch implements `threading::parallel_for_weighted` which allows passing
in an additional callback that returns the size of each task.
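The splitting idea can be sketched in standalone C++. This is not the implementation or signature of `threading::parallel_for_weighted`: `Range`, `split_by_weight` and `grain_weight` are hypothetical names, and the sketch only computes the segments that would then be handed to worker threads.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

struct Range {
  std::size_t start;
  std::size_t size;
};

/* Split [0, total) into consecutive segments. A segment is closed as soon as
 * its accumulated weight reaches `grain_weight`, so a single heavy index ends
 * up in a small segment while many light indices are grouped together. */
inline std::vector<Range> split_by_weight(
    std::size_t total,
    std::int64_t grain_weight,
    const std::function<std::int64_t(std::size_t)> &weight_fn)
{
  assert(grain_weight > 0);
  std::vector<Range> segments;
  std::size_t segment_start = 0;
  std::int64_t accumulated = 0;
  for (std::size_t i = 0; i < total; i++) {
    accumulated += weight_fn(i);
    if (accumulated >= grain_weight) {
      segments.push_back({segment_start, i + 1 - segment_start});
      segment_start = i + 1;
      accumulated = 0;
    }
  }
  /* Remaining indices that did not reach the grain weight. */
  if (segment_start < total) {
    segments.push_back({segment_start, total - segment_start});
  }
  return segments;
}
```

With weights `{1, 1, 1, 1, 100, 1, 1}` and a grain weight of 4, the heavy index 4 gets a segment of its own, while a uniform split would pair it with its light neighbors.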
Pull Request: https://projects.blender.org/blender/blender/pulls/118348