This is part of T57829.
Reduce the number of batches used to only one by shader type. This reduces GPU overhead and increase a lot the FPS. As the number of batches is small, the time to allocate and free memory was reduced in 90% or more.
Also the code has been simplified and all batch management has been removed because this is not necessary. Now, all shading groups are created after all vertex buffer data for all strokes has been created using DRW_shgroup_call_range_add().
All batch cache data has been moved to the Object runtime struct and not as before where some parts (derived data) were saved inside GPD datablock.
For particles, now the code is faster and cleaner and gets better FPS.
Thanks to Clément Foucault for his help and advices to improve speed.
Second part of the fix: do not try at all to compute normals in degenerated
geometry. Just loss of time and potential issues later with weird
invalid computed values.
Previously each of the objects which has armature modifier will
request deformation matricies from bbones. Thing is, all those
deformations are the same and do not depend on object which is
being modified. What's even worse is that this calculation is
not cheap.
This change makes it so bbones deformation is calculated once
and stored in the armature object. After this armature modifiers
simply use it.
With a rigs we've got here dependency graph evaluation time
goes down from 0.02 sec to 0.012 sec.
Possible further optimization is to make bbone deformation
calculated at the time when bone is calculated. This will avoid
an extra threaded loop over all bones.
The goal is to address performance regression when going from
few threads to 10s of threads. On a systems with more than 32
CPU threads the benefit of threaded loop was actually harmful.
There are following tweaks now:
- The chunk size is adaptive for the number of threads, which
minimizes scheduling overhead.
- The number of tasks is adaptive to the list size and chunk
size.
Here comes performance comparison on the production shot:
Number of threads DEG time before DEG time after
44 0.09 0.02
32 0.055 0.025
16 0.025 0.025
8 0.035 0.033
Seems all the usecases of derived mesh are if-defed already,
so no need to have API for it in place, and definitely no
need to waste CPU ticks on converting evaluated mesh to
derived mesh.
Currently we never return CCGDM from the modifier stack,
so the optimization was doing pretty much nothing.
Removing it completely for now, it needs to be re-done
with the new evaluated Mesh/Subdiv.
The whole point is to avoid the need to manually hunt for the
bone, so it makes more sense to unhide it automatically.
If the bone is on multiple layers, just the first one is enabled.
Also, ED_pose_bone_select already checks PBONE_SELECTABLE.
Complex rigs are built from many bones (often overlapping)
connected by constraints.
When investigating or debugging such rigs one often wants to switch to
the target of a constraint, or a parent bone, but it is difficult to do
manually due to overlap confusion.
This adds a right click menu option that automatically selects
and makes the target object or bone active for UI fields where a
suitable reference is readily available.
Use the same data format and loader that the default key-maps use.
This supports nested properties (needed for macros)
and fixes modal key-maps which weren't supported.
This format still needs to be documented.