Everything is fine if a batch is always drawn with instancing. But problems arise if the next draw call for this batch does not use instancing, because the attrib divisor stays set to 1 in the VAO.
Since instancing is used less often than normal drawing, I prefer to reset the divisor after drawing, as it is set again before drawing instances.
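A minimal sketch of the set-then-reset pattern described above. It models the VAO's per-attribute divisor state with a plain struct instead of calling OpenGL, so `FakeVAO`, `fake_attrib_divisor` and `draw_instanced` are hypothetical names for illustration only; in real code the divisor would be changed with `glVertexAttribDivisor`.

```c
#include <assert.h>

enum { MAX_ATTRIBS = 16 };

/* Hypothetical stand-in for the divisor state an OpenGL VAO remembers. */
typedef struct {
    unsigned divisor[MAX_ATTRIBS];
} FakeVAO;

/* Stand-in for glVertexAttribDivisor(index, divisor). */
static void fake_attrib_divisor(FakeVAO *vao, unsigned index, unsigned divisor)
{
    assert(index < MAX_ATTRIBS);
    vao->divisor[index] = divisor;
}

/* Set the divisor right before the instanced draw and reset it right
 * after, so a later non-instanced draw of the same batch sees 0. */
static void draw_instanced(FakeVAO *vao, const unsigned *inst_attribs, unsigned attrib_ct)
{
    for (unsigned i = 0; i < attrib_ct; i++) {
        fake_attrib_divisor(vao, inst_attribs[i], 1);
    }

    /* The instanced draw call would be issued here. */

    for (unsigned i = 0; i < attrib_ct; i++) {
        fake_attrib_divisor(vao, inst_attribs[i], 0);
    }
}
```

With this arrangement the non-instanced path needs no extra state changes, which matches the preference for keeping the common case cheap.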
Tried 101 but it gives collisions.
I think 257 is enough now that we don't have thousands of uniforms.
This gives some noticeable performance improvement.
Could be refined further.
This changes quite a few things:
- Drop the allocation of inputs as a chunk.
- Merge the linked list system into the Gwn_ShaderInput.
- Put name buffer into another memory block, easily resizable.
- Use offset instead of char* to direct to input name.
- Add only requested uniforms dynamically to the Shader Interface.
This drops some minor optimisations and uses a bit more memory for small shaders (which have a fixed count).
But this saves a lot of memory when using UBOs, because the names and the Gwn_ShaderInput were allocated for every UBO variable.
This also reduces the Shader Interface initial generation time.
The lookup time is left unchanged.
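To illustrate why an offset is used instead of a `char*` once the names live in a separately resizable block, here is a small sketch. `NameBuffer`, `InputSketch`, `name_buffer_add` and `input_name` are hypothetical names and not the actual Gawain types; the growth policy (doubling from 64 bytes) is also an assumption.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable block holding all input names back to back. */
typedef struct {
    char *data;
    uint32_t len;      /* bytes used */
    uint32_t capacity; /* bytes allocated */
} NameBuffer;

/* Hypothetical input record: stores an offset, not a pointer, so it
 * stays valid when realloc() moves the name block. */
typedef struct {
    uint32_t name_offset;
    int location;
} InputSketch;

/* Append a name and return its offset, or UINT32_MAX on allocation failure. */
static uint32_t name_buffer_add(NameBuffer *nb, const char *name)
{
    const uint32_t size = (uint32_t)strlen(name) + 1; /* include the terminator */
    if (nb->len + size > nb->capacity) {
        uint32_t new_cap = nb->capacity ? nb->capacity * 2 : 64;
        while (new_cap < nb->len + size) {
            new_cap *= 2;
        }
        char *p = realloc(nb->data, new_cap);
        if (p == NULL) {
            return UINT32_MAX;
        }
        nb->data = p;
        nb->capacity = new_cap;
    }
    const uint32_t offset = nb->len;
    memcpy(nb->data + offset, name, size);
    nb->len += size;
    return offset;
}

/* Resolve an input's name through the current base pointer. */
static const char *input_name(const NameBuffer *nb, const InputSketch *in)
{
    return nb->data + in->name_offset;
}
```

A stored `char*` would dangle as soon as the block is reallocated; resolving through the base pointer on each access avoids that.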
This is an internal structure, and we don't put it in a list for anything other
than hash collision resolution. No need to have a dedicated entry here; this saves us
from an extra allocation and pointer dereference.
This way we reduce the number of loop iterations from loop-over-all-inputs to
loop-over-collisions, which is expected to take far fewer CPU ticks.
There is still a possible optimization: use a memory pool of some sort
to manage the memory needed for hash entries, but that will only speed up
shader interface construction / destruction time.
There is also some trickery to speed up the process even more
in the case where no hash collisions are detected when constructing
the shader interface.
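The bucket lookup with the chain link stored in the input itself can be sketched roughly as follows. This is an illustration under assumptions, not the Gawain code: `Input`, `Interface`, `interface_add` and `interface_find` are hypothetical names, and the hash function is a simple stand-in. Only the bucket count of 257 comes from the notes above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define NUM_BUCKETS 257 /* prime bucket count mentioned in the notes */

/* Hypothetical input: the collision chain link lives in the input
 * itself, so no separate hash entry has to be allocated. */
typedef struct Input {
    struct Input *next; /* next input that landed in the same bucket */
    uint32_t name_hash;
    const char *name;
    int location;
} Input;

typedef struct {
    Input *buckets[NUM_BUCKETS];
} Interface;

/* Stand-in string hash (assumption, not Gawain's hash function). */
static uint32_t hash_name(const char *s)
{
    uint32_t h = 5381;
    for (; *s; s++) {
        h = h * 33u + (unsigned char)*s;
    }
    return h;
}

static void interface_add(Interface *iface, Input *in)
{
    in->name_hash = hash_name(in->name);
    Input **bucket = &iface->buckets[in->name_hash % NUM_BUCKETS];
    in->next = *bucket; /* push onto the front of the chain */
    *bucket = in;
}

/* Walk only the inputs that share a bucket with the requested name. */
static const Input *interface_find(const Interface *iface, const char *name)
{
    const uint32_t h = hash_name(name);
    for (const Input *in = iface->buckets[h % NUM_BUCKETS]; in; in = in->next) {
        if (in->name_hash == h && strcmp(in->name, name) == 0) {
            return in;
        }
    }
    return NULL;
}
```

When a bucket holds a single input, the loop body runs once, which is the common case the bucket count is chosen for.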
Flag ownership of each index array and VBO,
so we don't have to manually keep track of this and use the right free call.
Instead, ownership can be passed on creation.
See D2676
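A rough sketch of what passing ownership at creation could look like. All names here (`BatchSketch`, `BATCH_OWNS_VBO`, `BATCH_OWNS_INDEX`, `batch_create`, `batch_discard`) are hypothetical and do not reflect the actual API in D2676; the free counters exist only to make the behaviour observable.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct { float *data; } VertBufSketch;
typedef struct { unsigned *data; } IndexBufSketch;

/* Hypothetical ownership flags, passed once when the batch is created. */
enum {
    BATCH_OWNS_VBO = 1 << 0,
    BATCH_OWNS_INDEX = 1 << 1
};

typedef struct {
    VertBufSketch *verts;
    IndexBufSketch *elem; /* may be NULL */
    unsigned owns_flag;
} BatchSketch;

/* Demo-only counters so the effect of the flags can be checked. */
static int g_vbo_frees = 0;
static int g_index_frees = 0;

static void vertbuf_discard(VertBufSketch *v) { free(v->data); free(v); g_vbo_frees++; }
static void indexbuf_discard(IndexBufSketch *e) { free(e->data); free(e); g_index_frees++; }

static BatchSketch *batch_create(VertBufSketch *verts, IndexBufSketch *elem, unsigned owns_flag)
{
    assert(verts != NULL);
    BatchSketch *batch = malloc(sizeof(*batch));
    if (batch == NULL) {
        return NULL;
    }
    batch->verts = verts;
    batch->elem = elem;
    batch->owns_flag = owns_flag;
    return batch;
}

/* One discard call: the batch frees only the buffers it was told it owns. */
static void batch_discard(BatchSketch *batch)
{
    if (batch->owns_flag & BATCH_OWNS_VBO) {
        vertbuf_discard(batch->verts);
    }
    if ((batch->owns_flag & BATCH_OWNS_INDEX) && batch->elem != NULL) {
        indexbuf_discard(batch->elem);
    }
    free(batch);
}
```

The caller then needs a single discard function regardless of which buffers are shared with other batches.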
This avoids using GWN_vertbuf_attr_set which needs to calculate the
offset and perform a memcpy every call.
Exposing the data directly allows us to avoid a memcpy in some cases
and means we can write to the vertex buffer's memory directly.
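The difference between the two access styles can be sketched like this. `RawVertBuf`, `attr_set_copy`, `RawStep`, `attr_raw_begin` and `attr_raw_step` are hypothetical names used for illustration; they are not the functions added by this change, and the interleaved layout (one stride per vertex) is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical interleaved vertex buffer. */
typedef struct {
    unsigned char *data;
    unsigned stride;     /* bytes per vertex */
    unsigned vertex_len; /* number of vertices */
} RawVertBuf;

/* Copy-based setter: recomputes the address and memcpy's on every call. */
static void attr_set_copy(RawVertBuf *vb, unsigned attr_offset, unsigned attr_size,
                          unsigned v_idx, const void *src)
{
    assert(v_idx < vb->vertex_len);
    memcpy(vb->data + (size_t)v_idx * vb->stride + attr_offset, src, attr_size);
}

/* Direct access: compute the start once, then advance by the stride. */
typedef struct {
    unsigned char *cursor;
    unsigned stride;
} RawStep;

static RawStep attr_raw_begin(RawVertBuf *vb, unsigned attr_offset)
{
    RawStep step = { vb->data + attr_offset, vb->stride };
    return step;
}

/* Return a pointer to the current vertex's attribute and move to the next. */
static void *attr_raw_step(RawStep *step)
{
    void *p = step->cursor;
    step->cursor += step->stride;
    return p;
}
```

A caller can then write each component straight into the buffer through the returned pointer instead of filling a temporary and copying it.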
UNIFORM_NONE should never match a valid uniform (builtin or custom).
The logic for UNIFORM_CUSTOM was just wrong, since it returned the first custom uniform. This function should only accept builtin (non-custom) uniforms.
Quick hash rejection instead of string comparison. Uniform lookups already work this way. I don't expect a major overall speedup since attributes are looked up less frequently than uniforms.
Before this change, Gawain was doing the list lookup twice,
doing a string comparison against each and every input, which
is inefficient and unfriendly to CPUs with a small
cache.
Now we store the hash of the input name together with the actual
name and compare hashes first. Additionally, we do
everything in a single pass, which is much better from
a cache coherency point of view.
This brings Eevee cache population time from 80ms to
60ms on my desktop and from 800ms to 400ms for Clement
when navigating in a file from T50027.
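The hash-first comparison in a single pass can be sketched as below. `HashedInput`, `sketch_hash` and `find_input` are hypothetical names and the hash function is a stand-in, so this shows the idea rather than the code under review.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical input record: the hash is stored next to the name. */
typedef struct {
    uint32_t name_hash;
    const char *name;
    int location;
} HashedInput;

/* Stand-in string hash (assumption, not Gawain's hash function). */
static uint32_t sketch_hash(const char *s)
{
    uint32_t h = 5381;
    for (; *s; s++) {
        h = h * 33u + (unsigned char)*s;
    }
    return h;
}

/* Single pass: reject on the integer compare, strcmp only on a hash match. */
static const HashedInput *find_input(const HashedInput *inputs, size_t count, const char *name)
{
    const uint32_t h = sketch_hash(name);
    for (size_t i = 0; i < count; i++) {
        if (inputs[i].name_hash != h) {
            continue; /* cheap rejection, the name string is never touched */
        }
        if (strcmp(inputs[i].name, name) == 0) {
            return &inputs[i];
        }
    }
    return NULL;
}
```

Most inputs are rejected by the integer compare alone, so the name strings of non-matching inputs are not pulled into cache.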
Reviewers: merwin, dfelinto
Subscribers: fclem
Differential Revision: https://developer.blender.org/D2697
This function is not performance critical, but I prefer the branch-free code and no hack needed to appease gcc.
Follow-up to recent 23035cf46fb4dd6a0bf7e688b0f15128030c77d1 and f637145450010d14660fcb029d41560a138eae14.
Goal is to make most of the API independent of OpenGL, Vulkan, any other backend.
Able to remove the default case from ElementList_size because IndexType only covers index types, not index types and *everything else* like GLenum.
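As an illustration of why a dedicated enum makes the default case unnecessary, consider the following sketch. `IndexTypeSketch`, `ElementListSketch` and `element_list_size` are hypothetical names, and the three enumerators are an assumption about which index widths are covered.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical backend-independent index type: only index widths, nothing else. */
typedef enum {
    INDEX_U8,
    INDEX_U16,
    INDEX_U32
} IndexTypeSketch;

typedef struct {
    IndexTypeSketch index_type;
    unsigned index_ct;
} ElementListSketch;

/* Every enumerator is handled, so no default case is needed; a compiler
 * with -Wswitch can warn if a new enumerator is added but not handled. */
static unsigned element_list_size(const ElementListSketch *elem)
{
    switch (elem->index_type) {
        case INDEX_U8:
            return elem->index_ct * (unsigned)sizeof(uint8_t);
        case INDEX_U16:
            return elem->index_ct * (unsigned)sizeof(uint16_t);
        case INDEX_U32:
            return elem->index_ct * (unsigned)sizeof(uint32_t);
    }
    return 0; /* only reached for values outside the enum */
}
```

With a GLenum parameter the switch would have to account for every other GL constant, which is what forced the default case.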
There is no point in keeping those around anymore. ES20 may need a special case
when/if we dabble with it again. Meanwhile there is no point in polluting the
code with this.
(ghost still has reference for the PROFILE, but that's reasonable)
Revert 7a18ee62eb4d6c6028d05f1da259fe8695f49a3f and 1ff97bbfff78a0c375fb5256a9d9d37cd3973bbe after discussing with @fclem.
VertexBuffer_size should always report the same buffer size, but without asking/calling OpenGL.
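One way to report the size without querying OpenGL is to derive it from data already kept on the CPU side. The sketch below assumes the buffer stores its per-vertex stride and vertex count; `FormatSketch`, `VertexBufferSketch` and `vertex_buffer_size` are hypothetical names, not the actual structs.

```c
#include <assert.h>

/* Hypothetical vertex format: only the per-vertex stride matters here. */
typedef struct {
    unsigned stride; /* bytes per vertex */
} FormatSketch;

typedef struct {
    FormatSketch format;
    unsigned vertex_ct;
} VertexBufferSketch;

/* Size in bytes computed from CPU-side metadata, with no GL query. */
static unsigned vertex_buffer_size(const VertexBufferSketch *verts)
{
    return verts->format.stride * verts->vertex_ct;
}
```

Because the result depends only on the format and vertex count, it is the same whether or not the buffer has been uploaded yet.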