The assembler version in Windows used to return the previous value
of the variable while all the other versions return the new value.
This is now fixed for consistency.
Note: this bug had no effect on blender because no part of the code
use the return value of these functions, but the future BGE DeckLink
module makes use of it to implement reference counter.
This commit:
* Removes most of all dirty internal details from public atomi_ops.h file, and move them into /intern private subdir.
* Removes unused 'architectures' (__apple__ and jemalloc).
* Split each implementation into its own file.
* Makes use of C99's limits.h system header to determine pointer and int size, instead of using fix hardcoded list of architectures.
* Introduces new 'faked' atomics ops for floats.
Note that we may add a lot more real and 'faked' atomic operations over integers and floats
(multiplication, division, bitshift, bitwise booleans, etc.), as needs arise.
Reviewers: sergey, campbellbarton
Differential Revision: https://developer.blender.org/D1982
__GCC_HAVE_SYNC_COMPARE_AND_SWAP_n are not defined on this architecture
for some reason, however those functions are available.
Adding a workaround for newly used __sync_fetch_and_and() function.
Needed by incomming changes in pbvh.c.
Note that we make it much simpler than for other primitives in this file - think
we could revise its content to make it simpler one day...
For now assume sizeof(int) == 4 for all supported platforms, could be
changed in the future.
Added an assert to functions which depends on this this, so we'll
easily notice bad things happening.
- Re-arrange locks, so no actual memory allocation
(which is relatively slow) happens from inside
the lock. operation system will take care of locks
which might be needed there on it's own.
- Use spin lock instead of mutex, since it's just
list operations happens from inside lock, no need
in mutex here.
- Use atomic operations for memory in use and total
used blocks counters.
This makes guarded allocator almost the same speed
as non-guarded one in files from Tube project.
There're still MemHead/MemTail overhead which might
be bad for CPU cache utilization