This gives around 30% of speedup for gaussian blur node.
Pretty much straightforward implementation inside the node
itself, but needed to implement some additional things:
- Aligned malloc. It's needed to load data onto SSE registers
faster. based on the aligned_malloc() from Libmv with
some additional trickery going on to support arbitrary
alignment (this magic is needed because of MemHead).
In the practice only 16bit alignment is supported because
of the lack of aligned malloc with arbitrary alignment
for OSX. Not a bit deal for now because we need 16 bytes
alignment at this moment only. Could be tweaked further
later.
- Memory buffers in compositor are now aligned to 16 bytes.
Should be harmless for non-SSE cases too. just mentioning.
Reviewers: campbellbarton, lukastoenne, jbakker
Reviewed By: campbellbarton
CC: lockal
Differential Revision: https://developer.blender.org/D564
On Windows we can only do mmap memory allocation up to 4 GB, which causes a
crash when doing very large renders on 64 bit systems with a lot of memory.
As far as I can tell the reason to use mmap is to get around address space
limitation on some 32 bit operating systems, and I can't see a reason to use
it on 64 bit. For the original explanation see here:
http://orange.blender.org/blog/stupid-memory-problems
Fixes T37841.
Release builds will now use lock-free allocator by
default without any internal locks happening.
MemHead is also reduces to as minimum as it's possible.
It still need to be size_t stored in a MemHead in order
to make us keep track on memory we're requesting from
the system, not memory which system is allocating. This
is probably also faster than using a malloc's usable
size function.
Lock-free guarded allocator will say you whether all
the blocks were freed, but wouldn't give you a list
of unfreed blocks list. To have such a list use a
--debug or --debug-memory command line arguments.
Debug builds does have the same behavior as release
builds. This is so tools like valgrind are not
screwed up by guarded allocator as they're currently
are.
--
svn merge -r59941:59942 -r60072:60073 -r60093:60094 \
-r60095:60096 ^/branches/soc-2013-depsgraph_mt
- Tweaked typedefs in stdint so they match
what we've got in BLI_sys_types (needed to
explicitly tell sign to MSVC).
Not so much harmful to be more explicit here,
but we really better to have single stdint
int blender.
- Tweaked allocations macros so MSVC is happy
with structures allocation.
Also made libmv-capi use guarded objetc allocation.
Run into some suspecious cases when it was not so
clear whether memory is being freed or not.
Now we'll know for sure whether there're leaks or not :)
Having this macros in a guardedalloc header helps
using them in other areas (for now it's OCIO and libmv,
but in the future it'll be more places).
- replace numbers with defines for allocation increments and default array size.
- move array reallocation into a static function (deduplicate 2x).
also fix own mistake with uninitialized slop-space var in memory printing statistics.
This commit attempts to fix the following error:
intern\guardedalloc\intern\mallocn.c: In function 'rem_memblock':
intern\guardedalloc\intern\mallocn.c:977:48: error: conversion to 'intptr_t' from 'size_t' may change the sign of the result [-Werror=sign-conversion]
From the references I've managed to find, it appears that
the second arg to munmap() should be size_t not intptr_t.
Fortunately though, we don't use this arg anyways atm, so
this should be quite harmless...
- Re-arrange locks, so no actual memory allocation
(which is relatively slow) happens from inside
the lock. operation system will take care of locks
which might be needed there on it's own.
- Use spin lock instead of mutex, since it's just
list operations happens from inside lock, no need
in mutex here.
- Use atomic operations for memory in use and total
used blocks counters.
This makes guarded allocator almost the same speed
as non-guarded one in files from Tube project.
There're still MemHead/MemTail overhead which might
be bad for CPU cache utilization
Added an option to show backtrace from where
non-freed datablock was allocated from.
To enable this feature, simply enable DEBUG_BACKTRACE
in mallocn.c file and all unfreed datablocks will
be followed up by a backtrace.
Currently works on linux and osx only,
windows support is on TODO.
This feature is for sure disabled by default,
so does not affect any builds which don't
explicitly define DEBUG_BACKTRACE.