Also extended the size of buf[] in print_error() to prevent mem_printmemlist_pydict_script[]
from getting truncated when MEM_printmemlist_pydict() is used.
Differential revision: https://developer.blender.org/D675
Reviewed by: Campbell Barton
This gives around 30% of speedup for gaussian blur node.
Pretty much straightforward implementation inside the node
itself, but needed to implement some additional things:
- Aligned malloc. It's needed to load data onto SSE registers
faster. based on the aligned_malloc() from Libmv with
some additional trickery going on to support arbitrary
alignment (this magic is needed because of MemHead).
In the practice only 16bit alignment is supported because
of the lack of aligned malloc with arbitrary alignment
for OSX. Not a bit deal for now because we need 16 bytes
alignment at this moment only. Could be tweaked further
later.
- Memory buffers in compositor are now aligned to 16 bytes.
Should be harmless for non-SSE cases too. just mentioning.
Reviewers: campbellbarton, lukastoenne, jbakker
Reviewed By: campbellbarton
CC: lockal
Differential Revision: https://developer.blender.org/D564
On Windows we can only do mmap memory allocation up to 4 GB, which causes a
crash when doing very large renders on 64 bit systems with a lot of memory.
As far as I can tell the reason to use mmap is to get around address space
limitation on some 32 bit operating systems, and I can't see a reason to use
it on 64 bit. For the original explanation see here:
http://orange.blender.org/blog/stupid-memory-problems
Fixes T37841.
Release builds will now use lock-free allocator by
default without any internal locks happening.
MemHead is also reduces to as minimum as it's possible.
It still need to be size_t stored in a MemHead in order
to make us keep track on memory we're requesting from
the system, not memory which system is allocating. This
is probably also faster than using a malloc's usable
size function.
Lock-free guarded allocator will say you whether all
the blocks were freed, but wouldn't give you a list
of unfreed blocks list. To have such a list use a
--debug or --debug-memory command line arguments.
Debug builds does have the same behavior as release
builds. This is so tools like valgrind are not
screwed up by guarded allocator as they're currently
are.
--
svn merge -r59941:59942 -r60072:60073 -r60093:60094 \
-r60095:60096 ^/branches/soc-2013-depsgraph_mt
- Tweaked typedefs in stdint so they match
what we've got in BLI_sys_types (needed to
explicitly tell sign to MSVC).
Not so much harmful to be more explicit here,
but we really better to have single stdint
int blender.
- Tweaked allocations macros so MSVC is happy
with structures allocation.
Also made libmv-capi use guarded objetc allocation.
Run into some suspecious cases when it was not so
clear whether memory is being freed or not.
Now we'll know for sure whether there're leaks or not :)
Having this macros in a guardedalloc header helps
using them in other areas (for now it's OCIO and libmv,
but in the future it'll be more places).
- replace numbers with defines for allocation increments and default array size.
- move array reallocation into a static function (deduplicate 2x).
also fix own mistake with uninitialized slop-space var in memory printing statistics.
This commit attempts to fix the following error:
intern\guardedalloc\intern\mallocn.c: In function 'rem_memblock':
intern\guardedalloc\intern\mallocn.c:977:48: error: conversion to 'intptr_t' from 'size_t' may change the sign of the result [-Werror=sign-conversion]
From the references I've managed to find, it appears that
the second arg to munmap() should be size_t not intptr_t.
Fortunately though, we don't use this arg anyways atm, so
this should be quite harmless...
- Re-arrange locks, so no actual memory allocation
(which is relatively slow) happens from inside
the lock. operation system will take care of locks
which might be needed there on it's own.
- Use spin lock instead of mutex, since it's just
list operations happens from inside lock, no need
in mutex here.
- Use atomic operations for memory in use and total
used blocks counters.
This makes guarded allocator almost the same speed
as non-guarded one in files from Tube project.
There're still MemHead/MemTail overhead which might
be bad for CPU cache utilization
Added an option to show backtrace from where
non-freed datablock was allocated from.
To enable this feature, simply enable DEBUG_BACKTRACE
in mallocn.c file and all unfreed datablocks will
be followed up by a backtrace.
Currently works on linux and osx only,
windows support is on TODO.
This feature is for sure disabled by default,
so does not affect any builds which don't
explicitly define DEBUG_BACKTRACE.
non-threadsafe usage of guarded allocator.
Also added small chunk of code to check consistency of begin/end
threaded malloc.
All this additional checks are commented and wouldn't affect on
builds, however found them helpful to troubleshoot issues so
decided to commit it to SVN.
This implements AO baking directly from multi-resolution mesh with much
less memory overhead than regular baker.
Uses rays distribution implementation from Morten Mikkelsen, raycast
is based on RayObject also used by Blender Internal.
Works in single-thread yet, multi-threading would be implemented later.
http://sourceforge.net/projects/mingw-w64/files/Toolchains%20targetting%20Win64/Personal%20Builds/ray_linn/GCC-4.7.0-with-ada/mingw-w64-gcc-4.7.0-runtime-2.0.1-static-ada-20120330.7z/download
Other builds may also work but due to the constantly changing nature of the compiler this cannot be guaranteed. I often had to change compilers while building the libraries and this one is the one that did the job for most of them.
This first support is experimental and considered "advanced". To enable pass -DWITH_MINGW64 during cmake configuration. Also make sure to extract the compiler on C:/MinGW and that MinGW/bin is in your path. To build check out lib/mingw64.
Initially the support is lacking until I get every library compiled correctly. For now you should disable WITH_CYCLES(sorry, I know some people are dying to do benchmarks, but still a few libs to go), WITH_IMAGE_OPENEXR, WITH_OPENCOLLADA, WITH_LIBMV and WITH_CODEC_FFMPEG(links but hangs on startup).
Still the tools are working, the memory limit is increased and due to the experimental nature of the setup, full optimization with SSE2 is available, which makes the build quite fast. Also the compiler and especially, the linker are way faster than regular MinGW.
The wiki docs have also updated. Happy testing!
- define __BIG_ENDIAN__ or __LITTLE_ENDIAN__ with cmake & scons.
- ENDIAN_ORDER is now a define rather than a global short.
- replace checks like this with single ifdef: #if defined(__sgi) || defined (__sparc) || defined (__sparc__) || defined (__PPC__) || defined (__ppc__) || defined (__hppa__) || defined (__BIG_ENDIAN__)
- remove BKE_endian.h which isn't used
blender_add_lib now takes a separate include argument to suppress warnings in system includes (mostly ffmpeg & python).
also only build wm_apple.c on apple+carbon configuration.
- modifier code was using sizeof() without knowing the sizeof the array when clearing the modifier type array.
- use BLI_snprintf rather then sprintf where the size of the string is known.
- particle drawing code kept a reference to stack float values (not a problem at the moment but would crash if accessed later).
- enabling/disabling no longer prints in the terminal unless in debug mode.
- remove 'header' struct from BLI_storage_types.h, from revision 2 and is not used.
- Add GCC property to guardedalloc to warn if the return value from allocation functions isn't used.
double click didnt check mouse distance moved so you could click twice in different areas of the screen very fast and generate a double click event which had old mouse coords copied into it but was sent to an operator set to run on single click (because the double click wasnt handled).
Also added MEM_name_ptr function (included in debug mode only), prints the name of allocated memory.
used for debugging where events came from.