blender/intern
Sergey Sharybin a87fb34eda Use advantage of SSE2 instructions in gaussian blur node
This gives around a 30% speedup for the gaussian blur node.
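
The node's actual code is not shown in this listing. As a rough illustration only, here is a minimal sketch of the SSE2 idea. An RGBA float pixel is exactly 16 bytes, so it fits in one __m128 register, and each kernel tap becomes one multiply and one add. The helper name blur_accumulate_rgba, the interleaved RGBA layout, and the per-tap weight array are assumptions. A scalar fallback keeps the sketch portable to non-SSE2 targets.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#if defined(__SSE2__) || defined(_M_X64)
#include <emmintrin.h>
#define HAVE_SSE2_SKETCH 1
#endif

/* Hypothetical helper (not Blender's code): weighted sum of `count`
 * interleaved RGBA float pixels, as done for one output pixel of a
 * separable gaussian blur pass. */
static void blur_accumulate_rgba(const float *pixels, const float *weights,
                                 size_t count, float out[4])
{
#ifdef HAVE_SSE2_SKETCH
    /* _mm_load_ps requires 16-byte aligned addresses. */
    assert(((uintptr_t)pixels & 15) == 0);
    __m128 acc = _mm_setzero_ps();
    for (size_t i = 0; i < count; i++) {
        __m128 px = _mm_load_ps(pixels + 4 * i); /* one RGBA pixel */
        __m128 w = _mm_set1_ps(weights[i]);      /* broadcast the tap weight */
        acc = _mm_add_ps(acc, _mm_mul_ps(px, w));
    }
    _mm_storeu_ps(out, acc);
#else
    /* Scalar fallback: same arithmetic, one channel at a time. */
    out[0] = out[1] = out[2] = out[3] = 0.0f;
    for (size_t i = 0; i < count; i++)
        for (int c = 0; c < 4; c++)
            out[c] += pixels[4 * i + c] * weights[i];
#endif
}
```

The aligned load explains why the commit also needs aligned buffers. An unaligned _mm_loadu_ps would work on any buffer but was historically slower. Guaranteeing 16-byte alignment lets the faster aligned load be used.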

The implementation inside the node itself is pretty much
straightforward, but some additional things had to be implemented:

- Aligned malloc. It's needed to load data into SSE registers
  faster. It is based on aligned_malloc() from Libmv, with
  some additional trickery to support arbitrary alignment
  (this magic is needed because of MemHead).

  In practice only 16-byte alignment is supported, because
  OSX lacks an aligned malloc with arbitrary alignment.
  Not a big deal for now, because we only need 16-byte
  alignment at the moment. Could be tweaked further
  later.

- Memory buffers in the compositor are now aligned to 16 bytes.
  This should be harmless for non-SSE cases too; just mentioning it.
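
The commit does not include the allocator code. The sketch below illustrates the general over-allocate-and-offset technique. It is the same kind of trick that lets a bookkeeping header such as MemHead sit in front of an aligned user block. It allocates extra space, rounds the address up to the requested power-of-two alignment, and stores the pointer malloc() returned in the word just before the aligned block. The names aligned_malloc_sketch and aligned_free_sketch are hypothetical. They are not Libmv's aligned_malloc() or guardedalloc's API.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical sketch: return a block whose address is a multiple of
 * `alignment`. The original malloc() pointer is stashed directly in
 * front of the aligned block so it can be freed later. */
static void *aligned_malloc_sketch(size_t size, size_t alignment)
{
    /* Power of two, and large enough to keep the stashed pointer aligned. */
    assert(alignment >= sizeof(void *) && (alignment & (alignment - 1)) == 0);

    /* Worst case needs alignment - 1 bytes of padding plus the stash slot. */
    void *raw = malloc(size + alignment - 1 + sizeof(void *));
    if (raw == NULL)
        return NULL;

    /* Leave room for the stash slot, then round up to the alignment. */
    uintptr_t start = (uintptr_t)raw + sizeof(void *);
    uintptr_t aligned = (start + alignment - 1) & ~(uintptr_t)(alignment - 1);

    ((void **)aligned)[-1] = raw; /* remember what malloc() returned */
    return (void *)aligned;
}

/* Free a block obtained from aligned_malloc_sketch(). */
static void aligned_free_sketch(void *ptr)
{
    if (ptr != NULL)
        free(((void **)ptr)[-1]);
}
```

Usage: `float *buf = aligned_malloc_sketch(n * 4 * sizeof(float), 16);` gives a buffer that is suitable for _mm_load_ps on every RGBA pixel. Release it with aligned_free_sketch(buf), never with plain free().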

Reviewers: campbellbarton, lukastoenne, jbakker

Reviewed By: campbellbarton

CC: lockal

Differential Revision: https://developer.blender.org/D564
2014-06-14 00:38:07 +06:00
atomic Fix compilation on unofficial 64bit archs 2014-06-02 16:27:09 +06:00
audaspace Fix T40280: sequencer sound strips with an end at a negative time kept playing 2014-05-20 23:01:56 +02:00
container Rework carve integration into boolean modifier 2014-02-13 17:16:53 +06:00
cycles Cycles: Support builtin images for OSL shading backend 2014-06-13 20:42:28 +06:00
dualcon quiet double-promotion warnings, change octree.cpp to use a float (vector accumulated into a float anyway) 2013-08-06 06:38:52 +00:00
elbeem Cleanup some useless/unneeded #ifdefs for MSVC2013. 2014-03-09 00:25:08 +01:00
ffmpeg Fix video FFmpeg not being able to produce video files due to usage of deprecated settings 2014-04-15 00:15:09 +06:00
ghost Code cleanup: use const for mouse location arg 2014-06-14 00:47:12 +10:00
guardedalloc Use advantage of SSE2 instructions in gaussian blur node 2014-06-14 00:38:07 +06:00
iksolver Code cleanup: white space and cmake was broken on all platforms 2014-02-03 13:56:34 +11:00
itasc Adapt KDL for compile with clang 3.4, which is stricter with friend classes, 2014-02-17 16:39:03 +01:00
locale Code cleanup: don't use unnecessary .exe extension in scons, simplify code. 2014-04-29 14:03:08 +02:00
memutil Fix T37898: blenderplayer painfully slow in recent builds 2013-12-22 15:26:59 +06:00
mikktspace Style Cleanup: remove preprocessor indentation (updated wiki style guide too) 2013-12-22 14:12:19 +11:00
moto Added GPL header to sconscripts! 2012-12-17 08:01:43 +00:00
opencl Possible fix for [#36086] Activating the opencl option in the compositor causes blender crash 2013-07-17 12:57:03 +00:00
opencolorio Report to the console when custom ocio config is used 2014-05-23 13:48:35 +02:00
opennl remove duplicate sys-types headers. 2013-05-29 21:38:23 +00:00
raskter code cleanup: use NULL rather than 0 for pointers, and make vars static where possible. 2013-03-22 05:34:10 +00:00
rigidbody Cleanup: Use doxy for more structured comments 2014-05-29 21:17:48 +10:00
smoke Code Cleanup: WIN32 defines, check for _MSC_VER instead of !FREE_WINDOWS 2014-01-03 20:46:12 +11:00
string fix for buffer out-of-bounds reading for STR_String comparisons with char arrays. 2013-03-22 21:26:59 +00:00
utfconv Warning cleanup: 2014-03-22 14:41:38 +02:00
CMakeLists.txt Rework carve integration into boolean modifier 2014-02-13 17:16:53 +06:00
SConscript Totally remove BSP from SConscript 2014-02-19 15:46:44 +06:00