Commit Graph

214 Commits

Author SHA1 Message Date
Thomas Dinges
de28a4d4b2 Cycles: Add an AVX kernel for CPU rendering.
* AVX is available on Intel Sandy Bridge and newer and AMD Bulldozer and newer.
* We don't use dedicated AVX intrinsics yet, but gcc auto vectorization gives a 3% performance improvement for Caminandes. Tested on an i5-3570, Linux x64.
* No change for Windows yet, MSVC 2008 does not support AVX.

Reviewed by: brecht
Differential Revision: https://developer.blender.org/D216
2014-01-16 17:04:11 +01:00
d9e52ac98b Code cleanup: move half float functions to separate header file. 2014-01-15 15:29:22 +01:00
8af782ad22 Code cleanup: some reshuffling of SIMD defines moving more code to util_optimization.h. 2014-01-15 15:11:50 +01:00
Thomas Dinges
9e3ddd70d4 Cycles: Disable SSE41 kernel on 32bit, we don't use intrinsics here anyway. Also disable it for Visual Studio < 2012, broken blendv instruction. 2014-01-14 23:51:59 +01:00
Thomas Dinges
9351ac0d85 Cycles: Skip the compilation of the dedicated SSE2 kernel on x86-64, we can assume SSE2 here, so just re-use the regular one. Saves 500kb in the blender binary.
Reviewed by: brecht
Differential Revision: https://developer.blender.org/D199
2014-01-14 20:39:54 +01:00
d980c3eccb Further fix for T37817: non-ascii paths fix in Cycles broke OSL rendering.
Not quite sure yet what is going on here, but this works for me.
2014-01-14 20:01:26 +01:00
Sv. Lockal
1c49eb0072 Cycles, Code cleanup: simplify code for color linear interpolation and float math
Reviewed By: brecht

Differential Revision: https://developer.blender.org/D215
2014-01-14 22:55:02 +04:00
Sv. Lockal
47c5898fa1 Cycles: SSE for Voronoi textures (targeted for Haswell CPUs)
Gives up to 15% speedup scenes with voronoi-based textures (up to 25% with volumes) on Haswell. The performance change for other CPUs is much smaller: 1-2%.

Reviewed By: brecht

Differential Revision: https://developer.blender.org/D203
2014-01-12 18:14:00 +04:00
Sv. Lockal
da3fdf0b4b Code Cleanup: in Cycles SSE replace macros with templates, skip unused code with preprocessor, simplify casts 2014-01-11 22:20:03 +04:00
4d72a5e34a Fix T38129: cycles viewport render display with very bright colors turning black.
This happened when exceeding the maximum value representable by half floats.
2014-01-11 00:50:53 +01:00
241fccaf6a Fix T37817: cycles CUDA detection problem on Windows with non-ascii paths. 2014-01-11 00:47:58 +01:00
Thomas Dinges
5022d9f81b Cycles: Don't include SIMD util file for OpenCL/CUDA, this fixes OpenCL compilation. 2014-01-06 21:28:18 +01:00
Sv. Lockal
96903508bc Cycles: SSE optimization for sRGB conversion (gives 7% speedup on CPU for pavillon_barcelone scene)
Thanks brecht/dingto/juicyfruit et al. for testing and reviewing this patch in T38034.
2014-01-06 20:03:30 +04:00
Campbell Barton
a288644b1e Code Cleanup: WIN32 defines, check for _MSC_VER instead of !FREE_WINDOWS 2014-01-03 20:46:12 +11:00
Martijn Berger
1c8a12ee61 Fix T37987: MSVC 2013 has C99 headers and warns for out define hypot _hypot for good reason it seems 2014-01-02 22:19:10 +01:00
Martijn Berger
114284b1fb creation of path from std::string on Windows (VC10/11/12) crashes (access error)
Summary:
Seems to be a known problem in boost: https://svn.boost.org/trac/boost/ticket/6320

Applied the solution but do not know if this is the right place.

Reviewers: dingto, brecht

Reviewed By: dingto

Differential Revision: https://developer.blender.org/D140
2013-12-27 22:38:46 +01:00
Thomas Dinges
1578b55c27 Cycles: Move SIMD utility functions into its own file.
Recently added SSE macros for noise texture can be moved here as well, but I leave this for later.
2013-12-27 21:30:21 +01:00
1e045a2417 Cycles: CUDA runtime kernel compilation can now find CUDA 6.0. 2013-12-13 19:12:07 +01:00
Martijn Berger
85a0c5d4e1 Cycles: network render code updated for latest changes and improved
This actually works somewhat now, although viewport rendering is broken and any
kind of network error or connection failure will kill Blender.

* Experimental WITH_CYCLES_NETWORK cmake option
* Networked Device is shown as an option next to CPU and GPU Compute
* Various updates to work with the latest Cycles code
* Locks and thread safety for RPC calls and tiles
* Refactored pointer mapping code
* Fix error in CPU brand string retrieval code

This includes work by Doug Gale, Martijn Berger and Brecht Van Lommel.

Reviewers: brecht

Differential Revision: http://developer.blender.org/D36
2013-12-07 12:26:58 +01:00
89cfeefab5 Cycles: experimental OSL ptex reading code.
This code can't actually be enabled for building and is incomplete, but it's
here because we know we want to support this at some point and there's not much
reason to have it in a separate branch if a simple #ifdef can disable it.
2013-11-28 02:11:42 +01:00
58ec292fd8 Fix cycles build error with visual studio, apparently the windows ABI does not
like 16 bit alignment on 32 bit.
2013-11-23 05:47:57 +01:00
Martijn Berger
e3a79258d1 Cycles: test code for sse 4.1 kernel and alignment for some vector types.
This is mostly work towards enabling the __KERNEL_SSE__ option to start using
SIMD operations for vector math operations. This 4.1 kernel performes about 8%
faster with that option but overall is still slower than without the option.

WITH_CYCLES_OPTIMIZED_KERNEL_SSE41 is the cmake flag for testing this kernel.

Alignment of int3, int4, float3, float4 to 16 bytes seems to give a slight 1-2%
speedup on tested systems with the current kernel already, so is enabled now.
2013-11-22 14:42:41 +01:00
c18712e868 Cycles: change __device and similar qualifiers to ccl_device in kernel code.
This to avoids build conflicts with libc++ on FreeBSD, these __ prefixed values
are reserved for compilers. I apologize to anyone who has patches or branches
and has to go through the pain of merging this change, it may be easiest to do
these same replacements in your code and then apply/merge the patch.

Ref T37477.
2013-11-18 08:48:15 +01:00
Campbell Barton
48c1e0c0fc spelling: use American spelling for canceled 2013-10-26 01:06:19 +00:00
Brecht Van Lommel
e9d03296c7 Better fix for #36935 and 36316:
* 32 bit GCC builds now have the SSE BVH optimizations turned off, but still
  compile with SSE flags for better performance.

* White color when rendering on Windows seems to have been unrelated to SSE,
  rather it was a graphics driver not supporting half float textures, added a
  check for that now.
2013-10-05 19:56:34 +00:00
Brecht Van Lommel
6737a04061 Attempt to fix #36935: disable SSE optimizations on 32 bit windows too. Something
strange is going on here, but I don't think it can be fixed before the release,
if it is worth at all spending time on this.
2013-10-04 14:47:37 +00:00
Brecht Van Lommel
e308c2f166 Fix #36316: dots in cycles render on certain CPUs with 32 bit linux builds.
There is some sort of problem with the SSE2 code path, but I couldn't find
the cause, maybe a compiler bug due to the large amount of inlining? For
now I've disabled SSE2 optimizatons in 32 bit GCC builds.
2013-10-02 19:00:16 +00:00
Brecht Van Lommel
cbb783f1d6 Fix cycles OpenCL compile error on AMD, and fix assert in debug builds. 2013-10-02 14:41:04 +00:00
Brecht Van Lommel
60e5abe71f Fix a few issues reported by coverity scan. 2013-09-03 22:39:21 +00:00
Brecht Van Lommel
29f6616d60 Cycles: viewport render now takes scene color management settings into account,
except for curves, that's still missing from the OpenColorIO GLSL shader.

The pixels are stored in a half float texture, converterd from full float with
native GPU instructions and SIMD on the CPU, so it should be pretty quick.
Using a GLSL shader is useful for GPU render because it avoids a copy through
CPU memory.
2013-08-30 23:49:38 +00:00
Brecht Van Lommel
6785874e7a Fix #36137: cycles render not using all GPU's when the number of GPU's is larger
than the number of CPU threads
2013-08-30 23:09:22 +00:00
Thomas Dinges
a51f8e4353 Cycles / Standalone:
* Standalone can now be compiled without the GUI, making the glut dependency optional. 

Added WITH_CYCLES_STANDALONE_GUI cmake flag.
2013-08-30 17:34:27 +00:00
Thomas Dinges
ff4e018753 Cycles / Standalone:
* Rename test to standalone.

Note: New CMAKE flag is WITH_CYCLES_STANDALONE.
2013-08-27 02:37:48 +00:00
Brecht Van Lommel
b9ce231060 Cycles: relicense GNU GPL source code to Apache version 2.0.
More information in this post:
http://code.blender.org/

Thanks to all contributes for giving their permission!
2013-08-18 14:16:15 +00:00
Brecht Van Lommel
d43682d51b Cycles: Subsurface Scattering
New features:

* Bump mapping now works with SSS
* Texture Blur factor for SSS, see the documentation for details:
http://wiki.blender.org/index.php/Doc:2.6/Manual/Render/Cycles/Nodes/Shaders#Subsurface_Scattering

Work in progress for feedback:

Initial implementation of the "BSSRDF Importance Sampling" paper, which uses
a different importance sampling method. It gives better quality results in
many ways, with the availability of both Cubic and Gaussian falloff functions,
but also tends to be more noisy when using the progressive integrator and does
not give great results with some geometry. It works quite well for the
non-progressive integrator and is often less noisy there.

This code may still change a lot, so unless you're testing it may be best to
stick to the Compatible falloff function.

Skin test render and file that takes advantage of the gaussian falloff:
http://www.pasteall.org/pic/show.php?id=57661
http://www.pasteall.org/pic/show.php?id=57662
http://www.pasteall.org/blend/23501
2013-08-18 14:15:57 +00:00
Campbell Barton
b8c3efc8c3 code cleanup: compiler warnings 2013-07-21 16:40:34 +00:00
Thomas Dinges
636b314677 Code cleanup / Cycles:
* Remove useless else branch, after recent changes.
2013-07-21 10:30:22 +00:00
Thomas Dinges
9732c6283e Cycles / CPU Rendering:
* "Auto Detect" now again uses the umber of cores, instead number of cores + 1.

This was added before we had Tile rendering and benchmarks on several systems showed that there is no gain with this now. There might be some slight difference (0.5% or so) slower/faster depending on the scene, but this is negligible.
2013-07-20 00:40:03 +00:00
Brecht Van Lommel
3d847ed6e6 Fix #36064: cycles direct/indirect light passes with materials that have zero
RGB color components gave non-grey results when you might no expect it.

What happens is that some of the color channels are zero in the direct light
pass because their channel is zero in the color pass. The direct light pass is
defined as lighting divided by the color pass, and we can't divide by zero. We
do a division after all samples are added together to ensure that multiplication
in the compositor gives the exact combined pass even with antialiasing, DoF, ..

Found a simple tweak here, instead of setting such channels to zero it will set
it to the average of other non-zero color channels, which makes the results look
like the expected grey.
2013-07-08 23:31:45 +00:00
Thomas Dinges
c6ce8de20e Code cleanup / Cycles:
* Some cleanup for castings.
2013-06-27 15:48:16 +00:00
Brecht Van Lommel
7902fa57b6 Code cleanup: cycles
* Reshuffle SSE #ifdefs to try to avoid compilation errors enabling SSE on 32 bit.
* Remove CUDA kernel launch size exception on Mac, is not needed.
* Make OSL file compilation quiet like c/cpp files.
2013-06-26 23:29:33 +00:00
Thomas Dinges
15f5da4cd4 Cycles / SSE2:
* kernel_sse2 was built without actual SSE2 intrinsics on x86 systems.
2013-06-26 22:12:23 +00:00
Brecht Van Lommel
240fb6fa26 Cycles: ensure any SSE data is allocated 16 byte aligned, happens automatically
on many platforms but is not assured everywhere.
2013-06-22 14:35:09 +00:00
Brecht Van Lommel
8d6e5e2fee Cycles: update build configurations to include CUDA sm_35 architecture. When using
a compiler older than CUDA 5.0 it will give a warning and skip this architecture.
2013-06-20 13:10:47 +00:00
Brecht Van Lommel
f811e6e3ae Cycles: optimized SSE BVH traversal now also works with SSE2 CPUs, so all the
way back to Pentium 4, using a slightly less efficient instruction.

Also ensure /Ox is used for Visual Studio for RelWithDebInfo builds.
2013-06-19 17:54:26 +00:00
Brecht Van Lommel
16204bd647 Cycles: prepare to make CUDA 5.0 the official version we use
* Add CUDA compiler version detection to cmake/scons/runtime
* Remove noinline in kernel_shader.h and reenable --use_fast_math if CUDA 5.x
  is used, these were workarounds for CUDA 4.2 bugs
* Change max number of registers to 32 for sm 2.x (based on performance tests
  from Martijn Berger and confirmed here), and also for NVidia OpenCL.

Overall it seems that with these changes and the latest CUDA 5.0 download, that
performance is as good as or better than the 2.67b release with the scenes and
graphics cards I tested.
2013-06-19 17:54:23 +00:00
Brecht Van Lommel
649dd6f648 Fix cycles crash on some processors. We actually need S-SSE3 support for this
new BVH traversal code, not just SSE3.
2013-06-18 16:52:02 +00:00
Brecht Van Lommel
484d765bd4 Cycles: attempt to fix internal compile error with some visual studio builds 2013-06-18 13:19:16 +00:00
Jürgen Herrmann
5fc1d9205a Cycles BVH Build fix for MSVC 2012.
needs to include intrin.h for _BitScanForward and _BitScanReverse.
2013-06-18 12:32:43 +00:00
Brecht Van Lommel
d57c6748c4 Cycles: optimization for BVH traveral on CPU's with SSE3, using code from Embree.
On the BMW scene, this gives roughly a 10% speedup overall with clang/gcc, and 30%
speedup with visual studio (2008). It turns out visual studio was optimizing the
existing code quite poorly compared to pretty good autovectorization by clang/gcc,
but hand written SSE code also gives a smaller speed boost there.

This code isn't enabled when using the hair minimum width feature yet, need to
make that work with the SSE code still.
2013-06-18 09:36:06 +00:00