This is mostly work towards enabling the __KERNEL_SSE__ option to start using
SIMD operations for vector math operations. This 4.1 kernel performes about 8%
faster with that option but overall is still slower than without the option.
WITH_CYCLES_OPTIMIZED_KERNEL_SSE41 is the cmake flag for testing this kernel.
Alignment of int3, int4, float3, float4 to 16 bytes seems to give a slight 1-2%
speedup on tested systems with the current kernel already, so is enabled now.
Instead of having ifdef __GNUC__ all over the headers
to use special compiler's hints use a special file where
all things like this are concentrated.
Makes code easier to follow and allows to manage special
attributes in more efficient way.
Thanks Campbell for review!
precompiled cubins instead,
Logic here is following now:
- If there're precompiled cubins, assume CUDA compute is available,
otherwise
- If cuda toolkit found, assume CUDA compute is available
- In all other cases CUDA compute is not available
For windows there're still check for only precompiled binaries,
no runtime compilation is allowed.
Ended up with such decision after discussion with Brecht. The thing
is, if we'll support runtime compilation on windows we'll end up
having lots of reports about different aspects of something doesn't
work (you need particular toolkit version, msvc installed, environment
variables set properly and so) and giving feedback on such reports
will waste time.
Initial support of OSL builds using SCons build system. Only tested on Linux now.
No changes to configuration files themselves -- for now check how it's configured
for linux buildbot (it was already horror to make all this changes and verify them,
changes to linux-config.py could easily be done later).
Currently WITH_BF_STATICOSL and WITH_BF_STATICLLVM are more like rudiments because
linking against oslexec requires special trick with --whole-archive. We woul either
need to find a way dealing with this oslexec less hackish or drop STATICOSL and
STATICLLVM flags. Will keep dropping this flags for until we have "final" build
rules for OSL.
Still can not make 32bit linux rendering with OSL -- blender simply crashes when
starting rendering. So for time being this issues are solving disabled OSL for
32bit build slaves.
* SSE/SSE2 is an unknown option for the compiler (Command line warning D9002 : ignoring unknown option '/arch:SSE2'), so it can be left out because on x64 it automatically builds with SSE and SSE2.
* Compile all of cycles with -ffast-math again
* Add scons compilation of cuda binaries, tested on mac/linux.
* Add UI option for supported/experimental features, to make it
more clear what is supported, opencl/subdivision is experimental.
* Remove cycles xml exporter, was just for testing.
* Fix excessive fireflies in Velvet BSDF (patch by David).
* Disable some unused SSE code
* Remove RTTI disabling flags for now, this is giving some compile issues and
was only needed of OSL which we're not using yet.