* Reshuffle SSE #ifdefs to try to avoid compilation errors enabling SSE on 32 bit.
* Remove CUDA kernel launch size exception on Mac, is not needed.
* Make OSL file compilation quiet like c/cpp files.
* Add CUDA compiler version detection to cmake/scons/runtime
* Remove noinline in kernel_shader.h and reenable --use_fast_math if CUDA 5.x
is used, these were workarounds for CUDA 4.2 bugs
* Change max number of registers to 32 for sm 2.x (based on performance tests
from Martijn Berger and confirmed here), and also for NVidia OpenCL.
Overall it seems that with these changes and the latest CUDA 5.0 download, that
performance is as good as or better than the 2.67b release with the scenes and
graphics cards I tested.
On the BMW scene, this gives roughly a 10% speedup overall with clang/gcc, and 30%
speedup with visual studio (2008). It turns out visual studio was optimizing the
existing code quite poorly compared to pretty good autovectorization by clang/gcc,
but hand written SSE code also gives a smaller speed boost there.
This code isn't enabled when using the hair minimum width feature yet, need to
make that work with the SSE code still.
* Added a node to convert wavelength (in nanometers, from 380nm to 780nm) to RGB values. This can be useful to match real world colors easier.
* Code cleanup:
** Moved color functions (xyz and hsv) into dedicated utility files.
** Remove svm_lerp(), use interp() instead.
Documentation:
http://wiki.blender.org/index.php/Doc:2.6/Manual/Render/Cycles/Nodes/More#Wavelength
Example render:
http://www.pasteall.org/pic/show.php?id=53202
This is part of my GSoC 2013. (revisions 57322, 57326, 57335 and 57367 from soc-2013-dingto).
instead of sobol. So far one doesn't seem to be consistently better or worse than
the other for the same number of samples but more testing is needed.
The random number generator itself is slower than sobol for most number of samples,
except 16, 64, 256, .. because they can be computed faster. This can probably be
optimized, but we can do that when/if this actually turns out to be useful.
Paper this implementation is based on:
http://graphics.pixar.com/library/MultiJitteredSampling/
Also includes some refactoring of RNG code, fixing a Sobol correlation issue with
the first BSDF and < 16 samples, skipping some unneeded RNG calls and using a
simpler unit square to unit disk function.
* Revert r57203 (len() renaming)
There seems to be a problem with nVidia OpenCL after this and I haven't figured out the real cause yet.
Better to selectively enable native length() later, after figuring out what's wrong.
This fixes [#35612].
* Rename some math functions:
len -> length
len_squared -> length_squared
normalize_len -> normalize_length
* This way OpenCL uses its inbuilt length() function, rather than our own. The other two functions have been renamed for consistency.
* Tested CPU, CUDA and OpenCL compile, should be no functional changes.
the second time, as for example Intel CPU startup time is 9 seconds.
* Adds an cache for contexts and programs for each platform and device pair,
which also ensure now no two threads try to compile and write the binary cache
file at the same time.
* Change clFinish to clFlush so we don't block until the result is done, instead
it will block at the moment we copy back memory.
* Fix error in Cycles time_sleep implementation, does not affect any active code
though.
* Adds some (disabled) debugging code in the task scheduler.
Patch #35559 by Doug Gale.
* Support using devices from all OpenCL platforms, so that you can use e.g. both
Intel and NVidia OpenCL implementations if you have them installed.
* Fix compile error due to missing fmodf after recent math node change.
* Enable advanced shading for Intel OpenCL.
* CYCLES_OPENCL_DEBUG environment variable for generating debug symbols so you
can debug with gdb. This crashes the compiler with Intel OpenCL on Linux though.
To make this work the preprocessed kernel source code is written out, as gdb
needs this.
* Show OpenCL compiler warnings even if the build succeeded.
* Some small fixes to initialize cdDevice to NULL, add missing NULL check when
creating buffer and add missing space at end of build options for Apple OpenCL.
* Fix crash with multi device + opencl, now e.g. CPU + GPU render should work.
I did a few tweaks to the code and also:
* Fix viewport render failing sometimes with Apple CPU OpenCL, was not taking
workgroup size limits into account properly.
* Add compile error when advanced shading in the Blender binary and OpenCL kernel
are not in sync.
for Apple OpenCL on OS X 10.8 and simple AO render.
Also environment variable CYCLES_OPENCL_TEST can now be set to CPU, GPU,
ACCELERATOR, DEFAULT or ALL values to test particuler devices.
well as I would like, but it works, just add a subsurface scattering node and
you can use it like any other BSDF.
It is using fully raytraced sampling compatible with progressive rendering
and other more advanced rendering algorithms we might used in the future, and
it uses no extra memory so it's suitable for complex scenes.
Disadvantage is that it can be quite noisy and slow. Two limitations that will
be solved are that it does not work with bump mapping yet, and that the falloff
function used is a simple cubic function, it's not using the real BSSRDF
falloff function yet.
The node has a color input, along with a scattering radius for each RGB color
channel along with an overall scale factor for the radii.
There is also no GPU support yet, will test if I can get that working later.
Node Documentation:
http://wiki.blender.org/index.php/Doc:2.6/Manual/Render/Cycles/Nodes/Shaders#BSSRDF
Implementation notes:
http://wiki.blender.org/index.php/Dev:2.6/Source/Render/Cycles/Subsurface_Scattering
* Make Cycles aware of sm_35 (Tesla K20, GeForce GTX TITAN).
The CUDA Toolkit 5.0 is needed for that and this is not officially used yet, but people with access to such cards can start testing. (just build sm_35 kernels).
precompiled cubins instead,
Logic here is following now:
- If there're precompiled cubins, assume CUDA compute is available,
otherwise
- If cuda toolkit found, assume CUDA compute is available
- In all other cases CUDA compute is not available
For windows there're still check for only precompiled binaries,
no runtime compilation is allowed.
Ended up with such decision after discussion with Brecht. The thing
is, if we'll support runtime compilation on windows we'll end up
having lots of reports about different aspects of something doesn't
work (you need particular toolkit version, msvc installed, environment
variables set properly and so) and giving feedback on such reports
will waste time.
big lamps and sharp glossy reflections. This was already supported for mesh
lights and the background, so lamps should do it too.
This is not for free and it's a bit slower than I hoped even though there is
no extra BVH ray intersection. I'll try to optimize it more later.
* Area lights look a bit different now, they had the wrong shape before.
* Also fixes a sampling issue in the non-progressive integrator.
* Only enabled for the CPU, will test on the GPU later.
* An option to disable this will be added for situations where it does not help.
Same time comparison before/after:
http://www.pasteall.org/pic/show.php?id=43313http://www.pasteall.org/pic/show.php?id=43314
should be no functional changes yet. UV, tangent and intercept are now stored
as attributes, with the intention to add more like multiple uv's, vertex
colors, generated coordinates and motion vectors later.
Things got a bit messy due to having both triangle and curve data in the same
mesh data structure, which also gives us two sets of attributes. This will get
cleaned up when we split the mesh class.
Patch [#33445] - Experimental Cycles Hair Rendering (CPU only)
This patch allows hair data to be exported to cycles and introduces a new line segment primitive to render with.
The UI appears under the particle tab and there is a new hair info node available.
It is only available under the experimental feature set and for cpu rendering.
a size parameter between 0.0 and 1.0 that gives a angle of reflection between
0° and 90°, and a smooth parameter that gives and angle over which a smooth
transition from full to no reflection happens.
These work with global illumination and do importance sampling of the area within
the angle. Note that unlike most other BSDF's these are not energy conserving in
general, in particular if their weight is 1.0 and size > 2/3 (or 60°) they will
add more energy in each bounce.
Diffuse: http://www.pasteall.org/pic/show.php?id=42119
Specular: http://www.pasteall.org/pic/show.php?id=42120
Also some simple OSL optimization, passing thread data pointer directly instead
of via thread local storage, and creating ustrings for attribute lookup.