blender/intern/cycles/kernel
Sergey Sharybin 17e7454263 Cycles: Reduce memory usage by de-duplicating triangle storage
There are several internal changes for this:

The first idea is to make __tri_verts behave similarly to __tri_storage:
the __tri_verts array now contains all vertices of all triangles
instead of just the mesh vertices. This saves a lookup when reading
triangle coordinates in functions like triangle_normal().

To make this efficient, the global triangle offset needs to be stored
somewhere. So now __tri_vindex.w contains a global triangle index which
can be used to read the triangle's vertices.
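
The layout described above can be sketched roughly as follows. This is
a standalone illustration, not the actual Cycles code: the types
Float3, Tri, TriVIndex and PackedTriangles and the function names are
hypothetical, and only the idea (three packed vertices per triangle,
with the offset kept in the .w component) is taken from the commit
message.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct Float3 { float x, y, z; };

// One triangle expressed as indices into the mesh vertex array.
struct Tri { uint32_t v0, v1, v2; };

// Hypothetical stand-in for one __tri_vindex entry: x, y, z index the
// mesh vertices, w is the offset of the triangle's first vertex in the
// packed array.
struct TriVIndex { uint32_t x, y, z, w; };

struct PackedTriangles {
    std::vector<Float3> tri_verts;      // 3 entries per triangle
    std::vector<TriVIndex> tri_vindex;  // 1 entry per triangle
};

// Expand indexed triangles into a flat per-triangle vertex array.
PackedTriangles pack_triangles(const std::vector<Float3>& mesh_verts,
                               const std::vector<Tri>& tris)
{
    PackedTriangles out;
    out.tri_verts.reserve(tris.size() * 3);
    out.tri_vindex.reserve(tris.size());
    for (const Tri& t : tris) {
        assert(t.v0 < mesh_verts.size());
        assert(t.v1 < mesh_verts.size());
        assert(t.v2 < mesh_verts.size());
        const uint32_t offset = static_cast<uint32_t>(out.tri_verts.size());
        out.tri_verts.push_back(mesh_verts[t.v0]);
        out.tri_verts.push_back(mesh_verts[t.v1]);
        out.tri_verts.push_back(mesh_verts[t.v2]);
        out.tri_vindex.push_back({t.v0, t.v1, t.v2, offset});
    }
    return out;
}

// Unnormalized geometric normal. The three vertices are read directly
// at w+0, w+1, w+2, without going through the mesh vertex indices.
Float3 triangle_normal_unnormalized(const PackedTriangles& p, std::size_t prim)
{
    const uint32_t base = p.tri_vindex[prim].w;
    const Float3& a = p.tri_verts[base + 0];
    const Float3& b = p.tri_verts[base + 1];
    const Float3& c = p.tri_verts[base + 2];
    const Float3 e1{b.x - a.x, b.y - a.y, b.z - a.z};
    const Float3 e2{c.x - a.x, c.y - a.y, c.z - a.z};
    return {e1.y * e2.z - e1.z * e2.y,
            e1.z * e2.x - e1.x * e2.z,
            e1.x * e2.y - e1.y * e2.x};
}
```

The trade-off in this sketch is that shared vertices are stored once
per triangle, in exchange for one fewer indirection on every read.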

Additionally, the order of vertices in that array is aligned with the
primitives from the BVH. This is needed to keep the cache as coherent
as possible during BVH traversal. It requires some extra tricks to
fill in the array and to deal with True Displacement, but that trickery
is required to prevent a noticeable slowdown.

The next idea was to use this __tri_verts instead of __tri_storage in
the intersection code. Unfortunately, this is quite tricky to do
without a noticeable speed loss. The loss is mainly caused by the extra
lookup needed to access the vertex coordinates.

Fortunately, tricks here and there (i.e. some type changes to avoid
casts which do not really come for free) reduce those losses to an
acceptable level. They are now within a couple of percent.

On the positive side, we've achieved:

- A few percent of memory saved on triangle-only scenes. The actual
  saving in this case is close to the size of all vertices.

  On more finely subdivided scenes this benefit might become more
  obvious.

- A huge memory saving on hairy scenes. For example, koro.blend uses
  about 20% less memory. The figure for bunny.blend is similar.

This memory saving was the main goal of this commit: it lets us move
forward with the Hair BVH, which requires more memory per BVH node. So
while this sounds exciting, the memory optimization will be cancelled
out by the upcoming Hair BVH work.

But again on the positive side, we can add an option to NOT use the
Hair BVH, and then we'll have roughly the same render times as we have
currently but keep this 20% memory benefit on hairy scenes.
2016-07-07 17:25:48 +02:00
closure Fix T48732: New GGX breaks OpenCL kernel 2016-06-28 17:15:35 +05:00
geom Cycles: Reduce memory usage by de-duplicating triangle storage 2016-07-07 17:25:48 +02:00
kernels Cycles: reduce CUDA stack memory access for Maxwell and up, increasing max registers. 2016-06-19 20:17:26 +02:00
osl Fix T48783: OSL render errors after recent refactoring. 2016-07-03 13:08:21 +02:00
shaders Fix T48783: OSL render errors after recent refactoring. 2016-07-03 13:08:21 +02:00
split Cycles: Add multi-scattering, energy-conserving GGX as an option to the Glossy, Anisotropic and Glass BSDFs 2016-06-23 22:57:26 +02:00
svm Fix Cycles OpenCL not taking Extend and Clip extension types into account. 2016-07-01 23:48:31 +02:00
CMakeLists.txt Cycles: Add multi-scattering, energy-conserving GGX as an option to the Glossy, Anisotropic and Glass BSDFs 2016-06-23 22:57:26 +02:00
kernel_accumulate.h Fix T47461: Different results on CPU and GPU when using Branched Path Tracing 2016-02-18 01:23:38 +01:00
kernel_bake.h Cycles: Add multi-scattering, energy-conserving GGX as an option to the Glossy, Anisotropic and Glass BSDFs 2016-06-23 22:57:26 +02:00
kernel_camera.h Cycles: Cleanup, indent nested preprocessor directives 2016-03-25 13:55:42 +01:00
kernel_compat_cpu.h Cycles: Support half and half4 textures. 2016-06-19 17:31:16 +02:00
kernel_compat_cuda.h Cycles: Add support for bindless textures. 2016-05-19 13:14:37 +02:00
kernel_compat_opencl.h Cycles: Cleanup, indent nested preprocessor directives 2016-03-25 13:55:42 +01:00
kernel_debug.h Cycles: Add debug pass which shows number of instance pushes during camera ray intersection 2015-06-12 00:12:03 +02:00
kernel_differential.h Cycles: OpenCL kernel split 2015-05-09 19:52:40 +05:00
kernel_emission.h Cycles: Add multi-scattering, energy-conserving GGX as an option to the Glossy, Anisotropic and Glass BSDFs 2016-06-23 22:57:26 +02:00
kernel_film.h Cycles: Use native saturate function for CUDA 2015-04-28 00:38:32 +05:00
kernel_globals.h Cycles: Support half and half4 textures. 2016-06-19 17:31:16 +02:00
kernel_jitter.h Fix T48301: Cycles incorrect render with CMJ and viewport samples 0. 2016-04-28 23:57:20 +02:00
kernel_light.h Cleanup: comment blocks 2016-07-02 10:08:33 +10:00
kernel_math.h Cleanup: Move texture definitions to util, to avoid bad level include. 2016-04-15 23:02:44 +02:00
kernel_montecarlo.h Cycles code refactor: minor refactoring and comments for volume code. 2014-03-29 13:03:49 +01:00
kernel_passes.h Cycles: OpenCL kernel split 2015-05-09 19:52:40 +05:00
kernel_path_branched.h Cycles: Add multi-scattering, energy-conserving GGX as an option to the Glossy, Anisotropic and Glass BSDFs 2016-06-23 22:57:26 +02:00
kernel_path_common.h Cycles: OpenCL kernel split 2015-05-09 19:52:40 +05:00
kernel_path_state.h Cycles CUDA: reduce stack memory by reusing ShaderData. 2016-05-23 22:29:24 +02:00
kernel_path_surface.h Cycles CUDA: reduce stack memory by reusing ShaderData. 2016-05-23 22:29:24 +02:00
kernel_path_volume.h Cycles CUDA: reduce stack memory by reusing ShaderData. 2016-05-23 22:29:24 +02:00
kernel_path.h Cycles: Add multi-scattering, energy-conserving GGX as an option to the Glossy, Anisotropic and Glass BSDFs 2016-06-23 22:57:26 +02:00
kernel_projection.h Cycles: Pole merging for spherical stereo 2016-05-18 10:56:57 +02:00
kernel_queues.h Cycles: Code cleanup in split kernel, whitespaces 2015-07-03 11:03:56 +02:00
kernel_random.h Fix T48732: New GGX breaks OpenCL kernel 2016-06-28 17:15:35 +05:00
kernel_shader.h Fix T48732: New GGX breaks OpenCL kernel 2016-06-28 17:15:35 +05:00
kernel_shadow.h Cycles: Add multi-scattering, energy-conserving GGX as an option to the Glossy, Anisotropic and Glass BSDFs 2016-06-23 22:57:26 +02:00
kernel_subsurface.h Cycles: Add multi-scattering, energy-conserving GGX as an option to the Glossy, Anisotropic and Glass BSDFs 2016-06-23 22:57:26 +02:00
kernel_textures.h Cycles: Reduce memory usage by de-duplicating triangle storage 2016-07-07 17:25:48 +02:00
kernel_types.h Cleanup: comment blocks 2016-07-02 10:08:33 +10:00
kernel_volume.h Cycles: Fix two numerical issues in the volume code 2016-06-08 03:17:19 +02:00
kernel_work_stealing.h Cycles: Cleanup, indent nested preprocessor directives 2016-03-25 13:55:42 +01:00
kernel.h Cycles: Deduplicte CPU kernel declaration and definition code 2015-12-30 17:54:02 +05:00