blender

Lukas Stockner fa3d50af95 Cycles: Improve denoising speed on GPUs with small tile sizes		2017-11-30 07:37:08 +01:00

Previously, the NLM kernels would be launched once per offset with one thread per pixel.
However, with the smaller tile sizes that are now feasible, there wasn't enough work to fully occupy GPUs which results in a significant slowdown.

Therefore, the kernels are now launched in a single call that handles all offsets at once.
This has two downsides: Memory accesses to accumulating buffers are now atomic, and more importantly, the temporary memory now has to be allocated for every shift at once, increasing the required memory.
On the other hand, of course, the smaller tiles significantly reduce the size of the memory.

The main bottleneck right now is the construction of the transformation - there is nothing to be parallelized there, one thread per pixel is the maximum.
I tried to parallelize the SVD implementation by storing the matrix in shared memory and launching one block per pixel, but that wasn't really going anywhere.

To make the new code somewhat readable, the handling of rectangular regions was cleaned up a bit and commented, it should be easier to understand what's going on now.
Also, some variables have been renamed to make the difference between buffer width and stride more apparent, in addition to some general style cleanup.
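
The single-launch scheme described in the commit message can be pictured with a small CPU-side sketch. This is not the Cycles kernel code: the names `Launch`, `WorkItem`, `decode` and `count_neighbors` are hypothetical, the loop stands in for GPU threads, and the accumulated value is a plain neighbor count rather than an NLM weight. It only shows how one flat index can cover every (offset, pixel) pair, and why writes to the per-pixel accumulation buffer then have to be atomic.

```cpp
#include <atomic>
#include <cassert>
#include <vector>

// Hypothetical launch description: a tile and a search radius r,
// so offsets (dx, dy) range over [-r, r] x [-r, r].
struct Launch {
    int width;
    int height;
    int radius;
};

// One unit of work: a single offset applied to a single pixel.
struct WorkItem {
    int dx, dy;  // offset (shift)
    int x, y;    // pixel inside the tile
};

inline int num_shifts(const Launch& l)
{
    const int side = 2 * l.radius + 1;
    return side * side;
}

// With one launch for all offsets there is one work item per (shift, pixel)
// pair. In this sketch that is also the number of per-shift temporary
// entries that would have to exist at the same time.
inline int total_work_items(const Launch& l)
{
    return num_shifts(l) * l.width * l.height;
}

// Map a flat index (what a GPU thread would get) to its offset and pixel.
inline WorkItem decode(const Launch& l, int index)
{
    assert(index >= 0 && index < total_work_items(l));
    const int pixels = l.width * l.height;
    const int side = 2 * l.radius + 1;
    const int shift = index / pixels;
    const int pixel = index % pixels;
    WorkItem w;
    w.dx = shift % side - l.radius;
    w.dy = shift / side - l.radius;
    w.x = pixel % l.width;
    w.y = pixel / l.width;
    return w;
}

// Every work item adds into the buffer entry of its own pixel. Different
// shifts target the same pixel, so the addition is atomic. The loop runs
// serially here; on a GPU each iteration would be a separate thread.
inline std::vector<int> count_neighbors(const Launch& l)
{
    const int pixels = l.width * l.height;
    std::vector<std::atomic<int>> accum(pixels);
    for (auto& a : accum) {
        a.store(0);
    }
    for (int i = 0; i < total_work_items(l); i++) {
        const WorkItem w = decode(l, i);
        const int nx = w.x + w.dx;
        const int ny = w.y + w.dy;
        if (nx < 0 || ny < 0 || nx >= l.width || ny >= l.height) {
            continue;  // shifted pixel falls outside the tile
        }
        accum[w.y * l.width + w.x].fetch_add(1, std::memory_order_relaxed);
    }
    std::vector<int> result(pixels);
    for (int p = 0; p < pixels; p++) {
        result[p] = accum[p].load();
    }
    return result;
}
```

For a 4x3 tile with radius 1, this sketch produces 9 * 12 = 108 work items in one pass instead of 9 separate passes of 12 items each, which is the occupancy argument the commit message makes for small tiles.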
..
bvh	Code refactor: rename subsurface to local traversal, for reuse.	2017-11-07 22:35:12 +01:00
closure	Cycles: Fix wrong behavior of sharpness in Cubic SSS	2017-11-20 11:40:55 +01:00
filter	Cycles: Improve denoising speed on GPUs with small tile sizes	2017-11-30 07:37:08 +01:00
geom	Cycles: Make per-object random value output also work for Lamps	2017-11-14 04:17:54 +01:00
kernels	Cycles: Improve denoising speed on GPUs with small tile sizes	2017-11-30 07:37:08 +01:00
osl	Fix build with OSL 1.9.x, automatically aligns to 16 bytes now.	2017-11-20 23:24:24 +01:00
shaders	Cycles: Fix OSL brick node after recent fix	2017-11-21 04:30:12 -05:00
split	Cycles: Fix crash with split branched path tracing	2017-11-16 04:59:31 -05:00
svm	Fix T53348: Cycles difference between gradient texture on CPU and GPU.	2017-11-23 17:14:04 +01:00
CMakeLists.txt	Cycles: Improve denoising speed on GPUs with small tile sizes	2017-11-30 07:37:08 +01:00
kernel_accumulate.h	Cycles: Add Volume Direct and Volume Indirect passes for volume-scattered light	2017-11-17 16:39:45 +01:00
kernel_bake.h	Cycles: Replace __MAX_CLOSURE__ build option with runtime integrator variable	2017-11-09 01:04:06 -05:00
kernel_camera.h	Cycles: Remove ccl_fetch and SOA	2017-03-08 00:52:41 -05:00
kernel_compat_cpu.h	Code refactor: make texture code more consistent between devices.	2017-10-07 14:53:14 +02:00
kernel_compat_cuda.h	Code refactor: make texture code more consistent between devices.	2017-10-07 14:53:14 +02:00
kernel_compat_opencl.h	Code refactor: make texture code more consistent between devices.	2017-10-07 14:53:14 +02:00
kernel_differential.h	Cycles: OpenCL kernel split	2015-05-09 19:52:40 +05:00
kernel_emission.h	Cycles: reduce closure memory usage for emission/shadow shader data.	2017-11-05 20:48:33 +01:00
kernel_film.h	Cycles: Use native saturate function for CUDA	2015-04-28 00:38:32 +05:00
kernel_globals.h	Code refactor: make texture code more consistent between devices.	2017-10-07 14:53:14 +02:00
kernel_jitter.h	Cycles: Use more stable version of integer square root function	2017-05-09 17:07:17 +02:00
kernel_light.h	Fix incorrect MIS weights in Cycles with multiple lights.	2017-11-07 22:35:12 +01:00
kernel_math.h	Cycles: Make all #include statements relative to cycles source directory	2017-03-29 13:41:11 +02:00
kernel_montecarlo.h	Cycles: Cleanup, indentation	2017-10-06 19:33:59 +05:00
kernel_passes.h	Cycles: Add Volume Direct and Volume Indirect passes for volume-scattered light	2017-11-17 16:39:45 +01:00
kernel_path_branched.h	Cycles: Replace __MAX_CLOSURE__ build option with runtime integrator variable	2017-11-09 01:04:06 -05:00
kernel_path_common.h	Code refactor: remove rng_state buffer and compute hash on the fly.	2017-10-04 21:11:14 +02:00
kernel_path_state.h	Code cleanup: remove hack to avoid seeing transparent objects in noise.	2017-09-20 19:38:08 +02:00
kernel_path_subsurface.h	Code refactor: rename subsurface to local traversal, for reuse.	2017-11-07 22:35:12 +01:00
kernel_path_surface.h	Cycles: reduce subsurface stack memory usage.	2017-09-28 15:18:43 +02:00
kernel_path_volume.h	Cycles: reduce subsurface stack memory usage.	2017-09-28 15:18:43 +02:00
kernel_path.h	Fix T53349: AO bounces not working correct with OpenCL.	2017-11-26 15:53:00 +01:00
kernel_projection.h	Cycles: Implement denoising option for reducing noise in the rendered image	2017-05-07 14:40:58 +02:00
kernel_queues.h	Cycles: Add function to dequeue a ray	2017-06-10 03:51:18 -04:00
kernel_random.h	Cycles: restore SOBOL_SKIP hack, for some cases where it helps still.	2017-10-29 16:44:20 +01:00
kernel_shader.h	Cycles: Make per-object random value output also work for Lamps	2017-11-14 04:17:54 +01:00
kernel_shadow.h	Cycles: reduce closure memory usage for emission/shadow shader data.	2017-11-05 20:48:33 +01:00
kernel_subsurface.h	Cycles: Replace __MAX_CLOSURE__ build option with runtime integrator variable	2017-11-09 01:04:06 -05:00
kernel_textures.h	Code refactor: make texture code more consistent between devices.	2017-10-07 14:53:14 +02:00
kernel_types.h	Cycles: Add per-tile render time debug pass	2017-11-17 16:40:24 +01:00
kernel_volume.h	Cycles: better distance sampling for chromatic volume extinction.	2017-11-10 01:37:10 +01:00
kernel_work_stealing.h	Code refactor: add WorkTile struct for passing work to kernel.	2017-10-04 21:11:14 +02:00
kernel.h	Code refactor: device memory cleanups, preparing for mapped host memory.	2017-11-05 15:22:04 +01:00