blender

Author	SHA1	Message	Date
Hristo Gueorguiev	6bf4115c13	Cycles: Split kernel - sort shaders Reduce thread divergence in kernel_shader_eval. Rays are sorted in blocks of 2048 according to shader->id. On R9 290 Classroom is ~30% faster, and Pabellon Barcelone is ~8% faster. No sorting for CUDA split kernel. Reviewers: sergey, maiself Reviewed By: maiself Differential Revision: https://developer.blender.org/D2598	2017-05-03 15:30:45 +02:00
Mai Lavelle	915766f42d	Cycles: Branched path tracing for the split kernel This implements branched path tracing for the split kernel. General approach is to store the ray state at a branch point, trace the branched ray as normal, then restore the state as necessary before iterating to the next part of the path. A state machine is used to advance the indirect loop state, which avoids the need to add any new kernels. Each iteration the state machine recreates as much state as possible from the stored ray to keep overall storage down. Its kind of hard to keep all the different integration loops in sync, so this needs lots of testing to make sure everything is working correctly. We should probably start trying to deduplicate the integration loops more now. Nonbranched BMW is ~2% slower, while classroom is ~2% faster, other scenes could use more testing still. Reviewers: sergey, nirved Reviewed By: nirved Subscribers: Blendify, bliblubli Differential Revision: https://developer.blender.org/D2611	2017-05-02 14:26:46 -04:00
Hristo Gueorguiev	9d26e32ea2	Workaround for AMD GPU OpenCL compiler.	2017-04-25 20:08:14 +02:00
Sergey Sharybin	f970e859cf	Cycles: Cleanup, style	2017-04-18 11:39:21 +02:00
Lukas Stockner	ef816f9cff	Cycles: Fix the AO replacement option in the split kernel Currently the code for it was inside the hair-specific part, so it wouldn't be enabled in hairless renders.	2017-04-11 01:07:49 +02:00
Sergey Sharybin	0579eaae1f	Cycles: Make all #include statements relative to cycles source directory The idea is to make include statements more explicit and obvious where the file is coming from, additionally reducing chance of wrong header being picked up. For example, it was not obvious whether bvh.h was refferring to builder or traversal, whenter node.h is a generic graph node or a shader node and cases like that. Surely this might look obvious for the active developers, but after some time of not touching the code it becomes less obvious where file is coming from. This was briefly mentioned in T50824 and seems @brecht is fine with such explicitness, but need to agree with all active developers before committing this. Please note that this patch is lacking changes related on GPU/OpenCL support. This will be solved if/when we all agree this is a good idea to move forward. Reviewers: brecht, lukasstockner97, maiself, nirved, dingto, juicyfruit, swerner Reviewed By: lukasstockner97, maiself, nirved, dingto Subscribers: brecht Differential Revision: https://developer.blender.org/D2586	2017-03-29 13:41:11 +02:00
Hristo Gueorguiev	e07ffcbd1c	Cycles: Add OpenCL support for shadow catcher feature The title says it all actually.	2017-03-27 10:46:59 +02:00
Hristo Gueorguiev	8ada7f7397	Cycles: Remove ccl_addr_space from RNG passed to functions Simplifies code quite a bit, making it shorter and easier to extend. Currently no functional changes for users, but is required for the upcoming work of shadow catcher support with OpenCL.	2017-03-27 10:46:28 +02:00
Sergey Sharybin	d14e39622a	Cycles: First implementation of shadow catcher It uses an idea of accumulating all possible light reachable across the light path (without taking shadow blocked into account) and accumulating total shaded light across the path. Dividing second figure by first one seems to be giving good estimate of the shadow. In fact, to my knowledge, it's something really similar to what is happening in the denoising branch, so we are aligned here which is good. The workflow is following: - Create an object which matches real-life object on which shadow is to be catched. - Create approximate similar material on that object. This is needed to make indirect light properly affecting CG objects in the scene. - Mark object as Shadow Catcher in the Object properties. Ideally, after doing that it will be possible to render the image and simply alpha-over it on top of real footage.	2017-03-27 10:46:03 +02:00
Sergey Sharybin	d6b4fb6429	Cycles: Fix mistake in previous split kernel commits Own stupid mistake. Reported by nirved in IRC, thanks!	2017-03-17 11:55:59 +01:00
Mai Lavelle	60a344b43d	Cycles: Fix handling of barriers	2017-03-17 01:54:04 -04:00
Sergey Sharybin	1cad64900e	Cycles: Define ccl_local variables in kernel functions Declaring ccl_local in a device function is not supported by certain compilers.	2017-03-16 11:27:17 +01:00
Sergey Sharybin	26620f3f87	Cycles: Avoid some ccl_local in various kernels	2017-03-16 11:27:17 +01:00
Sergey Sharybin	76acaefdd7	Cycles: Cleanup, wipe obviously outdated parts of split kernel comments	2017-03-13 17:16:16 +01:00
Sergey Sharybin	aa36c73c33	Cycles: Add missing header in the file	2017-03-13 16:59:09 +01:00
Hristo Gueorguiev	f169ff8b88	Fix T50925: Add AO approximation to split kernel	2017-03-13 11:15:58 +01:00
Mai Lavelle	96868a3941	Fix T50888: Numeric overflow in split kernel state buffer size calculation Overflow led to the state buffer being too small and the split kernel to get stuck doing nothing forever.	2017-03-11 05:39:28 -05:00
Hristo Gueorguiev	06c051363b	Cycles: split kernel_shadow_blocked to AO & DL parts Reduces memory allocation for split kernel. This allows for faster rendering due to bigger global size, specially when GPU memory is limited. Perfromance results: R9 290 total render time Before After Change BMW 4:37 4:34 -1.1 % Classroom 14:43 14:30 -1.5 % Fishy Cat 11:20 11:04 -2.4 % Koro 12:11 12:04 -1.0 % Pabellon Barcelona 22:01 20:44 -5.8 % Pabellon Barcelona() 15:32 15:09 -2.5 % () without glossy connected to volume	2017-03-09 17:09:37 +01:00
Hristo Gueorguiev	57e26627c4	Cycles: SSS and Volume rendering in split kernel Decoupled ray marching is not supported yet. Transparent shadows are always enabled for volume rendering. Changes in kernel/bvh and kernel/geom are from Sergey. This simiplifies code significantly, and prepares it for record-all transparent shadow function in split kernel.	2017-03-09 17:09:37 +01:00
Sergey Sharybin	712f7c3640	Cycles: Make it possible to access KernelGlobals from split data initialization function	2017-03-08 11:02:54 +01:00
Mai Lavelle	64751552f7	Cycles: Fix indentation	2017-03-08 01:31:32 -05:00
Mai Lavelle	fe7cc94dfa	Cycles: Fix strict warning about unused variable	2017-03-08 01:31:32 -05:00
Mai Lavelle	306034790f	Cycles: Calculate size of split state buffer kernel side By calculating the size of the state buffer in the kernel rather than the host less code is needed and the size actually reflects the requested features. Will also be a little faster in some cases because of larger global work size.	2017-03-08 01:31:30 -05:00
Mai Lavelle	223f45818e	Cycles: Initialize rng_state for split kernel Because the split kernel can render multiple samples in parallel it is necessary to have everything initialized before rendering of any samples begins. The code that normally handles initialization of `rng_state` (`kernel_path_trace_setup()`) only does so for the first sample, which was causing artifacts in the split kernel due to uninitialized `rng_state` for some samples. Note that because the split kernel can render samples in parallel this means that the split kernel is incompatible with the LCG.	2017-03-08 01:31:09 -05:00
Mai Lavelle	cd7d5669d1	Cycles: Remove sum_all_radiance kernel This was only needed for the previous implementation of parallel samples. As we don't have that any more it can be removed. Real reason for removal tho is this: `per_sample_output_buffers` was being calculated too small and artifacts resulted. The tile buffer is already the correct size and calculating the size for `per_sample_output_buffers` is a bit difficult with the current layout of the code. As `per_sample_output_buffers` was only needed for `sum_all_radiance`, removing that kernel and writing output to the tile buffer directly fixes the artifacts.	2017-03-08 01:31:07 -05:00
Mai Lavelle	4cf501b835	Cycles: Split path initialization into own kernel This makes it easier to initialize things correctly in the data_init kernel before they are needed by path tracing.	2017-03-08 01:30:43 -05:00
Mai Lavelle	817873cc83	Cycles: CUDA implementation of split kernel	2017-03-08 01:24:53 -05:00
Mai Lavelle	0892352bfe	Cycles: CPU implementation of split kernel	2017-03-08 00:52:41 -05:00
Mai Lavelle	352ee7c3ef	Cycles: Remove ccl_fetch and SOA	2017-03-08 00:52:41 -05:00
Mai Lavelle	230c00d872	Cycles: OpenCL split kernel refactor This does a few things at once: - Refactors host side split kernel logic into a new device agnostic class `DeviceSplitKernel`. - Removes tile splitting, a new work pool implementation takes its place and allows as many threads as will fit in memory regardless of tile size, which can give performance gains. - Refactors split state buffers into one buffer, as well as reduces the number of arguments passed to kernels. Means there's less code to deal with overall. - Moves kernel logic out of OpenCL kernel files so they can later be used by other device types. - Replaced OpenCL specific APIs with new generic versions - Tiles can now be seen updating during rendering	2017-03-08 00:52:41 -05:00
Sergey Sharybin	bc096e1eb8	Cycles: Split ShaderData object and shader flags We started to run out of bits there, so now we separate flags which came from __object_flags and which are either runtime or coming from __shader_flags. Rule now is: SD_OBJECT_* flags are to be tested against new object_flags field of ShaderData, all the rest flags are to be tested against flags field of ShaderData. There should be no user-visible changes, and time difference should be minimal. In fact, from tests here can only see hardly measurable difference and sometimes the new code is somewhat faster (all within a noise floor, so hard to tell for sure). Reviewers: brecht, dingto, juicyfruit, lukasstockner97, maiself Differential Revision: https://developer.blender.org/D2428	2017-01-23 12:56:55 +01:00
Sergey Sharybin	b9311b5e5a	Cycles: Make object flag names more obvious that hey are object and not shader	2017-01-23 12:14:17 +01:00
Sergey Sharybin	53fa389802	Cycles: Use dedicated debug passes for traversed nodes and intersection tests This way it's more clear whether some issue is caused by lots of geometry in the node or by lots of "transparent" BVH nodes.	2017-01-12 13:44:35 +01:00
Lukas Stockner	26bf230920	Cycles: Add optional probabilistic termination of light samples based on their expected contribution In scenes with many lights, some of them might have a very small contribution to some pixels, but the shadow rays are traced anyways. To avoid that, this patch adds probabilistic termination to light samples - if the contribution before checking for shadowing is below a user-defined threshold, the sample will be discarded with probability (1 - (contribution / threshold)) and otherwise kept, but weighted more to remain unbiased. This is the same approach that's also used in path termination based on length. Note that the rendering remains unbiased with this option, it just adds a bit of noise - but if the setting is used moderately, the speedup gained easily outweighs the additional noise. Reviewers: #cycles Subscribers: sergey, brecht Differential Revision: https://developer.blender.org/D2217	2016-10-30 11:31:28 +01:00
Hristo Gueorguiev	8905c5c874	Cycles: OpenCL 3d textures support. Note that volume rendering is not supported yet, this is a step towards that. Reviewed By: brecht Differential Revision: https://developer.blender.org/D2299	2016-10-22 23:49:29 +02:00
Lukas Stockner	2dccf5a6e8	Cycles: Fix OpenCL split kernel compilation after recent CUDA 8 performance fix	2016-10-07 18:50:43 +02:00
Sergey Sharybin	100b2ad775	Cycles: Cleanup code style in split kernel	2016-09-19 16:05:12 +02:00
Lukas Stockner	b459d9f46c	Cycles: Stop lamp sampling if the lamp isn't visible Both spot and area light have large areas where they're not visible. Therefore, this patch stops the light sampling code when one of these cases (outside of the spotlight cone or behind the area light) occurs, before the lamp shader is evaluated. In the case of the area light, the solid angle sampling can also be skipped. In a test scene with Sample All Lights and 18 Area lamps and 9 Spot lamps that all point away from the area that the camera sees, render time drops from 12sec to 5sec. Reviewers: brecht, sergey, dingto, juicyfruit Differential Revision: https://developer.blender.org/D2216	2016-09-14 19:45:12 +02:00
Sergey Sharybin	4355603790	Cycles: Move BVK kernel files to own directory BVH traversal is not really that much a geometry and we've got quite some traversals now. Makes sense to keep them separate in the name of source structure clarity.	2016-07-11 13:58:47 +02:00
Lukas Stockner	23c276832b	Cycles: Add multi-scattering, energy-conserving GGX as an option to the Glossy, Anisotropic and Glass BSDFs This commit adds a new distribution to the Glossy, Anisotropic and Glass BSDFs that implements the multiple-scattering microfacet model described in the paper "Multiple-Scattering Microfacet BSDFs with the Smith Model". Essentially, the improvement is that unlike classical GGX, which only models single scattering and assumes the contribution of multiple bounces to be zero, this new model performs a random walk on the microsurface until the ray leaves it again, which ensures perfect energy conservation. In practise, this means that the "darkening problem" - GGX materials becoming darker with increasing roughness - is solved in a physically correct and efficient way. The downside of this model is that it has no (known) analytic expression for evalation. However, it can be evaluated stochastically, and although the correct PDF isn't known either, the properties of MIS and the balance heuristic guarantee an unbiased result at the cost of slightly higher noise. Reviewers: dingto, #cycles, brecht Reviewed By: dingto, #cycles, brecht Subscribers: bliblubli, ace_dragon, gregzaal, brecht, harvester, dingto, marcog, swerner, jtheninja, Blendify, nutel Differential Revision: https://developer.blender.org/D2002	2016-06-23 22:57:26 +02:00
Brecht Van Lommel	999d5a6785	Cycles CUDA: reduce stack memory by reusing ShaderData. 57% less for path and 48% less for branched path.	2016-05-23 22:29:24 +02:00
Sergey Sharybin	3aa74828ab	Cycles: Cleanup, indentation and braces	2016-02-03 15:00:55 +01:00
Sergey Sharybin	9815f8a623	Cycles: Cleanup of OpenCL split kernel routines The idea is to switch from allocating separate buffers for shader data's structure of arrays to allocating one huge memory block and do some index trickery to make it accessed as SOA. This saves quite reasonable amount of lines of code in device_opencl and also makes it possible to get rid of special declaration of ShaderData structure. As a side effect it also makes it easier to experiment with SOA vs. AOS for split kernel. Works fine here on NVidia GTX580, Intel CPU amd AMD Fiji cards. Reviewers: #cycles, brecht, juicyfruit, dingto Differential Revision: https://developer.blender.org/D1593	2016-01-30 00:23:06 +01:00
Sergey Sharybin	e7915ea6eb	Cycles: Remove code which was commented out for ages now It was mainly unfinished code for volume in a split kernel which should be done differently anyway to avoid such a code copy-paste. The code didn't really work, so likely nobody will cry.	2016-01-29 18:59:37 +01:00
Sergey Sharybin	25aea19323	Cycles: Remove some unused variables from split kernel function	2016-01-29 18:54:46 +01:00
Sergey Sharybin	e2161ca854	Cycles: Remove few function arguments needed only for the split kernel Use KernelGlobals to access all the global arrays for the intermediate storage instead of passing all this storage things explicitly. Tested here with Intel OpenCL, NVIDIA GTX580 and AMD Fiji, didn't see any artifacts, so guess it's all good. Reviewers: juicyfruit, dingto, lukasstockner97 Differential Revision: https://developer.blender.org/D1736	2016-01-28 18:59:27 +01:00
Thomas Dinges	83e73a2100	Cycles: Refactor how we pass bounce info to light path node. This commit changes the way how we pass bounce information to the Light Path node. Instead of manualy copying the bounces into ShaderData, we now directly pass PathState. This reduces the arguments that we need to pass around and also makes it easier to extend the feature. This commit also exposes the Transmission Bounce Depth to the Light Path node. It works similar to the Transparent Depth Output: Replace a Transmission lightpath after X bounces with another shader, e.g a Diffuse one. This can be used to avoid black surfaces, due to low amount of max bounces. Reviewed by Sergey and Brecht, thanks for some hlp with this. I tested compilation and usage on CPU (SVM and OSL), CUDA, OpenCL Split and Mega kernel. Hopefully this covers all devices. :)	2016-01-06 23:43:29 +01:00
Sergey Sharybin	9bce104c8c	Cycles: Partially revert previous commit Apparently removing kernel arguments broke NVidia OpenCL. Needs more investigation, for the time being revering changes which caused problem.	2015-11-01 21:01:12 +05:00
Sergey Sharybin	dc9e0b819b	Cycles: Remove unused argument from the split kernel functions Should be no functional changes, just simplifies operation with kernels.	2015-11-01 17:22:42 +05:00
Sergey Sharybin	537f41250f	Cycles: Fix typo in split kernel Shadow blocked kernel was using wrong array for storing intersection.	2015-10-29 21:52:56 +05:00

1 2

61 Commits