blender

Author	SHA1	Message	Date
Sergey Sharybin	700722f686	Cycles: Cleanup, indent nested preprocessor directives Quite straightforward, main trick is happening in path_source_replace_includes(). Reviewers: brecht, dingto, lukasstockner97, juicyfruit Differential Revision: https://developer.blender.org/D1794	2016-03-25 13:55:42 +01:00
Sergey Sharybin	0e47e0cc9e	Cycles: Use dedicated BVH for subsurface ray casting This commit makes it so casting subsurface rays will totally ignore all the BVH nodes and primitives which do not belong to a current object, making it much simpler traversal code and reduces number of intersection tests. Reviewers: brecht, juicyfruit, dingto, lukasstockner97 Differential Revision: https://developer.blender.org/D1823	2016-03-25 13:42:13 +01:00
Sergey Sharybin	3aa74828ab	Cycles: Cleanup, indentation and braces	2016-02-03 15:00:55 +01:00
Sergey Sharybin	8bca34fe32	Cycles: Avoid having ShaderData on the stack This commit introduces a SSS-oriented intersection structure which is replacing old logic of having separate arrays for just intersections and shader data and encapsulates all the data needed for SSS evaluation. This gives a huge stack memory saving on GPU. In own experiments it gave 25% memory usage reduction on GTX560Ti (722MB vs. 946MB). Unfortunately, this gave some performance loss of 20% which only happens on GPU. This is perhaps due to different memory access pattern. Will be solved in the future, hopefully. Famous saying: won in memory - lost in time (which is also valid in other way around).	2015-11-25 13:01:22 +05:00
George Kyriazis	7f4479da42	Cycles: OpenCL kernel split This commit contains all the work related to the AMD megakernel split work which was mainly done by Varun Sundar, George Kyriazis and Lenny Wang, plus some help from Sergey Sharybin, Martijn Berger, Thomas Dinges and likely someone else which we're forgetting to mention. Currently only AMD cards are enabled for the new split kernel, but it is possible to force split opencl kernel to be used by setting the following environment variable: CYCLES_OPENCL_SPLIT_KERNEL_TEST=1. Not all the features are supported yet, and that being said no motion blur, camera blur, SSS and volumetrics for now. Also transparent shadows are disabled on AMD device because of some compiler bug. This kernel also only implements regular path tracing and supporting branched one will take a bit. Branched path tracing is exposed to the interface still, which is a bit misleading and will be hidden there soon. More features will be enabled once they're ported to the split kernel and tested. Neither regular CPU nor CUDA has any difference, they're generating the same exact code, which means no regressions/improvements there. Based on the research paper: https://research.nvidia.com/sites/default/files/publications/laine2013hpg_paper.pdf Here's the documentation: https://docs.google.com/document/d/1LuXW-CV-sVJkQaEGZlMJ86jZ8FmoPfecaMdR-oiWbUY/edit Design discussion of the patch: https://developer.blender.org/T44197 Differential Revision: https://developer.blender.org/D1200	2015-05-09 19:52:40 +05:00
Thomas Dinges	4eab0e72b3	Cleanup: Update some comments and add ToDo.	2015-04-29 23:56:46 +02:00
Thomas Dinges	b3def11f5b	Cycles: Record all possible volume intersections for SSS and camera checks This replaces sequential ray moving followed with scene intersection with single BVH traversal, which gives us all possible intersections. Only implemented for CPU, due to qsort and a bigger memory usage on GPU which we rather avoid. GPU still uses the regular bvh volume intersection code, while CPU now uses the new code. This improves render performance for scenes with: a) Camera inside volume mesh b) SSS mesh intersecting a volume mesh/domain In simple volume files (not much geometry) performance is roughly the same (slightly faster). In files with a lot of geometry, the performance increase is larger. bmps.blend with a volume shader and camera inside the mesh, it renders ~10% faster here. Patch by Sergey and myself. Differential Revision: https://developer.blender.org/D1264	2015-04-29 23:31:06 +02:00
Sergey Sharybin	03f28553ff	Cycles: Implement QBVH tree traversal This commit implements traversal for QBVH tree, which is based on the old loop code for traversal itself and Embree for node intersection. This commit also does some changes to the loop inspired by Embree: - Visibility flags are only checked for primitives. Doing visibility check for every node cost quite reasonable amount of time and in most cases those checks are true-positive. Other idea here would be to do visibility checks for leaf nodes only, but this would need to be investigated further. - For minimum hair width we extend all the nodes' bounding boxes. Again doing curve visibility check is quite costly for each of the nodes and those checks return true for most of the hierarchy anyway. There are a number of possible optimizations still, but current state is good enough in terms it makes rendering faster a little bit after recent watertight commit. Currently QBVH is only implemented for CPU with SSE2 support at least. All other devices would need to be supported later (if that'd make sense from performance point of view). The code is enabled for compilation in kernel, but Blender wouldn't use it still.	2014-12-25 02:50:49 +05:00
Sergey Sharybin	30b12b1b27	Cycles: Code cleanup, de-duplicate definition of FEATURE Previously every BVH traversal file was defining macro to check which features should be compiled in, now this macro is defined in the parent header.	2014-12-25 02:50:49 +05:00
Sergey Sharybin	0476e2c87a	Cycles: Rework BVH functions calls a little bit Basic idea is to allow multiple implementation per feature-set, meaning this commit tries to make it easier to hook new algorithms for BVH traversal.	2014-12-25 02:50:49 +05:00
Thomas Dinges	dde740bcd7	Cycles / CUDA: Change inline rules for BVH intersection functions. * On sm_30 and above there is no change (was not inlined already before), this just fixes a speed regression from yesterday. 6359c36ba407 * On sm_2x (tested with sm_21), I get a nice 8% speedup in the bmw scene with this. As a bonus, cubin compilation time and memory usage is significantly reduced. Regular cubin size went from 2.5MB to 2.0MB, Experimental one from 3.8MB to 2.5MB.	2014-10-05 03:53:51 +02:00
Sergey Sharybin	15969e8a30	Cycles: Fix wrong ifdef check around shadows record all	2014-10-04 16:21:05 +02:00
Thomas Dinges	6359c36ba4	Cycles: Remove a workaround for Titan GPUs, not needed anymore with the latest CUDA compiler.	2014-10-04 01:29:08 +02:00
Thomas Dinges	cdbac018a2	Cycles, some tweaks to scene_intersect_shadow_all() * Function returns a bool, not an uint. * Remove GPU ifdefs, this is CPU only due to malloc / qsort.	2014-10-03 20:41:38 +02:00
Thomas Dinges	dc1ca0c94f	Cycles: Fix OpenCL compile after new Volume BVH introduction and add some comments.	2014-10-03 17:23:45 +02:00
Sergey Sharybin	7dabfb2048	Cycles: Speedup of kernel side camera-in-volume detection The idea is to only count intersections with objects which have a volumetric shader and ignore all other objects. This is probably as fast as we can go without involving some forth level magic.	2014-10-03 12:55:31 +06:00
Thomas Dinges	1b5ec32ed9	Cleanup: Avoid some defines for scene_intersect(), related to Min Width.	2014-09-24 11:32:29 +02:00
Brecht Van Lommel	9ab259f55b	Cycles: shadow function optimization for transparent shadows (CPU only). Old algorithm: Raytrace from one transparent surface to the next step by step. To minimize overhead in cases where we don't need transparent shadows, we first trace a regular shadow ray. We check if the hit primitive was potentially transparent, and only in that case start marching. This gives an extra ray cast for the cases where we do want transparency. New algorithm: We trace a single ray. If it hits any opaque surface, or more than a given number of transparent surfaces is hit, then we consider the geometry to be entirely blocked. If not, all transparent surfaces will be recorded and we will shade them one by one to determine how much light is blocked. This all happens in one scene intersection function. Recording all hits works well in some cases but may be slower in others. If we have many semi-transparent hairs, one intersection may be faster because you'd be reintersecting the same hairs a lot with each step otherwise. If however there is mostly binary transparency then we may be recording many unnecessary intersections when one of the first surfaces blocks all light. We found that this helps quite nicely in some scenes, on koro.blend this can give a 50% reduction in render time, on the pabellon barcelona scene and a forest scene with transparent leaves it was 30%. Some other files rendered maybe 1% or 2% slower, but this seems a reasonable tradeoff. Differential Revision: https://developer.blender.org/D473	2014-04-21 19:34:25 +02:00
Brecht Van Lommel	393216a6df	Cycles code refactor: move more code to geom folder, add some comments.	2014-03-29 13:03:48 +01:00
Brecht Van Lommel	e2184c653e	Cycles: add support for curve deformation motion blur.	2014-03-29 13:03:47 +01:00
Brecht Van Lommel	6020d00990	Cycles: add support for mesh deformation motion blur.	2014-03-29 13:03:47 +01:00
Brecht Van Lommel	41d1675053	Cycles code refactor: move more geometry code into per primitive files.	2014-03-29 13:03:45 +01:00
Brecht Van Lommel	84470a1190	Cycles code refactor: move geometry related kernel files into own directory.	2014-03-29 13:03:45 +01:00

23 Commits