7139 Commits

Author SHA1 Message Date
d0892a6648 Fix issue with moving CUDA memory to host and multiple devices.
This is not expected to fix all issues. Also adds some more details
to error reporting to investigate failures.
2018-01-11 00:00:48 +01:00
0f4b46cee6 Fix T53692: OpenCL multi GPU rendering not using all GPUs.
Ensure each OpenCL device has a unique ID even if the hardware ID is not
unique for some reason.
2018-01-11 00:00:48 +01:00
c621832d3d Cycles: CUDA support for rendering scenes that don't fit on GPU.
In that case it can now fall back to CPU memory, at the cost of reduced
performance. For scenes that fit in GPU memory, this commit should not
cause any noticeable slowdowns.

We don't use all physical system RAM, since that can cause OS instability.
We leave at least half of system RAM or 4GB to other software, whichever
is smaller.

For image textures in host memory, performance was maybe 20-30% slower
in our tests (although this is highly hardware and scene dependent). Once
other types of data don't fit on the GPU, performance can be e.g. 10x
slower, and at that point it's probably better to just render on the CPU.

Differential Revision: https://developer.blender.org/D2056
2018-01-02 23:50:18 +01:00
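
The host memory budget described in c621832d3d above can be illustrated with a minimal sketch. The function name `host_map_limit` is hypothetical and not taken from the Cycles source; it only encodes the stated rule of leaving half of system RAM or 4GB to other software, whichever is smaller.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical helper (not the actual Cycles code): returns how many bytes
// of system RAM the renderer would allow itself to use as host-mapped memory.
std::uint64_t host_map_limit(std::uint64_t system_ram_bytes)
{
    const std::uint64_t four_gb = 4ULL * 1024 * 1024 * 1024;

    // Reserve half of system RAM or 4GB for other software, whichever is smaller.
    const std::uint64_t reserved = std::min(system_ram_bytes / 2, four_gb);

    return system_ram_bytes - reserved;
}
```

Under this rule a machine with 16GB of RAM would reserve 4GB and leave 12GB usable, while a machine with 4GB would reserve 2GB.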
6699454fb6 Cycles: make CUDA code a bit more robust to host/device alloc failures.
Fixes a few corner cases found while stress testing host mapped memory.
2018-01-02 23:46:19 +01:00
7a6967cbe6 Fix mistake in previous fix for T53600, shows we really need a smarter solution. 2017-12-29 00:07:49 +01:00
948515c21a Fix T53600: Cycles shader mixing issue with principled BSDF and zero weights.
SVM nodes need to read all data to get the right offset for the following node.
This is quite weak, a more generic solution would be good in the future.
2017-12-25 23:59:20 +01:00
e8e92dffed Fix T53607: Cycles normal map baking problem when there is no bump. 2017-12-25 23:05:45 +01:00
Lukas Stockner
bf1dc39679 Fix T53567: Negative pixel values causing artifacts with denoising
Now negative color values are clamped to zero before the actual denoising.
2017-12-21 14:24:23 +01:00
Sergey Sharybin
5650fe77e4 Cycles: Cleanup, indentation 2017-12-20 17:42:50 +01:00
Sergey Sharybin
ab1af38c74 Cycles: Fix crash opening user preferences after adding extra GPU
We can not store pointers to elements of collection property in the
case we modify that collection. This is like storing pointers to
elements of array before calling realloc().
2017-12-19 15:51:28 +01:00
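
The pitfall described in ab1af38c74 above has a direct C++ analogue: a pointer into a `std::vector` may dangle once the vector grows. The sketch below uses a hypothetical `DeviceEntry` type and function name (neither is from Blender) and shows one common way around it, which is to remember an index and resolve it again after the collection has been modified.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for an element of a collection property.
struct DeviceEntry {
    std::string name;
};

// Risky pattern (not used here):
//   DeviceEntry *first = &devices[0];
//   devices.push_back(...);   // may reallocate, leaving 'first' dangling
//
// Safer pattern: keep an index and look the element up again afterwards.
std::string first_name_after_add(std::vector<DeviceEntry> &devices,
                                 const std::string &new_name)
{
    assert(!devices.empty());
    const std::size_t first = 0;

    // Growing the vector may move all elements to new storage.
    devices.push_back(DeviceEntry{new_name});

    // The index is still valid even if the storage moved.
    return devices[first].name;
}
```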
Sergey Sharybin
4895bd6ace Libmv: Add C-API function to set all markers within AutoTrack structure 2017-12-15 12:51:17 +01:00
Sergey Sharybin
2e8914549b Cycles: Fix difference in image Clip extension method between CPU and GPU
Our own implementation behaved differently from OSL and the GPU:
on border pixels, OSL and CUDA interpolated with black, whereas we
clamped the coordinate.

This partially fixes issue reported in T53452.

Similar change should also be done for 3D interpolation perhaps, but this
is to be investigated separately.
2017-12-08 12:03:11 +01:00
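
The difference described in 2e8914549b above can be shown with a simplified 1D sketch. This is not the Cycles kernel code: the function names are hypothetical, the texture is a single row of floats, and texel centers are assumed to lie at (i + 0.5) / width.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// Out-of-range texels read as black (the OSL/CUDA behavior described above).
float fetch_black(const std::vector<float> &row, int x)
{
    if (x < 0 || x >= static_cast<int>(row.size())) {
        return 0.0f;
    }
    return row[x];
}

// Out-of-range coordinates are clamped to the nearest valid texel.
float fetch_clamped(const std::vector<float> &row, int x)
{
    const int last = static_cast<int>(row.size()) - 1;
    return row[std::min(std::max(x, 0), last)];
}

// Linear interpolation at u in [0, 1].
float sample_linear(const std::vector<float> &row, float u, bool blend_with_black)
{
    assert(!row.empty());
    const float x = u * static_cast<float>(row.size()) - 0.5f;
    const int ix = static_cast<int>(std::floor(x));
    const float t = x - static_cast<float>(ix);

    const float a = blend_with_black ? fetch_black(row, ix) : fetch_clamped(row, ix);
    const float b = blend_with_black ? fetch_black(row, ix + 1) : fetch_clamped(row, ix + 1);
    return (1.0f - t) * a + t * b;
}
```

For a uniformly white row sampled at u = 0, blending with black yields a value halfway between black and white, while clamping keeps it fully white. Away from the border both variants agree.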
Sergey Sharybin
f31fb4a014 Cycles: Cleanup, split 2D interpolation function 2017-12-08 11:22:04 +01:00
Lukas Stockner
2069102c56 Cycles: Fix constness for load_kernels in device_cpu.cpp 2017-12-06 00:00:18 +01:00
d64d8b5be5 Fix Cycles standalone crash when saving output, after recent refactoring. 2017-12-02 05:45:09 +01:00
Campbell Barton
28d2148b09 Haiku OS Support
D2860 by @miqlas

Even though Haiku is a niche OS, only minor changes are needed.
2017-11-30 18:05:21 +11:00
Lukas Stockner
fa3d50af95 Cycles: Improve denoising speed on GPUs with small tile sizes
Previously, the NLM kernels would be launched once per offset with one thread per pixel.
However, with the smaller tile sizes that are now feasible, there wasn't enough work to fully occupy GPUs which results in a significant slowdown.

Therefore, the kernels are now launched in a single call that handles all offsets at once.
This has two downsides: Memory accesses to accumulating buffers are now atomic, and more importantly, the temporary memory now has to be allocated for every shift at once, increasing the required memory.
On the other hand, of course, the smaller tiles significantly reduce the size of the memory.

The main bottleneck right now is the construction of the transformation - there is nothing to be parallelized there, one thread per pixel is the maximum.
I tried to parallelize the SVD implementation by storing the matrix in shared memory and launching one block per pixel, but that wasn't really going anywhere.

To make the new code somewhat readable, the handling of rectangular regions was cleaned up a bit and commented, it should be easier to understand what's going on now.
Also, some variables have been renamed to make the difference between buffer width and stride more apparent, in addition to some general style cleanup.
2017-11-30 07:37:08 +01:00
e4b54f44c1 Cycles: add object level holdout property.
This works the same as the holdout shader and Z mask layer. Combined with
overrides in 2.8 this is intended to replace the Z mask layer bits.
2017-11-29 18:11:40 +01:00
Maxym Dmytrychenko
7e349f2745 Cycles: improve triangle intersection performance.
Reduces render time by about 1-2% in benchmark scenes.

Differential Revision: https://developer.blender.org/D2911
2017-11-29 18:11:40 +01:00
Mathieu Menuet
83e80db56e Fix T53349: AO bounces not working correct with OpenCL. 2017-11-26 15:53:00 +01:00
Bastien Montagne
cf6e8edda5 atomic_ops: add atomic_cas_float helper. 2017-11-23 21:17:16 +01:00
Bastien Montagne
ff9eab7926 atomic_ops: Copy/adapt static assert macro from BLI_utildefines, and use it.
Checking for type sizes is much nicer with a static assert!
2017-11-23 20:25:55 +01:00
6be95f8778 Fix T53357: harmless assert after recent addition of render time pass. 2017-11-23 17:14:35 +01:00
e50ed90e4d Fix T53348: Cycles difference between gradient texture on CPU and GPU. 2017-11-23 17:14:04 +01:00
Bastien Montagne
e704d8a616 Moar attempt to fix bloody MSVC intrinsic mess... 2017-11-23 16:58:20 +01:00
Bastien Montagne
df06f1c816 Attempt to fix bloody MSVC atomic intrinsic mess... 2017-11-23 16:53:03 +01:00
Bastien Montagne
580b34e52b atomic_ops: add char versions of uint8_t atomic primitives. 2017-11-23 16:24:34 +01:00
Bastien Montagne
105b95835f atomic_ops: add signed versions of primitives.
Reason is mostly that dealing with type conversion in calling code is
not great, makes it less readable, and can generate hidden bugs in case
original type changes and atomic primitive calls are not updated
accordingly...
2017-11-23 16:24:33 +01:00
d77f1d6538 Fix T53313: bevel shader with transmission render artifacts. 2017-11-22 01:59:21 +01:00
Stefan Werner
58a15b2bfe Cycles: Fixed compilation of CUDA kernels. Follow-up fix for my last commit. 2017-11-21 10:43:40 +01:00
Mai Lavelle
d8f80fbe72 Cycles: Fix OSL brick node after recent fix 2017-11-21 04:30:12 -05:00
Stefan Werner
1febc85855 Cycles: Workaround for performance loss with the CUDA 9.0 SDK.
CUDA 9.0.176 apparently caused some slowdown on high-end Pascal cards that can be mitigated by increasing the number of registers. See https://developer.blender.org/F1142667 for a detailed comparison.
2017-11-21 10:29:11 +01:00
Mai Lavelle
9325b9bf15 Fix T53365: OpenCL has wrong shading of brick texture
Looks like some weird compiler difference with signed vs unsigned ints.
2017-11-21 00:42:55 -05:00
d089875c4c Fix build with OSL 1.9.x, automatically aligns to 16 bytes now. 2017-11-20 23:24:24 +01:00
Sergey Sharybin
51e2844387 Cycles: Fix wrong behavior of sharpness in Cubic SSS
It gave different results for sharpness values of 1.0 and 0.999, even though
the results were expected to be very close to each other.

This SSS profile will probably be removed in the future in favor of the more
physically based Burley, but for the time being I don't see anything wrong
with fixing existing code.
2017-11-20 11:40:55 +01:00
Lukas Stockner
119846a6bb Mikktspace: Speed up the merging of identical vertices
Previously, Mikktspace just bucketed the vertices based on one spatial coordinate and then ran full pairwise comparisons inside each bucket.
However, since models are three-dimensional, the bucketing has a massive false-positive rate, and since pairwise comparison is O(n^2), the merging process is very slow.

But, since we only care about exactly identical vertices, there is a much more efficient approach - we can just hash all values belonging to each vertex and form buckets based on the hash.
Since the hash has 32 bits and considers all values, false-positives are very unlikely - and since both hashing and the radixsort that's used for bucketing are O(n), both asymptotic and
real-world performance (as well as code complexity) are significantly improved.
2017-11-17 18:34:53 +01:00
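
The hash-based merging described in 119846a6bb above can be sketched roughly as follows. This is an illustration, not the Mikktspace code: the `Vertex` layout and function names are hypothetical, FNV-1a is an assumed choice of hash, and a `std::unordered_map` stands in for the radix sort the commit uses for bucketing. "Identical" here means bitwise identical.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <unordered_map>
#include <vector>

// Hypothetical vertex layout: floats only, so there is no padding.
struct Vertex {
    float position[3];
    float normal[3];
    float uv[2];
};

// 32-bit FNV-1a over all bytes of the vertex (an assumed hash function).
std::uint32_t hash_vertex(const Vertex &v)
{
    unsigned char bytes[sizeof(Vertex)];
    std::memcpy(bytes, &v, sizeof(Vertex));

    std::uint32_t h = 2166136261u;
    for (std::size_t i = 0; i < sizeof(Vertex); ++i) {
        h ^= bytes[i];
        h *= 16777619u;
    }
    return h;
}

// Returns, for each vertex, the index of the first bitwise-identical vertex.
std::vector<int> merge_identical(const std::vector<Vertex> &verts)
{
    std::vector<int> remap(verts.size());
    std::unordered_map<std::uint32_t, std::vector<int>> buckets;

    for (int i = 0; i < static_cast<int>(verts.size()); ++i) {
        std::vector<int> &bucket = buckets[hash_vertex(verts[i])];
        int target = i;
        // Hash collisions are rare, so this inner loop is usually very short.
        for (int j : bucket) {
            if (std::memcmp(&verts[j], &verts[i], sizeof(Vertex)) == 0) {
                target = j;
                break;
            }
        }
        if (target == i) {
            bucket.push_back(i);
        }
        remap[i] = target;
    }
    return remap;
}
```

Because the hash covers every value of the vertex, the exact comparison inside a bucket only has to resolve the occasional collision rather than a whole spatial slice of the model.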
Lukas Stockner
40f528a7da Cycles: Add per-tile render time debug pass
Reviewers: sergey, brecht

Differential Revision: https://developer.blender.org/D2920
2017-11-17 16:40:24 +01:00
Lukas Stockner
a0c02e4d1b Cycles: Add Volume Direct and Volume Indirect passes for volume-scattered light
No color pass because it's hard to define what to use as color in a volume.

Reviewers: sergey, brecht

Differential Revision: https://developer.blender.org/D2903
2017-11-17 16:39:45 +01:00
Lukas Stockner
f78e963858 Cycles: Refactor PassType from bitflag to index in order to allow for more passes 2017-11-17 16:34:19 +01:00
Mai Lavelle
470b4cb62f Cycles: Fix crash with split branched path tracing
ShaderData memory was getting clobbered in the branched path code paths.

Was caused by 087331c495b04ebd37903c0dc0e46262354cf026
2017-11-16 04:59:31 -05:00
Sergey Sharybin
67ddc28055 Smoke: Pass non-trivial arguments by const reference 2017-11-14 17:11:48 +01:00
Sergey Sharybin
2868dcbe2b Fix compilation error with clang-5 2017-11-14 17:11:48 +01:00
Lukas Stockner
212a8d9e5a Cycles: Make per-object random value output also work for Lamps 2017-11-14 04:17:54 +01:00
Lukas Stockner
d8066fb0f1 Cycles: Refactor closure roughness detection to fix a potential bug with Denoising of specular shaders 2017-11-14 04:17:54 +01:00
Sergey Sharybin
d1a761c4d4 Cycles: Fix compilation error of standalone application 2017-11-13 10:49:05 +01:00
Sergey Sharybin
42dff6cc2e Cycles: Fix compilation error with OIIO compiled against system PugiXML 2017-11-13 10:42:29 +01:00
e568c1a975 Fix T53289: CUDA missing textures not showing pink, after recent changes. 2017-11-12 20:45:47 +01:00
Mai Lavelle
e389ae9dca Cycles: Set error if a split kernel fails to load
To help catch cases where adding a new kernel is missed for one of the
device implementations.
2017-11-11 01:01:14 -05:00
Sergey Sharybin
db7a78a2be Cycles: Fix compilation error with latest OIIO
There were some changes to namespaces, which cause ambiguities.

Replaces the using-namespace directive with the explicit symbols we need. It is a
good idea NOT to pull in the whole namespace anyway!
2017-11-10 10:04:33 +01:00
a466d7ae24 Cycles: better distance sampling for chromatic volume extinction.
Previously we picked one of the RGB channels with equal probability, but this
works poorly in a dense volume after many bounces. Now we take into account
the throughput and single scattering albedo.

This makes it a little more practical to do brute force SSS with volumes, but
is still very inefficient because we do direct light sampling at every volume
bounce even when inside an opaque mesh. In theory there could be a light inside
the mesh so we can't automatically disable direct lighting.
2017-11-10 01:37:10 +01:00
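
The channel selection described in a466d7ae24 above could look roughly like the sketch below. The names and the exact weighting (the absolute value of throughput times albedo, normalized) are assumptions made for illustration. The commit only states that throughput and single scattering albedo are taken into account.

```cpp
#include <array>
#include <cmath>

using RGB = std::array<float, 3>;

// Probability of sampling distance along each color channel.
// Assumed weighting: |throughput * albedo| per channel, normalized.
RGB channel_pdf(const RGB &throughput, const RGB &albedo)
{
    RGB pdf{};
    float sum = 0.0f;
    for (int i = 0; i < 3; ++i) {
        pdf[i] = std::fabs(throughput[i] * albedo[i]);
        sum += pdf[i];
    }
    for (int i = 0; i < 3; ++i) {
        // Fall back to equal probability when all weights are zero.
        pdf[i] = (sum > 0.0f) ? pdf[i] / sum : 1.0f / 3.0f;
    }
    return pdf;
}

// Picks a channel from a uniform random number in [0, 1).
int pick_channel(const RGB &pdf, float rand)
{
    if (rand < pdf[0]) {
        return 0;
    }
    if (rand < pdf[0] + pdf[1]) {
        return 1;
    }
    return 2;
}
```

Compared with choosing each channel with probability 1/3, channels that carry little remaining throughput are picked less often, which is the situation the commit describes for dense volumes after many bounces.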