Debug flags are to be controlling render behavior, nothing to do with low level
system utilities.
it was simple to hack, but logically is wrong. Lets do things where they are
supposed to be done!
The displacement shared was running before particle data was copied to the
device causing bad memory access when the particle info node was used. Fix
is simply to move particle update before mesh update so the data is
available to displacement shaders.
(Altho this fixes the crash the particle info node is still mostly useless
with displacement for now...)
Adds the code to get screen size of a point in world space, which is
used for subdividing geometry to the correct level. The approximate
method of treating the point as if it were directly in front of the
camera is used, as panoramic projections can become very distorted
near the edges of an image. This should be fine for most uses.
There is also no support yet for offscreen dicing scale, though
panorama cameras are often used for rendering 360° renders anyway.
Fixes T49254.
Differential Revision: https://developer.blender.org/D2468
This can be enabled in the Film panel, with an option to control the
transmisison roughness below which glass becomes transparent.
Differential Revision: https://developer.blender.org/D2904
The offscreen dicing scale helps to significantly reduce memory usage,
by reducing the dicing rate for objects the further they are outside of
the camera view.
The dicing camera can be specified now, to keep the geometry fixed and
avoid crawling artifacts in animation. It is also useful for debugging,
to see the tesselation from a different camera location.
Differential Revision: https://developer.blender.org/D2891
In that case it can now fall back to CPU memory, at the cost of reduced
performance. For scenes that fit in GPU memory, this commit should not
cause any noticeable slowdowns.
We don't use all physical system RAM, since that can cause OS instability.
We leave at least half of system RAM or 4GB to other software, whichever
is smaller.
For image textures in host memory, performance was maybe 20-30% slower
in our tests (although this is highly hardware and scene dependent). Once
other type of data doesn't fit on the GPU, performance can be e.g. 10x
slower, and at that point it's probably better to just render on the CPU.
Differential Revision: https://developer.blender.org/D2056
SVM nodes need to read all data to get the right offset for the following node.
This is quite weak, a more generic solution would be good in the future.
We can not store pointers to elements of collection property in the
case we modify that collection. This is like storing pointers to
elements of array before calling realloc().
Our own implementation was behaving different comparing to OSL and GPU,
namely on the border pixels OSL and CUDA was doing interpolation with
black, but we were clamping coordinate.
This partially fixes issue reported in T53452.
Similar change should also be done for 3D interpolation perhaps, but this
is to be investigated separately.
Previously, the NLM kernels would be launched once per offset with one thread per pixel.
However, with the smaller tile sizes that are now feasible, there wasn't enough work to fully occupy GPUs which results in a significant slowdown.
Therefore, the kernels are now launched in a single call that handles all offsets at once.
This has two downsides: Memory accesses to accumulating buffers are now atomic, and more importantly, the temporary memory now has to be allocated for every shift at once, increasing the required memory.
On the other hand, of course, the smaller tiles significantly reduce the size of the memory.
The main bottleneck right now is the construction of the transformation - there is nothing to be parallelized there, one thread per pixel is the maximum.
I tried to parallelize the SVD implementation by storing the matrix in shared memory and launching one block per pixel, but that wasn't really going anywhere.
To make the new code somewhat readable, the handling of rectangular regions was cleaned up a bit and commented, it should be easier to understand what's going on now.
Also, some variables have been renamed to make the difference between buffer width and stride more apparent, in addition to some general style cleanup.
Reason is motsly that dealing with type conversion in calling code is
not great, makes it less readable, and can generate hidden bugs in case
original type changes and atomic primitive calls are not updated
accordingly...
CUDA 9.0.176 apparently caused some slow down on high-end Pascal cards that can be mitigated by increasing the number of registers. See https://developer.blender.org/F1142667 for a detailed comparison.