This converts object space height to world space displacement, to be
linked to the new vector displacement material output.
Differential Revision: https://developer.blender.org/D3015
This was we can introduce other types of BVH, for example, wider ones, without
causing too much mess around boolean flags.
Thoughs:
- Ideally device info should probably return bitflag of what BVH types it
supports.
It is possible to implement based on simple logic in device/ and mesh.cpp,
rest of the changes will stay the same.
- Not happy with workarounds in util_debug and duplicated enum in kernel.
Maybe enbum should be stores in kernel, but then it's kind of weird to include
kernel types from utils. Soudns some cyclkic dependency.
Reviewers: brecht, maxim_d33
Reviewed By: brecht
Differential Revision: https://developer.blender.org/D3011
Debug flags are to be controlling render behavior, nothing to do with low level
system utilities.
it was simple to hack, but logically is wrong. Lets do things where they are
supposed to be done!
The displacement shared was running before particle data was copied to the
device causing bad memory access when the particle info node was used. Fix
is simply to move particle update before mesh update so the data is
available to displacement shaders.
(Altho this fixes the crash the particle info node is still mostly useless
with displacement for now...)
Adds the code to get screen size of a point in world space, which is
used for subdividing geometry to the correct level. The approximate
method of treating the point as if it were directly in front of the
camera is used, as panoramic projections can become very distorted
near the edges of an image. This should be fine for most uses.
There is also no support yet for offscreen dicing scale, though
panorama cameras are often used for rendering 360° renders anyway.
Fixes T49254.
Differential Revision: https://developer.blender.org/D2468
This can be enabled in the Film panel, with an option to control the
transmisison roughness below which glass becomes transparent.
Differential Revision: https://developer.blender.org/D2904
The offscreen dicing scale helps to significantly reduce memory usage,
by reducing the dicing rate for objects the further they are outside of
the camera view.
The dicing camera can be specified now, to keep the geometry fixed and
avoid crawling artifacts in animation. It is also useful for debugging,
to see the tesselation from a different camera location.
Differential Revision: https://developer.blender.org/D2891
In that case it can now fall back to CPU memory, at the cost of reduced
performance. For scenes that fit in GPU memory, this commit should not
cause any noticeable slowdowns.
We don't use all physical system RAM, since that can cause OS instability.
We leave at least half of system RAM or 4GB to other software, whichever
is smaller.
For image textures in host memory, performance was maybe 20-30% slower
in our tests (although this is highly hardware and scene dependent). Once
other type of data doesn't fit on the GPU, performance can be e.g. 10x
slower, and at that point it's probably better to just render on the CPU.
Differential Revision: https://developer.blender.org/D2056
SVM nodes need to read all data to get the right offset for the following node.
This is quite weak, a more generic solution would be good in the future.
We can not store pointers to elements of collection property in the
case we modify that collection. This is like storing pointers to
elements of array before calling realloc().
Our own implementation was behaving different comparing to OSL and GPU,
namely on the border pixels OSL and CUDA was doing interpolation with
black, but we were clamping coordinate.
This partially fixes issue reported in T53452.
Similar change should also be done for 3D interpolation perhaps, but this
is to be investigated separately.
Previously, the NLM kernels would be launched once per offset with one thread per pixel.
However, with the smaller tile sizes that are now feasible, there wasn't enough work to fully occupy GPUs which results in a significant slowdown.
Therefore, the kernels are now launched in a single call that handles all offsets at once.
This has two downsides: Memory accesses to accumulating buffers are now atomic, and more importantly, the temporary memory now has to be allocated for every shift at once, increasing the required memory.
On the other hand, of course, the smaller tiles significantly reduce the size of the memory.
The main bottleneck right now is the construction of the transformation - there is nothing to be parallelized there, one thread per pixel is the maximum.
I tried to parallelize the SVD implementation by storing the matrix in shared memory and launching one block per pixel, but that wasn't really going anywhere.
To make the new code somewhat readable, the handling of rectangular regions was cleaned up a bit and commented, it should be easier to understand what's going on now.
Also, some variables have been renamed to make the difference between buffer width and stride more apparent, in addition to some general style cleanup.