This save a little memory and copying in the kernel by storing only a 4x3
matrix instead of a 4x4 matrix. We already did this in a few places, and
those don't need to be special exceptions anymore now.
This is in preparation of making Transform affine only, and also gives us
a little extra type safety so we don't accidentally treat it as a regular
4x4 matrix.
The purpose of the previous code refactoring is to make the code more readable,
but combined with this change benchmarks also render about 2-3% faster with an
NVIDIA Titan Xp.
This is an issue with which value to trust: fps vs. tbr. They both cam be
somewhat broken. Currently the idea is:
- If file was saved with FFmpeg AND we are decoding with FFmpeg we trust tbr.
- If we are decoding with Libav we use fps (there does not seem to be tbr in
Libav, unless i'm missing something).
- All other cases we use fps.
Seems to work all good for files from T53857, T54148 and T51153. Ideally we
would need to collect some amount of regression files to make further tweaks
more scientific.
Reviewers: mont29
Reviewed By: mont29
Differential Revision: https://developer.blender.org/D3083
around the volume.
We generate a tight mesh around the active voxels of the volume in order
to effectively skip empty space, and start volume ray marching as close
to interesting volume data as possible. See code comments for details on
how the mesh generation algorithm works.
This gives up to 2x speedups in some scenes.
Reviewed by: brecht, dingto
Reviewers: #cycles
Subscribers: lvxejay, jtheninja, brecht
Differential Revision: https://developer.blender.org/D3038
This is more important now that we will have tigther volume bounds that
we hit multiple times. It also avoids some noise due to RR previously
affecting these surfaces, which shouldn't have been the case and should
eventually be fixed for transparent BSDFs as well.
For non-volume scenes I found no performance impact on NVIDIA or AMD.
For volume scenes the noise decrease and fixed artifacts are worth the
little extra render time, when there is any.
This is used to determine which voxels are to be considered empty space.
Previously it was hardcoded for converting dense grids to OpenVDB grids
to reduce disk space usage.
This value is also useful for rendering engines to know, i.e. to
optimize ray marching.
Some other software cannot handle grid names with spaces in them. We still check for names with spaces so as to not break old
files.
This fixes T53802.
Similar to the Principled BSDF, this should make it easier to set up volume
materials. Smoke and fire can be rendererd with just a single principled
volume node, the appropriate attributes will be used when available. The node
also works for simpler homogeneous volumes like water or mist.
Differential Revision: https://developer.blender.org/D3033
We now continue transparent paths after diffuse/glossy/transmission/volume
bounces are exceeded. This avoids unexpected boundaries in volumes with
transparent boundaries. It is also required for MIS to work correctly with
transparent surfaces, as we also continue through these in shadow rays.
The main visible changes is that volumes will now be lit by the background
even at volume bounces 0, same as surfaces.
Fixes T53914 and T54103.
It seems to be useful still in cases where the particle are distributed in
a particular order or pattern, to colorize them along with that. This isn't
really well defined, but might as well avoid breaking backwards compatibility
for now.
This is like the only way to add variety to hair which is created
using simple children. Used here for the hair.
Maybe not ideal, but the time will show.
Burley SSS uses a bit of strange thing where the albedo and closure weight are
different, which makes the subsurface color act a bit like a subsurface radius
indirectly by the way the Burley SSS profile works.
This can't work for random walk SSS though, and it's not clear to me that this
is actually a good idea since it's really the subsurface radius that is supposed
to control this. For now I'll leave Burley SSS working the same to not break
backwards compatibility.
It is basically brute force volume scattering within the mesh, but part
of the SSS code for faster performance. The main difference with actual
volume scattering is that we assume the boundaries are diffuse and that
all lighting is coming through this boundary from outside the volume.
This gives much more accurate results for thin features and low density.
Some challenges remain however:
* Significantly more noisy than BSSRDF. Adding Dwivedi sampling may help
here, but it's unclear still how much it helps in real world cases.
* Due to this being a volumetric method, geometry like eyes or mouth can
darken the skin on the outside. We may be able to reduce this effect,
or users can compensate for it by reducing the scattering radius in
such areas.
* Sharp corners are quite bright. This matches actual volume rendering
and results in some other renderers, but maybe not so much real world
objects.
Differential Revision: https://developer.blender.org/D3054
This brings separate initialization for libcuda and libnvrtc, which
fixes Cycles nvrtc compilation not working on build machines without
CUDA hardware available.
Differential Revision: https://developer.blender.org/D3045
We should actually be using CL_DEVICE_MEM_BASE_ADDR_ALIGN for sub buffers,
previous change in this code was incorrect. Renamed the function now to
make the specific purpose of this alignment clear, it's not required for
data types in general.
This patch changes the huge list of projects in visual studio into a nice tree matching the source folder structure. see D2823 for details.
Differential Revision: http://developer.blender.org/D2823
nvcc is very picky regarding compiler versions, severely limiting the compiler we can use, this commit adds a nvrtc based compiler that'll allow us to build the cubins even if the host compiler is unsupported. for details see D2913.
Differential Revision: http://developer.blender.org/D2913
This adds midlevel and object/world space for displacement, and a
vector displacement node with tangent/object/world space, midlevel
and scale.
Note that tangent space vector displacement still is not exactly
compatible with maps created by other software, this will require
changes to the tangent computation.
Differential Revision: https://developer.blender.org/D1734
This was disabled to avoid updating the geometry every time when the
material includes displacement, because there was no way to distinguish
between surface shader and displacement updates.
As a solution, we now compute an MD5 hash of the nodes linked to the
displacement socket, and only update the mesh if that changes.
Differential Revision: https://developer.blender.org/D3018
This reverts commit 3c852ba0741f794a697f95073b04921e9ff94039. This is breaking
the regression tests, and maybe requires some deeper changes to really fix.
There was a check for volume bounces at every surface intersection. That could lead to a volume scattered path being terminated
when passing through a transparent surface. This check was superfluous, as the volume shader evaluation already checks the
number of volume bounces and once it passes the max, volume shaders will not return scatter events any more.
Reviewers: #cycles, brecht
Reviewed By: #cycles, brecht
Subscribers: brecht, #cycles
Tags: #cycles
Maniphest Tasks: T53914
Differential Revision: https://developer.blender.org/D3024
Previously we stored each color channel in a single closure, which was
convenient for sampling a closure and channel together. But this doesn't
work so well for algorithms where we want to render multiple color
channels together.
Previously only scalar displacement along the normal was supported,
now displacement can go in any direction. For backwards compatibility,
a Displacement node will be automatically inserted in existing files.
This will make it possible to support vector displacement maps in the
future. It's already possible to use them to some extent, but requires
a manual shader node setup. For tangent space maps the right tangent
may also not be available yet, depends on the map.
Differential Revision: https://developer.blender.org/D3015
This converts object space height to world space displacement, to be
linked to the new vector displacement material output.
Differential Revision: https://developer.blender.org/D3015
This was we can introduce other types of BVH, for example, wider ones, without
causing too much mess around boolean flags.
Thoughs:
- Ideally device info should probably return bitflag of what BVH types it
supports.
It is possible to implement based on simple logic in device/ and mesh.cpp,
rest of the changes will stay the same.
- Not happy with workarounds in util_debug and duplicated enum in kernel.
Maybe enbum should be stores in kernel, but then it's kind of weird to include
kernel types from utils. Soudns some cyclkic dependency.
Reviewers: brecht, maxim_d33
Reviewed By: brecht
Differential Revision: https://developer.blender.org/D3011
Debug flags are to be controlling render behavior, nothing to do with low level
system utilities.
it was simple to hack, but logically is wrong. Lets do things where they are
supposed to be done!
The displacement shared was running before particle data was copied to the
device causing bad memory access when the particle info node was used. Fix
is simply to move particle update before mesh update so the data is
available to displacement shaders.
(Altho this fixes the crash the particle info node is still mostly useless
with displacement for now...)
Adds the code to get screen size of a point in world space, which is
used for subdividing geometry to the correct level. The approximate
method of treating the point as if it were directly in front of the
camera is used, as panoramic projections can become very distorted
near the edges of an image. This should be fine for most uses.
There is also no support yet for offscreen dicing scale, though
panorama cameras are often used for rendering 360° renders anyway.
Fixes T49254.
Differential Revision: https://developer.blender.org/D2468
This can be enabled in the Film panel, with an option to control the
transmisison roughness below which glass becomes transparent.
Differential Revision: https://developer.blender.org/D2904
The offscreen dicing scale helps to significantly reduce memory usage,
by reducing the dicing rate for objects the further they are outside of
the camera view.
The dicing camera can be specified now, to keep the geometry fixed and
avoid crawling artifacts in animation. It is also useful for debugging,
to see the tesselation from a different camera location.
Differential Revision: https://developer.blender.org/D2891
In that case it can now fall back to CPU memory, at the cost of reduced
performance. For scenes that fit in GPU memory, this commit should not
cause any noticeable slowdowns.
We don't use all physical system RAM, since that can cause OS instability.
We leave at least half of system RAM or 4GB to other software, whichever
is smaller.
For image textures in host memory, performance was maybe 20-30% slower
in our tests (although this is highly hardware and scene dependent). Once
other type of data doesn't fit on the GPU, performance can be e.g. 10x
slower, and at that point it's probably better to just render on the CPU.
Differential Revision: https://developer.blender.org/D2056
SVM nodes need to read all data to get the right offset for the following node.
This is quite weak, a more generic solution would be good in the future.
We can not store pointers to elements of collection property in the
case we modify that collection. This is like storing pointers to
elements of array before calling realloc().
Our own implementation was behaving different comparing to OSL and GPU,
namely on the border pixels OSL and CUDA was doing interpolation with
black, but we were clamping coordinate.
This partially fixes issue reported in T53452.
Similar change should also be done for 3D interpolation perhaps, but this
is to be investigated separately.
Previously, the NLM kernels would be launched once per offset with one thread per pixel.
However, with the smaller tile sizes that are now feasible, there wasn't enough work to fully occupy GPUs which results in a significant slowdown.
Therefore, the kernels are now launched in a single call that handles all offsets at once.
This has two downsides: Memory accesses to accumulating buffers are now atomic, and more importantly, the temporary memory now has to be allocated for every shift at once, increasing the required memory.
On the other hand, of course, the smaller tiles significantly reduce the size of the memory.
The main bottleneck right now is the construction of the transformation - there is nothing to be parallelized there, one thread per pixel is the maximum.
I tried to parallelize the SVD implementation by storing the matrix in shared memory and launching one block per pixel, but that wasn't really going anywhere.
To make the new code somewhat readable, the handling of rectangular regions was cleaned up a bit and commented, it should be easier to understand what's going on now.
Also, some variables have been renamed to make the difference between buffer width and stride more apparent, in addition to some general style cleanup.
Reason is motsly that dealing with type conversion in calling code is
not great, makes it less readable, and can generate hidden bugs in case
original type changes and atomic primitive calls are not updated
accordingly...
CUDA 9.0.176 apparently caused some slow down on high-end Pascal cards that can be mitigated by increasing the number of registers. See https://developer.blender.org/F1142667 for a detailed comparison.
Was giving difference when using sharpness of 1.0 and 0.999 even though the
result was expected to be really close to each other.
This SSS profile will probably be removed in the future in favor of more
physically bases Burley, but for the time being don't see anything wrong
fixing an existing code.
Previously, Mikktspace just bucketed the vertices based on one spatial coordinate and then ran full pairwise comparisons inside each bucket.
However, since models are three-dimensional, the bucketing has a massive false-positive rate, and since pairwise comparison is O(n^2), the merging process is very slow.
But, since we only care about exactly identical vertices, there is a much more efficient approach - we can just hash all values belonging to each vertex and form buckets based on the hash.
Since the hash has 32 bits and considers all values, false-positives are very unlikely - and since both hashing and the radixsort that's used for bucketing are O(n), both asymptotical and
real-world performance (as well as code complexity) are significantly improved.
No color pass because it's hard to define what to use as color in a volume.
Reviewers: sergey, brecht
Differential Revision: https://developer.blender.org/D2903
There was some changes about namespaces, which causes ambiguities.
Replaces using namespace with an explicit symbols we need. Is good idea to NOT
pull in the whole namespace anyway!
Previously we picked one of the RGB channels with equal probability, but this
works poorly in a dense volume after many bounces. Now we take into account
the throughput and single scattering albedo.
This makes it a little more practical to do brute force SSS with volumes, but
is still very inefficient because we do direct light sampling at every volume
bounce even when inside an opaque mesh. In theory there could be a light inside
the mesh so we can't automatically disable direct lighting.
In fact this was an existing issue when exceeding the number of available
closure, but it's more common now that we set the number to 0 for shadows
and emission
Goal is to reduce OpenCL kernel recompilations.
Currently viewport renders are still set to use 64 closures as this seems to
be faster and we don't want to cause a performance regression there. Needs
to be investigated.
Reviewed By: brecht
Differential Revision: https://developer.blender.org/D2775
The algorithm averages normals from nearby surfaces. It uses the same
sampling strategy as BSSRDFs, casting rays along the normal and two
orthogonal axes, and combining the samples with MIS.
The main concern here is that we are introducing raytracing inside
shader evaluation, which could be quite bad for GPU performance and
stack memory usage. In practice it doesn't seem so bad though.
Note that using this feature can easily slow down renders 20%, and
that if you care about performance then it's better to use a bevel
modifier. Mainly this is useful for baking, and for cases where the
mesh topology makes it difficult for the bevel modifier to work well.
Differential Revision: https://developer.blender.org/D2803
This causes some difference in the classroom scene, where ray visibility
tricks are used and break the MIS balance. Otherwise there doesn't seem
to be much effect, but better to use the right formulas. Problem originally
identified by Lukas.
With a Titan Xp, reduces path trace local memory from 1092MB to 840MB.
Benchmark performance was within 1% with both RX 480 and Titan Xp.
Original patch was implemented by Sergey.
Differential Revision: https://developer.blender.org/D2249
This is a prequisite for getting host memory allocation to work. There appears
to be no support for 3D textures using host memory. The original version of
this code was written by Stefan Werner for D2056.
Some drivers may report very large allocation sizes, which could cause
unnecessary memory usage. This is now limited to 2gb which should
still be enough to get the needed performance benefits without waste.
The idea is to make it possible to report extra meta data from
render engine to the file writing. This way we can provide
additional information such as number of samples rendered by
resumable Cycles rendering so we can easily combine files back.
Currently only report number of samples from Cycles when rendering
a single render-layer scene. This is something what was required
here at the studio. We can easily extend that further.
Ideally we would also need to support non-string metadata, but
that's for later.
Reviewers: mont29, campbellbarton
Reviewed By: mont29, campbellbarton
Subscribers: sybren, candreacchio
Differential Revision: https://developer.blender.org/D2502
We are already using the AO distance, so might as well offer this extra
control over the intensity. Useful when an interior scene is supposed to
be significantly darker than the background shader.
The test will leak CPU devices, but is all passing other than that.
Leak will be fixed shortly.
P.S. Committing code refactor without running regression tests, tsk ;)
* Remove tex_* and pixels_* functions, replace by mem_*.
* Add MEM_TEXTURE and MEM_PIXELS as memory types recognized by devices.
* No longer create device_memory and call mem_* directly, always go
through device_only_memory, device_vector and device_pixels.
Progressive refine undoes memory saving from save buffers, so enabling
both does not make much sense. Previously enabling progressive refine
would disable denoising, but it should be the other way around since
denoise actually affects the render result.
Includes some code refactor for progressive refine render buffers, and
avoids recomputing tiles for each progressive sample.
CPU rendering will be restricted to a BVH2, which is not ideal for raytracing
performance but can be shared with the GPU. Decoupled volume shading will be
disabled to match GPU volume sampling.
The number of CPU rendering threads is reduced to leave one core dedicated to
each GPU. Viewport rendering will also only use GPU rendering still. So along
with the BVH2 usage, perfect scaling should not be expected.
Go to User Preferences > System to enable the CPU to render alongside the GPU.
Differential Revision: https://developer.blender.org/D2873
This is fully unpredictable for artists when one damaged object makes the whole
scene to render incorrectly. This involves two main changes:
- It is not enough to check triangle bounds to be valid when building BVH.
This is because triangle might have some finite vertices and some non-finite.
- We shouldn't add non-finite triangle area to the overall area for MIS.
Was a mistake in optimization commit which was disconnecting closures and nodes
which does not make sense for volume output.
OSL script we can't ignore and can't currently know in advance if it's a proper
volume shader or not. So we never disconnect OSL nodes from volume output.
This is a good candidate for corrective release.
This patch goes away form using C++ RNA during tangent space calculation which
avoids quite a bit of overhead. Now all calculation is done using data which
already exists in ccl::Mesh. This means, tangent space is now calculated from
triangles, which doesn't seem to be any different (at least as far as regression
tests are concerned).
One of the positive sides is that this change makes it possible to move tangent
space calculation from blender/ to render/ so we will have Cycles standalone
supporting tangent space.
Reviewers: brecht, lukasstockner97, campbellbarton
Differential Revision: https://developer.blender.org/D2810
This change affects CUDA GPUs not connected to a display or connected to a
display but supporting compute preemption so that the display does not
freeze. I couldn't find an official list, but compute preemption seems to be
only supported with GTX 1070+ and Linux (not GTX 1060- or Windows).
This helps improve small tile rendering performance further if there are
sufficient samples x number of pixels in a single tile to keep the GPU busy.
Best guess is that cuInit() somehow interferes with the AMD graphics driver
on Windows, and switching the initialization order to do OpenCL first seems
to solve the issue.
* Use common TextureInfo struct for all devices, except CUDA fermi.
* Move image sampling code to kernels/*/kernel_*_image.h files.
* Use arrays for data textures on Fermi too, so device_vector<Struct> works.
Two issues here:
- Checking table size to be non-zero is not a proper way to go here. This is
because we first resize the table and then fill it in. So it was possible that
non-initialized table was used.
Trickery with using temporary memory and then doing table.swap() might work,
but we can not guarantee that table size will be set after the data pointer.
- Mutex guard was useless, because every thread was using own mutex. Need to
make mutex guard static so all threads are using same mutex.
The issue was caused by light sample being evaluated to nan at some point.
This is root of the cause which is to be fixed, but is very hard to trace down
especially via ssh (the issue only happens on AVX2 release build). Will give it
a closer look when back to my AVX2 machine.
For until then this is a good check to have anyway, it corresponds to what's
happening in regular radiance sum.
The work size is still very conservative, and this doesn't help for progressive
refine. For that we will need to render multiple tiles at the same time. But this
should already help for denoising renders that require too much memory with big
tiles, and just generally soften the performance dropoff with small tiles.
Differential Revision: https://developer.blender.org/D2856
This was originally done with the first sample in the kernel for better
performance, but it doesn't work anymore with atomics. Any benefit was
very minor anyway, too small to measure it seems.
This removes a bunch of code that is no longer needed, and running
"make update" will now automatically download the new libraries.
Differential Revision: https://developer.blender.org/D2861