The key indices were wrong: need to offset curve key index
by first curve key index. Also corrected calculation of the
interpolation step.
Annoyingly, can not reproduce this on a simple file, need
production rig. For the possible future look the following
file from Spring was used: 03_005_A.lighting.debug.blend
There are some changes in API of OpenImageIO, but those are quite
simple to keep working with older and newer library versions.
Reviewers: brecht
Reviewed By: brecht
Differential Revision: https://developer.blender.org/D4064
It used to be used for some sort of ignoring automatically
generated bump nodes. But nowadays it causes one of the shaders
in Classroom demo file to be compiled wrong.
This is unfortunate, but the number of bugs in this configuration
keeps growing, and almost all of them are caused by bug in OpenCL
compiler.
The compiler is not likely to be fixed, since Apple declared OpenCL
deprecated.
This evil commit is aimed to keep officially supported features
of Blender in a good working and stable state.
While atomics library was trying to use "user-space" defined
LIKELY() and UNLIKELY(), this is not always true that user
code was checking for those macro coming from an unrelated
area.
This commit adds a sample-based profiler that runs during CPU rendering and collects statistics on time spent in different parts of the kernel (ray intersection, shader evaluation etc.) as well as time spent per material and object.
The results are currently not exposed in the user interface or per Python yet, to see the stats on the console pass the "--cycles-print-stats" argument to Cycles (e.g. "./blender -- --cycles-print-stats").
Unfortunately, there is no clear way to extend this functionality to CUDA or OpenCL, so it is CPU-only for now.
Reviewers: brecht, sergey, swerner
Reviewed By: brecht, swerner
Differential Revision: https://developer.blender.org/D3892
In some instances, the number of control vertices of a hair could change mid-frame.
Cycles would then be unable to calculate proper motion blur for those hairs. This adds
interpolated CVs to fill in for the missing data. While this will not necessarily result in
a fully accurate reconstruction of the guide hair, it preserves motion blur instead of disabling it.
Reviewers: #cycles, sergey
Reviewed By: #cycles, sergey
Subscribers: sergey, brecht, #cycles
Tags: #cycles
Differential Revision: https://developer.blender.org/D3695
For some reason when building with gcc-7.2 (which is default
in previous Ubuntu LTS) the guarded allocator is not being
properly instantiated.
Doesn't happen with newer version of gcc-7 which is 7.3, and
also doesn't happen with gcc-6 and gcc-8.
Would be nice to know what is wrong, but for the time being
committing workaround which keeps Blender users happy.
While we shouldn't have logic in an entry point, and since one should
not be making typos when moving lines around, there is bigger entanglement
issue with BVH host code using kernel function. This is bad violation,
but is tricky to get solved moments before the weekly.
In order to keep things in a (less) broken state than before own cleanup
reverting the changes.
This reverts commit 2bad10be96540ff50a149230d656e599775b3f47.
This reverts commit ddabb21d0584e9874e8e5c62c04abe496ec7334b
Note that this is turned off by default and must be enabled at build time with the CMake WITH_CYCLES_EMBREE flag.
Embree must be built as a static library with ray masking turned on, the `make deps` scripts have been updated accordingly.
There, Embree is off by default too and must be enabled with the WITH_EMBREE flag.
Using Embree allows for much faster rendering of deformation motion blur while reducing the memory footprint.
TODO: GPU implementation, deduplication of data, leveraging more of Embrees features (e.g. tessellation cache).
Differential Revision: https://developer.blender.org/D3682
Mainly useful for debugging. Previously, when AVX2 was disabled
in the debug panel but BVH layout was kept on BVH8 nothing was
rendered.
Needed to make it so supported BVH layout mask for devices is
queried in "dynamic", so it is possible to use DebugFlags there.
Needed for the animation denoiser since the denoising filter is done separately there.
Reviewers: brecht, sergey
Reviewed By: brecht
Differential Revision: https://developer.blender.org/D3833
This allows for extra output passes that encode automatic object and material masks
for the entire scene. It is an implementation of the Cryptomatte standard as
introduced by Psyop. A good future extension would be to add a manifest to the
export and to do plenty of testing to ensure that it is fully compatible with other
renderers and compositing programs that use Cryptomatte.
Internally, it adds the ability for Cycles to have several passes of the same type
that are distinguished by their name.
Differential Revision: https://developer.blender.org/D3538
Apparently quite a few users would like to have the noisy pass available when using the denoiser, and since it's being generated anyways we might as well expose it by default.
Reviewers: brecht
Differential Revision: https://developer.blender.org/D3608
This function is supposed to prevent the black artifacts caused by strong normal- or bumpmapping, but failed in some cases.
Now the code correctly handles all test files and previous issues I am aware of and also has extensive comments describing
the algorithm and the math behind it.
Basically, the main problem was that there can be multiple valid solutions that fulfil the reflection angle criterium,
but I had assumed that only one would exist and therefore simply picked the first solution with a positive term in srqt().
Now, the code uses additional validity checks and a simple heuristic to pick the best valid solution.
Additionally, the code messed up very shallow reflections even if the normal map strength was zero due to the constant
limit for the outgoing ray angle, which caused shallow incoming rays to fail the initial test even when reflected directly
on Ng. Now, the code accounts for this by reducing the threshold in the case of a shallow incoming ray, ensuring that at
least N=Ng is always a valid solution.
Reviewers: brecht
Differential Revision: https://developer.blender.org/D3816
Since my temporary buffer commit (about a month ago), the OpenCL device was zeroing the wrong buffer, leading to
completely wrong filtered feature passes and therefore significantly lower-quality results than CPU and CUDA.
With new jemalloc versions memory allocated by threads that then become
inactive is not longer automatically freed. Instead we have to enable a
background thread to do it.
Some testing is needed to find out of this is sufficient, because the
background thread only runs periodically.
The crash was caused by BVH traversal stack being overflowed.
That overflow was caused by lots of false-positive intersections
for rays originating on a non-finite location.
Not sure why those rays will be existing in the first place,
this is to be investigated separately.
This commit moves pre-SSE4.1 check to a higher level function
and enables it for all miroarchitectures.
While building the AVX kernel, util_avxf.h/avxb.h were using some AVX2 intrinsics,
these were never called, so it wasn't a run-time issue, but the intrinsics headers
on centos excluded the AVX2 prototypes when building the AVX kernel causing build errors.
This commit cleans up the improper usage of the AVX2 intrinsics and provides AVX
fallback implementations for future use.
Differential Revision: https://developer.blender.org/D3696
This isn't really possible to do the shuffle which was attempted to do.
While it's possible to achieve expected behavior, the function needs to
be rewritten. Since it's not used anyway, it's simpler to remove it for
now.
Failed as in did not allocate due to possibly weight cutoff.
Tryign to allocated Extra storage for closure in such situation
will consfuse Cycles and cause crashes later one due to obscure
values in ShaderData.
* WITH_SYSTEM_OPENJPEG is removed and is now always on, this was already
the case for macOS and Windows.
* This should not break existing Linx builds. If there is no new enough
OpenJPEG installed, CMake will no find libopenjp2 and WITH_IMAGE_OPENJPEG
will be disabled.
* install_deps.sh was updated with new package names, since distributions
put this version in a new package.
Differential Revision: https://developer.blender.org/D3663
The assembler template was backing up and restoring ebx, which is
fair enough. However, this did not prevent compiler for putting
result variables to ebx. This was causing data corruption.
In order to prevent this easiest solution is to list ebx in clobbers
for the assembly.
The script clearly states:
This makes the presumption that you are include al.h like
#include "al.h"
and not
#include <AL/al.h>
The reason for this is that the latter is not entirely portable.
Windows/Creative Labs does not by default put their headers in AL/ and
OS X uses the convention <OpenAL/al.h>.
This commit makes default precompiled OpenAL to be properly detected
and also removes hack on MacOS which was finding the OpenAL package but
then was overwriting include directory.
Note, that new audaspace in 2.8 is using expected #include <al.h>.
This is an initial implementation of BVH8 optimization structure
and packated triangle intersection. The aim is to get faster ray
to scene intersection checks.
Scene BVH4 BVH8
barbershop_interior 10:24.94 10:10.74
bmw27 02:41.25 02:38.83
classroom 08:16.49 07:56.15
fishy_cat 04:24.56 04:17.29
koro 06:03.06 06:01.45
pavillon_barcelona 09:21.26 09:02.98
victor 23:39.65 22:53.71
As memory goes, peak usage raises by about 4.7% in a complex
scenes.
Note that BVH8 is disabled when using OSL, this is because OSL
kernel does not get per-microarchitecture optimizations and
hence always considers BVH3 is used.
Original BVH8 patch from Anton Gavrikov.
Batched triangles intersection from Victoria Zhislina.
Extra work and tests and fixes from Maxym Dmytrychenko.
With small tiles, the repeated allocations on GPUs can actually slow down the denoising quite a lot.
Allocating the buffer just once reduces rendertime for the default cube with 16x16 tiles and denoising on a mobile 1050 from 22.7sec to 14.0sec.
Building the CUDA kernels takes quite a bit of memory, and when building all of
them the combined usage can be too much on some systems (especially VMs).
Therefore, this patch adds an option to force the build system to build them
sequentially by making each build step depend on the previous kernel.
Reviewers: brecht, sergey
Differential Revision: https://developer.blender.org/D3623
This is in preparation of upgrading our library dependencies, some of which
need C++11. We already use C++11 in blender2.8 and for Windows and macOS, so
this just affects Linux.
On many distributions this will not require any changes, on some
install_deps.sh will need to be run again to rebuild libraries.
Differential Revision: https://developer.blender.org/D3568
Just basic algebra - because all vectors have the same z coordinate, a lot of terms end up cancelling out.
Not exactly a massive improvement, but it's measurable with Branched PT and a high sample count on the lamp.
Reviewers: brecht, sergey
Reviewed By: brecht
Subscribers: swerner
Differential Revision: https://developer.blender.org/D3540
Gathers information about object geometry and textures. Very basic at
this moment, but need to start somewhere.
Things which needs to be included still:
- "Runtime" information, like BVH. While it is not directly controllable
by artists, it's still important to know.
- Device array sizes. Again, not under artists control, but is added to
the overall size.
- Memory peak at different synchronization stages.
At this point it simply prints info to the stdout after F12 is done,
need better control over that too.
Reviewers: brecht
Differential Revision: https://developer.blender.org/D3566