While building the AVX kernel, util_avxf.h/avxb.h were using some AVX2 intrinsics,
these were never called, so it wasn't a run-time issue, but the intrinsics headers
on centos excluded the AVX2 prototypes when building the AVX kernel causing build errors.
This commit cleans up the improper usage of the AVX2 intrinsics and provides AVX
fallback implementations for future use.
Differential Revision: https://developer.blender.org/D3696
This isn't really possible to do the shuffle which was attempted to do.
While it's possible to achieve expected behavior, the function needs to
be rewritten. Since it's not used anyway, it's simpler to remove it for
now.
Failed as in did not allocate due to possibly weight cutoff.
Tryign to allocated Extra storage for closure in such situation
will consfuse Cycles and cause crashes later one due to obscure
values in ShaderData.
* WITH_SYSTEM_OPENJPEG is removed and is now always on, this was already
the case for macOS and Windows.
* This should not break existing Linx builds. If there is no new enough
OpenJPEG installed, CMake will no find libopenjp2 and WITH_IMAGE_OPENJPEG
will be disabled.
* install_deps.sh was updated with new package names, since distributions
put this version in a new package.
Differential Revision: https://developer.blender.org/D3663
The assembler template was backing up and restoring ebx, which is
fair enough. However, this did not prevent compiler for putting
result variables to ebx. This was causing data corruption.
In order to prevent this easiest solution is to list ebx in clobbers
for the assembly.
This is an initial implementation of BVH8 optimization structure
and packated triangle intersection. The aim is to get faster ray
to scene intersection checks.
Scene BVH4 BVH8
barbershop_interior 10:24.94 10:10.74
bmw27 02:41.25 02:38.83
classroom 08:16.49 07:56.15
fishy_cat 04:24.56 04:17.29
koro 06:03.06 06:01.45
pavillon_barcelona 09:21.26 09:02.98
victor 23:39.65 22:53.71
As memory goes, peak usage raises by about 4.7% in a complex
scenes.
Note that BVH8 is disabled when using OSL, this is because OSL
kernel does not get per-microarchitecture optimizations and
hence always considers BVH3 is used.
Original BVH8 patch from Anton Gavrikov.
Batched triangles intersection from Victoria Zhislina.
Extra work and tests and fixes from Maxym Dmytrychenko.
With small tiles, the repeated allocations on GPUs can actually slow down the denoising quite a lot.
Allocating the buffer just once reduces rendertime for the default cube with 16x16 tiles and denoising on a mobile 1050 from 22.7sec to 14.0sec.
Building the CUDA kernels takes quite a bit of memory, and when building all of
them the combined usage can be too much on some systems (especially VMs).
Therefore, this patch adds an option to force the build system to build them
sequentially by making each build step depend on the previous kernel.
Reviewers: brecht, sergey
Differential Revision: https://developer.blender.org/D3623
This is in preparation of upgrading our library dependencies, some of which
need C++11. We already use C++11 in blender2.8 and for Windows and macOS, so
this just affects Linux.
On many distributions this will not require any changes, on some
install_deps.sh will need to be run again to rebuild libraries.
Differential Revision: https://developer.blender.org/D3568
Just basic algebra - because all vectors have the same z coordinate, a lot of terms end up cancelling out.
Not exactly a massive improvement, but it's measurable with Branched PT and a high sample count on the lamp.
Reviewers: brecht, sergey
Reviewed By: brecht
Subscribers: swerner
Differential Revision: https://developer.blender.org/D3540
Gathers information about object geometry and textures. Very basic at
this moment, but need to start somewhere.
Things which needs to be included still:
- "Runtime" information, like BVH. While it is not directly controllable
by artists, it's still important to know.
- Device array sizes. Again, not under artists control, but is added to
the overall size.
- Memory peak at different synchronization stages.
At this point it simply prints info to the stdout after F12 is done,
need better control over that too.
Reviewers: brecht
Differential Revision: https://developer.blender.org/D3566