This patch adds initial support for compute shaders to
the vulkan backend. As the development is oriented to the test-
cases we have the implementation is limited to what is used there.
It has been validated that with this patch that the following test
cases are running as expected
- `GPUVulkanTest.gpu_shader_compute_vbo`
- `GPUVulkanTest.gpu_shader_compute_ibo`
- `GPUVulkanTest.gpu_shader_compute_ssbo`
- `GPUVulkanTest.gpu_storage_buffer_create_update_read`
- `GPUVulkanTest.gpu_shader_compute_2d`
This patch includes:
- Allocating VkBuffer on device.
- Uploading data from CPU to VkBuffer.
- Binding VkBuffer as SSBO to a compute shader.
- Execute compute shader and altering VkBuffer.
- Download the VkBuffer to CPU ram.
- Validate that it worked.
- Use device only vertex buffer as SSBO
- Use device only index buffer as SSBO
- Use device only image buffers
GHOST API has been changed as the original design was created before
we even had support for compute shaders in blender. The function
`GHOST_getVulkanBackbuffer` has been separated to retrieve the command
buffer without a backbuffer (`GHOST_getVulkanCommandBuffer`). In order
to do correct command buffer processing we needed access to the queue
owned by GHOST. This is returned as part of the `GHOST_getVulkanHandles`
function.
Open topics (not considered part of this patch)
- Memory barriers & command buffer encoding
- Indirect compute dispatching
- Rest of the test cases
- Data conversions when requested data format is different than on device.
- GPUVulkanTest.gpu_shader_compute_1d is supported on AMD devices.
NVIDIA doesn't seem to support 1d textures.
Pull-request: #104518
Contributed by Yulia Kuznetcova at Apple.
NanoVDB is patched to give add address spaces required by Metal. We hope that
in the future Metal will support the generic address space.
For AMD and Intel this is currently not available since it causes a performance
regression also on scenes without volumes.
Pull Request #104837
This is a revert of a revert, because the initial revert is only
supposed to be in the release branch.
This reverts commit 3eed00dc546b5930b255572901915ccbc26c0e69.
This reverts commit 19222627c6b57df77d3bc9ae60d7850ab7e23abb.
Something went wrong here, seems like this commit merged the main branch
into the release branch, which should never be done.
This reverts commit 68181c2560db25c1bd2b70beccf6022c5aa00f39.
I merged 3.6 into 3.5 by mistake. Basically I had a PR against main,
then changed it in the last minute to be against 3.5 via the
web-interface unaware that I shouldn't do it without updating the
patch.
Original Pull Request: #104889
Note that the node group has its sockets names
translated, while the built-in nodes don't.
So we need to use data_ for the built-in nodes names,
and the sockets of the created node groups.
Pull Request #104889
This better aligns with OSX/Linux warnings.
Although `__pragma(warning(suppress:4100))` is not the same as
`__attribute__((__unused__))` in gcc (which only affects the attribute
instead of the line), it still seems to be better to use it than to
hide the warning entirely.
`MEM_delete()` is designed for type safe destruction and freeing, void
pointers make that impossible.
Was reviewing a patch that was trying to free a C-style custom data
pointer this way. Apparently MSVC compiles this just fine, other
compilers error out. Make sure this is a build error on all platforms
with a useful message.
The proper fix (bb9eb262d4d3) caused compilation problems with HIP, so we're
delaying it until 3.6.
To fix the original bug report (#104586), this is a quick workaround that'll
hopefully not upset the compiler.
Pull Request #104723
The internal compiler error appears to be gone. Unclear why it appeared in the
first place and why it's gone now. Just random kernel code changes causing it.
Pull Request #104719
- Rename roughness variables for more clarity - before, the SVM/OSL code would
set s and v to the linear roughness values, and the setup function would over-
write them with the distribution parameters. This actually caused a bug in the
albedo code, since it intended to use the linear roughness value, but ended up
getting the remapped value.
- Deduplicate the evaluation and sample functions. Most of their code is the
same, only the middle part is different.
- Changed albedo computation to return the sum of the intensities of the four
BSDF lobes. Previously, the code applied the inverse of the color->sigma
mapping from the paper - this returns the color specified in the node, but
for very dark hair (e.g. when using the Melanin controls) the result is
extremely low (e.g. 0.000001) despite the hair still reflecting a significant
amount of light (since the R lobe is independent of sigma). This causes issues
with the light component passes, so this change fixes#104586.
- There's quite a few computations at the start of the evaluation function that
are needed for sampling, evaluation and albedo computation, but only depend on
the view direction. Therefore, just precompute them - we still have space in
PrincipledHairExtra after all.
- Fix a tiny bug - the direction sampling code did not account for the R lobe
roughness modifier.
Pull Request #104669
As a side effect of this change, more resolution divisions are now available.
Before this patch the possible resolution divisions were all powers of two.
Now the possible resolution divisions are the multiples of pixel_size.
This increase in possible resolution divisions is the same idea proposed in https://archive.blender.org/developer/D13590.
In that patch there were concerns that this will increase the time between a user navigating
and seeing the 1:1 render. To my knowledge this is a non-issue and there should be
little to no increase in time between those two events.
Pull Request #104450
(Follow on from D17043)
On AMD Navi2 devices the MetalRT checkbox was not hooked up properly and had no effect. This patch fixes it.
Co-authored-by: Michael Jones <michael_p_jones@apple.com>
Pull Request #104520
The required version numbers for various devices was hardcoded in the
UI messages. The result was that every time one of these versions was
bumped, every language team had to update the message in question.
Instead, the version numbers can be extracted, and injected into the
error messages using string formatting so that translation updates
need happen less frequently.
Pull Request #104488
Host memory fallback in CUDA and HIP devices is almost identical.
We remove duplicated code and create a shared generic version that
other devices (oneAPI) will be able to use.
Reviewed By: brecht
Differential Revision: https://developer.blender.org/D17173
This patch optimises subsurface intersection queries on MetalRT. Currently intersect_local traverses from the scene root, retrospectively discarding all non-local hits. Using a lookup of bottom level acceleration structures, we can explicitly query only the relevant instance. On M1 Max, with MetalRT selected, this can give a render speedup of 15-20% for scenes like Monster which make heavy use of subsurface scattering.
Patch authored by Marco Giordano.
Reviewed By: brecht
Differential Revision: https://developer.blender.org/D17153
This patch adds two new kernels: SORT_BUCKET_PASS and SORT_WRITE_PASS. These replace PREFIX_SUM and SORTED_PATHS_ARRAY on supported devices (currently implemented on Metal, but will be trivial to enable on the other backends). The new kernels exploit sort partitioning (see D15331) by sorting each partition separately using local atomics. This can give an overall render speedup of 2-3% depending on architecture. As before, we fall back to the original non-partitioned sorting when the shader count is "too high".
Reviewed By: brecht
Differential Revision: https://developer.blender.org/D16909
This patch removes the option to select both AMD and Intel GPUs on system that have both. Currently both devices will be selected by default which results in crashes and other poorly understood behaviour. This patch adds precedence for using any discrete AMD GPU over an integrated Intel one. This can be overridden with CYCLES_METAL_FORCE_INTEL.
Reviewed By: brecht
Differential Revision: https://developer.blender.org/D17166
This patch fixes T103393 by undefining `__LIGHT_TREE__` on Metal/AMD as it has an unexpected & major impact on performance even when light trees are not in use.
Patch authored by Prakash Kamliya.
Reviewed By: brecht
Maniphest Tasks: T103393
Differential Revision: https://developer.blender.org/D17167
Mutex locks for manipulating GHOST_System::m_timerManager from
GHOST_SystemWayland relied on WAYLAND being the only user of the
timer-manager.
This isn't the case as timers are fired from
`GHOST_System::dispatchEvents`.
Resolve by using a separate timer-manager for wayland key-repeat timers.
Resolve a thread safety issue reported by valgrind's helgrind checker,
although I wasn't able to redo the error in practice.
NULL check on the key-repeat timer also needs to lock, otherwise it's
possible the timer is set in another thread before the lock is acquired.
Now all key-repeat timer access which may run from a thread
locks the timer mutex before any checks or timer manipulation.
This is both a cleanup and a preparation for the Principled v2 changes.
Notable changes:
- Clearcoat weight is now folded into the closure weight, there's no reason
to track this separately.
- There's a general-purpose helper for computing a Closure's albedo, which is
currently used by the denoising albedo and diffuse/gloss/transmission color
passes.
- The d/g/t color passes didn't account for closure albedo before, this means
that e.g. metallic shaders with Principled v2 now have their color texture
included in the glossy color pass. Also fixes T104041 (sheen albedo).
- Instead of precomputing and storing the albedo during shader setup, compute
it when needed. This is technically redundant since we still need to compute
it on shader setup to adjust the sample weight, but the operation is cheap
enough that freeing up the storage seems worth it.
- Future changes (Principled v2) are easier to integrate since the Fresnel
handling isn't all over the place anymore.
- Fresnel handling in the Multiscattering GGX code is still ugly, but since
removing that entirely is the next step, putting effort into cleaning it up
doesn't seem worth it.
- Apart from the d/g/t color passes, no changes to render results are expected.
Differential Revision: https://developer.blender.org/D17101
Cycles ignores the size of spot lights, therefore the illuminated area doesn't match the gizmo. This patch resolves this discrepancy.
| Before (Cycles) | After (Cycles) | Eevee
|{F14200605}|{F14200595}|{F14200600}|
This is done by scaling the ray direction by the size of the cone. The implementation of `spot_light_attenuation()` in `spot.h` matches `spot_attenuation()` in `lights_lib.glsl`.
**Test file**:
{F14200728}
Differential Revision: https://developer.blender.org/D17129
The image manager used to handle OSL textures on the GPU by
default loads images after displacement is evaluated. This is a
problem when the displacement shader uses any textures, hence
why the geometry manager already makes the image manager
load any images used in the displacement shader graph early
(`GeometryManager::device_update_displacement_images`).
This only handled Cycles image nodes however, not OSL nodes, so
if any `texture` calls were made in OSL those would be missed and
therefore crash when accessed on the GPU. Unfortunately it is not
simple to determine which textures referenced by OSL are needed
for displacement, so the solution for now is to simply load all of
them early if true displacement is used.
This patch also fixes the result of the displacement shader not
being used properly in OptiX.
Maniphest Tasks: T104240
Differential Revision: https://developer.blender.org/D17162
Also minor changes in comments:
- Reference BLENDER_HISTORY_FILE instead of the literal file-name
(simplifies looking up usage).
- Use usernames in tags, as noted in code-style.
The `MultiDevice` implementation of `get_cpu_osl_memory` returns a
nullptr when there is no CPU device in the mix. As such access to that
crashed in `update_osl_globals`. But that only updates maps that are not
currently used on the GPU anyway, so can just skip that when the CPU
is not used for rendering.
Maniphest Tasks: T104216
Paths to vulkan libraries, paths and related components were
hardcoded in the platform cmake file. This patch separates
this by using adding CMake modules for Vulkan and ShaderC.
This change has only been applied to the macOs configuration as
that is currently our main platform for development. Other platforms
will be added during the development of the Vulkan back-end.
Removing all OSL script nodes from the shader graph would cause that
graph to no longer report it using `KERNEL_FEATURE_SHADER_RAYTRACE`
via `ShaderManager::get_graph_kernel_features`, but the shader object
itself still would have the `has_surface_raytrace` field set.
This caused kernels to be reloaded without shader raytracing support, but
later the `DEVICE_KERNEL_INTEGRATOR_SHADE_SURFACE_RAYTRACE`
kernel would still be invoked since the shader continued to report it
requiring that through the `SD_HAS_RAYTRACE` flag set because of
`has_surface_raytrace`.
Fix that by ensuring `has_surface_raytrace` is reset on every shader update,
so that when all OSL script nodes are deleted it is set to false, and only
stays true when there are still OSL script nodes (or other nodes using it).
Maniphest Tasks: T104157
Differential Revision: https://developer.blender.org/D17140
Based on "Sampling the GGX Distribution of Visible Normals" by Eric Heitz
(https://jcgt.org/published/0007/04/01/).
Also, this removes the lambdaI computation from the Beckmann sampling code and
just recomputes it below. We already need to recompute for two other cases
(GGX and clearcoat), so this makes the code more consistent.
In terms of performance, I don't expect a notable impact since the earlier
computation also was non-trivial, and while it probably was slightly more
accurate, I'd argue that being consistent between evaluation and sampling is
more important than absolute numerical accuracy anyways.
Differential Revision: https://developer.blender.org/D17100
This gives closer results to what I've seen in papers and other renderers when
using the code to precompute albedo later (to replace MultiGGX).
It's usually a tiny difference, the only case where I've seen it matter is
in the `shader/node_group_float.blend` test - but that's a (single-scatter) GGX
closure with 0.9 roughness, so it's not too surprising. In any case, the new
result looks closer to Eevee, so that's good I guess.
Differential Revision: https://developer.blender.org/D17099
Speckles and missing lights were experienced in scenes with Nishita Sky
Texture and a Sun Size smaller than 1.5°, such as in Lone Monk and Attic
scenes.
Increasing the precision of cosf fixes it.
A noticeable (>5%) performance regression in oneAPI backend came with
a501a2dbff797829b61f21c5aeb9d19dba3e3874. Updating to latest graphics
compiler from driver 101.4032 fixes it.
I've tested it with current min-supported drivers and it runs well but
since compatibility of graphics compiler with older drivers isn't
guaranteed, I'm also bumping the min-supported driver versions.
If end-users consider latest drivers too fresh to switch to (version
isn't released as stable on Linux as of today but should be before
Blender 3.5 release), CYCLES_ONEAPI_ALL_DEVICES=1 env variable can be
used.
Intel Graphics Compiler on Linux will be updated in a later commit
so we can then close D16984.
Reviewed By: sergey, LazyDodo
Improve handling for cases where maximum in-flight command buffer count is exceeded. This can occur during light-baking operations. Ensures the application handles this gracefully and also improves workload pipelining by situationally stalling until GPU work has completed, if too much work is queued up.
This may have a tangible benefit for T103742 by ensuring Blender does not queue up too much GPU work.
Authored by Apple: Michael Parkin-White
Ref T96261
Ref T103742
Depends on D17018
Reviewed By: fclem
Maniphest Tasks: T103742, T96261
Differential Revision: https://developer.blender.org/D17019
This isn't correct as window activation is handled separately
from the cursor entering/leaving a window.
This would call de-activate when the cursor moved outside the window
even though the window remained focused.
Rely on focus changes which already handle activate/deactivate events.
Swap-buffers was being deferred (to prevent it being called
from the event handling thread) however when it was called the
OpenGL context might not be active (especially with multiple windows).
Moving the cursor between windows made eglSwapBuffers report:
EGL Error (0x300D): EGL_BAD_SURFACE.
Resolve this by removing swapBuffer calls and instead add a
GHOST_kEventWindowUpdateDecor event intended for redrawing
client-side-decoration.
Besides the warning, this results an error with LIBDECOR window frames
not redrawing when a window became inactive.
The background evaluation samples the sky discretely, so if the sun is
too small, it can be missed in the evaluation. To solve this, the sun is
ignored during the background evaluation and its contribution is
computed separately.
The lookup table method on CPU and the numerical root finding method on
GPU give quite different results. This commit deletes the Beckmann lookup
table and uses numerical root finding on all devices. For the numerical
root finding, a combined bisection-Newton method with precision control
is used.
Differential Revision: https://developer.blender.org/D17050
This makes it possible to use `texture` and `texture3d` in custom
OSL shaders with a constant image file name as argument on the
GPU, where previously texturing was only possible through Cycles
nodes.
For constant file name arguments, OSL calls
`OSL::RendererServices::get_texture_handle()` with the file name
string to convert it into an opaque handle for use on the GPU.
That is now used to load the respective image file using the Cycles
image manager and generate a SVM handle that can be used on
the GPU. Some care is necessary as the renderer services class is
shared across multiple Cycles instances, whereas the Cycles image
manager is local to each.
Maniphest Tasks: T101222
Differential Revision: https://developer.blender.org/D17032
This patch adds markup to specify that certain kernel data constants should not be specialised. Currently it is used for `tabulated_sobol_sequence_size` and `sobol_index_mask` which change frequently based on the aa sample count, trash the shader cache, and have little bearing on performance.
Reviewed By: brecht
Differential Revision: https://developer.blender.org/D16968
This patch adds occupancy tuning for the newly announced high-end M2 machines, giving 10-15% render speedup over a pre-tuned build.
Reviewed By: brecht
Differential Revision: https://developer.blender.org/D17037
These warnings can reveal errors in logic, so quiet them by checking
if the features are enabled before using variables or by assigning
empty strings in some cases.
- Check CMAKE_THREAD_LIBS_INIT is set before use as CMake docs
note that this may be left unset if it's not needed.
- Remove BOOST/OPENVDB/VULKAN references when disable.
- Define INC_SYS even when empty.
- Remove PNG_INC from freetype (not defined anywhere).
wi is the viewing direction, and wo is the illumination direction. Under this notation, BSDF sampling always samples from wi and outputs wo, which is consistent with most of the papers and mitsuba. This order is reversed compared with PBRT, although PBRT also traces from the camera.
While keeping SSE2, SSE4.1 and AVX2. This does not affect hardware support, it
only slightly reduces performance for some older CPUs.
To reduce maintenance cost and improve compile times.
Differential Revision: https://developer.blender.org/D16978
An alternative fix to [0] which caused an error with ASAN
(freeing an GHOST_ISystem instead of a GHOST_System).
Reported by @Baardaap in chat, I'm unable to reproduce the issue.
Instead of calling the destructor directly, add a private method that
deletes data before raising an exception.
[0]: fd36221930e35efd2c09af8fb91234a510e3b9dc
Use a consistent style for declaring the names of struct members
in their declarations. Note that this convention was already used in
many places but not everywhere.
Remove spaces around the text (matching commented arguments) with
the advantage that the the spell checking utility skips these terms.
Making it possible to extract & validate these comments automatically.
Also use struct names for `bAnimChannelType` & `bConstraintTypeInfo`
which were using brief descriptions.
Recently the event handling thread for Wayland sometimes used 100% of a
CPU core while idle.
Resolve by waiting for changes to the Wayland file-handle when
there are no events to read.
The previous fix from T100855 [0] no longer works on my system
(3.4.1 release also fails for GNOME/KDE/WLROOTS compositors).
Resolve by removing the loop from the wait-on-file handle check.
Also reduce locking/unlocking calls.
[0]: 37b256e26feb454d9febd84dac1b1ce8b8d84d90
User notifications in Blender were always annoying and therefore by default turned off.
- When tweaking compositor/material tree a notification was shown.
- When rendering an animation for each frame a notification was shown.
The reason for this was that it was automatically shown when a background job was
finished and Blender wasn't the top most application.
In stead of migrating user notification to UserNotification.framework it was decided
to remove it for now. If in the future notifications should be added back we should
start with a design to figure out where notifications makes sense.
Reviewed By: sergey, brecht
Maniphest Tasks: T103757
Differential Revision: https://developer.blender.org/D16955
Cycles converts internal links to converter nodes which don't do anything and
later on get collapsed by the graph optimization. However, the previous code
assumed that one Blender input socket maps to one Cycles input socket.
When a node is muted, there might be internal links from one input to multiple
outputs. In Cycles, this meant that one Blender input socket now mapped to
multiple input sockets on all the converter nodes, so only the last one survived.
THe fix is simple, just make the mapping a MultiMap.
The problem here is that whether an object is a shadow catcher or not affects the
visibility flags, but changes to the shadow catcher property did not trigger a
visibility flag update.
With the GPU API the sampler can not be set after texture binding, which caused
a delay of the actual change. Now do both in a single call for correctness and
performance.
OpenGL is deprecated by Apple and triggers a warning when used. The goal
is that OpenGL is replaced by Metal backend, but we are not there yet.
To improve tracability of new warnings we hide deprecation warnings
when the GHOST_ContextCGL.h file is included.
NOTE: This change silences other deprecation warnings as well.
The T103586 fix effectively ran the wl_surface_listener.leave callback
to as WLROOTS based compositors doesn't run them. Remove the workaround
since it's an error in WLROOTS to be fixed upstream.
Temporarily using the wrong window scale when disconnecting a monitor
on configurations that use different DPI per monitor is a minor enough
issue that I don't think it makes sense to workaround in GHOST.
Wayland requires the windows surface size is divisible by the surface
scale. This wasn't guaranteed when creating new temporary windows.
This meant opening the preferences could exit Blender with an error
with Hi-DPI configurations.
**Changes**
As described in T93602, this patch removes all use of the `MVert`
struct, replacing it with a generic named attribute with the name
`"position"`, consistent with other geometry types.
Variable names have been changed from `verts` to `positions`, to align
with the attribute name and the more generic design (positions are not
vertices, they are just an attribute stored on the point domain).
This change is made possible by previous commits that moved all other
data out of `MVert` to runtime data or other generic attributes. What
remains is mostly a simple type change. Though, the type still shows up
859 times, so the patch is quite large.
One compromise is that now `CD_MASK_BAREMESH` now contains
`CD_PROP_FLOAT3`. With the general move towards generic attributes
over custom data types, we are removing use of these type masks anyway.
**Benefits**
The most obvious benefit is reduced memory usage and the benefits
that brings in memory-bound situations. `float3` is only 3 bytes, in
comparison to `MVert` which was 4. When there are millions of vertices
this starts to matter more.
The other benefits come from using a more generic type. Instead of
writing algorithms specifically for `MVert`, code can just use arrays
of vectors. This will allow eliminating many temporary arrays or
wrappers used to extract positions.
Many possible improvements aren't implemented in this patch, though
I did switch simplify or remove the process of creating temporary
position arrays in a few places.
The design clarity that "positions are just another attribute" brings
allows removing explicit copying of vertices in some procedural
operations-- they are just processed like most other attributes.
**Performance**
This touches so many areas that it's hard to benchmark exhaustively,
but I observed some areas as examples.
* The mesh line node with 4 million count was 1.5x (8ms to 12ms) faster.
* The Spring splash screen went from ~4.3 to ~4.5 fps.
* The subdivision surface modifier/node was slightly faster
RNA access through Python may be slightly slower, since now we need
a name lookup instead of just a custom data type lookup for each index.
**Future Improvements**
* Remove uses of "vert_coords" functions:
* `BKE_mesh_vert_coords_alloc`
* `BKE_mesh_vert_coords_get`
* `BKE_mesh_vert_coords_apply{_with_mat4}`
* Remove more hidden copying of positions
* General simplification now possible in many areas
* Convert more code to C++ to use `float3` instead of `float[3]`
* Currently `reinterpret_cast` is used for those C-API functions
Differential Revision: https://developer.blender.org/D15982
When rendering in the viewport (or probably on instanced objects, but I didn't
test that), emissive objects whose scale is negative give the wrong value on the
"backfacing" input when multiple sampling is enabled.
The underlying problem was a corner case in how normal transformation is handled,
which is generally a bit messy.
From what I can tell, the pattern appears to be:
- If you first transform vertices to world space and then compute the normal from
them (as triangle light samping, MNEE and light tree do), you need to flip
whenever the transform has negative scale regardless of whether the transform
has been applied
- If you compute the normal in object space and then transform it to world space
(as the regular shader_setup_from_ray path does), you only need to flip if the
transform was already applied and was negative
- If you get the normal from a local intersection result (as bevel and SSS do),
you only need to flip if the transform was already applied and was negative
- If you get the normal from vertex normals, you don't need to do anything since
the host-side code does the flip for you (arguably it'd be more consistent to
do this in the kernel as well, but meh, not worth the potential slowdown)
So, this patch fixes the logic in the triangle emission code.
Also, turns out that the MNEE code had the same problem and was also having
problems in the viewport on negative-scale objects, this is also fixed now.
Differential Revision: https://developer.blender.org/D16952
The code that computes and inverts the shutter CDF had some issues that caused
the result to be asymmetric, this tweaks it to be more robust and produce
symmetric outputs for symmetric inputs.
At the first bounce, the diffuse/glossy/transmission weights are stored so that
contributions along the path can be split into the d/g/t indirect passes.
However, volume bounces always set the weight even at indirect bounces, so
even paths that had their first bounce on a purely glossy object would suddenly
start counting towards the diffuse indirect pass after a secondary volume bounce.
This was caused by rB0d73d5c1a2, which releases the scene mutex during kernel
loading. However, the reset mutex was still held, which can cause a deadlock
if another thread tries to reset the session, since it will acquire the
released scene mutex and then wait for the reset mutex.
Turns out there's no point in keeping the reset mutex locked after the delayed
reset section, so now we just release it earlier, which resolves the deadlock.
This breaks backwards compatibility some in that 3 sides will be mapped
differently now, but difficult to avoid and can be considered a bugfix.
Similar to rBdd8016f7081f.
Maniphest Tasks: T103615
Differential Revision: https://developer.blender.org/D16910
To better estimate light contribution. Note that estimating the texture
from the IES file is still missing.
Contributed by Alaska.
Differential Revision: https://developer.blender.org/D16901
This issue was introduced in rB78f28b55d39288926634d0cc.
The fix is to use a `std::shared_ptr` to ensure that the `Global` will live
long enough until all `Local` objects are destructed.
This patch adds a new "Kernel Optimization Level" dropdown menu to control Metal kernel specialisation. Currently this defaults to "full" optimisation, on the assumption that the changes proposed in D16371 will address usability concerns around app responsiveness and shader cache housekeeping.
Reviewed By: brecht
Differential Revision: https://developer.blender.org/D16514
All kernel specialisation is now performed in the background regardless of kernel type, meaning that the first render will be visible a few seconds sooner. The only exception is during benchmark warm up, in which case we wait for all kernels to be cached. When stopping a render, we call a new `cancel()` method on the device which causes any outstanding compilation work to be cancelled, and we destroy the device in a detached thread so that any stale queued compilations can be safely purged without blocking the UI for longer than necessary.
Reviewed By: brecht
Differential Revision: https://developer.blender.org/D16371
Both, the guarded and lockfree allocator, are keeping track of current
and peak memory usage. Even the lockfree allocator used to use a
global atomic variable for the memory usage. When multiple threads
use the allocator at the same time, this variable is highly contended.
This can result in significant slowdowns as presented in D16862.
While specific cases could always be optimized by reducing the number
of allocations, having this synchronization point in functions used by
almost every part of Blender is not great.
The solution is use thread-local memory counters which are only added
together when the memory usage is actually requested. For more details
see in-code comments and D16862.
Differential Revision: https://developer.blender.org/D16862
WLROOTS compositors don't run surface leave callbacks,
while this may be considered a bug in WLROOTS, neither GTK/SDL crash
so workaround the crash too.
This also fixes a minor glitch where the cursor scale wasn't updated
when changing monitor scale at run-time.
This functionality is related only to debugging of SYCL implementation
via single-threaded CPU execution and is disabled by default.
Host device has been deprecated in SYCL 2020 spec and we removed it
in 305b92e05f748a0fd9cb62b9829791d717ba2d57.
Since this is still very useful for debugging, we're restoring a
similar functionality here through SYCL 2020 Host Task.
The "osl_texture" intrinsic was not implemented correctly. It should handle alpha
separately from color, the number of channels input parameter only counts color
channels.
This panel showed a duplication of options that were in the main light panel and only mistakenly shows up in the workbench engine where lights should have no options.
This panel was also used by the POV-Ray add-on but that was removed recently.
This reverts commit a3a9459050a96e75138b3441c069898f211f179c.
And fixes T103337.
a3a9459050a9 has some flaws and it needs to go through review (See D16803).
Conflicts:
intern/ghost/intern/GHOST_SystemWin32.cpp
Partially addresses T72011.
The problem here is that the previous barycentric clamping did not deal well
with skinny triangles and would end up generating "sub-pixel jittering"
locations that were actually >20 pixels away.
Differential Revision: https://developer.blender.org/D16727
Expands Color Mix nodes with new Exclusion mode.
Similar to Difference but produces less contrast.
Requested by Pierre Schiller @3D_director and
@OmarSquircleArt on twitter.
Differential Revision: https://developer.blender.org/D16543
Materials without connections to the output node would crash with OSL
in OptiX, since the Cycles `OSLCompiler` generates an empty shader
group reference for them, which resulted in the OptiX device
implementation setting an empty SBT entry for the corresponding direct
callables, which then crashed when calling those direct callables was
attempted in `osl_eval_nodes`. This fixes that by setting the SBT entries
for empty shader groups to a dummy direct callable that does nothing.
Switching viewport denoising causes kernels to be reloaded with a new
feature mask, which would destroy the existing OptiX pipelines. But OSL
kernels were not reloaded as well, leaving the shading pipeline
uninitialized and therefore causing an error when it is later attempted to
execute it. This fixes that by ensuring OSL kernels are always reloaded
when the normal kernels are too.
Recent reverting of changes to cursor grabbing intended to match
Blender 3.3 release. This is the case for 3.4x branch, however there is
an additional change to grabbing on WIN32 by Germano [0] which is a
significant improvement on old grabbing logic for Windows.
So instead of matching 3.3x behavior, restore logic that keeps
the cursor centered while grabbing & hidden.
This re-introduces T102792 issue displaying the paint-brush while
dragging buttons, this will have to be solved separately.
Re-apply [1] & [2], revert [3] & [4].
[4]: a3a9459050a96e75138b3441c069898f211f179c
[0]: 9fd6dae7939a65b67045749a0eadeb6864ded183
[1]: 4cac8025f00798938813f52dcb117be83db97f22
[2]: 230744d6fd96dcf5afe66a8f9b9f6f8bbe1f41bb
[3]: 0240b895994aa58258db6897ae0d6478da7fce5f
Replace ../lib/linux_centos7_x86_64 with ../lib/linux_x86_64_glibc_228,
built with Rocky8 Linux, compatible with the VFX platform CY2023,
see: T99618.
- Update build-bot configuration.
- Remove unnecessary check for Blosc, this is part of OpenVDB lib now.
- Remove WITH_CXX11_ABI, always use new C++11 ABI now
- Replace centos7 by glibc_228 everywhere
Note that existing builds with cached paths pointing to
"../lib/linux_centos7_x86_64" will need to be updated.
Includes contributions by Brecht.
Under Wayland the transform cursor wasn't displaying the warped cursor.
This worked on other platforms because cursor motion is warped where as
Wayland simulates cursor warping, so it's necessary to apply warping
when requesting the cursor location too.
This reverts commits
9fd6dae7939a65b67045749a0eadeb6864ded183,
4cac8025f00798938813f52dcb117be83db97f22 (minor cleanup).
Re-introducing T102346, which will be fixed in isolation.
Unfortunately even when the cursor is hidden & grabbed,
the underlying cursor coordinates are still shown in some cases.
This caused bug where dragging a button in the sculpt-context popup
would draw the brush at unexpected locations because internally
the cursor was warping in the middle of the window, reported as T102792.
Resolving this issue with the paint cursor is possible but tend towards
over-complicated solutions.
Revert this change in favor of a more localized workaround for walk-mode
(as was done prior [0] to fix T99021).
[0]: 4c4e8cc926a672ac60692b3fb8c20249f9cae679
This is a solution in response to the issues mentioned in comments on
rBe4f1d719080a and T103088.
Apparently the workaround of checking if the mouse is already inside
the area on the next event doesn't work for some tablets.
Perhaps the order of events or some very small jitter is causing this
issue on tablets. (Couldn't confirm).
Whatever the cause, the solution of checking the timestamp of the event
and thus ignoring the outdated ones is theoretically safer.
It is the same solution seen in MacOS.
Also calling `SendInput` 3 times every warp ensures that at least one
event is dispatched.
This adds a new mirror image extension type for shaders and
geometry nodes (next to the existing repeat, extend and clip
options).
See D16432 for a more detailed explanation of `wrap_mirror`.
This also adds a new sampler flag `GPU_SAMPLER_MIRROR_REPEAT`.
It acts as a modifier to `GPU_SAMPLER_REPEAT`, so any `REPEAT`
flag must be set for the `MIRROR` flag to have an effect.
Differential Revision: https://developer.blender.org/D16432
The `render_color_index` skips attributes with different types
and domains in order to give the proper order for the UI list.
That is a different than an index in the group of all attributes.
The most solid solution I could think of is exposing the name of
the default color attribute. It's "solid" because we always address
attributes by name internally. Doing something different is bound
to create problems. It's also aligned with the design in T98366 and
D15169.
Another option would be to change the way the "attribute index"
is incremented in Cycles. That would be a valid solution, but would
be more complex and annoying.
For consistency, I also exposed the name of the active color attribute
the same way, though it isn't necessary to fix this particular bug.
The properties aren't editable, that can come in 3.5 as part of D15169.
Differential Revision: https://developer.blender.org/D16769
This is done based on the render sample count so that it doesn't impact
sampling quality. It's similar in spirit to the adaptive table size in D16561,
but in this case for performance rather than memory usage.
Differential Revision: https://developer.blender.org/D16726
The first two dimensions of scrambled, shuffled Sobol and shuffled PMJ02 are
equivalent, so this makes no real difference for the first two dimensions.
But Sobol allows us to naturally extend to more dimensions.
Pretabulated Sobol is now always used, and the sampling pattern settings is now
only available as a debug option.
This in turn allows the following two things (also implemented):
* Use proper 3D samples for combined lens + motion blur sampling. This
notably reduces the noise on objects that are simultaneously out-of-focus
and motion blurred.
* Use proper 3D samples for combined light selection + light sampling.
Cycles was already doing something clever here with 2D samples, but using
3D samples is more straightforward and avoids overloading one of the
dimensions.
In the future this will also allow for proper sampling of e.g. volumetric
light sources and other things that may need three or four dimensions.
Differential Revision: https://developer.blender.org/D16443
While it helps on many scenes, it can be disruptive for existing scenes and
for benchmarks the differences in timing can be confusing. So be a bit more
conservative and only it enable it for new scenes.
The use of a struct for device strings caused the CUDA compiler to
generate byte arrays as the argument type, whereas OSL generated
primitive integer types (for the hash). Fix that by using a typedef
instead so that the CUDA compiler too will use an integer type in the
PTX it generates.
Maniphest Tasks: T101222
Allow keyboard layouts which include "dead keys" to enter diacritics
by calling MapVirtualKeyW even when not key_down.
See D16770 for more details.
Differential Revision: https://developer.blender.org/D16770
Reviewed by Campbell Barton
Recently a new geometry node for splitting edges was added in D16399.
However, there was already a similar implementation in mesh.cc that was
mainly used to fake auto smooth support in Cycles by splitting sharp
edges and edges around sharp faces.
While there are still possibilities for optimization in the new code,
the implementation is safer and simpler, multi-threaded, and aligns
better with development plans for caching topology on Mesh and other
recent developments with attributes.
This patch removes the old code and moves the node implementation to
the geometry module so it can be used in editors and RNA. The "free
loop normals" argument is deprecated now, since it was only an internal
optimization exposed for Cycles.
The new mesh `editors` function creates an `IndexMask` of edges to
split by reusing some of the code from the corner normal calculation.
This change will help to simplify the changes in D16530 and T102858.
Differential Revision: https://developer.blender.org/D16732
There has been an attempt to reorganize this part, however, it seems that didn't compile on HIP, and is reverted in
rBc2dc65dfa4ae60fa5d2c3b0cfe86f99dcb5bf16f. This is another attempt of refactoring. as I have no idea why some things don't work on HIP, it's
best to check whether this compiles on other platforms.
The main changes are creating a new struct named `MeshLight` that is shared between `KernelLightDistribution` and `KernelLightTreeEmitter`,
and a bit of renaming, so that light sampling with or without light tree could call the same function.
Also, I noticed a patch D16714 referring to HIP compilation error. Not sure if it's related, but browsing
https://builder.blender.org/admin/#/builders/30/builds/7826/steps/7/logs/stdio, it didn't work on gfx1102, not gfx9*.
Differential Revision: https://developer.blender.org/D16722
The OpenVDB delay loading of voxel leaf data using mmap works poorly on network
drives. It has a mechanism to make a temporary local file copy to avoid this,
but we disabled that as it leads to other problems.
Now disable delay loading entirely. It's not clear that this has much benefit
in Blender. For rendering we need to load the entire grid to convert to NanoVDB,
and for geometry nodes there also are no cases where we only need part of grids.
To avoid issues with lights being either skipped or sampled unnecessarily
when the exposure is set low or high.
Contributed by Alaska.
Differential Revision: https://developer.blender.org/D16703
* preempt_attr was copied from CUDA, but not used in HIP.
* Remove shadowed variable before conditional in EnvironmentTextureNode code.
Differential Revision: https://developer.blender.org/D16741
Improve a few messages, but mostly fix typos in many areas of the UI.
See inline comments in the differential revisiion for the rationale
behind the various changes.
Differential Revision: https://developer.blender.org/D16716
Texture usage flags can now be provided during texture creation specifying
the ways in which a texture can be used. This allows the GPU backends to
perform contextual optimizations which were not previously possible. This
includes enablement of hardware lossless compression which can result in
a 15%+ performance uplift for bandwidth-limited scenes on hardware such
as Apple-Silicon using Metal.
GPU_TEXTURE_USAGE_GENERAL can be used by default if usage is not known
ahead of time. Patch will also be relevant for the Vulkan backend.
Authored by Apple: Michael Parkin-White
Ref T96261
Reviewed By: fclem
Differential Revision: https://developer.blender.org/D15967
The PDF of mesh lights were not being scaled by `pdf_selection` when
the light tree was disable. This resulted in the mesh lights having
the wrong PDF and thus the wrong brightness.
Differential Revision: https://developer.blender.org/D16717
Compiling Cycles in Visual Studio 2022 yields the error:
C4146: unary minus operator applied to unsigned type, result still unsigned
Replacing it with explicit two's complement achieves the same result as signed
negation but avoids the error.
Differential Revision: https://developer.blender.org/D16616
For file formats like PNG, JPEG and TIFF. Eventually this should use
the OpenColorIO view transform, but this at least makes the image
closer to what it should be in most cases.
Differential Revision: https://developer.blender.org/D16482
**Problem**:
Area lights in Cycles have spread angle, in which case some part of the area light might be invisible to a shading point. The current implementation samples the whole area light, resulting some samples invisible and thus simply discarded. A technique is applied on rectangular light to sample a subset of the area light that is potentially visible (rB3f24cfb9582e1c826406301d37808df7ca6aa64c), however, ellipse (including disk) area lights remained untreated. The purpose of this patch is to apply a techniques to ellipse area light.
**Related Task**:
T87053
**Results**:
These are renderings before and after the patch:
|16spp|Disk light|Ellipse light|Square light (for reference, no changes)
|Before|{F13996789}|{F13996788}|{F13996822}
|After|{F13996759}|{F13996787}|{F13996852}
**Explanation**:
The visible region on an area light is found by drawing a cone from the shading point to the plane where the area light lies, with the aperture of the cone being the light spread.
{F13990078,height=200}
Ideally, we would like to draw samples only from the intersection of the area light and the projection of the cone onto the plane (forming a circle). However, the shape of the intersection is often irregular and thus hard to sample from directly.
{F13990104,height=200}
Instead, the current implementation draws samples from the bounding rectangle of the intersection. In this case, we still end up with some invalid samples outside of the circle, but already much less than sampling the original area light, and the bounding rectangle is easy to sample from.
{F13990125}
The above technique is only applied to rectangle area lights, ellipse area light still suffers from poor sampling. We could apply a similar technique to ellipse area lights, that is, find the
smallest regular shape (rectangle, circle, or ellipse) that covers the intersection (or maybe not the smallest but easy to compute).
For disk area light, we consider the relative position of both circles. Denoting `dist` as the distance between the centre of two circles, and `r1`, `r2` their radii. If `dist > r1 + r2`, the area light is completely invisible, we directly return `false`. If `dist < abs(r1 - r2)`, the smaller circle lies inside the larger one, and we sample whichever circle is smaller. Otherwise, the two circles intersect, we compute the bounding rectangle of the intersection, in which case `axis_u`, `len_u`, `axis_v`, `len_v` needs to be computed anew. Depending on the distance between the two circles, `len_v` is either the diameter of the smaller circle or the length of the common chord.
|{F13990211,height=195}|{F13990225,height=195}|{F13990274,height=195}|{F13990210,height=195}
|`dist > r1 + r2`|`dist < abs(r1 - r2)`|`dist^2 < abs(r1^2 - r2^2)`|`dist^2 > abs(r1^2 - r2^2)`
For ellipse area light, it's hard to find the smallest bounding shape of the intersection, therefore, we compute the bounding rectangle of the ellipse itself, then treat it as a rectangle light.
|{F13990386,height=195}|{F13990385,height=195}|{F13990387,height=195}
We also check the areas of the bounding rectangle of the intersection, the ellipse (disk) light, and the spread circle, then draw samples from the smallest shape of the three. For ellipse light, this also detects where one shape lies inside the other. I am not sure if we should add this measure to rectangle area light and sample from the spread circle when it has smaller area, as we seem to have a better sampling technique for rectangular (uniformly sample the solid angle). Maybe we could add [area-preserving parameterization for spherical
ellipse](https://arxiv.org/pdf/1805.09048.pdf) in the future.
**Limitation**:
At some point we switch from sampling the ellipse to sampling the rectangle, depending on the area of the both, and there seems to be a visible line (with |slope| =1) on the final rendering
which demonstrate at which point we switch between the two methods. We could see that the new sampling method clearly has lower variance near the boundaries, but close to that visible line,
the rectangle sampling method seems to have larger variance. I could not spot any bug in the implementation, and I am not sure if this happens because different sampling patterns for ellipse and rectangle are used.
|Before (256spp)|After (256spp)
|{F13996995}|{F13996998}
Differential Revision: https://developer.blender.org/D16694
Part of the workaround for NVIDIA driver issue got lost in the changes to
switch to the GPU module.
Differential Revision: https://developer.blender.org/D16709
This updates the libraries dependencies for VFX platform 2023, and adds various
new libraries. It also enables Python bindings and switches from static to
shared for various libraries.
The precompiled libraries for all platforms will be updated to these new
versions in the coming weeks.
New:
Fribidi 1.0.12
Harfbuzz 5.1.0
MaterialX 1.38.6 (shared lib with python bindings)
Minizipng 3.0.7
Pybind11 2.10.1
Shaderc 2022.3
Vulkan 1.2.198
Updated:
Boost 1.8.0 (shared lib)
Cython 0.29.30
Numpy 1.23.2
OpenColorIO 2.2.0 (shared lib with python bindings)
OpenImageIO 2.4.6.0 (shared lib with python bindings)
OpenSubdiv 3.5.0
OpenVDB 10.0.0 (shared lib with python bindings)
OSL 1.12.7.1 (enable nvptx backend)
TBB (shared lib)
USD 22.11 (shared lib with python bindings, enable hydra)
yaml-cpp 0.8.0
Includes contributions by Ray Molenkamp, Brecht Van Lommel, Georgiy Markelov
and Campbell Barton.
Ref T99618
Ensure the environment is set up for blender_test, idiff and oslc so that they
can find the required shared libraries.
Also deduplicate add_bundled_libraries() between Linux and macOS.
Includes contributions by Ray Molenkamp and Brecht Van Lommel.
Ref T99618
This patch adds a new `max_working_set_exceeded()` check on Metal so that we can display a "System is out of GPU memory" message to the user. Without this, we get obtuse "CommandBuffer failed" errors at render time due to exceeding the size limit of resident resources.
Likely fix for T101787 & T102786.
Reviewed By: brecht
Differential Revision: https://developer.blender.org/D16713
Bugs that caused wrong renders should be fixed now, and tests that showed minor
floating point differences on platforms were tweaked to sidestep the problem.
Ref T77889
Cursor motion events on windows read the position from GetCursorPos()
which wasn't always the same location stored in `lParam`.
In situations where events were handled immediately this wasn't often a
problem, for heavier scenes or when updates between event handling was
slow - many in-between cursor events would be incorrect.
This behavior dates back to the initial commit, there doesn't seem to be
a good reason not to use the cursor coordinates from the event.
Noticed when investigating T102346.
Make OpenGL errors match formatting used by GCC & clang
(as well as Blender's logging), so utilities that recognize this
convention can be used to quickly access this location.
Uses a light tree to more effectively sample scenes with many lights. This can
significantly reduce noise, at the cost of a somewhat longer render time per
sample.
Light tree sampling is enabled by default. It can be disabled in the Sampling >
Lights panel. Scenes using light clamping or ray visibility tricks may render
different as these are biased techniques that depend on the sampling strategy.
The implementation is currently disabled on AMD HIP. This is planned to be fixed
before the release.
Implementation by Jeffrey Liu, Weizhen Huang, Alaska and Brecht Van Lommel.
Ref T77889
This was not working well in non-trivial scenes before the light tree, and now
it is even harder to make it work well with the light tree. It would average the
with equal weight for every light object regardless of intensity or distance, and
be quite noisy due to not working with multiple importance sampling.
We may restore this if were enough good use cases for the previous implementation,
but let's wait and see what the feedback is.
Some uses cases for this have been replaced by the shadow catcher passes, which
did not exist when this was added.
Ref T77889
It is not really used from any of the sources, including the
standalone app. Since we are moving to a more backend-independent
drawing it makes sense to remove header which was specific to
how Blender integrates Cycles into viewport.
There is probably some cleanup in CMake files is possible, but
there is some inter-dependency with USD.
Differential Revision: https://developer.blender.org/D16681
This change fixes issues with viewport rendering when Metal
GPU backend is used for drawing. This is not a default build
configuration and requires the following tweaks:
- Enable WITH_METAL_BACKEND CMake option (set it to on)
- Use `--gpu-backend metal` command line arguments
It also helps using the `--factory-startup` command line
argument to ensure Eevee is not used (it is not ported and
will crash).
The root of the problem was in the use of glViewport().
It is replaced with the GPU_viewport_size_get_i() which
is supposed to be portable equivalent form the GPU module.
Without this change the viewport size is detected to be 0
which backfired in few places.
The rest of the changes were to make the code more robust
in the extreme conditions instead of asserting or crashing.
Simplified and streamlined GPU resources creation in the
display driver. It was a bit convoluted mix of creation of
the GPU resources and resizing them to the proper size. It
even seemed to be done in the reverse order. Now it is as
simple as "just ensure GPU resources are there for the
given texture or buffer size".
Also avoid division by zero in the tile manager.
Differential Revision: https://developer.blender.org/D16679
To make GPU backends other than OpenGL work. Adds required pixel buffer and
fence objects to GPU module.
Authored by Apple: Michael Parkin-White
Ref T96261
Ref T92212
Reviewed By: fclem, brecht
Differential Revision: https://developer.blender.org/D16042
In this case the blocksize may not the one we requested, which was assumed to be
the case. Instead get the effective block size from the compiler as was already
done for Metal and OneAPI.
Unless using WITH_CYCLES_DEBUG.
This is convenient for investigating kernel performance, but too verbose to
always have in the buildbot logs especially now that we are also compiling HIP
and OneAPI kernels.
Materials now have an enum to set the emission sampling method, to be
either None, Auto, Front, Back or Front & Back. This replace the
previous "Multiple Importance Sample" option.
Auto is the new default, and uses a heuristic to estimate the emitted
light intensity to determine of the mesh should be considered as a light
for sampling. Shaders sometimes have a bit of emission but treating them
as a light source is not worth the memory/performance overhead.
The Front/Back settings are not important yet, but will help when a
light tree is added. In that case setting emission to Front only on
closed meshes can help ignore emission from inside the mesh interior that
does not contribute anything.
Includes contributions by Brecht Van Lommel and Alaska.
Ref T77889
* Split light types into own files, move light type specific code from
light tree and MNEE.
* Move flat light distribution code into own kernel file and host side
building function, in preparation of light tree addition. Add light/sample.h
as main entry point to kernel light sampling.
* Better separate calculation of pdf for selecting a light, and pdf for
sampling a point on the light. The selection pdf is now also stored in
LightSampling for MNEE to correctly recalculate the full pdf when the
shading position changes but the point on the light remains fixed.
* Improvement to kernel light storage, using packed_float3, better variable
names, etc.
Includes contributions by Brecht Van Lommel and Weizhen Huang.
Ref T77889
Caused by a8a454287a27 which assumed it was possible
to access the raw data of the edge creases layer. Also allow
processing vertex creases even if there aren't any edge creases.
The wrong guiding distribution was used when direct and indirect light
scattering happened at different locations. Now use a different distribution
for each location.
Recording is not quite correct since OpenPGL does not support spliting the
path like this, instead recording at the start of the volume ray. In practice
this seems to make little difference.
Differential Revision: https://developer.blender.org/D16448
When either initializing with a non-constant value, or using the standard
[[ string widget = "null" ]] metadata. This can be used for inputs like
normals and texture coordinates, where you don't want to default to a
constant value.
In previous OSL versions the input value was automatically ignore when it
was left unchanged for such inputs. However that's no longer the case in
the latest version, breaking existing nodes. There is no good entirely
backwards compatible fix, but I believe the new behavior is better and will
keep most existing cases working.
Fix T102450: OSL node with normal input not working
It is possible that the image editor redraw happens prior to the
"Loading render kernels" status is reported from status but after
the display driver is created. This will make the image editor to
wait on the scene mutex to update the display pass in the film.
If it happens to be that the kernels are actually to be compiled
then the Blender interface appears to be completely frozen, without
any information line in the image editor.
This change makes it so the amount of time the scene mutex is held
during the kernel compilation is minimal.
It is a bit unideal to unlock and re-lock the scene mutex in the
middle of update, while nested reset mutex is held, but this is
already what is needed for the OptiX denoiser optimization some
lines below. We can probably reduce the lifetime of some locks,
avoiding such potential out-of-order re-locking. Doing so is
outside of the scope of this patch.
The scene update only happens from the single place in the session,
which makes it easy to ensure the kernels are loaded prior the rest
of the scene update.
Not only this change makes it so that the "Loading render kernels"
status appears in the image editor, but also allows to pan and zoom
in the image editor, potentially allowing artists to re-adjust their
point of interest.
Differential Revision: https://developer.blender.org/D16581
This prevents Blender from crashing with an access violation when
stopping a VR session using the DirectX backend. The issue occurred for
any headset on Windows+Nvidia when using the SteamVR runtime and thus
affected a large number of users.
The workaround presented here is to simply skip unregistering the
shared resources on exit, as either of the calls to
`wglDXUnregisterObjectNV()` or `wglDXCloseDeviceNV()` will result in an
access violation. While not ideal, this avoids the crash and doesn't
present any issues when starting a new VR session.
Reviewed By: Severin
Differential Revision: https://developer.blender.org/D16569
MoltenVK is part of the vulkan SDK. Blender requires the vulkan SDK
to compile. This patch adds the MoltenVK includes and libraries to
the Vulkan includes and libraries.
This adds a vulkan backend to GHOST. The code was extracted from the
tmp-vulkan branch. The main difference with the original code is that
GHOST isn't responsible for fallback. For Metal backend there is already
an idea that the GPU module is responsible for the fallback, not the system.
For Blender we target Vulkan 1.2 at the time of this patch.
MoltenVK (needed to convert Vulkan calls to Metal) has been added as
a separate package.
This patch isn't useful for end-users, currently when starting blender with
`--gpu-backend vulkan` it would crash as the `VBBackend` doesn't initialize
the expected global structs in the GPU module.
Validated to be working on Windows and Apple. Linux still needs to be tested.
Reviewed By: fclem
Differential Revision: https://developer.blender.org/D13155
This resolves some issues with correlation artifacts at higher sample counts.
Fix T101356, correlation issues in new PMJ pattern.
Differential Revision: https://developer.blender.org/D16561
Commit c8dd33f5a37b6a6db0b6950d24f9a7cff5ceb799 in OSL changed behavior of
parameters that reference each other and are also overwritten with an
instance value. This is causing the "NormalIn" parameter of a few OSL nodes
in Cycles to be set to zero somehow, which should instead have received the
value from a "node_geometry" node Cycles generates and connects automatically.
I am not entirely sure why that is happening, but these parameters are
superfluous anyway, since OSL already provides the necessary data in the
global variable "N". So this patch simply removes those parameters (which
mimics SVM, where these parameters do not exist either), which also fixes
the rendering artifacts that occured with recent OSL.
While this fixes built-in shader nodes, custom OSL scripts can still have
this problem.
Ref T101222
Differential Revision: https://developer.blender.org/D16470
Regression in [0] which exposed a problem with GHOST_kGrabHide on Win32
and to some extent X11.
Prior to [0], walk mode used it's own warping logic (hiding the cursor
& recording the motion between events). Using GHOST's grabbing makes
sense in this case as it's not very convenient for operators to
implement their own cursor warping, however doing so exposed a problem
where the mouse cursor could leave the window.
This would happen because the cursor needed to be within 2px of the
screen edge before warping.
Resolve by warping within a small region in the middle of the window.
Note that warping to the window center on each motion would be ideal
but is more involved as the logic for Win32 & X11 doesn't work properly
when every motion warps, so this needs further investigation to support.
This problem doesn't apply to GHOST/Cocoa which warps every motion event
on the spot and GHOST/Wayland doesn't set the mouse position at all to
implement this functionality.
[0]: 4c4e8cc926a672ac60692b3fb8c20249f9cae679
For some pixels with transparent surfaces, no depth value would be written
when sampling chooses a reflection/refraction BSDF instead of transparent
BSDF. Now ensure we always write at some some depth value to the pass.
This is still not ideal as the resulting depth values are noisy same as they
are for depth of field and motion blur, but at least there should be no gaps.
The issue here was that the Barbershop benchmark scene was saved with a
custom OCIO config, which leads to some textures having a unknown
colorspace when loading with a default installation.
This is automatically fixed by Blender during image loading, but since
Cycles queried the colorspace before actually loading the image, it
didn't get the updated value in the first render.
To fix this, just re-query the colorspace after the image is loaded.
Note that non-packed images still get treated as raw data if the
colorspace is unknown, but this is at least consistent and doesn't
magically change when you press F12 a second time.
Differential Revision: https://developer.blender.org/D16427
* Sort Training Samples first, since it affects both Surface and Volume guiding.
* Remove "Guiding" from Surface and Volume entries (UI only, the property
still has Guiding in the name)
Change reviewed in the render-cycles module channel.
This was previously needed due to poor compatibility between Visual Studio and
NVCC. But it has not been used for a while now as compatibility seems to have
improved.
The issue here was that the Barbershop benchmark scene was saved with a
custom OCIO config, which leads to some textures having a unknown
colorspace when loading with a default installation.
This is automatically fixed by Blender during image loading, but since
Cycles queried the colorspace before actually loading the image, it
didn't get the updated value in the first render.
To fix this, just re-query the colorspace after the image is loaded.
Note that non-packed images still get treated as raw data if the
colorspace is unknown, but this is at least consistent and doesn't
magically change when you press F12 a second time.
Differential Revision: https://developer.blender.org/D16427
Some of these functions should use locks although they didn't show up
as needing locks at runtime (using valgrind's helgrind), as some aren't
called often or aren't used at all. Add locks for correctness & to
prevent errors in the future.
- GHOST_SystemWayland::disposeContext
- GHOST_SystemWayland::getAllDisplayDimensions
- GHOST_SystemWayland::getButtons
- GHOST_SystemWayland::getMainDisplayDimensions
- GHOST_SystemWayland::getNumDisplays
- GHOST_WindowWayland::setWindowCustomCursorShape
libdecor has a workaround where creating the window would loop until
the windows configure callback ran.
Simplify this workaround by setting the initial state on the underlying
xdg_toplevel struct.
Also correct mixup between bool / GHOST_TSuccess types.
Cycles already treats denoising fairly separate in its code, with a
dedicated `Denoiser` base class used to describe denoising
behavior. That class has been fully implemented for OIDN
(`denoiser_oidn.cpp`), but for OptiX was mostly empty
(`denoiser_optix.cpp`) and denoising was instead implemented in
the OptiX device. That meant denoising code was split over various
files and directories, making it a bit awkward to work with. This
patch moves the OptiX denoising implementation into the existing
`OptiXDenoiser` class, so that everything is in one place. There are
no functional changes, code has been mostly moved as-is. To
retain support for potential other denoiser implementations based
on a GPU device in the future, the `DeviceDenoiser` base class was
kept and slightly extended (and its file renamed to
`denoiser_gpu.cpp` to follow similar naming rules as
`path_trace_work_*.cpp`).
Differential Revision: https://developer.blender.org/D16502