The traversable handle of a BLAS may be zero when the relevant geometry
is empty (no triangles/curves/points/...), as no BLAS is built in such cases.
It is not correct to attach a zero handle to a TLAS, so filter out such instances.
Currently the shade smooth status for mesh faces is stored as part of
`MPoly::flag`. As described in #95967, this moves that information
to a separate boolean attribute. It also flips its status, so the
attribute is now called `sharp_face`, which mirrors the existing
`sharp_edge` attribute. The attribute doesn't need to be allocated
when all faces are smooth. Forward compatibility is kept until
4.0 like the other mesh refactors.
This will reduce memory bandwidth requirements for some operations,
since the array of booleans uses 12 times less memory than `MPoly`.
It also allows faces to be stored more efficiently in the future, since
the flag is now unused. It's also possible to use generic functions to
process the values. For example, finding whether there is a sharp face
is just `sharp_faces.contains(true)`.
The `shade_smooth` attribute is no longer accessible with geometry nodes.
Since there were dedicated accessor nodes for that data, that shouldn't
be a problem. That's difficult to version automatically since the named
attribute nodes could be used in arbitrary combinations.
**Implementation notes:**
- The attribute and array variables in the code use the `sharp_faces`
term, to be consistent with the user-facing "sharp faces" wording,
and to avoid requiring many renames when #101689 is implemented.
- Cycles now accesses smooth face status with the generic attribute,
to avoid overhead.
- Changing the zero-value from "smooth" to "flat" takes some care to
make sure defaults are the same.
- Versioning for the edge mode extrude node is particularly complex.
New nodes are added by versioning to propagate the attribute in its
old inverted state.
- A lot of access is still done through the `CustomData` API rather
than the attribute API because of a few functions. That can be
cleaned up easily in the future.
- In the future we would benefit from a way to store attributes as a
single value for when all faces are sharp.
Pull Request: https://projects.blender.org/blender/blender/pulls/104422
When running unit tests or other fast completing renders, forced crashes can occur if there are any slow, outstanding PSO compilation requests (due to the `std::terminate` fall-back case in `~ShaderCache`).
This patch eliminates the need for this shutdown hack by using of the async version of `newComputePipelineStateWithDescriptor` when creating a PSO for the first time. In doing so, we are able to explicitly respond to app shutdown instead of waiting for the pipeline to finish compiling (..and then timing out and force-crashing). We still use the blocking version of `newComputePipelineStateWithDescriptor` when loading from an archive, as this can handle loading from a corrupted archive gracefully. Finally, we move `addComputePipelineFunctionsWithDescriptor` to *after* the PSO is built (as this will trigger a full blocking compile if the PSO has not yet been built, which would bring back the original issue).
Pull Request: https://projects.blender.org/blender/blender/pulls/105506
Some UI functions have a "translate" argument, which if set to False
specifies that the message is not to be translated. This sometimes
means that it was already translated beforehands.
But many messages were still getting extracted, sometimes twice in
different contexts. Some featured errors because the arguments of
various functions would be concatenated, such as:
```
col.label(text=iface_("Branch: %s") % bpy.app.build_branch.decode('utf-8', 'replace'), translate=False)
```
which would get extracted as:
```
msgid "Branch: %sutf-8replace"
```
Pull Request #105417
This commit implements three OSL microfacet closures that are needed to support
MaterialX: dielectric_bsdf, conductor_bsdf and generalized_schlick_bsdf.
Internally these map to existing microfacet closures, only the Fresnel term is
different.
Currently, we use the closure type to encode the type of microfacet distribution
(GGX/Beckmann/Sharp/MultiGGX), the lobes we're interested in
(Reflection/Refraction/both) AND the Fresnel type (None or Principled v1).
This results in the mess of dozens of options that we currently have. Since
adding Principled v2 and the MaterialX OSL closures will involve adding more
Fresnel types, this clearly doesn't scale.
But, since the earlier Fresnel rework (D17101), the Fresnel type only matters
in one place now. This allows to significantly clean up the closure type
handling. To do this, MicrofacetBsdfs now separately store their Fresnel type,
and instead of a single MicrofacetExtra we have one struct per Fresnel type
(unless no extra data is needed).
Further, instead of having one _setup() function per combination, the Fresnel
setup is also split into separate functions. This decouples the implementation
of new Fresnel terms from most of the Microfacet logic, and makes it a very
simple and clean operation.
This commit replaces the current Glass approach, where Glass is a virtual closure
that gets replaced with a Glossy and a Refractive closure, with a combined
closure that handles Fresnel after sampling the microfacet. That way, the Fresnel
term is more accurate since it accounts for the microfacet normal, not the
shading normal.
Also updates the BSDF sampling to use a 3D sampler now, since we need two
dimensions to pick the microfacet normal and then a third dimension to pick
reflection/refraction. This can also be used to get rid of the LCG in the
Principled Hair BSDF, which means we can remove it altogether once MultiGGX is
gone.
Also, "sharp" is now supported as a microfacet distribution in OSL, and 2
is supported as the "refract" argument to microfacet() in order to get glass.
This patch fixes hanging unit tests when MetalRT is enabled. It simplifies and fixes the kernel selection logic by baking the MetalRT-specific options into `kernels_md5` rather than expanding out and testing MetalRT bit flags explicitly.
Pull Request #105270
To improve mesh upload speeds and reduce the size of the scene data which allows larger scenes to be rendered.
The meshes in Cycles are currently stored as flattened meshes, where each triangle is stored as a set of 3 vertices. Unflattening writes out the vertices in a list according to the index buffer. This uses a lot of memory and for current hardware does not provide a noticeable benefit. This change unflattens the mesh by directly using the meshes vertex and index buffers directly and skips the unflattening. This change allows for larger scenes and also a reduction in the sizes of the meshes. Further it results in a decrease the amount of time it takes to upload the data to a GPU. This is especially important for when multiple GPUs are used in a single machine.
Pull Request #105173
For every other texture types this is expected to be implicitly
`GPU_DATA_FLOAT`. There is only one case where this is not the case.
I believe this was previously needed because the data type was
conditionning the texture creation. This is not the case anymore.
This is a workaround for [issue #104087](https://projects.blender.org/blender/blender/issues/104087). We encounter crashes when using shader binary archives on AMD, so this disables them while we investigate a proper fix. Kernels will still be cached automatically by the OS file system cache. This cache may occasionally be purged due to external factors, in which case kernels will get compiled again.
Pull Request #105186
Cycles fallback display shader previously did not use viewport.
This would crash or cause the display not to show when using
GPU backends other than OpenGL, if another display shader
was unavailable.
Now use ShaderCreateInfo for Cycles fallback display.
Authored by Apple: Michael Parkin-White
Ref #96261
Pull Request #104987
This fixes issue [#105100](https://projects.blender.org/blender/blender/issues/105100) where multi-pass renders can be incorrect due to kernels using stale specialisation constants (e.g. when rendering Pokedstudio).
This patch adds a new group of md5 hashes (`global_defines_md5`) to track whether the injected block of #defines is stale and regenerate the source string as appropriate. It also renames the existing group of md5 hashes from `source_md5` to `kernels_md5` to clarify that these refer to a specific kernel set rather than just the source (which might build an arbitrarily large number of kernel sets).
Pull Request #105103
On the Windows platform, raise windows and give them focus as the mouse
hovers over them. This allows keyboard shortcuts for the area under the
mouse without having to click the window caption to make them active.
Pull Request #104681
Revert of commits that allowed non-temp Blender windows to be saved
and restored that spanned multiple monitors on the Windows platform.
This causes problems with temp windows (like Preferences & Render) that
cannot currently be fixed.
See 104956 for much more details.
This change is being redone after it was accidentally reverted.
Differential Revision: https://projects.blender.org/blender/blender/pulls/104956
Reviewed by Ray Molenkamp
The light tree itself is disabled on the AMD GPUs due to a compiler issue.
There are couple of places where this was not fully checked:
- The `light_sample` function in the kernel.
- The light threshold during synchronization
The former one is solved as easy as just adding an ifdef block.
The latter one is solved by delaying the threshold assignment for
later on.
Pull Request #105022
This patch adds initial support for compute shaders to
the vulkan backend. As the development is oriented to the test-
cases we have the implementation is limited to what is used there.
It has been validated that with this patch that the following test
cases are running as expected
- `GPUVulkanTest.gpu_shader_compute_vbo`
- `GPUVulkanTest.gpu_shader_compute_ibo`
- `GPUVulkanTest.gpu_shader_compute_ssbo`
- `GPUVulkanTest.gpu_storage_buffer_create_update_read`
- `GPUVulkanTest.gpu_shader_compute_2d`
This patch includes:
- Allocating VkBuffer on device.
- Uploading data from CPU to VkBuffer.
- Binding VkBuffer as SSBO to a compute shader.
- Execute compute shader and altering VkBuffer.
- Download the VkBuffer to CPU ram.
- Validate that it worked.
- Use device only vertex buffer as SSBO
- Use device only index buffer as SSBO
- Use device only image buffers
GHOST API has been changed as the original design was created before
we even had support for compute shaders in blender. The function
`GHOST_getVulkanBackbuffer` has been separated to retrieve the command
buffer without a backbuffer (`GHOST_getVulkanCommandBuffer`). In order
to do correct command buffer processing we needed access to the queue
owned by GHOST. This is returned as part of the `GHOST_getVulkanHandles`
function.
Open topics (not considered part of this patch)
- Memory barriers & command buffer encoding
- Indirect compute dispatching
- Rest of the test cases
- Data conversions when requested data format is different than on device.
- GPUVulkanTest.gpu_shader_compute_1d is supported on AMD devices.
NVIDIA doesn't seem to support 1d textures.
Pull-request: #104518
Contributed by Yulia Kuznetcova at Apple.
NanoVDB is patched to give add address spaces required by Metal. We hope that
in the future Metal will support the generic address space.
For AMD and Intel this is currently not available since it causes a performance
regression also on scenes without volumes.
Pull Request #104837
This is a revert of a revert, because the initial revert is only
supposed to be in the release branch.
This reverts commit 3eed00dc546b5930b255572901915ccbc26c0e69.
This reverts commit 19222627c6b57df77d3bc9ae60d7850ab7e23abb.
Something went wrong here, seems like this commit merged the main branch
into the release branch, which should never be done.
This reverts commit 68181c2560db25c1bd2b70beccf6022c5aa00f39.
I merged 3.6 into 3.5 by mistake. Basically I had a PR against main,
then changed it in the last minute to be against 3.5 via the
web-interface unaware that I shouldn't do it without updating the
patch.
Original Pull Request: #104889
Note that the node group has its sockets names
translated, while the built-in nodes don't.
So we need to use data_ for the built-in nodes names,
and the sockets of the created node groups.
Pull Request #104889
This better aligns with OSX/Linux warnings.
Although `__pragma(warning(suppress:4100))` is not the same as
`__attribute__((__unused__))` in gcc (which only affects the attribute
instead of the line), it still seems to be better to use it than to
hide the warning entirely.
`MEM_delete()` is designed for type safe destruction and freeing, void
pointers make that impossible.
Was reviewing a patch that was trying to free a C-style custom data
pointer this way. Apparently MSVC compiles this just fine, other
compilers error out. Make sure this is a build error on all platforms
with a useful message.
The proper fix (bb9eb262d4d3) caused compilation problems with HIP, so we're
delaying it until 3.6.
To fix the original bug report (#104586), this is a quick workaround that'll
hopefully not upset the compiler.
Pull Request #104723
The internal compiler error appears to be gone. Unclear why it appeared in the
first place and why it's gone now. Just random kernel code changes causing it.
Pull Request #104719
- Rename roughness variables for more clarity - before, the SVM/OSL code would
set s and v to the linear roughness values, and the setup function would over-
write them with the distribution parameters. This actually caused a bug in the
albedo code, since it intended to use the linear roughness value, but ended up
getting the remapped value.
- Deduplicate the evaluation and sample functions. Most of their code is the
same, only the middle part is different.
- Changed albedo computation to return the sum of the intensities of the four
BSDF lobes. Previously, the code applied the inverse of the color->sigma
mapping from the paper - this returns the color specified in the node, but
for very dark hair (e.g. when using the Melanin controls) the result is
extremely low (e.g. 0.000001) despite the hair still reflecting a significant
amount of light (since the R lobe is independent of sigma). This causes issues
with the light component passes, so this change fixes#104586.
- There's quite a few computations at the start of the evaluation function that
are needed for sampling, evaluation and albedo computation, but only depend on
the view direction. Therefore, just precompute them - we still have space in
PrincipledHairExtra after all.
- Fix a tiny bug - the direction sampling code did not account for the R lobe
roughness modifier.
Pull Request #104669
As a side effect of this change, more resolution divisions are now available.
Before this patch the possible resolution divisions were all powers of two.
Now the possible resolution divisions are the multiples of pixel_size.
This increase in possible resolution divisions is the same idea proposed in https://archive.blender.org/developer/D13590.
In that patch there were concerns that this will increase the time between a user navigating
and seeing the 1:1 render. To my knowledge this is a non-issue and there should be
little to no increase in time between those two events.
Pull Request #104450
(Follow on from D17043)
On AMD Navi2 devices the MetalRT checkbox was not hooked up properly and had no effect. This patch fixes it.
Co-authored-by: Michael Jones <michael_p_jones@apple.com>
Pull Request #104520
The required version numbers for various devices was hardcoded in the
UI messages. The result was that every time one of these versions was
bumped, every language team had to update the message in question.
Instead, the version numbers can be extracted, and injected into the
error messages using string formatting so that translation updates
need happen less frequently.
Pull Request #104488
Host memory fallback in CUDA and HIP devices is almost identical.
We remove duplicated code and create a shared generic version that
other devices (oneAPI) will be able to use.
Reviewed By: brecht
Differential Revision: https://developer.blender.org/D17173
This patch optimises subsurface intersection queries on MetalRT. Currently intersect_local traverses from the scene root, retrospectively discarding all non-local hits. Using a lookup of bottom level acceleration structures, we can explicitly query only the relevant instance. On M1 Max, with MetalRT selected, this can give a render speedup of 15-20% for scenes like Monster which make heavy use of subsurface scattering.
Patch authored by Marco Giordano.
Reviewed By: brecht
Differential Revision: https://developer.blender.org/D17153
This patch adds two new kernels: SORT_BUCKET_PASS and SORT_WRITE_PASS. These replace PREFIX_SUM and SORTED_PATHS_ARRAY on supported devices (currently implemented on Metal, but will be trivial to enable on the other backends). The new kernels exploit sort partitioning (see D15331) by sorting each partition separately using local atomics. This can give an overall render speedup of 2-3% depending on architecture. As before, we fall back to the original non-partitioned sorting when the shader count is "too high".
Reviewed By: brecht
Differential Revision: https://developer.blender.org/D16909
This patch removes the option to select both AMD and Intel GPUs on system that have both. Currently both devices will be selected by default which results in crashes and other poorly understood behaviour. This patch adds precedence for using any discrete AMD GPU over an integrated Intel one. This can be overridden with CYCLES_METAL_FORCE_INTEL.
Reviewed By: brecht
Differential Revision: https://developer.blender.org/D17166
This patch fixes T103393 by undefining `__LIGHT_TREE__` on Metal/AMD as it has an unexpected & major impact on performance even when light trees are not in use.
Patch authored by Prakash Kamliya.
Reviewed By: brecht
Maniphest Tasks: T103393
Differential Revision: https://developer.blender.org/D17167
Mutex locks for manipulating GHOST_System::m_timerManager from
GHOST_SystemWayland relied on WAYLAND being the only user of the
timer-manager.
This isn't the case as timers are fired from
`GHOST_System::dispatchEvents`.
Resolve by using a separate timer-manager for wayland key-repeat timers.
Resolve a thread safety issue reported by valgrind's helgrind checker,
although I wasn't able to redo the error in practice.
NULL check on the key-repeat timer also needs to lock, otherwise it's
possible the timer is set in another thread before the lock is acquired.
Now all key-repeat timer access which may run from a thread
locks the timer mutex before any checks or timer manipulation.
This is both a cleanup and a preparation for the Principled v2 changes.
Notable changes:
- Clearcoat weight is now folded into the closure weight, there's no reason
to track this separately.
- There's a general-purpose helper for computing a Closure's albedo, which is
currently used by the denoising albedo and diffuse/gloss/transmission color
passes.
- The d/g/t color passes didn't account for closure albedo before, this means
that e.g. metallic shaders with Principled v2 now have their color texture
included in the glossy color pass. Also fixes T104041 (sheen albedo).
- Instead of precomputing and storing the albedo during shader setup, compute
it when needed. This is technically redundant since we still need to compute
it on shader setup to adjust the sample weight, but the operation is cheap
enough that freeing up the storage seems worth it.
- Future changes (Principled v2) are easier to integrate since the Fresnel
handling isn't all over the place anymore.
- Fresnel handling in the Multiscattering GGX code is still ugly, but since
removing that entirely is the next step, putting effort into cleaning it up
doesn't seem worth it.
- Apart from the d/g/t color passes, no changes to render results are expected.
Differential Revision: https://developer.blender.org/D17101
Cycles ignores the size of spot lights, therefore the illuminated area doesn't match the gizmo. This patch resolves this discrepancy.
| Before (Cycles) | After (Cycles) | Eevee
|{F14200605}|{F14200595}|{F14200600}|
This is done by scaling the ray direction by the size of the cone. The implementation of `spot_light_attenuation()` in `spot.h` matches `spot_attenuation()` in `lights_lib.glsl`.
**Test file**:
{F14200728}
Differential Revision: https://developer.blender.org/D17129
The image manager used to handle OSL textures on the GPU by
default loads images after displacement is evaluated. This is a
problem when the displacement shader uses any textures, hence
why the geometry manager already makes the image manager
load any images used in the displacement shader graph early
(`GeometryManager::device_update_displacement_images`).
This only handled Cycles image nodes however, not OSL nodes, so
if any `texture` calls were made in OSL those would be missed and
therefore crash when accessed on the GPU. Unfortunately it is not
simple to determine which textures referenced by OSL are needed
for displacement, so the solution for now is to simply load all of
them early if true displacement is used.
This patch also fixes the result of the displacement shader not
being used properly in OptiX.
Maniphest Tasks: T104240
Differential Revision: https://developer.blender.org/D17162
Also minor changes in comments:
- Reference BLENDER_HISTORY_FILE instead of the literal file-name
(simplifies looking up usage).
- Use usernames in tags, as noted in code-style.
The `MultiDevice` implementation of `get_cpu_osl_memory` returns a
nullptr when there is no CPU device in the mix. As such access to that
crashed in `update_osl_globals`. But that only updates maps that are not
currently used on the GPU anyway, so can just skip that when the CPU
is not used for rendering.
Maniphest Tasks: T104216
Paths to vulkan libraries, paths and related components were
hardcoded in the platform cmake file. This patch separates
this by using adding CMake modules for Vulkan and ShaderC.
This change has only been applied to the macOs configuration as
that is currently our main platform for development. Other platforms
will be added during the development of the Vulkan back-end.
Removing all OSL script nodes from the shader graph would cause that
graph to no longer report it using `KERNEL_FEATURE_SHADER_RAYTRACE`
via `ShaderManager::get_graph_kernel_features`, but the shader object
itself still would have the `has_surface_raytrace` field set.
This caused kernels to be reloaded without shader raytracing support, but
later the `DEVICE_KERNEL_INTEGRATOR_SHADE_SURFACE_RAYTRACE`
kernel would still be invoked since the shader continued to report it
requiring that through the `SD_HAS_RAYTRACE` flag set because of
`has_surface_raytrace`.
Fix that by ensuring `has_surface_raytrace` is reset on every shader update,
so that when all OSL script nodes are deleted it is set to false, and only
stays true when there are still OSL script nodes (or other nodes using it).
Maniphest Tasks: T104157
Differential Revision: https://developer.blender.org/D17140
Based on "Sampling the GGX Distribution of Visible Normals" by Eric Heitz
(https://jcgt.org/published/0007/04/01/).
Also, this removes the lambdaI computation from the Beckmann sampling code and
just recomputes it below. We already need to recompute for two other cases
(GGX and clearcoat), so this makes the code more consistent.
In terms of performance, I don't expect a notable impact since the earlier
computation also was non-trivial, and while it probably was slightly more
accurate, I'd argue that being consistent between evaluation and sampling is
more important than absolute numerical accuracy anyways.
Differential Revision: https://developer.blender.org/D17100
This gives closer results to what I've seen in papers and other renderers when
using the code to precompute albedo later (to replace MultiGGX).
It's usually a tiny difference, the only case where I've seen it matter is
in the `shader/node_group_float.blend` test - but that's a (single-scatter) GGX
closure with 0.9 roughness, so it's not too surprising. In any case, the new
result looks closer to Eevee, so that's good I guess.
Differential Revision: https://developer.blender.org/D17099
Speckles and missing lights were experienced in scenes with Nishita Sky
Texture and a Sun Size smaller than 1.5°, such as in Lone Monk and Attic
scenes.
Increasing the precision of cosf fixes it.
A noticeable (>5%) performance regression in oneAPI backend came with
a501a2dbff797829b61f21c5aeb9d19dba3e3874. Updating to latest graphics
compiler from driver 101.4032 fixes it.
I've tested it with current min-supported drivers and it runs well but
since compatibility of graphics compiler with older drivers isn't
guaranteed, I'm also bumping the min-supported driver versions.
If end-users consider latest drivers too fresh to switch to (version
isn't released as stable on Linux as of today but should be before
Blender 3.5 release), CYCLES_ONEAPI_ALL_DEVICES=1 env variable can be
used.
Intel Graphics Compiler on Linux will be updated in a later commit
so we can then close D16984.
Reviewed By: sergey, LazyDodo
Improve handling for cases where maximum in-flight command buffer count is exceeded. This can occur during light-baking operations. Ensures the application handles this gracefully and also improves workload pipelining by situationally stalling until GPU work has completed, if too much work is queued up.
This may have a tangible benefit for T103742 by ensuring Blender does not queue up too much GPU work.
Authored by Apple: Michael Parkin-White
Ref T96261
Ref T103742
Depends on D17018
Reviewed By: fclem
Maniphest Tasks: T103742, T96261
Differential Revision: https://developer.blender.org/D17019
This isn't correct as window activation is handled separately
from the cursor entering/leaving a window.
This would call de-activate when the cursor moved outside the window
even though the window remained focused.
Rely on focus changes which already handle activate/deactivate events.
Swap-buffers was being deferred (to prevent it being called
from the event handling thread) however when it was called the
OpenGL context might not be active (especially with multiple windows).
Moving the cursor between windows made eglSwapBuffers report:
EGL Error (0x300D): EGL_BAD_SURFACE.
Resolve this by removing swapBuffer calls and instead add a
GHOST_kEventWindowUpdateDecor event intended for redrawing
client-side-decoration.
Besides the warning, this results an error with LIBDECOR window frames
not redrawing when a window became inactive.
The background evaluation samples the sky discretely, so if the sun is
too small, it can be missed in the evaluation. To solve this, the sun is
ignored during the background evaluation and its contribution is
computed separately.
The lookup table method on CPU and the numerical root finding method on
GPU give quite different results. This commit deletes the Beckmann lookup
table and uses numerical root finding on all devices. For the numerical
root finding, a combined bisection-Newton method with precision control
is used.
Differential Revision: https://developer.blender.org/D17050
This makes it possible to use `texture` and `texture3d` in custom
OSL shaders with a constant image file name as argument on the
GPU, where previously texturing was only possible through Cycles
nodes.
For constant file name arguments, OSL calls
`OSL::RendererServices::get_texture_handle()` with the file name
string to convert it into an opaque handle for use on the GPU.
That is now used to load the respective image file using the Cycles
image manager and generate a SVM handle that can be used on
the GPU. Some care is necessary as the renderer services class is
shared across multiple Cycles instances, whereas the Cycles image
manager is local to each.
Maniphest Tasks: T101222
Differential Revision: https://developer.blender.org/D17032
This patch adds markup to specify that certain kernel data constants should not be specialised. Currently it is used for `tabulated_sobol_sequence_size` and `sobol_index_mask` which change frequently based on the aa sample count, trash the shader cache, and have little bearing on performance.
Reviewed By: brecht
Differential Revision: https://developer.blender.org/D16968
This patch adds occupancy tuning for the newly announced high-end M2 machines, giving 10-15% render speedup over a pre-tuned build.
Reviewed By: brecht
Differential Revision: https://developer.blender.org/D17037
These warnings can reveal errors in logic, so quiet them by checking
if the features are enabled before using variables or by assigning
empty strings in some cases.
- Check CMAKE_THREAD_LIBS_INIT is set before use as CMake docs
note that this may be left unset if it's not needed.
- Remove BOOST/OPENVDB/VULKAN references when disable.
- Define INC_SYS even when empty.
- Remove PNG_INC from freetype (not defined anywhere).
wi is the viewing direction, and wo is the illumination direction. Under this notation, BSDF sampling always samples from wi and outputs wo, which is consistent with most of the papers and mitsuba. This order is reversed compared with PBRT, although PBRT also traces from the camera.
While keeping SSE2, SSE4.1 and AVX2. This does not affect hardware support, it
only slightly reduces performance for some older CPUs.
To reduce maintenance cost and improve compile times.
Differential Revision: https://developer.blender.org/D16978
An alternative fix to [0] which caused an error with ASAN
(freeing an GHOST_ISystem instead of a GHOST_System).
Reported by @Baardaap in chat, I'm unable to reproduce the issue.
Instead of calling the destructor directly, add a private method that
deletes data before raising an exception.
[0]: fd36221930e35efd2c09af8fb91234a510e3b9dc
Use a consistent style for declaring the names of struct members
in their declarations. Note that this convention was already used in
many places but not everywhere.
Remove spaces around the text (matching commented arguments) with
the advantage that the the spell checking utility skips these terms.
Making it possible to extract & validate these comments automatically.
Also use struct names for `bAnimChannelType` & `bConstraintTypeInfo`
which were using brief descriptions.
Recently the event handling thread for Wayland sometimes used 100% of a
CPU core while idle.
Resolve by waiting for changes to the Wayland file-handle when
there are no events to read.
The previous fix from T100855 [0] no longer works on my system
(3.4.1 release also fails for GNOME/KDE/WLROOTS compositors).
Resolve by removing the loop from the wait-on-file handle check.
Also reduce locking/unlocking calls.
[0]: 37b256e26feb454d9febd84dac1b1ce8b8d84d90
User notifications in Blender were always annoying and therefore by default turned off.
- When tweaking compositor/material tree a notification was shown.
- When rendering an animation for each frame a notification was shown.
The reason for this was that it was automatically shown when a background job was
finished and Blender wasn't the top most application.
In stead of migrating user notification to UserNotification.framework it was decided
to remove it for now. If in the future notifications should be added back we should
start with a design to figure out where notifications makes sense.
Reviewed By: sergey, brecht
Maniphest Tasks: T103757
Differential Revision: https://developer.blender.org/D16955
Cycles converts internal links to converter nodes which don't do anything and
later on get collapsed by the graph optimization. However, the previous code
assumed that one Blender input socket maps to one Cycles input socket.
When a node is muted, there might be internal links from one input to multiple
outputs. In Cycles, this meant that one Blender input socket now mapped to
multiple input sockets on all the converter nodes, so only the last one survived.
THe fix is simple, just make the mapping a MultiMap.
The problem here is that whether an object is a shadow catcher or not affects the
visibility flags, but changes to the shadow catcher property did not trigger a
visibility flag update.
With the GPU API the sampler can not be set after texture binding, which caused
a delay of the actual change. Now do both in a single call for correctness and
performance.
OpenGL is deprecated by Apple and triggers a warning when used. The goal
is that OpenGL is replaced by Metal backend, but we are not there yet.
To improve tracability of new warnings we hide deprecation warnings
when the GHOST_ContextCGL.h file is included.
NOTE: This change silences other deprecation warnings as well.
The T103586 fix effectively ran the wl_surface_listener.leave callback
to as WLROOTS based compositors doesn't run them. Remove the workaround
since it's an error in WLROOTS to be fixed upstream.
Temporarily using the wrong window scale when disconnecting a monitor
on configurations that use different DPI per monitor is a minor enough
issue that I don't think it makes sense to workaround in GHOST.
Wayland requires the windows surface size is divisible by the surface
scale. This wasn't guaranteed when creating new temporary windows.
This meant opening the preferences could exit Blender with an error
with Hi-DPI configurations.
**Changes**
As described in T93602, this patch removes all use of the `MVert`
struct, replacing it with a generic named attribute with the name
`"position"`, consistent with other geometry types.
Variable names have been changed from `verts` to `positions`, to align
with the attribute name and the more generic design (positions are not
vertices, they are just an attribute stored on the point domain).
This change is made possible by previous commits that moved all other
data out of `MVert` to runtime data or other generic attributes. What
remains is mostly a simple type change. Though, the type still shows up
859 times, so the patch is quite large.
One compromise is that now `CD_MASK_BAREMESH` now contains
`CD_PROP_FLOAT3`. With the general move towards generic attributes
over custom data types, we are removing use of these type masks anyway.
**Benefits**
The most obvious benefit is reduced memory usage and the benefits
that brings in memory-bound situations. `float3` is only 3 bytes, in
comparison to `MVert` which was 4. When there are millions of vertices
this starts to matter more.
The other benefits come from using a more generic type. Instead of
writing algorithms specifically for `MVert`, code can just use arrays
of vectors. This will allow eliminating many temporary arrays or
wrappers used to extract positions.
Many possible improvements aren't implemented in this patch, though
I did switch simplify or remove the process of creating temporary
position arrays in a few places.
The design clarity that "positions are just another attribute" brings
allows removing explicit copying of vertices in some procedural
operations-- they are just processed like most other attributes.
**Performance**
This touches so many areas that it's hard to benchmark exhaustively,
but I observed some areas as examples.
* The mesh line node with 4 million count was 1.5x (8ms to 12ms) faster.
* The Spring splash screen went from ~4.3 to ~4.5 fps.
* The subdivision surface modifier/node was slightly faster
RNA access through Python may be slightly slower, since now we need
a name lookup instead of just a custom data type lookup for each index.
**Future Improvements**
* Remove uses of "vert_coords" functions:
* `BKE_mesh_vert_coords_alloc`
* `BKE_mesh_vert_coords_get`
* `BKE_mesh_vert_coords_apply{_with_mat4}`
* Remove more hidden copying of positions
* General simplification now possible in many areas
* Convert more code to C++ to use `float3` instead of `float[3]`
* Currently `reinterpret_cast` is used for those C-API functions
Differential Revision: https://developer.blender.org/D15982
When rendering in the viewport (or probably on instanced objects, but I didn't
test that), emissive objects whose scale is negative give the wrong value on the
"backfacing" input when multiple sampling is enabled.
The underlying problem was a corner case in how normal transformation is handled,
which is generally a bit messy.
From what I can tell, the pattern appears to be:
- If you first transform vertices to world space and then compute the normal from
them (as triangle light samping, MNEE and light tree do), you need to flip
whenever the transform has negative scale regardless of whether the transform
has been applied
- If you compute the normal in object space and then transform it to world space
(as the regular shader_setup_from_ray path does), you only need to flip if the
transform was already applied and was negative
- If you get the normal from a local intersection result (as bevel and SSS do),
you only need to flip if the transform was already applied and was negative
- If you get the normal from vertex normals, you don't need to do anything since
the host-side code does the flip for you (arguably it'd be more consistent to
do this in the kernel as well, but meh, not worth the potential slowdown)
So, this patch fixes the logic in the triangle emission code.
Also, turns out that the MNEE code had the same problem and was also having
problems in the viewport on negative-scale objects, this is also fixed now.
Differential Revision: https://developer.blender.org/D16952
The code that computes and inverts the shutter CDF had some issues that caused
the result to be asymmetric, this tweaks it to be more robust and produce
symmetric outputs for symmetric inputs.
At the first bounce, the diffuse/glossy/transmission weights are stored so that
contributions along the path can be split into the d/g/t indirect passes.
However, volume bounces always set the weight even at indirect bounces, so
even paths that had their first bounce on a purely glossy object would suddenly
start counting towards the diffuse indirect pass after a secondary volume bounce.
This was caused by rB0d73d5c1a2, which releases the scene mutex during kernel
loading. However, the reset mutex was still held, which can cause a deadlock
if another thread tries to reset the session, since it will acquire the
released scene mutex and then wait for the reset mutex.
Turns out there's no point in keeping the reset mutex locked after the delayed
reset section, so now we just release it earlier, which resolves the deadlock.
This breaks backwards compatibility some in that 3 sides will be mapped
differently now, but difficult to avoid and can be considered a bugfix.
Similar to rBdd8016f7081f.
Maniphest Tasks: T103615
Differential Revision: https://developer.blender.org/D16910
To better estimate light contribution. Note that estimating the texture
from the IES file is still missing.
Contributed by Alaska.
Differential Revision: https://developer.blender.org/D16901
This issue was introduced in rB78f28b55d39288926634d0cc.
The fix is to use a `std::shared_ptr` to ensure that the `Global` will live
long enough until all `Local` objects are destructed.
This patch adds a new "Kernel Optimization Level" dropdown menu to control Metal kernel specialisation. Currently this defaults to "full" optimisation, on the assumption that the changes proposed in D16371 will address usability concerns around app responsiveness and shader cache housekeeping.
Reviewed By: brecht
Differential Revision: https://developer.blender.org/D16514
All kernel specialisation is now performed in the background regardless of kernel type, meaning that the first render will be visible a few seconds sooner. The only exception is during benchmark warm up, in which case we wait for all kernels to be cached. When stopping a render, we call a new `cancel()` method on the device which causes any outstanding compilation work to be cancelled, and we destroy the device in a detached thread so that any stale queued compilations can be safely purged without blocking the UI for longer than necessary.
Reviewed By: brecht
Differential Revision: https://developer.blender.org/D16371
Both, the guarded and lockfree allocator, are keeping track of current
and peak memory usage. Even the lockfree allocator used to use a
global atomic variable for the memory usage. When multiple threads
use the allocator at the same time, this variable is highly contended.
This can result in significant slowdowns as presented in D16862.
While specific cases could always be optimized by reducing the number
of allocations, having this synchronization point in functions used by
almost every part of Blender is not great.
The solution is use thread-local memory counters which are only added
together when the memory usage is actually requested. For more details
see in-code comments and D16862.
Differential Revision: https://developer.blender.org/D16862
WLROOTS compositors don't run surface leave callbacks,
while this may be considered a bug in WLROOTS, neither GTK/SDL crash
so workaround the crash too.
This also fixes a minor glitch where the cursor scale wasn't updated
when changing monitor scale at run-time.
This functionality is related only to debugging of SYCL implementation
via single-threaded CPU execution and is disabled by default.
Host device has been deprecated in SYCL 2020 spec and we removed it
in 305b92e05f748a0fd9cb62b9829791d717ba2d57.
Since this is still very useful for debugging, we're restoring a
similar functionality here through SYCL 2020 Host Task.
The "osl_texture" intrinsic was not implemented correctly. It should handle alpha
separately from color, the number of channels input parameter only counts color
channels.
This panel showed a duplication of options that were in the main light panel and only mistakenly shows up in the workbench engine where lights should have no options.
This panel was also used by the POV-Ray add-on but that was removed recently.
This reverts commit a3a9459050a96e75138b3441c069898f211f179c.
And fixes T103337.
a3a9459050a9 has some flaws and it needs to go through review (See D16803).
Conflicts:
intern/ghost/intern/GHOST_SystemWin32.cpp
Partially addresses T72011.
The problem here is that the previous barycentric clamping did not deal well
with skinny triangles and would end up generating "sub-pixel jittering"
locations that were actually >20 pixels away.
Differential Revision: https://developer.blender.org/D16727
Expands Color Mix nodes with new Exclusion mode.
Similar to Difference but produces less contrast.
Requested by Pierre Schiller @3D_director and
@OmarSquircleArt on twitter.
Differential Revision: https://developer.blender.org/D16543
Materials without connections to the output node would crash with OSL
in OptiX, since the Cycles `OSLCompiler` generates an empty shader
group reference for them, which resulted in the OptiX device
implementation setting an empty SBT entry for the corresponding direct
callables, which then crashed when calling those direct callables was
attempted in `osl_eval_nodes`. This fixes that by setting the SBT entries
for empty shader groups to a dummy direct callable that does nothing.
Switching viewport denoising causes kernels to be reloaded with a new
feature mask, which would destroy the existing OptiX pipelines. But OSL
kernels were not reloaded as well, leaving the shading pipeline
uninitialized and therefore causing an error when it is later attempted to
execute it. This fixes that by ensuring OSL kernels are always reloaded
when the normal kernels are too.
Recent reverting of changes to cursor grabbing intended to match
Blender 3.3 release. This is the case for 3.4x branch, however there is
an additional change to grabbing on WIN32 by Germano [0] which is a
significant improvement on old grabbing logic for Windows.
So instead of matching 3.3x behavior, restore logic that keeps
the cursor centered while grabbing & hidden.
This re-introduces T102792 issue displaying the paint-brush while
dragging buttons, this will have to be solved separately.
Re-apply [1] & [2], revert [3] & [4].
[4]: a3a9459050a96e75138b3441c069898f211f179c
[0]: 9fd6dae7939a65b67045749a0eadeb6864ded183
[1]: 4cac8025f00798938813f52dcb117be83db97f22
[2]: 230744d6fd96dcf5afe66a8f9b9f6f8bbe1f41bb
[3]: 0240b895994aa58258db6897ae0d6478da7fce5f
Replace ../lib/linux_centos7_x86_64 with ../lib/linux_x86_64_glibc_228,
built with Rocky8 Linux, compatible with the VFX platform CY2023,
see: T99618.
- Update build-bot configuration.
- Remove unnecessary check for Blosc, this is part of OpenVDB lib now.
- Remove WITH_CXX11_ABI, always use new C++11 ABI now
- Replace centos7 by glibc_228 everywhere
Note that existing builds with cached paths pointing to
"../lib/linux_centos7_x86_64" will need to be updated.
Includes contributions by Brecht.
Under Wayland the transform cursor wasn't displaying the warped cursor.
This worked on other platforms because cursor motion is warped where as
Wayland simulates cursor warping, so it's necessary to apply warping
when requesting the cursor location too.
This reverts commits
9fd6dae7939a65b67045749a0eadeb6864ded183,
4cac8025f00798938813f52dcb117be83db97f22 (minor cleanup).
Re-introducing T102346, which will be fixed in isolation.
Unfortunately even when the cursor is hidden & grabbed,
the underlying cursor coordinates are still shown in some cases.
This caused bug where dragging a button in the sculpt-context popup
would draw the brush at unexpected locations because internally
the cursor was warping in the middle of the window, reported as T102792.
Resolving this issue with the paint cursor is possible but tend towards
over-complicated solutions.
Revert this change in favor of a more localized workaround for walk-mode
(as was done prior [0] to fix T99021).
[0]: 4c4e8cc926a672ac60692b3fb8c20249f9cae679
This is a solution in response to the issues mentioned in comments on
rBe4f1d719080a and T103088.
Apparently the workaround of checking if the mouse is already inside
the area on the next event doesn't work for some tablets.
Perhaps the order of events or some very small jitter is causing this
issue on tablets. (Couldn't confirm).
Whatever the cause, the solution of checking the timestamp of the event
and thus ignoring the outdated ones is theoretically safer.
It is the same solution seen in MacOS.
Also calling `SendInput` 3 times every warp ensures that at least one
event is dispatched.
This adds a new mirror image extension type for shaders and
geometry nodes (next to the existing repeat, extend and clip
options).
See D16432 for a more detailed explanation of `wrap_mirror`.
This also adds a new sampler flag `GPU_SAMPLER_MIRROR_REPEAT`.
It acts as a modifier to `GPU_SAMPLER_REPEAT`, so any `REPEAT`
flag must be set for the `MIRROR` flag to have an effect.
Differential Revision: https://developer.blender.org/D16432
The `render_color_index` skips attributes with different types
and domains in order to give the proper order for the UI list.
That is a different than an index in the group of all attributes.
The most solid solution I could think of is exposing the name of
the default color attribute. It's "solid" because we always address
attributes by name internally. Doing something different is bound
to create problems. It's also aligned with the design in T98366 and
D15169.
Another option would be to change the way the "attribute index"
is incremented in Cycles. That would be a valid solution, but would
be more complex and annoying.
For consistency, I also exposed the name of the active color attribute
the same way, though it isn't necessary to fix this particular bug.
The properties aren't editable, that can come in 3.5 as part of D15169.
Differential Revision: https://developer.blender.org/D16769
This is done based on the render sample count so that it doesn't impact
sampling quality. It's similar in spirit to the adaptive table size in D16561,
but in this case for performance rather than memory usage.
Differential Revision: https://developer.blender.org/D16726
The first two dimensions of scrambled, shuffled Sobol and shuffled PMJ02 are
equivalent, so this makes no real difference for the first two dimensions.
But Sobol allows us to naturally extend to more dimensions.
Pretabulated Sobol is now always used, and the sampling pattern settings is now
only available as a debug option.
This in turn allows the following two things (also implemented):
* Use proper 3D samples for combined lens + motion blur sampling. This
notably reduces the noise on objects that are simultaneously out-of-focus
and motion blurred.
* Use proper 3D samples for combined light selection + light sampling.
Cycles was already doing something clever here with 2D samples, but using
3D samples is more straightforward and avoids overloading one of the
dimensions.
In the future this will also allow for proper sampling of e.g. volumetric
light sources and other things that may need three or four dimensions.
Differential Revision: https://developer.blender.org/D16443
While it helps on many scenes, it can be disruptive for existing scenes and
for benchmarks the differences in timing can be confusing. So be a bit more
conservative and only it enable it for new scenes.
The use of a struct for device strings caused the CUDA compiler to
generate byte arrays as the argument type, whereas OSL generated
primitive integer types (for the hash). Fix that by using a typedef
instead so that the CUDA compiler too will use an integer type in the
PTX it generates.
Maniphest Tasks: T101222
Allow keyboard layouts which include "dead keys" to enter diacritics
by calling MapVirtualKeyW even when not key_down.
See D16770 for more details.
Differential Revision: https://developer.blender.org/D16770
Reviewed by Campbell Barton
Recently a new geometry node for splitting edges was added in D16399.
However, there was already a similar implementation in mesh.cc that was
mainly used to fake auto smooth support in Cycles by splitting sharp
edges and edges around sharp faces.
While there are still possibilities for optimization in the new code,
the implementation is safer and simpler, multi-threaded, and aligns
better with development plans for caching topology on Mesh and other
recent developments with attributes.
This patch removes the old code and moves the node implementation to
the geometry module so it can be used in editors and RNA. The "free
loop normals" argument is deprecated now, since it was only an internal
optimization exposed for Cycles.
The new mesh `editors` function creates an `IndexMask` of edges to
split by reusing some of the code from the corner normal calculation.
This change will help to simplify the changes in D16530 and T102858.
Differential Revision: https://developer.blender.org/D16732
There has been an attempt to reorganize this part, however, it seems that didn't compile on HIP, and is reverted in
rBc2dc65dfa4ae60fa5d2c3b0cfe86f99dcb5bf16f. This is another attempt of refactoring. as I have no idea why some things don't work on HIP, it's
best to check whether this compiles on other platforms.
The main changes are creating a new struct named `MeshLight` that is shared between `KernelLightDistribution` and `KernelLightTreeEmitter`,
and a bit of renaming, so that light sampling with or without light tree could call the same function.
Also, I noticed a patch D16714 referring to HIP compilation error. Not sure if it's related, but browsing
https://builder.blender.org/admin/#/builders/30/builds/7826/steps/7/logs/stdio, it didn't work on gfx1102, not gfx9*.
Differential Revision: https://developer.blender.org/D16722
The OpenVDB delay loading of voxel leaf data using mmap works poorly on network
drives. It has a mechanism to make a temporary local file copy to avoid this,
but we disabled that as it leads to other problems.
Now disable delay loading entirely. It's not clear that this has much benefit
in Blender. For rendering we need to load the entire grid to convert to NanoVDB,
and for geometry nodes there also are no cases where we only need part of grids.
To avoid issues with lights being either skipped or sampled unnecessarily
when the exposure is set low or high.
Contributed by Alaska.
Differential Revision: https://developer.blender.org/D16703
* preempt_attr was copied from CUDA, but not used in HIP.
* Remove shadowed variable before conditional in EnvironmentTextureNode code.
Differential Revision: https://developer.blender.org/D16741
Improve a few messages, but mostly fix typos in many areas of the UI.
See inline comments in the differential revisiion for the rationale
behind the various changes.
Differential Revision: https://developer.blender.org/D16716
Texture usage flags can now be provided during texture creation specifying
the ways in which a texture can be used. This allows the GPU backends to
perform contextual optimizations which were not previously possible. This
includes enablement of hardware lossless compression which can result in
a 15%+ performance uplift for bandwidth-limited scenes on hardware such
as Apple-Silicon using Metal.
GPU_TEXTURE_USAGE_GENERAL can be used by default if usage is not known
ahead of time. Patch will also be relevant for the Vulkan backend.
Authored by Apple: Michael Parkin-White
Ref T96261
Reviewed By: fclem
Differential Revision: https://developer.blender.org/D15967
The PDF of mesh lights were not being scaled by `pdf_selection` when
the light tree was disable. This resulted in the mesh lights having
the wrong PDF and thus the wrong brightness.
Differential Revision: https://developer.blender.org/D16717
Compiling Cycles in Visual Studio 2022 yields the error:
C4146: unary minus operator applied to unsigned type, result still unsigned
Replacing it with explicit two's complement achieves the same result as signed
negation but avoids the error.
Differential Revision: https://developer.blender.org/D16616
For file formats like PNG, JPEG and TIFF. Eventually this should use
the OpenColorIO view transform, but this at least makes the image
closer to what it should be in most cases.
Differential Revision: https://developer.blender.org/D16482