Commit Graph

830 Commits

Author SHA1 Message Date
Patrick Mours
d64e171c4b Cycles: Enable OptiX on first generation Maxwell GPUs again 2020-07-27 16:11:00 +02:00
Patrick Mours
c64b12c0b8 Fix OptiX being shown as available on first generation Maxwell GPUs
The OptiX kernels are compiled for target "compute_sm_52", which is only available on second
generation Maxwell GPUs, so disable support for older ones.
2020-07-24 15:36:09 +02:00
Patrick Mours
a9644c812f Cycles: Use pre-compiled PTX kernel for older generation when no matching one is found
This patch changes the discovery of pre-compiled kernels, to look for any PTX, even if
it does not match the current architecture version exactly. It works because the driver can
JIT-compile PTX generated for architectures less than or equal to the current one.
This e.g. makes it possible to render on a new GPU architecture even if no pre-compiled
binary kernel was distributed for it as part of the Blender installation.

Reviewed By: brecht

Differential Revision: https://developer.blender.org/D8332
2020-07-20 19:25:27 +02:00
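The fallback rule described in a9644c812f can be sketched as follows. This is a minimal illustration, not the code from the patch: the function name `pick_ptx_arch` and the integer encoding of architectures (major * 10 + minor, so 7.5 becomes 75) are assumptions made for the example.

```cpp
#include <vector>

// Hypothetical helper: given the architecture of the current GPU and the
// architectures for which PTX kernels were shipped, return the newest shipped
// architecture that is less than or equal to the device architecture.
// Returns -1 when no shipped PTX can be used.
int pick_ptx_arch(int device_arch, const std::vector<int> &shipped_archs)
{
  int best = -1;
  for (int arch : shipped_archs) {
    // The driver can only JIT-compile PTX generated for an architecture
    // that is not newer than the device itself.
    if (arch <= device_arch && arch > best) {
      best = arch;
    }
  }
  return best;
}
```

With kernels shipped for 30, 52 and 75, a device reporting 86 would fall back to the 75 kernel, while a device reporting 20 would find no usable kernel.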
Campbell Barton
5338b36fcc Cleanup: spelling 2020-07-14 15:19:52 +10:00
Brecht Van Lommel
6e74a8b69f Fix T78881: Cycles OpenImageDenoise not using albedo and normal correctly
Properly normalize buffers now. Also expose option to not use albedo and normal
just like OptiX.
2020-07-13 19:38:49 +02:00
Campbell Barton
651db1b26f Cleanup: spelling 2020-07-11 15:32:59 +10:00
Brecht Van Lommel
3dc0178390 Fix T78662: Cycles baking fails if denoising is enabled, after recent changes
This is not supported yet.
2020-07-10 20:08:46 +02:00
Brecht Van Lommel
6fbacd6048 Fix build error building without OpenImageDenoise 2020-07-10 19:56:53 +02:00
Brecht Van Lommel
6eeb32706a Cycles: support OpenImageDenoise in final renders
Performance is not great currently due to the API not seeming to support
efficient denoising of multiple tiles at the same time. So in many cases
only one or a few threads will actually be denoising at the same time.

In renders with many samples this is not a big problem, but for faster
renders it's a significant overhead.

We should try to optimize this still, possibly by batching denoising of
a bigger neighborhood of multiple tiles at once.
2020-07-10 17:10:05 +02:00
Brecht Van Lommel
93791381fe Cleanup: reduce hardcoded numbers in denoising neighbor tiles code 2020-07-10 17:10:05 +02:00
Patrick Mours
737bd549b6 Cycles: Add support for native OptiX curve primitive
This patch adds support for the curve primitive from OptiX to Cycles. It's currently hidden
behind a debug option, since there can be some slight rendering differences still (because no
backface culling is performed and something seems off with endcaps). The curve primitive
was added with the OptiX 7.1 SDK and requires a r450 driver or newer, so this also updates
the codebase to be able to build with the new SDK.

Reviewed By: brecht

Differential Revision: https://developer.blender.org/D8223
2020-07-07 15:39:02 +02:00
Patrick Mours
1562c9f031 Fix OptiX viewport denoising not working when rendering scene (without OptiX) that uses unsupported features
Denoising devices do not need to load the full feature set of kernels, so only activate the denoising
feature for them (so that it is possible to use features that are supported by the render devices, but
not the denoising devices).
2020-07-06 17:33:04 +02:00
Brecht Van Lommel
792cb8bdc7 Fix T77984: Cycles OpenCL error rendering empty scene 2020-07-01 20:01:25 +02:00
2d8c59ccb9 Fix T77095: fix Cycles performance regression with AMD RX cards
Apply the workaround only for known problematic drivers. The latest pro driver
appears to work correctly, hopefully the regular driver will as well once it
is updated to the same OpenCL driver version (3075.13).
2020-06-30 12:01:40 +02:00
Brecht Van Lommel
fb68a30af6 Fix crash compiling Cycles OpenCL, after recent TBB changes 2020-06-26 17:44:24 +02:00
Campbell Barton
fd5c185beb Cleanup: spelling 2020-06-25 23:14:36 +10:00
Brecht Van Lommel
b4e1571d0b Cleanup: compiler warnings 2020-06-24 17:25:44 +02:00
Brecht Van Lommel
669befdfbe Cycles: add Intel OpenImageDenoise support for viewport denoising
Compared to OptiX denoising, this is usually slower since there is no GPU
acceleration. Some optimizations may still be possible, such as avoiding
copies to the GPU and/or denoising less often.

The main thing is that this adds viewport denoising support for computers
without an NVIDIA GPU (as long as the CPU supports SSE 4.1, which is nearly
all of them).

Ref T76259
2020-06-24 15:17:36 +02:00
Brecht Van Lommel
0a3bde6300 Cycles: add denoising settings to the render properties
Enabling render and viewport denoising is now both done from the render
properties. View layers still can individually be enabled/disabled for
denoising and have their own denoising parameters.

Note that the denoising engine also affects how denoising data passes are
output even if no denoising happens on the render itself, to make the passes
compatible with the engine.

This includes internal refactoring for how denoising parameters are passed
along, trying to avoid code duplication and unclear naming.

Ref T76259
2020-06-24 15:17:36 +02:00
207338bb58 Cycles: port curve-ray intersection from Embree for use in Cycles GPU
This keeps render results compatible for combined CPU + GPU rendering.
Performance and quality of the primitives are quite different than before.
There are now two options:

* Rounded Ribbon: render hair as flat ribbon with (fake) rounded normals, for
  fast rendering. Hair curves are subdivided with a fixed number of user
  specified subdivisions.

  This gives relatively good results, especially when used with the Principled
  Hair BSDF and hair viewed from a typical distance. There are artifacts when
  viewed close up, though this was also the case with all previous primitives
  (but different ones).

* 3D Curve: render hair as 3D curve, for accurate results when viewing hair
  close up. This automatically subdivides the curve until it is smooth.

  This gives higher quality than any of the previous primitives, but does come
  at a performance cost and is somewhat slower than our previous Thick curves.

The main problem here is performance. For CPU and OpenCL rendering performance
seems usually quite close or better for similar quality results.

However for CUDA and Optix, performance of 3D curve intersection is problematic,
with e.g. 1.45x longer render time in Koro (though there is no equivalent quality
and rounded ribbons seem fine for that scene). Any help or ideas to optimize this
are welcome.

Ref T73778

Depends on D8012

Maniphest Tasks: T73778

Differential Revision: https://developer.blender.org/D8013
2020-06-22 13:28:01 +02:00
Brecht Van Lommel
d1ef5146d7 Cycles: remove SIMD BVH optimizations, to be replaced by Embree
Ref T73778

Depends on D8011

Maniphest Tasks: T73778

Differential Revision: https://developer.blender.org/D8012
2020-06-22 13:28:01 +02:00
Brecht Van Lommel
e50f1ddc65 Cycles: use TBB for task pools and task scheduler
No significant performance improvement is expected, but it means we have a
single thread pool throughout Blender. And it should make adding more
parallelization in the future easier.

After previous refactoring commits this is basically a drop-in replacement.
One difference is that the task pool had a mechanism for scheduling tasks to
the front of the queue to minimize memory usage. TBB has a smarter algorithm
to balance depth-first and breadth-first scheduling of tasks and we assume that
removes the need to manually provide hints to the scheduler.

Fixes T77533
2020-06-22 13:27:37 +02:00
Brecht Van Lommel
54e3487c9e Cleanup: remove task pool stop() and finished() 2020-06-22 13:06:47 +02:00
Brecht Van Lommel
b10b7cdb43 Cleanup: use lambdas instead of functors for task pools, remove threadid 2020-06-22 13:06:47 +02:00
Brecht Van Lommel
ace3268482 Cleanup: minor refactoring around DeviceTask 2020-06-22 13:06:47 +02:00
Brecht Van Lommel
6899cb3c07 Fix for T77095: work around render artifacts with AMD Radeon RX 4xx and 5xx 2020-06-18 14:41:51 +02:00
Brecht Van Lommel
fc7c34e380 Cleanup: fix compiler warnings 2020-06-17 14:36:51 +02:00
Patrick Mours
b586f801fc Cycles: Improve CUDA and OptiX error reporting in the viewport
This patch makes the infamous "Cancel" error in the viewport a thing of the past. Instead it
now shows a more useful error message and streamlines the error handling process in CUDA.

Reviewed By: brecht

Differential Revision: https://developer.blender.org/D8008
2020-06-12 18:24:15 +02:00
Brecht Van Lommel
faf5f7b63d Cleanup: fix compiler warning after recent changes
It would be good to use override for all member functions, but doing it for
only some generates compiler warnings.
2020-06-10 20:34:01 +02:00
Patrick Mours
f367f1e5a5 Cycles: Improve OptiX viewport denoising performance with CUDA rendering
With this patch Cycles recognizes when a logical OptiX and CUDA device represent the same
physical GPU and attempts to eliminate unnecessary tile copies for viewport rendering if that
is the case for all active devices. In addition, denoising is now no longer performed on the first
available OptiX device only, but instead it will try to match CUDA and OptiX
rendering/denoising devices exactly to maximize utilization.

Reviewed By: brecht

Differential Revision: https://developer.blender.org/D7975
2020-06-10 14:12:13 +02:00
Patrick Mours
9f7d84b656 Cycles: Add support for P2P memory distribution (e.g. via NVLink)
This change modifies the multi-device implementation to support memory distribution
across devices, to reduce the overall memory footprint of large scenes and allow scenes to
fit entirely into combined GPU memory that previously had to fall back to host memory.

Reviewed By: brecht

Differential Revision: https://developer.blender.org/D7426
2020-06-08 17:55:49 +02:00
Patrick Mours
473aaa389c Cycles: Enable OptiX on all Maxwell+ GPUs 2020-06-05 12:33:00 +02:00
Patrick Mours
49c295813b Merge branch 'blender-v2.83-release' 2020-05-27 15:31:03 +02:00
Patrick Mours
28d9368538 Fix T76947: Optix realtime denoiser progressively reduces brightness of very bright objects
The input data to the OptiX denoiser was clamped to 0..10000 as required, but it could easily
exceed that range with a high number of samples (since the data contains the overall sum). To
fix that, divide by the number of samples first and multiply it back in after the denoiser ran.
2020-05-27 15:17:47 +02:00
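A minimal sketch of the normalization described in 28d9368538, assuming a flat buffer of accumulated pixel values and a caller-supplied denoiser callback. The name `denoise_accumulated` and the callback signature are hypothetical; only the 0..10000 range and the divide-then-multiply order come from the commit message.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical wrapper around a denoiser. `pixels` holds the sum over all
// samples, so it is scaled to a per-sample average before being clamped to
// the 0..10000 input range, and scaled back after the denoiser ran.
// Assumes num_samples > 0.
template<typename Denoiser>
void denoise_accumulated(std::vector<float> &pixels, int num_samples, Denoiser denoise)
{
  const float scale = 1.0f / static_cast<float>(num_samples);
  for (float &p : pixels) {
    p = std::min(std::max(p * scale, 0.0f), 10000.0f);
  }

  denoise(pixels);

  for (float &p : pixels) {
    p *= static_cast<float>(num_samples);
  }
}
```

Clamping the average rather than the sum means a bright pixel is no longer clipped just because more samples were accumulated, which is the progressive darkening the bug report describes.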
Brecht Van Lommel
d9773edaa3 Cycles: code refactor to bake using regular render session and tiles
There should be no user visible change from this, except that tile size
now affects performance. The goal here is to simplify bake denoising in
D3099, letting it reuse more denoising tiles and pass code.

A lot of code is now shared with regular rendering, with the two main
differences being that we read some render result passes from the bake API
when starting to render a tile, and call the bake kernel instead of the
path trace kernel.

With this kind of design where Cycles asks for tiles from the bake API,
it should eventually be easier to reduce memory usage, show tiles as
they are baked, or bake multiple passes at once, though there's still
quite some work needed for that.

Reviewers: #cycles

Subscribers: monio, wmatyjewicz, lukasstockner97, michaelknubben

Differential Revision: https://developer.blender.org/D3108
2020-05-15 20:25:24 +02:00
Brecht Van Lommel
97f50c71b9 Fix --debug-cycles printing CUDA devices twice
Reuse the CUDA devices list for Optix device detection.
2020-05-14 16:07:22 +02:00
Brecht Van Lommel
d97c83712c Cycles: mark CUDA 10.2 as officially supported
It appears to work fine after a recent bugfix and testing for the past few
weeks.
2020-05-05 15:06:49 +02:00
Ray Molenkamp
aeb42cf8ab Cycles/Optix: Support building the optix kernels on demand.
CMake: `WITH_CYCLES_DEVICE_OPTIX` did not respect `WITH_CYCLES_CUDA_BINARIES`, causing the optix kernel to always be built at build time.

Code: `device_optix.cpp` did not count on the optix kernel not existing in the default location.

For this to work, the following must be in place before starting Blender:

1) A working nvcc environment
2) The OptiX SDK installed, with the OPTIX_ROOT_DIR environment variable (not set by default) pointing to it

Differential Revision: https://developer.blender.org/D7400

Reviewed By: Brecht
2020-04-11 12:59:21 -06:00
Brecht Van Lommel
53981c7fb6 Cleanup: refactor adaptive sampling to more easily change some parameters
No functional changes yet, this is work towards making CPU and GPU results
match more closely.
2020-04-07 20:29:48 +02:00
Ray Molenkamp
58ea0d93f1 Cycles/Optix: Add CYCLES_OPTIX_TEST override
This works similarly to the CYCLES_OPENCL_TEST
environment variable to allow testing on unsupported
hardware.

Note: like the OPENCL test override, this is
for *testing* only and bug reports on unsupported
hardware will *not* be accepted at this point in
time.
2020-03-26 11:30:17 -06:00
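An environment variable override of this kind can be sketched as below. This assumes the override is active whenever the variable is set to any value; the helper names and the `hardware_supported` parameter are hypothetical and stand in for whatever capability test the device code performs.

```cpp
#include <cstdlib>

// Returns true when the named environment variable is set to any value.
bool env_override_enabled(const char *name)
{
  return std::getenv(name) != nullptr;
}

// Hypothetical device filter: a device is listed when the hardware is
// supported, or when the testing override is present in the environment.
bool optix_device_allowed(bool hardware_supported)
{
  return hardware_supported || env_override_enabled("CYCLES_OPTIX_TEST");
}
```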
Brecht Van Lommel
f48d15a861 Cycles: limit number of processes compiling OpenCL kernel based on memory
The numbers here can probably be tweaked to be better, but it's hard to
predict and this should at least avoid excessive memory swapping.

Fixes T57064.
2020-03-25 16:39:37 +01:00
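One way to limit compile processes by memory is sketched here. The commit does not state its numbers, so the per-process memory budget is left as a parameter, and the function name `max_compile_processes` is hypothetical.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical helper: cap the number of parallel kernel compile processes so
// that their combined memory budget fits in system memory. At least one
// process is always allowed, and never more than the available threads.
int max_compile_processes(std::uint64_t system_memory_bytes,
                          std::uint64_t bytes_per_process,
                          int num_threads)
{
  const std::uint64_t threads = static_cast<std::uint64_t>(std::max(num_threads, 1));
  if (bytes_per_process == 0) {
    return static_cast<int>(threads);
  }
  const std::uint64_t by_memory = system_memory_bytes / bytes_per_process;
  const std::uint64_t limit = std::min(by_memory, threads);
  return static_cast<int>(std::max<std::uint64_t>(limit, 1));
}
```

For example, with 16 GiB of memory, an assumed budget of 2 GiB per process and 32 threads, the memory limit wins and 8 processes are started.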
Brecht Van Lommel
394a1373a0 Cycles: use OpenCL C 2.0 if available, to improve performance for AMD
Tested with AMD Radeon Pro WX 9100, where it brings performance back to 2.80
level, and combined with recent changes is about 2-15% faster than 2.80 in
our benchmark scenes.

This somehow appears to specifically address the issue where adding more shader
nodes leads to slower runtime. I found no additional speedup by applying this
change to 2.80 or removing the new shader node code.

Ref T71479

Patch by Jeroen Bakker.

Differential Revision: https://developer.blender.org/D6252
2020-03-24 20:09:36 +01:00
Ray Molenkamp
44c6b6615b OpenCL: Bring back CYCLES_OPENCL_TEST override
Back in 2.79 you could use either the debug panel or an
environment variable to force the use of OpenCL on unsupported
hardware, which was rather useful for developers when testing
on NVIDIA just to be sure the CL kernels at least build properly.

This broke in rB949ab753bb2

This diff restores testing through the CYCLES_OPENCL_TEST
environment variable.

Differential Revision: https://developer.blender.org/D7202

Reviewers: brecht
2020-03-21 11:55:45 -06:00
Dalai Felinto
2d1cce8331 Cleanup: make format after SortedIncludes change 2020-03-19 09:33:58 +01:00
Brecht Van Lommel
472534d16e Fix memory leak in recent Cycles image texture refactor 2020-03-12 20:30:49 +01:00
Brecht Van Lommel
26bea849cf Cleanup: add device_texture for images, distinct from other global memory
There was too much image texture specific stuff in device_memory, and too
much code duplication between devices.
2020-03-12 17:28:55 +01:00
Brecht Van Lommel
21821601f2 Fix Optix build error on Linux with some compilers 2020-03-11 20:35:38 +01:00
Brecht Van Lommel
f01bc597a8 Cleanup: stop encoding image data type in slot index
This is legacy code from when we had a fixed number of textures.
2020-03-11 17:07:17 +01:00
Brecht Van Lommel
dcdcc23488 Fix T74504: Cycles wrong progress bar with CPU adaptive sampling 2020-03-06 23:46:58 +01:00
Brecht Van Lommel
b31b44c223 Fix error in Cycles Optix adaptive sampling after recent cleanup 2020-03-06 23:46:58 +01:00