Commit Graph

538 Commits

Author SHA1 Message Date
Sergey Sharybin
f8a999c965 Cycles: Move triangle intersection precalc to a util file
This is preparation work for the follow-up commit, which will move
the remaining parts of the Woop intersection logic to a utility file.

Doing it as a separate commit to keep changes more atomic and easier
to bisect when/if needed.
2017-03-23 17:45:19 +01:00
Sergey Sharybin
b797a5ff78 Cycles: Cleanup, move utility function to utility file
Was an old TODO, this function is handy for some math utilities as well.
2017-03-23 17:45:19 +01:00
Sergey Sharybin
aa0602130b Cycles: Cleanup, code style and comments 2017-03-23 17:45:19 +01:00
Sergey Sharybin
1c5cceb7af Cycles: Move intersection math to own header file
This has the following benefits:

- Modifying the intersection algorithm will not cause so much re-compilation.
- It works around header dependency hell and makes it much easier to use
  vectorization types in there.
2017-03-23 17:45:19 +01:00
Sergey Sharybin
5c06ff8bb9 Cycles: Cleanup, remove unused function 2017-03-23 17:45:19 +01:00
Mai Lavelle
96868a3941 Fix T50888: Numeric overflow in split kernel state buffer size calculation
The overflow made the state buffer too small, which caused the split kernel to
get stuck doing nothing forever.
2017-03-11 05:39:28 -05:00
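The kind of overflow described in this fix can be illustrated with a minimal sketch. The function names and parameters below are hypothetical and are not taken from the Cycles source; the commit text does not show the actual calculation. The sketch only assumes that the size is a product of a thread count and a per-thread size, and shows how a 32-bit product wraps while a widened 64-bit product does not.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: the product is computed in 32 bits, so it wraps
// modulo 2^32 when the true result does not fit.
inline uint32_t state_buffer_size_32(uint32_t num_threads, uint32_t bytes_per_thread)
{
    return num_threads * bytes_per_thread;
}

// Hypothetical helper: one operand is widened first, so the multiplication
// itself is carried out in 64 bits and cannot wrap for 32-bit inputs.
inline uint64_t state_buffer_size_64(uint32_t num_threads, uint32_t bytes_per_thread)
{
    assert(bytes_per_thread > 0);
    return static_cast<uint64_t>(num_threads) * bytes_per_thread;
}
```

With 2^20 threads and 8192 bytes per thread the true size is 2^33 bytes; the 32-bit version wraps to 0, which matches the "buffer too small" symptom in spirit.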
2d3c44389a Fix OpenCL warnings about doubles on some platforms. 2017-03-11 00:55:23 +01:00
Hristo Gueorguiev
9de9f25b24 Cycles: add single program debug option for split kernel
Single program generally compiles kernels faster (2-3 times), loads faster,
takes less drive space (2-3 times), and reduces the number of cached kernels.
2017-03-09 17:09:37 +01:00
Sergey Sharybin
75cb4850f0 Cycles: Use 1-based line number for #line directives
AMD CPU platform was complaining about #line 0 directives in the code.
2017-03-08 12:45:18 +01:00
Mai Lavelle
817873cc83 Cycles: CUDA implementation of split kernel 2017-03-08 01:24:53 -05:00
Mai Lavelle
0892352bfe Cycles: CPU implementation of split kernel 2017-03-08 00:52:41 -05:00
Sergey Sharybin
a87766416f Cycles: Report device maximum allocation and detected global size 2017-03-08 00:52:41 -05:00
Mai Lavelle
230c00d872 Cycles: OpenCL split kernel refactor
This does a few things at once:

- Refactors host side split kernel logic into a new device
  agnostic class `DeviceSplitKernel`.
- Removes tile splitting, a new work pool implementation takes its place and
  allows as many threads as will fit in memory regardless of tile size, which
  can give performance gains.
- Refactors split state buffers into one buffer, as well as reduces the
  number of arguments passed to kernels. Means there's less code to deal
  with overall.
- Moves kernel logic out of OpenCL kernel files so they can later be used by
  other device types.
- Replaces OpenCL-specific APIs with new generic versions.
- Tiles can now be seen updating during rendering.
2017-03-08 00:52:41 -05:00
Mai Lavelle
dfd6055eb0 Cycles: Add more atomic operations 2017-03-08 00:52:41 -05:00
Sergey Sharybin
0e995e0bfe Cycles: Fix strict -Wpedantic warnings with GCC
Patch by Stefan Werner, thanks!
2017-03-06 14:18:26 +01:00
Aaron Carlisle
6d1ac79514 Cleanup: Grey --> Gray 2017-02-27 19:33:57 -05:00
Sergey Sharybin
1e29286c8c Cycles: Fix compilation warning with CUDA on OSX 2017-02-24 14:33:10 +01:00
Sergey Sharybin
50328b41a7 Cycles: Fix compilation error on 32bit Linux 2017-02-23 17:30:26 +01:00
Sergey Sharybin
4e12113bea Cycles: Fix wrong render results with texture limit and half-float textures 2017-02-23 14:46:22 +01:00
Sergey Sharybin
13e075600a Cycles: Add utility function to convert float to half
Handles overflow and underflow, but not NaN/inf.
2017-02-23 14:42:06 +01:00
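A conversion with the properties named in this commit (overflow and underflow handled, NaN/inf not) might look like the sketch below. This is not the function added by the commit; the name `float_to_half_sketch` is hypothetical, and the choices to clamp overflow to the largest finite half, flush underflow to zero, and truncate the mantissa are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

static_assert(sizeof(float) == 4, "sketch assumes 32-bit IEEE 754 floats");

// Hypothetical sketch: convert a float to IEEE 754 half-precision bits.
// Overflow clamps to the largest finite half (0x7bff), underflow flushes to
// signed zero, and the mantissa is truncated rather than rounded.
// NaN and infinity are NOT treated specially; they end up clamped.
inline uint16_t float_to_half_sketch(float f)
{
    uint32_t bits;
    std::memcpy(&bits, &f, sizeof(bits));

    const uint32_t sign = (bits >> 16) & 0x8000u;
    // Re-bias the exponent from float (bias 127) to half (bias 15).
    const int32_t exponent = static_cast<int32_t>((bits >> 23) & 0xffu) - 127 + 15;
    // Keep the top 10 of the 23 mantissa bits.
    const uint32_t mantissa = (bits >> 13) & 0x3ffu;

    if (exponent <= 0) {
        return static_cast<uint16_t>(sign);  // too small: flush to zero
    }
    if (exponent >= 31) {
        return static_cast<uint16_t>(sign | 0x7bffu);  // too large: clamp
    }
    return static_cast<uint16_t>(sign | (static_cast<uint32_t>(exponent) << 10) | mantissa);
}
```

For example, `1.0f` maps to `0x3c00` and `-2.0f` maps to `0xc000`, while a value such as `1e10f` is clamped to `0x7bff`.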
Sergey Sharybin
2268f41418 Cycles: Update current Cycles version 2017-01-23 10:25:59 +01:00
Sergey Sharybin
254fbcdd7b Cycles: Fix compilation error with older GCC
Hopefully it works on all platforms now.
2017-01-20 11:55:48 +01:00
Sergey Sharybin
78b94902f8 Cycles: Add fast-math safe isnan and isfinite
Currently unused, but might become really handy in the future.
2017-01-19 14:51:11 +01:00
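Under fast-math flags a compiler may assume NaN never occurs and fold a test such as `f != f` to false, so a "safe" check usually inspects the bit pattern with integer operations instead. The sketch below shows that general technique; the function names are hypothetical and the commit's actual implementation is not shown in this log.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

static_assert(sizeof(float) == 4, "sketch assumes 32-bit IEEE 754 floats");

// Reinterpret a float as its raw bits (and back) without aliasing problems.
inline uint32_t float_as_bits(float f)
{
    uint32_t bits;
    std::memcpy(&bits, &f, sizeof(bits));
    return bits;
}

inline float bits_as_float(uint32_t bits)
{
    float f;
    std::memcpy(&f, &bits, sizeof(f));
    return f;
}

// NaN: exponent bits all ones and a non-zero mantissa, i.e. the value
// without its sign bit is strictly greater than the infinity pattern.
inline bool isnan_bits(float f)
{
    return (float_as_bits(f) & 0x7fffffffu) > 0x7f800000u;
}

// Finite: exponent bits are not all ones (rules out both inf and NaN).
inline bool isfinite_bits(float f)
{
    return (float_as_bits(f) & 0x7f800000u) != 0x7f800000u;
}
```

Because only integer comparisons are involved, the result does not depend on floating-point assumptions the optimizer makes.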
Sergey Sharybin
6d36e033ba Cycles: Remove using namespace hell
Please NEVER EVER use such a statement, it's only causing HUGE
issues. What is even worse: it's not always possible to immediately
see that the hell is coming from such a statement.

There are still some such statements in the existing code; will leave
those for a later cleanup.
2017-01-19 14:51:11 +01:00
Lukas Stockner
a2ebc5268f Cycles: Refactor Progress system to provide better estimates
The Progress system in Cycles had two limitations so far:
 - It just counted tiles, but ignored their size. For example, when rendering a 600x500 image with 512x512 tiles, the right 88x500 tile would count for 50% of the progress, although it only covers 15% of the image.
 - Scene update time was incorrectly counted as rendering time - therefore, the remaining time started very long and gradually decreased.

This patch fixes both problems:
First of all, the Progress now has a function to ignore time spans, and that is used to ignore scene update time.
The larger change is the tile size: Instead of counting samples per tile, so that the final value is num_samples*num_tiles, the code now counts every sample for every pixel, so that the final value is num_samples*num_pixels.

Along with that, some unused variables were removed from the Progress and Session classes.

Reviewers: brecht, sergey, #cycles

Subscribers: brecht, candreacchio, sergey

Differential Revision: https://developer.blender.org/D2214
2016-12-03 05:02:21 +01:00
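The difference between the two counting schemes in this commit message can be checked with a small sketch. The function names are hypothetical and do not come from the Progress class; only the arithmetic from the message (tiles versus num_samples*num_pixels) is reproduced.

```cpp
#include <cassert>
#include <cstdint>

// Old scheme (as described above): every tile counts equally.
inline double progress_by_tiles(int tiles_done, int tiles_total)
{
    assert(tiles_total > 0);
    return static_cast<double>(tiles_done) / tiles_total;
}

// New scheme (as described above): count every sample of every pixel,
// so the final value is num_samples * num_pixels.
inline double progress_by_pixel_samples(uint64_t pixel_samples_done,
                                        uint64_t num_pixels,
                                        uint64_t num_samples)
{
    assert(num_pixels > 0 && num_samples > 0);
    return static_cast<double>(pixel_samples_done) /
           static_cast<double>(num_pixels * num_samples);
}
```

For the 600x500 image with 512x512 tiles, finishing only the right 88x500 tile gives `progress_by_tiles(1, 2)` = 0.5, while the pixel-based count gives 44000/300000, roughly 0.147, in line with the 15% quoted in the message.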
Sergey Sharybin
acc1f8fbed Cycles: Add AVX intrinsics helpers
They are defined for MSVC but seem to be missing in GCC and Clang 3.8.

Some further tweaks to the policy of when to define those functions may be
needed, but this should be fine for now.
2016-12-02 12:23:38 +01:00
Sergey Sharybin
0ac2be7030 Cycles: Disable AVX2 crash workarounds
I can no longer reproduce the crash with either of the files where
the crash was originally visible. This is something where other
changes (light threshold, sampling) had an effect and made the code
work as it is supposed to. Could have been an optimizer issue
or something like that.

Let's see if we hit same issue again.
2016-12-02 10:17:05 +01:00
Sergey Sharybin
a537e7b426 Cycles: Fix strict compilation warnings 2016-11-23 10:59:54 +01:00
Sergey Sharybin
751573ce6f Fix T50034: Blender changes processor affinity unauthorized 2016-11-22 16:03:16 +01:00
Sergey Sharybin
4818b3c97e Cycles: Fix re-definition of some functions on x32 arch 2016-11-22 12:34:45 +01:00
Sergey Sharybin
edc10f5529 Cycles: Another attempt to fix compilation on 32bit Linux 2016-11-22 12:11:08 +01:00
Sergey Sharybin
af444e913f Cycles: Attempt to fix 32bit buildbot builds after recent commit 2016-11-22 12:06:16 +01:00
Sergey Sharybin
272412f9c0 Cycles: Implement texture size limit simplify option
The main intention is to give a quick way to control a scene's memory
usage by clamping textures which are too big. This is really handy
in the early production stages, when you first create really nice
looking hi-res textures and only start investing time in optimizing
your scene once it all works and is approved.

This is a new option in the Scene Simplify panel and it acts as
follows: when a texture is bigger than the given value, it is
scaled down by half until it fits into the given limit.

There are various possible improvements, such as:

- Use threaded scaling using our own task manager.

  This is actually one of the main reasons why image resize is
  manually implemented instead of using OIIO's resize. The other
  reason is that the API seems too limited to easily construct a
  3D texture description.

- Vectorization of uchar4/float4/half4 textures.

- Use something smarter than box filter.

  Was playing with some other filters, but not sure they are
  really better: they tend to cause fuzzier edges.

Even with such TODOs in the code the option is already quite
useful.

Reviewers: brecht

Reviewed By: brecht

Subscribers: jtheninja, Blendify, gregzaal, venomgfx

Differential Revision: https://developer.blender.org/D2362
2016-11-22 12:00:09 +01:00
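The halving rule described in this commit message can be sketched as follows. The struct and function names are hypothetical, and comparing the larger of width and height against the limit is an assumption; the message only says the texture is halved until it fits. The actual pixel resampling (a box filter, per the message) is left out.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical 2D texture dimensions in pixels.
struct TextureSize {
    int width;
    int height;
};

// Halve both dimensions until the larger one fits within `limit`.
// Dimensions never drop below 1 pixel.
inline TextureSize apply_texture_limit(TextureSize size, int limit)
{
    assert(limit >= 1);
    while (std::max(size.width, size.height) > limit) {
        size.width = std::max(1, size.width / 2);
        size.height = std::max(1, size.height / 2);
    }
    return size;
}
```

An 8192x4096 texture with a limit of 2048 is halved twice and ends up at 2048x1024; a texture already within the limit is returned unchanged.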
Sergey Sharybin
4ee08e9533 Atomics: Make naming more obvious about which value is being returned 2016-11-15 12:16:26 +01:00
b5a58507f2 Fix Cycles OSL compilation based on modified time not working. 2016-11-12 17:33:07 +01:00
Lukas Stockner
4e68f48227 Cycles: Initialize the RNG state from the kernel instead of the host
This saves a memory copy, which will be particularly useful for network rendering.

Reviewers: sergey, brecht, dingto, juicyfruit, maiself

Differential Revision: https://developer.blender.org/D2323
2016-10-30 11:51:20 +01:00
Lukas Stockner
26bf230920 Cycles: Add optional probabilistic termination of light samples based on their expected contribution
In scenes with many lights, some of them might have a very small contribution to some pixels, but the shadow rays are traced anyways.
To avoid that, this patch adds probabilistic termination to light samples - if the contribution before checking for shadowing is below a user-defined threshold, the sample will be discarded with probability (1 - (contribution / threshold)) and otherwise kept, but weighted more to remain unbiased.
This is the same approach that's also used in path termination based on length.

Note that the rendering remains unbiased with this option, it just adds a bit of noise - but if the setting is used moderately, the speedup gained easily outweighs the additional noise.

Reviewers: #cycles

Subscribers: sergey, brecht

Differential Revision: https://developer.blender.org/D2217
2016-10-30 11:31:28 +01:00
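The termination rule in this message can be written down directly. The sketch below is not the patch itself: the function name is hypothetical, the random number `u` is assumed to be uniform in [0, 1), and returning a weight of 0 to signal a discarded sample is a convention chosen here for brevity.

```cpp
#include <cassert>

// Returns the weight to apply to a light sample, or 0 if it is discarded.
// `contribution` is the sample's contribution before the shadow test,
// `threshold` is the user-defined threshold, `u` is uniform in [0, 1).
inline float light_sample_weight(float contribution, float threshold, float u)
{
    assert(u >= 0.0f && u < 1.0f);
    if (contribution <= 0.0f) {
        return 0.0f;  // nothing to gain from tracing a shadow ray
    }
    if (threshold <= 0.0f || contribution >= threshold) {
        return 1.0f;  // feature disabled or sample is important: always keep
    }
    const float keep_probability = contribution / threshold;
    if (u >= keep_probability) {
        return 0.0f;  // discarded with probability 1 - contribution/threshold
    }
    // Survivors are weighted up so the expected value is unchanged:
    // keep_probability * (contribution / keep_probability) == contribution.
    return 1.0f / keep_probability;
}
```

With a contribution of half the threshold, half the samples are dropped and the survivors count double, which keeps the estimate unbiased at the cost of some extra variance.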
Lukas Stockner
1272ee455e Cycles: Implement texture coordinates for Point, Spot and Area Lamps
When using the Normal output of the Texture Coordinate node on Point and Spot lamps, the coordinates now depend on the rotation of the lamp.
On Area lamps, the Parametric output of the Geometry node now returns UV coordinates on the area lamp.

Credit for the Area lamp part goes to Stefan Werner (from D1995).
2016-10-29 19:24:08 +02:00
Sergey Sharybin
f11298692b Cycles: More workarounds for weird crashes on AVX2
Oh man, is it a compiler bug? Is it something we do stupid?

For now more crap to prevent crashes. During the conference will talk to
Maxym about how we can troubleshoot such weird issues.
2016-10-27 12:51:03 +02:00
Sergey Sharybin
7e380ad4c0 Cycles: Another attempt to fix crashes on AVX2 processors
Basically don't use rcp() in areas which seem to be critical after a
second look. Also disabled some multiplication operators, not sure
yet why they might be a problem.

Tomorrow will be setting up a full test with all cases which were
buggy in our farm to see if this fix is complete.
2016-10-26 22:14:41 +02:00
Sergey Sharybin
35f152358b Cycles: Completely disable transform SSE for now
Was causing issues on another frame.

On a tight schedule, disabling for now so artists are happy.

Still looking into root of the issue!
2016-10-26 15:23:58 +02:00
Sergey Sharybin
7c7d23691f Cycles: Fix crashes after recent optimization commits
There are some precision issues for large-magnitude coordinates which started
to cause weird behavior in release builds. There is also some weird memory usage
in BVH which is tricky to nail down because it only happens in release builds and
GDB reports all variables as optimized out when trying to use RelWithDebInfo.

There are two things in this commit:

- Attempt to make vectorized code closer to original one, hoping that it'll
  eliminate precision issue.
  This seems to work for transform_point().
- A similar trick did not work for transform_direction() even though the absolute
  error here is much smaller. For now that function is disabled; it needs a more
  careful look.
2016-10-26 14:30:25 +02:00
Sergey Sharybin
064caae7b2 Cycles: BVH-related SSE optimization
Several ideas here:

- Optimize calculation of near_{x,y,z} in a way that does not require
  3 if() statements per update, which avoids negative effect of wrong
  branch prediction.

- Optimization of direction clamping for BVH.

- Optimization of point/direction transform.

Brings ~1.5% speedup again, depending on the scene (unfortunately, this
speedup can't be summed across all previous commits because the speedup of
each of the changes varies from scene to scene, but it still seems to
be a nice solid speedup of a few percent on Linux, and a bigger speedup was
reported on Windows).

Once again, thanks Maxym for inspiration!

Still TODO: We have multiple places where we need to calculate near
x,y,z indices in BVH; for now it's only done for the main BVH traversal.
Will try to move this calculation to a utility function and see if
that can be easily re-used across all the BVH flavors.
2016-10-25 14:47:34 +02:00
Sergey Sharybin
af411d918e Cycles: Implement SSE-optimized path of util_max_axis()
The idea here is to avoid if statements which could cause wrong
branch prediction.

Gives a bit of measurable speedup up to ~1%. Still nice :)

Inspired by Maxym Dmytrychenko, thanks!
2016-10-25 13:54:17 +02:00
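One way to pick the largest axis without if statements is to pack the three pairwise comparisons into a bit mask and look the answer up in a small table. The sketch below is a scalar stand-in for that idea, not the SSE code from the commit: the function name is hypothetical, and in a vectorized version the three comparisons would come from a single packed compare followed by a movemask.

```cpp
#include <cassert>

// Return the index (0 = x, 1 = y, 2 = z) of the largest component.
// bit 0: x > y, bit 1: x > z, bit 2: y > z.
inline int max_axis_sketch(float x, float y, float z)
{
    const int mask = static_cast<int>(x > y) |
                     (static_cast<int>(x > z) << 1) |
                     (static_cast<int>(y > z) << 2);
    // Masks 2 and 5 describe contradictory orderings and cannot occur;
    // their table entries are arbitrary.
    static const char axis_for_mask[8] = {2, 2, 2, 0, 1, 2, 1, 0};
    return axis_for_mask[mask];
}
```

For the vector (3, 1, 2) the mask is 3 and the lookup returns 0; ties resolve toward the later axis because the comparisons are strict.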
Sergey Sharybin
cde18cf3b3 Cycles: Fix static initialization order fiasco
Initialization order of global stats and node types was not strictly
defined and it was possible to have node types initialized first and
stats after that. This will zero out memory which was allocated from
the statistics causing assert failure when de-initializing node types.
2016-10-24 13:47:39 +02:00
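The commit message does not say how the ordering was fixed. A common remedy for this class of problem is the construct-on-first-use idiom, sketched below with hypothetical types (`Stats`, `NodeTypeRegistry`) that merely stand in for the global stats and node types mentioned above.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for global memory statistics.
struct Stats {
    std::size_t mem_used = 0;
    void mem_alloc(std::size_t n) { mem_used += n; }
    void mem_free(std::size_t n)
    {
        assert(mem_used >= n);
        mem_used -= n;
    }
};

// Construct on first use: the object is created the first time this function
// is called, so any caller, including another global's constructor, always
// sees a fully constructed Stats.
inline Stats &global_stats()
{
    static Stats stats;
    return stats;
}

constexpr std::size_t kRegistryBytes = 64;

// Hypothetical stand-in for node types that account their memory in Stats.
// Because Stats finishes construction inside this constructor, a global
// NodeTypeRegistry is destroyed before Stats at program exit.
struct NodeTypeRegistry {
    NodeTypeRegistry() { global_stats().mem_alloc(kRegistryBytes); }
    ~NodeTypeRegistry() { global_stats().mem_free(kRegistryBytes); }
};
```

With this arrangement the statistics object can no longer be initialized after (and thereby reset under) an object that already recorded allocations in it.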
Sergey Sharybin
0ddb8d9b13 Cycles: Disable optimization of operator / for float3
This was giving some speedup but made intersection tests fail
from a watertightness point of view.

Needs deeper investigation, but need to quickly get it fixed for
the studio.
2016-10-14 13:53:26 +02:00
Sergey Sharybin
22cdf44101 Cycles: Use const reference for register variables in non-OpenCL code
This is something tested by @LazyDodo and suggested by Maxym to make
MSVC happier.
2016-10-12 14:48:59 +02:00
Sergey Sharybin
e588106d45 Cycles: Use more SSE intrinsics for float3 type
This gives about 5% speedup on AVX2 kernels (other kernels still
have SSE disabled for math operations) and this solves the slowdown
of the koro scene mentioned in the previous commit.

The title says it all actually. This commit also contains
changes to pass float3 as const reference in affected functions.

This should make MSVC happier without breaking OpenCL because it's
only done in areas which are ifdef-ed for non-OpenCL.

Another patch based on inspiration from Maxym Dmytrychenko, thanks!
2016-10-12 14:43:00 +02:00
Sergey Sharybin
6a4ec3ca43 Cycles: Add new avxf vectorized data type
Based on existing ssef data type and to my knowledge it's also what happens in
Embree nowadays.

Inspired by Maxym Dmytrychenko and required for the upcoming triangle
intersection commit.

Hopefully the copyright message is correct.
2016-10-12 13:54:13 +02:00
a3abb020e3 Fix Cycles CUDA performance on CUDA 8.0.
Mostly this is making inlining match CUDA 7.5 in a few performance critical
places. The end result is that performance is now better than before, possibly
due to less register spilling or other CUDA 8.0 compiler improvements.

On benchmark scenes, there are 3% to 35% render time reductions. Stack memory
usage is reduced a little too.

Reviewed By: sergey

Differential Revision: https://developer.blender.org/D2269
2016-10-03 22:15:25 +02:00