Commit Graph

554 Commits

Author SHA1 Message Date
Sergey Sharybin
51ec9441b7 Cycles: Split vectorized types into separate files
The final goal to reach is to make vectorized types much easier to maintain
and the previous design had following issues:

- Having all types and methods implementation made the source file rather
  bloated and unfun to navigate in.

- It was not possible to quickly glance available API for the type you are
  interested in.

- Adding more vectorization types will bloat the file even more, making
  things even more tricky to follow.
2017-04-25 10:33:26 +02:00
Sergey Sharybin
b06cd746ce Cycles: Cleanup, preprocessor indentation 2017-04-25 10:33:26 +02:00
Sergey Sharybin
360cf8393a Cycles: Make vectorized types constructor from register explicit
This is not a cheap operation which we dont' want to happen silently.
2017-04-13 15:08:00 +02:00
Sergey Sharybin
e6392458d3 Cycles: Remove unused function
It was quite wrong actually by doing some __m128 to flaot4 round trips.
2017-04-13 15:08:00 +02:00
lazydodo
b332fc8f23 [Cycles/msvc] Get cycles_kernel compile time under control.
Ever since we merged the extra texture types (half etc) and spit kernel the compile time for cycles_kernel has been going out of control.

It's currently sitting at a cool 1295.762 seconds with our standard compiler (2013/x64/release)

I'm not entirely sure why msvc gets upset with it, but the inlining of matrix near the bottom of the tri-cubic 3d interpolator is the source of the issue, this patch excludes it from being inlined.

This patch bring it back down to a manageable 186 seconds. (7x faster!!)

with the attached bzzt.blend that @sergey  kindly provided i got the following results with builds with identical hashes

58:51.73 buildbot
58:04.23 Patched

it's really close, the slight speedup could be explained by the switch instead of having multiple if's (switches do generate more optimal code than a chain of if/else/if/else statements) but in all honesty it might just have been pure luck (dev box,very polluted, bad for benchmarks) regardless, this patch doesn't seem to slow down anything with my limited testing.

{F532336}

{F532337}

Reviewers: brecht, lukasstockner97, juicyfruit, dingto, sergey

Reviewed By: brecht, dingto, sergey

Subscribers: InsigMathK, sergey

Tags: #cycles

Differential Revision: https://developer.blender.org/D2595
2017-04-07 10:26:55 -06:00
Sergey Sharybin
3ce30823ff Cycles: Add utility class to simplify scoped spin locks 2017-04-05 14:57:34 +02:00
Mai Lavelle
4b7d95290f Cycles: More fixes after include changes 2017-03-31 10:12:13 +02:00
Sergey Sharybin
0579eaae1f Cycles: Make all #include statements relative to cycles source directory
The idea is to make include statements more explicit and obvious where the
file is coming from, additionally reducing chance of wrong header being
picked up.

For example, it was not obvious whether bvh.h was refferring to builder
or traversal, whenter node.h is a generic graph node or a shader node
and cases like that.

Surely this might look obvious for the active developers, but after some
time of not touching the code it becomes less obvious where file is coming
from.

This was briefly mentioned in T50824 and seems @brecht is fine with such
explicitness, but need to agree with all active developers before committing
this.

Please note that this patch is lacking changes related on GPU/OpenCL
support. This will be solved if/when we all agree this is a good idea to move
forward.

Reviewers: brecht, lukasstockner97, maiself, nirved, dingto, juicyfruit, swerner

Reviewed By: lukasstockner97, maiself, nirved, dingto

Subscribers: brecht

Differential Revision: https://developer.blender.org/D2586
2017-03-29 13:41:11 +02:00
Sergey Sharybin
61db9ee27a Cycles: Attempt to workaround compilation error on new CUDA toolkit and sm_2x 2017-03-29 11:50:17 +02:00
Sergey Sharybin
6ea54fe9ff Cycles: Switch to reformulated Pluecker ray/triangle intersection
The intention of this commit it to address issues mentioned in the
reports T43865,T50164 and T50452.

The code is based on Embree code with some extra vectorization
to speed up single ray to single triangle intersection.

Unfortunately, such a fix is not coming for free. There is some
slowdown for AVX2 processors, mainly due to different vectorization
code, which caused different number of instructions to be executed
and different instructions-per-cycle counters. But on another hand
this commit makes pre-AVX2 platforms such as AVX and SSE4.1 a bit
faster. The prerformance goes as following:

              2.78c AVX2   2.78c AVX   Patch AVX2         Patch AVX
BMW            05:21.09     06:05.34    05:32.97 (+3.5%)   05:34.97 (-8.5%)
Classroom      16:55.36     18:24.51    17:10.41 (+1.4%)   17:15.87 (-6.3%)
Fishy Cat      08:08.49     08:36.26    08:09.19 (+0.2%)   08:12.25 (-4.7%
Koro           11:22.54     11:45.24    11:13.25 (-1.5%)   11:43.81 (-0.3%)
Barcelone      14:18.32     16:09.46    14:15.20 (-0.4%)   14:25.15 (-10.8%)

On GPU the performance is about 1.5-2% slower in my tests on GTX1080
but afraid we can't do much as a part of this chaneg here and
consider it a price to pay for more proper intersection check.

Made in collaboration with Maxym Dmytrychenko, big thanks to him!

Reviewers: brecht, juicyfruit, lukasstockner97, dingto

Differential Revision: https://developer.blender.org/D1574
2017-03-28 17:26:47 +02:00
Sergey Sharybin
3f61280327 Cycles: Pass m128 vectors by const reference 2017-03-28 11:01:11 +02:00
Sergey Sharybin
bd053ac7ba Cycles: Correct ifdef around float3 intrinsics 2017-03-27 16:13:07 +02:00
Sergey Sharybin
5b45715f8a Cycles: Correct isfinite check used in integrator
Use fast-math friendly version of this function.

We should probably avoid unsafe fast math, but this is to be done with
real care with all the benchmarks properly done.

For now comitting much safer fix.
2017-03-24 15:39:33 +01:00
Sergey Sharybin
a96110e710 Cycles: Remove old non-optimized triangle intersection function
It is unused now and if we want similar function we should use
Pluecker intersection which is same performance with SSE optimization
but which is more watertight.
2017-03-23 17:59:34 +01:00
Sergey Sharybin
a1348dde2e Cycles: Fix speed regression on GPU
Avoid construction of temporary array and make utility function force-inlined.
Additionally avoid calling float4_to_float3 twice.

This brings render times to the same values as before current patch series.
2017-03-23 17:45:19 +01:00
Sergey Sharybin
a5b6742ed2 Cycles: Move watertight triangle intersection to an utility file
This way the code can be reused more easily.
2017-03-23 17:45:19 +01:00
Sergey Sharybin
f8a999c965 Cycles: Move triangle intersection precalc to an util file
This is a preparation work for the followup commit which wil l move
remaining parts of Woop intersection logic to an utility file.

Doing it as a separate commit to keep changes more atomic and easier
to bisect when/if needed.
2017-03-23 17:45:19 +01:00
Sergey Sharybin
b797a5ff78 Cycles: Cleanup, move utility function to utility file
Was an old TODO, this function is handy for some math utilities as well.
2017-03-23 17:45:19 +01:00
Sergey Sharybin
aa0602130b Cycles: Cleanup, code style and comments 2017-03-23 17:45:19 +01:00
Sergey Sharybin
1c5cceb7af Cycles: Move intersection math to own header file
There are following benefits:

- Modifying intersection algorithm will not cause so much re-compilation.
- It works around header dependency hell and allows us to use vectorization
  types much easier in there.
2017-03-23 17:45:19 +01:00
Sergey Sharybin
5c06ff8bb9 Cycles: Cleanup, remove unused function 2017-03-23 17:45:19 +01:00
Mai Lavelle
96868a3941 Fix T50888: Numeric overflow in split kernel state buffer size calculation
Overflow led to the state buffer being too small and the split kernel to
get stuck doing nothing forever.
2017-03-11 05:39:28 -05:00
2d3c44389a Fix OpenCL warnings about doubles on some platforms. 2017-03-11 00:55:23 +01:00
Hristo Gueorguiev
9de9f25b24 Cycles: add single program debug option for split kernel
Single program generally compiles kernels faster (2-3 times), loads faster,
takes less drive space (2-3 times), and reduces the number of cached kernels.
2017-03-09 17:09:37 +01:00
Sergey Sharybin
75cb4850f0 Cycles: Use 1-based line number for #line directives
AMD CPU platform was complaining about #line 0 directives in the code.
2017-03-08 12:45:18 +01:00
Mai Lavelle
817873cc83 Cycles: CUDA implementation of split kernel 2017-03-08 01:24:53 -05:00
Mai Lavelle
0892352bfe Cycles: CPU implementation of split kernel 2017-03-08 00:52:41 -05:00
Sergey Sharybin
a87766416f Cycles: Report device maximum allocation and detected global size 2017-03-08 00:52:41 -05:00
Mai Lavelle
230c00d872 Cycles: OpenCL split kernel refactor
This does a few things at once:

- Refactors host side split kernel logic into a new device
  agnostic class `DeviceSplitKernel`.
- Removes tile splitting, a new work pool implementation takes its place and
  allows as many threads as will fit in memory regardless of tile size, which
  can give performance gains.
- Refactors split state buffers into one buffer, as well as reduces the
  number of arguments passed to kernels. Means there's less code to deal
  with overall.
- Moves kernel logic out of OpenCL kernel files so they can later be used by
  other device types.
- Replaced OpenCL specific APIs with new generic versions
- Tiles can now be seen updating during rendering
2017-03-08 00:52:41 -05:00
Mai Lavelle
dfd6055eb0 Cycles: Add more atomic operations 2017-03-08 00:52:41 -05:00
Sergey Sharybin
0e995e0bfe Cycles: Fix strict -Wpedantic warnings with GCC
Patch by Stefan Werner, thanks!
2017-03-06 14:18:26 +01:00
Aaron Carlisle
6d1ac79514 Cleanup: Grey --> Gray 2017-02-27 19:33:57 -05:00
Sergey Sharybin
1e29286c8c Cycles: Fix compilation warning with CUDA on OSX 2017-02-24 14:33:10 +01:00
Sergey Sharybin
50328b41a7 Cycles: Fix compilation error on 32bit Linux 2017-02-23 17:30:26 +01:00
Sergey Sharybin
4e12113bea Cycles: Fix wrong render results with texture limit and half-float textures 2017-02-23 14:46:22 +01:00
Sergey Sharybin
13e075600a Cycles: Add utility function to convert float to half
handles overflow and underflow, but not NaN/inf.
2017-02-23 14:42:06 +01:00
Sergey Sharybin
2268f41418 Cycles: Update current Cycles version 2017-01-23 10:25:59 +01:00
Sergey Sharybin
254fbcdd7b Cycles: Fix compilation error on with older GCC
Hopefully it works on all platforms now.
2017-01-20 11:55:48 +01:00
Sergey Sharybin
78b94902f8 Cycles: Add fast-math safe isnan and isfinite
Currently unused, but might become really handy in the future.
2017-01-19 14:51:11 +01:00
Sergey Sharybin
6d36e033ba Cycles: Remove using namespace hell
Please NEVER EVER use such a statement, it's only causing HUGE
issues. What is even worse: it's not always possible to immediately
see that the hell is coming from such a statement.

There is still some statements in the existing code, will leave
those for a later cleanup.
2017-01-19 14:51:11 +01:00
Lukas Stockner
a2ebc5268f Cycles: Refactor Progress system to provide better estimates
The Progress system in Cycles had two limitations so far:
 - It just counted tiles, but ignored their size. For example, when rendering a 600x500 image with 512x512 tiles, the right 88x500 tile would count for 50% of the progress, although it only covers 15% of the image.
 - Scene update time was incorrectly counted as rendering time - therefore, the remaining time started very long and gradually decreased.

This patch fixes both problems:
First of all, the Progress now has a function to ignore time spans, and that is used to ignore scene update time.
The larger change is the tile size: Instead of counting samples per tile, so that the final value is num_samples*num_tiles, the code now counts every sample for every pixel, so that the final value is num_samples*num_pixels.

Along with that, some unused variables were removed from the Progress and Session classes.

Reviewers: brecht, sergey, #cycles

Subscribers: brecht, candreacchio, sergey

Differential Revision: https://developer.blender.org/D2214
2016-12-03 05:02:21 +01:00
Sergey Sharybin
acc1f8fbed Cycles: Add AVX intrinsics helpers
They are defined for MSVC but seems to be missing in GCC and CLang-3.8.

Maybe some further tweaks to policy when to define those functions is
needed, but should be fine for now.
2016-12-02 12:23:38 +01:00
Sergey Sharybin
0ac2be7030 Cycles: Disable AVX2 crash workarounds
I can no longer reproduce crash with neither of the files where
the crash was originally visible. This is something where other
changes (light threshold, sampling) had an effect and made code
to work as it is supposed to. Could have been optimizator issue
or something like that.

Let's see if we hit same issue again.
2016-12-02 10:17:05 +01:00
Sergey Sharybin
a537e7b426 Cycles: Fix strict compilation warnings 2016-11-23 10:59:54 +01:00
Sergey Sharybin
751573ce6f Fix T50034: Blender changes processor affinity unauthorized 2016-11-22 16:03:16 +01:00
Sergey Sharybin
4818b3c97e Cycles: Fix re-definition of some functions on x32 arch 2016-11-22 12:34:45 +01:00
Sergey Sharybin
edc10f5529 Cycles: Another attempt to fix compilation on 32bit Linux 2016-11-22 12:11:08 +01:00
Sergey Sharybin
af444e913f Cycles: Attempt to fix 32bit buildbot builds after recent commit 2016-11-22 12:06:16 +01:00
Sergey Sharybin
272412f9c0 Cycles: Implement texture size limit simplify option
Main intention is to give some quick way to control scene's memory
usage by clamping textures which are too big. This is really handy
on the early production stages when you first create really nice
looking hi-res textures and only when it all works and approved
start investing time on optimizing your scene.

This is a new option in Scene Simplify panel and it acts as
following: when texture size is bigger than the given value it'll
be scaled down by half for until it fits into given limit.

There are various possible improvements, such as:

- Use threaded scaling using our own task manager.

  This is actually one of the main reasons why image resize is
  manually-implemented instead of using OIIO's resize. Other
  reason here is that API seems limited to construct 3D texture
  description easily.

- Vectorization of uchar4/float4/half4 textures.

- Use something smarter than box filter.

  Was playing with some other filters, but not sure they are
  really better: they kind of causes more fuzzy edges.

Even with such a TODOs in the code the option is already quite
useful.

Reviewers: brecht

Reviewed By: brecht

Subscribers: jtheninja, Blendify, gregzaal, venomgfx

Differential Revision: https://developer.blender.org/D2362
2016-11-22 12:00:09 +01:00
Sergey Sharybin
4ee08e9533 Atomics: Make naming more obvious about which value is being returned 2016-11-15 12:16:26 +01:00