One problem is that it was always using __mm_blendv_ps emulation even if the
instruction was supported. The other that the emulation function was wrong.
Thanks a lot to Ray Molenkamp for tracking this one down.
While unlikely to have had any serious effects because of limited use, the
previous implementation was not actually atomic due to a data race and
incorrectly coded CAS loop. We also had duplicates of this code in a few
places, it's now been moved to a single location with all other atomic
operations.
It is defined to & for CPU side compilation, and defined to an empty for any GPU
platform. The idea here is to use this macro instead of #ifdef block with bunch
of duplicated lines just to make it so CPU code is efficient.
Eventually we might switch to references on CUDA as well, but that would require
some intensive testing.
* Remove some unnecessary SSE emulation defines.
* Use full precision float division so we can enable it.
* Add sqrt(), sqr(), fabs(), shuffle variations, mask().
* Optimize reduce_add(), select().
Differential Revision: https://developer.blender.org/D2764
I need to use some macros defined in util_simd.h for float3/float4, to emulate
SSE4 instructions on SSE2. But due to issues with order of header includes this
was not possible, this does some refactoring to make it work.
Differential Revision: https://developer.blender.org/D2764
Two main things here:
1. Replace all unsafe for #line directive characters into a single loop,
avoiding multiple iterations and multiple temporary strings created.
2. Don't merge token char by char but calculate start and end point and
then copy all substring at once.
This gives about 15% speedup of source processing time. At this point
(with all previous commits from today) we've shrinked down compiled
sources size from 108 MB down to ~5.5 MB and lowered processing time
from 4.5 sec down to 0.047 sec on my laptop running Linux (this was a
constant time which Blender will always spent first time loading kernel,
even if we've got compiled clbin).
Basically gather lines as-is during traversal, avoiding allocating
memory for all the lines in headers.
Brings additional performance improvement abut 20%.
The idea here is that it is possible to mark certain include statements
as "precompiled" which means all subsequent includes of that file will
be replaced with an empty string.
This is a way to deal with tricky include pattern happening in single
program OpenCL split kernel which was including bunch of headers about
10 times.
This brings preprocessing time from ~1sec to ~0.1sec on my laptop.
The idea is to re-use files which were already processed. Gives about 4x speedup
of processing time (~4.5sec vs 1.0sec) on my laptop for the whole OpenCL kernel.
For users it will mean lower delay before OpenCL rendering might start.
GCC seems to detect uninitialized into function calls now, but then isn't
always smart enough to see that it is actually initialized. Disabling this
warning entirely seems a bit too much, so initialize a bit more now.
This is not enough to mutex-guard modification code of integer values,
since this operation is NOT atomic. This is not even safe for a single
byte data types.
For now guarded the getter functions, similar to other functions in
this module.
Ideally we want to switch modification to an atomic operations, so we
wouldn't need any locks in the getters.
- Some arguments were inapproriatry tagged as unused
using (void)foo semantic.
Only use such semantic in tricky casses, when something
needs to be ignored in release builds or something is
dependent on tricky ifndef policy.
For rest of the cases just use void foo(int /bar*/)
semantic, which ensures variable is not used. Solves
confusion and code running out of sync with later
development.
- Used proper unused semantic to some arguments.
- Added braces to make code easier to follow, tricky
indentation with ifdef, uh.
Once again, numerical instabilities causing the Cholesky decomposition to fail.
However, further increasing the diagonal correction just because of a few
pixels in very specific scenes and settings seems unjustified.
Therefore, this commit simply falls back to the basic NLM-filtered pixel
if the more advanced model fails.
There were following issues with ccl_restrict_ptr:
- We already had ccl_restrict for all platforms.
- It was secretly adding `const` qualifier to the declaration,
which is quite weird since non-const pointer can also be
declared as restricted.
- We never in Blender are using foo_ptr or FooPtr type definitions,
so not sure why we should introduce such a thing here.
- It is absolutely wrong from semantic point of view to put pointer
into the restrict macro -- const is a part of type, not part of
hint for compiler that some pointer is never aliased.
Use smarter check of where the file is coming from instead of
attempting to replace same source twice with different settings.
Brings down processing time from 3.6sec to 1.8sec.
This commit contains the first part of the new Cycles denoising option,
which filters the resulting image using information gathered during rendering
to get rid of noise while preserving visual features as well as possible.
To use the option, enable it in the render layer options. The default settings
fit a wide range of scenes, but the user can tweak individual settings to
control the tradeoff between a noise-free image, image details, and calculation
time.
Note that the denoiser may still change in the future and that some features
are not implemented yet. The most important missing feature is animation
denoising, which uses information from multiple frames at once to produce a
flicker-free and smoother result. These features will be added in the future.
Finally, thanks to all the people who supported this project:
- Google (through the GSoC) and Theory Studios for sponsoring the development
- The authors of the papers I used for implementing the denoiser (more details
on them will be included in the technical docs)
- The other Cycles devs for feedback on the code, especially Sergey for
mentoring the GSoC project and Brecht for the code review!
- And of course the users who helped with testing, reported bugs and things
that could and/or should work better!