The code was templated already, so don't see big reason to have
3 versions of templated functions. It was giving some extra code
to maintain and in fact already had divergency for support of huge
image resolution (missing size_t cast in byte image loading).
There should be no changes visible by artists.
`kernel_path.h` and `kernel_path_branched.h` have a lot of conditional code and
it was kind of hard to tell what code belonged to which directive. Should be
easier to read now.
Add subframe to the animated seed hash calculation.
Should be no difference for the regular files, only for cases when scene is
rendered from sequencer with a speed effect, which is not really a common thing.
This is a late follow-up commit to the light sample threshold changes which
caused difference in rendering all existing .blend files which is not something
we are happy about: it is fine to use new optimized defaults for new files, but
existing ones should always be rendering in the same way as they used to be.
Sorry for the inconveniece, but such thing should have been done to begin with.
If this setting was modified it will not be reset to zero.
Now all render tests should be passing again.
P.S. Also really annoying to bump subversion for such reasons, but currently we
don't have better way to achieve what we want.
After CUDA dynload changes having CUDA toolkit became required
in order to compile Cycles. This only happened due to wrong
default value to the option.
Previously, it was only possible to choose a single GPU or all of that type (CUDA or OpenCL).
Now, a toggle button is displayed for every device.
These settings are tied to the PCI Bus ID of the devices, so they're consistent across hardware addition and removal (but not when swapping/moving cards).
From the code perspective, the more important change is that now, the compute device properties are stored in the Addon preferences of the Cycles addon, instead of directly in the User Preferences.
This allows for a cleaner implementation, removing the Cycles C API functions that were called by the RNA code to specify the enum items.
Note that this change is neither backwards- nor forwards-compatible, but since it's only a User Preference no existing files are broken.
Reviewers: #cycles, brecht
Reviewed By: #cycles, brecht
Subscribers: brecht, juicyfruit, mib2berlin, Blendify
Differential Revision: https://developer.blender.org/D2338
With this fix, using a MIS map resolution equal to the image size for closest imterpolation or twice the size for linear interpolation gets rid of all fireflies.
Previously, a much higher resolution was needed to get acceptable noise levels.
Basically, the problem here was that the transform that's used to bring texture coordinates
to world space is either fetched while setting up the shader (with Object Motion is enabled) or
fetched when needed (otherwise). That helps to save ShaderData memory on OpenCL when Object Motion isn't needed.
Now, if OM is enabled, the Lamp transform can just be stored inside the ShaderData as well. The original commit just assumed it is.
However, when it's not (on OpenCL by default, for example), there is no easy way to fetch it when needed, since the ShaderData doesn't
store the Lamp index.
So, for now the lamps just don't support local texture coordinates anymore when Object Motion is disabled.
To fix and support this properly, one of the following could be done:
- Just always pre-fetch the transform. Downside: Memory Usage increases when not using OM on OpenCL
- Add a variable to ShaderData that stores the Lamp ID to allow fetching it when needed
- Store the Lamp ID inside prim or object. Problem: Cycles currently checks these for whether an object was hit - these checks would need to be changed.
- Enable OM whenever a Texture Coordinate's Normal output is used. Downside: Might not actually be needed.
This allows to save a memory copy, which will be particularly useful for network rendering.
Reviewers: sergey, brecht, dingto, juicyfruit, maiself
Differential Revision: https://developer.blender.org/D2323
In scenes with many lights, some of them might have a very small contribution to some pixels, but the shadow rays are traced anyways.
To avoid that, this patch adds probabilistic termination to light samples - if the contribution before checking for shadowing is below a user-defined threshold, the sample will be discarded with probability (1 - (contribution / threshold)) and otherwise kept, but weighted more to remain unbiased.
This is the same approach that's also used in path termination based on length.
Note that the rendering remains unbiased with this option, it just adds a bit of noise - but if the setting is used moderately, the speedup gained easily outweighs the additional noise.
Reviewers: #cycles
Subscribers: sergey, brecht
Differential Revision: https://developer.blender.org/D2217
This option allows to create a smoother transition between Bricks and Mortar - 0 applies no smoothing, and 1 smooths across the whole mortar width.
Mainly useful for displacement textures.
The new default value for the smoothing option is 0.1 to give some smoothing that helps with antialiasing, but existing nodes are loaded with smoothing 0 to preserve compatibility.
Reviewers: sergey, dingto, juicyfruit, brecht
Reviewed By: brecht
Subscribers: Blendify, nutel
Differential Revision: https://developer.blender.org/D2230
When using the Normal output of the Texture Coordinate node on Point and Spot lamps, the coordinates now depend on the rotation of the lamp.
On Area lamps, the Parametric output of the Geometry node now returns UV coordinates on the area lamp.
Credit for the Area lamp part goes to Stefan Werner (from D1995).
Oh man, is it a compiler bug? Is it something we do stupid?
For now more crap to prevent crashes. During the conference will talk to
Maxyn about how can we troubleshoot such weird issues.
Basically don't use rcp() in areas which seems to be critical after
second look. Also disabled some multiplication operators, not sure
yet why they might be a problem.
Tomorrow will be setting up a full test with all cases which were
buggy in our farm to see if this fix is complete.
There is some precision issues for big magnitude coordinates which started
to give weird behavior of release builds. Some weird memory usage in BVH
which is tricky to nail down because only happens in release builds and GDB
reports all variables as optimized out when trying to use RelWithDebInfo.
There are two things in this commit:
- Attempt to make vectorized code closer to original one, hoping that it'll
eliminate precision issue.
This seems to work for transform_point().
- Similar trick did not work for transform_direction() even tho absolute
error here is much smaller. For now disabled that function, need a more
careful look here.
Several ideas here:
- Optimize calculation of near_{x,y,z} in a way that does not require
3 if() statements per update, which avoids negative effect of wrong
branch prediction.
- Optimization of direction clamping for BVH.
- Optimization of point/direction transform.
Brings ~1.5% speedup again depending on a scene (unfortunately, this
speedup can't be sum across all previous commits because speedup of
each of the changes varies from scene to scene, but it still seems to
be nice solid speedup of few percent on Linux and bigger speedup was
reported on Windows).
Once again ,thanks Maxym for inspiration!
Still TODO: We have multiple places where we need to calculate near
x,y,z indices in BVH, for now it's only done for main BVH traversal.
Will try to move this calculation to an utility function and see if
that can be easily re-used across all the BVH flavors.
Similar to the previous commit, avoid negative effect of bad branch prediction.
Gives measurable performance up to ~2% in tests here.
Once again, thanks to Maxym Dmytrychenko!
The idea here is to avoid if statements which could cause wrong
branch prediction.
Gives a bit of measurable speedup up to ~1%. Still nice :)
Inspired by Maxym Dmytrychenko, thanks!
Similar to regular triangle intersection case. Gives about 3% speedup rendering
SSS object on my desktop,
Question: how to avoid such a code duplication in a nice way without speed loss?
This will confuse hell of a guarded allocators because it is possible
to have allocation happened prior to Blender's guarded allocator is
fully initialized.
This was causing crashes and assert failures when running blender
with fully guarded memory allocator.
Initialization order of global stats and node types was not strictly
defined and it was possible to have node types initialized first and
stats after that. This will zero out memory which was allocated from
the statistics causing assert failure when de-initializing node types.
It was possible to have non-initialized unaligned BVH split
to be used when regular BVH split SAH was inf. Now we ensure
that unaligned splitter is only used when it's really initialized.
It's a regression and should be in 2.78a.
Note that volume rendering is not supported yet, this is a step towards that.
Reviewed By: brecht
Differential Revision: https://developer.blender.org/D2299
Previously an error message would be printed whenever the OpenCL build produced output.
However, some frameworks seem to print extra information even if the build succeeded, so now the actual returned error is checked as well.
When --debug-cycles is activated, the build output will always be printed, otherwise it only gets printed if there was an error.
The problem here was, as the title says, that the two kernels were swapped.
Since shader evaluation is only used for building the samling map when World MIS is enabled, rendering without it would still work fine, although baking also was broken.
The previous refactor changed the code to use a separate logging mechanism to support multithreaded compilation.
However, since that's not supported by any frameworks yes, it just resulted in bad logging behaviour.
So, this commit changes the logging to go diectly to stdout/stderr once again by default.
This was giving some speedup but made intersection tests to fail
from watertight point of view.
Needs deeper investigation, but need to quickly get it fixed for
the studio.
This gives about 5% speedup on AVX2 kernels (other kernels still
have SSE disabled for math operations) and this solves the slowdown
of koro scene mention in the previous commit.
The title says it all actually. This commit also contains
changes to pass float3 as const reference in affected functions.
This should make MSVC happier without breaking OpenCL because it's
only done in areas which are ifdef-ed for non-OpenCL.
Another patch based on inspiration from Maxym Dmytrychenko, thanks!
This commit basically vectorizes existing code using AVX2 instructions
(without modifying algorithm itself). This gives quite nice speedups:
BMW: -8%
Classroom: -5%
Cat: -5%
Koro: +1%
Barcelona: -8%
That's on Linux machine, reported performance improvement on Windows
goes up to 20%.
Not currently sure why Koro is somewhat slower because it mainly uses
curve intersection tests, could be a time noise? Or osmething with the
cache utilization perhaps? In any case speedup in other scenes makes
me thinking that current state is acceptable for initial implementation.
This is again inspired by Maxym Dmytrychenko.
Based on existing ssef data type and to my knowledge it's also what happens in
Embree nowadays.
Inspired by Maxym Dmytrychenko and required for the upcoming triangle
intersection commit.
Hopefully the copyright message is correct.
When ray hits curve segment with SSS shader it was possible to have
uninitialized hit_P variable used for sampling.
Seems that was a reason of our headache of difference between AVX2
and SSE4 render results here, so now we can revert all the nasty
ifdef-ed inline policies.
Mostly this is making inlining match CUDA 7.5 in a few performance critical
places. The end result is that performance is now better than before, possibly
due to less register spilling or other CUDA 8.0 compiler improvements.
On benchmarks scenes, there are 3% to 35% render time reductions. Stack memory
usage is reduced a little too.
Reviewed By: sergey
Differential Revision: https://developer.blender.org/D2269
Fixes for clamp-omp, fewer shared variables, fix some cases of threads writing
to the same memory location. Issue found by Jens Verwiebe, who reports 30%
speedup with 16 core CPU, when using this with a recent clang-omp version.
This is also an important mathematical operation that can be folded
if it is known that one argument is a certain constant. For colors
the operation is provided as a Gamma node.
The SVM Gamma node needs a small fix to make it follow the 0 ^ 0 == 1
rule, same as the Power node, or the Gamma node itself in OSL mode.
Reviewers: #cycles
Differential Revision: https://developer.blender.org/D2263
It will discard the whole tile, but it's still kind of more friendly than
fully locked interface (sort of) for until tile is fully sampled.
Sorry if it causes PITA to merge for the opencl split work, but this issue
bothering a lot when collecting benchmarks.
Previously it was falling back to just a path after #include
statement was finished. Now we fall back to a proper current
file name after dealing with the preprocessor statement.
Some of the files were wrongly attributing code to some other
organizations and in few places proper attribution was missing.
This is mainly either a copy-paste error (when new file was
created from an existing one and header wasn't updated) or due
to some refactor which split non-original-BF code with purely
BF code.
Should solve some confusion around.
In Windows, event dispatching code is throwing out the wheel scroll count value.
Despite of how many fast you move the wheel, it only make one-notch scroll event.
This patch convert wheel event to multiple 1-notch wheel events.
This also correct the handling of smooth scroll mouse wheel (which can report smaller than 1-notch wheel movement) by accumulating the small wheel delta values.
Reviewers: djnz, shadowrom, elubie, #platform:_windows, sergey, juicyfruit, brecht
Reviewed By: shadowrom, elubie, #platform:_windows, brecht
Subscribers: dingto, elubie, brachi, brecht
Differential Revision: https://developer.blender.org/D143
The light sampling functions calculate light sampling PDF for the case that the light has been randomly selected out of all lights.
However, since BPT handles lamps and meshlights separately, this isn't the case. So, to avoid a wrong result, the code just included the 0.5 factor in the throughput.
In theory, however, the correction should be made to the sampling probability, which needs to be doubled. Now, for the regular calculation, that's no real difference since the throughput is divided by the pdf.
However, it does matter for the MIS calculation - it's unbiased both ways, but including the factor in the PDF instead of the throughput should give slightly better results.
Reviewers: sergey, brecht, dingto, juicyfruit
Differential Revision: https://developer.blender.org/D2258
We raised the minimum to GL 2.1 in Blender 2.77, and dropped support for older GPUs (pre-2012 Intel mostly). On Windows you get a popup message, but on Mac we simply crashed. Every Mac has a builtin software renderer for GL 2.1 so let's use that when the GPU is not capable!
Run blender --debug-gpu to see version detection & software fallback.
Idea here is to select the lowest isolation level that wont compromise quality.
By using the lowest level we save memory and processing time. This will also
help avoid precision issues that have been showing up from using the highest
level (T49179, T49257).
This is a pretty simple heuristic that gives ok results. There's more we could
do here, such as filtering for vertices/edges adjacent geometric features that
need isolation instead of checking them all, but the logic there could get a
bit involved.
There's potential for slight popping of edges during animation if the dice
rate is low, but I don't think this should be a problem since low dice rates
really shouldn't be used in animation anyways.
Reviewed By: brecht, sergey
Differential Revision: https://developer.blender.org/D2240
This reverts commit ecbfa31caaadb03c53c0fe1459718b99613c8804.
Original commit broke logic in nodes re-fitting. That area can
access non-existing children momentarely. Not sure what would
be best solution here, for now simply reverting the change/
Problem was zero length normal caused by a precision issue in patch evaluation.
This is somewhat of a quick fix, but is better than allowing possible NaNs to
occur and cause problems elsewhere.
Both spot and area light have large areas where they're not visible.
Therefore, this patch stops the light sampling code when one of these cases (outside of the spotlight cone or behind the area light) occurs, before the lamp shader is evaluated.
In the case of the area light, the solid angle sampling can also be skipped.
In a test scene with Sample All Lights and 18 Area lamps and 9 Spot lamps that all point away from the area that the camera sees, render time drops from 12sec to 5sec.
Reviewers: brecht, sergey, dingto, juicyfruit
Differential Revision: https://developer.blender.org/D2216
Small issues in GHOST
- use NSApplicationDelegate protocol for our app delegate
- make sure NSApp is initialized before using
(cherry picked from commit df7be04ca6d4b6dccc998445386228699d72d072)
The title says it all actually. From tests with barber shop scene here
gives 2-3x speedup for shader compilation on my oldie i7 machine. The
gain is mainly due to textures metadata query from jpeg files (which
seems to requite de-compression before metadata can be read). But in
theory could give nice improvements for scenes with huge node trees
as well (i'm talking about node trees of complexity of fractal which
we had reports about in the past).
Reviewers: juicyfruit, dingto, lukasstockner97, brecht
Reviewed By: brecht
Subscribers: monio, Blendify
Differential Revision: https://developer.blender.org/D2215
The issue was caused by some false-positive empty non-AABB intersection.
Tried to tweak it a bit so it does not record intersection anymore.
Hopefully will work for all platforms. Tested here on iMac and Debian.
Basically just moves cached kernels from ~/.config/blender/BLENDER_VERSION to
~/.cache/cycles/kernels. This has following benefits:
- Follows XDG specification more closely,
not as if it's totally crucial or measurable by users, but still nice.
- Prevents unexpected sizes of config folder, makes disk space used in more
predictable for users way.
- Allows to share kernels across multiple Blender versions,
which makes it easier debugging at the times close to release.
- "Copy Previous Settings" operator will no longer be copying possibly
gigabytes of cached kernels, which used to lead to really nast disk usage
and annoying delays of copying settings.
- In the future we can have some smart logic to clear old unused cached
kernels.
Currently only done for Linux and OSX. Windows still follows old "cache"
folder logic, but it's not really important for now because we don't
support kernel compilation on this platform yet.
Reviewers: dingto, juicyfruit, brecht
Reviewed By: brecht
Differential Revision: https://developer.blender.org/D2197
Constant folding was removing all nodes connected to the displacement output
if they evaluated to a constant, causing there to be no valid graph for
displacement even when there was displacement to be applied, and sometimes
caused crashes.
Using ones complement for detecting if transform has been applied was confusing
and led to several bugs. With this proper checks are made.
Also added a few transforms where they were missing, mostly affecting baking
and displacement when `P` is used in the shader (previously `P` was in the
wrong space for these shaders)
Also removed `TIME_INVALID` as this may have resulted in incorrect
transforms in some cases.
Reviewed By: brecht
Differential Revision: https://developer.blender.org/D2192
Bump mapping was happening in world space while displacement happens in object
space, causing shading errors when displacement type was used with bump mapping.
To fix this the proper transforms are added to bump nodes. This is only done
for automatic bump mapping however, to avoid visual changes from other uses of
bump mapping. It would be nice to do this for all bump mapping to be consistent
but that will have to wait till we can break compatibility.
Reviewed By: brecht
Differential Revision: https://developer.blender.org/D2191
The existing code uses the input value count of the first channel
for all of them. If the first channel is the largest, it leads to
a crash-causing buffer overrun in memcpy below. Likely this was
left since the time when only one channel was supported.
As a crash fix, probably should go into 2.78
This commits changes two things:
* It adds more keysyms preferably taken from XLookupKeysym than XLookupString (namely, all numpad ones).
* It falls back to keysyms from XLookupKeysym in other cases, when XLookupString does not produce anything we know of.
Finding the correct balance here is far from easy, but think we are comming rather close to it now...
Most of the time, Lamps in Cycles are just a constant emission closure, no texturing etc. Therefore, running a full shader evaluation is wasteful.
To avoid that, Cycles now detects these constant emission shaders and stores their value in the lamp data along with a flag in the shader.
Then, at runtime, if this flag is set, the lamp code just uses this value and only runs the full shader evaluation if it is neccessary.
In scenes with a lot of lamps and with "Sample all direct/indirect" enabled, this saves up to 20% of rendering time in my tests.
Reviewers: #cycles
Differential Revision: https://developer.blender.org/D2193
For proper indexing to work we need to use unaligned node with
identity transform instead of aligned nodes when doing refit.
To be backported to 2.78 release.
Weirdly enough, this version of XCode seems to have static_assert()
even when NOT using C++11. This is totally weird and counter intuitive
since static_assert() is supposed to be C++11 onlky feature.
Can XCode stop using future, please? :)
The two SVM nodes added with e7ea1ae78c caused a slowdown on AMD cards when rendering with OpenCL, whether displacement was used or not.
In the Barcelona Pavillon scene on a RX480, this would cause a 12% slowdown.
Therefore, this commit adds a additional flag for feature-adaptive compilation so that the new SVM nodes are only enabled when they are needed (Node tree connected to the Displacement output and Displacement type set to Both).
Also, the nodes were also added to shaders when the Displacement Type was set to Bump (the default), which was unneccessary and is fixed now.
Thanks to linda2 on IRC for reporting and testing and to maiself for help with the displacement shader code.
This fix might be relevant for 2.78, but it should be tested further before including it.
Could not reproduce it here, but since users having the issue claims it comes from
rB16cb9391634dcc50e, let's try to use again ugly `XLookupKeysym()` for those modifier keys too...
Object coordinates can now be used in the displacement shader and will give
correct results, where as before bump mapping was calculated from the displace
positions and resulted in incorrect shading.
This works by evaluating the shader in two parts, first bump then surface, and
setting the shader state to match what it would be if the surface was
undisplaced for the bump shader evaluation. Currently only `P` is set as if
undisplaced, but other shader variables could be set as well, such as `I` or
`time`. Since these aren't set to anything meaningful for displacement I left
them out of this patch, we can decide what to do with them separately.
Reviewed By: brecht
Differential Revision: https://developer.blender.org/D2156
Storing multiple copies of a shader was needed when the displacement method was
a mesh option and could be different for each mesh. Now that its a shader option
this is unnecessary.
Reviewed By: brecht
Differential Revision: https://developer.blender.org/D2156
There basically are two issues here: in smooth mode (and all non-tangent
normal map types) it doesn't invert the normal for backfacing polys;
on the other hand for flat shaded tangent type it is inverted too soon.
This fix does a brute force correction by checking the backfacing flag.
Reviewers: #cycles, brecht
Reviewed By: #cycles, brecht
Differential Revision: https://developer.blender.org/D2181
Meshes with Cycles subdivision were being transformed to world space leading to
normals to sometimes be calculated in that space, while they should be in
object space. Also caused dicing to happen at the wrong rate for scaled meshes.
Users have been getting a bit confused by the way things are worded/arranged in
the UI. This patch makes a few changes to the UI to make it more clear how to
use subdivision:
- make Subdivide UVs option inactive when adaptive subdivision is enabled as UV
subdivision is currently unsupported
- add "px" to dicing rates in the Geometry Panel
- display the final dicing rate in the modifier
- reworded "Dicing Rate" in the modifier to "Dicing Scale" to make more clear
that this is a multiplier for the scene dicing rate and added a note the the
tooltip pointing the user to that setting in the Geometry Panel
Reviewed By: brecht
Differential Revision: https://developer.blender.org/D2174
Changes from microdisplacement work broke previous support for subdivision
meshes, sometimes leading to crashes; this makes things work again. Files
that contain "patch" nodes will need to be updated to use meshes instead, as
specifying patches was both inefficient and completely unsupported by the new
subdivision code.
The if branches were reordered when the original patch was
committed, which broke the implicit non-NULL guarantee on link.
To prevent re-occurrence, add a couple of unit tests.
For best results use the latest 3Dconnexion driver. But latest is only
supported on Mac OS 10.9+. We go all the way back to Mac OS 10.6 so
have to deal with older driver versions.
See the original dlclose line for my faulty assumption. Waiting to
unload the driver later fixes the crash. Newer drivers don’t seem to
have this issue.
Also removed WITH_INPUT_NDOF guards as NDOFManager.h takes care of
this. Follow-up to b10d005 a few days ago.
When WITH_INPUT_NDOF is disabled, 3D mouse handling code is removed
from:
- GHOST (was mostly done, finished the job)
- window manager
- various editors
- RNA
- keymaps
The input tab of user prefs does not show 3D mouse settings. Key map
editor does not show NDOF mappings.
DNA does not change.
On my Mac the compiled binary is 42KB smaller after this change. It
runs fine WITH_INPUT_NDOF on or off.
This reverts commit 40b367479c6fe23d6f2b6d822f2d5266485619f3.
Didn't build or solve any known issue. Please don't push changes without
testing them first.
These latter can cause MSVC debug asserts if the array is empty. With C++11
we'll be able to do this for std::vector later. This hopefully fixes an assert
in the Cycles subdivision code.
Kernels can now be built without patch evaluation when not needed by the
scene (Catmull-Clark subdivision not in use), giving a performance boost
for some devices.
Enable based on --debug-gpu at the command line. Linux already works
this way.
This commit in master (2.78 timeframe) is the smallest change that gets
the desired result. I did a similar commit in blender2.8. Most ongoing
GL debug work will go into 2.8, not 2.7x.
In support of T49089
By calling `tessellate()` from the mesh manager in Cycles we can do pre/post
processing or even threaded tessellation without concerning client side code
with the details.
This way OpenCL devices can also benefit from a smaller memory footprint, when using e.g. bumpmaps (greyscale, 1 channel).
Additional target for my GSoC 2016.
Now we have the 4 component ones first (float4, byte4, half4) followed by the 1 component ones (float, byte, half).
Makes code a bit more consistent and also reduces code a bit when enabling half support on GPU in next commit.
This also exposed a typo in half CPU images for 3D textures, which wasn't used yet, but good to have that one fixed anyway.
Using unit tests is a wrong way to control static behavior of the
application. They should only be used for checking dynamic behavior,
all the rest is easily controllable at compile time.
Doing tests at ocmpile time are actually more robust approach since
we don't have strict policy of runnign unit tests before accepting
any change.
Proper alignment control is coming shortly.
This reverts commit 7c3a06c34918567e6b0ab67bded60725ff63073b.
Since the optimal values depend on the device used, this option doesn't make much sense in the XML.
Therefore, it's now specified via the command line, just like the device itself.
Another issue with the modified particle motion blur fix: since
pre and post are used as validity markers, they must be set even
if there is no actual motion, like the original bool flags were.
Otherwise an object starting to move or stopping is interpreted
as having invalid blur data and hidden.
Displacement is now a per material setting, which means old files will have to
be updated if they had used displacement. Cool side effect of this change is
material previews now show displacement.
Reviewed By: brecht
Differential Revision: https://developer.blender.org/D2140
Enables Catmull-Clark subdivision meshes with support for creases and attribute
subdivision. Still waiting on OpenSubdiv to fully support face varying
interpolation for subdividing uv coordinates tho. Also there may be some
inconsistencies with Blender's subdivision which will be resolved at a
later time.
Code for reading patch tables and creating patch maps is borrowed
from OpenSubdiv.
Reviewed By: brecht
Differential Revision: https://developer.blender.org/D2111
Adds a descriptor for attributes that can easily be passed around and extended
to contain more data. Will be used for attributes on subdivision meshes.
Reviewed By: brecht
Differential Revision: https://developer.blender.org/D2110
Currently cycles cannot correctly render motion blur for objects that appear or
disappear during the shutter window. Until that can be fixed properly, it may be
better to hide such particles rather than let them render as if they were
stationary for half of the frame.
Reviewed By: brecht
Differential Revision: https://developer.blender.org/D2125
It is possible that compilation will fail without giving anything in the
log buffer. For this cases giving a tip about error code will be really
handy.
Patch by @Ilia, thanks!
This changes actually lead to 2x slowdown. It's getting a bit annoying
because those are the changes to make pre-maxwell cards render with the
same speed.
The problem happens because smoke collides only with the surface of the
collider and uses incompressible fluid solver. This means that scaling
the collider tries to compress or decompress fluid within the volume of
the collider, which can't be handled by the simulation. Fast rotation
likely also causes transient scaling due to emulation of arcs by chords.
This can be fixed by finding compartments completely isolated by obstacles
from the rest of the domain, and forcing total divergence within each one
to be zero so that equations are solvable. Physical validity is somewhat
dubious, but without this the solver simply breaks down.
From the physics point of view, the effect of the correction should be
similar to opening a hole from every cell to another dimension that lets
an equal amount of air to pass through to balance the change in volume.
Reviewers: miikah, lukastoenne
Reviewed By: lukastoenne
Subscribers: dafassi, scorpion81, #physics
Maniphest Tasks: T43220, T47551
Differential Revision: https://developer.blender.org/D2112
As a result of other folding simplifications it may happen that
two type conversion nodes end up directly connected. In some
cases it may be possible to then remove both. A realistic case
might be an optimized out Mix RGB node used to blend vectors.
It seems it's safe to optimize when B is a float3 type
(color, vector), and A is float3 or float.
Reviewers: #cycles, sergey
Reviewed By: #cycles, sergey
Subscribers: sergey
Differential Revision: https://developer.blender.org/D2134
This way we can easily switch between toolkits without worrying
whether some kernel was compiled with old or new CUDA toolkit.
It's also now possible to switch machine architecture and have
proper cached kernel detected. Not as if it happens every day,
but i did such a bitness switch back in the days :)
Code coverage of different combinations of secondary conditions
is obviously not complete because there are so many of them, but
all main rules should be there.
The reason for CORRECT vs INVALID is that both words have the same
number of characters so calls line up, but look quite different.
Reviewers: #cycles, sergey
Reviewed By: #cycles, sergey
Subscribers: dingto, sergey, brecht
Differential Revision: https://developer.blender.org/D2130
All the changes are mainly giving explicit tips on inlining functions,
so they match how inlining worked with previous toolkit.
This make kernel compiled by CUDA 8 render in average with same speed
as previous kernels. Some scenes are somewhat faster, some of them are
somewhat slower. But slowdown is within 1% so far.
On a positive side it allows us to enable newer generation cards on
buildbots (so GTX 10x0 will be officially supported soon).