Otherwise devices used for display will lock up the UI too much. This means
you might still get 100% CPU for the display device, but for others CPU usage
should be low still.
The check to see if a device is used for display may not be entirely reliable,
it checks if there is a watchdog timeout on the device, but I'm not entirely
sure that always exists for display devices or is disabled for non-display
devices, though some tools like cuda-gdb seem to make the same assumption.
Ref T39559
This makes it easier to have per kernel number of registers. Also, all the
tunable parameters for this are now in kernel.cu, rather than spread over cmake,
scons and device_cuda.cpp.
The problem here was that animation buffers got initialized with zeros in the beginning for unknown parts. Now it gets initialized with the first known value.
The bug's result was that the animation of the pitch started with 0 on first playback and thus any seeking while the pitch is zero resulted in seeking to the beginning.
Nice little mistake, since the invalid mem access only happened once (the first time),
was close to valid mem, and was only used to read, it would not crash often...
Idea and code by Brecht, many thanks!
Reviewers: brecht
Reviewed By: brecht
CC: campbellbarton, dingto
Differential Revision: https://developer.blender.org/D369
This fixes the ptaxs "ACCESS_VIOLATION" error and should allow our Linux and Windows build bots to compile again.
Unfortunately this comes with a performance penalty on sm_2x cards, so this is only a workaround for now. Branched Path is still globally disabled on GPU.
In practice this means that if you don't connect a texture to your volume nodes
it will figure that out and render the node faster, rather than you having to
specify it manually.
Main weakness is custom OSL nodes where we have to assume it is heterogeneous
because we don't know what kind of data the node accesses.
This now uses decoupled ray marching, and removes the probalistic scattering.
What this means is that each AA sample will be slower but contain less noise,
hopefully giving less render time to reach the same noise levels.
For those following along, there's still a bunch of volume sampling improvements
to do: all-light sampling, multiple importance sampling, transmittance threshold,
better indirect light handling, multiple scatter approximation.
This basically records all volumes steps, which can then later be used multiple
time to take scattering samples, without having to step through the volume
again. From the paper:
"Importance Sampling Techniques for Path Tracing in Participating Media"
This works only on the CPU, due to usage of malloc/free.
Similar to surfaces, this will now always scatter rather than probabilistically
scattering or not depending on the transmittance.
This also makes calculation of branched path throughput non-probalistic, which
makes thing slower too. That's to be solved by decoupled ray marching later.