This commit extends the technique of dynamic linked list to the logic
system to eliminate as much as possible temporaries, map lookup or
full scan. The logic engine is now free of memory allocation, which is
an important stability factor.
The overhead of the logic system is reduced by a factor between 3 and 6
depending on the logic setup. This is the speed-up you can expect on
a logic setup using simple bricks. Heavy bricks like python controllers
and ray sensors will still take about the same time to execute so the
speed up will be less important.
The core of the logic engine has been much reworked but the functionality
is still the same except for one thing: the priority system on the
execution of controllers. The exact same remark applies to actuators but
I'll explain for controllers only:
Previously, it was possible, with the "executePriority" attribute to set
a controller to run before any other controllers in the game. Other than
that, the sequential execution of controllers, as defined in Blender was
guaranteed by default.
With the new system, the sequential execution of controllers is still
guaranteed but only within the controllers of one object. the user can
no longer set a controller to run before any other controllers in the
game. The "executePriority" attribute controls the execution of controllers
within one object. The priority is a small number starting from 0 for the
first controller and incrementing for each controller.
If this missing feature is a must, a special method can be implemented
to set a controller to run before all other controllers.
Other improvements:
- Systematic use of reference in parameter passing to avoid unnecessary data copy
- Use pre increment in iterator instead of post increment to avoid temporary allocation
- Use const char* instead of STR_String whenever possible to avoid temporary allocation
- Fix reference counting bugs (memory leak)
- Fix a crash in certain cases of state switching and object deletion
- Minor speed up in property sensor
- Removal of objects during the game is a lot faster
controller.actuators[name] and controller.sensors[name]
Made a read-only sequence type for logic brick sensors and actuators which can access single items or be used like a list or dictionary.
We could use a python dictionary or CValueList but that would be slower to create.
So you can do...
for s in controller.sensors: print s
print controller.sensors["Sensor"]
print controller.sensors[0]
sensors = list(controller.sensors)
This sequence type keeps a reference to the proxy it came from and will raise an error on access if the proxy has been removed.
Commit 20099 started using a FBO way too big.
According to Paul Bourke this is how it's done in other Engines:
Projectors HD:
1920x1050 - buffersize = 1024; FBO size = 2048
1400x1050 - buffersize = 1024; FBO size = 2048
Projectors XGA:
1024x768 - buffersize = 512; FBO size = 1024
Now in Blender Game Engine we are using:
Projectors HD:
1920x1050 - buffersize = 1050; FBO size = 2048
1400x1050 - buffersize = 1050; FBO size = 2048
Projectors XGA:
1024x768 - buffersize = 768; FBO size = 1024
(I guess I should be committing code to the ge_dome branch instead of the trunk. I feel bad doing all those adjustments in a hurry to 2.49 final release in the trunk. That is ok, right?)
*) a small note:
In the end it turned out that we have upright and downright domes out there.
So I may rearrange the order of the gui later:
(1 = fisheye, 2 = truncated up, 3 = truncated down, 4 = envmap, 5 = spherical panoramic)
I don't plan to do a doVersion() for that, so if you are using it already keep in mind that the modes may change before 249 final release.
After last commit (20099) warping meshes got slower (more quality == less performance). Since we don't need an extra warping for truncated domes, It's better to handle them directly in openGL without the need of warping it.
I'll talk with some Dome owners to see if we need both Upright and Downright modes. I may remove one of them by 2.49 them.
*) also: a proper GLEW_EXT_framebuffer_object check before generating FBO (for warping meshes).
**) next in line (maybe after RC2): tilt option to tilt the camera up to 90º upward.
We are using an image twice as big to render the fisheye before warping.
It'll slow down warping meshes a little, but we get way more resolution.
Therefore I will bring Truncated Dome mode back in order to avoid using warping mesh for that.
not fixed but the problem is now less bad when projection painting, bilinear interpolation was rounding down.
- added gameOb.attrDict to get the internal gameObject dict.
- mesh.getVertex wasnt setting an exception.
This commit extend the technique of dynamic linked list to the mesh
slots so as to eliminate dumb scan or map lookup. It provides massive
performance improvement in the culling and in the rasterizer when
the majority of objects are static.
Other improvements:
- Compute the opengl matrix only for objects that are visible.
- Simplify hash function for GEN_HasedPtr
- Scan light list instead of general object list to render shadows
- Remove redundant opengl calls to set specularity, shinyness and diffuse
between each mesh slots.
- Cache GPU material to avoid frequent call to GPU_material_from_blender
- Only set once the fixed elements of mesh slot
- Use more inline function
The following table shows the performance increase between 2.48, 1st round
and this round of improvement. The test was done with a scene containing
40000 objects, of which 1000 are in the view frustrum approximately. The
object are simple textured cube to make sure the GPU is not the bottleneck.
As some of the rasterizer processing time has moved under culling, I present
the sum of scenegraph(includes culling)+rasterizer time
Scenegraph+rasterizer(ms) 2.48 1st round 3rd round
All objects static, 323.0 86.0 7.2
all visible, 1000 in
the view frustrum
All objects static, 219.0 49.7 N/A(*)
all invisible.
All objects moving, 323.0 105.6 34.7
all visible, 1000 in
the view frustrum
Scene destruction 40min 40min 4s
(*) : this time is not representative because the frame rate was at 60fps.
In that case, the GPU holds down the GE by frame sync. By design, the
overhead of the rasterizer is 0 when the the objects are invisible.
This table shows a global speed up between 9x and 45x compared to 2.48a
for scenegraph, culling and rasterizer overhead. The speed up goes much
higher when objects are invisible.
An additional 2-4x speed up is possible in the scenegraph by upgrading
the Moto library to use Eigen2 BLAS library instead of C++ classes but
the scenegraph is already so fast that it is not a priority right now.
Next speed up in logic: many things to do there...
also deprecated getActuators() and getSensors() for 'sensors' and 'actuators' attributes.
an example of getting every sensor connected to an object.
all_sensors = [s for c in ob.controllers for s in c.sensors]
When enabled, this option converts any positive trigger from the sensor
into a pair of positive+negative trigger, with the negative trigger sent
in the next frame. The negative trigger from the sensor are not passed
to the controller as the option automatically generates the negative triggers.
From the controller point of view, the sensor is positive only for 1 frame,
even if the underlying sensor state remains positive.
The option interacts with the other sensor option in this way:
- Level option: tap option is mutually exclusive with level option. Both
cannot be enabled at the same time.
- Invert option: tap option operates on the negative trigger of the
sensor, which are converted to positive trigger by the invert option.
Hence, the controller will see the sensor positive for 1 frame when
the underlying sensor state turns negative.
- Positive pulse option: tap option adds a negative trigger after each
repeated positive pulse, unless the frequency option is 0, in which case
positive pulse are generated on every frame as before, as long as the
underlying sensor state is positive.
- Negative pulse option: this option is not compatible with tap option
and is ignored when tap option is enabled.
Notes:
- Keyboard "All keys" is handled specially when tap option is set:
There will be one pair of positive/negative trigger for each new
key press, regardless on how many keys are already pressed and there
is no trigger when keys are released, regardless if keys are still
pressed.
In case two keys are pressed in succesive frames, there will
be 2 positive triggers and 1 negative trigger in the following frame.
Use dynamic linked list to handle scenegraph rather than dumb scan
of the whole tree. The performance improvement depends on the fraction
of moving objects. If most objects are static, the speed up is
considerable. The following table compares the time spent on
scenegraph before and after this commit on a scene with 10000 objects
in various configuratons:
Scenegraph time (ms) Before After
(includes culling)
All objects static, 8.8 1.7
all visible but small fraction
in the view frustrum
All objects static, 7,5 0.01
all invisible.
All objects moving, 14.1 8.4
all visible but small fraction
in the view frustrum
This tables shows that static and invisible objects take no CPU at all
for scenegraph and culling. In the general case, this commit will
speed up the scenegraph between 2x and 5x. Compared to 2.48a, it should
be between 4x and 10x faster. Further speed up is possible by making
the scenegraph cache-friendly.
Next round of performance improvement will be on the rasterizer: use
the same dynamic linked list technique for the mesh slots.
patch from Alex Fraser (z0r)
eg.
- vec.xyz = vec.zyx
- vec.xy = vec.zw
- vec.xxy = vec.wzz
- vec.yzyz = vec.yxyx
See http://en.wikipedia.org/wiki/Swizzling_(computer_graphics)
made some minor modifications to this patch.
tested access times and adding 336 attributes to vectors doesn't make a noticeable differences to speed of existing axis attributes (x,y,z,w) - thanks to python dict lookups.
- when the attribute check function failed it didnt set an error raising a SystemError instead
- Rasterizer.getMaterialMode would never return KX_BLENDER_MULTITEX_MATERIAL
- PropertySensor value attribute checking function was always returning a fail.
- Vertex Self Shadow python script didnt update for meshes with modifiers.
- Added support for any number of attributes, this means packages are supported automatically.
so as well as "myModule.myFunc" you can do "myPackage.myModule.myFunc", nested packages work too.
- pass the controller to the python function as an argument for functions that take 1 arg, this check is only done at startup so it wont slow things down.
added support for
- Vast performance increase when removing scene containing large number of
objects: the sensor/controller map was updated for each deleted object,
causing massive slow down when the number of objects was large (O(n^2)).
- Use reference when scanning the sensor map => avoid useless copy.
- Remove dynamically the object bounding box from the DBVT when the object
is invisible => faster culling.
This function sets the maximum number of logic frame executed per render frame.
Valid values: 1..5
This function is useful to control the amount of processing consumed by logic.
By default, up to 5 logic frames can be executed per render frame. This is fine
as long as the time spent on logic is negligible compared to the render time.
If it's not the case, the default value will drag the performance of the game
down by executing unnecessary logic frames that take up most of the CPU time.
You can avoid that by lowering the value with this function.
The drawback is less precision in the logic system to physics and I/O activity.
Note that it does not affect the physics system: physics will still run
at full frame rate (actually up to 5 times the ticrate).
You can further control the render frame rate with GameLogic.setLogicTicRate().
sys.path is the search path for python modules. This is useful so people making games can put all their scripts in a folder and be sure they will always load into the BGE.
for each blend file a scripts directory is added to the path
/home/me/foo.blend
will look for modules in...
/home/me/scripts/*.py
It could also default to look for modules in the same directory as the blend file but I think this is messy.
Added a note in the tooltip about //scripts so its not such a hidden feature.
This works by storing the original sys.path, then adding the paths for the blendfile and all its libs,
when a new blendfile is loaded, the original sys.path is restored before adding the blendfiles paths again so the sys.path wont get junk in it.
One problem with this - when using linked libs the module names must be unique else it will load the wrong module for one of the controllers.
also fixed 2 bugs
- sys.path in the blenderplayer was growing by 1 for every file load in blenderplayer
- the relative path (gp_GamePythonPath), wasnt being set when loading files in the blenderlayer (as I wrongly said in the last commit).