This commit extends the technique of dynamic linked list to the logic
system to eliminate as much as possible temporaries, map lookup or
full scan. The logic engine is now free of memory allocation, which is
an important stability factor.
The overhead of the logic system is reduced by a factor between 3 and 6
depending on the logic setup. This is the speed-up you can expect on
a logic setup using simple bricks. Heavy bricks like python controllers
and ray sensors will still take about the same time to execute so the
speed up will be less important.
The core of the logic engine has been much reworked but the functionality
is still the same except for one thing: the priority system on the
execution of controllers. The exact same remark applies to actuators but
I'll explain for controllers only:
Previously, it was possible, with the "executePriority" attribute to set
a controller to run before any other controllers in the game. Other than
that, the sequential execution of controllers, as defined in Blender was
guaranteed by default.
With the new system, the sequential execution of controllers is still
guaranteed but only within the controllers of one object. the user can
no longer set a controller to run before any other controllers in the
game. The "executePriority" attribute controls the execution of controllers
within one object. The priority is a small number starting from 0 for the
first controller and incrementing for each controller.
If this missing feature is a must, a special method can be implemented
to set a controller to run before all other controllers.
Other improvements:
- Systematic use of reference in parameter passing to avoid unnecessary data copy
- Use pre increment in iterator instead of post increment to avoid temporary allocation
- Use const char* instead of STR_String whenever possible to avoid temporary allocation
- Fix reference counting bugs (memory leak)
- Fix a crash in certain cases of state switching and object deletion
- Minor speed up in property sensor
- Removal of objects during the game is a lot faster
This commit extend the technique of dynamic linked list to the mesh
slots so as to eliminate dumb scan or map lookup. It provides massive
performance improvement in the culling and in the rasterizer when
the majority of objects are static.
Other improvements:
- Compute the opengl matrix only for objects that are visible.
- Simplify hash function for GEN_HasedPtr
- Scan light list instead of general object list to render shadows
- Remove redundant opengl calls to set specularity, shinyness and diffuse
between each mesh slots.
- Cache GPU material to avoid frequent call to GPU_material_from_blender
- Only set once the fixed elements of mesh slot
- Use more inline function
The following table shows the performance increase between 2.48, 1st round
and this round of improvement. The test was done with a scene containing
40000 objects, of which 1000 are in the view frustrum approximately. The
object are simple textured cube to make sure the GPU is not the bottleneck.
As some of the rasterizer processing time has moved under culling, I present
the sum of scenegraph(includes culling)+rasterizer time
Scenegraph+rasterizer(ms) 2.48 1st round 3rd round
All objects static, 323.0 86.0 7.2
all visible, 1000 in
the view frustrum
All objects static, 219.0 49.7 N/A(*)
all invisible.
All objects moving, 323.0 105.6 34.7
all visible, 1000 in
the view frustrum
Scene destruction 40min 40min 4s
(*) : this time is not representative because the frame rate was at 60fps.
In that case, the GPU holds down the GE by frame sync. By design, the
overhead of the rasterizer is 0 when the the objects are invisible.
This table shows a global speed up between 9x and 45x compared to 2.48a
for scenegraph, culling and rasterizer overhead. The speed up goes much
higher when objects are invisible.
An additional 2-4x speed up is possible in the scenegraph by upgrading
the Moto library to use Eigen2 BLAS library instead of C++ classes but
the scenegraph is already so fast that it is not a priority right now.
Next speed up in logic: many things to do there...
Use dynamic linked list to handle scenegraph rather than dumb scan
of the whole tree. The performance improvement depends on the fraction
of moving objects. If most objects are static, the speed up is
considerable. The following table compares the time spent on
scenegraph before and after this commit on a scene with 10000 objects
in various configuratons:
Scenegraph time (ms) Before After
(includes culling)
All objects static, 8.8 1.7
all visible but small fraction
in the view frustrum
All objects static, 7,5 0.01
all invisible.
All objects moving, 14.1 8.4
all visible but small fraction
in the view frustrum
This tables shows that static and invisible objects take no CPU at all
for scenegraph and culling. In the general case, this commit will
speed up the scenegraph between 2x and 5x. Compared to 2.48a, it should
be between 4x and 10x faster. Further speed up is possible by making
the scenegraph cache-friendly.
Next round of performance improvement will be on the rasterizer: use
the same dynamic linked list technique for the mesh slots.
- Vast performance increase when removing scene containing large number of
objects: the sensor/controller map was updated for each deleted object,
causing massive slow down when the number of objects was large (O(n^2)).
- Use reference when scanning the sensor map => avoid useless copy.
- Remove dynamically the object bounding box from the DBVT when the object
is invisible => faster culling.
- renamed generic attribute "isValid" to "invalid" since BL_Shader already uses isValid.
- Moved deprecation warnings from CValue
- removed unused KX_Scene::SetProjectionMatrix and KX_Scene::GetViewMatrix
- Added KX_Scene attributes "lights", "cameras", "objects_inactive", to allow access to objects in unseen layers (before the AddObject actuator adds them)
- KX_Camera deprecated cam.enableViewport(bool) for cam.isViewport which can be read as well.
CListValue fixes
- Disable changing CValueLists that the BGE uses internally (scene.objects.append(1) would crash when drawing)
- val=clist+list would modify clist in place, now return a new value.
- clist.append([....]), was working like extend.
- clist.append(val) didnt work for most CValue types like KX_GameObjects.
Other changes
- "isValid" was always returning True.
- Set all errors for invalid proxy access to PyExc_SystemError (was using a mix of error types)
- Added PyObjectPlus::InvalidateProxy() to manually invalidate, though if python ever gains access again, it will make a new valid proxy. This is so removing an object from a scene can invalidate the object even if its stored elsewhere in a CValueList for eg.
Realtime modifiers applied on mesh objects will be supported in
the game engine with the following limitations:
- Only real time modifiers are supported (basically all of them!)
- Virtual modifiers resulting from parenting are not supported:
armature, curve, lattice. You can still use these modifiers
(armature is really not recommended) but in non parent mode.
The BGE has it's own parenting capability for armature.
- Modifiers are computed on the host (using blender modifier
stack).
- Modifiers are statically evaluated: any possible time dependency
in the modifiers is not supported (don't know enough about
modifiers to be more specific).
- Modifiers are reevaluated if the underlying mesh is deformed
due to shape action or armature action. Beware that this is
very CPU intensive; modifiers should really be used for static
objects only.
- Physics is still based on the original mesh: if you have a
mirror modifier, the physic shape will be limited to one half
of the resulting object. Therefore, the modifiers should
preferably be used on graphic objects.
- Scripts have no access to the modified mesh.
- Modifiers that are based on objects interaction (boolean,..)
will not be dependent on the objects position in the GE.
What you see in the 3D view is what you get in the GE regardless
on the object position, velocity, etc.
Besides that, the feature is compatible with all the BGE features
that affect meshes: armature action, shape action, relace mesh,
VideoTexture, add object, dupligroup.
Known problems:
- This feature is a bit hacky: the BGE uses the derived mesh draw
functions to display the object. This drawing method is a
bit slow and is not 100% compatible with the BGE. There may
be some problems in multi-texture mode: the multi-texture
coordinates are not sent to the GPU.
Texface and GLSL on the other hand should be fully supported.
- Culling is still based on the extend of the original mesh.
If you have a modifer that extends the size of the mesh,
the object may disappear while still in the view frustrum.
- Derived mesh is not shared between replicas.
The derived mesh is allocated and computed for each object
with modifiers, regardless if they are static replicas.
- Display list are not created on objects with modifiers.
I should be able to fix the above problems before release.
However, the feature is already useful for game development.
Once you are ready to release the game, you can apply the modifiers
to get back display list support and mesh sharing capability.
MSVC, scons, Cmake, makefile updated.
Enjoy
/benoit
Separate getting a normal attribute and getting __dict__, was having to do too a check for __dict__ on each class (multiple times per getattro call from python) when its not used that often.
- initialize pythons sys.argv in the blenderplayer
- ignore all arguments after a single " - " in the blenderplayer (like in blender), so args can be passed to the game.
- add a utility function PyOrientationTo() - to take a Py euler, quat or 3x3 matrix and convert into a C++ MT_Matrix3x3.
- add utility function ConvertPythonToMesh to get a RAS_MeshObject from a KX_MeshProxy or a name.
- Added error prefix arguments to ConvertPythonToGameObject, ConvertPythonToMesh and PyOrientationTo so the error messages can include what function they came from.
- deprecated brick.getOwner() for the "owner" attribute.
- More verbose error messages.
- BL_Shader wasnt setting error messages on some errors
- FilterNormal depth attribute was checking for float which is bad because scripts often expect ints assigned to float attributes.
- Added a check to PyVecTo for a tuple rather then always using a generic python sequence. On my system this is over 2x faster with an optmized build.
- comments to PyObjectPlus.h
- remove unused/commented junk.
- renamed PyDestructor to py_base_dealloc for consistency
- all the PyTypeObject's were still using the sizeof() their class, can use sizeof(PyObjectPlus_Proxy) now which is smaller too.
This changes how the BGE classes and Python work together, which hasnt changed since blender went opensource.
The main difference is PyObjectPlus - the base class for most game engine classes, no longer inherit from PyObject, and cannot be cast to a PyObject.
This has the advantage that the BGE does not have to keep 2 reference counts valid for C++ and Python.
Previously C++ classes would never be freed while python held a reference, however this reference could be problematic eg: a GameObject that isnt in a scene anymore should not be used by python, doing so could even crash blender in some cases.
Instead PyObjectPlus has a member "PyObject *m_proxy" which is lazily initialized when python needs it. m_proxy reference counts are managed by python, though it should never be freed while the C++ class exists since it holds a reference to avoid making and freeing it all the time.
When the C++ class is free'd it sets the m_proxy reference to NULL, If python accesses this variable it will raise a RuntimeError, (check the isValid attribute to see if its valid without raising an error).
- This replaces the m_zombie bool and IsZombie() tests added recently.
In python return values that used to be..
return value->AddRef();
Are now
return value->GetProxy();
or...
return value->NewProxy(true); // true means python owns this C++ value which will be deleted when the PyObject is freed
Other small changes...
- KX_Camera and KX_Light didnt have get/setitem access in their PyType definition.
- CList.from_id() error checking for a long was checking for -1 against an unsigned value (own fault)
- CValue::SpecialRelease was incrementing an int for no reason.
- renamed m_attrlist to m_attr_dict since its a PyDict type.
- removed custom getattro/setattro functions for KX_Scene and KX_GameObject, use py_base_getattro, py_base_setattro for all subclasses of PyObjectPlus.
- lowercase windows.h in VideoBase.cpp for cross compiling.
There is a problem importing 3ds files where I cant find a way to check if the transforms are applied to the vertex locations or not.
Since 2.44 I made the importer assume they were not since you can manually remove transformations, but not reverse.
Nevertheless most 3ds files have the matrix applied, better not give a bad import by default.
Did some research and other 3ds importers (lib3ds for eg), have the same problem and just assume the transformations applied.
3dsMax imports both correctly so there must be a way to tell but I could not link it to the 3ds version or other mesh options.
Added an option to workaround this problem in rare cases where its needed.
- KX_GameObject.cpp & KX_Scene.cpp, clear the dict before removing the reference in case there is a circular reference.
Added occlusion culling capability in the BGE.
More info: http://wiki.blender.org/index.php/Dev:Ref/Release_Notes/2.49/Game_Engine#BGE_Scenegraph_improvement
MSVC, scons, cmake, Makefile updated.
Other minor performance improvements:
- The rasterizer was computing the openGL model matrix of the objects too many times
- DBVT view frustrum culling was not properly culling behind the near plane:
Large objects behind the camera were sent to the GPU
- Remove all references to mesh split/join feature as it is not yet functional
- setting the scene attributes would always add to the scenes custom dictionary.
- new CListValue method from_id(id)
so you can store a Game Objects id and use it to get the game object back.
ob_id = id(gameOb)
...
gameOb = scene.objects.from_id(ob_id)
This is useful because names are not always unique.
- Only try and remove light objects from the light list.
- Only loop over mesh verts once when getting the bounding box
- dont return None from python attribute localInertia when theres no physics objects. better return a vector still.
- add names to send message PyArg_ParseTuple functions.
This commit contains a number of performance improvements for the
BGE in the Scenegraph (parent relation between objects in the
scene) and view frustrum culling.
The scenegraph improvement consists in avoiding position update
if the object has not moved since last update and the removal
of redundant updates and synchronization with the physics engine.
The view frustrum culling improvement consists in using the DBVT
broadphase facility of Bullet to build a tree of graphical objects
in the scene. The elements of the tree are Aabb boxes (Aligned
Axis Bounding Boxes) enclosing the objects. This provides good
precision in closed and opened scenes. This new culling system
is enabled by default but just in case, it can be disabled with
a button in the World settings. There is no do_version in this
commit but it will be added before the 2.49 release. For now you
must manually enable the DBVT culling option in World settings
when you open an old file.
The above improvements speed up scenegraph and culling up to 5x.
However, this performance improvement is only visible when
you have hundreds or thousands of objects.
The main interest of the DBVT tree is to allow easy occlusion
culling and automatic LOD system. This will be the object of further
improvements.
- Make BGE's ListValue types convert to python lists for printing since the CValue GetText() function didnt work well- printing lists as [,,,,,] for scene objects and mesh materials for eg.
- Check attributes are descriptor types before casting.
- py_getattr_dict use the Type dict rather then Method and Attribute array.
Use each types dictionary to store attributes PyAttributeDef's so it uses pythons hash lookup (which it was already doing for methods) rather then doing a string lookup on the array each time.
This also means attributes can be found in the type without having to do a dir() on the instance.
- Initialize python types with PyType_Ready, which adds methods to the type dictionary.
- use Pythons get/setattro (uses a python string for the attribute rather then char*). Using basic C strings seems nice but internally python converts them to python strings and discards them for most functions that accept char arrays.
- Method lookups use the PyTypes dictionary (should be faster then Py_FindMethod)
- Renamed __getattr -> py_base_getattro, _getattr -> py_getattro, __repr -> py_base_repr, py_delattro, py_getattro_self etc.
From here is possible to put all the parent classes methods into each python types dictionary to avoid nested lookups (api has 4 levels of lookups in some places), tested this but its not ready yet.
Simple tests for getting a method within a loop show this to be between 0.5 and 3.2x faster then using Py_FindMethod()
Added the method into the PyType so python knows about the methods (its supposed to work this way).
This means in the future the api can use PyType_Ready() to store the methods in the types dictionary.
Python3 removes Py_FindMethod and we should not be using it anyway since its not that efficient.
- Bugfix for running dir() on all BGE python objects. was not getting the immediate methods and attributes for each class.
- Use attributes for KX_Scene (so they are included with dir())
- Override __dict__ attributes for KX_Scene and KX_GameObject so custom properties are included with a dir()
Python dir(ob) for game types now includes attributes names,
* Use "__dict__" rather then "__methods__" attribute to be Python 3.0 compatible
* Added _getattr_dict() for getting the method and attribute names from a PyObject, rather then building it in the macro.
* Added place holder *::Attribute array, needed for the _getattr_up macro.
* fixed segfaults in CListValue.index(val) and CListValue.count(val) when the pyTypes could not be converted into a CValue.
* added scene.objects to replace scene.getObjectList()
* added function names to PyArg_ParseTuple() so errors will include the function names
* removed cases of PyArg_ParseTuple(args,"O",&pyobj) where METH_O ensures a single argument.
* Made PyObjectFrom use ugly python api rather then Py_BuildValue(), approx %40 speedup for functions that return Python vector and matrix types like ob.orientation.
Use 'const char *' rather then the C++ 'STR_String' type for the attribute identifier of python attributes.
Each attribute and method access from python was allocating and freeing the string.
A simple test with getting an attribute a loop shows this speeds up attribute lookups a bit over 2x.
- Reset hit object pointer at end of frame of touch sensor to avoid returning invalid pointer to getHitObject().
- Clear all references in KX_TouchSensor::m_colliders when the sensor is disabled to avoid loose references.
- Test GetSGNode() systematically for all KX_GameObject functions that can be called from python in case a python controller keeps a reference in GameLogic (bad practice anyway).
add -nojoystick commandline option: it takes 5 seconds everytime to start the game engine, while there IS no joystick.
In other words: blender -noaudio -nojoystick improves workflow turnaround times for P - ESC from 7 seconds to 1 second!
Improved Bullet soft body advanced options, still work-in-progress. Make sure to create game Bullet soft bodies from scratch, it is not compatible with last weeks builds.
the features that are needed to run the game. Compile tested with
scons, make, but not cmake, that seems to have an issue not related
to these changes. The changes include:
* GLSL support in the viewport and game engine, enable in the game
menu in textured draw mode.
* Synced and merged part of the duplicated blender and gameengine/
gameplayer drawing code.
* Further refactoring of game engine drawing code, especially mesh
storage changed a lot.
* Optimizations in game engine armatures to avoid recomputations.
* A python function to get the framerate estimate in game.
* An option take object color into account in materials.
* An option to restrict shadow casters to a lamp's layers.
* Increase from 10 to 18 texture slots for materials, lamps, word.
An extra texture slot shows up once the last slot is used.
* Memory limit for undo, not enabled by default yet because it
needs the .B.blend to be changed.
* Multiple undo for image painting.
* An offset for dupligroups, so not all objects in a group have to
be at the origin.
The root cause of this bug is the fact that Bullet shapes
are shared between duplicated game objects. As the physics
object scale is stored in the shape, all duplicas must
have the same scale otherwise the physics representation
is incorrect.
This fix introduces a mechanism to duplicate shapes at
runtime so that Bullet shapes are not shared anymore.
The drawback is an increased memory consuption.
A reference count mechanism will be introduced in a
later revision to keep Bullet shape shared between
duplicas that have the same scale.
=======================================
Alpha blending + sorting was revised, to fix bugs and get it
to work more predictable.
* A new per texture face "Sort" setting defines if the face
is alpha sorted or not, instead of abusing the "ZTransp"
setting as it did before.
* Existing files are converted to hopefully match the old
behavior as much as possible with a version patch.
* On new meshes the Sort flag is disabled by the default, to
avoid unexpected and hard to find slowdowns.
* Alpha sorting for faces was incredibly slow. Sorting faces
in a mesh with 600 faces lowered the framerate from 200 to
70 fps in my test.. the sorting there case goes about 15x
faster now, but it is still advised to use Clip Alpha if
possible instead of regular Alpha.
* There still various limitations in the alpha sorting code,
I've added some comments to the code about this.
Some docs at the bottom of the page:
http://www.blender.org/development/current-projects/changes-since-246/realtime-glsl-materials/
Merged some fixes from the apricot branch, most important
change is that tangents are now exactly the same as the rest
of Blender, instead of being computed in the game engine with a
different algorithm.
Also, the subversion was bumped to 1.