blender

kannonier8/blender

Fork 0

forked from bartvdbraak/blender

Commit Graph

Author	SHA1	Message	Date
Benoit Bolsee	386122ada6	BGE performance, 4th round: logic This commit extends the technique of dynamic linked list to the logic system to eliminate as much as possible temporaries, map lookup or full scan. The logic engine is now free of memory allocation, which is an important stability factor. The overhead of the logic system is reduced by a factor between 3 and 6 depending on the logic setup. This is the speed-up you can expect on a logic setup using simple bricks. Heavy bricks like python controllers and ray sensors will still take about the same time to execute so the speed up will be less important. The core of the logic engine has been much reworked but the functionality is still the same except for one thing: the priority system on the execution of controllers. The exact same remark applies to actuators but I'll explain for controllers only: Previously, it was possible, with the "executePriority" attribute to set a controller to run before any other controllers in the game. Other than that, the sequential execution of controllers, as defined in Blender was guaranteed by default. With the new system, the sequential execution of controllers is still guaranteed but only within the controllers of one object. the user can no longer set a controller to run before any other controllers in the game. The "executePriority" attribute controls the execution of controllers within one object. The priority is a small number starting from 0 for the first controller and incrementing for each controller. If this missing feature is a must, a special method can be implemented to set a controller to run before all other controllers. Other improvements: - Systematic use of reference in parameter passing to avoid unnecessary data copy - Use pre increment in iterator instead of post increment to avoid temporary allocation - Use const char* instead of STR_String whenever possible to avoid temporary allocation - Fix reference counting bugs (memory leak) - Fix a crash in certain cases of state switching and object deletion - Minor speed up in property sensor - Removal of objects during the game is a lot faster	2009-05-10 20:53:58 +00:00
Benoit Bolsee	42557f90bd	BGE performance, 3rd round: culling and rasterizer. This commit extend the technique of dynamic linked list to the mesh slots so as to eliminate dumb scan or map lookup. It provides massive performance improvement in the culling and in the rasterizer when the majority of objects are static. Other improvements: - Compute the opengl matrix only for objects that are visible. - Simplify hash function for GEN_HasedPtr - Scan light list instead of general object list to render shadows - Remove redundant opengl calls to set specularity, shinyness and diffuse between each mesh slots. - Cache GPU material to avoid frequent call to GPU_material_from_blender - Only set once the fixed elements of mesh slot - Use more inline function The following table shows the performance increase between 2.48, 1st round and this round of improvement. The test was done with a scene containing 40000 objects, of which 1000 are in the view frustrum approximately. The object are simple textured cube to make sure the GPU is not the bottleneck. As some of the rasterizer processing time has moved under culling, I present the sum of scenegraph(includes culling)+rasterizer time Scenegraph+rasterizer(ms) 2.48 1st round 3rd round All objects static, 323.0 86.0 7.2 all visible, 1000 in the view frustrum All objects static, 219.0 49.7 N/A() all invisible. All objects moving, 323.0 105.6 34.7 all visible, 1000 in the view frustrum Scene destruction 40min 40min 4s () : this time is not representative because the frame rate was at 60fps. In that case, the GPU holds down the GE by frame sync. By design, the overhead of the rasterizer is 0 when the the objects are invisible. This table shows a global speed up between 9x and 45x compared to 2.48a for scenegraph, culling and rasterizer overhead. The speed up goes much higher when objects are invisible. An additional 2-4x speed up is possible in the scenegraph by upgrading the Moto library to use Eigen2 BLAS library instead of C++ classes but the scenegraph is already so fast that it is not a priority right now. Next speed up in logic: many things to do there...	2009-05-07 09:13:01 +00:00
Benoit Bolsee	3abb8e8e68	BGE performance: second round of scenegraph improvement. Use dynamic linked list to handle scenegraph rather than dumb scan of the whole tree. The performance improvement depends on the fraction of moving objects. If most objects are static, the speed up is considerable. The following table compares the time spent on scenegraph before and after this commit on a scene with 10000 objects in various configuratons: Scenegraph time (ms) Before After (includes culling) All objects static, 8.8 1.7 all visible but small fraction in the view frustrum All objects static, 7,5 0.01 all invisible. All objects moving, 14.1 8.4 all visible but small fraction in the view frustrum This tables shows that static and invisible objects take no CPU at all for scenegraph and culling. In the general case, this commit will speed up the scenegraph between 2x and 5x. Compared to 2.48a, it should be between 4x and 10x faster. Further speed up is possible by making the scenegraph cache-friendly. Next round of performance improvement will be on the rasterizer: use the same dynamic linked list technique for the mesh slots.	2009-05-03 22:29:00 +00:00

Author

SHA1

Message

Date

Benoit Bolsee

386122ada6

BGE performance, 4th round: logic

This commit extends the technique of dynamic linked list to the logic
system to eliminate as much as possible temporaries, map lookup or 
full scan. The logic engine is now free of memory allocation, which is
an important stability factor. 

The overhead of the logic system is reduced by a factor between 3 and 6
depending on the logic setup. This is the speed-up you can expect on 
a logic setup using simple bricks. Heavy bricks like python controllers
and ray sensors will still take about the same time to execute so the
speed up will be less important.

The core of the logic engine has been much reworked but the functionality
is still the same except for one thing: the priority system on the 
execution of controllers. The exact same remark applies to actuators but
I'll explain for controllers only:

Previously, it was possible, with the "executePriority" attribute to set
a controller to run before any other controllers in the game. Other than
that, the sequential execution of controllers, as defined in Blender was
guaranteed by default.

With the new system, the sequential execution of controllers is still 
guaranteed but only within the controllers of one object. the user can
no longer set a controller to run before any other controllers in the
game. The "executePriority" attribute controls the execution of controllers
within one object. The priority is a small number starting from 0 for the
first controller and incrementing for each controller.

If this missing feature is a must, a special method can be implemented
to set a controller to run before all other controllers.

Other improvements:
- Systematic use of reference in parameter passing to avoid unnecessary data copy
- Use pre increment in iterator instead of post increment to avoid temporary allocation
- Use const char* instead of STR_String whenever possible to avoid temporary allocation
- Fix reference counting bugs (memory leak)
- Fix a crash in certain cases of state switching and object deletion
- Minor speed up in property sensor
- Removal of objects during the game is a lot faster

2009-05-10 20:53:58 +00:00

Benoit Bolsee

42557f90bd

BGE performance, 3rd round: culling and rasterizer.

This commit extend the technique of dynamic linked list to the mesh
slots so as to eliminate dumb scan or map lookup. It provides massive 
performance improvement in the culling and in the rasterizer when 
the majority of objects are static.

Other improvements:
- Compute the opengl matrix only for objects that are visible.
- Simplify hash function for GEN_HasedPtr
- Scan light list instead of general object list to render shadows
- Remove redundant opengl calls to set specularity, shinyness and diffuse
  between each mesh slots.
- Cache GPU material to avoid frequent call to GPU_material_from_blender
- Only set once the fixed elements of mesh slot
- Use more inline function

The following table shows the performance increase between 2.48, 1st round
and this round of improvement. The test was done with a scene containing 
40000 objects, of which 1000 are in the view frustrum approximately. The
object are simple textured cube to make sure the GPU is not the bottleneck.
As some of the rasterizer processing time has moved under culling, I present
the sum of scenegraph(includes culling)+rasterizer time

Scenegraph+rasterizer(ms)       2.48      1st round       3rd round

All objects static,            323.0           86.0             7.2
all visible, 1000 in 
the view frustrum

All objects static,            219.0           49.7             N/A(*)
all invisible.

All objects moving,            323.0          105.6            34.7
all visible, 1000 in 
the view frustrum

Scene destruction              40min          40min              4s

(*) : this time is not representative because the frame rate was at 60fps.
      In that case, the GPU holds down the GE by frame sync. By design, the
      overhead of the rasterizer is 0 when the the objects are invisible. 

This table shows a global speed up between 9x and 45x compared to 2.48a
for scenegraph, culling and rasterizer overhead. The speed up goes much
higher when objects are invisible.

An additional 2-4x speed up is possible in the scenegraph by upgrading
the Moto library to use Eigen2 BLAS library instead of C++ classes but
the scenegraph is already so fast that it is not a priority right now.

Next speed up in logic: many things to do there...

2009-05-07 09:13:01 +00:00

Benoit Bolsee

3abb8e8e68

BGE performance: second round of scenegraph improvement.

Use dynamic linked list to handle scenegraph rather than dumb scan
of the whole tree. The performance improvement depends on the fraction
of moving objects. If most objects are static, the speed up is 
considerable. The following table compares the time spent on 
scenegraph before and after this commit on a scene with 10000 objects
in various configuratons:

Scenegraph time (ms)              Before         After
(includes culling)

All objects static,               8.8            1.7  
all visible but small fraction          
in the view frustrum

All objects static,               7,5            0.01
all invisible.

All objects moving,               14.1           8.4
all visible but small fraction
in the view frustrum

This tables shows that static and invisible objects take no CPU at all
for scenegraph and culling. In the general case, this commit will 
speed up the scenegraph between 2x and 5x. Compared to 2.48a, it should
be between 4x and 10x faster. Further speed up is possible by making
the scenegraph cache-friendly.

Next round of performance improvement will be on the rasterizer: use
the same dynamic linked list technique for the mesh slots.

2009-05-03 22:29:00 +00:00

3 Commits