3D Optimizing Model Rendering

Kentae

Member
Hi :)

I'm trying to optimize the way I render models that are reused many times, such as trees, rocks, and bushes. In this example I use the trees.

Usually I would do it this way.
CREATE:
Code:
tree_buffer = vertex_create_buffer();
vertex_begin( tree_buffer, vertex_format );

tree_gml( tree_buffer );

vertex_end( tree_buffer );
vertex_freeze( tree_buffer );

tree_tex = background_get_texture( tex_tree );



leaves_buffer = vertex_create_buffer();
vertex_begin( leaves_buffer, vertex_format );

leaves_gml( leaves_buffer );

vertex_end( leaves_buffer );
vertex_freeze( leaves_buffer );

leaves_tex = background_get_texture( tex_leaves );
DRAW:
Code:
with ( obj_tree )
    {
    var wmat = matrix_get( matrix_world );
    
    var mat = matrix_build( x, y, z - 5, 0, 0, zr, 10, 10, 10 );
    matrix_set( matrix_world, mat );
    
    vertex_submit( other.tree_buffer, pr_trianglelist, other.tree_tex );
    d3d_set_culling( false );
    draw_set_alpha_test( true );
    draw_set_alpha_test_ref_value( 128 );
    vertex_submit( other.leaves_buffer, pr_trianglelist, other.leaves_tex );
    draw_set_alpha_test( false );
    d3d_set_culling( true );
    
    matrix_set( matrix_world, wmat );
    }
This gives me pretty decent performance, but I feel like there should be a better way. Done this way, I get a huge number of vertex batches and texture swaps.

So I tried this.

Same create event, but this draw event:
Code:
with ( obj_tree )
    {
    var wmat = matrix_get( matrix_world );
    
    var mat = matrix_build( x, y, z - 5, 0, 0, zr, 10, 10, 10 );
    matrix_set( matrix_world, mat );
    
    vertex_submit( other.tree_buffer, pr_trianglelist, other.tree_tex );
    
    matrix_set( matrix_world, wmat );
    }
 
    
d3d_set_culling( false );
draw_set_alpha_test( true );
draw_set_alpha_test_ref_value( 128 ); 
with ( obj_tree )
    {
    var wmat = matrix_get( matrix_world );
    
    var mat = matrix_build( x, y, z - 5, 0, 0, zr, 10, 10, 10 );
    matrix_set( matrix_world, mat );
    
    vertex_submit( other.leaves_buffer, pr_trianglelist, other.leaves_tex );
    
    matrix_set( matrix_world, wmat );
    }
draw_set_alpha_test( false );
d3d_set_culling( true );
This reduces the texture swaps by a lot, but for some reason it also reduces the fps. In other words, I get worse performance from this.

Then I figured I'd try to get all the tree-models into the same vertex_submit using this create event:
Code:
// !!! Tree TEST !!!
var wmat = matrix_get( matrix_world );

tree_buffer = vertex_create_buffer();
vertex_begin( tree_buffer, vertex_format );


for ( i = 0; i < 10; i++; )
    {
    for ( j = 0; j < 10; j++; )
        {
        var xx = 100 + ( i * 40 );
        var yy = 100 + ( j * 40 );
        var zz = terrain_get_z( xx, yy );
        
        var mat = matrix_build( xx, yy, zz, 0, 0, 0, 10, 10, 10 );
        
        matrix_set( matrix_world, mat );
        tree_gml( tree_buffer );
        matrix_set( matrix_world, wmat ); // Added this to see if it would fix the issue of all the trees being in the same place.
        }
    }


vertex_end( tree_buffer );
vertex_freeze( tree_buffer );
matrix_set( matrix_world, wmat );

tree_tex = background_get_texture( tex_tree );
And a simple draw event:
Code:
vertex_submit( other.tree_buffer, pr_trianglelist, other.tree_tex );
This gives me excellent performance, but all the trees are stuck at the 0,0,0 coordinates.
Or maybe there is just one tree being rendered? I'm not sure about this.

So I assume that the matrix functions don't do much in the create event, or am I doing something wrong?

Btw, in the tree_gml and leaves_gml scripts there are just a bunch of lines like this:
vertex_position_3d
vertex_normal
vertex_colour
vertex_texcoord
And I pass in the vertex_buffer it's supposed to be bound to.
This works fine though :)
 

Binsk

Member
Matrices are passed to the GPU when a buffer is submitted, to transform the vertices in real time. They do absolutely nothing when defining the model itself.

Your models are rendered at 0,0,0 because only the last matrix set before the vertex_submit call is applied. To change this you'd have to modify each vertex manually when defining it. This moves the processing from the GPU to the CPU, however, and since GPUs are built specifically for that kind of math I doubt it would be faster.
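As a rough illustration of "modifying each vertex manually", here is a minimal sketch. It assumes the trees only need a translation and a uniform scale (no rotation), and that vertex_format declares position, normal, colour, and texcoord in that order. The script name vertex_add_offset and its argument layout are hypothetical, not something GML provides.

```gml
/// vertex_add_offset( buffer, ox, oy, oz, scale, px, py, pz, nx, ny, nz, u, v )
// Hypothetical helper script: writes one vertex with the world offset and
// uniform scale applied on the CPU, because matrix_set has no effect on the
// values stored in a vertex buffer.
var buff = argument0;
var ox = argument1;
var oy = argument2;
var oz = argument3;
var s = argument4;

// Local-space position (px, py, pz) scaled, then moved to the tree's spot
vertex_position_3d( buff, ox + argument5 * s, oy + argument6 * s, oz + argument7 * s );
// Normals are unchanged by a translation and a uniform scale
vertex_normal( buff, argument8, argument9, argument10 );
vertex_colour( buff, c_white, 1 );
vertex_texcoord( buff, argument11, argument12 );
```

A script like tree_gml could then take the offset as extra arguments and call this helper for every vertex, for example `vertex_add_offset( tree_buffer, xx, yy, zz, 10, 0, 0, 1, 0, 0, 1, 0.5, 0 );`. For static trees the CPU cost is paid once when the buffer is built, but every tree's vertices are stored separately, so the buffer grows with the number of trees.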
 

Kentae

Member
Hmm okay, Thanks for the reply :)

Do you have any suggestions as to how I can optimize it further? Thus far the first method gives me the best performance.
 
Multimagyar

Guest
I can imagine separate draws being faster in some cases, but I'm not quite sure why they would be faster in this case. Maybe there is some built-in mechanism that rejects vertex buffers when they are not visible. I don't think there is; it's merely a guess.

As for memory concerns: if you go with the separate-tree method, it's advisable to load the model into memory only once rather than once per object. It's also convenient to have LoDs (levels of detail) for them. The further you are from a model, the harder it is to tell how many vertices or polygons it is made of. So with the distance from the camera to a model you can decide whether to draw a more detailed or a less detailed version of it. Drawing fewer triangles can save you a few ms, more so if the models appear in high quantity.
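The distance-based LoD idea could be sketched like this in the master render object's draw event. tree_buffer_low, lod_distance, and the camera position variables cam_x, cam_y, cam_z are all hypothetical names; the matrix values are copied from the first post.

```gml
// tree_buffer_low: assumed lower-polygon buffer, built like tree_buffer.
// lod_distance:    assumed switch distance, to be tuned by eye.
// cam_x/cam_y/cam_z: assumed camera position stored on this object.
var wmat = matrix_get( matrix_world );

with ( obj_tree )
    {
    var dist = point_distance_3d( x, y, z, other.cam_x, other.cam_y, other.cam_z );

    // Detailed model up close, cheap model far away
    var buf = other.tree_buffer;
    if ( dist > other.lod_distance ) buf = other.tree_buffer_low;

    matrix_set( matrix_world, matrix_build( x, y, z - 5, 0, 0, zr, 10, 10, 10 ) );
    vertex_submit( buf, pr_trianglelist, other.tree_tex );
    }

matrix_set( matrix_world, wmat );
```

Both buffers share one texture here, so switching LoD does not add texture swaps.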

d3d_set_hidden is useful for rejecting faces that are not visible anyway, which saves the GPU a little extra work.

You can also find a cheap way to determine whether a model is in front of or behind the camera, and skip drawing that object entirely if it is behind. One way, for instance: take a point on the model (such as an estimated middle of the model based on its vertex positions), build a normalized vector from that point to the camera, take the camera's forward vector, and compute the dot product of the two. If the result is smaller than 0, the point is in front of the camera, so there is a chance that you can see it. Be wary that this uses a static middle point. If the model is quite long, a more dynamic point may be a better choice, such as the point on the model's bounding box that is deepest in view but closest to the middle of the screen. If that point is behind or on the camera's plane you certainly won't see the model, whereas a middle point can't guarantee that you won't see a large boat when you stand next to its middle.
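A minimal sketch of that dot-product test, using the tree's origin as the static reference point. It assumes the master render object stores the camera position in cam_x, cam_y, cam_z and the camera's forward vector in cam_fx, cam_fy, cam_fz; those variable names are hypothetical.

```gml
var wmat = matrix_get( matrix_world );

with ( obj_tree )
    {
    // Vector from the tree's reference point to the camera
    var tx = other.cam_x - x;
    var ty = other.cam_y - y;
    var tz = other.cam_z - z;

    // Only the sign matters, so neither vector has to be normalized.
    // Negative result: the point is in front of the camera and may be visible.
    if ( dot_product_3d( tx, ty, tz, other.cam_fx, other.cam_fy, other.cam_fz ) < 0 )
        {
        matrix_set( matrix_world, matrix_build( x, y, z - 5, 0, 0, zr, 10, 10, 10 ) );
        vertex_submit( other.tree_buffer, pr_trianglelist, other.tree_tex );
        }
    }

matrix_set( matrix_world, wmat );
```

This only skips trees behind the camera plane. Trees in front of the camera but outside the view frustum are still submitted.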

If you store world models in separate objects and you know they are static, you can also generate a world matrix for each one with matrix_build once, store it, and reuse it. This way you save some matrix-building time.
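Caching the matrix might look like the sketch below. It assumes each obj_tree already has its final x, y, z and zr when its create event runs; world_mat is a hypothetical variable name.

```gml
// --- Create event of obj_tree ---
// Built once, because a static tree never moves, rotates or rescales
world_mat = matrix_build( x, y, z - 5, 0, 0, zr, 10, 10, 10 );

// --- Draw event of the master render object ---
// The current world matrix is saved and restored once, not once per tree
var wmat = matrix_get( matrix_world );

with ( obj_tree )
    {
    matrix_set( matrix_world, world_mat ); // reuse the cached matrix
    vertex_submit( other.tree_buffer, pr_trianglelist, other.tree_tex );
    }

matrix_set( matrix_world, wmat );
```

If a tree is ever moved, its world_mat has to be rebuilt at that point.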
 

lolslayer

Member
Is that like gpu instancing?? Also, how'd you do shadows for point lights? Is it a cube map or something?
Well yeah GPU instancing would be cool, we could finally make some decent foliage systems with it :)

There are multiple ways of doing shadows for point lights, cube maps being the most solid one. But GM doesn't support geometry shaders, which are necessary for strong optimizations, and it doesn't support cubemaps in shaders either, so you have to split up all 6 faces of the cubemap and send each of them to the GPU individually for use in shaders. Then there's the problem that some weaker GPUs (like Intel integrated ones) only support 6 textures at the same time in shaders, and GM already forces one in, namely the default GM texture. But it has been done before, and you can quite decently draw 6 point lights with shadows if you want.

A variant of cubemaps is dual paraboloid shadow mapping, which is essentially the same but with 2 faces instead of the 6 that make up a cubemap. It's less precise, but performance-wise it is great:
http://gamedevelop.eu/en/tutorials/dual-paraboloid-shadow-mapping.htm

I got point light shadows to work with 2 different systems. One was by dividing the map into mathematical volume formulas, and for every pixel you check that no volume is blocking the path to the light. It works great, but only with a very limited number of shapes and lights. You can see it in action here:

Another system I made combined 3D lighting with 2D shadows. I simplified the map to a 2D top-down view, did regular 2D shadow mapping for every light, and stored the results in surfaces. Then I sent the surfaces to the GPU, and for every pixel I check in the surface whether the light source currently being calculated is blocked by anything. The results can be seen here:

I'm currently thinking about a system where you simplify the world into a heightmap and then raytrace through it in real time. This could give you easy real-time shadowing, but also real-time reflections. Nothing to show yet, but this game uses 3D raycasting on heightmaps to calculate what to draw, which also explains why such an old game supported real-time shadows:

Of course, you can also bake the lighting, I for example made an isometric world system once where every surface did a single raytrace to see if it could see the light source to then store the lighting value for itself. The system can be seen here:

What I'm currently working on is getting pre-baked lightmaps to work like older games did, I found an easy program for lightmap baking, the only thing I need to do is write my own exporter for my own model format, write my own importer in GML for that model format and then combine the model with a shader that applies the lightmap with the diffuse texture colour and eventually with dynamic lights too.
 
Please stop me next time you ask for this stuff
I don't see a problem making long posts about GameMaker stuff on a GameMaker forum. About the heightmap stuff: in certain circumstances you can also try to use a horizon map. It is a lot less precise, but much cheaper because it won't involve any loops. The idea is that you store the elevation angle of the horizon at each point. With 4 channels, and dividing the texture into fourths, that can give you 16 different compass directions, and interpolating between the nearest two can give you alright results. It would be difficult to do on a whole map, but for individual objects it can work.

I too wish we could get more functionality in particular with shaders... 3d textures? vertex texture sampling? geometry shaders? And so much more! What fun we could have!

Anyway, @Kentae, I really want to understand why in the heck you are getting better performance drawing trees individually. That seems really strange to me, unless you have a very small number of trees.
 
Multimagyar

Guest
Please stop me next time you ask for this stuff
I don't see a problem with this post. It's not stating anything degrading or against the rules, like "Unity can do this and better because x".
I'm going to quote this one because the other is a wee bit longer. Depending on your system you can still kind of cheat by having a shader that runs once per light in a deferred rendering solution, rather than a 6-light solution. Personally I would still pre-bake my lights, but save the result in the vertex buffer rather than in a texture and just rely on colour interpolation (assuming the light is static).

I don't really see geometry shaders and 2D texture sampling in the vertex shader happening any time soon. Don't forget that GMS still targets OpenGL for embedded systems (which actually works more consistently than OpenGL 4.6, even between two computers), but that comes at a price: while geometry shaders can be implemented in DirectX, OpenGL ES has no idea what a geometry shader is. I see a better chance of YoYo Games adding the ability to access the depth buffer than of them giving you geometry shaders, at least until they finally realize that Khronos wants people to move over to Vulkan from any distribution of OpenGL sooner or later (preferably sooner). Even then, I feel like they won't feel the need until a platform literally refuses to run a project. Personally, I just want chained accessors without people thinking I want to use a grid-like data structure.

I would assume it's 100 trees, or else the test would be invalid. Maybe they are trees with not many vertices? Or maybe the GPU just pulls the structure faster because it's already in memory and only has to render it?
 

Kentae

Member
I think maybe you have all slightly misunderstood how I do things in my project...
The tree objects do not have their own draw event; everything is handled in a master render object.
The tree objects themselves are, more or less, there so that I have an easier time placing them in the testing phase.
Their model and texture is loaded only once, by the master object.

I then submit the vertex buffer holding the tree model once for each tree (plus one more submit for the leaves, though I'm planning on putting these together later).

So the model and texture are loaded in the master object and all the trees are drawn from a single draw event, also in the master object.

I should probably have been more clear about this :p
I also use GMS 1.4 if anyone was wondering.

Yeah, I know about all that... planning on adding LoDs later when I create my own tree model.
I'm currently using one I found on blendswap for testing.

Anyway, @Kentae, I really want to understand why in the heck you are getting better performance drawing trees individually. That seems really strange to me, unless you have a very small number of trees.
I was actually able to render 400 trees this way without the game lagging... But that was right on the edge though :p
 