Is there a quicker method to remove part of a buffer?

Is there a quicker method to remove what's stored on a vertex buffer than this?
I'm hoping someone with experience editing buffers knows a way to correct me here.

I have a marketplace asset that generates many grass vertices on a layer, with some extra code to ensure it only generates on certain tiles.
The part that freezes my game for up to 3 seconds is when I remove a specific grass vertex from the buffer.

When I want to remove a grass vertex mid game:
- I create 2 identical non vertex buffers via buffer_create_from_vertex_buffer so I can edit them. Consider one a "main" the other a "throwaway".
- Then, using buffer_get_size and buffer_peek, I locate the area in the buffer that stores the data for drawing the specific vertex I want to remove, by finding a matching x y.
- Because there's no such thing as "buffer_erase_part", I copy everything after the chunk I want to remove over the part in the middle of the buffer I want to erase.
For example: the buffer looks like ABCDE. There's no function for removing C, so to achieve that I copy DE over C, turning the buffer into ABDEE.
(all done with buffer_copy copying from the "throwaway" buffer to the "main" buffer. Then I delete the "throwaway" buffer with buffer_delete).
- I trim the end of the buffer with buffer_resize, turning ABDEE into ABDE. Meaning I successfully yet painfully removed the grass vertex aka C.
- I delete the current vertex buffer the shader is using with vertex_delete_buffer.
- Then I create a new vertex buffer from my edited "main" buffer using vertex_create_buffer_from_buffer.
- I erase the main buffer with buffer_delete.
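
The copy-and-trim steps above can be sketched roughly like this. It is only an illustration of the idea under assumptions, not the actual project code: the function name buffer_remove_range and its parameters are hypothetical, and it uses a temporary buffer holding just the tail rather than a full second copy of the buffer.

```gml
/// Hypothetical helper: remove _length bytes starting at _offset from a
/// standard (non-vertex) buffer, e.g. turn ABCDE into ABDE.
function buffer_remove_range(_buffer, _offset, _length) {
    var _size = buffer_get_size(_buffer);
    var _tail = _size - (_offset + _length); // bytes after the removed chunk

    if (_tail > 0) {
        // buffer_copy needs two different buffers, so stage the tail first
        var _tmp = buffer_create(_tail, buffer_fixed, 1);
        buffer_copy(_buffer, _offset + _length, _tail, _tmp, 0);
        buffer_copy(_tmp, 0, _tail, _buffer, _offset);
        buffer_delete(_tmp);
    }

    // Trim the now-duplicated bytes off the end (ABDEE -> ABDE)
    buffer_resize(_buffer, _size - _length);
}
```

For the ABCDE example, calling it with the offset and length of C would leave ABDE in the buffer.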

I think the copy and resize functions are freezing the game momentarily, although it could be any part of this.
My previous method used only one buffer with buffer_peek and buffer_poke to copy paste byte by byte, then buffer_resize. This caused a much much longer freeze of up to 20 sec.
I tried having multiple instances with unique vertex buffers but this slowed my game down to 20fps, and worsened when I walked around the room.

I apologise for the lack of screenshots, it's intertwined with a lot more code I don't particularly want to share.
 

Nocturne

Friendly Tyrant
Forum Staff
Admin
Tbh, looking at what you're doing and the way GMS buffers work, I can't really think of anything that you could do which would be faster... I mean, Vertex Buffers in GM aren't really meant to be edited in this way. The idea is that they are either re-created each frame or "frozen" and never touched again. It might actually be quicker to simply recreate them if something changes rather than try to edit them as you are doing now! The only other option that I think you'd have would be to create a DLL that does all the buffer work for you, as that would probably be way faster. However the limitations on that would be Windows only, and you'd need to know how to write such a DLL in the first place!
 

Bart

WiseBart
That does look like the way it should be done. There's likely no quicker method to actually remove part of a buffer.
Though I think there are some other things that you could consider, next to Nocturne's suggestions.

Did you try profiling the code to see what it is that takes up the most time?
Depending on what that turns out to be you could try a couple of things.

I'd expect the most likely cause to be the calls to vertex_create_buffer_from_buffer. If not, some other things could be optimized/improved.

You can get rid of buffer_resize by keeping the buffer size fixed and then keep track of the actual number of bytes being used.
vertex_create_buffer_from_buffer_ext can then be used to create a vertex buffer out of part of the buffer.
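
A minimal sketch of that idea, assuming the caller tracks how many bytes are in use and knows the size of one vertex in bytes. The helper name and its parameters are hypothetical; only vertex_create_buffer_from_buffer_ext is a built-in function.

```gml
/// Hypothetical helper: build a vertex buffer from only the used part of a
/// fixed-size buffer, so buffer_resize is never needed.
/// _bytes_used    number of bytes currently holding valid vertex data
/// _vertex_bytes  size of a single vertex in the given format
function vbuff_from_used_bytes(_buffer, _format, _bytes_used, _vertex_bytes) {
    var _vertex_count = _bytes_used div _vertex_bytes;
    // Read _vertex_count vertices starting at byte offset 0
    return vertex_create_buffer_from_buffer_ext(_buffer, _format, 0, _vertex_count);
}
```

After removing one grass sprite you would subtract its byte size from the tracked count, delete the old vertex buffer with vertex_delete_buffer, and call the helper again.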

Another thing is the buffer_copy. If that happens to be the bottleneck you could try splitting that single function call over multiple steps, i.e. instead of copying everything at once, copy the data in 4 steps, where you copy a fourth every time.
That is, of course, assuming that you know well in advance what's going to change.
Meanwhile, for drawing, you keep using the "old" vertex buffer.

One more thing that could work (not too sure, though!) is to simply fill that particular part of the buffer with zeroes using buffer_fill.
Those will then also end up in the vertex buffer and even be processed by the vertex shader but they'll never lead to visible pixels on screen, since the surface area of the triangles made up by those vertices is actually zero.
Another way would be to flag vertices as visible or invisible using an additional vertex attribute.

While these are just a couple of ideas that might not be that obvious I think they could help in fixing that "freezing" issue you're having.
 
First of all I really appreciate the replies guys, I was worried I'd have to abandon this use of vertex buffers. Well-thought-out responses too! :)
I managed to get it fixed with your ideas in mind.

I'm unsure why I always forget the profiler exists, but I profiled it as you suggested and it was indeed vertex_create_buffer_from_buffer,
along with buffer_peek and especially buffer_create_from_vertex_buffer.

[Image: profiler screenshot, 1616859252418.png]

It might actually be quicker to simply recreate them if something changes rather than try to edit them as you are doing now! The only other option that I think you'd have would be to create a DLL that does all the buffer work for you, as that would probably be way faster. However the limitations on that would be Windows only, and you'd need to know how to write such a DLL in the first place!
I'm not prepared to limit the game to Windows but recreating buffers sounds like a really good idea! So I reworked my code today to only use buffer_create_from_vertex_buffer at the start, to make two non vertex copies, once, and never delete them. Not having to call that function or buffer_delete again has significantly sped up my game when running the buffer editing step event. I don't know what effect this has on memory. I really wanted to avoid that route, but I suppose I'd be using that memory to edit buffers anyway, even if only momentarily making buffers and deleting them.

You can get rid of buffer_resize by keeping the buffer size fixed and then keep track of the actual number of bytes being used.
vertex_create_buffer_from_buffer_ext can then be used to create a vertex buffer out of part of the buffer.

Another thing is the buffer_copy. If that happens to be the bottleneck you could try splitting that single function call over multiple steps, i.e. instead of copying everything at once, copy the data in 4 steps, where you copy a fourth every time.
That is, of course, assuming that you know well in advance what's going to change.
Meanwhile, for drawing, you keep using the "old" vertex buffer.

One more thing that could work (not too sure, though!) is to simply fill that particular part of the buffer with zeroes using buffer_fill.
These are also really good things to consider, lots of options here actually, really good help. I thought of filling areas in the vertex buffer with zeroes before but I worried that looping through the buffer once with buffer_peek to check if a vertex grass exists at x y, then looping again with buffer_peek to find a zeroed (emptied) part would slow down things even more. However, looking at the profiler now, this might be a good long term solution if I list everything in the vertex_buffer as it would show in the room left to right. Although I'm unsure if I should do this with a ds_grid or not.


I know I can have almost infinite grass sprites in these vertex buffers without issues, if I use vertex_freeze that is.
So at the moment I've limited the game to only have one vertex grass sprite per tile, but it'd be nice to have up to 8 per tile. Sadly that used to mean more loops, copying and resizing, but not anymore!
Going forward I think I have three similar options, although I don't know which one's better and quicker:

1. A fixed size vertex buffer, erasing parts of the buffer by buffer_fill with zeroes. Also ordering all grass sprites inside the buffer from left to right in the room to avoid having to loop through with buffer_peek, only peek once.

2. The same as above but using a ds_grid and 1 non vertex buffer, storing everything in the ds_grid so I can actually freeze my vertex buffer this time with vertex_freeze since I'd always be using the ds_grid. Also means I won't always have two non vertex buffers loaded for the purpose of editing with buffer_copy. (that bloody function needs two "different" buffers). This does mean it'd be making a new vertex buffer again from scratch though each time.

3. Same as option 1, keeping the 2 non vertex buffers, no vertex_freeze , but using a list for holding locations of "zeroed" (emptied) sections to avoid looping.


I just don't know what is considered overkill and therefore more taxing than before, or an actual improvement.

Anyway thanks for the great help guys! :D
 

Juju

Member
Thanks for posting your profiling data, that's very useful. I've got good news: you can probably turn this 12ms event into ~1.5ms.

The first thing to observe is that going from vertex buffer to standard buffer takes 2.2ms. There's no reason to perform this step; just keep your buffer around between frames! The size of buffers for vertex data is going to be on the order of kilobytes and it is a totally valid memory-speed tradeoff to store that buffer for the duration of a scene. You can't get away from having to recreate your vertex buffer when grass needs to update but, even without refactoring, you can immediately scrub 2.2ms off your event time by not discarding your buffer. The same goes for your "throwaway" buffer - no reason that shouldn't be kept around as well.

Next thing is that your overall frame time is 12ms yet your heaviest function calls only add to roughly 5.4ms in total. This means there's ~6.4ms of unaccounted-for operations which, in this case, is going to be your search loop. This conclusion is further backed up by the fact that you have 10,533 buffer_peek() operations. The majority of the time you're using is nothing to do with copying buffer data around, it's just finding the part of the buffer you're trying to remove. This isn't going to be solved with a different buffer copying method so we need to take a different approach.

I'm going to assume that you're initially building the vertex buffer based on where grass is in your scene and that you know where the grass is before building the vertex buffer. Trying to deduce where grass is by analysing the vertex buffer itself is going a long way round to rediscover information you already know. If this isn't the case (for some reason?) then you'll need to pre-process your vertex buffer but the basic method I'm going to describe still applies.

The idea is that you want to know where grass is in your buffer without requiring an expensive search operation. You mentioned using a ds_grid but that's going to limit where you can place your grass - it'll have to be one piece of grass per tile, and that can be rather constraining (I had issues with this for The Swords Of Ditto). The easiest way to get around grid-based position checking is with a "dictionary lookup": this means using a ds_map to store grass locations. This process would be as follows:
  1. Create a ds_map and a standard buffer. You'll need to keep these around for the duration of gameplay
  2. Find a piece of grass. Write its position as a key to the ds_map (e.g. string(x) + "," + string(y)) and use the "tell" of the buffer as the value
  3. Write grass data into the buffer
  4. Repeat 2+3 until you have no more grass left to write
  5. Build a vertex buffer from the buffer
  6. Freeze the vertex buffer
You now have a ds_map, a buffer, and a vertex buffer. These three together are enough information to efficiently modify the vertex buffer. When it comes time to remove grass:
  1. Check against your ds_map to find the grass position in the standard buffer
  2. Use buffer_fill() to null out that data in the buffer
  3. Delete the old vertex buffer
  4. Rebuild your vertex buffer from the buffer
  5. If performance allows, one frame later freeze the vertex buffer (this distributes the freezing "load" over a couple frames to avoid lag spikes)
You can further optimise this by breaking down your buffer/vertex buffer into smaller pieces instead of covering an entire scene with the same vertex buffer. This'll reduce the time taken to regenerate the vertex buffer as the vertex buffer itself will be smaller per update. If you don't want to find the grass by position and instead want to find the grass by ID (instance ID or struct or whatever) then you can use an ID instead of a position.
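
The build and remove steps above could look roughly like the sketch below. Everything here is illustrative rather than a definitive implementation: the function names, the instance variables, the simplified vertex layout (3D position, one texcoord, colour, so 24 bytes per vertex) and the 0..1 texture coordinates are all assumptions, and the vertex buffer is frozen immediately instead of one frame later.

```gml
// Assumed layout for this sketch: 12 (position) + 8 (texcoord) + 4 (colour) bytes
#macro GRASS_VERTEX_BYTES 24
#macro GRASS_BYTES (GRASS_VERTEX_BYTES * 6) // 2 triangles per piece of grass

/// Hypothetical setup: format, lookup map and standard buffer kept for the scene
function grass_init() {
    vertex_format_begin();
    vertex_format_add_position_3d();
    vertex_format_add_texcoord();
    vertex_format_add_colour();
    grass_format = vertex_format_end();

    grass_map        = ds_map_create();
    grass_buffer     = buffer_create(GRASS_BYTES, buffer_grow, 1);
    grass_bytes_used = 0;
    grass_vbuff      = -1;
}

/// Write one vertex as raw bytes matching the format above
function grass_write_vertex(_x, _y, _u, _v) {
    buffer_write(grass_buffer, buffer_f32, _x);
    buffer_write(grass_buffer, buffer_f32, _y);
    buffer_write(grass_buffer, buffer_f32, 0);         // z
    buffer_write(grass_buffer, buffer_f32, _u);
    buffer_write(grass_buffer, buffer_f32, _v);
    buffer_write(grass_buffer, buffer_u32, $FFFFFFFF); // opaque white
}

/// Record where this grass starts in the buffer, then write its 6 vertices
function grass_add(_x, _y, _w, _h) {
    grass_map[? string(_x) + "," + string(_y)] = buffer_tell(grass_buffer);
    grass_write_vertex(_x,      _y,      0, 0); // triangle 1
    grass_write_vertex(_x + _w, _y,      1, 0);
    grass_write_vertex(_x,      _y + _h, 0, 1);
    grass_write_vertex(_x + _w, _y,      1, 0); // triangle 2
    grass_write_vertex(_x + _w, _y + _h, 1, 1);
    grass_write_vertex(_x,      _y + _h, 0, 1);
    grass_bytes_used = buffer_tell(grass_buffer);
}

/// Delete the old vertex buffer and rebuild it from the used part of the buffer
function grass_rebuild() {
    if (grass_vbuff != -1) vertex_delete_buffer(grass_vbuff);
    grass_vbuff = -1;
    var _count = grass_bytes_used div GRASS_VERTEX_BYTES;
    if (_count == 0) return;
    grass_vbuff = vertex_create_buffer_from_buffer_ext(grass_buffer, grass_format, 0, _count);
    vertex_freeze(grass_vbuff);
}

/// Look the grass up by position, null out its bytes, rebuild
function grass_remove(_x, _y) {
    var _key = string(_x) + "," + string(_y);
    var _pos = grass_map[? _key];
    if (is_undefined(_pos)) return false; // no grass stored at that position
    buffer_fill(grass_buffer, _pos, buffer_u8, 0, GRASS_BYTES);
    ds_map_delete(grass_map, _key);
    grass_rebuild();
    return true;
}
```

Typical use would be to call grass_init() once, grass_add() for every piece of grass, then grass_rebuild(), and later grass_remove(x, y) when a piece is removed. Drawing would use vertex_submit with pr_trianglelist and the grass texture.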

Hope that helps! Good luck.

P.S. Since you're doing 5,267 calls of buffer_get_size(), here's another tip: you generally should not put function calls in the evaluator for for-loops. You're probably doing for(var i = 0; i < buffer_get_size(buffer); i++) {}. This will cause GM to re-evaluate buffer_get_size() every single iteration which, if the size of the buffer isn't changing, is unnecessary. Instead use var size = buffer_get_size(buffer); for(var i = 0; i < size; i++) {}

P.P.S. If you're doing a 3D game with a fixed camera perspective (e.g. a top-down game) then you should try to order your grass from front-to-back to avoid overdraw. This is the opposite way we usually do things in GameMaker with depth but it helps the GPU do things more efficiently when there's a lot of stuff being drawn whilst z-testing is turned on.
 
Thanks for the response! I'll be honest, I'm amazed how well you deduced I was using this:
You're probably doing for(var i = 0; i < buffer_get_size(buffer); i++) {}. This will cause GM to re-evaluate buffer_get_size() every single iteration which, if the size of the buffer isn't changing, is unnecessary. Instead use var size = buffer_get_size(buffer); for(var i = 0; i < size; i++) {}
I'll try to resolve that today actually.

I noticed buffer_peek had such a ridiculously high ms because I thought I was being clever making it check the entire buffer for primitives matching x y of not only the tile I clicked on but all 8 surrounding tiles too. So 9 times the work. This is because I make the grass tiles around the one I click adapt to the missing / new tile. Basically the "x16 autotile" function in GMS tilesets. This task became easier to handle after implementing the improvements you mentioned.

Anyway you're spot on about building the vertex buffer based on where the grass tiles initially are. I managed to dig deep and learn a bit more yesterday from the very scarce explanations online of what buffer data types vertex buffers use. Knowing this, I know that two triangles are built from each grass sprite (with uv & colour). Each vertex is a 3D position (x, y, z), four texcoords (u, v each) and one packed a,b,g,r colour, which makes 3 + 8 + 1 = 12 values of 4 bytes each. So 4 bytes * 12 values (per triangle corner) * 6 (because we need 6 corners aka 2 triangles).

That means 288 bytes per sprite. It's probably meaningless to anyone reading that, but in short, I think I've worked out how many bytes to increment in the buffer to access data for a specific primitive (vertex grass sprite). So I no longer need a ds_map or ds_grid.

To make it work in my favour I made obj_vertexgrass build the vertex buffer in order from the top left of the room, this time building primitives for EVERY tile but with 0 alpha if there's no grass tile there. So I can later change that alpha value inside the buffer to make those primitives visible, and vice versa. It takes longer to enter the room because it's incrementing along the room instead of irandom_range() but I guess that's a drawback I need to put up with.

It works, but I'm having issues with it consistently hiding and showing primitives. If I right click, placing grass, it makes primitives visible there. Awesome.
However left click, which removes a grass tile, (and is meant to hide primitives there too) leaves plenty of primitives still visible, removing only 1 or 2. I think I messed up my maths somewhere and it's removing the primitives of the tile above, or to the left of the one I clicked.
So I included as much code for this as I can below, in case someone manages to spot the issue before I do and in case anyone reading this wants to use the same process as me.

Aside from that this is what the profiler looks like now. Still having slight freezes after each left click (trying to hide primitives, and it still not hiding them). Not sure why :eek:.
The moment I left or right click in game (and therefore update the buffer and tilemap), all animations in game look like they're at 20fps. Minimizing and unminimizing the game from my taskbar fixes this back to normal though. I have no idea how to fix this. It feels like I'm overloading the graphics pipeline or something because I didn't intentionally code it to do that... I also assume the obj_vertexgrass ms can't be improved if I want it to keep its functionality, but it's nice to see that my o_mouse code is now beneath it.

[Image: profiler screenshot, 1617029809764.png]

P.P.S. If you're doing a 3D game with a fixed camera perspective (e.g. a top-down game) then you should try to order your grass from front-to-back to avoid overdraw. This is opposite way we usually do things in GameMaker with depth but it helps the GPU do things more efficiently when there's a lot of stuff being drawn whilst z-testing is turned on.
You're correct here too - fixed perspective, top down. That's interesting. I assume you mean drawn from the bottom right to the top left of the room? I'll alter the code to do this after the hiding and showing of primitives works correctly.

Code:
(Still has issues, but I'll update this thread with a solution when I find it.)

GML:
room_h = (room_height/16);
room_w = (room_width/16);

gridx = 0;
gridy = 0;
draw_done = 0;

while draw_done != 1 {

    m = tilemap_get(grass_id, gridx, gridy) & tile_index_mask;
    if m >= 1 and m <= 12 { make_empty = 0; } else { make_empty = 1; }

    if gridy == (room_h-1) and gridx == (room_w-1) { draw_done = 1; } else { buff_tile_amount += 1; }

    repeat (vgrass_per_tile) {
   
        /////////////////////////////////////////
        //// working out all primitive values ///
        //// not included because it isn't //////
        //// all exactly my code.  ///////////////
        /////////////////////////////////////////
        /// alpha = 1; is somewhere in here ////
        /////////////////////////////////////////

        if make_empty == 1 { alpha = 0; }

        // triangle 1:
        vertex_position_3d(); // vec3 x,y,z
        vertex_texcoord(); // vec2
        vertex_texcoord(); // vec2
        vertex_texcoord(); // vec2
        vertex_texcoord(); // vec2
        vertex_colour(); // ABGR

        vertex_position_3d();
        vertex_texcoord();
        vertex_texcoord();
        vertex_texcoord();
        vertex_texcoord();
        vertex_colour();

        vertex_position_3d();
        vertex_texcoord();
        vertex_texcoord();
        vertex_texcoord();
        vertex_texcoord();
        vertex_colour();

        //triangle 2
        vertex_position_3d();
        vertex_texcoord();
        vertex_texcoord();
        vertex_texcoord();
        vertex_texcoord();
        vertex_colour();

        vertex_position_3d();
        vertex_texcoord();
        vertex_texcoord();
        vertex_texcoord();
        vertex_texcoord();
        vertex_colour();

        vertex_position_3d();
        vertex_texcoord();
        vertex_texcoord();
        vertex_texcoord();
        vertex_texcoord();
        vertex_colour();
    }
    if gridx == (room_w-1) { gridx = 0; gridy += 1;}
    else { gridx += 1; }
   
    make_empty = 0;
}

GML:
var tile_num = ((gridx) + ((gridy-1)*(room_width/16)))+(room_width/16);
var at = (288 * vgrass_per_tile * tile_num);

GML:
var col_type = buffer_u32;
var t_dis = (4*12);
var rep = 1;
repeat (vgrass_per_tile) {

    //triangle 1
    buffer_poke(b_vbuff, ((0*t_dis)+(at+44))*rep, col_type, empty_buff);
    buffer_poke(b_vbuff, ((1*t_dis)+(at+44))*rep, col_type, empty_buff);
    buffer_poke(b_vbuff, ((2*t_dis)+(at+44))*rep, col_type, empty_buff);

    //triangle 2
    buffer_poke(b_vbuff, ((3*t_dis)+(at+44))*rep, col_type, empty_buff);
    buffer_poke(b_vbuff, ((4*t_dis)+(at+44))*rep, col_type, empty_buff);
    buffer_poke(b_vbuff, ((5*t_dis)+(at+44))*rep, col_type, empty_buff);
   
    rep+=1;
}

GML:
var t_dis = (4*12);
var col_type = buffer_u32;
var rep = 1;
repeat (vgrass_per_tile) {
   
    //triangle 1
    buffer_poke(b_vbuff, ((0*t_dis)+(at+44))*rep, col_type, vis_col);
    buffer_poke(b_vbuff, ((1*t_dis)+(at+44))*rep, col_type, vis_col);
    buffer_poke(b_vbuff, ((2*t_dis)+(at+44))*rep, col_type, vis_col);
                                   
    //triangle 2
    buffer_poke(b_vbuff, ((3*t_dis)+(at+44))*rep, col_type, vis_col);
    buffer_poke(b_vbuff, ((4*t_dis)+(at+44))*rep, col_type, vis_col);
    buffer_poke(b_vbuff, ((5*t_dis)+(at+44))*rep, col_type, vis_col);
   
    rep += 1;
}
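
One thing that might be worth checking in the two poke loops above (a guess, not a confirmed fix): the whole byte offset is multiplied by rep, so from the second sprite onward the pokes land at 2x, 3x and so on of the intended offset rather than one sprite further along. If each sprite really occupies 288 bytes, stepping by 288 per sprite would look more like the sketch below. The function name and its parameters are made up for illustration.

```gml
/// Hypothetical helper: overwrite the colour of every vertex belonging to one tile.
/// _tile_num          index of the tile, counted row by row from the top left
/// _sprites_per_tile  same meaning as vgrass_per_tile above
/// _colour            packed 32-bit colour to write (e.g. empty_buff or vis_col)
function vgrass_set_tile_colour(_buffer, _tile_num, _sprites_per_tile, _colour) {
    var _vertex_bytes  = 4 * 12;            // 12 four-byte values per vertex
    var _sprite_bytes  = _vertex_bytes * 6; // 2 triangles = 6 vertices = 288 bytes
    var _colour_offset = 44;                // colour follows 3 + 8 floats (11 * 4 bytes)
    var _at = _sprite_bytes * _sprites_per_tile * _tile_num;

    for (var _s = 0; _s < _sprites_per_tile; _s++) {
        for (var _v = 0; _v < 6; _v++) {
            // Add one sprite's worth of bytes per sprite instead of multiplying
            var _pos = _at + (_s * _sprite_bytes) + (_v * _vertex_bytes) + _colour_offset;
            buffer_poke(_buffer, _pos, buffer_u32, _colour);
        }
    }
}
```

With tile_num computed as in the snippet above, hiding a tile would be vgrass_set_tile_colour(b_vbuff, tile_num, vgrass_per_tile, empty_buff) and showing it would pass vis_col instead.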

Thank you once again for the help guys.
 

Juju

Member
for EVERY tile but with 0 alpha if there's no grass tile there
This is a really bad idea if the majority of your space has no grass. I strongly advise against this. What's happening is that you're totally saturating your GPU with work and it's causing system-wide instability as other processes can't properly use the GPU.

Remember that all of your permanently empty tiles still count as data that GM needs to communicate to the GPU and all of that wasted data is substantially impacting the performance of your code - vertex_create_buffer_from_buffer() has jumped from 1ms to 3.9ms. This is not a good sign. That your submission is taking 7.7ms is also concerning, I don't think I've ever seen vertex_submit() cost that much.

You need to reduce the amount of wasted empty space, either by using the ds_map method I suggested, or using a ds_grid if your grass is strongly locked to tiles.
 
This is a really bad idea if the majority of your space has no grass. I strongly advise against this. What's happening is that you're totally saturating your GPU with work and it's causing system-wide instability as other processes can't properly use the GPU.

Use the ds_map method I suggested, or use a ds_grid if your grass is strongly locked to tiles. Remember that all of your empty tiles still count as data that GM needs to communicate to the GPU and all of that permanently empty space is impacting the performance of your code. Better yet, spatially partition your singular vertex buffer into many smaller vertex buffers.
Ah I see, I'll switch to that method then.
I don't know whether I should ask this, but from your experience, would there be any harm in making the vertex buffer only contain what's in the camera's view, and adding to / removing from the vertex buffer as the camera changes position?
I'm hoping this might lower the ms for vertex_submit.

I'm also unsure whether to keep the buffer the size of the room, or resize it to remove "zeroed" bytes...
 