GameMaker Using buffers as 2D Arrays

K

kevins_office

Guest
I am trying to get faster creation and destruction times out of 2D arrays.
When using normal GML 2D arrays[x , y] they get created with the default GML Real data type, not only does this bloat the memory on large arrays[1000, 1000] it makes it very slow.

I tried making buffers like an array using peek/poke with offset to store and fetch values. This greatly improved creation time from 11,000µs to 300µs. However access time is 5 times slower than using 2D arrays.

So before i toss the baby out with the bath water i want to ask the community if buffers actually are slower to use or if it is my code slowing it down. Here are the scripts...

Code:
/// @func fast_array_create(w_size, h_size, byte_size)
/// @arg w_size
/// @arg h_size
/// @arg byte_size

// byte_size 1: 0 to 255 (buffer_u8)
// byte_size 2: 0 to 65,535 (buffer_u16)
// byte_size 4: -2,147,483,648 to 2,147,483,647 (buffer_s32)

var wSize = argument[0];
var hSize = argument[1];
var byteSize = argument[2];
var array_id = buffer_create((wSize * hSize * byteSize) + 4, buffer_fixed, byteSize);
buffer_poke(array_id, 0, buffer_u32, wSize);
return array_id;
Code:
/// @func fast_array_destroy(array_id)
/// @arg array_id

buffer_delete(argument[0]);
return;
Code:
/// @func fast_array_set(array_id, cell_x, cell_y, value)
/// @arg array_id
/// @arg cell_x
/// @arg cell_y
/// @arg value

var array_id = argument[0];
var cell_x = argument[1];
var cell_y = argument[2];
var value = argument[3];

var wSize = buffer_peek(array_id, 0, buffer_u32);
var byteSize = buffer_get_alignment(array_id);
var bufferType = buffer_u8;
if (byteSize == 2) bufferType = buffer_u16;
if (byteSize == 4) bufferType = buffer_s32;
var pos = (cell_y * wSize * byteSize) + (cell_x * byteSize) + 4;
buffer_poke(array_id, pos, bufferType, value);
return;
Code:
/// @func fast_array_get(array_id, cell_x, cell_y)
/// @arg array_id
/// @arg cell_x
/// @arg cell_y

var array_id = argument[0];
var cell_x = argument[1];
var cell_y = argument[2];

var wSize = buffer_peek(array_id, 0, buffer_u32);
var byteSize = buffer_get_alignment(array_id);
var bufferType = buffer_u8;
if (byteSize == 2) bufferType = buffer_u16;
if (byteSize == 4) bufferType = buffer_s32;
var pos = (cell_y * wSize * byteSize) + (cell_x * byteSize) + 4;
return buffer_peek(array_id, pos, bufferType);
Is there a faster way to accomplish this? Another method beside buffers?
 
Last edited by a moderator:
K

kevins_office

Guest
How are you creating the array?
If you mean standard 2D array not using buffers...

Code:
// array_create_2d(size, size)
var wSize = argument[0] - 1;
var hSize = argument[1] - 1;
var newArray = 0;
for (var wLoop = wSize; wLoop >= 0; wLoop--) newArray[wLoop, hSize] = 0;
return newArray;
 

Simon Gust

Member
If you mean standard 2D array not using buffers...

Code:
// array_create_2d(size, size)
var wSize = argument[0] - 1;
var hSize = argument[1] - 1;
var newArray = 0;
for (var wLoop = wSize; wLoop >= 0; wLoop--) newArray[wLoop, hSize] = 0;
return newArray;
looks good to me. It's also plenty fast for me.
What exactly are you using the buffer or array for?
Anyway, buffers are slower than arrays, there is nothing faster than an array in game maker.
 
K

kevins_office

Guest
What exactly are you using the buffer or array for?
Its for an A* Pathfinding script which needs to create 4 temp arrays and 1 queue.
The script itself takes only 200µs to run but creating the arrays adds another 30,000µs.
 
If creating the arrays is taking so long, just create them once and recycle them. Of course, you'll have to track the sizes yourself since you can only grow an array, but that's still better then the alternative.
 

CMAllen

Member
Have you tried switching from peek/poke combo to buffer_seek (from buffer_seek_start) just to see if those two functions have problems? I seem to recall other situations where peek/poke were not performing as quickly as expected.
 
K

kevins_office

Guest
If creating the arrays is taking so long, just create them once and recycle them. Of course, you'll have to track the sizes yourself since you can only grow an array, but that's still better then the alternative.
I have already tested that idea. It is slower to loop and clear arrays than it is to just create them.
 
K

kevins_office

Guest
Have you tried switching from peek/poke combo to buffer_seek (from buffer_seek_start) just to see if those two functions have problems? I seem to recall other situations where peek/poke were not performing as quickly as expected.
I just tested this, and it is slower to use seek + read/write instead of peek/poke.
I made a buffer array of 500x500, then looped all elements, set a value then read a value.

Doing this with seek + read/write the average time was 112,000µs
Doing this with peek/poke the average time was 103,000µs
vs doing it with a real 2D array which only takes 7,000µs
 
Last edited by a moderator:
K

kevins_office

Guest
Still trying to speed up access time, i rewrote the peek/poke functions to use less CPU steps with the side effect of ugly code...
Code:
/// @func fast_array_set(array_id, cell_x, cell_y, value)
/// @arg array_id
/// @arg cell_x
/// @arg cell_y
/// @arg value

var byteSize = buffer_get_alignment(argument[0]);
if (byteSize == 2) var bufferType = buffer_u16;
else if (byteSize == 4) var bufferType = buffer_s32;
else var bufferType = buffer_u8;
buffer_poke(argument[0], (argument[2] * buffer_peek(argument[0], 0, buffer_u32) * byteSize) + (argument[1] * byteSize) + 4, bufferType, argument[3]);
return;
Code:
/// @func fast_array_get(array_id, cell_x, cell_y)
/// @arg array_id
/// @arg cell_x
/// @arg cell_y

var byteSize = buffer_get_alignment(argument[0]);
if (byteSize == 2) var bufferType = buffer_u16;
else if (byteSize == 4) var bufferType = buffer_s32;
else var bufferType = buffer_u8;
return buffer_peek(argument[0], (argument[2] * buffer_peek(argument[0], 0, buffer_u32) * byteSize) + (argument[1] * byteSize) + 4, bufferType);
When doing the 500x500 array poke/peek test it reduced the averages from 103,000µs to 98,000µs. Not a huge improvement.
Anyone else have any ideas to speed up buffer access time?
 
Last edited by a moderator:
why muck around with different buffer alignments? If you have different kinds of buffers, why not write different access scripts for them?

The new scripts you wrote don't seem to bear much resemblance to what you had before, at least not at first glance. Why are there two buffer_peeks going on now?


Anyway, buffers are slower than arrays, there is nothing faster than an array in game maker.
Faster than just regular instance variable access?
 
K

kevins_office

Guest
why muck around with different buffer alignments? If you have different kinds of buffers, why not write different access scripts for them?
The speed difference of having one script that can handle different sized data types vs making multiple sets of scripts is less than 1µs.
Which is less than 1 / 1,000,000th of a second. I do not feel that speed difference is worth giving up the convenience of having a script that can handle multiple data sizes.


The new scripts you wrote don't seem to bear much resemblance to what you had before, at least not at first glance.
Both of the peek and poke scripts at the top and below are identical.
I just removed the local vars plugging arguments directly in where they are needed and optimizing the if() statement.
This saves time in creating, assigning and destroying temp variables.


Why are there two buffer_peeks going on now?
What two peeks? There is only one peek happening. What are you referring to?

[...edit...]

Oh, that two peeks. If you look at the script above the 2nd peek was for the local var wSize. This is needed to compute the 2D array offset.
Both versions of the script need to looked this up, so its not adding additional overhead.
The 2nd version is doing both peeks on the same line instead of before on two separate lines which then requires a temp var to store it in.
 
Last edited by a moderator:
What two peeks? There is only one peek happening. What are you referring to?
Nevermind, i guess the first peek is to measure the width of the buffer. Still, if you just knew the width of the buffer, it would save you from having to read it every time you access the buffer. I'm also really surprised eliminating the if statements doesn't improve things measurably.
 
K

kevins_office

Guest
Nevermind, i guess the first peek is to measure the width of the buffer. Still, if you just knew the width of the buffer, it would save you from having to read it every time you access the buffer.
The wSize offset being looked up by the peek, it has nothing to do with data type or alignment size. Even if those things were fixed you would still have to look wSize which is the width of the 2D array.
If you wanted an array that was array_create(12, 15) or array_create(32, 5) the offset amount would be different because one is 15 rows of 12, and the other is 5 rows of 32.

The only way to eliminate that offset being stored is to require that every array created was always the same size like var everyArray = array_create(50,50), you could never do array_create(something other than 50 x 50).
 
what I'm saying is since we've discovered that accessing the buffer with buffer_peek is apparently slow, why put an extra buffer_peek into every buffer access?
 
K

kevins_office

Guest
what I'm saying is since we've discovered that accessing the buffer with buffer_peek is apparently slow, why put an extra buffer_peek into every buffer access?
How else would you do it?
I am here asking for ideas that are faster than mine.
 
I don't know the details of your project, but I would probably just record the width of the buffer somewhere, maybe pass it in as an argument to the access scripts.

But I guess what I really wonder is if arrays are faster to access, why not use them? Are you creating new arrays/buffers every single step such that you have to worry about how long it takes to create them? 16MB is kind of a lot of space for a single array, but as long as you don't have too many of them, it shouldn't be a problem.
 
K

kevins_office

Guest
But I guess what I really wonder is if arrays are faster to access, why not use them?
Arrays are faster to access but slower to create and destroy.
Creating five 2D arrays of 500x500 takes 11,000µs
Creating five buffers of the same size takes 300µs.
That is why i went down this road of using buffers in the first place, because they definitely are not easier.

But that is my ultimate question, how can i have a handful of 2D arrays without the expensive cost of array creation and destruction time.
You know another option other than using buffers?
 

Simon Gust

Member
Arrays are faster to access but slower to create and destroy.
Creating five 2D arrays of 500x500 takes 11,000µs
Creating five buffers of the same size takes 300µs.
That is why i went down this road of using buffers in the first place, because they definitely are not easier.

But that is my ultimate question, how can i have a handful of 2D arrays without the expensive cost of array creation and destruction time.
You know another option other than using buffers?
Why do you need 5 arrays or buffers and not just 1? Is this all for pathfinding?
Or are they representative of chunks or areas in your project?
 
K

kevins_office

Guest
Why do you need 5 arrays or buffers and not just 1? Is this all for pathfinding?
Or are they representative of chunks or areas in your project?
How much can you reuse the arrays without clearing their contents?

Before reinventing the wheel, a lot of this has been addressed in another thread i linked above.
https://forum.yoyogames.com/index.php?threads/data-types.11973/page-2
My post is at the bottom of page two.
 

Simon Gust

Member
Before reinventing the wheel, a lot of this has been addressed in another thread i linked above.
https://forum.yoyogames.com/index.php?threads/data-types.11973/page-2
My post is at the bottom of page two.
So for your pathfinding only. Just a tip; you can store multiple numbers in an array and still only use 1 array worth of memory and create / destroy time.
If you bitmask those 5 values you need for pathfinding whatever they are (if they're not more than 12 bits in your case).
I tested the speed difference recently, it should be faster than a buffer.
 
ok, what about chunking? It would slow down accessing the arrays as you first have to compute which chunk you are in, but could dramatically reduce the amount of array clearing that needs to be done as the player moves around. You also have to wonder how much do you need A*? Do you need it all the time? Is it even realistic? Imagine most people wandering through a world they are not very familiar with, most of the time they will be stumbling around rather inefficiently, moving only on a best-guess basis toward their goal. Would that make sense for your pathfinders?
 
K

kevins_office

Guest
ok, what about chunking? It would slow down accessing the arrays as you first have to compute which chunk you are in, but could dramatically reduce the amount of array clearing that needs to be done as the player moves around. You also have to wonder how much do you need A*? Do you need it all the time? Is it even realistic? Imagine most people wandering through a world they are not very familiar with, most of the time they will be stumbling around rather inefficiently, moving only on a best-guess basis toward their goal. Would that make sense for your pathfinders?
I do not think players will tolerate stupid AI. Would you enjoy selecting a handful of units and click them to retreat into your base, but instead they get stuck in a clusters of tree and get killed by the enemy that was chasing them.
 
K

kevins_office

Guest
you can store multiple numbers in an array ... If you bitmask those 5 values
All together im storing...

A* G value as buffer_u16 0 to 65,535
A* H value as buffer_u16 0 to 65,535
Closed & Forbidden as buffer_u8 0 to 255 --OR-- Closed as bool AND Forbidden as bool
Parent x as buffer_u16 0 to 65,535
Parent y as buffer_u16 0 to 65,535

The open list is stored in a priority list so it isn't part of this optimizing.
How could i bit shift all of those variables into a single value and shift it back apart?
 

CMAllen

Member
I just tested this, and it is slower to use seek + read/write instead of peek/poke.
I made a buffer array of 500x500, then looped all elements, set a value then read a value.

Doing this with seek + read/write the average time was 112,000µs
Doing this with peek/poke the average time was 103,000µs
vs doing it with a real 2D array which only takes 7,000µs
How exactly are you using buffer_seek()?

In any case, it is bizarre that arrays are so fast while buffers are so slow.
 

Simon Gust

Member
All together im storing...

A* G value as buffer_u16 0 to 65,535
A* H value as buffer_u16 0 to 65,535
Closed & Forbidden as buffer_u8 0 to 255 --OR-- Closed as bool AND Forbidden as bool
Parent x as buffer_u16 0 to 65,535
Parent y as buffer_u16 0 to 65,535

The open list is stored in a priority list so it isn't part of this optimizing.
How could i bit shift all of those variables into a single value and shift it back apart?
I mean like the real size these values get. I'm pretty sure you never need higher than grid size in number.
The grid size is 500, so 10 bits should be enough (1024)
and boolean value can just reserve 1 bit
and directional values also only need 3 bits.
This would make just 42 bits which fits into the 64 bit mask you have available.

EDIT: Yes, so as I thought; bitmasking would still be faster than buffers.
while buffers recorded 55ms average, bitmasking recorded 10ms.
Arrays like that scored only 3ms but as you said, creating 5 arrays also takes some time and a lot of memory.
 
Last edited:

dphsw

Member
Just to test the speed of various methods of accessing, I tried adding a million numbers using a few different methods. These results are all from using YYC in Windows.

If they were stored in an array, and being added up all in order, it took 146ms.
If they were stored in an array, and being added up out of order (so for val[y*height + x], the y loop is now on the inside), it took 170ms.
If they were stored in a buffer, added in order using buffer_read, it took 128ms.
If they were stored in a buffer, added out of order using buffer_peek, it took 266ms.
If they were stored in a buffer, added in order using buffer_peek, it took 209ms.
If I put them all in a ds_grid and just added using ds_grid_get_sum, it took 180ms. (That surprised me, since I thought the internal addition code would be more efficient.)

These tests had been done without declaring my variables using the var keyword, so they were object variables instead of local variables. Declaring the variables with var got the quickest test, storing in a buffer and accessing sequentially with buffer_read, down to 81ms.
Finally, instead of adding one variable at a time, I tried adding 4 per loop (so that the compiler should theoretically use an SIMD instruction to add them). That got the time down to 61ms. The final code looked like this (minus the part to put the random numbers in the buffer to start with):

Code:
t=current_time;
var i,j;
for(i=0;i<n3;i++){
    sv0+=buffer_read(a,buffer_f64);
    sv1+=buffer_read(a,buffer_f64);
    sv2+=buffer_read(a,buffer_f64);
    sv3+=buffer_read(a,buffer_f64);
}
s=sv0+sv1+sv2+sv3;
t=current_time-t;
Of course, given that I tried basically the same code in C# and got a speed of 6ms without optimising at all (the worst-case-scenario, adding it all out of order, took 19ms - only 9ms with a stride of 1000 through the array, but 19ms with a stride of 1024, presumably because 1024 doubles take up 8Kb, exactly the size of my CPU's L1 cache line, causing constant cache-misses), and about 1.5ms after adding the SIMD optimisation, GML's best time, 61ms (and that's using the YYC), is still dismally poor. But the points here are:
  • I've seen a lot of disagreement over whether array or buffer is faster - I think that's because buffers are faster if you read/write them in order, otherwise arrays are faster.
  • While using buffers for random access is slower, it isn't 5x slower as you claim in your initial post - that slowdown is coming partly from your accessing them through your own function calls. Function calls in GML are slow as well. I once had a function to check for level collisions that was about a single complicated line of code, and was used repeatedly by the enemies and players, which was using 40% of the CPU time - it sped up the game dramatically to just copy and paste that line of code to everywhere in the game where level collisions needed to be checked for, because the function call, and passing the arguments, was using up more time than the processing that the line of code actually did.
  • In any language, it's always faster to access data in the order it's stored (or at least that's near the last bit of data accessed - for more info, Google 'CPU caches') - it can be worth rethinking your algorithm a bit, and how things are stored, to try to ensure this is the case as often as possible, even if it can't be the case 100% of the time. (I suspect the reason you get limited benefit from this in GML is because the variables are all structs of pointers to pointers, and even though bits of arrays should be contiguous given how your code is written, GML can have stored them in different places, forcing the CPU to constantly hunt all over memory-space for variables and bits of code - my guess is this is what makes it so much slower than other languages.)
  • If you haven't used var to declare the variables you're working with yet, that's probably the most significant speedup you can get in GML.
  • You might get a slight speedup from trying to do four of the same operation at once (think of it as working with vectors - it should be 4x as fast, if compiled properly; the YYC doesn't compile well, but it still seems to get a bit of advantage from this).
  • If you aren't concerned about running your code on all platforms, the most significant speedup - about 40x faster - is to write the algorithm in a DLL. (A shader might work too, depending on the algorithm, though it's more awkward getting data back from it, and the instruction limits vary across computers.)
 

CMAllen

Member
Of course, given that I tried basically the same code in C# and got a speed of 6ms without optimising at all (the worst-case-scenario, adding it all out of order, took 19ms - only 9ms with a stride of 1000 through the array, but 19ms with a stride of 1024, presumably because 1024 doubles take up 8Kb, exactly the size of my CPU's L1 cache line, causing constant cache-misses), and about 1.5ms after adding the SIMD optimisation, GML's best time, 61ms (and that's using the YYC), is still dismally poor.
...oi. That is most distressing. What is going on with GMS's data management back-end? I mean, I expect some slow down running user code through an interpreter, but GMS's own native functions I would expect far better results than that. That's roughly a 40x speed difference.
 
Top