1. Hey! Guest! The 35th GMC Jam will take place between November 28th, 12:00 UTC - December 2nd, 12:00 UTC. Why not join in! Click here to find out more!
    Dismiss Notice
  2. Hello Guest! It's with a heavy heart that we must announce the removal of the Legacy GMC Archive. If you wish to save anything from it, now's the time! Please see this topic for more information.
    Dismiss Notice

GMS 2 Using buffers as 2D Arrays

Discussion in 'Programming' started by kevins_office, Feb 12, 2018.

  1. kevins_office

    kevins_office Member

    Joined:
    Dec 19, 2017
    Posts:
    224
    I am trying to get faster creation and destruction times out of 2D arrays.
    When using normal GML 2D arrays[x , y] they get created with the default GML Real data type, not only does this bloat the memory on large arrays[1000, 1000] it makes it very slow.

    I tried making buffers like an array using peek/poke with offset to store and fetch values. This greatly improved creation time from 11,000µs to 300µs. However access time is 5 times slower than using 2D arrays.

    So before i toss the baby out with the bath water i want to ask the community if buffers actually are slower to use or if it is my code slowing it down. Here are the scripts...

    Code:
    /// @func fast_array_create(w_size, h_size, byte_size)
    /// @arg w_size
    /// @arg h_size
    /// @arg byte_size
    
    // byte_size 1: 0 to 255 (buffer_u8)
    // byte_size 2: 0 to 65,535 (buffer_u16)
    // byte_size 4: -2,147,483,648 to 2,147,483,647 (buffer_s32)
    
    var wSize = argument[0];
    var hSize = argument[1];
    var byteSize = argument[2];
    var array_id = buffer_create((wSize * hSize * byteSize) + 4, buffer_fixed, byteSize);
    buffer_poke(array_id, 0, buffer_u32, wSize);
    return array_id;
    
    Code:
    /// @func fast_array_destroy(array_id)
    /// @arg array_id
    
    buffer_delete(argument[0]);
    return;
    
    Code:
    /// @func fast_array_set(array_id, cell_x, cell_y, value)
    /// @arg array_id
    /// @arg cell_x
    /// @arg cell_y
    /// @arg value
    
    var array_id = argument[0];
    var cell_x = argument[1];
    var cell_y = argument[2];
    var value = argument[3];
    
    var wSize = buffer_peek(array_id, 0, buffer_u32);
    var byteSize = buffer_get_alignment(array_id);
    var bufferType = buffer_u8;
    if (byteSize == 2) bufferType = buffer_u16;
    if (byteSize == 4) bufferType = buffer_s32;
    var pos = (cell_y * wSize * byteSize) + (cell_x * byteSize) + 4;
    buffer_poke(array_id, pos, bufferType, value);
    return;
    
    Code:
    /// @func fast_array_get(array_id, cell_x, cell_y)
    /// @arg array_id
    /// @arg cell_x
    /// @arg cell_y
    
    var array_id = argument[0];
    var cell_x = argument[1];
    var cell_y = argument[2];
    
    var wSize = buffer_peek(array_id, 0, buffer_u32);
    var byteSize = buffer_get_alignment(array_id);
    var bufferType = buffer_u8;
    if (byteSize == 2) bufferType = buffer_u16;
    if (byteSize == 4) bufferType = buffer_s32;
    var pos = (cell_y * wSize * byteSize) + (cell_x * byteSize) + 4;
    return buffer_peek(array_id, pos, bufferType);
    
    Is there a faster way to accomplish this? Another method beside buffers?
     
    Last edited: Feb 15, 2018
  2. Simon Gust

    Simon Gust Member

    Joined:
    Nov 15, 2016
    Posts:
    3,201
    How are you creating the array?
     
  3. kevins_office

    kevins_office Member

    Joined:
    Dec 19, 2017
    Posts:
    224
    If you mean standard 2D array not using buffers...

    Code:
    // array_create_2d(size, size)
    var wSize = argument[0] - 1;
    var hSize = argument[1] - 1;
    var newArray = 0;
    for (var wLoop = wSize; wLoop >= 0; wLoop--) newArray[wLoop, hSize] = 0;
    return newArray;
    
     
  4. Simon Gust

    Simon Gust Member

    Joined:
    Nov 15, 2016
    Posts:
    3,201
    looks good to me. It's also plenty fast for me.
    What exactly are you using the buffer or array for?
    Anyway, buffers are slower than arrays, there is nothing faster than an array in game maker.
     
  5. kevins_office

    kevins_office Member

    Joined:
    Dec 19, 2017
    Posts:
    224
    Its for an A* Pathfinding script which needs to create 4 temp arrays and 1 queue.
    The script itself takes only 200µs to run but creating the arrays adds another 30,000µs.
     
  6. kevins_office

    kevins_office Member

    Joined:
    Dec 19, 2017
    Posts:
    224
    So im out of luck since arrays can only be the bloated Real data type?
     
  7. Simon Gust

    Simon Gust Member

    Joined:
    Nov 15, 2016
    Posts:
    3,201
    Yes, every cell takes 4 or 8 bytes I don't remember.
    But why do you need such a big array for pathfinding?
     
  8. kevins_office

    kevins_office Member

    Joined:
    Dec 19, 2017
    Posts:
    224
  9. stainedofmind

    stainedofmind Member

    Joined:
    Jun 20, 2016
    Posts:
    701
    If creating the arrays is taking so long, just create them once and recycle them. Of course, you'll have to track the sizes yourself since you can only grow an array, but that's still better then the alternative.
     
  10. CMAllen

    CMAllen Member

    Joined:
    Mar 2, 2017
    Posts:
    856
    Have you tried switching from peek/poke combo to buffer_seek (from buffer_seek_start) just to see if those two functions have problems? I seem to recall other situations where peek/poke were not performing as quickly as expected.
     
  11. kevins_office

    kevins_office Member

    Joined:
    Dec 19, 2017
    Posts:
    224
    I have already tested that idea. It is slower to loop and clear arrays than it is to just create them.
     
    stainedofmind likes this.
  12. kevins_office

    kevins_office Member

    Joined:
    Dec 19, 2017
    Posts:
    224
    I just tested this, and it is slower to use seek + read/write instead of peek/poke.
    I made a buffer array of 500x500, then looped all elements, set a value then read a value.

    Doing this with seek + read/write the average time was 112,000µs
    Doing this with peek/poke the average time was 103,000µs
    vs doing it with a real 2D array which only takes 7,000µs
     
    Last edited: Feb 13, 2018
  13. kevins_office

    kevins_office Member

    Joined:
    Dec 19, 2017
    Posts:
    224
    Still trying to speed up access time, i rewrote the peek/poke functions to use less CPU steps with the side effect of ugly code...
    Code:
    /// @func fast_array_set(array_id, cell_x, cell_y, value)
    /// @arg array_id
    /// @arg cell_x
    /// @arg cell_y
    /// @arg value
    
    var byteSize = buffer_get_alignment(argument[0]);
    if (byteSize == 2) var bufferType = buffer_u16;
    else if (byteSize == 4) var bufferType = buffer_s32;
    else var bufferType = buffer_u8;
    buffer_poke(argument[0], (argument[2] * buffer_peek(argument[0], 0, buffer_u32) * byteSize) + (argument[1] * byteSize) + 4, bufferType, argument[3]);
    return;
    
    Code:
    /// @func fast_array_get(array_id, cell_x, cell_y)
    /// @arg array_id
    /// @arg cell_x
    /// @arg cell_y
    
    var byteSize = buffer_get_alignment(argument[0]);
    if (byteSize == 2) var bufferType = buffer_u16;
    else if (byteSize == 4) var bufferType = buffer_s32;
    else var bufferType = buffer_u8;
    return buffer_peek(argument[0], (argument[2] * buffer_peek(argument[0], 0, buffer_u32) * byteSize) + (argument[1] * byteSize) + 4, bufferType);
    
    When doing the 500x500 array poke/peek test it reduced the averages from 103,000µs to 98,000µs. Not a huge improvement.
    Anyone else have any ideas to speed up buffer access time?
     
    Last edited: Feb 13, 2018
  14. flyingsaucerinvasion

    flyingsaucerinvasion Member

    Joined:
    Jun 20, 2016
    Posts:
    2,199
    why muck around with different buffer alignments? If you have different kinds of buffers, why not write different access scripts for them?

    The new scripts you wrote don't seem to bear much resemblance to what you had before, at least not at first glance. Why are there two buffer_peeks going on now?


    Faster than just regular instance variable access?
     
  15. Simon Gust

    Simon Gust Member

    Joined:
    Nov 15, 2016
    Posts:
    3,201
    no, but you can't emulate a 2D grid out of instance variables so easily.
     
  16. flyingsaucerinvasion

    flyingsaucerinvasion Member

    Joined:
    Jun 20, 2016
    Posts:
    2,199
    just clearing up the ambiguity.
     
    Simon Gust likes this.
  17. kevins_office

    kevins_office Member

    Joined:
    Dec 19, 2017
    Posts:
    224
    The speed difference of having one script that can handle different sized data types vs making multiple sets of scripts is less than 1µs.
    Which is less than 1 / 1,000,000th of a second. I do not feel that speed difference is worth giving up the convenience of having a script that can handle multiple data sizes.


    Both of the peek and poke scripts at the top and below are identical.
    I just removed the local vars plugging arguments directly in where they are needed and optimizing the if() statement.
    This saves time in creating, assigning and destroying temp variables.


    What two peeks? There is only one peek happening. What are you referring to?

    [...edit...]

    Oh, that two peeks. If you look at the script above the 2nd peek was for the local var wSize. This is needed to compute the 2D array offset.
    Both versions of the script need to looked this up, so its not adding additional overhead.
    The 2nd version is doing both peeks on the same line instead of before on two separate lines which then requires a temp var to store it in.
     
    Last edited: Feb 13, 2018
  18. flyingsaucerinvasion

    flyingsaucerinvasion Member

    Joined:
    Jun 20, 2016
    Posts:
    2,199
    Nevermind, i guess the first peek is to measure the width of the buffer. Still, if you just knew the width of the buffer, it would save you from having to read it every time you access the buffer. I'm also really surprised eliminating the if statements doesn't improve things measurably.
     
  19. kevins_office

    kevins_office Member

    Joined:
    Dec 19, 2017
    Posts:
    224
    The wSize offset being looked up by the peek, it has nothing to do with data type or alignment size. Even if those things were fixed you would still have to look wSize which is the width of the 2D array.
    If you wanted an array that was array_create(12, 15) or array_create(32, 5) the offset amount would be different because one is 15 rows of 12, and the other is 5 rows of 32.

    The only way to eliminate that offset being stored is to require that every array created was always the same size like var everyArray = array_create(50,50), you could never do array_create(something other than 50 x 50).
     
  20. flyingsaucerinvasion

    flyingsaucerinvasion Member

    Joined:
    Jun 20, 2016
    Posts:
    2,199
    what I'm saying is since we've discovered that accessing the buffer with buffer_peek is apparently slow, why put an extra buffer_peek into every buffer access?
     
  21. kevins_office

    kevins_office Member

    Joined:
    Dec 19, 2017
    Posts:
    224
    How else would you do it?
    I am here asking for ideas that are faster than mine.
     
  22. flyingsaucerinvasion

    flyingsaucerinvasion Member

    Joined:
    Jun 20, 2016
    Posts:
    2,199
    I don't know the details of your project, but I would probably just record the width of the buffer somewhere, maybe pass it in as an argument to the access scripts.

    But I guess what I really wonder is if arrays are faster to access, why not use them? Are you creating new arrays/buffers every single step such that you have to worry about how long it takes to create them? 16MB is kind of a lot of space for a single array, but as long as you don't have too many of them, it shouldn't be a problem.
     
  23. kevins_office

    kevins_office Member

    Joined:
    Dec 19, 2017
    Posts:
    224
    Arrays are faster to access but slower to create and destroy.
    Creating five 2D arrays of 500x500 takes 11,000µs
    Creating five buffers of the same size takes 300µs.
    That is why i went down this road of using buffers in the first place, because they definitely are not easier.

    But that is my ultimate question, how can i have a handful of 2D arrays without the expensive cost of array creation and destruction time.
    You know another option other than using buffers?
     
  24. Simon Gust

    Simon Gust Member

    Joined:
    Nov 15, 2016
    Posts:
    3,201
    Why do you need 5 arrays or buffers and not just 1? Is this all for pathfinding?
    Or are they representative of chunks or areas in your project?
     
  25. flyingsaucerinvasion

    flyingsaucerinvasion Member

    Joined:
    Jun 20, 2016
    Posts:
    2,199
    How much can you reuse the arrays without clearing their contents?
     
  26. kevins_office

    kevins_office Member

    Joined:
    Dec 19, 2017
    Posts:
    224

    Before reinventing the wheel, a lot of this has been addressed in another thread i linked above.
    https://forum.yoyogames.com/index.php?threads/data-types.11973/page-2
    My post is at the bottom of page two.
     
  27. Simon Gust

    Simon Gust Member

    Joined:
    Nov 15, 2016
    Posts:
    3,201
    So for your pathfinding only. Just a tip; you can store multiple numbers in an array and still only use 1 array worth of memory and create / destroy time.
    If you bitmask those 5 values you need for pathfinding whatever they are (if they're not more than 12 bits in your case).
    I tested the speed difference recently, it should be faster than a buffer.
     
  28. flyingsaucerinvasion

    flyingsaucerinvasion Member

    Joined:
    Jun 20, 2016
    Posts:
    2,199
    ok, what about chunking? It would slow down accessing the arrays as you first have to compute which chunk you are in, but could dramatically reduce the amount of array clearing that needs to be done as the player moves around. You also have to wonder how much do you need A*? Do you need it all the time? Is it even realistic? Imagine most people wandering through a world they are not very familiar with, most of the time they will be stumbling around rather inefficiently, moving only on a best-guess basis toward their goal. Would that make sense for your pathfinders?
     
  29. kevins_office

    kevins_office Member

    Joined:
    Dec 19, 2017
    Posts:
    224
    I do not think players will tolerate stupid AI. Would you enjoy selecting a handful of units and click them to retreat into your base, but instead they get stuck in a clusters of tree and get killed by the enemy that was chasing them.
     
  30. flyingsaucerinvasion

    flyingsaucerinvasion Member

    Joined:
    Jun 20, 2016
    Posts:
    2,199
    How complicated is the terrain? How easy is it to get stuck? Lots of games used to have stupid ai. The player would compensate for that.
     
  31. kevins_office

    kevins_office Member

    Joined:
    Dec 19, 2017
    Posts:
    224
    All together im storing...

    A* G value as buffer_u16 0 to 65,535
    A* H value as buffer_u16 0 to 65,535
    Closed & Forbidden as buffer_u8 0 to 255 --OR-- Closed as bool AND Forbidden as bool
    Parent x as buffer_u16 0 to 65,535
    Parent y as buffer_u16 0 to 65,535

    The open list is stored in a priority list so it isn't part of this optimizing.
    How could i bit shift all of those variables into a single value and shift it back apart?
     
  32. CMAllen

    CMAllen Member

    Joined:
    Mar 2, 2017
    Posts:
    856
    How exactly are you using buffer_seek()?

    In any case, it is bizarre that arrays are so fast while buffers are so slow.
     
  33. kevins_office

    kevins_office Member

    Joined:
    Dec 19, 2017
    Posts:
    224
    I used it like this...
    Code:
    buffer_seek(array_id, buffer_seek_start, myOffset);
    buffer_read(array_id, buffer_u16);
    
     
  34. CMAllen

    CMAllen Member

    Joined:
    Mar 2, 2017
    Posts:
    856
    That's pretty much what I figured you were doing (and what I was suggesting), so that answers that.
     
  35. Simon Gust

    Simon Gust Member

    Joined:
    Nov 15, 2016
    Posts:
    3,201
    I mean like the real size these values get. I'm pretty sure you never need higher than grid size in number.
    The grid size is 500, so 10 bits should be enough (1024)
    and boolean value can just reserve 1 bit
    and directional values also only need 3 bits.
    This would make just 42 bits which fits into the 64 bit mask you have available.

    EDIT: Yes, so as I thought; bitmasking would still be faster than buffers.
    while buffers recorded 55ms average, bitmasking recorded 10ms.
    Arrays like that scored only 3ms but as you said, creating 5 arrays also takes some time and a lot of memory.
     
    Last edited: Feb 14, 2018
  36. dphsw

    dphsw Member

    Joined:
    Oct 19, 2016
    Posts:
    82
    Just to test the speed of various methods of accessing, I tried adding a million numbers using a few different methods. These results are all from using YYC in Windows.

    If they were stored in an array, and being added up all in order, it took 146ms.
    If they were stored in an array, and being added up out of order (so for val[y*height + x], the y loop is now on the inside), it took 170ms.
    If they were stored in a buffer, added in order using buffer_read, it took 128ms.
    If they were stored in a buffer, added out of order using buffer_peek, it took 266ms.
    If they were stored in a buffer, added in order using buffer_peek, it took 209ms.
    If I put them all in a ds_grid and just added using ds_grid_get_sum, it took 180ms. (That surprised me, since I thought the internal addition code would be more efficient.)

    These tests had been done without declaring my variables using the var keyword, so they were object variables instead of local variables. Declaring the variables with var got the quickest test, storing in a buffer and accessing sequentially with buffer_read, down to 81ms.
    Finally, instead of adding one variable at a time, I tried adding 4 per loop (so that the compiler should theoretically use an SIMD instruction to add them). That got the time down to 61ms. The final code looked like this (minus the part to put the random numbers in the buffer to start with):

    Code:
    t=current_time;
    var i,j;
    for(i=0;i<n3;i++){
        sv0+=buffer_read(a,buffer_f64);
        sv1+=buffer_read(a,buffer_f64);
        sv2+=buffer_read(a,buffer_f64);
        sv3+=buffer_read(a,buffer_f64);
    }
    s=sv0+sv1+sv2+sv3;
    t=current_time-t;
    
    Of course, given that I tried basically the same code in C# and got a speed of 6ms without optimising at all (the worst-case-scenario, adding it all out of order, took 19ms - only 9ms with a stride of 1000 through the array, but 19ms with a stride of 1024, presumably because 1024 doubles take up 8Kb, exactly the size of my CPU's L1 cache line, causing constant cache-misses), and about 1.5ms after adding the SIMD optimisation, GML's best time, 61ms (and that's using the YYC), is still dismally poor. But the points here are:
    • I've seen a lot of disagreement over whether array or buffer is faster - I think that's because buffers are faster if you read/write them in order, otherwise arrays are faster.
    • While using buffers for random access is slower, it isn't 5x slower as you claim in your initial post - that slowdown is coming partly from your accessing them through your own function calls. Function calls in GML are slow as well. I once had a function to check for level collisions that was about a single complicated line of code, and was used repeatedly by the enemies and players, which was using 40% of the CPU time - it sped up the game dramatically to just copy and paste that line of code to everywhere in the game where level collisions needed to be checked for, because the function call, and passing the arguments, was using up more time than the processing that the line of code actually did.
    • In any language, it's always faster to access data in the order it's stored (or at least that's near the last bit of data accessed - for more info, Google 'CPU caches') - it can be worth rethinking your algorithm a bit, and how things are stored, to try to ensure this is the case as often as possible, even if it can't be the case 100% of the time. (I suspect the reason you get limited benefit from this in GML is because the variables are all structs of pointers to pointers, and even though bits of arrays should be contiguous given how your code is written, GML can have stored them in different places, forcing the CPU to constantly hunt all over memory-space for variables and bits of code - my guess is this is what makes it so much slower than other languages.)
    • If you haven't used var to declare the variables you're working with yet, that's probably the most significant speedup you can get in GML.
    • You might get a slight speedup from trying to do four of the same operation at once (think of it as working with vectors - it should be 4x as fast, if compiled properly; the YYC doesn't compile well, but it still seems to get a bit of advantage from this).
    • If you aren't concerned about running your code on all platforms, the most significant speedup - about 40x faster - is to write the algorithm in a DLL. (A shader might work too, depending on the algorithm, though it's more awkward getting data back from it, and the instruction limits vary across computers.)
     
    Simon Gust and CMAllen like this.
  37. CMAllen

    CMAllen Member

    Joined:
    Mar 2, 2017
    Posts:
    856
    ...oi. That is most distressing. What is going on with GMS's data management back-end? I mean, I expect some slow down running user code through an interpreter, but GMS's own native functions I would expect far better results than that. That's roughly a 40x speed difference.
     

Share This Page

  1. This site uses cookies to help personalise content, tailor your experience and to keep you logged in if you register.
    By continuing to use this site, you are consenting to our use of cookies.
    Dismiss Notice