GML Data structures vs 2d arrays for storing room layer info?

Dario8676 · Sep 14, 2020

Hi everyone, so i have 8 different background layers in my room. Clouds. Mountains, trees. Etc all comprised of sprites.... For each of those layers I want to store the following info.
1. Layer id
2. Sprite width
3. Horizontal speed
4. Offset amount // For use with parallex scrolling.

Currently I'm using a 2d array to store these values.
Backgrounds[7,3]

I am storing these values all into a 2d array when the room is created and then passing it to a 2d array in my Camera object for use in the Cameras step event.

I know so many people use data structures in gamemaker instead of arrays so I feel like I could be doing this much more efficiently, but I've never worked with ds lists, grids, maps, etc...

What do you guys suggest?

I should mention that I'm using the latest version of gamemaker and I know they also recently introduced structs.

Curious on how you guys would handle it.

RefresherTowel · Sep 14, 2020

Well, arrays are a data structure, they just lack the functionality of some of the other data structures (such as shuffling, deleting single entries easily (not impossible, but no built-in method for it), etc), they can't be saved into JSON format properly and also they are automatically garbage collected, whereas other data structures need to be manually deleted. They also happen to be the fastest DS' to use (unless they are really big, a 100 000 entry array is likely to be slower to access than a 100 000 entry ds_map). Structs are also slow in the big scheme of things (or so I have heard). So with all that said, I see no reason why you would need to convert to a DS or a struct in your use case. However, for readabilities sake, I would set up some enums and use them in place of magic numbers:

Code:

enum bg_data {
   LAY_ID,
   SPR_WIDTH,
   HOR_SPEED,
   OFFSET
}

Then whenever you read or write to array you use the enum values instead of the number. For instance, if the row (or the second number) in the array is referring to the values referenced above, then you would do thisBackgrounds[7,bg_data.HOR_SPEED];. bg_data.HOR_SPEED gets changed into the number 3 2 (as it is the third entry in the enum) by the computer, so it's identical to Backgrounds[7,2], it's just easier to read and remember what goes where.

Also, remember in 2.3 now you would access the array like so Backgrounds[7][bg_data.HOR_SPEED], instead of using the comma variant that is used in <2.3.

Bulletech · Sep 14, 2020

* bg_data.HOR_SPEED gets changed into the number 2
enum default value assignment starts at 0.

Otherwise agree with everything RefresherTowel said.
Also see halfway down this page for YYG's guidance on arrays vs DS's https://www.yoyogames.com/blog/63/best-practices-when-coding-in-gamemaker-studio-2

RefresherTowel · Sep 14, 2020

BulleTech Studios said:
* bg_data.HOR_SPEED gets changed into the number 2
enum default value assignment starts at 0.

Good point, silly oversight on my part.

Nidoking · Sep 14, 2020

RefresherTowel said:
a 100 000 entry array is likely to be slower to access than a 100 000 entry ds_map

I don't believe this is true. While a map is constant time access, a array is also just a direct access to a memory location, with the only difference being that it's a multiplication of array index times entry size rather than a hash function to get the offset. If anything, the array should be strictly faster, since the hash function should take longer than the multiplication. I don't know enough about the internals of Game Maker to speculate on the specifics of how a constant entry size works for multiple types (and wouldn't even if I did), but the same consideration for element access redirection must surely exist for both structures.

Dario8676 · Sep 14, 2020

RefresherTowel said:
Well, arrays are a data structure, they just lack the functionality of some of the other data structures (such as shuffling, deleting single entries easily (not impossible, but no built-in method for it), etc), they can't be saved into JSON format properly and also they are automatically garbage collected, whereas other data structures need to be manually deleted. They also happen to be the fastest DS' to use (unless they are really big, a 100 000 entry array is likely to be slower to access than a 100 000 entry ds_map). Structs are also slow in the big scheme of things (or so I have heard). So with all that said, I see no reason why you would need to convert to a DS or a struct in your use case. However, for readabilities sake, I would set up some enums and use them in place of magic numbers:

Code:

enum bg_data { LAY_ID, SPR_WIDTH, HOR_SPEED, OFFSET }

Then whenever you read or write to array you use the enum values instead of the number. For instance, if the row (or the second number) in the array is referring to the values referenced above, then you would do thisBackgrounds[7,bg_data.HOR_SPEED];. bg_data.HOR_SPEED gets changed into the number 3 2 (as it is the third entry in the enum) by the computer, so it's identical to Backgrounds[7,2], it's just easier to read and remember what goes where.

Also, remember in 2.3 now you would access the array like so Backgrounds[7][bg_data.HOR_SPEED], instead of using the comma variant that is used in <2.3.

This makes a ton if sense. Thank you

TheouAegis · Sep 14, 2020

RefresherTowel said:
unless they are really big, a 100 000 entry array is likely to be slower to access than a 100 000 entry ds_map

Not sure about reading an array, but any noticeable speed disparity between an array with 100 indexes and one with 100k indexes is due to the 100k array is actually FOUR arrays, not one, and in a lot of cases you are just not handling it properly. I'm not saying you are wrong - I'd need to speed test for myself (and I'm limited to GMS1 now) - but just pointing out when writing to such an array, people typically do it wrong. An array can only technically go up to index 31999. Beyond that, an additional array is created and chained to the other one. Arrays are also created in increments of powers of 2. Couple that with array chaining, you are looking at major slowdown when improperly writing arrays. I don't see how reading an array would be affected, but, again, I'd need to run speed tests and I could see where some bottlenecking could potentially exist.

Just a reminder for experienced users and a lesson for new users:

Arrays memory use is by powers of 2. An array with 10 indexes uses the same memory as an array with 16 indexes.
If an array currently uses less memory than the equivalent of an index being written to, the array memory will be expanded to the next power of 2. If an array has 16 indexes and you write to index 16, memory will be set aside for 32 indexes.
An array can only take up the equivalent memory of 32000 variables. Beyond that, another independent array is created and linked to the previous array, subject to its own rules.
Memory allocation does not equal initialization. If index initialized, even though the array is allocated the memory for 8 indexes, index 7 is not initialized and thus does not exist.
Initializing any index will initialize all uninitialized lower indexes inside that array to 0. If an array with up to index 3 already set has index 7 initialized, indexes 4 thru 6 will be set to 0.
ALWAYS initialize an array using its own highest index. You do not need to fill the array in reverse order, just initialize the highest index (e.g., array[31999]=0).
Initialize as many indexes as you may need, not the as many as you immediately need. If you only need 20 indexes at the start of the game, but the player may need access to 256 indexes by the end of the game, initialize all 256 indexes at the start.
Memory expansion for an array is slow. This is the biggest killer of large arrays.

32000 may seem like a lot, but it's not. We can put it into perspective while also analyzing the rules for array write optimization. Let's say you have a grid if data that is a mere 256 cells by 256 cells. Whatever it is intended for, it's just 256x256 - a nice pair of 8bit numbers. You could use a 2D array, but you are cool and hate nested for-loops, so you want to use a 1D array. Do the math and a 256x256 grid has 65536 cells you need to initialize.

If you filled the array with a loop starting at 0 like most beginners, memory would be allocated for 1 variable and set, then increased for 2 variables and set, then increased for 4 variables, then set over two iterations. Then memory would be allocated for 8 variables, then set over 4 iterations. And so forth. Every 32000 indexes, a new array gets created. You test the code and the slowdown is excruciating.

But you're not a beginner and know just setting the highest index will set the rest to 0. So you initialize index 65535. Later, you test your code and when it tries to read index 1, the program crashes and spits out an unknown variable error. Or was it out of range? Whichever it is, it crashed. You debug and find out only indexes 0, 32000, and 64000 & higher were initialized.

You play things safe use a loop to set all 65536 variables in reverse. You test the code and-- There's some noticeable lag once again, but not as bad as before. Why? First memory is allotted for index 66048 -- yes, 66048, not 65535 -- as well as new arrays at 0 and 32000. After the loop writes to 64000, memory is expanded for index 63999. After the loop writes to 32000, more memory is allotted for index 31999, then the loop finishes up.

Pop Quiz: How would you initialize all 65536 indexes with no noticeable slowdown?

The solution is to initialize indexes 31999, 63999, and 65535 separately in that order. After the 3 arrays have been initialized, then you can fill all the indexes in a loop if you want.

chamaeleon · Sep 14, 2020

TheouAegis said:
The solution is to initialize indexes 31999, 63999, and 65535 separately in that order. After the 3 arrays have been initialized, then you can fill all the indexes in a loop if you want.

Or you could just a = array_create(100000); before your loop and iterate in whichever direction you want.

TheouAegis · Sep 14, 2020

Have you actually speed tested that? Don't assume just because it's an included function that it's actually faster than manual coding.

chamaeleon · Sep 14, 2020

TheouAegis said:
Have you actually speed tested that? Don't assume just because it's an included function that it's actually faster than manual coding.

Yes.

GML:

for (var n = 10000; n <= 100000; n += 10000) {
    var t1 = get_timer();
    for (var i = 0; i < n; i++) {
        a[i] = 0;
    }
    var t2 = get_timer();
    var uptime = (t2-t1)/1000000;
    
    var t1 = get_timer();
    for (var i = n-1; i >= 0; i--) {
        b[i] = 0;
    }
    var t2 = get_timer();
    var downtime = (t2-t1)/1000000;

    show_debug_message(string(n) + " " + string(uptime) + " " + string(downtime));
}

Code:

10000 0.07 0.00
20000 0.23 0.01
30000 0.40 0.01
40000 1.67 0.02
50000 2.43 0.02
60000 3.05 0.02
70000 3.55 0.03
80000 3.97 0.03
90000 4.49 0.03
100000 5.28 0.04

GML:

for (var n = 10000; n <= 100000; n += 10000) {
    a= array_create(100000);
    var t1 = get_timer();
    for (var i = 0; i < n; i++) {
        a[i] = 0;
    }
    var t2 = get_timer();
    var uptime = (t2-t1)/1000000;
    
    var t1 = get_timer();
    for (var i = n-1; i >= 0; i--) {
        b[i] = 0;
    }
    var t2 = get_timer();
    var downtime = (t2-t1)/1000000;

    show_debug_message(string(n) + " " + string(uptime) + " " + string(downtime));
}

Code:

10000 0.00 0.00
20000 0.01 0.01
30000 0.01 0.02
40000 0.02 0.02
50000 0.02 0.02
60000 0.02 0.03
70000 0.04 0.03
80000 0.03 0.03
90000 0.03 0.03
100000 0.03 0.03

TheouAegis · Sep 14, 2020

No, I didn't ask you to check if array_create() is faster than looping through all indices and setting them, I asked you to check if setting every 32000 indexes before setting the array is faster. And looking at your numbers, I would venture my method is faster. Your numbers show that array_create() is identical to n-1 looping. In fact, your test shows that is exactly what GM is doing inside that function. if you actually read my post, I explained why that is not the correct way to initialize an array. Now yes, if you want to set the array to the same non-zero value, you should take the simpler route and use array_create(). But if your only goal is to just initialize the array with 0, then array_create() is worse than my method.

chamaeleon · Sep 14, 2020

TheouAegis said:
No, I didn't ask you to check if array_create() is faster than looping through all indices and setting them, I asked you to check if setting every 32000 indexes before setting the array is faster. And looking at your numbers, I would venture my method is faster. Your numbers show that array_create() is identical to n-1 looping. In fact, your test shows that is exactly what GM is doing inside that function. if you actually read my post, I explained why that is not the correct way to initialize an array. Now yes, if you want to set the array to the same non-zero value, you should take the simpler route and use array_create(). But if you're only goal is to just initialize the array with 0, then array_create() is worse than my method.

I jumped to conclusions and envisioned the point being to populate the array. My mistake.

GML:

var t1 = get_timer();
for (var i = 0; i < 2000; i++) {
    var a = [];
    a = array_create(100000);
    //a[31999] = 0;
    //a[63999] = 0;
    //a[65535] = 0;
    
}
var t2 = get_timer();
show_debug_message("Time = " + string((t2-t1)/1000000));

Code:

Time = 0.66

GML:

var t1 = get_timer();
for (var i = 0; i < 2000; i++) {
    var a = [];
    //a = array_create(100000);
    a[31999] = 0;
    a[63999] = 0;
    a[65535] = 0;
    
}
var t2 = get_timer();
show_debug_message("Time = " + string((t2-t1)/1000000));

Code:

Time = 1.33

If I am failing to do what you propose, I'd appreciate a specific pointer. I can't go to 3000 iterations as I'm running out of memory, due to the gc not kicking in (which would quite possibly skew the comparison anyway..)

TheouAegis · Sep 14, 2020

Interesting. Are you in 2.2 or 2.3? I remember testing this up to 2.2. if I had to take a gander at the huge discrepancy, I would say it probably has something to do with you are reusing the same variable. array_create() will create a new array reference, whereas setting a to an empty array is probably reusing the same array reference and as a result it is reallocating memory to the same array, which as I stated is the biggest performance hit.

chamaeleon · Sep 14, 2020

TheouAegis said:
Interesting. Are you in 2.2 or 2.3? I remember testing this up to 2.2. if I had to take a gander at the huge discrepancy, I would say it probably has something to do with you are reusing the same variable. array_create() will create a new array reference, whereas setting a to an empty array is probably reusing the same array reference and as a result it is reallocating memory to the same array, which as I stated is the biggest performance hit.

I am using 2.3. It just seems to me that array_create() should be optimal at allocating memory required (whether it's one contiguous memory area, or some number of areas of bounded size), and that initializing some particular array entries to force allocation would have the same amount of required work, plus the miniscule array index computation and assignment overhead. Unless there's something about array_create() guaranteeing initialization of all elements vs using the indexed method not doing that which throws a wrench in the thought experiment.

GML:

var t1 = get_timer();
for (var i = 0; i < 2000; i++) {
    a = [];
    //a = array_create(65536);
    a[31999] = 0;
    a[63999] = 0;
    a[65535] = 0;
}
var t2 = get_timer();
show_debug_message("Time = " + string((t2-t1)/1000000));
show_debug_message(gc_get_stats());

Code:

Time = 1.40
{ generation_collected : 0, num_generations : 4, num_objects_in_generation : [ 2086,0,0,1 ], objects_touched : 0, objects_collected : 0, traversal_time : 0, collection_time : 1, gc_frame : 0 }
Total memory used = 2104745902(0x7d73dfae) bytes

GML:

var t1 = get_timer();
for (var i = 0; i < 2000; i++) {
    //a = [];
    a = array_create(65536);
    //a[31999] = 0;
    //a[63999] = 0;
    //a[65535] = 0;
}
var t2 = get_timer();
show_debug_message("Time = " + string((t2-t1)/1000000));
show_debug_message(gc_get_stats());

Code:

Time = 0.47
{ generation_collected : 0, num_generations : 4, num_objects_in_generation : [ 2086,0,0,1 ], objects_touched : 0, objects_collected : 0, traversal_time : 0, collection_time : 1, gc_frame : 0 }
Total memory used = 2104745684(0x7d73ded4) bytes

Same number of objects registered in the gc, and close to 2GB of memory allocated, so I'm guessing the same allocations are taking place under the hood, making the two code samples an apples to apples comparison (which doesn't guarantee it is the comparison you have had in the past of course).

GML Data structures vs 2d arrays for storing room layer info?

Dario8676

Guest

RefresherTowel

Member

Bulletech

Member

RefresherTowel

Member

Nidoking

Member

Dario8676

Guest

TheouAegis

Member

chamaeleon

Member

TheouAegis

Member

chamaeleon

Member

TheouAegis

Member

chamaeleon

Member

TheouAegis

Member

chamaeleon

Member