
GameMaker Array Pre-Allocation/Reverse Initialization

dapper

Member
I had always thought that initializing the last element in an array would allocate all the space an array would need. Is this not correct? For example:

Code:
var arr;

arr[99999] = 1;

var i;

for (i = 0; i < 100000; i++) {
    arr[i] = 1;
}
However, when I compare that code block (which uses in-order initialization) to this code block (which uses reverse initialization):
Code:
var arr;

arr[99999] = 1;

var i;

for (i = 99999; i >= 0; i--) {
    arr[i] = 1;
}
The former runs in 1,424,424 microseconds (on my machine); the latter runs in 15,144 microseconds, roughly 100 times faster.
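For what it's worth, the timings come from wrapping each block with get_timer(). A sketch of the harness (my actual benchmark scripts differ slightly):
Code:
var start, arr, i;
start = get_timer();

arr[99999] = 1;
for (i = 99999; i >= 0; i--) {
    arr[i] = 1;
}

// get_timer() returns microseconds since the game started
show_debug_message("Reverse init: " + string(get_timer() - start) + " microseconds");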

Edit: the same behavior results if I use array_create to allocate the arrays; and using array_create by itself to also initialize them is still 10 times faster than manual reverse initialization. And when I manually initialize a pre-allocated array sequentially (0-100k), it runs in about the same time as the first code snippet.

So, on one hand, reverse initialization seems like the faster access pattern, even when the arrays have already been created. On the other, manually initializing allocated arrays is just as slow as manually initializing unallocated arrays.

I really was not expecting this; I was under the impression that the majority of the time used in array initialization was due to array re-sizing. Any insight into these results would be appreciated.
 

Yal

🐧 *penguin noises*
GMC Elder
Interesting findings! I think there's more than memory allocation to worry about here: branch prediction, cache invalidations and the like... the underlying stuff that the machine code does. It's kinda hard to tell without knowing how the implementation was done. My guess is that since allocating the space fills the array with default values, writing new, different values forces an update that could cause writes to main memory instead of the cache. But that's just a guess.
 

TheouAegis

Member
There are a couple issues here.

1) An array of 10000 elements is enormously taxing on the system. That's why there's a disparity of over 100x. The smaller the array, the less the disparity.
2) Pre-initializing the final element of an array does indeed speed things up. Not doing so in your first code would drastically slow the code down. It's irrelevant in the second code, however.
 

Yal

🐧 *penguin noises*
GMC Elder
1) An array of 10000 elements is enormously taxing on the system. That's why there's a disparity of over 100x. The smaller the array, the less the disparity.
Is it really? If the array contains numbers, it's 10k doubles... or 80 kB. Kilobytes.
 

Joe Ellis

Member
This isn't that unusual; they say you should initialize arrays backwards, and there's obviously some reason why it's 100 times faster.
Allocating the max array index first is a good move, at least from what I've heard, and it also makes sense given that the array has to be reallocated once its length changes. But why iterating through it backwards is faster is something I don't know, and I'd like to. @anyone?
 

FrostyCat

Redemption Seeker
One potential explanation happens at the assembly level, where arithmetic comparisons against 0 can be done with a single command while the equivalent for other numeric values in general requires an additional "load literal" command.

I can also confirm the result on my own laptop. While I find it a little hard to believe, the 100x performance difference is real and remarkable.
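One way to test this would be to time the loop overhead alone, without any array writes: if the comparison against 0 is the cause, an empty countdown loop should beat an empty count-up loop. A sketch (not something I've run):
Code:
var start, i;

start = get_timer();
for (i = 0; i < 100000; i++) { }
show_debug_message("Counting up: " + string(get_timer() - start));

start = get_timer();
for (i = 99999; i >= 0; i--) { }
show_debug_message("Counting down: " + string(get_timer() - start));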
 

Joe Ellis

Member
@FrostyCat I thought that about the i >= 0 part: it could easily be compiled into something much simpler than i < 100000, and that could play a very serious part in the comparison.
Cus as far as I know, pre-allocating the space for an array sets it in stone, so I didn't think the direction you write to each index in would matter.
That said, most of the time when you read or write to an array, access is sparse: certain indices at specific moments. Then it just boils down to the speed of reading and writing to arrays; the actual indices you deal with are inherently irrelevant, it's just doing what it needs to do, and in most circumstances arrays are lightning fast.
 

dapper

Member
2) Pre-initializing the final element of an array does indeed speed things up. Not doing so in your first code would drastically slow the code down. It's irrelevant in the second code, however.
That's the thing: I'm finding it has basically no impact at all (tests 1, 2 and 5 below):
  1. Growing array with loop: 1,364,877 microseconds
  2. Initializing last element (arr[last index] = 1) before loop: 1,358,466 microseconds
  3. Same as above, with reverse iteration: 15,836 microseconds
  4. array_create with a given default value: 1,702 microseconds
  5. Using array_create to allocate, but initializing by iterating forward: 1,345,584 microseconds
  6. Using array_create to allocate, but initializing by iterating in reverse: 14,567 microseconds
Upon further testing, array initialization using array_create and passing a default value (i.e., array_create(100, 0)) provides a speedup. However, just using array_create (i.e., array_create(100)) does not, at least for me.

Script: benchmark_array_create_append; 1,414,047 microseconds
Script: benchmark_array_create_init_append; 16,698 microseconds (create with a default)
Script: benchmark_array_create_reverse; 15,292 microseconds
Script: benchmark_array_create_init_reverse; 16,570 microseconds (create with a default)
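In other words, the whole difference seems to come down to these two calls (a sketch; sizes match my earlier tests):
Code:
// Allocates 100,000 slots but leaves their contents unset:
var arr_a = array_create(100000);

// Allocates 100,000 slots AND writes 0 into every one: this is the fast path
var arr_b = array_create(100000, 0);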

My own testing code is here if anybody wants to clone and run for themselves (warning: WIP).

One potential explanation happens at the assembly level, where arithmetic comparisons against 0 can be done with a single command while the equivalent for other numeric values in general requires an additional "load literal" command.
It'd be satisfying if this (or something similarly clean and concise) were the answer. My first question would then be: why doesn't this happen in other programming languages? But, to be fair, I've never really tested or thought about this in other languages.
 

Joe Ellis

Member
I don't think you need to worry about this; however fast either way is, it's fast enough to run without any lag. Of course I always want to know which way is fastest, but I also know that the built-in array_create is always fastest cus it's not an interpreted script: it's a native function, more hard-coded and inflexible, which makes it faster cus it's not dynamic like the rest of the game code.
But anything you do with arrays should be perfectly fast enough, as long as you're not needlessly reading through them every step, which you're not, cus you seem to be pretty on the ball with performance. So yeah, don't worry about it. There will be reasons why certain things are faster, but no one on here will know why; I doubt even YoYo know why. It's something to do with the way C++ works.
 

TsukaYuriko

☄️
Forum Staff
Moderator
One potential explanation happens at the assembly level, where arithmetic comparisons against 0 can be done with a single command while the equivalent for other numeric values in general requires an additional "load literal" command.

I can also confirm the result on my own laptop. While I find it a little hard to believe, the 100x performance difference is real and remarkable.
I tested this theory by changing the second code's loop to compare i against a few values other than 0 (1, 512, 90000...), and it doesn't seem to slow it down.

Upon further testing, array initialization using array_create and passing a default value (i.e., array_create(100, 0)) provides a speedup. However, just using array_create (i.e., array_create(100)) does not, at least for me.
I can observe the difference between array_create with and without a default value on my end as well - with it, the first code is roughly as fast as the second code.

My theory would be that array_create, or any other form of array initialization, when not passed a default value or otherwise explicitly assigning a value to every index, allocates the array but leaves the contents of its indices in some sort of "limbo" state for the time being. This would cause the first access to an array element to take longer than subsequent accesses, as the index first needs to be assigned a value. I tested it with this code:
Code:
show_debug_message("------");

var start    = get_timer(),
    arr      = array_create(100000),
    i;


show_debug_message("Array creation: " + string(get_timer() - start));
start = get_timer();

for (i = 0; i < 50000; i++)
{
    arr[i] = 1;
}

show_debug_message("Setting half of it: " + string(get_timer() - start));
start = get_timer();

for (i = 0; i < 50000; i++)
{
    arr[i] = 1;
}

show_debug_message("Setting half of it (again): " + string(get_timer() - start));
start = get_timer();

for (i = 50000; i < 100000; i++)
{
    arr[i] = 1;
}

show_debug_message("Setting the rest of it: " + string(get_timer() - start));
start = get_timer();

for (i = 50000; i < 100000; i++)
{
    arr[i] = 1;
}

show_debug_message("Setting the rest of it (again): " + string(get_timer() - start));

I ran it in a loop 5 times to ensure my CPU isn't in some sort of power-saving state. Output:
------
Array creation: 545
Setting half of it: 486812
Setting half of it (again): 5912
Setting the rest of it: 980200
Setting the rest of it (again): 5714
------
Array creation: 30
Setting half of it: 574942
Setting half of it (again): 6441
Setting the rest of it: 814719
Setting the rest of it (again): 6209
------
Array creation: 59
Setting half of it: 573550
Setting half of it (again): 5862
Setting the rest of it: 810276
Setting the rest of it (again): 6105
------
Array creation: 40
Setting half of it: 579561
Setting half of it (again): 5860
Setting the rest of it: 804321
Setting the rest of it (again): 5951
------
Array creation: 32
Setting half of it: 576774
Setting half of it (again): 6222
Setting the rest of it: 805297
Setting the rest of it (again): 6272

Then I changed the code so that array_create has a default value of 0:

------
Array creation: 1514
Setting half of it: 5967
Setting half of it (again): 5663
Setting the rest of it: 5929
Setting the rest of it (again): 5716
------
Array creation: 1163
Setting half of it: 5774
Setting half of it (again): 5925
Setting the rest of it: 5868
Setting the rest of it (again): 5600
------
Array creation: 1097
Setting half of it: 6003
Setting half of it (again): 5584
Setting the rest of it: 5910
Setting the rest of it (again): 6150
------
Array creation: 1096
Setting half of it: 5930
Setting half of it (again): 5865
Setting the rest of it: 5679
Setting the rest of it (again): 5835
------
Array creation: 1061
Setting half of it: 5870
Setting half of it (again): 5695
Setting the rest of it: 5838
Setting the rest of it (again): 5863

If anyone would like to reproduce this on their end, that would be very appreciated.

If this is correct, it indicates that there is indeed a slowdown the first time an array index that's in "limbo" is accessed, but I'm unsure exactly why. It might also be related to caching, though; I'd need to test this further.
 

dapper

Member
I don't think you need to worry about this; however fast either way is, it's fast enough to run without any lag. Of course I always want to know which way is fastest, but I also know that the built-in array_create is always fastest cus it's not an interpreted script: it's a native function, more hard-coded and inflexible, which makes it faster cus it's not dynamic like the rest of the game code.
I totally agree: it's exceedingly rarely something to actually worry about when using GM, and a premature optimization for anybody who wants to make games that scale well to multiple platforms. I just think it's interesting; I've never seen this loop behavior in another language.

I also don't want to go around filling my code with messy last-element initialization statements if it's not actually going to make a difference haha.
 

TheouAegis

Member
Well, as I said, in my tests your array was just too big. LOL, I reduced the size of the array to 10% (removed one zero) and the differences in speed were negligible. Furthermore, once the array got down to even 10,000 elements, the gains from pre-initializing were more apparent. The significant performance hit from exceeding 32,000 elements nearly negates any pre-init gains, because by then the measurements are so long. With that said, I concede the possibility that pre-initializing only yields "gains" for arrays under 32,000 elements.

And when I do tests, I loop them 10 to 100 times (depending on the complexity) and output the averages; then I close the program, rerun it, and compare results.

Addendum: I think the old array size limit may be at play here, now that I consider it. We had a limit of 32,000 in the old versions, then they removed the limit, right? You are building an array of size 100,000, which is 3.125x larger than the old limit. So if GM is still operating within its former constraints, a 100,000-element array is actually comprised of 4 array ranges. If I had to take a gander, I'd say the pre-initialization only initializes ONE of the array ranges. This might be easily countered if one were to access elements 0, 32000, 64000, 96000, and 99998 in separate tests and get unfettered access to each one, but such is my hypothesis. This has nothing to do with array_create() specifically, just normal array code in general. In any case, I would predict that a 32,000-element array would yield narrow discrepancies, a 64,000-element array wider ones, a 96,000-element array more still, and the 100,000-element array the greatest of them all. I predict 100,000 and 127,000 elements would yield nearly identical results, while 129,000 elements would be drastically slower.
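A sketch of that separate-access test (hypothetical; the probe indices are the range boundaries from my hypothesis):
Code:
var arr, probes, start, i;
arr[99999] = 1;    // pre-init the last element only

probes[0] = 0;
probes[1] = 32000;
probes[2] = 64000;
probes[3] = 96000;
probes[4] = 99998;

for (i = 0; i < 5; i++) {
    start = get_timer();
    arr[probes[i]] = 1;
    show_debug_message("Index " + string(probes[i]) + ": " + string(get_timer() - start));
}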

I went back and tested the array initializations in GM8, thinking maybe the mechanics changed between legacy and Studio. I got pretty much the same results, though: pre-init plus low-start was the same speed as pre-init plus high-start, but without the pre-init, low-start was slower, and on average pre-init plus low-start was still as fast as high-start with no init. Which again leads me to believe the issue we are seeing is that large arrays are composites of other arrays which themselves do not get initialized.

Update: odd to say, cuz I never posted this yet... but I verified my findings. Only the highest 32,000-element range gets pre-initialized. So in this case, index 96000 and beyond is set to 0, whilst 95999 and lower are uninitialized. Thus, initializing the last element of each range nearly negates the discrepancy: set 31999, 63999, and 95999 along with 99999 and the performance hit is gone.
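So the workaround would look like this (a sketch of what I tested: pre-init the last element of every 32,000-element range before the forward loop):
Code:
var arr, i;
arr[31999] = 1;
arr[63999] = 1;
arr[95999] = 1;
arr[99999] = 1;

for (i = 0; i < 100000; i++) {
    arr[i] = 1;
}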

Case closed. You can mark this thread as <SOLVED> :cool:
 