Implementation of Built-In Data Structures

dapper

Member
Hi all- I'm having a harder time finding information on the time complexity and implementation of the built-in data structures than I thought, and I was hoping somebody could point me to an authoritative reference.

My primary concern at the moment is that if, say, the built-in ds_list implementation is a linked list, and I want to iterate over that linked list, I don't see a trivial (and fast) way of doing that. Absent some interesting implementation, iteration over the entire list should be very slow for large lists (O(n^2)).

I suppose I can get around this by, say, removing each entry from the list as I am done with it, and perhaps proactively recreating the list... but I think that's something of a pain! Is there an easier way of iterating over the entire list in O(n) time?
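(For concreteness, here's the kind of naive indexed loop I mean- if every ds_list_find_value has to walk from the head node, the whole loop is O(n^2):)

Code:
// Naive whole-list iteration by index. Fine for an array-backed list,
// quadratic if each lookup has to traverse from the head.
var _size = ds_list_size(list);
for (var i = 0; i < _size; ++i)
{
    var _val = ds_list_find_value(list, i);
    // ... do something with _val ...
}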

Thanks!

Update

Here's a TL;DR of what we know.
  • Arrays:
    • C++ Platforms:
      • Implemented as a wrapper around an array, with size and out-of-bounds write detection
    • HTML5
      • Likely the same
  • Lists
    • C++ Platforms:
      • Essentially an array list (per Juju, per Mike Dailly)
    • HTML5:
      • Unknown, probably the same?
  • Grids
    • Almost certainly a 2D array.
  • Stacks
    • Probably backed by an array, with an internal index that is updated to the last element.
  • Queues
    • Unknown
  • Maps (Per Mike Dailly: see http://gmc.yoyogames.com/index.php?showtopic=544246)
    • Used to be sorted lists (O(log(n)) lookup; O(n) ordered linear iteration)
    • Now are true maps (O(1) lookup; O(n) unordered linear iteration)
    • In both cases, iterating through an entire map should take about the same time, but now finding the first/next element may be slower, and elements won't come back in sorted order anymore (see the iteration sketch after this list).
    • Hashing algorithm unknown; likely hashes on object ID
  • Priority Queue
    • Unknown
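For reference, whole-map iteration with the built-in cursor functions looks like this (a sketch- map stands in for your ds_map; with the hash-map backend, keys come back in no particular order):

Code:
// Iterating an entire ds_map: O(n) under either implementation, but
// unordered now that the backend is a hash map.
var _key = ds_map_find_first(map);
while (!is_undefined(_key))
{
    var _val = ds_map_find_value(map, _key);
    // ... use _key and _val ...
    _key = ds_map_find_next(map, _key);
}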
Edit: by "authoritative reference", I don't mean a reference on data structures in general: I mean on the time complexity of the data structures that GM provides.

Edit 2: I've edited the title because I think broadening the scope in this topic would be useful. I'm really curious about every implementation, not just lists.

Edit 3: Re-reading this, it's obvious that I had a very strong (albeit erroneous) assumption that the ds_list implementation was a linked list- so much so that I took it for granted. I hope this edit provides that missing context.
 
Last edited:

Homunculus

Guest
Edit: nevermind, I thought you were talking about arrays.

I don’t know how ds_lists are implemented, but if it’s a linked list under the hood, you may be right. I don’t think that’s the case, though- it’s probably O(1) to access an element. If that’s a concern for you, arrays are the way to go.

It would be nice to know how data structures work under the hood, though.
 
Last edited by a moderator:

dapper

Member
I don’t know how ds_lists are implemented, but if it’s a linked list under the hood, you may be right. I don’t think that’s the case, though- it’s probably O(1) to access an element. If that’s a concern for you, arrays are the way to go.
Considering how most people seem to use them, that would make sense- but that would seem to largely defeat the point of using lists over arrays to begin with, other than the built-in find, shuffle, and sort functions. There has to be some compromise, and knowing what it is could be absolutely critical to good performance- we're talking a three-orders-of-magnitude difference in running time when iterating over a list of just 1000 elements!

(Since I can't post links yet: a discussion on this can be seen at the CS Stack Exchange, question 91156.)

It would be nice to know how data structures work under the hood, though.
I really don't think it would be out of place in the "Data Structures" section of GMS2's documentation- thrown under an "Advanced Use" section, if need be.

Edit: I've opened a ticket with YoYo Games as well, asking for the implementation and time complexity of all six built-in data structures. I'll post that information here, if I get it. I've also requested that it be added to the documentation- I really do think that it would be useful.

Edit 2: I made a really dumb point here (excised):

"that would also mean that GM doesn't have a list structure with O(1) insert (as, as far as I am aware, there is no data structure with constant-time access, insertion, and deletion)."

You wouldn't normally expect a list to have O(1) insert anyway. Not sure what I was thinking.
 
Last edited:

dapper

Member
This is pretty old data from pre-GMS2, but it may be helpful as a rough guide.
Thanks for digging this up! In general, I'm wary of benchmarks, as they come with dozens of stipulations (processor, processor load, RAM, OS, the structure of the benchmark itself, what optimizations you apply during compilation- and, very much, how you time it and what you count when you do). That's why I'm really interested in the theoretical time complexity, and how well these algorithms scale.

That said, I'd love to see more comprehensive benchmarks across multiple platforms/machines, if anybody has them- they keep you grounded in reality, and help you understand how much you can get away with on a given technology stack. Based on these benchmarks, it takes nearly an entire step just to initialize that array of 100,000 elements! My test in C is 40 times faster (all things considered, GM is still pretty fast).

https://pastebin.com/sejDPps8

I have to admit, I'm also slightly suspicious of these benchmarks- if the ds_list is indeed a linked list, it seems very odd to me that the access test would only be 3 milliseconds slower- it would (theoretically) require on the order of 10^4 times as many accesses just to traverse the list. GM has an entire runtime of its own- I really wonder how much of the running time in the timing function above is due to that, if any. Their timing function also measures real (wall-clock) time- which includes whatever other code your computer decided to run in the meantime- rather than user + system time (which only counts the time spent running your program).
 

Nocturne

Friendly Tyrant
Forum Staff
Admin
On all platforms except HTML5, initialising an array using array_create() or in reverse order is MUCH faster, as all the memory is allocated at once; if you instead start at 0 and initialise upwards, there is constant memory re-allocation.
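For example (a quick sketch):

Code:
// Fast: allocate everything up front, then write.
arr = array_create(100000, 0);
for (var i = 0; i < 100000; ++i) arr[i] = i;

// Also fast: touching the highest index first allocates the whole array in one go.
for (var i = 99999; i >= 0; --i) arr2[i] = i;

// Slow: growing upwards from 0 reallocates over and over as the array expands.
for (var i = 0; i < 100000; ++i) arr3[i] = i;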

EDIT: I've moved this to the advanced programming forum as I think it is a more appropriate place for such a discussion. All users that have posted so far have been given the appropriate permissions to keep posting.
 

rIKmAN

Member
Thanks for digging this up! In general, I'm wary of benchmarks, as they come with dozens of stipulations (processor, processor load, RAM, OS, the structure of the benchmark itself, what optimizations you apply during compilation- and, very much, how you time it and what you count when you do). That's why I'm really interested in the theoretical time complexity, and how well these algorithms scale.

That said, I'd love to see more comprehensive benchmarks across multiple platforms/machines, if anybody has them- they keep you grounded in reality, and help you understand how much you can get away with on a given technology stack. Based on these benchmarks, it takes nearly an entire step just to initialize that array of 100,000 elements! My test in C is 40 times faster (all things considered, GM is still pretty fast).

https://pastebin.com/sejDPps8

I have to admit, I'm also slightly suspicious of these benchmarks- if the ds_list is indeed a linked list, it seems very odd to me that the access test would only be 3 milliseconds slower- it would (theoretically) require on the order of 10^4 times as many accesses just to traverse the list. GM has an entire runtime of its own- I really wonder how much of the running time in the timing function above is due to that, if any. Their timing function also measures real (wall-clock) time- which includes whatever other code your computer decided to run in the meantime- rather than user + system time (which only counts the time spent running your program).
I remembered seeing the post ages ago and, as I said, thought it might be useful as a rough guide.

The code used to generate those benchmarks is posted in the Twitter thread if you want to run it yourself and post the results for GMS2- things may have improved or changed since those tests were done.

If you do, please post the results for comparison.
 

dapper

Member
I remembered seeing the post ages ago and, as I said, thought it might be useful as a rough guide.

The code used to generate those benchmarks is posted in the Twitter thread if you want to run it yourself and post the results for GMS2- things may have improved or changed since those tests were done.

If you do, please post the results for comparison.
Please don't misunderstand me- I'm very appreciative that you went out of your way to help me. And they do function well as a rough guide- these tests may in fact be showing that ds_list is not a linked list at all, or at the very least that even if it is, the traversal cost is negligible compared to other factors. I'd still prefer to know the implementation details, but absent those- and even with them- benchmarks are helpful.


Edit:
Interestingly enough, the first time I ran the benchmarks (cold run, first compile), creating and initializing an array of 100,000 entries, as well as accessing each element of both structures, took between 30 and 40 milliseconds on my computer.

However, by the third time I ran the benchmark, I got the following results:

|-----------------------------
| Array Create: 21 ms
|-----------------------------
|-----------------------------
| List Create: 25 ms
|-----------------------------
|-----------------------------
| Array Access: 16 ms
|-----------------------------
|-----------------------------
| List Access: 13 ms
|-----------------------------

The power of modern CPU caching is insane- and misleading, it would seem.

Of interest: it's actually faster to access the list.

Edit 2: I'm cutting out a very wrong assertion I made in my last edit, claiming that slow insertions into the list prove that ds_list is not a linked list. The actual insertion is O(1)- but you have to traverse to the right place first, and that's O(n). Duh.
 
Last edited:

kraifpatrik

(edited)
GameMaker Dev.
Code:
#macro SIZE 100000

var _t = get_timer();
var _arr = array_create(SIZE, 0);
for (var i = 0; i < SIZE; ++i)
{
    _arr[i] = i;
}
show_debug_message(get_timer() - _t);

var _t = get_timer();
for (var i = 0; i < SIZE; ++i)
{
    var _val = _arr[i];
}
show_debug_message(get_timer() - _t);

var _t = get_timer();
var _list = ds_list_create();
for (var i = 0; i < SIZE; ++i)
{
    ds_list_add(_list, i);
}
show_debug_message(get_timer() - _t);

var _t = get_timer();
for (var i = 0; i < SIZE; ++i)
{
    var _val = _list[| i];
}
show_debug_message(get_timer() - _t);
Results

Code:
Array create 3861 μs
Array iter   1564 μs
List create  7135 μs
List iter    4728 μs
I guess you're doing something wrong.
 

dapper

Member
Code:
Array create 3861 μs
Array iter   1564 μs
List create  7135 μs
List iter    4728 μs
I guess you're doing something wrong.
Code:
var time, a;
time = current_time;

var i, array;
for(i = 99999; i >= 0; i--){ // 100,000 elements; reverse order allocates the array once
    array[i] = 1;
}

time = current_time - time;
show_debug_message("|-----------------------------");
show_debug_message("| Array Create: " + string(time) + " ms");
show_debug_message("|-----------------------------");

/////////////////////////////

time = current_time;

var list = ds_list_create();

for(i = 0; i < 100000; i++){ // 100,000 elements, to match the array
    ds_list_add(list, 1);
}

time = current_time - time;
show_debug_message("|-----------------------------");
show_debug_message("| List Create: " + string(time) + " ms");
show_debug_message("|-----------------------------");

/////////////////////////////

time = current_time;


for(i = 99999; i >= 0; i--){
    a = array[i];
}

time = current_time - time;
show_debug_message("|-----------------------------");
show_debug_message("| Array Access: " + string(time) + " ms");
show_debug_message("|-----------------------------");

/////////////////////////////

time = current_time;

for(i = 0; i < 100000; i++){ // read back all 100,000 elements
    a = ds_list_find_value(list, i);
}

time = current_time - time;
show_debug_message("|-----------------------------");
show_debug_message("| List Access: " + string(time) + " ms");
show_debug_message("|-----------------------------");

Maybe- what OS/processor did you test on? I'm on macOS Mojave, with a 2.7GHz i7- and I probably had a bunch of other programs running, too. I'm really only glancing, but I'm not seeing a big difference between what I ran and what you ran, unless there are some hidden transpilation optimizations that get applied to your code and not mine.

Lists are arrays internally, not linked lists.
Thanks- that's good to know! I guess I have three questions, if you can answer them:
  • Where were you able to get that information? Did you have to read the transpiled code?
  • Do you know if it's the same on other platforms?
  • Do you have insight into any other data structures?
I realize that the initial scope of this topic was the implementation of ds_lists, but given that it seems like it's the only topic of its kind, I'm going to edit the original post to expand to the other data structures as well.

In the interest of discussion, I have to say, if ds_lists are just arrays... well, it really leaves me confused as to what the point of having ds_lists is. I know that there are a lot of tutorials/questions about that very thing- and most of the time they focus on the fact that lists are sometimes faster and arrays are sometimes faster- the differences appear to be minor for most use cases.

But, at the core, we now have:

Arrays:
  • Dynamically expand to accommodate more entries
  • Syntactic sugar, with accessors (@), and bracket definition
  • Automatic garbage collection
  • O(1) Access, O(n) insertion, O(n) deletion
Lists:
  • Dynamically expand to accommodate more entries
  • I'm unsure if they shrink to accommodate fewer entries- but I doubt it.
  • Limited syntactic sugar (accessor notation)
  • Manual garbage collection
  • Built-in functions (that could in fairness be implemented in 30 minutes for arrays- see the sketch below)
  • Forced to use lists for built-in JSON functions
  • O(1) Access, O(n) insertion, O(n) deletion
In light of this information alone (that is, without knowing anything about the backend differences), it's hard to understand why GameMaker even includes ds_list, except maybe as a learning tool to ease you into the other data structures. It is functionally identical to arrays- its only advantage can be eliminated with a script library that takes no time to write.
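For instance, here's one of those 30-minute helpers- an array "find" script (a sketch; the script and its name are mine):

Code:
/// array_find(arr, value)
/// Returns the index of the first occurrence of value in arr, or -1.
var _arr = argument0;
var _value = argument1;
var _size = array_length_1d(_arr);
for (var i = 0; i < _size; ++i)
{
    if (_arr[i] == _value)
    {
        return i;
    }
}
return -1;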

I can imagine that changing the implementation (not the interface) could cause a big slowdown to a lot of games that assume or have accounted for the array-backed implementation.

I wonder what the effort would be in adding a new data structure across all platforms, e.g. having both a ds_linked_list and ds_array_list.
 
Last edited:

Homunculus

Guest
A few things to note about your test @dapper :
  • You are adding items to arrays and ds_lists in reverse order, while kraifpatrik does the opposite. Both are valid tests, but you can't really compare the results given the different setups. Moreover, while reverse initialization is known to be more efficient, you can't always know the size in advance (which is the point of having dynamically expanding lists)
  • You are using current_time and not get_timer. Probably doesn't make a huge difference, but get_timer is definitely more precise

I agree that right now arrays and ds_lists overlap in functionality for some use cases, especially since GM arrays are not fixed-size. Better array management in GML could probably one day replace ds_lists, but we're not there yet (also for legacy compatibility reasons). The main difference for me, other than the extra functionality like sorting and shuffling, is that ds_lists are expected to shrink when removing elements, while arrays (as of now) don't, since you can't really remove anything.

I'd like to stress, though, that a linked list is just one type of list structure, definitely not the only one- I never expected ds_lists to be linked lists to begin with. If you are familiar with Java, you can probably compare ds_lists to the ArrayList class.
 

dapper

Member
A few things to note about your test @dapper :
  • You are adding items to arrays and ds_lists in reverse order, while kraifpatrik does the opposite. Both are valid tests, but you can't really compare the results given the different setups. Moreover, while reverse initialization is known to be more efficient, you can't always know the size in advance (which is the point of having dynamically expanding lists)
  • You are using current_time and not get_timer. Probably doesn't make a huge difference, but get_timer is definitely more precise
Thank you for pointing these out. I completely agree that the two results should not be compared.
  • Actually though, I only added in reverse order to arrays- I was still operating under my initial presumption that lists were linked lists, in which case, it shouldn't have made a difference in their timing.
A few things to note about your test @dapper :
I agree that right now arrays and ds_lists overlap in functionality for some use cases, especially since GM arrays are not fixed-size. Better array management in GML could probably one day replace ds_lists, but we're not there yet (also for legacy compatibility reasons). The main difference for me, other than the extra functionality like sorting and shuffling, is that ds_lists are expected to shrink when removing elements, while arrays (as of now) don't, since you can't really remove anything.
They do actually shrink? Huh, I was being kind of facetious. I don't think the documentation mentions that at all.

A few things to note about your test @dapper :
I'd like to stress, though, that a linked list is just one type of list structure, definitely not the only one- I never expected ds_lists to be linked lists to begin with. If you are familiar with Java, you can probably compare ds_lists to the ArrayList class.
I apologize if I've been coming across as implying: "linked lists are the only things that count as lists, and GameMaker is obviously doing lists wrong". My primary concern is just that knowing the time complexity of the data structures can be critical to good performance.

Of course, there's absolutely nothing stopping me or anybody else from implementing their own data structures in GameMaker. Nor are they necessarily hard to make- it's just that, absent proper documentation, it's hard to take full advantage of what is built in.

I might try my hand at a few and see what kind of performance I can get. If nothing else, a well-documented community repository of the data structures GM lacks would be of some value, I think.

Edit: I've started up a (sparse at the moment) repository for that purpose here:

https://github.com/dapper-gmc/gml-data-structures

Edit 2: I cut out a bunch of nonsense. For whatever reason, I've been thinking linked lists are O(1) to insert/delete- they aren't, unless you already have a reference to the right location.
 
Last edited:

kraifpatrik

(edited)
GameMaker Dev.
What kind of deleting values from an array are we talking about? Arrays in GM are the same as C++ arrays, except they're also wrapped in a structure that remembers their size and does the reallocation for you when you write outside their range. Array reallocation is O(n). Reading and writing are O(1). As for the lists, we don't know their implementation- I don't know where Juju got that info from. The available C++ source code does not mention any of the ds_ structures.
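You can see the grow-on-write behaviour from GML (a small sketch):

Code:
// Writing one slot past the end silently grows the array- an O(n)
// reallocation behind the scenes.
var _arr = array_create(4, 0);
_arr[4] = 99;
show_debug_message(array_length_1d(_arr)); // 5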
 

Juju

Member
  • Where were you able to get that information? Did you have to read the transpiled code?
  • Do you know if it's the same on other platforms?
  • Do you have insight into any other data structures?
  • I asked Mike Dailly (before he left YYG).
  • Not sure about HTML5, but it's the same across the other platforms. It'll be possible to work out what HTML5 is doing from reading the obfuscated JS.
  • Grids are 2D arrays, Maps are hashtables. Other data structures I don't know about.
 

GMWolf

aka fel666
Yeah, I remember clearly that ds_lists are array lists, and ds_maps are hash maps.

I think array list was mentioned by YYG employees a few times on the forums.
And I remember that maps didn't use to be hash maps; then there was an update, and the update notes said hash maps were now used.


ds_grid is probably a 1D array with fancy accessor functions. That would be the logical structure to use.

A stack is also most probably a growable array.

Queues are tough, though. They could be a ring buffer, or they could be a more complex, semi-contiguous data structure.

Priority queues are probably contiguous, but the access probably isn't. I'm guessing something like a binary heap, or a Fibonacci heap. (Measuring the performance of changing a priority should give us a clue.)
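Something like this would do it (a rough sketch):

Code:
// Time ds_priority_change_priority on a big queue; how it scales with size
// should hint at the underlying structure.
var _pq = ds_priority_create();
for (var i = 0; i < 100000; ++i)
{
    ds_priority_add(_pq, i, i);
}
var _t = get_timer();
for (var i = 0; i < 100000; ++i)
{
    ds_priority_change_priority(_pq, i, irandom(100000));
}
show_debug_message(get_timer() - _t);
ds_priority_destroy(_pq);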


Really, contiguous (and semi-contiguous) data structures are king.
What's the point of having O(1) insertion if seeking to the place to insert takes ages?
Remember that big O is not the whole story.
Accessing something that is contiguous will be almost instant- the prefetcher will already have brought it into L1 cache, and you waste very few cycles.
Accessing something at random, like on every step of a linked-list traversal, will take forever. The memory has to be fetched all the way from RAM, which takes many, many cycles- and if your CPU has nothing else to do, it just has to wait.

Unless you are dealing with super-large, multi-gigabyte datasets, stick with what your CPU likes to use, not what the math claims is most efficient.
 

kraifpatrik

(edited)
GameMaker Dev.
Talking about cache, right-hand values in GM are stored in structures, so I don't think that even arrays are cache friendly.
 

dapper

Member
What kind of deleting values from an array are we talking about? Arrays in GM are the same as C++ arrays, except they're also wrapped in a structure that remembers their size and does the reallocation for you when you write outside their range. Array reallocation is O(n). Reading and writing are O(1). As for the lists, we don't know their implementation- I don't know where Juju got that info from. The available C++ source code does not mention any of the ds_ structures.
I was talking about ds_list: Catan mentioned that they shrink when you remove values- I was wondering under what circumstances that might happen.

Edit: But GMWolf just snapped me back to reality and reminded me that linked lists are O(n) insertion and deletion. My real point is that an array-list deletion sometimes involves two O(n) passes (the element shift, plus an occasional shrinking reallocation) and sometimes just one- so it occasionally takes twice as long- whereas a linked-list deletion is always a single O(n) traversal. So the array list will likely be faster most of the time, but will sometimes spike to twice the time; the linked list is slower in the general case, but without those 2x spikes.

Yeah, I remember clearly that ds_lists are array lists, and ds_maps are hash maps.

Really, contiguous (and semi-contiguous) data structures are king.
What's the point of having O(1) insertion if seeking to the place to insert takes ages?

Remember that big O is not the whole story.
Accessing something that is contiguous will be almost instant- the prefetcher will already have brought it into L1 cache, and you waste very few cycles.
Accessing something at random, like on every step of a linked-list traversal, will take forever. The memory has to be fetched all the way from RAM, which takes many, many cycles- and if your CPU has nothing else to do, it just has to wait.

Unless you are dealing with super-large, multi-gigabyte datasets, stick with what your CPU likes to use, not what the math claims is most efficient.
That's a really good point- you snapped me out of my braindead assertion in this entire thread that linked list insertion is O(1). I've gone back and cleaned up that nonsense. It seems that either way, insertion into an arbitrary place is going to be slower than prepending or appending.

And of course, data size always matters. Sometimes complexity matters, and sometimes the constant factors are too high to be worth it. I started this thread concerned that ds_list was a linked list, and that "naive" iteration would be super redundant- in this case, I think that the complexity probably matters more than the cache- but the lack of caching would make it extra slow.

Pitting a series of cache-jumping O(1) or log(n) operations against a cache-friendly O(n) operation can be a toss-up. The complexity matters more when you have an operation that could normally be O(n), but done a certain way, is O(n^2). It blows up a lot faster: suddenly 1000 operations become 1 million.

I'll admit, I'm hard-pressed to think of another situation like the one that spurred this thread. Probably something with graphs would fit the bill.

Talking about cache, right-hand values in GM are stored in structures, so I don't think that even arrays are cache friendly.
You think maybe a buffer would be faster? Just for kicks, I'm considering writing my own memory manager using a buffer, and seeing about using that as dynamic memory to implement a few data structures.

And thank you, Juju and GMWolf. I'm going to update my original post to summarize what we do know.

Edit: I know it's not GML, but this Stack Overflow post does a good job discussing impacts on garbage collectors as well: https://stackoverflow.com/a/10657553

Edit 2: As another example of when array efficiency is not enough to overcome time complexity- or at least of why it matters that you know the complexity- in the repository I posted above, I tested prepending (inserting at position 0) with both a linked list (that uses arrays as nodes) and a ds_list. Unsurprisingly, prepending 100k elements takes about 20 times longer using the built-in ds_list. This is not terribly surprising (the head node is probably in the cache, and on average the ds_list is doing 50k extra element moves per insert)- in fact, it's probably surprising that it's so close, which demonstrates how much arrays have in their favor. But this is still just 0.8 megabytes (100k 8-byte reals) being stored- if you don't know that ds_lists are array lists and not linked lists, this can trip you up.
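For reference, the ds_list half of that test boils down to this (a sketch- the full version is in the repository):

Code:
// Prepending 100k reals to a ds_list. Every insert at position 0 shifts all
// existing entries, so the loop is O(n^2) overall.
var _t = get_timer();
var _list = ds_list_create();
repeat (100000)
{
    ds_list_insert(_list, 0, 0);
}
show_debug_message(get_timer() - _t);
ds_list_destroy(_list);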
 
Last edited:

GMWolf

aka fel666
You think maybe a buffer would be faster?
I think buffers are nice for performance when you are dealing with large amounts of data that all needs to fit in cache;
otherwise it's more of an IO/memory thing.
But yeah, give it a go- let us know what your findings are!
 

dapper

Member
I think buffers are nice for performance when you are dealing with large amounts of data that all needs to fit in cache;
otherwise it's more of an IO/memory thing.
My thinking is that I'll be able to eliminate some of the storage overhead and maybe even mitigate fragmentation by working at the byte level and creating bespoke management policies for particular applications.
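The raw storage side is straightforward, at least (a sketch- 8 bytes per real, no per-element wrappers):

Code:
// Storing 100k reals at the byte level in a fixed-size buffer.
var _buf = buffer_create(100000 * 8, buffer_fixed, 1);
for (var i = 0; i < 100000; ++i)
{
    buffer_poke(_buf, i * 8, buffer_f64, i);
}
var _val = buffer_peek(_buf, 42 * 8, buffer_f64); // random access by byte offset
buffer_delete(_buf);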

In other news, YoYo got back to me: long story short, as per the roadmap, lots of exciting changes are coming to GM. So rather than document this information now and have it potentially be confusingly outdated in just a few months, they'll hold off on that effort.
 

vdweller

Member
Array access is indeed much faster than list access- I replaced lists with arrays as nodes in a pathfinding algorithm and got a large speed boost. However, as mentioned here, I am quite suspicious that even arrays aren't fast enough in GM. I hope I'm wrong.
 

kraifpatrik

(edited)
GameMaker Dev.
Even though GML transpiles to C++ with YYC, the arrays are indeed not as fast as if you wrote them by hand in C++, without any wrappers for types, etc.

I've compared the speed of the generated code for this GML

Code:
// I've used repeat here because that is the fastest loop in YYC
var i = 0;
repeat (100000)
{
    array[i] = i;
    ++i;
}
with this C++

Code:
// Here the array is `int array[100000];`
for (int i = 0; i < 100000; ++i)
{
    array[i] = i;
}
using the methods described in this thread, and the results were

Code:
YYC 3871 μs
C++   28 μs
If you are doing something that needs to be as fast as possible and you can write a dynamic library for it, I would recommend doing so. Just use a buffer instead of an array, pass its address and size to the library, and let it do the job. Or you could try modifying the generated code, which is a bit more tedious right now.
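The calling pattern would look roughly like this (the DLL and function names here are made up):

Code:
// Hand the buffer's raw address and element count to native code. Pointer
// arguments are declared as ty_string in external_define.
var _buf = buffer_create(100000 * 8, buffer_fixed, 1);
var _fn = external_define("mylib.dll", "process_doubles", dll_cdecl,
                          ty_real, 2, ty_string, ty_real);
external_call(_fn, buffer_get_address(_buf), 100000);
buffer_delete(_buf);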
 

Nocturne

Friendly Tyrant
Forum Staff
Admin
Does your code above initialise the array first? Or is it simply adding each iteration? So like this:

Code:
array = array_create(100000, 0);
var i = 0;
repeat (100000)
{
   array[i] = i;
   ++i;
}
Otherwise it WILL be very slow as the memory will be getting reallocated every time you increase the size of the array.
 

Pfap

Member
Even though GML transpiles to C++ with YYC, the arrays are indeed not as fast as if you wrote them by hand in C++, without any wrappers for types, etc.

I've compared the speed of the generated code for this GML

Code:
// I've used repeat here because that is the fastest loop in YYC
var i = 0;
repeat (100000)
{
    array[i] = i;
    ++i;
}
with this C++

Code:
// Here the array is `int array[100000];`
for (int i = 0; i < 100000; ++i)
{
    array[i] = i;
}
using the methods described in this thread, and the results were

Code:
YYC 3871 μs
C++   28 μs
If you are doing something that needs to be as fast as possible and you can write a dynamic library for it, I would recommend doing so. Just use a buffer instead of an array, pass its address and size to the library, and let it do the job. Or you could try modifying the generated code, which is a bit more tedious right now.
What would you suggest to those users who have an idea what you are talking about, but know how much work it would take?
 

kraifpatrik

(edited)
GameMaker Dev.
What would you suggest to those users who have an idea what you are talking about, but know how much work it would take?
That's tough. Invest the time into it if it's worth it? Offload the job to someone else? Keep it GML-only and spread the algorithm over multiple frames? Come up with a compromise? Give up on it? Switch engines? I don't know :D
 

Pfap

Member
That's tough. Invest the time into it if it's worth it? Offload the job to someone else? Keep it GML-only and spread the algorithm over multiple frames? Come up with a compromise? Give up on it? Switch engines? I don't know :D
Thanks- I already knew, I was just asking to be facetious... but really, thanks, and keep up the good work :)
 

GMWolf

aka fel666
Even though GML transpiles to C++ with YYC, the arrays are indeed not as fast as if you wrote them by hand in C++, without any wrappers for types, etc.

I've compared the speed of the generated code for this GML

Code:
// I've used repeat here because that is the fastest loop in YYC
var i = 0;
repeat (100000)
{
    array[i] = i;
    ++i;
}
with this C++

Code:
// Here the array is `int array[100000];`
for (int i = 0; i < 100000; ++i)
{
    array[i] = i;
}
using the methods described in this thread, and the results were

Code:
YYC 3871 μs
C++   28 μs
If you are doing something that needs to be as fast as possible and you can write a dynamic library for it, I would recommend doing so. Just use a buffer instead of an array, pass its address and size to the library, and let it do the job. Or you could try modifying the generated code, which is a bit more tedious right now.
Although I'm not surprised at the result, I'm still curious to know; did you make sure that array and loop didn't just get optimized away?
 

kraifpatrik

(edited)
GameMaker Dev.
Although I'm not surprised at the result, I'm still curious to know; did you make sure that array and loop didn't just get optimized away?
Not at first, but since you mentioned it, I've just tried compiling that code to assembly with clang (the executable that comes with GM) at -O3, and it does not get thrown away- so I assume the same was true for the test.
 

dapper

Member
Does your code above initialise the array first? Or is it simply adding each iteration?

Otherwise it WILL be very slow as the memory will be getting reallocated every time you increase the size of the array.
In my testing yesterday, I was finding that initialization of the array using array_create was actually slower, or at best about the same.
  1. Growing array with loop: 1,339,341 microseconds
  2. Initializing last element (arr[last index] = 1) before loop: 1,704,840 microseconds
  3. Same as above, with reverse iteration: 41,010 microseconds
  4. Array create, by itself; 4,374 microseconds
  5. Using array create to allocate but initializing by iterating forward; 3,665,261 microseconds
  6. Using array create to allocate but initializing by iterating in reverse; 39,257 microseconds
Edit: I took those numbers from a relatively fresh run- after doing several other runs and heating up the cache, I get the following:
  1. Script: benchmark_array_grow_append; 1,364,877 microseconds
  2. Script: benchmark_array_alloc_append; 1,358,466 microseconds
  3. Script: benchmark_array_alloc_reverse; 15,836 microseconds
  4. Script: benchmark_array_create; 1702 microseconds
  5. Script: benchmark_array_create_append; 1,345,584 microseconds
  6. Script: benchmark_array_create_reverse; 14,567 microseconds
I haven't seen allocation (using array_create) meaningfully reduce the time it takes to iterate through the entire array and manually initialize every element (initializing via array_create's default value, however, is extremely fast- and reverse iteration makes a huge difference regardless).

Edit 2: Actually, I did not use
Code:
array = array_create(100000, 0);
but rather
Code:
array = array_create(100000);
In my tests, manual iteration after the first statement is very fast. The second, however, is essentially useless, unless you are doing reverse iteration.

Script: benchmark_array_create_append; 1414047 microseconds // second statement
Script: benchmark_array_create_init_append; 16698 microseconds // first statement
Script: benchmark_array_create_reverse; 15292 microseconds // second statement
Script: benchmark_array_create_init_reverse; 16570 microseconds // first statement
 
Last edited:

vdweller

Member
I have a question about ds_priority: I am working on an A* pathfinding algorithm and I am using a ds_priority data structure for the open list. In long paths, about 30% of the path calculation time goes to maintaining the ds_priority list (inserting nodes, finding min nodes). This compiles with YYC.

I was wondering if it would be worth implementing a custom data structure like a pairing heap, which is faster than a normal binary heap (I suspect ds_priority is not a pairing heap). Have any of you fiddled with such a custom data structure? I ask because I am highly suspicious that a custom data structure written as a bunch of GML scripts will perform much worse than an already built-in data structure, since YYC is still way slower than C++- so even if pairing heaps are faster in theory, the implementation could actually be slower. What are your opinions on this?
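(For context, my usage is the standard open-list pattern- start_node here is a stand-in:)

Code:
// Open list keyed by f-cost: add nodes, repeatedly pop the cheapest.
var _open = ds_priority_create();
ds_priority_add(_open, start_node, 0);
while (!ds_priority_empty(_open))
{
    var _node = ds_priority_delete_min(_open); // the hotspot, along with the adds
    // ... expand neighbours, ds_priority_add each with its f-cost ...
}
ds_priority_destroy(_open);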
 

GMWolf

aka fel666
I have a question about ds_priority: I am working on an A* pathfinding algorithm and I am using a ds_priority data structure for the open list. In long paths, about 30% of the path calculation time goes to maintaining the ds_priority list (inserting nodes, finding min nodes). This compiles with YYC.

I was wondering if it would be worth implementing a custom data structure like a pairing heap, which is faster than a normal binary heap (I suspect ds_priority is not a pairing heap). Have any of you fiddled with such a custom data structure? I ask because I am highly suspicious that a custom data structure written as a bunch of GML scripts will perform much worse than an already built-in data structure, since YYC is still way slower than C++- so even if pairing heaps are faster in theory, the implementation could actually be slower. What are your opinions on this?
If you are going to update priorities, then using a priority queue is probably not worth it. It's not a Fibonacci heap or anything that fancy.

Just keeping a ds_list and linearly iterating over it will probably end up faster, especially with smallish graphs or a low number of edges per node, where the open list doesn't immediately explode.
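Something like this (a rough sketch- open_list and costs are hypothetical parallel ds_lists):

Code:
// Linear scan for the cheapest node: O(n) per extraction, but the scan runs
// over contiguous memory.
var _best = 0;
var _best_cost = ds_list_find_value(costs, 0);
var _size = ds_list_size(open_list);
for (var i = 1; i < _size; ++i)
{
    var _cost = ds_list_find_value(costs, i);
    if (_cost < _best_cost)
    {
        _best_cost = _cost;
        _best = i;
    }
}
var _node = ds_list_find_value(open_list, _best);
ds_list_delete(open_list, _best);
ds_list_delete(costs, _best);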
 

vdweller

Member
If you are going to update priorities, then using a priority queue is probably not worth it. It's not a Fibonacci heap or anything that fancy.

Just keeping a ds_list and linearly iterating over it will probably end up faster, especially with smallish graphs or a low number of edges per node, where the open list doesn't immediately explode.
Fibonacci heaps are largely unused, something of a theoretical construct. In reality they may be even more cumbersome than binary heaps, according to research papers.

Iterating is extremely slow since we are talking about lots of nodes (>10,000). In fact, a linear scan is almost universally the single worst way to find a min node in pathfinding, in all programming languages. There is a reason all university courses suggest a binary heap disguised as a priority queue.
 
Last edited:

GMWolf

aka fel666
Fibonacci heaps are largely unused, something of a theoretical construct. In reality they may be even more cumbersome than binary heaps, according to research papers.

Iterating is extremely slow since we are talking about lots of nodes (>10,000). In fact, a linear scan is almost universally the single worst way to find a min node in pathfinding, in all programming languages. There is a reason all university courses suggest a binary heap disguised as a priority queue.
It really depends on the number of nodes you have in that open list.
Big O performance is not everything. Cache is very important too!
But as usual, benchmark, benchmark, benchmark...
 