chunk handling / negative arrays?

saffeine

Member
no prefix for this one because it's more of a general question than anything ( for what it might matter, i AM using GMS2.3+ ), but i'm in the process of developing a game and a development level editor / chunk builder, which is less important than the game itself but will help a lot further down the line. in advance, i want to make sure it's pretty well optimised from start to finish, but can't for the life of me figure out how to make a canvas that can be expanded in every direction, and i'm also interested in hearing how other people would approach the chunk handling functions.

my current plan is something along the lines of:
Code:
1.    create a new 2D array that holds a string for each chunk, or populate it with externally loaded strings, we'll call it chunkList.
    - if the level doesn't exist, the array starts with a size of ( 1, 1 ) and resizes as the level is built.
    - if the level does exist, the array either reads the size from the file or just expands as it's loaded.
   
2.    find the camera bounds, calculate which chunks should be loaded, and run a for loop from the top-left to the bottom-right.
3.    ( 2.3+ specific ) for each index of chunkList calculated from the previous step, create a new constructor instance of Chunk() and decode the string into it.
4.    add each loaded chunk to a new array that is looped through to update the contents during the step event, we'll call this loadedChunks.
5.    periodically check each index of loadedChunks, and if it's outside the view, remove it from the list.
6.    when the chunk is unloaded, save all the information to a buffer, encode the buffer, then update its index in chunkList.
7.    when the game is saved ( or on a cycle ), encode chunkList to a file and save it to a file for later reading.
    - i know i could save each chunk to its own file but i'd rather avoid the clutter.
   
8.    you get the idea by now.
it's a very rough idea of what i'm aiming for, i'm sure it has room for optimisation but that isn't the main point here.

is there a better way of keeping the unloaded chunks in memory so the disk isn't being read/written to every time chunks are loaded/saved?
although i'm not making a sandbox game ( minecraft / terraria / etc. ), how would i handle generating chunks at a negative index, seeing as arrays don't allow it?
- i did consider using an offset value but i dismissed it when i realised it'd potentially mean having to shift the whole array across the further back the player moves.
- another thought is using four different arrays where the index is multiplied by different values when loaded in as a chunk. ( negative x/y would multiply the chunk index by -1 for both to find the chunk position, for example ).

i've never really used chunks before and i've tried looking into methods of achieving it but i can't seem to find a decent answer to those questions. maybe i'm just missing something, it's entirely possible.
 
Last edited:

TailBit

Member
You could have a chunk groups, that maybe starts halfway to 1 byte, 128 or something like that, so when you would hit -1 then it goes from [group 128,chunk 0] to [group 127,chunk 32000] ..
 

saffeine

Member
You could have a chunk groups, that maybe starts halfway to 1 byte, 128 or something like that, so when you would hit -1 then it goes from [group 128,chunk 0] to [group 127,chunk 32000] ..
this is a good solution, actually. for what i'm doing this would work really well seeing as i don't really plan to have my levels go very far into the negatives.
the only reason i'm even considering negatives is in case i want to add a chunk or two of extra details to the left / top while i'm building. also to try and understand how infinite sandboxes do it, but less importantly.
 

Mercerenies

Member
It sounds like you want an array-like data structure with fast access and fast append/prepend operations (since we'll always be loading chunks on the fringe of the current chunk data). I'll discuss this in one dimension and then mention briefly at the end how you can get to two dimensions. The good news for you is that GM:S 2.3 should make a lot of this abstraction stuff a lot nicer.

My recommendation, given your requirements, is something like C++'s deque type. I'm not sure how familiar you are with data structures, but I'll explain as best I can. GM:S arrays are a lot like arrays in most other languages. It's basically a contiguous chunk of memory containing whatever you want. This is great if we know the size in advance, but not so great if we're constantly adding things, especially to the beginning, as you've already observed. On the opposite side of the spectrum, we have linked lists. In a linked list, you store the first element of the list and a pointer to the rest of the list. That pointer points to the second element and itself has a pointer to the third. Our data structure ends up scattered throughout memory. This works great if you need frequent append/prepend operations, but it doesn't give you the nice random access that arrays do.

So C++ has a compromise data structure, called a deque (short for Double Ended Queue). A deque is, at a high level, a bunch of size N arrays (you pick the value N, based on what provides the best performance), combined with a map (ds_map will suffice for this) to each small array. So, let's say we wanted to store 18 chunks, and let N=5. Then our data structure would be a ds_map consisting of arrays. It would look something like this

Code:
0 -> [chunk0, chunk1, chunk2, chunk3, chunk4]
1 -> [chunk5, chunk6, chunk7, chunk8, chunk9]
2 -> [chunk10, chunk11, chunk12, chunk13, chunk14]
3 -> [chunk15, chunk16, chunk17, undefined, undefined]
And then we store some basic housekeeping data, such as where the beginning (key 0, index 0) and the ending (key 3, index 2) are. If we want to access, say, chunk8, then we take floor(8 / N) = 1 and look at key 1 in our ds_map. Then we take 8 mod N = 3 and look at index 3 in key 1, which contains chunk8. If we want to append to the end, we increment the ending position. If there's an undefined there, we add the new data there. If not, we make a new key. Same for the beginning. If I wanted to add in chunk_1 (negative one), then I would do this.

Code:
-1 -> [undefined, undefined, undefined, undefined, chunk_1]
 0 -> [chunk0, chunk1, chunk2, chunk3, chunk4]
 1 -> [chunk5, chunk6, chunk7, chunk8, chunk9]
 2 -> [chunk10, chunk11, chunk12, chunk13, chunk14]
 3 -> [chunk15, chunk16, chunk17, undefined, undefined]
And the beginning "pointer" would be key -1, index 4.

If you wanted, you could just have a big ds_map. But if you're going to be loading and unloading chunks frequently, this can get messy, as you're constantly allocating and deallocating memory. The deque approach (with a good N value) should avoid allocating memory in a lot of cases and only have to do so occasionally, while also avoiding the messy part of using straight arrays (having to copy the existing array to resize).

Now, that's one dimension. If you write this data structure in the abstract, getting the second dimension should be pretty easy: just have a deque of deques of chunks, in the same way that you can get higher-dimensional arrays in languages that don't support it directly by having arrays of arrays.

Brief side note about modulo. I lied a little up above when I said "take K mod N" to access the correct position in the deque. That instruction is correct, if we're talking about mathematical modulo. But Game Maker (like many lower-level languages) makes a design decision that I consider questionable, and implements a faster, less correct version of modulo that behaves very oddly on negative numbers. Generally, in my games, I'll end up writing a function called "truemod" or "mathmod" that does

Code:
truemod(a, b) = (a mod b + b) mod b
This corrects for the error if the dividend is negative. Mathematically speaking, the sign of the result should match the sign of the divisor. In C, GML, and many other languages, for efficiency reasons, the sign of the result matches the sign of the dividend, which causes problems if you're taking modulo of negative numbers like we are here.
 

saffeine

Member
It sounds like you want an array-like data structure with fast access and fast append/prepend operations (since we'll always be loading chunks on the fringe of the current chunk data). I'll discuss this in one dimension and then mention briefly at the end how you can get to two dimensions. The good news for you is that GM:S 2.3 should make a lot of this abstraction stuff a lot nicer.

My recommendation, given your requirements, is something like C++'s deque type. I'm not sure how familiar you are with data structures, but I'll explain as best I can. GM:S arrays are a lot like arrays in most other languages. It's basically a contiguous chunk of memory containing whatever you want. This is great if we know the size in advance, but not so great if we're constantly adding things, especially to the beginning, as you've already observed. On the opposite side of the spectrum, we have linked lists. In a linked list, you store the first element of the list and a pointer to the rest of the list. That pointer points to the second element and itself has a pointer to the third. Our data structure ends up scattered throughout memory. This works great if you need frequent append/prepend operations, but it doesn't give you the nice random access that arrays do.

So C++ has a compromise data structure, called a deque (short for Double Ended Queue). A deque is, at a high level, a bunch of size N arrays (you pick the value N, based on what provides the best performance), combined with a map (ds_map will suffice for this) to each small array. So, let's say we wanted to store 18 chunks, and let N=5. Then our data structure would be a ds_map consisting of arrays. It would look something like this

Code:
0 -> [chunk0, chunk1, chunk2, chunk3, chunk4]
1 -> [chunk5, chunk6, chunk7, chunk8, chunk9]
2 -> [chunk10, chunk11, chunk12, chunk13, chunk14]
3 -> [chunk15, chunk16, chunk17, undefined, undefined]
And then we store some basic housekeeping data, such as where the beginning (key 0, index 0) and the ending (key 3, index 2) are. If we want to access, say, chunk8, then we take floor(8 / N) = 1 and look at key 1 in our ds_map. Then we take 8 mod N = 3 and look at index 3 in key 1, which contains chunk8. If we want to append to the end, we increment the ending position. If there's an undefined there, we add the new data there. If not, we make a new key. Same for the beginning. If I wanted to add in chunk_1 (negative one), then I would do this.

Code:
-1 -> [undefined, undefined, undefined, undefined, chunk_1]
0 -> [chunk0, chunk1, chunk2, chunk3, chunk4]
1 -> [chunk5, chunk6, chunk7, chunk8, chunk9]
2 -> [chunk10, chunk11, chunk12, chunk13, chunk14]
3 -> [chunk15, chunk16, chunk17, undefined, undefined]
And the beginning "pointer" would be key -1, index 4.

If you wanted, you could just have a big ds_map. But if you're going to be loading and unloading chunks frequently, this can get messy, as you're constantly allocating and deallocating memory. The deque approach (with a good N value) should avoid allocating memory in a lot of cases and only have to do so occasionally, while also avoiding the messy part of using straight arrays (having to copy the existing array to resize).

Now, that's one dimension. If you write this data structure in the abstract, getting the second dimension should be pretty easy: just have a deque of deques of chunks, in the same way that you can get higher-dimensional arrays in languages that don't support it directly by having arrays of arrays.

Brief side note about modulo. I lied a little up above when I said "take K mod N" to access the correct position in the deque. That instruction is correct, if we're talking about mathematical modulo. But Game Maker (like many lower-level languages) makes a design decision that I consider questionable, and implements a faster, less correct version of modulo that behaves very oddly on negative numbers. Generally, in my games, I'll end up writing a function called "truemod" or "mathmod" that does

Code:
truemod(a, b) = (a mod b + b) mod b
This corrects for the error if the dividend is negative. Mathematically speaking, the sign of the result should match the sign of the divisor. In C, GML, and many other languages, for efficiency reasons, the sign of the result matches the sign of the dividend, which causes problems if you're taking modulo of negative numbers like we are here.
i'm very familiar with data structures. not sure how far my knowledge and experience could carry me but, i think i'm familiar enough.

deques sound interesting though, but i'm not really sure i get the explanation, unless by a stroke of luck i actually understood it.
the way i understood it is, rather than using array indexes, for each "index", you create a new key/value pair which points to the correct true index ( so -1 wouldn't exist as an array index, but as its own array, with the key '-1' pointing to said array ), allowing for a pseudo negative index. please do correct me if i'm wrong though, just trying to piece it all together.

the only issue i can see with this might be that everything would have the be a predetermined size to begin with. i feel like i'm missing something vital here though.
to me it looks like i would have to set a fixed array size for each of the numbers you affixed with ->. of course, i haven't tried it yet since i already mentioned i'm unsure if i got it right, so maybe it actually would work like that. i don't know how much something like that would restrict me down the line but i figure i should bring it up so at least i can get it clarified.

i'll take a look around google to see if i can wrap my head around deques in the meantime, a little more learning never hurts :^)

EDIT: i think i missed the whole point of the reply. as i was looking up what a deque is i realised that the deque itself isn't what i would be using to index chunks, but rather where i would be storing them to retrieve from later. still though, gonna need a moment to try and figure out exactly how to implement something like this and use it for what i plan to use it for.

EDIT 2: OH i think it just clicked. am i right in thinking that when you said that you need to store the start and end indexes, it's for knowing where the negatives would increment from? i feel silly for overlooking that and wrote it off as just explaining a way of having an array or queue that wraps around itself, but now i see that it can be used to, rather than wrapping, increment from the end of the "positive" indexes ( so if the positive indexes end at +6, then -1 is +7, -2 is +8, etc ). jeez, that sank in a bit too slowly for my liking, and even then i could still be falling short of actually getting it.
 
Last edited:

Mercerenies

Member
The arrays within the map are a fixed size, yes. But that places no restrictions on the application data you store in them. The idea is that, on the one hand, we could just use a ds_map. We could have one key for each chunk, where the key value is the chunk's coordinate. But then, if we're loading and unloading a lot of chunks, we end up making and freeing new keys a lot, which can be slow. If you want, you can try that approach (it's semantically simpler than the deque approach I described above), and if it's performant enough for your case, then go for it.

But the idea behind the deque approach is that, rather than store one value at each key, we store several values at each key, so that we only have to allocate a new key (i.e. a new array and slot in our ds_map) every once in awhile, rather than every time we need to load a chunk. Then, rather than refer to a chunk by its index alone, we refer to it by a pair (key, index) which tells us where to find it. Of course, that detail can (and probably should) be hidden behind an abstraction layer as an implementation detail, but internally we'll refer to things using two indices.
 

Yal

šŸ§ *penguin noises*
GMC Elder
In my 3D framework, I'm using ds_map data structures to store blocks of collision data, using a key generated as string(x) + ":" + string(y) + ":" + string(z)... it's easy to get the key both when looking up and adding new data to the structure, it handles negative coordinates just fine, it's fast to look up data in, and it doesn't use any memory for cells that currently has no data... I wouldn't be surprised if it could be useful for your use-case as well.

(To be precise, you'd store composite data, e.g. an array of chunk data, in the ds_map, and then get the entire data blob at once whenever you need to save/load a new chunk - GM's ds_maps can only handle a single value per key)
 

saffeine

Member
In my 3D framework, I'm using ds_map data structures to store blocks of collision data, using a key generated as string(x) + ":" + string(y) + ":" + string(z)... it's easy to get the key both when looking up and adding new data to the structure, it handles negative coordinates just fine, it's fast to look up data in, and it doesn't use any memory for cells that currently has no data... I wouldn't be surprised if it could be useful for your use-case as well.

(To be precise, you'd store composite data, e.g. an array of chunk data, in the ds_map, and then get the entire data blob at once whenever you need to save/load a new chunk - GM's ds_maps can only handle a single value per key)
i did wonder whether ds_maps would be a decent method of doing it, but i was worried that the lookup would take a lot longer the more chunks i added.
in my defense though, i don't know much about how ds_maps work, so in my head it was a case of it looping through every entry to see if it matched.
that could still be the case, but having you mention that you use it and it works fine for you is reassuring. thanks for that c:
 

Nidoking

Member
The entire point of maps, the purpose of their existence at all, is that they take the least time to look up information of any organized data structure. The key tells the map exactly where the value is stored so that there is no searching required. To get any faster, you'd need to use an array.
 
Top