Depth Ordering in GMS2: Priority List, Nested List-Grid, Binary List

Ariak · Nov 23, 2016

Update:

I would like to point out that I wrote a techblog on z-tilting, a more advanced technique for 2.5d depth sorting. Check it out here: https://www.yoyogames.com/blog/458/z-tilting-shader-based-2-5d-depth-sorting
@MirthCastle came up with an easy solution to replace depth=-y with layers. Long term GM users will feel right at home with his system and I recommend checking it out! https://forum.yoyogames.com/index.php?threads/new-2-5d-fake-3d-depth-sorting-grid-layers.50250/

Hey Folks,

as you all know with the introduction of layer alot has changed. New Concepts pose new challenges. You can read up on layers in this guide @MaddeMichael .

A very major change is how depth ordering works with GMS2. The old depth=-y trick still works, but its far from ideal. The new layers also have depth, meaning your objects could dissappear behind them. Also every depth level creates a new sublayer. This layer is self-managed and will be deleted once there are no objects at this layer. @Mike told us that GMS 1.4 works with priority lists to sort objects and manage depth, and that this was causing immense slowdown the graphics pipeline.

As a consequence the new layers speed everything up in the graphics pipeline. On the flipside, this means that isometric games and RPG / Zelda-likes face a bit of a conundrum: we have to create a depth ordering system ourselves.

I'm going to analyze different approaches. 2 have already been suggested. I want to showcase a third method.

All follow the same principle. Grab all the id's of the instances we want to depth order, once we've got them all sort and draw them in the desired order with a loop. Look at the Dungeon Demo provided with GMS2 to see how this is achieved. I will be focusing on the more technical details here, namely which method to use under which circumstances.

Nested List within Grids
Looking at the demo I was stumped to see that a grid was used which height was that of every single pixel in the room along the y axis for depth ordering. Each entry was then assigned a list. This means a 4096px high room has 1 grid + 4096 list entries. This seemed extremely wasteful in terms of memory. However, we completely forgoe the need to sort our data structures! Simply grab the y-coordinate of our instance, reference that grid entry grid[#0,y] and add our instances id to that list. As all entries will by default have the same y value this is extremely efficient. Later we simply loop our grid, check if a list is empty, and if not we grab all those instances and draw them. Afterwards we clear the list. Neat.

The downside to this approach is that it does not allow for negative values. However it is extremely fast when using many objects at once. If we only loop grid entries that actually fall within the view area we can optimize this even further.

Priority Queues
Suggested by @MaddeMichael in the aforementioned tutorial on layers, we could cut down on tons of memory by simply having a single priority list. Priotity lists work akin to regular list, but they contain a second entry. Think the inverse of a 2D array. 2 points (x,y) to reference one value - vs priorities: 1D, but contains 2 values. A value and a priority. Priority Queues handle sorting for us. The idea is thus simple. Take all our objects, add them to the priority - the value being the ID and the priority being the depth. Loop + read all values from a single sorted data structure. Brilliantly simple, were it not for the fact that the lookup speeds for priorities are ... atrocious.

On the plusside we circumvent an nested grid-list data structure and we can automatically skip over all "empty" lists / y-coordinates with no objects. Also we dont get an "grid index out of bounds error" when instances leave the room and have negative y coordinates!

On the other hand reading is so slow that this system quickly falls apart. Add more than ~75 instances for ordered drawing of your lovely pixel art game and the grid-list approach would have been faster. Whatsmore, reading from priority queues will pop the value: read + delete. This prevents us from creating cool effects like the silhouette - also shown in the YoYo Dungeon Demo for GMS2.

Combining the best of both worlds: Binary Lists.
The idea is exactly the same as a priority queue. However instead we will be using binary operators to store 2 values in one, which we can then add to a single ds_list for sorting. If you're not familiar with binary operators read up on them here.

The value added to the list is as follows:

Code:

binlist[| i]=(id|y<<32);

We take our id and then we OR in our y, which is our depth with the old depth=-y trick. Essentially the first 32bits contain our instance id, whilst the next 32 bits contain our depth. Once all instances are added we can sort them with ds_list_sort by depth, as this will be the greater value. This approach also sorts by instance_id number should multiple instances exists with the same y value. Sorting in ascending order and looping through the list from 0 should results in the desired effect when drawing.

Reading an ID from the list is done as follows:

Code:

var oID=binlist[| i]&$ffffffff;

We AND out the 32 bits making up the depth, what remains is our object ID, except its now sorted! We can now proceed as we always do and loop+draw.

Why the best of both worlds? We can now store + sort everything in one data structure, whilst allowing for negative depth values. Further reading values will not clear them enabling silhouette effects etc.

Performance / Speed Comparison

Inherently Priotities and the Binary List follow exactly the same concept. As we add all instances to a single data structure we will have to sort them at some point. Be that whilst adding with priorities, or only once when our binary list is complete. This does not scale very well with instance count. More instances, more possibilities, more lookups required to sort.

The Nested-Grid approach on the other hand is slowed down initially - as no matter how many instances exist - it will have to loop all its grid entries (at least those within the view) and check whether its sublists contain entries. These however do not need to be sorted, as established earlier. This results in a higher "base load", but also incredible scalability.

I'm going to show three Scenarios for different resolutions. Pixel Art Games are fine with low resolutions, thus they also do not need as many list entries in the height grid. This means less lookups per step, regardless of instance count and thus a lower "base load".

In each scenario we will increase the instance count from 25 to 1000. As expected the results show a clear trend.

Measurements were taken for 15 seconds before reseting and increasing the instance count.

Scenario 1: 1080p with 2*64px margin

Scenario 2: 540p with 2*32px margin

Scenario 3: 270p with 2*16px margin

Code

Edit: I'm merely bitshifting by 20 for the binary list in this code snippet. As i've shown in an answer below whether we shift by 16,20 or 32% has minimal impact on performance. 32 was somehow even faster...

9th Sept 2018: The picture was deleted by the filehost and sadly I can't find the project anymore.

Results
As we can see from the graphs the higher the resolution the better the binary list approach - lower resolution favor the nested grid-list approach.
@ 1080px height binary lists draw even wtih grids at around 375 depth ordered instances.
@ 540px height break even is roughly 250 depth sorted instances
@ 270px height the balance is struck just short of 175 depth ordered instances.

The standard priority list is out of the picture. The binary list simply scales better, and is also more efficient to begin with. However in the lower ranges, which are for more likely, the binary list is also consistenly outperforming the nested list-grid approach by a significant margin! Many object do not need depth ordering to boot, with the new layer system and the remainder will seldomly exceed 250! During a playthrough the instance count can vary wildly - but there are certain limits. If your are certain you need to accomodate an extreme amount of depth odered objects than the grid-list approach is the way to go! If you can be fairly certain this is not the case then definetly go with a binary list approach.

Final remarks

Once we have an instances id ready for drawing we have three options. In order of speed (fastest to slowest):
draw_sprite>>with(o){draw_self()}>>draw_sprite_ext. The latter two are extremely close and both allow for rotation, alpha, blending and scaling. Again this depends on what you are going for. There is no optimal solution.

On why we use an 64bit number for the binary list approach. An instances ID begins with 100000 and can thus not be stored in an 16bit integer. Furthermore adding and deleting instances at runtime does not reset this counter. Meaning it will continually grow. if we used 17 bit values we could store 2^17-100k =~ 31k instances before our method fails and the id will be returned corrupted. The longer we play the more likely this is to happen. Increasing this value will in turn only decrease the possibilites to store our y/depth value in the remaining bits. A balance could be struck at say 19bits for the id, and the remainder for y. I simply chose to add more padding and allow for most eventualities.

I hope you found this insightful and that it will help you with depth ordering in GMS2.
Happy Coding!

Edit: Fixed a major bug. Consequently reran the tests. Also included @rui.rosario idea to store the ID in a seperate list and pass the index of the list entry into the binaryList with the depth for sorting. This also enables 32bit whilst avoiding unfathomably unlikely ID errors, caused by instances id exceeding 32bits. See "Binary ID".

Lahssoo · Nov 23, 2016

I'll probably benchmark with 32 bits values : afaik, gms2 compiles in 32 bit only, so accessing a int64 should be slower than a simple int32. That being said, I have no idea on how gms2 handles memory, so this is purely speculation.

I mostly work on pixel art games, so 20 bits for id (948,576 possible instance allocation) and 12 bits for position (4096 possible height) is more than enough.

Ariak · Nov 24, 2016

@Lahssoo

Results: Shifting by 32 is faster than shifting by 20, no matter how much I turn it, even if i switch the order the code executes. Loops is set to 50000.
The effect is pretty tiny ~2.5% roughly. If even such a statement can be made.

Lahssoo · Nov 24, 2016

Well, thank you then!

rui.rosario · Nov 24, 2016

Since you know instance IDs start at 100000 you could store the ID in just 16 bit by first subtracting 100000 from it before storing the value and re-adding it when reading it back. If you even wanted to make sure the indexing wasn't a problem in the long run you could even ditch the instance ID altogether and store all depth ordered instances in a list and use their index in the list as the ID part in the binary list. That way the limits would only apply within the size of the list and not because your game was running for hours.

Ariak · Nov 24, 2016

Both are good suggestions @rui.rosario . Here's why I would still argue against them.

Some quick math on the problem - just too see how long we can play before we run out of ID's:

Substracting 100000 before, and readding it later is very possible - but why slow the binary list down by ~6% just so that we can add 0,0023% more instances before running into the problem again. I just gave it a quick test, switching rooms does indeed reset the instance counter! So unless you plan on creating a single room for your game, and spend >400h in a consecutive session there ... you're good. Also as shown in my previous post whether we shift by 32 or 20, and possibly 16 for that matter (just tested it) is extremely insignificant (and <<32 wins anyway by a tiny margin

).

Your second-list solution makes the problem ID moot,but will also impact performance rather heavily. Yes it can store ID's in arbitrary order, but how do we store the y value? Another list? Or do we grab the instance id, then get its y coordinates. Either we tripple our list calls, or we add a step where we reference another external variable (y). So far the only external variable we needed to add instances was the of the list. The key problem to me seems to be: how do we sort the ID-list without any depth information? Edit: turns out I misunderstood

The "draw-object" creates and stores the binary list. All objects wishing to to draw call the following in their draw_event. Pre Draw, or step whereever you like. As long as the binary listed is fully populated before drawing.

Code:

ds_list_add(obj_draw.binarylist,id|y<<32)

Edit: Remember that my method is only faster if we use less than ~300 instances. Whether we store thos values as 32bits or 64bits should thus not really matter in terms of memory. If you consistently need more go with the grid-list approach. It will use alot of memory too, regardless of instance count! Don't know where the break even point is, but 1 grid + 540 sublists empty seem rather high. If we assume an empty list takes up the same memory as a single 32 number, then we will almost never use more than the grid-list, except for extreme cases ~800+ instances as both will still need to store the ID (32bits).

rui.rosario · Nov 24, 2016

Ariak said:
The key problem to me seems to be: how do we sort the ID-list without any depth information?

I think you misunderstood me, the list was parallel to the binary list, you would add the object to the list and use the index of the object in the list to combine with the depth information to populate the binary list. Hence you'd have two lists: The binary list and the list that would act as a indirection layer for smaller IDs. Although if the ID resets after changing rooms then it may not be required

Hyomoto · Nov 25, 2016

This discussion has been specific to draw calls, but we can also use this type of ordering to determine when events happen. For example, if we deactivate our instances and use event_perform from a controller object, we actually have total control over our sequence of events. Since we normally have little control over what order events are run in, there can be issues when the outcome of one calculation may depend on another. For example, when dealing with the mouse pointer and user interfaces it is generally undesirable to have the mouse able to interact with elements that are obscured. If we use depth ordering to check which one the cursor has focus on, we can check from the top down, and once we find one that satisfies our criteria we can stop rather than running it for all elements. Another example is if one object makes a determination based on the position of another object, depending on which one moves first we may get a different outcome. This way we can be assured which order those steps are performed.

I hadn't considered it before, but it seems this technique has a lot of great uses. Good read.

c023-DeV · Nov 25, 2016

What about only letting the sprites in the Ordered List only add themselves to the list if they are close to the view?
Since they add themselves on every draw call...
Or only render those within the Y-axis of the view....

Ariak · Nov 25, 2016

Ariak said:
If we only loop grid entries that actually fall within the view area we can optimize this even further.

The code I presented is purely designed to test the performance of the data structures. How and when you choose to add instances is up to you. It will affect the performance yes, but for all three options equally. I wanted to showcase a faster method with less memory overhead for sorting these (at low instance count). Its pretty obvious that you only add instances to the data structures that actually fall within the view + a certain margin to avoid sprites magically apprearing at the borders. Thus my tests also indicate the margin that I added to the resolution//height used. The grid-list does not keep track of where on the screen the instances are (y) value, so it will have to loop every height and thus every sub- list to check if its empty or not.

Also, as correctly pointed out by @Hyomoto , whilst I focus on the depth ordering problem here, this approach is essentially just a faster priority list and can be applied for lots of things.

Dave Holland Art · Mar 28, 2017

I'm fairly new to all of this and I need a little hand-holding. Can someone explain what practical uses I could take (or provide a tutorial / example) on how to apply this to my orthographic game? I've been using depth = -y and the recent IDE update made that go to poo. I could really use a person who understands this but also understands my incompetence to help me out. Much thanks for the help in advance

xDGameStudios · Mar 30, 2017

Hyomoto said:
This discussion has been specific to draw calls, but we can also use this type of ordering to determine when events happen. For example, if we deactivate our instances and use event_perform from a controller object, we actually have total control over our sequence of events. Since we normally have little control over what order events are run in, there can be issues when the outcome of one calculation may depend on another. For example, when dealing with the mouse pointer and user interfaces it is generally undesirable to have the mouse able to interact with elements that are obscured. If we use depth ordering to check which one the cursor has focus on, we can check from the top down, and once we find one that satisfies our criteria we can stop rather than running it for all elements. Another example is if one object makes a determination based on the position of another object, depending on which one moves first we may get a different outcome. This way we can be assured which order those steps are performed.

I hadn't considered it before, but it seems this technique has a lot of great uses. Good read.

Didn't know you could call event_perform on a deactivated instance!! because with(instance_id) will not work... how would you do that?!

Hyomoto · Apr 15, 2017

xDGameStudios said:
Didn't know you could call event_perform on a deactivated instance!! because with(instance_id) will not work... how would you do that?!

Late post, late reply. So, first off when you do event_perform it's only performing that event in the sense of a script call. The actual call itself is performed from the calling entity, unless as you observed, you use with. Since variables are always available, even if the instance is deactivated, you could feasibly set a value to the id of the instance whose variables it should be using, and then write the event to perform it's functions using that variable. The viability of this is up for testing, but my point is largely that this technique can be used for more than just draw sorting and could give you very precise control over what should be doing what and when and could be as simple as just:

Code:

/// @desc ev_user0
id.x += 1;

In this case, the code is being run from the id of the calling instance. So, if you did something like:

Code:

/// @desc ev_user0
target.x += 1;

Then in the calling instance, you only need to set target to the id of the instance whose variables it should be using. Like I said, the viability is up for discussion but it shows that there are a lot of ways to handle the in-game loop and give yourself more control than the default generally allows for.

RicardoVL · May 16, 2017

What about tiles? do you have a solution for that?
https://forum.yoyogames.com/index.php?threads/what-is-the-good-way-to-handle-tile-object-depth.26641

icuurd12b42 · May 17, 2017

Paging Dr @Juju !

@Nocturne This is a perfect example of an Advanced Discussion Topic

Nocturne · May 17, 2017

icuurd12b42 said:
@Nocturne This is a perfect example of an Advanced Discussion Topic

Agreed and moved.

Strawbry_jam · May 17, 2017

Ariak said:
Inherently Priotities and the Binary List follow exactly the same concept. As we add all instances to a single data structure we will have to sort them at some point. Be that whilst adding with priorities, or only once when our binary list is complete. This does not scale very well with instance count. More instances, more possibilities, more lookups required to sort.

To speed up the sort, you can pre-sort instances by initial depth (a lot of times depth doesn't change depending on the game). For each object using the draw order, add to list. For each in list, with object, add each instance to queue/list.

A partially sorted list is faster to sort than a completely random list.

Edit nevermind. Didn't realize it's gone. I need to work in gms2 more.

But if you know the initial depth you will use for objects, you can add them in a better order to improve performance.

Ariak · May 17, 2017

Using the draw order is possible ofc, but requires you setting the depth in the first place. At which point you're back to ye olde "depth=-y" and could just run with that solution from the get go, ignoring the new layer features, foregoing all the sorting, manual drawing etc.

If you do choose to run with a system similar to the one presented a two part system is adviseable to minimize sorting effort: seperate statics and dynamics. Instances that move frequently go into the dynamic container and are updated/sorted etc. frequently, whilst the static container, where we know your mentioned initial depth (as custom "depth variable" mind you!), is updated very rarely on a per case basis. Infact an insert-sort algo might be a neat solution - depending on how many instances we're talking.

Hyomoto · May 19, 2017

RicardoVL said:
What about tiles? do you have a solution for that?
https://forum.yoyogames.com/index.php?threads/what-is-the-good-way-to-handle-tile-object-depth.26641

What I currently use is a depth render nested in the layer that needs sorting between tile layers. If you have a lot of layers of elevation you can transition an instance from one render to another render in another layer. Someone else may have a better solution but I'm guessing you aren't going to use thousands of layers of elevation so just two or three would be very viable and simple with this method.

icuurd12b42 · May 20, 2017

a method I use is to have an array the size of the view height, creating a vertical slot for each y line visible
each array item hold another array or a list containing the items to draw for that slot.
basically each raster line has an array or items to draw.

instances add themselves to the right vertical slot according to their y value
DepthSystemAddItem(y-view_yview,id);

in the draw event of a controller, for each slot, for each item in slot, draw item

slojanko · May 20, 2017

Is there a demo of all the comparisons available? I've never tackled depth sorting so all I see is a bunch of numbers.

Hyomoto · May 20, 2017

Ariak said:
Using the draw order is possible ofc, but requires you setting the depth in the first place. At which point you're back to ye olde "depth=-y" and could just run with that solution from the get go, ignoring the new layer features, foregoing all the sorting, manual drawing etc.

If you do choose to run with a system similar to the one presented a two part system is adviseable to minimize sorting effort: seperate statics and dynamics. Instances that move frequently go into the dynamic container and are updated/sorted etc. frequently, whilst the static container, where we know your mentioned initial depth (as custom "depth variable" mind you!), is updated very rarely on a per case basis. Infact an insert-sort algo might be a neat solution - depending on how many instances we're talking.

I use a very similar concept. The UI needs depth sorting but I know what depth it should be when it's created, so while it is depth sorted using a binary list as described, essentially the sort only happens when the UI elements are created rather than every frame. I think you bring up a very good point: objects that don't intend to change depth often or at all can probably have some special exclusion. Still, as the post points out, if you have enough instances where this type of optimization is relevant you could probably just use the grid-list approach since it scales very nicely. Most cases where this type of separation is needed you could probably forego depth sorting altogether and just use the layers anyways which would likely bring your instance count down enough to not need specific exclusion when depth ordering those that still need it.

GMWolf · May 20, 2017

icuurd12b42 said:
a method I use is to have an array the size of the view height, creating a vertical slot for each y line visible
each array item hold another array or a list containing the items to draw for that slot.
basically each raster line has an array or items to draw.

instances add themselves to the right vertical slot according to their y value
DepthSystemAddItem(y-view_yview,id);

in the draw event of a controller, for each slot, for each item in slot, draw item

That is indeed quite a smart way to do things, but it may be a little expensive when it comes to memory usage.
Perhaps using a map instead of the outer array could reduce the number of entries, if a line isnt occupied.

This solution also won't work if you want depth sorting based on something other than just the vertical position.

icuurd12b42 · May 20, 2017

Fel666 said:
That is indeed quite a smart way to do things, but it may be a little expensive when it comes to memory usage.
Perhaps using a map instead of the outer array could reduce the number of entries, if a line isnt occupied.

This solution also won't work if you want depth sorting based on something other than just the vertical position.

well an array of 720 entries, most entries empty, some entries with arrays of 1 to 5 items for example.

maps are likely slower than an array...

you can also use a one column grid for the main container and get a slightly faster method and use a list as the container for each slot

the point being with this method there is no sorting involved at any point aside figuring what slot the entry belongs in.

GMWolf · May 20, 2017

icuurd12b42 said:
well an array of 720 entries, most entries empty, some entries with arrays of 1 to 5 items for example.

maps are likely slower than an array...

you can also use a one column grid for the main container and get a slightly faster method and use a list as the container for each slot

the point being with this method there is no sorting involved at any point aside figuring what slot the entry belongs in.

It is indeed a very good technique from an efficiency stand point: O[N] i believe. (any sorting will be at least O[N log(N)])
Care must be take however, that the objects are put in these lists in a consistent order, to avoid a "z fighting" style issue on same y levels, not difficult to do though.

I still think this technique could be expanded to work outside of simple y- sorting. (what if you introduce a z variable?)

icuurd12b42 · May 20, 2017

>(what if you introduce a z variable?)
The lemming have a z variable. obviously the technique is only viable in specific conditions. if you are thinking of z as in true 3d then you might as well enable 3d and let the gpu figure it out

>Care must be take however, that the objects are put in these lists in a consistent order, to avoid a "z fighting" style issue on same y levels, not difficult to do though.

At that level of ordering, you get the same z fighting as GM's native 2d system

GMWolf · May 20, 2017

icuurd12b42 said:
The lemming have a z variable. obviously the technique is only viable in specific conditions.

so they do!
(I was thinking of having isometric elements that could cover others, along with a z varialbe, but there are ways to get around this (there was a recent blog post on it on the GMC blog)

csanyk · Mar 4, 2018

I have implemented this binlist approach in my Marketplace asset, z3d_engine.

MishMash · Mar 5, 2018

Interesting topic, I haven't read everything in pure detail, but wanted to chip in a few thoughts. I will also extend that i'm not 100% familiar with how GMS2.0 works, given that I mostly use GMS1.0, however these comments can still be relevant:

Depth Buffering
If you have the privilege of not needing to use any partially transparent textures, you can make use of the hardware depth buffer whilst rendering, setting the z value of the world matrix to be the instances depth. This means you don't acutally have to worry about sorting at all, and can simply make use of a feature already being executed in hardware.

(For background information, the depth buffer is essentially an extra value which exists within a created surface, where each pixel stores its depth value. When something new is rendered, the existing pixel is only overwritten (and the fragment shader is only executed) if the newly rendered pixel is at a smaller depth (depending on comparison function) than the previously rendered pixel.)

Code:

// Turn on the depth buffer
gpu_set_ztestenable(true);
gpu_set_zwriteenable(true);
// gpu_set_zfunc(cmpfunc_less); Optional if you want to change how the default depth comparison test is done.).

with(all renderable){
     var inst_depth = -y;
     matrix_set(matrix_world, matrix_build(0, 0, inst_depth, 0, 0, 0, 1, 1, 1));
     --- regular drawing routine for object
}
 matrix_set(matrix_world, matrix_build_identity());

As an additional note, you may need to either implement a shader which has a discard condition if the input pixel has transparency. Alternatively, you can turn on alpha testing using: gpu_set_alphatestenable(true), which will automatically discard pixels that do not meet the alpha threshold set in gpu_set_alphatestref(val).

It is also worth noting that this solution is also more optimal with regard to pixel fillrate, as we only have to run shading for pixels that are seen. Any pixel hidden behind something wont have its fragment shader executed, which can be significant if you have any particularly heavy shaders running.

Incorperating transparency inside this system
Luckily, we can still get the performance benefits of this system for objects that require partial transparency by simply using one of the other ordering methods already provided. In order for things to subsequently work with other things, we still need to set the z coordinate of the transformation matrix and perform the z-test, however, given we are rendering things in order, we will want to turn off the writing of values to the depth buffer, and only have the comparisons against our existing scene.

To do this, we simply set gpu_set_zwriteenable(false);

Hybrid solutions and spacial structures
Given that we can evaluate in which situations different solutions perform better, we can use this to come up with hybrid solutions that give us the benefits of both approaches. For example, the grid list has a constant time complexity at the cost of using up more memory. It also requires one row per pixel which can mean we have some dead iterations. There are a few solutions to improve this:

Maintain a list of the rows in the grid that have values in them. We can then simply iterate through the list rather than the full vertical span of the grid.
Given this solution can quickly become impractical for rooms with large verticality, we can use a spacial approach by which we divide the vertical screen space up into chunks, say of 100 pixels (the chunk size would have to be set based on the number of instances in the world, and the spread). Within each chunk, we can potentially then use whatever depth sorting solution we want. The main benefit here is that empty chunks can be completely ignored. We can also know that an entity will be entirely contained within a chunk, so that some of the solutions that slow down under a higher complexity with large instance counts could end up being better if you know that you are never likely going to exceed say 50 instances per chunk, as you only need to perform a limited amount of sorting.
Obviously, this falls into more of a second-hand optimisation, rather than actually improving the algorithmic complexity, but it is an important consideration for larger levels, as people may instantly disregard the grid method.
With the grid, assuming you are re-creating it every step, and not checking for depth changes (as its likely most instances will move), you know that there will be a limit to the amount of space you need to consider. Say if your screen height is 1080 pixels, and the largest sprite you render is 200 pixels high, the size of your grid only needs to be 1480 at most. This is because you only care about rendering things that will actually end up on screen, so you apply offsets to your grid.

Ariak · Sep 9, 2018

Im sorry to say that the filehost has deleted the pictures, and I do not have the project anymore. Im thus unable to replace the lost code images.

Simon Gust · Sep 10, 2018

Ariak said:
Im sorry to say that the filehost has deleted the pictures, and I do not have the project anymore. Im thus unable to replace the lost code images.

Rip, also

Ariak said:
Edit: I'm merely bitshifting by 20 for the binary list in this code snippet. As i've shown in an answer below whether we shift by 16,20 or 32% has minimal impact on performance. 32 was somehow even faster...

Just saw that and I know why it's faster.
Usually the normal type in game maker is a 32bit float. If you apply a bitwise operation or bitshift it will cast into a 32bit int and then revert back into a 32 bit float for the outcome. Unless it exceeds 32 bits alltogether, in which case, the type will stay 64bit int. Which is for one faster, and you also don't get unwanted casts anymore.

Hyomoto · Mar 6, 2020

I was linking to this post and it got me thinking about my own optimizations. I tend to use the binary list method for it's simplicity and scale-ability ( it works regardless of your y axis) but I added a personal optimization to it. The biggest bottleneck is iterating over the list once to make sure the depths are updated, again to sort it (though GM handles this), and once more during the draw phase. If you have objects that don't change their depths often, or at all, it's possible to use two lists, interleave them during the draw step, and remove those objects from the list update step.

GML:

var _dynamic     = ( ds_list_size( __renderListDynamic ) > 0 ? __renderListDynamic[| 0 ] : noone );
var _static      = ( ds_list_size( __renderListStatic ) > 0 ? __renderListStatic[| 0 ] : noone );
var _nextDynamic = 0;
var _nextStatic  = 0;
var _next;

repeat ( ds_list_size( __renderListDynamic ) + ds_list_size( __renderListStatic ) ) {
    if ( _static != noone && ( _dynamic == noone || _dynamic > _static ) ) {
        _next   = _static;
        _static = ( ds_list_size( __renderListStatic ) > ++_nextStatic ? __renderListStatic[| _nextStatic ] : noone );
      
    } else {
        _next    = _dynamic;
        _dynamic = ( ds_list_size( __renderListDynamic ) > ++_nextDynamic ? __renderListDynamic[| _nextDynamic ] : noone );
  
    }
    with ( _next & RFLAG.MASK ) { event_perform( ev_draw, 0 ) }
  
}

I haven't revisited this code in a while, so there may be some optimizations I'm ignoring here. Anyways, I've linked a lot of people to this post to learn about depth sorting in GMS2, so this is my meager contribution to the topic.

Depth Ordering in GMS2: Priority List, Nested List-Grid, Binary List

Ariak

Guest

Lahssoo

Guest

Ariak

Guest

Lahssoo

Guest

rui.rosario

Guest

Ariak

Guest

rui.rosario

Guest

Member

Member

Ariak

Guest

Dave Holland Art

Guest

GameMaker Staff

Member

RicardoVL

Guest

icuurd12b42

Guest

Friendly Tyrant

Member

Ariak

Guest

Member

icuurd12b42

Guest

Member

Member

aka fel666

icuurd12b42

Guest

aka fel666

icuurd12b42

Guest

aka fel666

Member

MishMash

Guest

Ariak

Guest

Member

Member