• Hey Guest! Ever feel like entering a Game Jam, but the time limit is always too much pressure? We get it... You lead a hectic life and dedicating 3 whole days to make a game just doesn't work for you! So, why not enter the GMC SLOW JAM? Take your time! Kick back and make your game over 4 months! Interested? Then just click here!

Broad Phase Collision Dedection with GML

erayzesen

Member
Hi. I need to optimize my collision dedection with gml. My condinations are;

-I will usually have dynamically moving objects.
-I will use this on side scroller, platformer games. So the map will may be large.
-Usually my dynamic objects will be circle, rectangles.

I searched on the web about this and I finded these solutions;
-Quad Tree Algorithms
-Dynamic AABB Tree Algorithms
-Sweep and Prune

I shouldn't go into crazy loops, since my objects will be dynamic, I shouldn't be refreshing this structure with big loops at every step.
Which one of them would you recommend to use with the gml and its potential? What are your experiences? What are your experiences with it?
 

rytan451

Member
GMS2 already implements an R-Tree (a refinement of quad trees) in its collision system. Just use GMS's built-in collision functions, and you should be fine.
 

erayzesen

Member
GMS2 already implements an R-Tree (a refinement of quad trees) in its collision system. Just use GMS's built-in collision functions, and you should be fine.
I know. But I need more custom somethings...
I implement quadtree algorithm on GMS. The default way is every frame re-construct all quadtree nodes but I wrote a simple ai that makes fixes instead to make it work better.

 

rytan451

Member
Why would you want to create your own collision system, rather than using GMS's built-in one? Also, don't reconstruct the quad tree every step. That way might actually be slower than just the naive N^2 loop.
 

erayzesen

Member
Why would you want to create your own collision system, rather than using GMS's built-in one? Also, don't reconstruct the quad tree every step. That way might actually be slower than just the naive N^2 loop.
I'm writing a physics that can react with different types of objects, built-in physics can only interact with objects of their own type. I allow it to interact with different physics types with my tile systems. I will also ensure that physics can communicate comfortably with different objects I have written. This way I can interact with an object in any way I want.

In order to optimize Quadtree, instead of rebuilding the tree at each step, I create the tree only once, when objects move, it informs the branch of the tree. It checks the branch of the tree again and sends the data to the upper branch or its sub-branches if necessary.
 

GMWolf

aka fel666
built-in physics can only interact with objects of their own type.
What do you mean by that?


I shouldn't go into crazy loops, since my objects will be dynamic, I shouldn't be refreshing this structure with big loops at every step.
If your objects are moving, then you're going to get loops.
Best possible algorithm will have at least one loop.
Most algorithms will end up N Log N (like quad trees).


Which one of them would you recommend to use with the gml and its potential?
Doesn't really matter what the language is.
Just remember, GML is slow. You will most likely not reach anything close to native performance.


What are your experiences with it?
I like BVH / BVT. Not terribly complex but very effective. Much nicer behaviour than quadtrees but slightly more expensive.


In order to optimize Quadtree, instead of rebuilding the tree at each step, I create the tree only once, objects move to inform the branch of the tree. It checks the branch of the tree again and sends the data to the upper branch or its sub-branches if necessary.
Ask yourself how much you are really saving by doing that.
You are still traversing every object in the tree.
The primary cost of building structures is accessing the memory.



Best advice for optimization I can give you is to avoid allocations.
Don't use nested arrays, etc. They are inherently expensive to build because each array needs to be individually constructed and destroyed.

Instead look at flat representations. They will save you a lot in temporary objects.
 

erayzesen

Member
What do you mean by that?
In order for the built-in physics to interact, you have to make a physics object of its own kind.

If your objects are moving, then you're going to get loops.
Best possible algorithm will have at least one loop.
Most algorithms will end up N Log N (like quad trees).
Actually I don't use loops for this, each object keeps the reference of the branch of the tree it belongs to, and when it moves, it informs the branch of the tree it belongs to.

I guess in your assumption that I need to create a search loop from the top of the tree to the branch where the object is, no I don't use that.

Doesn't really matter what the language is.
Just remember, GML is slow. You will most likely not reach anything close to native performance.
That's why I asked, I am aware of the gml and the performance problems experienced. But the performance issues in gml are a bit more specific. For example, some of the built-in features in the game engine work much better than what you have created, because these are optimized structures that have been compiled and added under the hood. So I wanted to look for the best answer that adapts to these factors.

Ask yourself how much you are really saving by doing that.
You are still traversing every object in the tree.
The primary cost of building structures is accessing the memory.
Yes, but I think the ram memory I use in doing this is not even comparable to the memory used by other parts of the game. I do this to reduce the processor load, when there are too many objects, collisions and other calculations are incredibly large at every step.

In my test, 1000 collisions occurred with Quadtree on, and more than 3000 collisions with it off. As the number of objects I use increases, this savings reaches incredible levels.
 

GMWolf

aka fel666
In order for the built-in physics to interact, you have to make a physics object of its own kind.
You mean make another physics object?
Why is that a problem?
If you mean another object with the same object_index, that is incorrect.


Actually I don't use loops for this, each object keeps the reference of the branch of the tree it belongs to, and when it moves, it informs the branch of the tree it belongs to.
And how do you figure each instance gets updated? With a loop!
Actually this is probably worse for perf than your own loop because of data and instruction locality.


Yes, but I think the ram memory I use in doing this is not even comparable to the memory used by other parts of the game. I do this to reduce the processor load, when there are too many objects, collisions and other calculations are incredibly large at every step.
Processors can process a lot more data than they can read.
I'n this cases you are reading position, and size, doing a tiny bit of math to figure out where it goes, and writing an index. The processor load is nothing. It's not expensive to do.

I'm not talking about memory usage. I'm talking about reading and writing memory.

I'm not saying don't use a quad tree. I'm saying make sure your scheme of updating a quad tree is actually faster than just building a new one from scratch.
In my experience, with a very optimized tree creation algorithm, its hard to beat it with a tree update algorithm.
 

erayzesen

Member
You mean make another physics object?
Why is that a problem?
If you mean another object with the same object_index, that is incorrect.
No, I guess gms is confusing at this point. When you say "use physics" for an object in gms, the physics engine produces and adapts the objects required for that object according to its library. You also create a new physics object. This is actually not bad, it is useful. But for example, an object not defined in this way, you cannot interact with any component. For this you need to go under the hood and customize the onboard physics engine, which we can't do anyway.

And how do you figure each instance gets updated? With a loop!
Actually this is probably worse for perf than your own loop because of data and instruction locality.
Each of my objects has an object reference named aabb. This aabb object is also a leaf of the tree. With this reference, I communicate when necessary with the tree branch where the object is located. Without this, you cannot update the tree in an optimized way, you do an update function on the branch in the tree, and when the properties of an object change, you also update the branch of that tree. If you have a different suggestion than this, I would like to listen. Either I will rebuild the tree every time, or I will notify the tree when necessary and update the required location.

All this is not free anyway, you use memory for it.
 

erayzesen

Member
I'm not saying don't use a quad tree. I'm saying make sure your scheme of updating a quad tree is actually faster than just building a new one from scratch.
In my experience, with a very optimized tree creation algorithm, its hard to beat it with a tree update algorithm.
Sorry, I overlooked this.
You are absolutely right about this. I've done tests on this, if I create the tree every time it takes 5ms per step for a certain number of objects. When I did this for the same number of objects, it cost me 1.2 ms. Can it be further optimized? Of course. It needs to be worked on.
 

GMWolf

aka fel666
No, I guess gms is confusing at this point. When you say "use physics" for an object in gms, the physics engine produces and adapts the objects required for that object according to its library. You also create a new physics object. This is actually not bad, it is useful. But for example, an object not defined in this way, you cannot interact with any component. For this you need to go under the hood and customize the onboard physics engine, which we can't do anyway.
That's ok then, just make the other object a physics object too.
There are options to make object sensor objects, kinematic objects etc

What kind of other objects do you need anyways?


Each of my objects has an object reference named aabb. This aabb object is also a leaf of the tree. With this reference, I communicate when necessary with the tree branch where the object is located. Without this, you cannot update the tree in an optimized way, you do an update function on the branch in the tree, and when the properties of an object change, you also update the branch of that tree. If you have a different suggestion than this, I would like to listen. Either I will rebuild the tree every time, or I will notify the tree when necessary and update the required location.
What I mean is you are still using an implicit loop. The loop of game maker going over each instance to update it.

If you are trying to optimize the case where you have moving and static objects, then the usual techniques is to maintain two different acceleration structures:
One acceleration structure for static objects, that is slow to rebuild, but cast to query.
And another acceleration structure for dynamic objects, that is fast to build, but slower to query.
(Of course you can use the same type of tree for both for simplicity).


when the properties of an object change
The is is the crux
How do you determine when a property changes? Is it when you change the property directly? Then you must have a sort of loop that itself changes the property.
Remember the step event is a loop, just happening inside GM.



I've done tests on this, if I create the tree every time it takes 5ms per step for a certain number of objects. When I did this for the same number of objects, it cost me 1.2 ms. Can it be further optimized? Of course. It needs to be worked on.
This discrepancy seems very large.
How are you building your tree?
If you are using a separate object per node, you are going to have performance issues, especially hidden ones with GC.
 

erayzesen

Member
That's ok then, just make the other object a physics object too.
There are options to make object sensor objects, kinematic objects etc

What kind of other objects do you need anyways?
I know the built-in physics of gms and I have used it before.
For example, there are bitmap objects (I derive bitmaps) that I generate during the game, it is quite costly to define them as built-in physics objects. Their boundaries are not clear.

There is also the tile map that I have generated. As you can imagine, describing each square of these as a physics object is expensive. I can make an algorithm that can reduce the number of these objects according to the sequence of these tiles for this, which is not simpler than what I am dealing with now. Even though I figured it out, I have the kind of objects I described above.

The situations you mention are the situations where you make the section designs of the game yourself and keep the objects in the collection. Tell me if I got it wrong.

I agree with you that what you are trying to explain is that this system that I do cannot work more efficiently and optimized than the built-in physics. This game engine is in general, nothing can be faster than methods under the hood. I do this because of the needs I have mentioned. It would be more correct to say "I'm trying". :)

If you are trying to optimize the case where you have moving and static objects, then the usual techniques is to maintain two different acceleration structures:
One acceleration structure for static objects, that is slow to rebuild, but cast to query.
And another acceleration structure for dynamic objects, that is fast to build, but slower to query.
(Of course you can use the same type of tree for both for simplicity).
I don't update my static objects, they stay in the tree as always.

I update dynamic objects if their positions and aabbs change by a certain amount. This object I call aabb is not just something I use for this situation, I use it again in the broad phase of the crash tests. I need to update it anyway, while updating it, I pull the key values to a simple if query and inform the tree it belongs to if necessary.

These queries and updates can be further reduced according to the speed of the simulation. In other words, the moving object can send news to a tree in 10 steps, not 5 steps.

The is is the crux
How do you determine when a property changes? Is it when you change the property directly? Then you must have a sort of loop that itself changes the property.
Remember the step event is a loop, just happening inside GM.
My objects have values that also have functions elsewhere, such as old_x and old_y. I use this to compare it to his current position.

The other thing I compare is the size of the aabb. These are included in the function of the aabb update I do at each step.

This discrepancy seems very large.
How are you building your tree?
If you are using a separate object per node, you are going to have performance issues, especially hidden ones with GC.
My English is a little awful, I guess I couldn't explain this fully.

If I do the tree from scratch at every step from beginning to end, it does a 5ms process, if I do it by editing the tree instead of always rebuilding the tree as I told you, it takes 1.2ms. In these tests, the number of objects is too many, I do not remember the exact number, but I am talking about a condition where fps is at the bottom and total ms is too high.

The main place I want to optimize right now is the part of an object's collision list I want from the tree, if I do this from the top of the tree, I make a very unnecessary loop. It is wasting unnecessary performance.

It's not useful if I have few objects, it's just a burden. Because the collision operations of these objects are not more than the load required for that tree to work.

In fact, Box2d's AABB Dynamic Tree algorithm is very flexible and useful for such jobs. But I could not manage that algorithm, especially the issue of keeping the tree in balance seems very complicated to me.
 
Last edited:

GMWolf

aka fel666
I don't update my static objects, they stay in the tree as always.
i recommend trying out having 2 trees. one for static, one for dynamic. thats virtually how every physcis engine manages things.

The reason for using two trees rather than just the one shared tree is three fold:
One is simply that you can optimize the static tree to be built once and never change, and optimize the dynamic tree to be updated or rebuilt quickly.
Second:
When moving or rebuilding, the cost is somewhat tied to the number of other objects in your tree. So by not including other objects in your tree you have slightly faster dynamic rebuild times.

And third is most important: it greatly speeds up your search:
Static objects don't have to do any collision checking between each other.
So by having the dynamic and static trees separate, dynamic-dynamic checks don't include the cost of static objects, and static-dynamic don't include the cost of other static objects in the tree. All your collision checks are using a smaller tree.
I update dynamic objects if their positions and aabbs change by a certain amount. This object I call aabb is not just something I use for this situation, I use it again in the broad phase of the crash tests. I need to update it anyway, while updating it, I pull the key values to a simple if query and inform the tree it belongs to if necessary.

These queries and updates can be further reduced according to the speed of the simulation. In other words, the moving object can send news to a tree in 10 steps, not 5 steps.
I take it you take care to include the velocity of you objects to extrude the AABBs so that you dont end up with an incorrect acceleration structure?

My objects have values that also have functions elsewhere, such as old_x and old_y. I use this to compare it to his current position.

The other thing I compare is the size of the aabb. These are included in the function of the aabb update I do at each step.
here is my argument: reading the position and aabb is most of the cost already.

If I do the tree from scratch at every step from beginning to end, it does a 5ms process, if I do it by editing the tree instead of always rebuilding the tree as I told you, it takes 1.2ms.
This difference in performance seems very high to me. at least, for an optimized tree. what kind of representation are you using? by which i mean, are you using a separate object for each node?
Are you using the 'link' aproach where each node points to its children?
Or are you using a flat representation? or perhaps a hybrid of the two?

In fact, Box2d's AABB Dynamic Tree algorithm is very flexible and useful for such jobs. But I could not manage that algorithm, especially the issue of keeping the tree in balance seems very complicated to me.
i'd still recomend using a BVT/BVH over a quad tree. You don't really need to keep it in balance. just using the surface area heuristic should give you good enough results already.
Quad trees are inherently not balanced, parts of the tree will be very deep, others will be shallow.

The main place I want to optimize right now is the part of an object's collision list I want from the tree, if I do this from the top of the tree, I make a very unnecessary loop. It is wasting unnecessary performance.
So you want to query your tree without starting from the top of the tree?

One structure that can do that is a spatial hash.
Essentially a grid that has a list of instances for each cell. They are very quick to build and query since determining which cell an object is in is very quick. The downside is that they use more memory.
One way to iprove their memory usage is to use a tiered system, where each grid holds another gird. (kinda like a quad tree, but have many subdivisions per node rather than just 4).
 
Last edited:

erayzesen

Member
i recommend trying out having 2 trees. one for static, one for dynamic. thats virtually how every physcis engine manages things.
What you say makes sense. But then when I get the collision lists of dynamic objects, I will do a separate loop to detect static objects. Ultimately these objects will collide with each other. Am I wrong?

I take it you take care to include the velocity of you objects to extrude the AABBs so that you dont end up with an incorrect acceleration structure?
Yes I do this and with the acceleration I show the aabb blown.

here is my argument: reading the position and aabb is most of the cost already.
Yes, what can be done instead of doing this to understand that the object is moving? After all, the object has a position, I have to look at it, and there is a rect named aabb that it has.

i'd still recomend using a BVT/BVH over a quad tree. You don't really need to keep it in balance. just using the surface area heuristic should give you good enough results already.
Quad trees are inherently not balanced, parts of the tree will be very deep, others will be shallow.
I think this is, though the quad tree is simple and useful, it is not as intended as bvh as you said. I think it will be much more optimized than the quad tree.

Essentially a grid that has a list of instances for each cell. They are very quick to build and query since determining which cell an object is in is very quick. The downside is that they use more memory.
As I mentioned before, I create an aabb object in each of my objects and refer to the tree as a reference. This aabb object already exists, not an object I created for it. When I say obj_sample.aabb.parent, it gives me the tree it depends on. Or if I want to reach the object on the tree; I'm writing aabb.owner. So I actually built the whole structure on this reference bridge. It doesn't seem unreasonable to me, I find it much more logical than using an array. I do not do search loops, I proceed with references. For example, I write the following to the function that updates the obj_sample position and sizes;
aabb.parent.check_data (aabb);
The tree on which it is located does what is necessary, if necessary.
Therefore, I can reach the tree this way, I can also reach the hill from the tree. For example, each tree branch has a reference named parent, it goes up to the upper branch.

I don't know if they're unreasonable, because search queries wouldn't be more efficient than a reference variable that would link both sides rather than different grids?

The structure I'm talking about progresses only with an object named a quadtree. I show it to visualize, the top branch and bottom branches are actually the same object, the quadtree object.

root_tree-> node_tree_01
-> node_tree_02
-> node_tree_03
-> node_tree_04
 
Last edited:

GMWolf

aka fel666
Ultimately these objects will collide with each other. Am I wrong?
static objects don't collide with themselves.

Yes, what can be done instead of doing this to understand that the object is moving? After all, the object has a position, I have to look at it, and there is a rect named aabb that it has.
my point is that if constructing your quad tree is much more expensive then that, i think you may be doing something wrong when contructing your quad tree. (if it was BVH, different story, BVH is quite a bit more compute than quad tree).

So I actually built the whole structure on this reference bridge. It doesn't seem unreasonable to me,
references are slow in generaly. your cpu will wait for thousands of cycles waiting on the memory to be fetched from a reference. An array is very fast in comparison because the data can easily be prefetched.

I write the following to the function that updates the obj_sample position and sizes;
what calls that function?

The tree on which it is located does what is necessary, if necessary.
Therefore, I can reach the tree this way, I can also reach the hill from the tree. For example, each tree branch has a reference named parent, it goes up to the upper branch.
so now you throw recursive function calls into the mix? a function call is infinitely more expensive than a loop iteration. Especially if its a method! oh boy, methods are slow!
I don't know if they're unreasonable, because search queries wouldn't be more efficient than a reference variable that would link both sides rather than different grids?
Its a bout two things: Data layout, and data creation/destruction.
Having a bunch of nodes referencing each other is not a great layout. its also expensive to construct and destruct the nodes.
 

erayzesen

Member
my point is that if constructing your quad tree is much more expensive then that, i think you may be doing something wrong when contructing your quad tree. (if it was BVH, different story, BVH is quite a bit more compute than quad tree).
Isn't it logically the most expensive thing to create 4 trees at every step? By not doing this, I am providing 5-fold optimization. What do you think wrong with that?
references are slow in generaly. your cpu will wait for thousands of cycles waiting on the memory to be fetched from a reference. An array is very fast in comparison because the data can easily be prefetched.
I don't think array loops are any better than this, I also give you an example for this; when you start from the top of the tree with the string loop to get the collision list (I'm talking about the 3-4 nodes to go) it costs you a lot.
I could join you in another language, in the environment, about the array. But when you make frequent references in gms with array loops, this is not an optimized option.

In the meantime, the processor takes over the reading process in the memory, but the processor also undertakes the search process and functions. In the option you mentioned, you both read the references and have the conditions calculated for it. In the option I mentioned, your reference is ready in memory, you sacrifice ram and save countless processor loads.

what calls that function?
I loop all the objects one step at a time, these objects are stored in an array.

so now you throw recursive function calls into the mix? a function call is infinitely more expensive than a loop iteration. Especially if its a method! oh boy, methods are slow!
I'm not using a recursive loop, I'm just describing the structure as an example. You can reach the top from a tree branch with this reference. What is correct? Saving all trees in an array and searching within? Also, the recursive functions where needed is not a bad thing. The important point here is to use it only where it is needed.

Its a bout two things: Data layout, and data creation/destruction.
Having a bunch of nodes referencing each other is not a great layout. its also expensive to construct and destruct the nodes.
I guess what I wrote is not well understood by you. Why should I delete nodes? It is also interesting that you find the referencing structure bad, many game engines rely on this reference structure. root.child1.child2 .... I don't make sense to find it more economical to search for arrays for each objects instead.
 
Last edited:

GMWolf

aka fel666
Isn't it logically the most expensive thing to create 4 trees at every step? By not doing this, I am providing 5-fold optimization. What do you think wrong with that?
well if creating trees is expensive yes. As i mentioned before, i suspect your node creation is very expensive, hence i suggest a flat representation. For example you could use a list, where each entry is the index of the child nodes.
You would group child nodes together in groups of 4. so the children indices of node "index" would be { list[| index], list[| index] + 1, list[| index] + 2, list[| index] + 3 }.

Another aproach is to use a map, where the keys represent the node. because each quad tree node has 4 children, its easy to represent each node key with a binary number, each group of two bits represents one of the children.
for example 00 01 11 10 would be {00: top left, ==> 01: top right ==> 11: bottom right ==> 10 bottom left.
The advantage of this is that its really easy to go up the tree (shift right) and back down (add bits on the end).

Because a quad-tree is completely implicit (you can know a nodes size just by knowing its key and the size of the root) its really easy to traverse just by checking if keys exist.

your reference is ready in memory,
your reference is hard for the cpu to prefetch, so it has to wait hundreds of cycles for the memory to be fetched.

In the meantime, the processor takes over the reading process in the memory, but the processor also undertakes the search process and functions.
CPUs are pipelined, if their next task doesnt rely on something being read from memory, it can keep doing a bunch of maths. So if you are fetching data for object 1, you can calculate object 0 whilst you wait.
If however you fetch data, then test it, then do something, the cpu can do some speculative execution but you will probably also end up stalling it somewhat. Especially if you do dependent memory reads (derefencing a value you get from a reference, then the cpu cant do that much but wait)

I don't think array loops are any better than this, I also give you an example for this; when you start from the top of the tree with the string loop to get the collision list (I'm talking about the 3-4 nodes to go) it costs you a lot.
im suggesting creating a tight loop around your objects rather than using methods. methods are slow because you don't really know what you are executing next.
If instead you have a loop that contains all your logic here and there, you can get much better perf.


I loop all the objects one step at a time, these objects are stored in an array.
here is your loop. the loop you said you didn't have. its just higher up your call stack.

I'm not using a recursive loop,
For example, I write the following to the function that updates the obj_sample position and sizes;
aabb.parent.check_data (aabb);
The tree on which it is located does what is necessary, if necessary.
Therefore, I can reach the tree this way, I can also reach the hill from the tree. For example, each tree branch has a reference named parent, it goes up to the upper branch.
it sounds to me like aabb.parent.check_data(...) can itself call parent.check_data if needed. that's recursive right?

What is correct? Saving all trees in an array and searching within?
all trees? what do you mean all trees? i am so confused now....

I guess what I wrote is not well understood by you. Why should I delete nodes?
if you create or destroy objects your tree will change.
Also when creating the tree the first time, you will create nodes.
When deleting the tree, if you change levels, you will leave a ton of nodes lying around, now the garbage collector has a lot of work to do cleaning that mess up.

eferencing structure bad, many game engines rely on this reference structure.
And yet you seem to be having performance issues 🤔

don't make sense to find it more economical to search for arrays for each objects instead.
its the whole linked list vs array list thing. Sure on paper a linked list has o[1] add and remove, but in real life an array list in many many times faster than a linked list.

Also i'm not suggesting you do a linear array search. that would be slow.

One structure that can do that is a spatial hash.
Essentially a grid that has a list of instances for each cell. They are very quick to build and query since determining which cell an object is in is very quick. The downside is that they use more memory.
I would really recommend you look into this if you want to speed up your query.
its super trivial to figure out what cells your AABB covers, and then you have nice linear lists of objects to do your narrow phase against.
Its not sexy, its not high tech, but its hella fast.
It kinda breaks down for general purpose physics engines where you have to cover a wide range of applications, but if you have full control over how things are created you can tune the grid size etc to get very good behavior.



I think it would be helpful if you posted some code to demonstrate fully how your system currently operates.
 

erayzesen

Member
well if creating trees is expensive yes. As i mentioned before, i suspect your node creation is very expensive, hence i suggest a flat representation. For example you could use a list, where each entry is the index of the child nodes.
If you create the tree 1 time at the beginning, there is no problem. In the teachings, they say make the tree from scratch at every step, but a system like I do is followed in the resources that aim more optimization. Because it is more convenient to arrange the tree according to need, rather than always rebuilding the tree. I could not understand the part you find wrong about this.

it sounds to me like aabb.parent.check_data(...) can itself call parent.check_data if needed. that's recursive right?
No, "parent" is a variable in the tree. check_data function is checking to the status of aabb in the node. I call this function when aabb properties is changed.
No loops, no searches in the tree.
step->for loop all objects->object-> update properties-> update aabb-> (if aabb is changed) tree of aabb-> check_data
That's all.

And yet you seem to be having performance issues
I don't have performance problems other than the search for a arrays from the top of the tree I mentioned. To get the list, I need to make less loop loops, less processing. The system I'm talking about is the same as in some of the teachings, and it's not very optimized.

here is your loop. the loop you said you didn't have. its just higher up your call stack.
I use loops where necessary. But I try not to use it for things to do at every step.

if you create or destroy objects your tree will change.
Also when creating the tree the first time, you will create nodes.
When deleting the tree, if you change levels, you will leave a ton of nodes lying around, now the garbage collector has a lot of work to do cleaning that mess up.
its the whole linked list vs array list thing. Sure on paper a linked list has o[1] add and remove, but in real life an array list in many many times faster than a linked list.
I didn't use a linked list.
I'm making variables that hold the addresses of objects in memory. It's not like this in GMS, but think like this;
owner-00fff777
parent-0033322
If deleting a branch of a tree results in deleting these 2 variables, which makes 8 out of 4 parts, yes I can afford it. Because instead I'll have to do this;
for (var i = 0; i <tree_count; i ++) (
// ...
for (var n = 0; n <data_count; n ++) (
//
if (obj == robj) {
// do it something
}
}
}
Each branch has an array that carries the data, you have to make this container.

So I can't reconcile the structure with what you're describing.

Your suggestions are based entirely on ram memory, I don't have much trouble with that. The only benefit of making a lighter tree object might be less ram usage. In fact, my ram usage is very little. 19 MB on the blank screen, 20 MB with all of these. In addition, there is no accumulation or leakage, I follow this as well.
 
Last edited:

GMWolf

aka fel666
Your suggestions are based entirely on ram memory, I don't have much trouble with that.
My suggestions are not about reducing memory usage.
My suggestions are about improving layout in memory.
Because CPUs can do lots of maths very fast, but not if the memory isn't layed out nicely.
If you have lots of references and pointers, the CPU won't be able to do much maths because it will have to wait for the data to reach it.
If you lay out the data nicyz the CPU doesn't need to wait as much for the memory and so it can do more maths.



No, "parent" is a variable in the tree. check_data function is checking to the status of aabb in the node. I call this function when aabb properties is changed.
No loops, no searches in the tree.
step->for loop all objects->object-> update properties-> update aabb-> (if aabb is changed) tree of aabb-> check_data
That's all.
Could you then share your current code to do the broad phase?


If you create the tree 1 time at the beginning, there is no problem. In the teachings, they say make the tree from scratch at every step, but a system like I do is followed in the resources that aim more optimization. Because it is more convenient to arrange the tree according to need, rather than always rebuilding the tree. I could not understand the part you find wrong about this.
What I'm saying is that the very long tree build times is an indication to me that they are not very efficient.
And that will mean they are not very efficient to query either.

The system I'm talking about is the same as in some of the teachings, and it's not very optimized.
The teachings often talk about complexity in terms of big O notation.
That's what you mean by less loops.
Unfortunately the teachings forget that real life CPUs are constrained by real life physics.
That's what I meant by the whole "linked list Vs array list". I didn't say you were using a linked list.
I was just saying linked lists are many times slower than array list, despite them using less loops.


Because instead I'll have to do this;
for (var i = 0; i <tree_count; i ++) (
// ...
for (var n = 0; n <data_count; n ++) (
//
if (obj == robj) {
// do it something
}
}
}
Why do you need to do this instead?


Note: all my advice above is mostly about quad trees, except when I was specifically talking about grids.

So I can't reconcile the structure with what you're describing.
The grid structure I'm talking about is super simple.

You have a grid that covers.your whole map. Each cell covers a portion of the map.
Each cell contains a list.

To get all the cells the aabb is in, you can do something like this:
Code:
for(var i = aabb.miny % cellSize; i < aabb.maxy % cellSize +1; i++)
for(var j = aabb.minx % cellSize; j < aabb.maxy % cellSize +1; j++)
{
   var list = grid[# j, i];
}
Notice these loops are very small. If your object is smaller than a cell, each loop will at most be two iteration.
Then you do a linear search through that list. But that list holds very few elements (I'd you tune your cell size correctly) so its all good.
You can tune the cell size to be as large as possible, while having not too many objects in it. If you can't control where the objects are very well, but you know objects don't overlap, a size of 2x an object can work well.

For very large world's, memory will become an issue (this time I mean you will run out) so having two or three grid levels can work well. (Don't have smaller grids in empty cells). I wouldn't use more than three levels of grids though.

Things to notice about the grid structure:
Finding the grid cells is O[1]. It doesn't matter how many objects you have. It will ways be the same cost. Better yet: the CPU can quite easily prefetch the memory: there is one level of indirection only, and most CPUs can deal with that.
The linear search for the arrays are not really part of the broad phase, they are part of the narrow phase.
Basically to get collision pairs for narrow phase you can loop over every object, check what cells they overlap, check collisions with the objects stored in the lists.
What's nice is that you have easily accessible lists already, so you don't have to construct a new list in the broad phase to give to the narrow phase, because the broad phase is already built of lists.

With a little work you can inline the actual object AABBs into the lists (by having multiple lists for instance) which may speed up your narrow phase.
 

erayzesen

Member
My suggestions are not about reducing memory usage.
My suggestions are about improving layout in memory.
Because CPUs can do lots of maths very fast, but not if the memory isn't layed out nicely.
If you have lots of references and pointers, the CPU won't be able to do much maths because it will have to wait for the data to reach it.
If you lay out the data nicyz the CPU doesn't need to wait as much for the memory and so it can do more maths.
What do you mean by Nicyz?

Could you then share your current code to do the broad phase?
In the broad stage, aabb objects take the aabb collision test.

if(obj_a.aabb.is_collision_with(obj_b.aabb) ){
//start narrow test
}

The grid structure I'm talking about is super simple.

You have a grid that covers.your whole map. Each cell covers a portion of the map.
Each cell contains a list.
I thought of the grid system, but it is much more costly and not flexible. I even visualized it with the queries you told. Objects have variable dimensions, it is recommended to create a grid for the largest object as a solution. Even this will be a huge grid for troublesome, many times smaller objects.

To get all the cells the aabb is in, you can do something like this:
It's a more troublesome system than Quadtree.

When I say that the examples you have given are not very comprehensive, I am talking about this;

You cannot pull objects from a single grid. Some objects are between 2 grids, some are between 4 grids. This time you may have to scan 4-5 neighbors of your target grid. Some objects can fall between grid lines and may not be assigned to any grid. Imagine a 16x16 object versus a 128x128 object and things get even more trouble. This time, the 16x16 object will collision test with all other small objects in the cells covered by your large object.

In Quadtree this structure has already been thought out, so it's nicer than the grills. In other words, it remains in the upper tree that does not fit into the lower trees of the 4 tree, and the upper tree that does not fit into it. And in the meantime, you collide the remaining objects with the objects in the sub tree, which is necessary anyway. Grids can still be useful if there are fixed size objects. If I had predictable or fixed size objects, I could think.
 
Last edited:

GMWolf

aka fel666
What do you mean by Nicyz?
*Nicely. Sorry Im Bad at typing.


Why do you recommend a grid? It's a more troublesome system than Q
Because in my experience it's quite efficient.
Certainly more efficient than a naive quad tree using references between nodes.

As stated: I think you will need to rearchitechture your quadtree to not be references between nodes with AABBs to make it efficient. Using a more implicit representation for instance.


You cannot pull objects from a single grid. Some objects are between 2 grids, some are between 3 grids. This time you may have to scan 4-5 neighbors of your target grid.
You just have to scan the grid cells the instance overlaps. Those grid cells hold all the instances that overlap it. In practice that's 2/3 cells.


In Quadtree this structure has already been thought out, so it's nicer than the grills. In other words, it remains in the upper tree that does not fit into the lower trees of the 4 tree, and the upper tree that does not fit into it.
So you are saying not all you objects are in leaf nodes?
The issue here is that the tree is not balanced. Different parts of the tree have different depths.
That means that you can't do a full query without some form of traversal: you cannot get the leaf nodes an AABBs intersects without checking if those leaf nodes exist in the first place.
This can be remedied by using an essentially complete tree. But now you basically have a grid.

With the structure I propose, you have a known depth. Which means you can get the "address" of an entry in a sine step, no need to go up/down the tree.


And please, again, would you share the code you are currently using for your broad phase so we can get a better understanding of how it currently operates?
 
Last edited:

erayzesen

Member
So you are saying not all you objects are in leaf nodes?
The issue here is that the tree is not balanced. Different parts of the tree have different depths.
That means that you can't do a full query without some form of traversal: you cannot get the leaf nodes an AABBs intersects without checking if those leaf nodes exist in the first place.
It is quite balanced actually. This is the most advantageous part of Quadtree. You don't have to worry about tree balancing.

The thing is, does your object fit on the target tree? These objects are moving, and if they don't fit, your object goes to the upper tree. That's why you have incredible freedom in object sizes in quadtree.

And please, again, would you share the code you are currently using for your broad phase so we can get a better understanding of how it currently operates?
When you say the large phase, which part do you refer to, the tree? My broad phase collision test consists of the aabb collision test. A standard aabb collision function.

The only thing to worry about is empty cells, the only tree algorithm that can solve it is the aabb dynamic tree or the bvh method you mentioned (it is actually the same) . If you are really obsessed, it is the best solution. Box2d and many physics engines use it. But balancing the tree is not easy, you have to set certain parameters. I think I will not be able to overcome it in a short time. In BVH If you do not balance the tree, your query and cycle numbers will swell, your lists will swell, you will not be able to get efficiency.

So I can optimize and use my current tree and it might be enough for me. If I can have a lot of time in the future, I will switch to bvh.
 
Last edited:

GMWolf

aka fel666
The thing is, does your object fit on the target tree? These objects are moving, and if they don't fit, your object goes to the upper tree. That's why you have incredible freedom in object sizes in quadtree.
So your objects are not all on leaf nodes, so you have to look at inner nodes as well as leaf nodes. Same inherent issue with imbalance trees.
Also how is a quad tree balanced?
If your quad tree is balanced, you just have a grid!

You don't have to worry about quad tree balancing precisely because you cannot balance a quad tree. That is a big disadvantage of quad trees.

My broad phase collision test consists of the aabb collision test. A standard aabb collision function.
Broad phase would be how you use that tree. How do you query potential collisions before doing the aabb test?
Are you using a translator? I never said large phase. I said broad phase.
 

erayzesen

Member
So your objects are not all on leaf nodes, so you have to look at inner nodes as well as leaf nodes. Same inherent issue with imbalance trees.
Also how is a quad tree balanced?
If your quad tree is balanced, you just have a grid!
All objects are in leaf nodes. There is no bad or unstable situation in this, it has to be, I want this. Because the in-between object in the top tree is a potential candidate for collision.

Broad phase would be how you use that tree. How do you query potential collisions before doing the aabb test?
GML:
function pysics_collisions(){
    collision_count=0;
    for(var i=0;i<array_length(objects);i++){
        var obj_a=objects[i];
        var col_obj_list=[]
        root_tree.retrieve(obj_a.aabb,col_obj_list);
        for(var j=0;j<array_length(col_obj_list);j++){
            collision_count++;
            var obj_b=col_obj_list[j];
            if(obj_b==obj_a)continue;
            if(!obj_a.aabb.is_collided(obj_b.aabb))continue;
            
            if(obj_b.type==physics_object_types.circle){
                collision_w_circle(obj_a,obj_b);
            }else{
                collission_polygon(obj_a,obj_b);
            }
            
        }
        
                
    }
}
Are you using a translator?
Yes, I use it for practical purposes : )
 

GMWolf

aka fel666
A few small optimizations:
Move col_obj_list out of the outer loop so you don't keep creating a destroying an array.
Avoid methods, try to use normal functions instead.

But I guess the most important function I here is root_tree.retrieve. how does that work?
 

erayzesen

Member
A few small optimizations:
Move col_obj_list out of the outer loop so you don't keep creating a destroying an array.
Yep, you're right. I will change it.

But I guess the most important function I here is root_tree.retrieve. how does that work?
It's terrible. It's a standard retrieve function of quadtree tutorials. I don't work on this yet. But I will work on it.

How it works, just like that;
GML:
public List retrieve(List returnObjects, Rectangle pRect) {
   int index = getIndex(pRect);
   if (index != -1 && nodes[0] != null) {
     nodes[index].retrieve(returnObjects, pRect);
   }

   returnObjects.addAll(objects);

   return returnObjects;
}
more detail is on here.

I told you about this before, that's the only part I'm not satisfied with, but I believe I can fix it.


Avoid methods, try to use normal functions instead.
Yes, maybe. I'm also skeptical about this, can you explain this to me a little bit? I came from these traditional programming languages, oop, class, polymorphism, morhpism are things that I like.

These do not hurt in terms of optimization in other languages, it is standard. In these languages, only the bytes of objects in memory are reduced for optimization. For this, a minimum number of variables and variable types that take up little memory are used.

Now there is gms update that enables this, I love the oop structure. What is the negative impact of this on performance? If so, what is the purpose of this update? What is the point if we are not going to use it?
 
Last edited:

GMWolf

aka fel666
So yeah tour retrieve function IS recursive.

Try to implement it with a loop and a stack instead.

What was all that stuff about aabb.parent etc, ?



Yes, maybe. I'm also skeptical about this, can you explain this to me a little bit? I come from these traditional programming languages, oop, class, polymorphism, morhpism are things that I like.

These do not hurt in terms of optimization in other languages, it is standard. In these languages, only the bytes of objects in memory are reduced for optimization. For this, a minimum number of variables and variable types that take up little memory are used.

Now there is gms update that enables this, I love the oop structure. What is the negative impact of this on performance? If so, what is the purpose of this update? What is the point if we are not going to use it?
In C++, Java, C# etc polymorphic functions DO have a cost.
In java and c# they still end up being used, the hope being that the jit will take care of it. But even then people will implement code weaving techniques to monomorphism function calls in tight loops.
In c++ they are avoided all together. Only used when necessary.

The problem with polymorphic functions is that the CPU doesn't know what it's about to execute until the address of the function is resolved, and so it stalls the entire pipeline.

This is usually not an issue, butnif you have a tight loop, you really should avoid them as their costs add up.

The new GML updates are useful, there are lots of places where polymorphism is useful. But in a tight loop you want to avoid it.




Another thing to note is nodes don't need to store their size and position.
When you traverse the tree you can easily calculate the width and position of each node you visit.
 

erayzesen

Member
What was all that stuff about aabb.parent etc, ?
I need that part to organize the tree.

When aabb properties of an object change, I notify the tree branch it is in. If the new aabb does not fit into the tree branch, it sends it to the parent. So instead of controlling everything, I make a specific arrangement. I need a parent for this too. I need this if I'm going to arrange the tree.

Because I go directly to the relevant tree, I am operating there. This makes everything simple.

Another thing to note is nodes don't need to store their size and position.
When you traverse the tree you can easily calculate the width and position of each node you visit.
These frequent calculations are very expensive and folded, if you want, give this a try. Maybe the reason of this we are make tests with vm. But there is such a fact that it is impossible not to worry about the processor.

Anyway, the purpose of using these trees is nothing but to relieve the processor side. A collision test is based on the sap algorithm and is expensive. In fact, vector calculations are not very expensive, but if you do this frequently and continuously, if you have a lot of objects, it is expensive for the processor.

Here, we organize an organization, waive the ram, relax the processor.
 
Last edited:

GMWolf

aka fel666
These frequent calculations are very expensive and folded, if you want, give this a try. Maybe that's why we do our tests with vm. But there is such a fact that it is impossible not to worry about the processor.
Are you not using YYC? You should use the YYC. It has different performance characteristics.
For instance fetching data can be more expensive than calculating it. Especially if the calculation is a couple bit shifts and additions.


Here, we organize an organization, waive the ram, relax the processor.
I'm actually saying it would be easier for the processor to just calculate the node size as you traverse the tree rather than fetching it from ram. Ram is located far away from the CPU. It takes time for the data to reach the CPU. We are talking about hundreds of nanoseconds. A CPU does hundreds if not thousands of instructions in that time.
On VM maybe not, but with YYC, doing a couple operations is faster than fetching some data from ram.
 

erayzesen

Member
Are you not using YYC? You should use the YYC. It has different performance characteristics.
For instance fetching data can be more expensive than calculating it. Especially if the calculation is a couple bit shifts and additions.
I know yyc but the idea of writing a library relying on it is not good for me.

I'm actually saying it would be easier for the processor to just calculate the node size as you traverse the tree rather than fetching it from ram. Ram is located far away from the CPU. It takes time for the data to reach the CPU. We are talking about hundreds of nanoseconds. A CPU does hundreds if not thousands of instructions in that time.
On VM maybe not, but with YYC, doing a couple operations is faster than fetching some data from ram.
What you say is not unreasonable but I have had other experiences in gms. Performing important tests with vm may also be a factor in this.

I may look very stubborn, but you can be sure that I will try and consider your idea. I am always ready to try, I will cover all of these in the coming process.
Even, I will write my experiences on this subject here.
 

GMWolf

aka fel666
I know yyc but the idea of writing a library relying on it is not good for me.
Why? YYC is the only way to get good performance GML.


What you say is not unreasonable but I have had other experiences in gms. Performing important tests with vm may also be a factor in this.
Yes I'm assuming you were using the YYC. The VM is a lost cause. I don't have much experience profiling anything with the VM.


I may look very stubborn, but you can be sure that I will try and consider your idea. I am always ready to try, I will cover all of these in the coming process.
Even, I will write my experiences on this subject here.
:)
 
Top