GMS 2 Custom A* pathfinding + variations (HPA*, JPA etc)


Hi all,

This is mainly a discussion/presentation of ideas thread. There are lots of good pathfinding concepts and I am not pretending to know all about them. Hopefully this thread will grow to a meaningful exchange of knowledge. I will begin my approach from the basics and move up to more complicated concepts in future posts.

The past few days I have began prototyping a grid-based pathfinding system.The idea originated from Rimworld, which uses a fairly large map and tens of agents, however I have added some more goals to spice things up:
  • Multiple grids of variable size and relative locations to each other. Agents will be able to "jump" from grid to grid if they are close enough. Imagine something like jumping from an asteroid to a nearby one with your jetpack.
  • Portals/teleporters and/or muti-floor (like subterranean levels below ground)
  • Variable-cost terrain
  • Dynamically changing obstacles
  • Speed-efficient. Ideally, agent pathfinding search should be queued (ie only a certain agent(s) during a step can calculate a new path)
  • 8-directional movement
I began with implementing the basic A*. Since speed is more of an issue than memory, I opted for creating a node for each grid cell and precalculating its neighbors. Re-adjusting node neighbors when adding/removing an obstacle is easy and cheap to do in real-time. I tried some other approaches in order to minimize hand-written iteration code. For this purpose, I took the following measures:
  • Each node has a "list" flag indication which list it belongs to: Open, Closed or None.
  • Used a priority list along with a normal list for the Open nodes. In this sense, the Open list is probably redundant but it helps when checking which list a node belongs to, since lists have always different indices, plus the Open list contents can be used later for cleanup/resetting explored nodes. I haven't tested if it will be as fast to cleanup from a priority list but I have my doubts.
With these steps, hand-written iteration is nonexistent (save for cleaning up which has a minimal impact in total run time anyways).

The basic node creation code is this:
EDIT: Code updated Jul 14 2019.
Argument0: grid id
Argument1: grid i
Argument2: grid j

var array;


return array;

And the pathfinding code is this:
Argument0: path list id
Argument1: start node
Argument2: end node
Argument3: (optional) left border
Argument4: (optional) top border
Argument5: (optional) right border
Argument6: (optional) bottom border

var list_inode=argument[0];
var node_start=argument[1];
var node_end=argument[2];
var border_left=-infinity;
var border_right=infinity;
var border_top=-infinity;
var border_bottom=infinity;
if (argument_count>3) then {
var stack_cleanup=global.stack_cleanup;
var plist_open=global.pf_plist_open;
var endx=node_end[c_pfi];
var endy=node_end[c_pfj];
var list_child=global.pf_list_nodechild;
var list_childdist=global.pf_list_nodechilddist;
var tt=0,t1,t2;
var i,nodelist;
var node,node_current;
var size,tg,h;
var psize=1;

while (psize>0) {
    if (node_current==node_end) then {
        while (node_current<>-1) {


    var xx,yy;
    for (i=0;i<size;i+=1) {
        xx=node[c_pfi]; yy=node[c_pfj];
        if (xx>=border_left && xx<=border_right && yy>=border_top && yy<=border_bottom) then {
            if (nodelist==c_closedlist) then {
            } else {
                tg=node_current[c_pfg] + (node[c_pfcost]*list_childdist[|i]);
                if (nodelist==c_openlist) then {                
                    if (tg<node[c_pfg]) then {
                } else {
                    node[@c_pfg] = tg;
                    node[@c_pfh] = h;
                    node[@c_pfparent] = node_current;


Node resetting script (pf_find_cleanup):
var stack_cleanup=global.stack_cleanup;
var plist_open=global.pf_plist_open;
var node;

while ds_stack_size(stack_cleanup)>0 {


I have used the Euclidean heuristic. The square root may seem slow, but contrary to what certain websites state, using the "square root free" version can actually botch the path since g and h will then be expressed in different metrics. In addition, the slow part will always be the large amount of explored nodes, calculating a cheaper heuristic doesn't make much of a difference. Finally, I want the very same algorithm to be used in HPA* (discussed later), where the Euclidean heuristic works best due to the spreading of abstract nodes.


All tests are run in YYC. As you can see, the algorithm takes ~67 milliseconds on a 100x100 grid to reach a faraway point. The turquoise grid and blue dots you see in the screenshot are used in HPA*, ignore them for now. Also, The path list omits the last (goal) node for now, I will fix that later.

Obviously, 67 milliseconds for just a 100x100 map seems like a lot of time, it's definitely prohibitive for realtime calculation. Even if we split that up in multiple steps, it would still take some frames for an agent to start moving. I am trying to avoid such an approach if possible.

EDIT (Jul 14 2019): Same path is calculated in ~29 ms with the new code.

So I will leave you with this first post and add some questions for discussion:
  • Time-wise, is the algorithm I wrote efficient? 67 milliseconds looks like a lot, I suspect that e.g. in C++ it would be a lot less. Is my assumption true? Is YYC inherently slower, or have I made some undetectable mistake?
  • Do you have to propose other optimisation concerning the "Vanilla" A* algorithm? I am not talking (yet) about other hybrid methods like hierarchical A* or Theta or JPS, I am still talking about the "bare-bones" A* code.
  • If you have written your own A* pathfinding code, what were your own times, and how did you achieve them? Again, since this post covers only the A* approach, try to stick with A* for now.
Last edited:


Heavy metal viking dentist
GMC Elder
I'm working on my own implementation of A* as well, and your code isn't very different from mine. I'm far from an expert in this field, I've pretty much only used the Wikipedia page on A* to make my solution, but I think I've found a few ways to make the algorithm faster.
Our implementations are similar, but the differences I can spot are as follows:
  • You remove any info you've created during the pathfinding, after the algorithm is done. My implementation only clears the necessary information before starting the algorithm, potentially saving a lot of processing. You only really have to reset the parent, gscore and fscore of the first node you add to the open priority data structure. Previous gscore only matters when a node has already been added to the open set, and the gscore is overwritten when writing the node to the open set, so the previous value before it was in the open set doesn't really matter.
  • You use a ds_list to figure out which set the node belongs to. I use a ds_map for the closed set. If the node exists in the closed ds_map (ie. if closedSet[? nodeID] is not undefined), the node has already been checked and the loop can continue. Checking whether or not a value exists in a ds_map is very fast. Doing it this way will also make it easier to use the method in the previous point, since you won't have to manually reset the nodes afterwards. You can simply clear the ds_map instead.
  • I don't entirely understand what pf_getneighbors does, but that seems like it could slow down the algorithm considerably. If memory management isn't critical, I'd rather recommend keeping a list of available neighbours in the node structure itself.
Since our maps differ vastly, and we use different computers, this cannot be used for direct comparison, but it took the YYC 21 milliseconds to find this path:

When using the square of the distance, rather than the actual distance, for the heuristic, the pathfinding is sped up by a factor of four, and a similar but slightly more erratic path is found in 5 milliseconds:

You could post a benchmarking executable of your algorithm so that I could test how the speed of your implementation.

I'm planning to release my A* implementation soon, but for now, here's my pathfinding script in its entirety:
/// @description navmesh_find_path(navMesh, x0, y0, z0, x1, y1, z1, exact)
/// @param navMesh
/// @param x0
/// @param y0
/// @param z0
/// @param x1
/// @param y1
/// @param z1
/// @param exact
    An implementation of the A* algorithm
    Returns an array containing the 3D positions of all the pathpoints, including the start and end positions.
    The start and end positions will be clamped to their respective nearest triangles.
    This assumes that the player can move in a straight line from the starting point to the first node,
    as well as in a straight line from the final node to the destination.
    Returns: Array if successful, -1 if not
    Script created by TheSnidr
var startTime = current_time;
var navMesh = argument0;
var x0 = argument1;
var y0 = argument2;
var z0 = argument3;
var x1 = argument4;
var y1 = argument5;
var z1 = argument6;
var exact = argument7;

var nodeList = navMesh[eNavMesh.NodeList];

//Clear necessary data structures

//Find starting and destination quads.
var startQuad    = navmesh_find_nearest_quad(navMesh, x0, y0, z0);
var endQuad        = navmesh_find_nearest_quad(navMesh, x1, y1, z1);
if (!is_array(startQuad) || !is_array(endQuad))
    show_debug_message("navmesh_find_path: Could not find start or end point. Returning -1.");
    return -1;

//Clamp the start and end coordinates to their respective triangles
x0 = startQuad[4];
y0 = startQuad[5];
z0 = startQuad[6];
x1 = endQuad[4];
y1 = endQuad[5];
z1 = endQuad[6];

//Check if the start and end triangles contain the same corner points. If they do, we can move in a straight line from start to finish.
var i, j;
for (i = 0; i < 4; i ++){
    for (j = 0; j < 4; j ++){
        if (startQuad[i] == endQuad[j]){break;} //Check if this node exists in both the start and end point
    if (j == 4){break;} //If the node does not exist in both sets, break the loop early
if (i == 4){
    //show_debug_message("navmesh_find_path: Start and end points are in the same convex shape (triangle or quad), and we can move in a straight line.");
    return [x0, y0, z0, x1, y1, z1];}

//Add the gscore and fscore of the nodes in the starting quad
var node, fScore;
var heuristic = 0;
for (i = 0; i < 4; i++)
    node = startQuad[i];
    heuristic = sqr(x1 - node[0]) + sqr(y1 - node[1]) + sqr(z1 - node[2]);
    node[@ eNavNode.CameFrom] = undefined;
    node[@ eNavNode.Gscore] = point_distance_3d(node[0], node[1], node[2], x0, y0, z0);
    fScore = (exact ? sqrt(heuristic) : heuristic);
    ds_priority_add(global.NavMeshOpenSet, node[eNavNode.ID], fScore);

//Start looping through all reachable points
var current, nbArray, dtArray, nbNum, gscore, tent_gscore, old_gscore, not_in_open_set, ID, nbID, dx, dy, dz;
while (ds_priority_size(global.NavMeshOpenSet) > 0)
    //Find the node in the open set with the lowest f-score value
    ID = ds_priority_delete_min(global.NavMeshOpenSet);
    current = nodeList[| ID];
    if (current == endQuad[0] || current == endQuad[1] || current == endQuad[2] || current == endQuad[3]){break;}
    global.NavMeshClosedSet[? ID] = true;
    //Loop through the neighbours of the node
    gscore  = current[eNavNode.Gscore];
    nbArray = current[eNavNode.NbArray];
    dtArray = current[eNavNode.DtArray];
    nbNum = array_length_1d(nbArray);
    for (i = 0; i < nbNum; i ++)
        node = nbArray[i];
        nbID = node[eNavNode.ID];
        //Continue the loop if the node is in the closed set
        if !is_undefined(global.NavMeshClosedSet[? nbID]){continue;}
        //Check whether or not this is a better path
        tent_gscore = gscore + dtArray[i];
        not_in_open_set = is_undefined(ds_priority_find_priority(global.NavMeshOpenSet, nbID));
        if !not_in_open_set and tent_gscore >= node[eNavNode.Gscore]{continue;} //This is not a better path
        //This path is the best until now. Record it!
        heuristic = sqr(x1 - node[0]) + sqr(y1 - node[1]) + sqr(z1 - node[2]);
        node[@ eNavNode.CameFrom] = current;
        node[@ eNavNode.Gscore] = tent_gscore;
        fScore = tent_gscore + (exact ? sqrt(heuristic) : heuristic);
        if not_in_open_set
            ds_priority_add(global.NavMeshOpenSet, nbID, fScore);
            ds_priority_change_priority(global.NavMeshOpenSet, nbID, fScore);

//If the last node is not in the end triangle, we couldn't find any valid paths
if (current != endQuad[0] && current != endQuad[1] && current != endQuad[2] && current != endQuad[3])
    show_debug_message("navmesh_find_path: Could not find any valid paths, returning -1.");
    return -1;

//Read all the path points from the ds_map backwards
var backwardspath = ds_list_create();
while !is_undefined(current)
    ds_list_add(backwardspath, current);
    current = current[eNavNode.CameFrom];
var num = ds_list_size(backwardspath);

//Initialize the path point array
var size = num * 3 + 6;
var path = array_create(size);
path[0] = x0;
path[1] = y0;
path[2] = z0;
for (var i = 0; i < num; i ++)
    node = backwardspath[| num-i-1];;
    path[i * 3 + 3] = node[0];
    path[i * 3 + 4] = node[1];
    path[i * 3 + 5] = node[2];
path[size - 3] = x1;
path[size - 2] = y1;
path[size - 1] = z1;

//Clean up

show_debug_message("navmesh_find_path: Found shortest path in " + string(current_time - startTime) + " milliseconds.");
return path;
It's made to work in 3D as well:
Last edited:



Interesting post! I have a potato PC with an Intel Quad core Q9550. I have uploaded my .exe here, let me know how much time (3rd line on the right of the screen, in microseconds) it takes you to go from the bottom-left of the map to the upper right.
Shift-Lclick to set start point, Shift-Rclick to set end point, press Space to run.

I improved my originally posted algorithm by finally removing the open list altogether, this made the algorithm run 10-15% faster. Without turning a blind eye to your observation about cleanup, so far upon benchmarking it's like 1/25 of the entire run time, so I was looking for other ways to shave off more time from other parts of the code. Also in my version neighbors are pre-calculated. pf_getneighbors() just returns the neighbor list, the reason it exists is in order to get any extra neighbors in the future (portals, nearby separate grids etc, as per the goals I set in my first post). Inlining its content has almost zero impact on speed.

EDIT: Simply using the "standard" node neighbors list, there is a 40% increase in speed. Perhaps "special" node neighbors like the ones I suggested above must be pre-inserted too.

My next presentation was going to be about HPA*. It's a simple yet effective way to speed up pathfinding on large maps. The only issue I have so far is that it takes a lot of time to reconstruct edge nodes when adding/removing an obstacle on an edge area. Then I got caught with Jump Point Search. I had mixed results with it. The out-of-the-box version was 2-10x slower than vanilla A*. With some preprocessing (the kind that is easy to recalculate online if needed) and taking into account Harabor et al. suggestion of intermediate node pruning, I managed to get it to run 2-10x faster than vanilla A*. Introducing weighted maps proved to be a bit problematic for JPS, in the sense that it kills any speed gains by having pre-identified points of interest on the map.

Ultimately, I suspect that I may have introduced several major bottlenecks in my code. First of all, nodes are actually lists. Each element corresponds to a different property (cost,x,y,neighbors list etc). I don't think that Game Maker likes frequent list lookups. I tried to only read from a list only when absolutely necessary, even used bit compression for Jump Point Search, but it looks like if at any given grid square one does more than a single lookup, speed goes down the drain. I am really not sure how to approach this. It is easy to make a quick prototype where things are laid out flat (you can do A* even without a "node" structure), but in a real complex game you got to have some sort of data hierarchy or else things get too complicated too soon.

EDIT2: I use boundary checks on my algorithm. They are useful in HPA* where each A* has to be run within certain bounds. Obviously, this adds some extra slowdown. Of course there could be a "boundless" hierarchical A* and a bounded one when necessary.
Last edited:


Heavy metal viking dentist
GMC Elder
HPA* sounds interesting! Haven't heard of that method before, seems like it has the potential to speed up pathfinding drastically.

Running your project, the time for generating a path from lower left to upper right seems to vary wildly between 35 and 80 milliseconds. Pressing space repeatedly, the lowest I got was 28 milliseconds, which is starting to become nice.
I also did a test with my own pathfinding system using your map. It calculated this path in 34 milliseconds:

And when using the square of the distance, it calculated this path in 4 milliseconds:

I'm honestly a bit surprized that using the square of the distance is that much faster, but I guess it makes sense, since the overshooting of the f-score makes it so that you don't have to check that many alternative paths. Of course though, it most likely won't return the shortest path, just a valid path.


Thanks for checking it out! By immediately getting a node's neighbors list instead of repopulating a list (in case any "special" neighbors are found), and by treating nodes as arrays, my algorithm does that path consistently at 30 milliseconds on my PC, even with boundary checks. So I guess we both got it as good as it gets, heh.

Don't be surprised about the rootless heuristic: Any cost f = w + εh, where ε>1, will have a similar effect. You can try for example an euclidean direction heuristic multiplied by 100. So it's not avoiding the square root that speeds up the process but rather using a more "relaxed" heuristic. On modern computers square root isn't much of a big deal anyways. However on hierarchical approaches like HPA* I found that using a relaxed heuristic is largely useless: For short distances no significant difference has the time and space to kick in.

The problem is that I keep reading on forums like stackexchange that, for example, in C++ an A* algorithm only takes 25 milliseconds in the worst case on a 300x300 map... Granted, maybe the guy has a really fast cpu but even an online javascript example does an 150x150 map in something like 5 milliseconds.The test map I sent you is 96x96... I may have been looking at this the wrong way, but I have doubts if YYC is indeed serious business. There are obvious speed gains compared to VM but it still falls behind other languages by a lot.


I downloaded this C# project from by CastorTiu and ran it. It does this path
in 0,4 milliseconds.

Exactly the same map, in GameMaker, my code, with YYC:
Time: ~2,3 milliseconds

Almost 6 times slower. This is bad...

What did they guy do? Space magic? He kinda explains certain optimizations in his article but I already use some of them and others I can't understand or aren't fully explained (like, what's that "calculation grid" he's talking about?) Any opinions on this?


Sadly, I managed to verify my suspicions.

I wrote the same pathfinding algorithm in C++. Same map (256x256), same start and end points. GameMaker code super-optimized, all speed-critical data structures are arrays (which, in a real game scenario poses problems, as other data structures are even slower). All code is available on request.

On GameMaker Studio (YYC): 346 milliseconds

On C++: 12 milliseconds

As you can see, a C++ implementation wipes the floor with YYC. And while I could understand a 3-4x speed downscale from pure C++ to YYC, we are talking here about an 28x speed downscale.