Beginners Proc Gen in GMS #6: A* Is Born (Pathfinding’s Greatest Hits)

GM Version: 2.3.1
Target Platform: ALL
Download: A* Showcase/Test Project
Links: Full Tutorial Here!

Other Entries:
Summary:
In this entry to the Procedural Generation in GMS series, we go over a number of basic pathfinding algorithms, culminating in the creation of an A* pathfinding system that takes tile costs into account (similar to the Civilisation games, where moving through jungle is more costly than moving over plains). These algorithms are incredibly useful for all sorts of reasons beyond procedural generation, so make sure to have a read!

Tutorial:

Procedural Generation in GMS #6: A* Is Born (Pathfinding’s Greatest Hits)

In this entry we explore the fascinating world of pathfinding, culminating in a fully-featured A* pathfinding algorithm that takes tile costs into account!



---

Section 1: A STAR-T

Let’s begin with a little discussion about some other pathfinding algorithms that are not A*. In our previous entry, we implemented a flood-fill algorithm. This is better known amongst the hoity-toity as a “breadth-first” fill algorithm. Something that might be surprising to you if the last entry was your first contact with flood filling is that it can be considered a pathfinding algorithm with only one very small tweak: keeping track of the direction from the cell “currently being filled” to the cell that is “doing the filling” (or, said another way, keeping track of the “came from” direction):


The black arrows in the first image are pointing to the cells that are going to be flood filled next, the white arrows in the second image are the direction to where the flood fill came from. We want to keep track of the white arrows.

All we need to do is keep track of the direction of the white arrow for each cell and we turn flood fill from a “find all points” algorithm into a “find a path from anywhere to a point” algorithm.

Let’s say we have point A and point B on a map and we want to find out how to get from A to B. We can start flood filling outwards from A, keeping track of the directions like above, and once we hit B with the flood fill we stop. Then starting from point B, all we do is add the current cell to an array (so the current cell would be point B to begin with), look at the direction that we kept track of for that cell and then move to the cell that direction is pointing towards. We then add that second cell to the array and check it’s direction and so on. The beauty of this is that as long as we follow the directions, we will end up back at point A, with an array full of the cells that lead from point B to point A in the shortest path. Reverse that array and you have your path from point A to point B.

This is known as “breadth-first-search” (BFS) and, even without keeping track of the “came from” directions, it is a powerful and very useful algorithm. There’s one big problem though. BFS searches in all directions equally, so it will search in the opposite direction of the target just as much as it searches in the direction of the target. This makes it pretty sub-optimal for pathfinding from one point to another (don’t be fooled though, it has many, many uses where it is the perfect algorithm, this just isn’t one of them).

In any case, we didn’t come here to learn BFS, we came here to learn about A*! So let’s learn about Dijkstra’s (gottem!). Dijkstra’s (or Uniform Cost Search) is useful because it lets us add “costs” to cells. What are costs? The simplest example is movement in the Civilisation games. Trying to move over jungle terrain costs more movement points than trying to move along a road, for example. We have no way to implement this in vanilla BFS, so we turn to Dijkstra’s.

The idea behind Dijkstra’s is similar to BFS, except we keep track of the total “cost” of movement as well as the direction. In the most naive implementation, we consider the cost to simply be 1 per cell:



Here, we can see that the path with the least total cost of movement before hitting the goal is the middle path, with a total cost of 3. To keep track of the cost, we simply use a priority queue. Add each cell around the current cell to a priority queue, with the priority value being the total cost of moving to that cell. Then when we need to move to a new cell, we simply pull the lowest priority entry out of the priority queue, which corresponds to the cell with the lowest cost that we have checked so far.

So we have an algorithm that searches and finds the least costly path to the target. There’s still a slight problem though. If all the cells around are the same total cost of movement, Dijkstra’s will check them all, which means that it, in essence, turns back into BFS. It’s only when Dijkstra’s encounters cells with differing costs that it’s powers are unlocked, however, it doesn’t help us actually get to the target quicker.

Ok, so BFS and Dijkstra’s are both good, but not exactly what we are looking for. What if we just directed the algorithm to move towards the target. After all, we know where the target is right? So just make the algorithm move in that direction…

This is known as Greedy Best-First-Search and anyone who has done any motion planning in games will have an understanding of why/how this might fail. Let’s compare two situations:



In the above situation, Greedy Best-First works fine. We just make it move towards the goal. It goes cell by cell and eventually reaches the goal, without much searching at all (well, I didn’t show it, but it searches the 4 squares around each cell it checks, but that’s all, much more efficient than Dijkstra’s or BFS)! But what about this situation?



While Greedy Best-First will eventually get to the goal, we can see that it follows a circuitous route that definitely isn’t the most optimal. If only we could follow the path that has the least cost associated with it, while also trying to head towards the goal consistently…

Enter our heroic champion, A*! Finally, we’ve worked our way to it. A* takes the best of both worlds from Dijkstra and Greedy Best-First Search. It does this by using a priority queue, like Dijkstra, but instead of simply using the total cost for the priority, it uses the total cost plus the distance from the cell being checked to the goal. If it is moving away from the goal, the distance will increase, meaning that even if the cell cost is low, the increased distance cost will counteract that and the cell further away will be checked later than cells nearby the target (remember, we pick the lowest priority to search first and with the priority being cost+distance, then if either the distance or the cost increases, that cell will get a higher priority and will be checked later than cells with a lower cost+distance).

That’s a lot of theory to absorb. I’d definitely recommend having a look at some of the interactive diagrams here: https://www.redblobgames.com/pathfinding/a-star/introduction.html, if you really want to experiment and understand how each of these different pathfinding algorithms work. Anyway, this is about A* in GML so let’s get started looking at how to actually code this thing.

---

Section 2: ILLUMINATING THE PATH

Firstly, we create our function and then do a little bit of setup:

Code:
///@func                AStar(start,goal);
///@desc                The A* pathfinding algorithm
///@param {array} start    An array containing the x and y positions of the starting point
///@param {array} goal    An array containing the x and y positions of the goal
///@param {array} map    A 2D array containing our map data
function AStar(start,goal,map) {

    #macro X 0
    #macro Y 1

    static map_width = array_length(map);
    static map_height = array_length(map[X]);

    static frontier_queue = ds_priority_create();
    static came_from = array_create(map_width*map_height,-100);
    static cost_so_far = array_create(map_width*map_height,-100);
    static location = [[-1,0],[0,1],[0,-1],[1,0]];
Remember (from our previous entries) that the static keyword makes sure that the variables (and methods) are only created once, rather than being created in every instance that runs the function. So let’s have a look at what we’ve done.

Firstly, we’ve got two macros. These aren’t a part of the A* algorithm, they’re just something that I find useful to do when dealing with arrays holding positions (which we will be doing a lot of here). We set X to 0 and Y to 1 and that way, if we store a position in an array like this: my_pos = [x,y] then we can access x like this: my_pos[X] and y like this my_pos[Y]. It’s a bit easier, at least for me, to read rather than my_pos[0] and my_pos[1].

We figure out the width and height of the provided map, using the same technique as we learnt about in the last entry.

Then we create a ds_priority_queue called frontier. The way that frontier works is each time we are working with a cell, we check it’s surrounding neighbours, calculate the cost of moving to each of them and the distance from each of them to the goal, add the cost+distance together and add the neighbour to the frontier priority queue using cost+distance as the priority. This lets us pick the optimal cell to examine next from the priority queue, with the lowest priority entry being that optimal cell. Pretty much what I explained in the previous section.

After that, we create two arrays, came_from and cost_so_far. We initialise them to a size of map_width*map_height, which means we have one entry in the array for each cell of the map (in other words, the area of the map is map_width*map_height and we want to be able to encompass the entire area in a single array). These arrays are how we keep track of the direction to the previous cell as well as how much the cost is to move to that cell, as explained in the previous section.

Finally, we have the location. This is a fancy way of figuring out the coordinates to the neighbour of each cell without having to use any maths. The array is all tucked up so it looks kind of complicated, but if you were to visualise it, it would look like this:

-10
01
0-1
10

The left column is the change in the x we need to get to the neighbour cell and the right column is the change in y we need to get to the neighbour cell.

Let’s assume we are at cell x: 10, y: 5. We want to get to the neighbour on the left. We can guess that that neighbour has the coordinates x: 9, y: 5. We pick the first row because we can see it has a -1 in the x column and that is what we need to change x: 10 to x: 9. Apply that column to the two variables:

x: 10-1 = 9
y: 5-0 = 5


If we wanted to check the neighbour above the cell, we would pick row 3 (as that has a -1 in the y column) and it would look like this:

x: 10-0 = 10
y: 5-1 = 4


In other words, location is holding a matrix, and we use that matrix to transform the coordinates.

Ok, so that’s all good, let’s move on with the code a little more.

Code:
    ///@func                    Pos(cell,map_height);
    ///@desc                    Returns the position of the cell as a single integer
    ///@param {array} cell        An array containing the coordinates of the cell we are looking at
    ///@param {int} _map_height    The height of the map (unfortunately we have to pass this in as it is a static variable)
    static Pos = function(cell,_map_height) {
        return (cell[X]*_map_height+cell[Y]);
    }

    ///@func                    OutOfBounds(cell,map_width,map_height);
    ///@desc                    Checks whether cell coordinates are out of bounds
    ///@param {array} cell        An array containing the coordinates of the cell we are looking at
    ///@param {int}    map_width    The width of the map
    ///@param {int} map_height    The height of the map
    static OutOfBounds = function(cell,_map_width,_map_height) {
        if (cell[X] < 0 || cell[Y] < 0 || cell[X] >= _map_width || cell[Y] >= _map_height || map[cell[X]][cell[Y]] == SOLID) {
            return true;
        }
        return false;
    }
We’ve got two methods here. One called Pos(), which takes a 1D array called cell which holds the x and y position for the cell we are looking at like this my_cell = [x,y], and we have to also give it the map_height because when we created map_height we made it a static variable. Making map_height a static variable helped save on memory and loose instance variables floating around, but static variables are outside of the scope of method functions, which, a little bit unfortunately, means that we have to manually hand it into the method. We have to do this for most of the static variables at some point or another in the following functions. Anyway, what Pos() is doing is a fancy little trick to let us store both of a cell's coordinates (x and y) inside of a single unique integer.

Huh? What in the world does that mean? Let’s use some real numbers to get a handle on it. Imagine we have a 3×3 map (so it has a map_width of 3 and a map_height of 3) like this:

x: 0, y: 0x: 1, y: 0x: 2, y: 0
x: 0, y: 1x: 1, y: 1x: 2, y: 1
x: 0, y: 2x: 1, y: 2x: 2, y: 2

We can see the coordinates for each cell there. Lets take the middle cell x: 1, y: 1 and apply our Pos() transformation to it: x*map_height+y = 1*3+1 = 4. Wow, we turned two numbers into one. Let’s do that for all the cells:

0*3+0 = 01*3+0 = 32*3+0 = 6
0*3+1 = 11*3+1 = 42*3+1 = 7
0*3+2 = 21*3+2 = 52*3+2 = 8

We can see that we’ve neatly turned all our coordinates into single integers and there are no repeats, meaning each integer is unique. This is perfect for being able to store each cell in a unique position in a 1D array. If we wanted to save the top-left cell into an array, we would simply go my_array[0] = some_data_here because that cell's coordinates collapse into 0. The top-right cell would be my_array[6] = some_other_data because that cell's coordinates collapse into 6.

After that we’ve got the OutOfBounds() function, taking the cell array again, the map_width and the map_height. The way this works has been explained in previous entries. We’re just making sure the coordinates aren’t outside the map size, however, we have one new check, which is to check if we are looking at a SOLID tile on the map. If the tile is SOLID, we want to completely skip it, so we act as though it is out of bounds.

Some more code:

Code:
    ///@func                    Cost(cell,cost_scale);
    ///@desc                    Returns the "movement cost" of the cell we are checking, with a scaler option for optimisation purposes
    ///@param {array} cell        An array containing the coordinates of the cell we are looking at
    ///@param {real} cost_scale    The scaler for the cost, this lets us optimise how much the algorithm will try to avoid costly areas (unfortunately we have to pass this in as it is a static variable)
    static Cost = function(cell,_cost_scale) {
        var tile = map[cell[X]][cell[Y]];
        var cost = 0;
        switch (tile) {
            case TREE:
                cost = 10;
            break;
            case ROAD:
                cost = 1;
            break;
            case EMPTY:
                cost = 3;
            break;
            case WATER:
                cost = 5;
            break;
        }
        return 1+_cost_scale*(cost-1);
    }

    ///@func                            Manhattan(cell,cell2,heuristic_scale);
    ///@desc                            This returns the manhattan distance between cell and cell2, with a scaler option for optimisation purposes
    ///@param {array} cell                An array containing the coordinates for the first cell
    ///@param {array} cell2                An array containing the coordinates for the second cell
    ///@param {real} heuristic_scale    This lets us influence the A* to act more like Greedy Best-First or Dijkstras, useful for optimisation purposes
    static Manhattan = function(cell,cell2,_heuristic_scale) {
        return _heuristic_scale*(abs(cell[X]-cell2[X])+abs(cell[Y]-cell2[Y]));
    }
Ok, another two methods, Cost() and Manhattan(). Cost() takes the cell array and the cost_scale (cost_scale is an optimisation technique that I’ll explain in a later section). If we remember the way we worked with our map in the Cellular Automata entry, we created two macros, EMPTY and SOLID, and we stored those macros in our map to tell us what each tile was. While it’s not shown in this tutorial, when we created our map, we also created some macros, EMPTY, TREE, ROAD, WATER and SOLID and stored them appropriately in our map. In the Cost() function, we are reading what macro is inside of a particular cell on the map and returning a "cost" based on that.

We grab the correct tile by reading the x and y values from our cell array using our macros we created at the start, X and Y like this: map[ cell[X] ][ cell[Y] ]. It might seem a little confusing because we have arrays inside of arrays, but cell[X] and cell[Y] both just hold a number, so the computer ends up parsing it as map[5][10] or whatever values cell[X] and cell[Y] holds.

Beginners Note: You might notice the switch() statement in the code block above. A switch() statement is basically another way of formatting a series of if...else if...else conditionals. To begin with, we put the variable we want to check inside of the round brackets in the switch: switch(my_var), then we use the case: statements to check if the variable equals a value. So if we wanted something to happen when my_var was equal to 1, we would do this:
Code:
switch (my_var) {
    case 1:
        // Do something
    break;
}
That is equivalent to this:
Code:
if (my_var == 1) {
    // Do something
}
switch() is useful when we have a number of exact values we want to check for (often, it will be either enums or macros). For each new value you want to check for, add a new case. We can also add a default case which will run in the event that the variable doesn't equal any of the values we are checking for. Let's do another and check for 1, 2, 3 and add a default case:
Code:
switch (my_var) {
    case 1:
        // This triggers if my_var equals 1
    break;
    case 2:
        // This triggers if my_var equals 2
    break;
    case 3:
        // This triggers if my_var equals 3
    break;
    default:
        // This triggers if my_var equals anything but 1, 2 or 3
    break;
}
Always add your default case last. Switches are more organised to view and add entries to than if/else chains.

We run the macro value of the tile variable through the switch() statement and depending on what tile is holding, it will add a certain amount to the cost variable. This number is the “movement cost” that we associate with that cell, like in Civilisation. Then we do a little optimisation maths with cost (which, as I said, I will explain in a later section) and return what the value is. Generally, you can consider the value being returned to be whatever cost got from the switch() statement and you can ignore the maths bit at the end as it only comes into play if we actively decide we need to optimise the pathfinding algorithm.

Then we have the Manhattan() method. The Manhattan distance is a particular way of measuring distance. If we imagine we are in a city (Manhattan, for instance) and we wanted to get from a particular building to another building somewhere else in the city, we couldn’t just walk across the buildings. We would have to walk along the blocks following the street. You could consider each block in the city as a cell in a grid and if you followed the path in such a way where you made exactly one turn (so, for instance if the building is up and to the right, you would walk up until you are level with the building and then make a right turn and walk right until you got to the building), then you would have walked the Manhattan distance:



In the above example, we would need to walk 3 cells up and 4 cells right to get from cell to cell2. This means the Manhattan distance would be 3+4 = 7.

In the actual code: _heuristic_scale*(abs(cell[X]-cell2[X])+abs(cell[Y]-cell2[Y])) we can ignore the _heuristic_scale for now (it’s another optimisation technique that we will discuss later) and look at the rest:

Code:
abs(cell[X]-cell2[X])+abs(cell[Y]-cell2[Y])
We’re using abs() because we don’t care if the value is negative or positive (negative values just mean we are moving left or up), we just care what the total would be as a positive number (we’ve covered what abs() does in a previous entry), so we just figure out what the difference is between cell[X] and cell2[X] and then cell[Y] and cell2[Y] and add them together to get the Manhattan distance and then we return that value. These two functions are the secret sauce of A*. We were discussing how cost+distance tells us what priority a cell has, and this is how we get the cost and the distance.

Continue on with the rest of the tutorial!
 
Last edited:
Top