[TEXT TUTORIAL] Hash Tables Explained - Structs vs Maps

xDGameStudios · Dec 4, 2020

GM Version: Studio 2
Target Platform: Windows / All
Download: n/a
Links: n/a

[FACTS]
Hash tables are "constant time" but they are constantly slower than arrays (by "very" little).
Both ds_maps and structs are hash tables, so they store data as [key: value] pairs.

[HASH TABLE]
An hash table is actually an array whenever you create an hashtable (struct/ds_map) we are creating an hidden
array (lets call it HASH_ARRAY for future reference). So when we store the [key: value] pair GMS2 is actually
placing it into an index of that big HASH_ARRAY.

[HASHS]
We are talking of hash tables but you may ask, "what is an hash?". Well, an hash is actually an index an integer number.
So internally GMS2 has a function that takes in the key of the [key: value] pair and converts/remaps it into an index.
The function follows 3 essencial rules:

The same input value must always produce the same output hash.
The number of different output hashes (indexes) is always finite (it's max value is the length of the hidden HASH_ARRAY).
Different keys can produce the same output hash.

[HASH COLLISIONS]
So now that we know what an hash is, you probably heard of the term "hash collisions", what are those?
The algorithm that computes the hash index has a limited amount of outputs (that is rule number 2).
But we have infinite keys we can use, so sometimes different keys compute into the same hash. (that is rule number 3).
This is called an hash collision... so how do we resolve them?

For our example lets say YYG came up with an algorithm that returns hashes between 0-1000 (length of the HASH_ARRAY).
The HASH_ARRAY doesn't contain a single value per index, it actually contains a kind of linked list,
where each item in the linked list has the actual key being accessed (and the value stored)
So the lookup steps are:

Compute the hash (index) value for the given key (input).
Look into that index of the HASH_ARRAY.
Look through every item in that index linked list until the item key equals the input key.
Read/write the value.

So as you might imagine more collisions result in bigger linked lists and so results in slower lookup times.
You can experience this with an example a ds_map/struct with a lot of entries is slower to access than a ds_map/struct with less entries.
One thing to know is that hash collisions don't impact as much in performance as the computation of the hash itself.

[MAPS]
If both structs and ds_map are hash tables then why is ds_map access (read or write) so less performant, you may ask?
Lets review what we learned so far, when you create a ds_map you are actually creating a hidden array (HASH_ARRAY) with for example 1000 length.
Take a look at the example below:

GML:

myMap[? "entry"] = 20;
var value = myMap[? "entry"];

When we take a deeper look at the example "entry" is not an hash it is a string so GMS2 needs to pick up that string and convert it into the correspondent hash.
For the purposes of explaining let's say the hash (index) of "entry" is 200, this means that GMS will then take this index and read or write the 200 position of that HASH_ARRAY.
So the performance bottleneck here is the actual conversion of the string/real/array key into the correspondent hash value.

[STRUCTS]
So structs are hash tables?! YES they are.
Take a look at the example below:

GML:

myStruct = { entry: 10 };
var value = myStruct.entry;

The word entry is not a string, it is not a number, it is an identifier and it is compiled as the hash itself.
So when you access the data using the expression myStruct.entry to read or write into it this process is almost as fast as using arrays.
Because we don't need to compute the hash.

[PSEUDO CODE EXAMPLE]
I will try to give a simplified example.
When you write the following code:

GML:

myStruct = {};
myStruct.entry = 10;

What GMS2.3 does (more a less) is something like:

GML:

$220 = {};
$220.$320 = 10;

Where $*** is the hash index of the respective identifiers (myStruct and entry) NOTE: this notation is made up just for explanation purposes.
So as the hashes are computed at compile time, GMS2 already knows the indexes to search in the HASH_ARRAY.. so it's a lot faster.

[DYNAMIC KEYS]
If you are using structs but don't know the key you are accessing ahead of time, you need to use the accessor myStruct[$ "entry"] .
This will work as well but now we have the same problem as ds_maps as GMS2 has to convert the string into an hash (index) and only then can look for it.
So it becomes slow again.

[CONCLUSION]
The examples below are fast/almost as fast as arrays:

GML:

a = {};
a.b = 12
a.nice = "Good";
var c = a.b;

The examples below are slow, as slow as ds_maps:

GML:

a = {};
a[$ "b"] = 12;
a[$ "nice"] = "Good"
var c = a[$ "b"];

The real big difference is that contrary to structs, ds_maps cannot be accessed using dot accessor so they only allow using dynamic keys meaning the key needs to be converted into an hash (slow) at runtime and only afterwards can be read or write from/to.

Here xD from xDGameStudios,
Good coding to you all.

[TEXT TUTORIAL] Hash Tables Explained - Structs vs Maps

xDGameStudios

GameMaker Staff