GMS 2 Arrays as map keys: Works in VM, not in YYC

Discussion in 'Programming' started by vdweller, Apr 15, 2019.

  1. vdweller

    vdweller Member

    Joined:
    Jun 24, 2016
    Posts:
    96
    Hi all!

    I happened to stumble on a rather interesting find. I am working on a project where execution speed is of paramount importance, so I decided to go back in some places in my code and replace certain fixed-size lists with arrays.

    During this process, I got some errors and, long story short, I found out that if you write:
    Code:
    var a=array_create(6);
    var m=ds_map_create();
    m[?a]=1;
    var s=ds_map_find_first(m);
    ds_map_delete(m,s);
    print(ds_map_size(m));
    
    This produces the correct result in VM (0) but in YYC the key isn't deleted.

    Does anyone have a technical explanation on why this doesn't work on YYC? Even if arrays are pointers, they have some numeric form and in theory could be used as map keys, right?

    PS I know there are a couple other related threads, but they either pertain to 1.4 or don't touch upon the issue from a technical standpoint, they're more like what other things to do to avoid this (which I'm already aware of).
     
  2. vdweller

    vdweller Member

    Joined:
    Jun 24, 2016
    Posts:
    96
    On further investigation, it looks like if I delete the declared array "a" from the map, this works. However, using ds_map_find_first() and use the returned key for ds_map_delete(), somehow messes things up.
     
  3. FrostyCat

    FrostyCat Member

    Joined:
    Jun 26, 2016
    Posts:
    3,858
    In my experience, the only thing that's guaranteed to work as a map key everywhere is a string. Everything else varies and dependents on export-specific typecasting fudge. You can sometimes use non-string values like a key when looking up values, but even when they work, they're always internally converted to a string first. In addition, if you iterate through keys with functions like ds_map_find_first(), you'll always get the keys out in string form.

    Turn the array into a string first, then use it as a map key. And expect only strings to come out when you try to iterate through a map's keys. If you depend on having non-string values as keys in a map without explicit conversion, your data model is wrong.
     
  4. vdweller

    vdweller Member

    Joined:
    Jun 24, 2016
    Posts:
    96
    Since I have already tested this, using is_string() after a ds_map_find_first() in a map where I put a non-string type as a key does not return true. Furthermore, turning the array into a string poses two problems. First, since speed is of the essence, string() is quite slow, I'm better off using lists anyway. Second, the array reference is lost and you can't process it, unless maybe you do some other conversion which again defeats the purpose of using arrays for speed anyway.
     
  5. Justice

    Justice Member

    Joined:
    May 8, 2017
    Posts:
    425
    The manual says that real numbers can be used as well; you've run into problems with reals, though? Because that's strange. Do you have any theories why this is?

    Also, what's wrong with a data model that uses an enum as keys? This takes advantage of the IDE's autocompletion and allows safe "refactoring" (i.e., find-and-replace in GMS2). It also allows you to loop through the things in the enum in a predictable and flexible way, e.g., it allows making a carousel of pictures that changes the picture by decrementing and incrementing, and the sprite references could be stored in a map using the same enum. I realize there're other ways to do this---like a grid, again using an enum---but why not maps?

    (I have no education in CS beyond what I've managed to hack out in game programming or learn through "Learn C#" books, so I'm genuinely interested in the answer to this.)
     
  6. FrostyCat

    FrostyCat Member

    Joined:
    Jun 26, 2016
    Posts:
    3,858
    It's because GML maps are implemented as hash tables on VM and YYC exports, and strings are a hash table's baseline form of key input. This has been proven years ago by a hash collision, and also by the characteristic lack of a recognizable ordering of keys.

    I once reported real-valued keys being returned back in string form in GMS 1.2 (before I knew any better), and since then they also added some "memory" for the key's type. But that was a temporary "reprieve" for something that didn't need any reprieve. Later came the new types in GMS 1.3 (arrays, int32/int64, undefined, etc.). The use cases for how they're hashed and returned back as keys remained inconsistent, and it just went downhill from there. That was when I simply wrote off non-string map keys as "undefined behaviour" territory.

    On HTML5 in particular, GML maps are implemented as standard JS objects for performance reasons. That only allows strings as keys, and it shows in the runner and bites naive developers used to "native privilege". Here's an example:
    Code:
    var map = ds_map_create();
    map[? 2] = "foo";
    show_message(typeof(ds_map_find_first(map)));
    ds_map_destroy(map);
    
    Both 1.4.9999 and runtime 2.2.2.326 give "string" on the HTML5 export. Yes, simple gets and sets with non-string keys will still appear to work, the runner would try to cast them to strings first. But the facade exposes itself as soon as you use ds_map_find_first().

    This is why I asserted that strings are the only common denominator for keys in a GML map, and assuming that other types enjoy the same rights everywhere is a form of platform dependence.

    As I already said, maps are implemented as hash tables. Even with the HTML5 export, hash tables are used internally by the browser to store JS object fields. They work by hashing the key and then offsetting at least once by the hashed result.

    In contrast, sequential non-linked structures like arrays work directly on offsetting once and only once. This obviously supports only numeric types (and even then only those counting up from 0), but has no hashing routine to run and is thus much faster at runtime.

    A standard enum in current GML is compiled down to a sequence of numbers counting up from 0. This is the perfect use case for offsetting. You'd be better off cutting the hashing middleman and go straight for a list, grid or array.

    The only exception is if the enum values have been specifically altered to not follow a sequential pattern and/or contain a negative value. Then implicitly converting to string and then hashing becomes a price to pay for saving storage space or just getting to run without crashing.
     
    Justice and Pfap like this.
  7. vdweller

    vdweller Member

    Joined:
    Jun 24, 2016
    Posts:
    96
    Interesting analysis! So what I take from it is that if you use strings, you're in the safe zone. For non-string types, all bets are off. There may be a possibility that real keys are also handled gracefully though.
     

Share This Page

  1. This site uses cookies to help personalise content, tailor your experience and to keep you logged in if you register.
    By continuing to use this site, you are consenting to our use of cookies.
    Dismiss Notice