
GMS 2 Optimizing code for YYC

Discussion in 'Advanced Programming Discussion' started by kraifpatrik, May 9, 2019.

  1. kraifpatrik

    kraifpatrik Member

    Joined:
    Jun 23, 2016
    Posts:
    132
    Hey everyone,

    Yesterday I started fiddling around with the C++ files generated by YYC (the path to the files is printed in the console when you start running the project with YYC; for me it was C:\Users\kraif\AppData\Roaming\GameMakerStudio2\Cache\GMS2CACHE), and I found that there are situations where it produces some silly code. Since the files are available to read, we can change our GML code accordingly to get huge performance gains.

    Here is the first thing that I found.

    Consider the following code.
    before.png
    Really simple, right? Just passing a bunch of object and built-in variables to draw_sprite in a for loop, nothing you wouldn't normally do, and you (or at least I) would expect it to be pretty fast. But this is what the C++ code generated by YYC looks like.
    before_cpp.png
    It actually performs a lookup for the variables inside of the loop, every iteration! (Variables are stored in an array of YYRValue structures, which contain the variable's value, its type and overloaded operators for operations like add, subtract etc. between different C++ types; variable names are just translated into a unique index within that array.) And this is how it can be fixed.
    after.png
    after_cpp.png
    We can just store the variables in local vars before the loop starts, and YYC will then do the same thing. The produced code is longer but much smarter, and the performance gains are crazy: more than 1000 extra fps for me! And that's just drawing 1000 sprites in a for loop.
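
As a rough illustration of why this helps, here is a toy C++ model. It is not the real YYC runtime: `Instance`, `get`, the variable ids and `draw_stub` are all made up for this sketch, and the only thing it models is that each variable access costs one lookup.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_map>

// Hypothetical stand-in for an instance's variable storage. The real runtime
// uses YYRValue slots; this only models "every access is a lookup".
struct Instance {
    std::unordered_map<int, double> vars;  // variable index -> value
    mutable std::size_t lookups = 0;       // how many lookups were performed

    double get(int id) const {
        ++lookups;
        return vars.at(id);
    }
};

// Hypothetical variable indices, named only for this example.
enum VarId { kSpriteIndex = 0, kImageIndex = 1 };

// Stand-in for draw_sprite: it just accumulates its arguments.
inline void draw_stub(double sprite, double image, double& sink) {
    sink += sprite + image;
}

// Pattern 1: both variables are looked up again on every iteration.
double draw_with_lookups(const Instance& self, int count) {
    double sink = 0.0;
    for (int i = 0; i < count; ++i) {
        draw_stub(self.get(kSpriteIndex), self.get(kImageIndex), sink);
    }
    return sink;
}

// Pattern 2: look up once before the loop, then reuse plain locals.
double draw_with_locals(const Instance& self, int count) {
    const double sprite = self.get(kSpriteIndex);
    const double image = self.get(kImageIndex);
    double sink = 0.0;
    for (int i = 0; i < count; ++i) {
        draw_stub(sprite, image, sink);
    }
    return sink;
}
```

In this model, 1000 iterations cost 2000 lookups with the first pattern and 2 with the second, while both produce the same result.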

    I hope this helps. I will continue playing around with it and report back when I find more silly C++ code.

    EDIT: Note: The C++ code can also be modified by hand into an even better version, which in this case gave me another 600 fps. But when you clean the cache or modify the GML code after changing the C++, it gets overwritten.
     
    Last edited: Jul 12, 2019
    Mool, DBenji, Pere and 21 others like this.
  2. lolslayer

    lolslayer Member

    Joined:
    Jun 23, 2016
    Posts:
    693
    Thanks, I'll rewrite my project now :p
     
    Kyon, Cpaz and kraifpatrik like this.
  3. Mert

    Mert Member

    Joined:
    Jul 20, 2016
    Posts:
    416
    You'd also want to look at gml_release_mode, as it also removes some of the error-checking mechanisms (probably those stacktrace lines).
     
    Cpaz and kraifpatrik like this.
  4. kraifpatrik

    kraifpatrik Member

    Joined:
    Jun 23, 2016
    Posts:
    132
    @Mert You're right, thank you for pointing that out! I've checked it out and the stacktrace lines are kept even with release mode enabled.

    Here is another thing that I found:

    I have an array as an object variable. The array represents a velocity vector and I want to add its values to the x, y, z variables in the step event like so:

    before_gml.PNG
    But when accessing the array directly like this, it's being looked up over and over for every single index!
    before_cpp.PNG
    This can be fixed by storing the array in a local variable. YYC then does the same: it looks up the array only once and then accesses it at the given indices.
    after_gml.PNG
    after_cpp.PNG
     
    Last edited: May 10, 2019
    DBenji, Nux, dapper and 11 others like this.
  5. kraifpatrik

    kraifpatrik Member

    Joined:
    Jun 23, 2016
    Posts:
    132
    Good morning everyone, this time I tried something with certain expectations and got a little surprise.

    I wanted to see how YYC behaves when using the with statement and reading & writing variables from outside. Here is the GML, I'm trying to read & write built-in variables, object variables and globals.
    before_gml.PNG
    My expectation was that every variable would be looked up in every instance, but to my surprise, globals are actually always loaded only once (remember that when someone tells you how bad globals are :))! Unsurprisingly, built-ins and object variables are looked up by every instance in the loop.
    before_cpp.PNG
    Now let's fix that with our favorite local vars.
    after_gml.PNG
    after_cpp.PNG
    And voilà, we now load everything only once and then only update modified variables afterwards, as we wanted. This can be a huge performance boost when dealing with many instances and many variables outside of the with's scope. Happy optimizing everybody!
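
The read-once, write-back-once shape can be sketched in C++ as below. This is only a counting model under my own assumptions: `Caller`, `Target` and the accessor names are hypothetical, and each accessor call stands in for one variable lookup in the generated code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model of the instance that runs the `with` statement.
// Every accessor call counts as one lookup.
struct Caller {
    double x = 0.0;
    double y = 5.0;
    std::size_t accesses = 0;

    double get_x() { ++accesses; return x; }
    double get_y() { ++accesses; return y; }
    void set_x(double v) { ++accesses; x = v; }
};

// Hypothetical model of the instances iterated by `with`.
struct Target {
    double y = 0.0;
};

// Direct style, like `other.x += 1; y = other.y;` inside the with block:
// the caller's variables are touched once per target instance.
void update_direct(Caller& caller, std::vector<Target>& targets) {
    for (Target& t : targets) {
        caller.set_x(caller.get_x() + 1.0);
        t.y = caller.get_y();
    }
}

// Cached style: read into locals before the loop, write back after it.
void update_cached(Caller& caller, std::vector<Target>& targets) {
    double local_x = caller.get_x();
    const double local_y = caller.get_y();
    for (Target& t : targets) {
        local_x += 1.0;
        t.y = local_y;
    }
    caller.set_x(local_x);  // only modified variables are written back
}
```

With 100 targets the direct version touches the caller 300 times and the cached version 3 times, with identical end state. Remember the write-back at the end, otherwise changes made to the locals are lost.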
     
    Nux, NeZvers, the_dude_abides and 7 others like this.
  6. TsukaYuriko

    TsukaYuriko Q&A Spawn Camper Forum Staff Moderator

    Joined:
    Apr 21, 2016
    Posts:
    1,712
    +1 for the research. This certainly belongs in the category of optimization that sounds so ridiculous that it's easy to mistake for premature optimization, is actually useless at a small scale and won't be fully appreciated until you start fighting exponential explosion with it, such as when dealing with hundreds of instances in a loop.

    This is a known quirk that I swear was already discussed years ago, but the most recent article I can find that talks about it, Optimizing Your Games (which is partially outdated), is from (or was last updated at) the beginning of 2018:
    I'd be interested in the possibilities of compile-time optimization regarding this so that we don't have to awkwardly work around an invisible exponentially exploding performance drain, but I can imagine this to either require very careful analysis of the provided code or it would run the risk of accidentally auto-creating a local variable when it's not even needed and thus actually harming performance instead of improving it... or be flat out impossible to detect. It might be worth filing a ticket for to see if the development department sees any chance to implement this.

    I can feel the next big "YYC performance TURBO-CHARGED! BOOST your game's FPS by up to ONE THOUSAND!" marketing ploy around the corner already if this works out.
     
    immortalx, 00.Archer and kraifpatrik like this.
  7. lolslayer

    lolslayer Member

    Joined:
    Jun 23, 2016
    Posts:
    693
    Basically, love local variables, because local variables love you!
     
    kraifpatrik likes this.
  8. immortalx

    immortalx Member

    Joined:
    Sep 6, 2018
    Posts:
    295
    That's one of the most useful threads! Nice findings @kraifpatrik ! What's your opinion about this:

    Code:
    var array=[1000];
    
    var i;
    
    for (i = 0; i < array_length_1d(array); ++i)
    {
        //
    }
    
    // vs
    
    var i;
    var length = array_length_1d(array);
    
    for (i = 0; i < length; ++i)
    {
        //
    }
    The output for the first is:
    Code:
    #include <YYGML.h>
    #include "gmlids.h"
    extern YYVAR g_FUNC_NewGMLArray;
    extern YYVAR g_FUNC_array_length_1d;
    #ifndef __YYNODEFS
    DValue gs_constArg0_89CCA7B1 = { 1000, 0, VALUE_REAL };
    #else
    extern DValue gs_constArg0_89CCA7B1;
    #endif // __YYNODEFS
    
    void gml_Object_obj_tester_Create_0( CInstance* pSelf, CInstance* pOther )
    {
    YY_STACKTRACE_FUNC_ENTRY( "gml_Object_obj_tester_Create_0", 0 );
    YYRValue local_array;
    YYRValue local_i;
    YYRValue __ret1__(0);
    
    
    YY_STACKTRACE_LINE(1);
    FREE_RValue( &__ret1__ );
    YYRValue* __pArg1__[]={(YYRValue*)&gs_constArg0_89CCA7B1};
    local_array=YYGML_CallLegacyFunction(pSelf,pOther,__ret1__,1,g_FUNC_NewGMLArray.val,__pArg1__);
    
    YY_STACKTRACE_LINE(3);
    
    YY_STACKTRACE_LINE(5);
    
    YY_STACKTRACE_LINE(5);
    local_i=0;
    bool ___f2___ = true;
    while( true ) {
    if (!___f2___) {
    
    YY_STACKTRACE_LINE(5);
    ++/* local */local_i;
    }
    ___f2___ = false;
    FREE_RValue( &__ret1__ );
    YYRValue* __pArg3__[]={&/* local */local_array};
    bool ___b4___ = ((/* local */local_i < YYGML_CallLegacyFunction(pSelf,pOther,__ret1__,1,g_FUNC_array_length_1d.val,__pArg3__)));
    if (!___b4___) break;
    {
    
    YY_STACKTRACE_LINE(6);
    }
    }
    }
    
    And the second:
    Code:
    #include <YYGML.h>
    #include "gmlids.h"
    extern YYVAR g_FUNC_NewGMLArray;
    extern YYVAR g_FUNC_array_length_1d;
    #ifndef __YYNODEFS
    DValue gs_constArg0_2B309DA9 = { 1000, 0, VALUE_REAL };
    #else
    extern DValue gs_constArg0_2B309DA9;
    #endif // __YYNODEFS
    
    void gml_Object_obj_tester_Create_0( CInstance* pSelf, CInstance* pOther )
    {
    YY_STACKTRACE_FUNC_ENTRY( "gml_Object_obj_tester_Create_0", 0 );
    YYRValue local_array;
    YYRValue local_length;
    YYRValue local_i;
    YYRValue __ret1__(0);
    
    
    YY_STACKTRACE_LINE(1);
    FREE_RValue( &__ret1__ );
    YYRValue* __pArg1__[]={(YYRValue*)&gs_constArg0_2B309DA9};
    local_array=YYGML_CallLegacyFunction(pSelf,pOther,__ret1__,1,g_FUNC_NewGMLArray.val,__pArg1__);
    
    YY_STACKTRACE_LINE(3);
    
    YY_STACKTRACE_LINE(4);
    FREE_RValue( &__ret1__ );
    YYRValue* __pArg2__[]={&/* local */local_array};
    local_length=YYGML_CallLegacyFunction(pSelf,pOther,__ret1__,1,g_FUNC_array_length_1d.val,__pArg2__);
    
    YY_STACKTRACE_LINE(6);
    
    YY_STACKTRACE_LINE(6);
    local_i=0;
    bool ___f3___ = true;
    while( true ) {
    if (!___f3___) {
    
    YY_STACKTRACE_LINE(6);
    ++/* local */local_i;
    }
    ___f3___ = false;
    bool ___b4___ = ((/* local */local_i < /* local */local_length));
    if (!___b4___) break;
    {
    
    YY_STACKTRACE_LINE(7);
    }
    }
    }
    
     
  9. kraifpatrik

    kraifpatrik Member

    Joined:
    Jun 23, 2016
    Posts:
    132
    @immortalx Thanks man! As you can see in the C++, in the first case the array_length_1d function is called inside the while loop, so it is executed again on every iteration. In the second case the size is stored in a variable before the loop starts, same as in the GML code. So the second version is indeed better. Or, if you can go through the array from right to left, you can use

    Code:
    for (var i = array_length_1d(array) - 1; i >= 0; --i)
    {
        // ...
    }
    
    since the initial value of i is computed only once, before the loop starts.
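
The three loop shapes discussed above can be compared with a small counting sketch in C++. `CountedArray` and its `length()` method are hypothetical stand-ins for the array and for array_length_1d; the counter only shows how often the length function ends up being called.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical array wrapper whose length() counts its own calls.
struct CountedArray {
    std::vector<int> data;
    mutable std::size_t length_calls = 0;

    int length() const {
        ++length_calls;
        return static_cast<int>(data.size());
    }
};

// Length evaluated in the loop condition: called on every check.
long sum_length_in_condition(const CountedArray& a) {
    long total = 0;
    for (int i = 0; i < a.length(); ++i) {
        total += a.data[static_cast<std::size_t>(i)];
    }
    return total;
}

// Length stored in a local before the loop: called once.
long sum_length_cached(const CountedArray& a) {
    long total = 0;
    const int length = a.length();
    for (int i = 0; i < length; ++i) {
        total += a.data[static_cast<std::size_t>(i)];
    }
    return total;
}

// Reverse loop: the length only appears in the initializer, so it is
// also called once, without needing an extra local.
long sum_reverse(const CountedArray& a) {
    long total = 0;
    for (int i = a.length() - 1; i >= 0; --i) {
        total += a.data[static_cast<std::size_t>(i)];
    }
    return total;
}
```

For an array of n elements the first version calls the length function n + 1 times (n passing checks plus the final failing one), the other two call it once.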
     
    immortalx likes this.
  10. immortalx

    immortalx Member

    Joined:
    Sep 6, 2018
    Posts:
    295
    Thanks again @kraifpatrik ! I knew about the continuous function calls, I was just wondering (since I've no real knowledge in C++, and can't interpret what's happening especially in that generated form) if the YYC did any abnormal stuff under the hood :D
     
  11. kraifpatrik

    kraifpatrik Member

    Joined:
    Jun 23, 2016
    Posts:
    132
    Based on the code that I've seen so far and how I understand it, there are certain design decisions that I don't agree with, but in general I would say that YYC is doing a pretty good job and what I post here are just a few exceptions. As for the more under-the-hood stuff, I'm preparing another thread focused entirely on that. I'd like this thread to stay focused only on how to write GML so that the resulting C++ is better.
     
  12. the_dude_abides

    the_dude_abides Member

    Joined:
    Jun 23, 2016
    Posts:
    676
    Hi. I had two questions about this, as I'm trying to see if it will help my project's performance.

    1) If a FOR loop gets translated into a WHILE loop anyway, does that mean using a while to begin with is better? If it has to be translated then that's an amount of unnecessary work, even if it is a small amount.

    2) I tried the suggestions, and got a fatal memory error. Have you got any tips for do, and don't dos, as to where this can be used? Or have you never witnessed that happening?

    As an example - you showed about setting an array to a local variable, and how that cuts down looking it up. So I theorized (admittedly not knowing much about it) that it might be the same for my permanent variables, which are held as they are being used throughout. In scripts where they were being called I set a new local variable for them, and accessed that instead. And bosh! The fatal memory error happened.

    Is it possible that, whilst this is faster for the internal processes, if used too much you will be clogging up the memory with duplicated information? (go easy on me if that is dumb - this is above my level of understanding) I've never seen that error until implementing this, and am wondering if it's inevitable if used too much, or if there are some things you can't do it with.

    EDIT: Ah...I'm using GMS 1, so is this specific to 2 only?
     
  13. kraifpatrik

    kraifpatrik Member

    Joined:
    Jun 23, 2016
    Posts:
    132
    Hey mate, yup, all I write here was tested in GMS2, so I don't know if GMS1.4 has the same issues. But to answer your questions:

    1) The translation from for to while is not done during gameplay, but only when GML is translated to C++ for compilation, so even if there were some tiny extra amount of work, I wouldn't see it as an issue.

    2) It should work the same everywhere, regardless of which event or script you use it in. As for when to start bothering with these code transformations, I definitely wouldn't recommend always writing your code this way, as it decreases its readability. You should do this when you have some portion of your game done and you're experiencing performance issues; otherwise you could just spend time on something that wouldn't affect the game that much in the end. The loop cases specifically wouldn't have that big an effect when dealing with only tens of iterations. I also never got any memory errors from doing this; could you post the code you used?
     
    NeZvers likes this.
  14. the_dude_abides

    the_dude_abides Member

    Joined:
    Jun 23, 2016
    Posts:
    676
    Hey! Thanks for replying. I don't mind posting my code, but I pretty much tried this everywhere in my project and that's about 10000 lines. A more sensible thing to have done would be try it out one stage at a time, as it might give an idea of where it happened and then it's easier to pinpoint why. Assuming it was a specific point that caused it, rather than an accumulative effect. But that's after the fact, and in the end I rolled back most of the changes to an earlier version.

    Unfortunately it does have a lot of loops, and while I have tried to cut them down into manageable sizes it can still be half of a step for just one of them. Those are directly accessing instances though, and I have seen that using with makes a noticeable improvement even when it's only one instance being accessed. So I will try and configure them that way, and see if that helps. I'll also look at changing any loops into while, as any boost is appreciated at this point.

    EDIT:
    Out of all the topics regarding optimization I would say that this has been the most helpful. Following the advice here has made a huge difference to my project's performance, and it was quite eye-opening as to what was slowing it down!

    Even defining a local variable to true / false / undefined / numerical values made a massive difference when called more than once, and I'd just assumed that wouldn't be the case (true, false, undefined etc being GMS constants) It's now almost twice as fast, so thanks :)

    (and no fatal memory error on this attempt, so fingers crossed.....)
     
    Last edited: May 23, 2019
  15. Pere

    Pere Member

    Joined:
    Jan 5, 2017
    Posts:
    29
    Here are some tests in GM:S 1.4 (Windows (YYC)):
    I made two rooms, one with a for loop and one with an equivalent while loop.

    Object1:
    Code:
    var t1 = get_timer();
    var j = 0;
    repeat(10000){
        for(var i = 0; i < 1000; i ++){
            j = i;
            }
        }
    var t2 = get_timer();
    
    show_debug_message( "TEST1: " + string_format( (t2-t1) / 1000000,1,12));
    It also prints the time that it spent doing the loop. Here's a snapshot of the compile window:
    upload_2019-6-28_21-37-54.png
    More or less, it took an average of 0.135 seconds


    Object2:
    Code:
    var t1 = get_timer();
    var j = 0;
    repeat(10000){
        var i = 0;
        while(i < 1000){
            j = i;
            i ++
            }
        }
     
    var t2 = get_timer();
    
    show_debug_message( "TEST2: " + string_format( (t2-t1) / 1000000,1,12));
    upload_2019-6-28_21-38-41.png
    And this took more or less an average of 0.139

    Conclusion: The For loop is slightly faster than the equivalent While loop. The While loop takes 2% longer than the For loop.


    Then I tested a for loop accessing object variables vs first putting them on a local var.
    Object3:
    Code:
    var t1 = get_timer();
    
    repeat(1000){
        for(var i = 0; i < 100; i++){
            draw_sprite(sprite_index, image_index, xpos[i], ypos[i]);
            }
        }
     
    var t2 = get_timer();
    show_debug_message( "TEST3: " + string_format( (t2-t1) / 1000000,1,12));
    upload_2019-6-28_22-4-16.png
    Approx. average: 0.0124 s.



    Object4:
    Code:
    var t1 = get_timer();
    
    var _xpos = xpos;
    var _ypos = ypos;
    var _spriteIndex = sprite_index;
    var _imageIndex = image_index;
    
    repeat(1000){
        for(var i = 0; i < 100; i++){
            draw_sprite(_spriteIndex, _imageIndex, _xpos[i], _ypos[i]);
            }
        }
     
    var t2 = get_timer();
    show_debug_message( "TEST4: " + string_format( (t2-t1) / 1000000,1,12));
    upload_2019-6-28_22-9-14.png
    Approx. average: 0.0097 s.

    Conclusion: Putting variables in local vars and accessing those in a For loop is faster than directly accessing object variables in the For loop. The usual method takes 28% longer than the local "var" method.
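
For anyone who wants to repeat this kind of measurement outside GameMaker, here is a minimal C++ sketch of the same timing pattern (timestamp before, timestamp after, average over several runs). The helper names `average_seconds` and `percent_longer` are made up for this example; the percentage helper just reproduces the arithmetic behind figures like the 28% above.

```cpp
#include <cassert>
#include <chrono>

// Runs `work` `runs` times and returns the average wall-clock time of one
// run in seconds. A single measurement is noisy, so averaging helps.
template <typename Work>
double average_seconds(Work work, int runs) {
    using clock = std::chrono::steady_clock;
    double total = 0.0;
    for (int r = 0; r < runs; ++r) {
        const auto t1 = clock::now();
        work();
        const auto t2 = clock::now();
        total += std::chrono::duration<double>(t2 - t1).count();
    }
    return total / runs;
}

// How many percent longer `slower` takes compared to `faster`.
double percent_longer(double slower, double faster) {
    return (slower / faster - 1.0) * 100.0;
}
```

For example, `percent_longer(0.0124, 0.0097)` gives roughly 28, matching the comparison of Object3 and Object4.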
     
    Last edited: Jun 28, 2019
    amusudan likes this.
  16. Pere

    Pere Member

    Joined:
    Jan 5, 2017
    Posts:
    29
    I did more experiments to test your other claims. (Remember this is GM:S 1.4!)
    Here I made an object that checked an array 3 times like your example vs an object that first stores the array in a local var.

    Object5:
    Code:
    x = 0;
    y = 0;
    z = 0;
    var t1 = get_timer();
    repeat(1000){
        x += vector[0];
        y += vector[1];
        z += vector[2];
        }
    var t2 = get_timer();
    
    show_debug_message( "TEST5: " + string_format( (t2-t1) / 1000000,1,12));
    upload_2019-6-28_22-27-14.png
    Approx. average: 0.000132 s.

    EDIT: I had done this next Object6 test like your example, just doing var _vector = vector; thinking it was equivalent to the previous method (and it was faster). But then I discovered that you'd actually need to then do vector = _vector; at the end, so that the original array changes. When I did it like this, with three variable changes it was actually slower than the original method. But with 30 and 300 variable changes it was faster.
    upload_2019-6-29_15-10-32.png
    (where it says 3, 30, and 300 "value changes" it refers to assigning a value to a position of the array. But all the objects had the same number of value changes so that the comparison makes sense. What it means is that the 6A stored the local array into the original array, every 3 value changes, the 6B every 30, and 6C every 300; but the total amount was kept the same in all objects).

    5 approx. avg: 0.000132 s.
    6A approx. avg: 0.000146 s.
    6B approx. avg: 0.000098 s.
    6C approx. avg: 0.000096 s.

    Object6: Here's the code
    Code:
    x = 0;
    y = 0;
    z = 0;
    var t1 = get_timer();
    repeat(1000){
        var _vector = vector;
        x += _vector[0];
        y += _vector[1];
        z += _vector[2];
        vector = _vector;
        }
    var t2 = get_timer();
    
    show_debug_message( "TEST6: " + string_format( (t2-t1) / 1000000,1,12));
    6B: (storing the array every 30 value assignments)
    Code:
    x = 0;
    y = 0;
    z = 0;
    var t1 = get_timer();
    repeat(100){
        var _vector = vector;
        repeat(10){
            x += _vector[0];
            y += _vector[1];
            z += _vector[2];
            }
        vector = _vector;
        }
    var t2 = get_timer();
    
    show_debug_message( "TEST6B: " + string_format( (t2-t1) / 1000000,1,12));
    6C (every 300)
    Code:
    x = 0;
    y = 0;
    z = 0;
    var t1 = get_timer();
    repeat(10){
        var _vector = vector;
        repeat(100){
            x += _vector[0];
            y += _vector[1];
            z += _vector[2];
            }
        vector = _vector;
        }
    var t2 = get_timer();
    
    show_debug_message( "TEST6C: " + string_format( (t2-t1) / 1000000,1,12));

    Conclusion: When accessing an array multiple times, storing it in a local "var" is slower if you don't change many values (like 3) and faster if you change many values (like 30). Assigning 3 values in the array takes 11% MORE time with the local "var" method than the normal method, it takes 26% LESS time with 30 values (than the normal method), and 27% LESS with 300 values.
    I'd say if you have fewer than 30 value assignments or references to your array in a script/event, forget about this. If you have more than 30, using the "var" method will help your performance a bit, especially if that code runs many times per step, like in each enemy's step event.
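
The structure of tests 5 and 6A/6B/6C can be mirrored in a small C++ counting model. `Owner`, `lookup` and `Totals` are hypothetical names; the model counts only how often the array is fetched from (or stored back to) its owner, and it says nothing about how expensive the copy or the store-back is.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical owner of the array; each lookup() call counts as one
// fetch of the array from the instance.
struct Owner {
    std::array<double, 3> velocity{{1.0, 2.0, 3.0}};
    std::size_t lookups = 0;

    std::array<double, 3>& lookup() {
        ++lookups;
        return velocity;
    }
};

struct Totals {
    double x = 0.0;
    double y = 0.0;
    double z = 0.0;
};

// Like test 5: the array is fetched for every element access.
Totals run_direct(Owner& owner, int outer) {
    Totals t;
    for (int i = 0; i < outer; ++i) {
        t.x += owner.lookup()[0];
        t.y += owner.lookup()[1];
        t.z += owner.lookup()[2];
    }
    return t;
}

// Like tests 6A/6B/6C: copy to a local, use it `inner` times, store back.
Totals run_batched(Owner& owner, int outer, int inner) {
    Totals t;
    for (int i = 0; i < outer; ++i) {
        std::array<double, 3> local = owner.lookup();  // load once
        for (int j = 0; j < inner; ++j) {
            t.x += local[0];
            t.y += local[1];
            t.z += local[2];
        }
        owner.lookup() = local;  // store back once
    }
    return t;
}
```

With the same 3000 element accesses in total, the direct version performs 3000 fetches, while the batched version performs 2000, 200 and 20 for the 6A, 6B and 6C shapes. The fixed load and store cost per batch shrinks as the batch grows, which fits the trend in the measurements. Since 6A measured slower than test 5 despite fewer fetches, the load and store presumably cost more than a plain lookup, which this model does not capture.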




    Then I put 100 instances of "object1" in a room and compared changing variables in a With the normal way vs storing them first in local "vars".

    Object7:
    Code:
    myVar = 1;
    myOtherVar = 2;
    var t1 = get_timer();
    with(object1){
        other.x += 1;
        other.myVar += 1;
     
        y = other.y;
        myVar = other.myOtherVar;
        }
    var t2 = get_timer();
    
    show_debug_message( "TEST7: " + string_format( (t2-t1) / 1000000,1,12));
    upload_2019-6-28_22-44-6.png
    Approx. average: 0.000028 s.

    Object 8:
    Code:
    myVar = 1;
    myOtherVar = 2;
    var t1 = get_timer();
    
    var _x = x;
    var _y = y;
    var _myVar = myVar;
    var _myOtherVar = myOtherVar;
    
    with(object1){
        _x += 1;
        _myVar += 1;
     
        y = _y;
        myVar = _myOtherVar;
        }
    x = _x;
    myVar = _myVar;
    
    var t2 = get_timer();
    
    show_debug_message( "TEST8: " + string_format( (t2-t1) / 1000000,1,12));
    upload_2019-6-28_22-44-26.png
    Approx. average: 0.00002 s.

    Conclusion: Accessing object variables in a With statement is faster if they are first stored in a local "var". The normal method takes 40% longer than the local "var" method.

    So the practices you proposed are all effective, some more than others. The only thing you were wrong about is While/For loops being converted to the same thing: For loops are a bit faster and should still be used when appropriate (at least in GM:S 1.4; it would be cool to see if the same applies to GM:S 2).
     
    Last edited: Jun 29, 2019
    amusudan likes this.
  17. Pere

    Pere Member

    Joined:
    Jan 5, 2017
    Posts:
    29
    It just occurred to me that I can just share the file that I used to do all the tests and someone who's interested can open it in GM:S 2 and see the results. So here is the link.
    https://drive.google.com/file/d/1q7ss8nEUEF1wy7G1gqh1uyjvAn9NcAQ-/view?usp=sharing
    I have no interest in it for now other than curiosity, since I only use 1.4 atm, so in case anyone wants to try it, just run it with YYC and hit the numbers 1-8 to go through the objects 1-8 that I described above.
     
  18. kraifpatrik

    kraifpatrik Member

    Joined:
    Jun 23, 2016
    Posts:
    132
    Hey, thank you for the tests! I tried comparing the different loops later on and there actually are differences. In GMS2 it turned out that while is faster than for, and repeat was the fastest of them all, so I would suggest using that one wherever possible.
     
  19. Lonewolff

    Lonewolff Member

    Joined:
    Jan 8, 2018
    Posts:
    1,207
    Weird. I would have thought that all of this would have been optimised out by the compiler seeing as the Visual Studio 'optimising compiler' is used.

    Or does GMS2 send flags telling it specifically not to optimise (maybe so the debugger can still cope with the result)?
     
  20. kraifpatrik

    kraifpatrik Member

    Joined:
    Jun 23, 2016
    Posts:
    132
    I'm actually not sure where Visual Studio is used, but based on logs in the console, the C++ files are compiled with Clang. It is given the -O3 parameter, which enables the maximum level of optimization targeting speed (also takes the most time to compile, can possibly generate longer code and can increase memory usage by caching results of same arithmetics performed multiple times etc.). But since the methods I've described here do actually work, I guess that even -O3 cannot do that much with the generated code. Well, it's definitely nowhere near as fast as it could be, if it was written by hand.
     
    Lonewolff likes this.
  21. Lonewolff

    Lonewolff Member

    Joined:
    Jan 8, 2018
    Posts:
    1,207
    Ah fair enough, that makes sense. It must be just using VS just for the SDK then (and fxc for compiling the shaders under the UWP build).
     
  22. GMWolf

    GMWolf aka fel666

    Joined:
    Jun 21, 2016
    Posts:
    3,464
    I'm surprised the C++ compiler doesn't figure out what is const and what isn't and do the local var extraction for you.
    Also, YYC uses Clang? I always thought it was MSVC, at least on Windows.

    YOYO! Are your functions marked const where possible? That should help the compiler a bunch, especially across translation units if you don't do link time optimization.
     
