Shaders Optimizing post processing shader

HalRiyami

Member
I'm currently implementing a post processing stack, similar to Unity's "Combined Stack" in this link. I understand Unity does this with compute shaders (I'm not sure how those work, but maybe you can dynamically add/remove compute passes depending on which effects are enabled). Since GameMaker doesn't give us access to compute shaders, I can do one of two things: combine all my effects into a single fragment shader with a bunch of if statements to keep it modular, or write a separate shader for each effect and wrap each pass in an if check.

If I were to use a single shader, then my if statements would go inside the fragment shader like this:
GML:
/// GML code

// Render view surface to the back buffer
shader_set(shd_effect_combined);
draw_surface(...);
shader_reset();

/// Fragment shader
uniform bool doEffect1;
uniform bool doEffect2;
uniform bool doEffect3;

void main() {
    vec4 color = texture2D(..., uv);
    if (doEffect1) {color = Effect1(color, ...);}
    if (doEffect2) {color = Effect2(color, ...);}
    if (doEffect3) {color = Effect3(color, ...);}
    gl_FragColor = color;
}
I know that if statements are generally frowned upon in shaders because branching can be expensive, but does that apply here? The branches aren't divergent: the conditions are uniforms, so every fragment runs exactly the same code.
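The GML side of the single-shader option also has to feed the toggles in, which the snippet above doesn't show. A minimal sketch, assuming shd_effect_combined declares `uniform bool doEffect1..doEffect3` as above; the function names and the flags array are hypothetical. Bool uniforms are set here through shader_set_uniform_i (0/1); if that misbehaves on one of your export targets, a float uniform compared against 0.5 is a common fallback.

```gml
// Sketch only: helper names and the flags array are hypothetical.

// Look up the uniform handles once (e.g. in the Create event), not every frame
function pp_get_effect_handles(_shader) {
    return [
        shader_get_uniform(_shader, "doEffect1"),
        shader_get_uniform(_shader, "doEffect2"),
        shader_get_uniform(_shader, "doEffect3")
    ];
}

// True if at least one effect toggle is on
function pp_any_enabled(_flags) {
    for (var _i = 0; _i < array_length(_flags); _i++) {
        if (_flags[_i]) return true;
    }
    return false;
}

// Draw _surf with the combined shader, setting one uniform per toggle
function pp_draw_combined(_shader, _handles, _flags, _surf) {
    if (!pp_any_enabled(_flags)) {
        draw_surface(_surf, 0, 0);    // nothing enabled: skip the shader entirely
        return;
    }
    shader_set(_shader);
    for (var _i = 0; _i < array_length(_handles); _i++) {
        // bool uniforms set as ints: 0 = false, 1 = true
        shader_set_uniform_i(_handles[_i], _flags[_i] ? 1 : 0);
    }
    draw_surface(_surf, 0, 0);
    shader_reset();
}
```

Usage would be `handles = pp_get_effect_handles(shd_effect_combined);` in Create and `pp_draw_combined(shd_effect_combined, handles, [doEffect1, doEffect2, doEffect3], view_surf);` in Draw, where view_surf stands for whatever surface you are post-processing. Because the flags are constant for the whole draw call, every fragment takes the same branch.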

The other option would look like this:
GML:
/// GML code

// Render surface A on surface B using effect 1 shader
if doEffect1 {
    shader_set(shd_effect_1);
    surface_set_target(surface_b);
    draw_surface(...);
    surface_reset_target();
    shader_reset();
}
// Render surface B on surface A using effect 2 shader
if doEffect2 {
    shader_set(shd_effect_2);
    surface_set_target(surface_a);
    draw_surface(...);
    surface_reset_target();
    shader_reset();
}
// Render surface A on surface B using effect 3 shader
if doEffect3 {
    shader_set(shd_effect_3);
    surface_set_target(surface_b);
    draw_surface(...);
    surface_reset_target();
    shader_reset();
}
I assume this is slower, since every active effect causes a texture swap and another full-surface render, but if only 1 or 2 effects are active it might be faster.
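One catch with a fixed A→B→A chain: when effect 1 is off, effect 2 still reads surface B, which was never written this frame. A sketch of a chain that only swaps after a pass actually runs; _shaders and _flags are hypothetical parallel arrays, and the returned surface is whichever one holds the final image.

```gml
// Sketch, not a drop-in. _shaders: array of shader assets; _flags: matching bools.

// Number of passes that will actually run
function pp_count_enabled(_flags) {
    var _n = 0;
    for (var _i = 0; _i < array_length(_flags); _i++) {
        if (_flags[_i]) _n++;
    }
    return _n;
}

// Apply the enabled passes, ping-ponging between _src and _dst
function pp_run_passes(_shaders, _flags, _src, _dst) {
    for (var _i = 0; _i < array_length(_shaders); _i++) {
        if (!_flags[_i]) continue;            // disabled: no draw and no swap
        surface_set_target(_dst);
        draw_clear_alpha(c_black, 0);
        shader_set(_shaders[_i]);
        draw_surface(_src, 0, 0);
        shader_reset();
        surface_reset_target();
        var _tmp = _src;                      // the surface just written becomes
        _src = _dst;                          // the input of the next pass
        _dst = _tmp;
    }
    return _src;                              // surface holding the final image
}
```

With every flag off, the function returns the original surface untouched, so the caller can always draw whatever it returns to the back buffer. The pass count is also what each frame actually pays in full-surface draws.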

Anyone with more experience in shaders know which is preferable in this situation?
 

muki

Member
Personally, I go for multiple simple, compact shaders and split my effects across multiple surfaces that ping-pong into each other. I know texture swaps have overhead, but I haven't noticed a huge dent in performance even with 12+ view-sized swaps. Combining everything into a single frag shader also feels like a much bigger headache, and splitting them up gives you more flexibility: you can apply certain shaders to one asset layer but not another, or apply them differently.

Just my two cents. I'm far from an expert on what rendering does under the hood.
 

HalRiyami

Member
Originally I was leaning towards a single fragment shader because I didn't want to deal with ping-pong surfaces and I assumed the overhead that comes with it might cripple performance (8+ 4k surfaces), but you're absolutely right that keeping them in separate shaders gets me the flexibility of using individual effects on asset layers, especially for effects like blur that are used alone or as part of another effect. Now I'm leaning towards the second option, assuming the overhead is not too bad.

I've seen your profile posts and they are beautiful! Are things like the puddles made up of multiple small shaders, or do you have a single shader that does all of it?
 

muki

Member
8+ 4K surfaces might change things a bit. Because of that, you might be right that it's safer to merge at least some shaders.
My project's still running 1000+ fps with a ton of view surfaces plus 4 shaders being manipulated and stacked, but I think the largest one is 512x512, and most are 480x270. Might be worth setting up a simple 4K ping-ponging test with your game, see how many you can do before things get bad.
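To put numbers on the size difference, assuming the default 8-bit RGBA surface format: a 480x270 surface is 480 × 270 × 4 = 518,400 bytes, while a 3840x2160 one is 33,177,600 bytes (about 33 MB), 64 times as much, and every full-screen pass reads one of those and writes another. Below is a minimal sketch of the suggested ping-pong test; the function names are made up, and since it only does plain copies it is a lower bound on what real effect shaders cost.

```gml
// Hypothetical benchmark helpers for a ping-pong stress test.

// Approximate size of one surface, assuming 4 bytes per pixel (RGBA8)
function pp_surface_bytes(_w, _h) {
    return _w * _h * 4;
}

// Copy back and forth between _a and _b _passes times; returns the last surface written
function bench_ping_pong(_a, _b, _passes) {
    var _src = _a;
    var _dst = _b;
    for (var _i = 0; _i < _passes; _i++) {
        surface_set_target(_dst);
        draw_surface(_src, 0, 0);
        surface_reset_target();
        var _tmp = _src;
        _src = _dst;
        _dst = _tmp;
    }
    return _src;
}
```

Call bench_ping_pong from a Draw event with two 3840x2160 surfaces and raise _passes while watching fps_real or the overlay from show_debug_overlay(true). The GPU works asynchronously, so timing the call itself with get_timer() mostly measures CPU-side submission, not GPU cost.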

The puddles actually don't use crazy shaders! I'm applying the same blur shader I use for the backgrounds (but lighter) on the reflection, but the rest is just surface/mask manipulation. This is more or less how it works.

The hand-placed puddle objects don't have much in them: they just declare a couple of variables that the control object uses later, and their draw event only draws the sprite, using draw_sprite_pos to fake a perspective parallax effect based on the puddle's position on screen. The actual reflections are handled by a script in my main rendering control object. I call it between two ping-pongs, while no surface target is set, and pass in the last surface used as its argument.
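The puddle object's perspective draw could look something like this. A sketch, assuming it uses the same corner offsets as the reflection script in this post (-80 / -60 plus a third of the on-screen position); puddle_corners is a hypothetical helper and the coordinates are view-relative.

```gml
// Hypothetical helper: the four draw_sprite_pos corners for a puddle.
// _sx/_sy: puddle position relative to the view; _w/_h: sprite size.
function puddle_corners(_sx, _sy, _w, _h) {
    var _shift_x = -80 + _sx / 3;             // bottom edge slides with horizontal screen position
    var _shift_y = _h - 60 + _sy / 3;         // and moves with vertical screen position
    return [
        _sx - _w / 2,            _sy,             // top-left
        _sx + _w / 2,            _sy,             // top-right
        _sx + _w / 2 + _shift_x, _sy + _shift_y,  // bottom-right
        _sx - _w / 2 + _shift_x, _sy + _shift_y   // bottom-left
    ];
}

// Possible Draw event when drawing onto a view-sized surface:
// var _c = puddle_corners(x - global.view_x, y - global.view_y, sprite_width, sprite_height);
// draw_sprite_pos(sprite_index, image_index, _c[0], _c[1], _c[2], _c[3],
//                 _c[4], _c[5], _c[6], _c[7], 1);
```

Keeping the corner math in one function means the puddle sprite and its reflection mask can't drift apart if the offsets are tweaked later.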

1. Sets up surfaces and variables
2. Scans the view for any reflectors (like puddles), using a parent
3. Loops through the amount of on-screen reflectors
4. With blur shader applied, draw the part directly above the puddle, what needs to be reflected. "puddle surface"
5. Switch to main view surface for reflections, srf_screen_reflections
6. Redraw puddle sprite, turn off alpha color write
7. Draw the puddle surface on top of the puddle, flipped upside down (it'll use the alpha mask of the sprite)
8. Draw specular reflections
9. Re-enable alpha color write
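The core of steps 4-9 is the color-write trick: the mask sprite writes color and alpha, then the flipped copy is drawn with alpha writes off, so it only changes RGB and the mask's alpha still decides where the reflection shows once srf_screen_reflections is composited. A condensed sketch for one puddle, with hypothetical names. It leaves out the blur and the specular step, uses a flat rectangle instead of the perspective corners, and assumes the caller has already set the reflections surface as the target (GML surface targets stack, so the inner set/reset returns to it).

```gml
// Step 4 helper: area of the scene directly above the puddle, view-relative
function puddle_source_rect(_x, _y, _w, _h, _view_x, _view_y) {
    return [_x - _view_x - _w / 2, _y - _view_y - _h, _w, _h];
}

// _scene: source surface; _puddle_surf: a _w x _h scratch surface; _spr: puddle sprite (frame 1 = mask)
function puddle_draw_reflection(_scene, _puddle_surf, _spr, _x, _y, _view_x, _view_y) {
    var _w = sprite_get_width(_spr);
    var _h = sprite_get_height(_spr);
    var _r = puddle_source_rect(_x, _y, _w, _h, _view_x, _view_y);
    var _top = _y - _view_y;                 // puddle's top edge on the reflections surface

    // Step 4: copy the area above the puddle into the scratch surface
    surface_set_target(_puddle_surf);
    draw_clear_alpha(c_black, 0);
    draw_surface_part(_scene, _r[0], _r[1], _w, _h, 0, 0);
    surface_reset_target();                  // back to the reflections surface

    // Step 6: the mask frame writes colour AND alpha
    draw_sprite_pos(_spr, 1, _r[0], _top, _r[0] + _w, _top,
                    _r[0] + _w, _top + _h, _r[0], _top + _h, 1);

    // Step 7: alpha writes off, so the flipped copy only changes RGB
    gpu_set_colorwriteenable(true, true, true, false);
    draw_surface_ext(_puddle_surf, _r[0], _top + _h, 1, -1, 0, c_white, 1);

    // Step 9: re-enable alpha writes
    gpu_set_colorwriteenable(true, true, true, true);
}
```

Pixels outside the mask still get their RGB overwritten, but their alpha stays at the cleared 0, so they disappear when the surface is drawn with normal blending.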

Then, outside the script, in a later ping-pong inside a set target, I just add srf_screen_reflections to the stack. This is the code of the script if you're curious. Not all of it is commented, but the theory is there. It can probably be optimized, too. And I still need to write a function that frees puddle surfaces once they've been outside the view for a while.

GML:
//this function is in my rendering control object's draw event
//between two ping-pong stacks
//then in a future stack I draw srf_screen_reflections with the other stuff

function scr_setup_reflections(_reflect_from) {


    if (!surface_exists(srf_screen_reflections)) {                    //create surface where all water will be drawn to
        srf_screen_reflections = surface_create(480, 270);
    }

    if instance_exists(par_reflectors) {                            //if puddles or water bodies exist
  
        var _inst;
        var _width;
        var _height;
      
        surface_set_target(srf_screen_reflections);                        //clear reflections surface at start of frame
            draw_clear_alpha(c_black, 0.0);                     
        surface_reset_target();         
      
        for (var _i = 0; _i < instance_number(par_reflectors); _i++) {                    //loop through all reflectors

            _inst = instance_find(par_reflectors, _i);
            _width = _inst.sprite_width;
            _height = _inst.sprite_height;

            if (!surface_exists(_inst.srf_reflection)) {                //re-create instance surface if accidentally destroyed
                with (_inst) {
                    srf_reflection = surface_create(_width, _height);    //single puddle surface
                }
            }
  
            surface_set_target(_inst.srf_reflection);                    //set drawing target to puddle instance surface

                shader_set(shd_blur_gaussian);                            //apply blur shader to water
                shader_set_uniform_f(uRESOLUTION, _width, _height);
                shader_set_uniform_f(uRADIUS, 0.5); 

                draw_surface_part(                                        //draw area to be reflected from source surface
                                    _reflect_from,                        //<-- scenery
                                    _inst.x - global.view_x - (_width / 2),
                                    _inst.y - global.view_y - _height,
                                    _width,
                                    _height,
                                    0,
                                    0
                                    );
              
                shader_reset();                 
            surface_reset_target();
          
            //fade alpha to transparent if within 48 pixels from top. avoids glitchy reflections and just makes it look better
            var _fade = (_inst.y - global.view_y);
            if (_fade < 48) {
                _fade = clamp((_fade / 48), 0, 1);     
            }
          
          
            surface_set_target(srf_screen_reflections);    //switch to the dedicated view-sized surface for reflections
              
                gpu_set_texfilter(true);
                draw_sprite_pos(                                    //draw alpha mask sprite with perspective parallax
                                    _inst.sprite_index,
                                    1,                                //use mask image index
                                    _inst.x-global.view_x - (_width/2),
                                    _inst.y-global.view_y,
                                    _inst.x-global.view_x + (_width/2),
                                    _inst.y-global.view_y,
  
                                    _inst.x-global.view_x  +(_width/2) - 80 + ((_inst.x-global.view_x) / 3),    //using draw_sprite_pos
                                    _inst.y-global.view_y + _height - 60 + ((_inst.y-global.view_y) / 3),        //to do all this
                                    _inst.x-global.view_x - (_width/2 )- 80 + ((_inst.x-global.view_x) / 3),    //perspective stuff
                                    _inst.y-global.view_y + _height - 60 + ((_inst.y-global.view_y) / 3),        //puddle object has similar draw_sprite_pos code
                                    _fade                                                                        //<-- fade
                                    );                 
                gpu_set_texfilter(false);                 
                                                  
                gpu_set_colorwriteenable(true, true, true, false);            //disable alpha channel drawing
                draw_surface_ext(                                            //draw surface but flipped on y
                                _inst.srf_reflection,
                                _inst.x - global.view_x - (_width / 2),
                                _inst.y - global.view_y + _height,
                                1,
                                -1,
                                0,
                                c_white,
                                1.0
                                );
              
                //-------------------------------------------------------------------------
              
                //this is only for the floating light reflection
                if instance_exists(obj_light_orb) {                    //draw light flare reflection if available
                    gpu_set_texfilter(true);
                    draw_sprite_ext(
                                        spr_light_orb_flare,
                                        image_index,
                                        obj_light_orb.x - global.view_x,
                                        obj_light_orb.y - global.view_y,
                                        1.0,
                                        8.0,
                                        0,
                                        c_white,
                                        1.0
                                    );     
                    gpu_set_texfilter(false);
                }
              
                //-------------------------------------------------------------------------
              
                //add specular reflections if a light object is above
                if instance_exists(par_lights) {                 
                    gpu_set_texfilter(true);
                  
                    var _inst_f;

                    for (var _f = 0; _f < instance_number(par_lights); _f++) {
                        _inst_f = instance_find(par_lights, _f);
                        if (_inst_f.x >= (_inst.x - (_width/2))) && (_inst_f.x <= (_inst.x + (_width/2))) && (_inst_f.y <= _inst.y) {
                            draw_sprite_ext(
                                                spr_light_orb_flare,
                                                image_index,
                                                _inst_f.x - global.view_x,
                                                _inst_f.y - global.view_y + (_height / 2),
                                                1.0,
                                                12.0,
                                                0,
                                                _inst_f.image_blend,
                                                1.0
                                            ); 
                        }
                    }
                    gpu_set_texfilter(false);
                }         
              
              
                //-------------------------------------------------------------------------
                  
                gpu_set_colorwriteenable(true, true, true, true);            //re-enable alpha channel drawing
            surface_reset_target(); 
      
        }
      
    }
}
 

HalRiyami

Member
Yeah, I assumed you were talking about 1080p surfaces when you said view-sized surfaces. I guess I have to test the impact of multiple 4k surfaces on the fps. The other issue is that rendering a 4k surface itself can be stressful on the GPU, not to mention rendering 4k surfaces multiple times. Also, seeing how Unity themselves combined the render stack I assume the process is warranted, otherwise they might have left the effects separate.
I'll do some testing on multiple surface sizes and post some results here.

The puddles "trick" is pretty neat actually. I tried optimizing it a little bit while trying to understand how it works so feel free to test it and let me know.
GML:
//this function is in my rendering control object's draw event
//between two ping-pong stacks
//then in a future stack I draw srf_screen_reflections with the other stuff

function scr_setup_reflections(_reflect_from) {
    // Reduce global look ups by storing these in locals
    var _viewX = global.view_x;    // View x (top left)
    var _viewY = global.view_y;    // View y (top left)
    var _viewW = global.view_w;    // View width
    var _viewH = global.view_h;    // View height

    //create surface where all water will be drawn to
    if (!surface_exists(srf_screen_reflections)) {      
        srf_screen_reflections = surface_create(480, 270);
    }

    // Prepare reflections surface since reflectors exist
    if instance_exists(par_reflectors) {                        
        surface_set_target(srf_screen_reflections);
            draw_clear_alpha(c_black, 0.0);  
        surface_reset_target();
    }

    //loop through all reflectors
    with (par_reflectors) {  

        // Simple AABB collision check with view to see if the object is visible
        // Object is out of view, don't process (Edited!)
        if (bbox_left > _viewX +_viewW) || (bbox_right < _viewX) || (bbox_top > _viewY +_viewH) || (bbox_bottom < _viewY) {
            continue;
        }

        //-----------------------------
        // This can probably be moved outside the with loop. Just draw the _reflect_from surface
        // blurred to be used by every reflector object with draw_surface_general when drawing the
        // puddle's reflection after disabling alpha channel (this will be much faster since you're
        // only blurring once as opposed to blurring per object and creating 1 surface max)

        // Object is guaranteed inside of the view, just create and free single puddle surface in this step
        var _srf_reflection = surface_create(sprite_width, sprite_height);

        // Draw the reflection on the puddle
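        surface_set_target(_srf_reflection);           //set drawing target to the temporary puddle surface
            shader_set(shd_blur_gaussian);            //apply blur shader to water
                // if uRESOLUTION/uRADIUS are instance variables of the control object, use other.uRESOLUTION/other.uRADIUS here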
                shader_set_uniform_f(uRESOLUTION, sprite_width, sprite_height);
                shader_set_uniform_f(uRADIUS, 0.5);
                //draw area to be reflected from source surface (scenery)
                draw_surface_part(_reflect_from,
                                  x - _viewX - (sprite_width / 2),
                                  y - _viewY - sprite_height,
                                  sprite_width,
                                  sprite_height,
                                  0,
                                  0);
            shader_reset();  
        surface_reset_target();
        //-----------------------------

        //fade alpha to transparent if within 48 pixels from top. avoids glitchy reflections and just makes it look better
        var _fade = clamp(((y - _viewY) / 48), 0, 1);    // No need for if statement since it's clamped to 0-1

        // inside with (par_reflectors), "other" is the control object that owns srf_screen_reflections
        surface_set_target(other.srf_screen_reflections); //switch to the dedicated view-sized reflections surface

            // Draw the puddle on the dedicated view-sized surface
            gpu_set_texfilter(true);
            draw_sprite_pos(                                    //draw alpha mask sprite with perspective parallax
                                sprite_index,
                                1,                //use mask image index
                                x-_viewX - (sprite_width/2),
                                y-_viewY,
                                x-_viewX + (sprite_width/2),
                                y-_viewY,

                                x-_viewX  +(sprite_width/2) - 80 + ((x-_viewX) / 3),    //using draw_sprite_pos
                                y-_viewY + sprite_height - 60 + ((y-_viewY) / 3),      //to do all this
                                x-_viewX - (sprite_width/2)- 80 + ((x-_viewX) / 3),    //perspective stuff
                                y-_viewY + sprite_height - 60 + ((y-_viewY) / 3),      //puddle object has similar draw_sprite_pos code
                                _fade                                                      //<-- fade
                                );      
            gpu_set_texfilter(false);          

            // Draw the puddle's reflection on the dedicated view-sized surface without replacing the alpha in there
            // This only draws the reflection where the puddle is.
            gpu_set_colorwriteenable(true, true, true, false);      //disable alpha channel drawing
            draw_surface_ext(                                 //draw surface but flipped on y
                            _srf_reflection,
                            x - _viewX - (sprite_width / 2),
                            y - _viewY + sprite_height,
                            1,
                            -1,
                            0,
                            c_white,
                            1.0
                            );
       
            //-------------------------------------------------------------------------
       
            //this is only for the floating light reflection
            // Using with automatically takes care of the if instance_exists check
            with (obj_light_orb) {
                //draw light flare reflection if available
                gpu_set_texfilter(true);
                draw_sprite_ext(
                                    spr_light_orb_flare,
                                    image_index,
                                    x - _viewX,    // inside with (obj_light_orb), x/y are this orb's
                                    y - _viewY,
                                    1.0,
                                    8.0,
                                    0,
                                    c_white,
                                    1.0
                                );
                gpu_set_texfilter(false);
            }
       
            //-------------------------------------------------------------------------
       
            //add specular reflections if a light object is above
            //(inside this nested with, "other" is the reflector instance)
            gpu_set_texfilter(true);
            with (par_lights) {
                if (x >= (other.x - (other.sprite_width/2))) && (x <= (other.x + (other.sprite_width/2))) && (y <= other.y) {
                    draw_sprite_ext(
                                        spr_light_orb_flare,
                                        other.image_index,
                                        x - _viewX,
                                        y - _viewY + (other.sprite_height / 2),
                                        1.0,
                                        12.0,
                                        0,
                                        image_blend,
                                        1.0
                                    );
                }
            }
            gpu_set_texfilter(false);
            //-------------------------------------------------------------------------

            gpu_set_colorwriteenable(true, true, true, true);         //re-enable alpha channel drawing
        surface_reset_target();

        // Free instance reflection surface
        surface_free(_srf_reflection);
    }
}
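The corrected visibility test at the top of the with loop can also be pulled out into a small helper, which makes it easy to check in isolation. rect_overlaps_view is a hypothetical name. Since the skewed puddle sprite and the area it reflects can extend past the instance's bounding box, you may want to pad the rectangle by a margin.

```gml
// Hypothetical helper: true when [_l, _r] x [_t, _b] overlaps the view
// at (_vx, _vy) with size _vw x _vh (touching edges count as overlapping).
function rect_overlaps_view(_l, _t, _r, _b, _vx, _vy, _vw, _vh) {
    return !((_l > _vx + _vw) || (_r < _vx) || (_t > _vy + _vh) || (_b < _vy));
}

// Inside with (par_reflectors):
// if (!rect_overlaps_view(bbox_left, bbox_top, bbox_right, bbox_bottom,
//                         _viewX, _viewY, _viewW, _viewH)) continue;
```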
 

muki

Member
@HalRiyami whoa, thanks so much! You spent a lot of time on this! I hope it didn't look like I was fishing for feedback (I wasn't), but I appreciate that you did. I'm an artist first and a programmer only second, so a lot of efficiencies escape me. Sorry about hijacking the thread, too. I wasn't thinking.

Your code might not work if I just copy and paste it, because my rendering control object has some odd requirements, but I should be able to borrow some snippets and ideas.

For example, I didn't know I could use with() without checking the instance exists first. The idea to use with() in place of a for loop didn't come to me either. I should definitely try that. How you wrapped almost everything in with() is really elegant!

Another thing I'm curious about is how surface_free will impact performance if I free and recreate 2-3 instance surfaces every frame. Initially I was only going to free them once they'd been far enough out of the view for more than a second, so it wouldn't constantly free and recreate surfaces. But I'll try benchmarking freeing every frame and see how that works. It would definitely make things easier!

The rendering control object needs to be refactored. There are a lot of inefficiencies in how it stores positions/scale and in how every instance in the game depends on it. Too many magic numbers as well. I started it in 2017, when I wasn't much of a GMS programmer. Just an art nerd with a vision. I've been putting off the refactoring for too long. At the same time, I'm looking forward to it, as tricky as it'll be.

Thanks again for your tweaks! I'll give you some credit if and when I actually release this (it'll be a free download).
 

GMWolf

aka fel666
For performance, I would keep it as a single fragment shader.
Ping-ponging full screen buffers is a great way to cripple your performance.


If statements like the ones you described are fine. You are right that since none of the lanes diverge (the conditions are uniforms, so every fragment takes the same branch), you won't pay a meaningful performance penalty for them.
 

muki

Member
GMWolf's probably right. I think the only reason I'm getting away with it with good performance is because my resolution is pretty small. (also, in a few cases, I can't really avoid it)

I'll definitely keep an eye on this thread in case more insight comes up on custom rendering pipelines and merging shaders.
 

HalRiyami

Member
@muki lol, no need for credit and no need to apologize. I was just trying to understand it and made the tweaks along the way. I edited the AABB collision check because it was wrong, so if you're going to use it, make sure to use the edited one. Also, I still think you should move the srf_reflection step outside the with loop, and if you don't want to create/free the surface every frame, you can store it somewhere persistent instead, which will be faster.
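One way to keep a surface around instead of freeing it every frame is to store it in an instance variable and only recreate it when it has been lost or has the wrong size. GameMaker can discard surfaces at any time (for example when the window loses focus), so the surface_exists check is still needed each frame. srf_ensure and srf_blurred are hypothetical names.

```gml
// Hypothetical helper: return a _w x _h surface, reusing _surf when it still
// exists with the right size instead of freeing/re-creating it every frame.
function srf_ensure(_surf, _w, _h) {
    if (surface_exists(_surf)) {
        if (surface_get_width(_surf) == _w && surface_get_height(_surf) == _h) {
            return _surf;
        }
        surface_free(_surf);    // wrong size: release it before making a new one
    }
    return surface_create(_w, _h);
}

// Possible use in the control object, before with (par_reflectors)
// (srf_blurred initialised to -1 in Create):
// srf_blurred = srf_ensure(srf_blurred, 480, 270);
// surface_set_target(srf_blurred);
// shader_set(shd_blur_gaussian);
// shader_set_uniform_f(uRESOLUTION, 480, 270);
// shader_set_uniform_f(uRADIUS, 0.5);
// draw_surface(_reflect_from, 0, 0);
// shader_reset();
// surface_reset_target();
// ...then each reflector reads other.srf_blurred with draw_surface_part.
```

Blurring the whole view once like this replaces one blur pass per visible puddle with a single pass, at the cost of blurring pixels no puddle ends up using.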

@GMWolf Thanks for confirming my suspicion. I guess I'll keep it in a single shader.
 

sp202

Member
Keep in mind that any effect that samples pixels other than the current one from an earlier effect's output will need its own pass outside the combined shader, so depending on what the effects are, you'll probably need to split it into a few shaders anyway.
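A rough way to apply that rule when splitting a stack: per-pixel effects can share a pass, while an effect that samples neighbouring pixels has to read a finished image, so it starts a new pass whenever earlier effects are already queued. Per-pixel effects after it can join its pass, since they only touch the pixel it just produced. pp_plan_passes is a hypothetical planning helper, not something GameMaker provides, and effects are described by structs with assumed name and needs_neighbors fields. Multi-pass effects like a two-pass bloom are outside what it models.

```gml
// Hypothetical helper: group effects (in order) into shader passes.
// Returns an array of passes, each an array of effect names.
function pp_plan_passes(_effects) {
    var _plan = [];
    var _current = [];
    for (var _i = 0; _i < array_length(_effects); _i++) {
        var _e = _effects[_i];
        // A neighbourhood effect must read a finished image, so close the
        // current pass first if anything is already queued in it
        if (_e.needs_neighbors && array_length(_current) > 0) {
            array_push(_plan, _current);
            _current = [];
        }
        array_push(_current, _e.name);
    }
    if (array_length(_current) > 0) array_push(_plan, _current);
    return _plan;
}
```

For example, color adjustment → blur → vignette → film grain plans to two passes: [color adjustment] and [blur, vignette, film grain].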
 

HalRiyami

Member
Do you mean something like bloom? My bloom is done in 2 passes: one to prepare the bloom surface and another to apply it, the latter of which is part of my post process stack. Most post processing effects are simple color changes like vignette, color adjustment, film grain, etc., and those don't require separate passes.
 