
Android Bug since v1.3 causes nearly all Android devices (and all iOS ones) to lag with surfaces

Hello!
First things first, I am in no way affiliated with the tech support team, I just use this name for fun.

Let's get to the good stuff:
So there have been topics regarding the usage of the application_surface (or surfaces in general) on Android and iOS, most of them saying that when using them the devices will lag a lot - and they were right! The lag is pretty unbearable.

There have been brilliant* suggestions such as limiting your game to 30FPS and a resolution of no more than 800x480 - good luck with that in 2016.
* Sarcasm
For a while we have all agreed upon the fact that although we could use surfaces for awesome effects, nobody will use them because the game will just lag too much. Whooptie-freaking-doodle.
Even I wrote a tutorial showing how exactly you could scale your screen for different aspect ratios WITHOUT using the application_surface, thus getting a game that is actually playable.
What was weird is that other engines did not suffer from this. Most of us can name quite a bunch of games that run on our smartphones and obviously render to a texture for post-processing, but for some reason if you do that in Game Maker it kills the GPU.

I did some research.
As it turns out, there isn't a problem in the draw pipeline itself, but we are missing a function. We're missing something that will clear surfaces forcibly and thus free the graphics hardware from doing some nasty, time-consuming memory transfers - which is what most if not all other engines already have.

This post is intended to be good news. I have filed a bug report (it hasn't gone through just yet, as it is still the weekend) and hopefully soon enough we'll have a fix for it and be able to use the GPUs in our smartphones correctly with GM.

Here's my full bug report: (This part will be deleted and replaced with a link to the report on Mantis once it has gone through)
TITLE

Surface / Frame Buffer Object missing function causes major crippling slowdown in almost all devices that are not high-end gaming PCs or current-gen game consoles

DESCRIPTION

I have discovered an issue in the draw pipeline that not only affects almost every mobile device in existence, but also renders a big part of GM's rendering functionality useless on any device without a powerful dedicated GPU (see: gaming PCs). This makes it impractical / impossible to do effects that other engines do with ease on devices such as Android handsets, iOS devices, the Xbox 360 and even most laptops.

This is a bit of a longer description than usual, but I'll try to be as specific as possible so as to avoid the problem being incorrectly resolved as No Change Required, as per:
http://bugs.yoyogames.com/view.php?id=3471
The above-mentioned bug reported the same issue as this one. However, its example file seemed to be "broken" (missing surface memory checks), so the report was resolved on the belief that the problem itself was the missing surface_exists calls.
That, however, is not true.

Let us begin with what the above link mentioned as the main issue: [Surface performance on Android is slow].
This description, albeit correct, is not sufficient: The actual issue is neither the simple Usage of surfaces, nor the fact that it is on Android. This issue is actually present in almost every Android device out there, in 100% of iOS devices that exist (since they all use Imagination Technologies GPU cores) and even some laptop IGPs, such as the Intel HD Graphics HDn000 series.

To be more specific, we need to talk about how GPUs in systems with no dedicated high-speed video memory work. Here, we'll look at tile-based (TBDR, or Tile-Based Deferred Rendering) GPU architectures. These are present in all iOS devices and most Android devices, and even some laptop GPUs implement a somewhat similar architecture.
TBDR GPUs or the ones with similar architectures experience an issue when incorrectly managing rendering to different Frame Buffer Objects (FBOs). Please notice that I did not say "incorrectly rendering", but incorrectly [managing] the rendering.
These GPUs have their memory segmented into Slow / Normal memory and Fast memory. The Fast memory is where the actual rendering takes place. Once an FBO has been rendered to, its memory will be Resolved (transferred from the Fast to the Slow memory). When that FBO is re-bound, its memory will be Restored (transferred from the Slow to the Fast memory).
This adds some overhead and it is standard behaviour. However, in some cases, you do NOT want to have unnecessary Resolve or Restores, since they WILL slow down the game on big FBOs (or as we call them in GM:S, Surfaces).
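To put a rough number on that overhead, here is a back-of-the-envelope sketch. It assumes a plain 4-bytes-per-pixel colour buffer and exactly one Restore plus one Resolve per frame; depth / stencil buffers, driver compression and any padding are ignored, and the function names are mine, not part of any API.

```cpp
#include <cstdint>

// Bytes occupied by one colour buffer, assuming 4 bytes per pixel (RGBA8).
constexpr std::uint64_t surface_bytes(std::uint64_t width, std::uint64_t height) {
    return width * height * 4;
}

// Bytes moved between Fast and Slow memory per second when every frame
// performs one Restore (Slow -> Fast) and one Resolve (Fast -> Slow).
constexpr std::uint64_t transfer_bytes_per_second(std::uint64_t width,
                                                  std::uint64_t height,
                                                  std::uint64_t fps) {
    return 2 * surface_bytes(width, height) * fps;
}
```

For a 1920x1080 surface at 60 FPS, transfer_bytes_per_second(1920, 1080, 60) gives 995,328,000 bytes, so close to 1 GB of memory traffic every second before anything is actually drawn.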

Let's recap what we know:
On Tile-based and similar GPUs, the graphics API will Resolve an FBO after it has been unbound and Restore it when it is re-bound. This is very useful when you are switching from drawing to a surface to the screen Back buffer (main buffer) and then back again to the surface later on, without losing the contents of the surface (e.g. progressively rendering to a surface for more than just one frame, while still writing to the screen buffer).

THE PROBLEM that GM:S currently has, however, is that Resolve and Restore are done regardless of the use case.
A great example is the application_surface.
What currently happens is:
1) Pre-draw: the application_surface render target is bound; all further drawing will happen on it.
2) ***The surface is then Restored by the graphics API.
3) The surface is then cleared with the background colour.
4) Draw events render everything to the application_surface.
5) The render target is reset and application_surface is unbound; the GPU will now render to the back-buffer directly.
6) ***application_surface is now Resolved by the graphics API.
7) The back-buffer is optionally cleared (if set), and application_surface is then rendered to it (if set).
The problems appear at the steps labeled with ***. The Restore and Resolve are here because this approach is universal. If the user does not set any background drawing to happen, the 2 steps make sense, because we probably want to continue drawing to what we had on the surface in the previous frame.
However! In a lot of cases, you do not need these steps. If, let's say, we have a game where we use the application_surface and we re-render the whole scene every frame (which is what most games do anyway), we do NOT need to restore the surface every frame (because we will clear it anyway after re-binding!). This saves a huge amount of processing time per frame, since now a slow memory transfer would not be done. This is what other engines already do, and what GM:S does NOT do yet, thus causing slowdown on most of anything that is not a desktop PC (which is bad. very bad.).
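To make the saving concrete, here is a toy model of one render target. It is only a sketch and not how any real driver is written: I am assuming the driver defers the Restore until the first draw after a bind, and that a full clear right after the bind cancels it (which is what the Qualcomm advice quoted further down implies). ToySurface and run_frames are made-up names.

```cpp
// Toy model of one render target on a tile-based GPU (not a real driver).
struct ToySurface {
    bool needs_restore = false;  // old contents sit in Slow memory and may still be wanted
    int restores = 0;            // Slow -> Fast copies performed
    int resolves = 0;            // Fast -> Slow copies performed

    void bind()   { needs_restore = true; }   // old contents might be drawn over
    void clear()  { needs_restore = false; }  // a full clear makes old contents irrelevant
    void draw() {                             // the first draw forces any pending copy
        if (needs_restore) { ++restores; needs_restore = false; }
    }
    void unbind() { ++resolves; }             // contents are written back to Slow memory
};

// Runs `frames` frames of bind -> (optional clear) -> draw -> unbind.
inline ToySurface run_frames(int frames, bool clear_first) {
    ToySurface s;
    for (int i = 0; i < frames; ++i) {
        s.bind();
        if (clear_first) s.clear();
        s.draw();
        s.unbind();
    }
    return s;
}
```

In this model, one second at 60 FPS costs 60 Restores without the clear and none with it; the Resolve count stays at 60 either way, because the result still has to be written out to be drawn to the back-buffer.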

The weird thing is that this issue has been around ever since the application_surface was introduced (v1.3!).

Please check the following relevant links for more information regarding this issue:
1) Same issue in IrrLicht: http://irrlicht.sourceforge.net/forum/viewtopic.php?f=4&t=49634
As pointed out in the following quote: "Is there a way to make sure that no data is ever copied back from fast video memory into slow memory after rendering a framebuffer?"

2) Native OpenGL application experiencing the same problem: http://stackoverflow.com/questions/...e-is-very-slow-using-opengl-es-2-0-on-android
Resolution to the problem pointed out in ZZZ's answer: "According to qualcomm docs, you need to glclear after every glbindframebuffer, this is a problem related to tiled architecture, if you are switching framebuffers, data need to get copied from fastmem to normal memory to save current framebuffer and from slowmem to fast mem to get contents of newly binded frame, in case you are clearing just after glbind no data is copied from slowmem to fastmem and you are saving time"
The GPU in question is a tile-based one, as are most others in mobile devices, as previously mentioned.

3) renderTextures (Surfaces / FBOs) in Unity causing exactly the same issue if badly managed: http://forum.unity3d.com/threads/drawing-to-render-texture-is-very-slow-on-android-and-ios.417007/
Resolution to the problem pointed out by jimCheng: "Maybe you should invoke RenderTexture.DiscardContents manually, otherwise you will get a worse performance on Tile-based deferred rendering (TBDR) GPU Architecture."

Let us look at point (3) above. In Unity, they have a special function for their Surface equivalents (renderTexture for them), where the contents can be manually discarded if the developer knows that the surface will not need to be reloaded every frame (if they are going to clear it anyway).
Specifically, let's look at what that function does for them:
"1. it marks appropriate render buffers (there are bools in there) to be forcibly cleared next time you activate the RenderTexture
2. IF you call it on active RT (render target) it will discard when you will set another RT as active."
Quoted from http://forum.unity3d.com/threads/where-to-call-rendertexture-discardcontents.215555/

This is what GM:S is missing. The draw pipeline itself is not broken in any way; it just lacks a simple function that would call glClear() when the developer does not need to keep the contents of a frame buffer (surface) for more than one frame.
What glClear() does here is drop the contents of an FBO, so that the API does NOT copy it over to / from the slow memory, depending on when it has been cleared (after binding or before unbinding).

The way to fix this would be to have an equivalent function for GM. My proposition is something like:
surface_discard(surfaceID);
I) Marks the appropriate surface to be forcibly cleared next time it is set via surface_set_target, thus eliminating a useless Restore from slow memory in case the game does not require multiple-frame rendering to the same FBO (surface).
II) If called on a currently active surface (if called while we have a surface on the surface target stack), it will forcibly discard the contents once the render target is reset via surface_reset_target(), thus eliminating a useless Resolve to slow memory in case this is for some reason not needed.
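Rules I) and II) can be sketched as a small state model. Everything here is hypothetical: surface_discard does not exist in GM:S, and the C++ functions below only borrow the GML names to show the intended bookkeeping, with Restores and Resolves counted instead of performed.

```cpp
// Hypothetical model of the proposed surface_discard(); none of this is GM:S API.
struct ModelSurface {
    bool bound = false;
    bool discard_on_bind = false;    // rule I: skip the next Restore
    bool discard_on_unbind = false;  // rule II: skip the next Resolve
    int restores = 0;                // Slow -> Fast copies that would happen
    int resolves = 0;                // Fast -> Slow copies that would happen
};

inline void surface_discard(ModelSurface& s) {
    if (s.bound) s.discard_on_unbind = true;  // called while the surface is the target
    else         s.discard_on_bind = true;    // called before the surface is next set
}

inline void surface_set_target(ModelSurface& s) {
    s.bound = true;
    if (s.discard_on_bind) s.discard_on_bind = false;  // contents dropped, nothing copied
    else ++s.restores;
}

inline void surface_reset_target(ModelSurface& s) {
    s.bound = false;
    if (s.discard_on_unbind) s.discard_on_unbind = false;  // contents dropped, nothing copied
    else ++s.resolves;
}
```

Both flags are one-shot in this sketch, so a game that wants to skip the copy every frame would call surface_discard every frame. That keeps the default behaviour (contents preserved) unchanged for existing projects.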

Such a function would solve the crippling slowdown that GM currently causes on 99.9% of mobile devices (and laptops with IGPs), finally allowing developers to actually be able to use shaders and surfaces in their games.
This function would not need to be for just the application_surface, but it should rather be a universal function for all surfaces.
Also, the problem here is NOT the drawing of surfaces on the screen. That is handled like any normal texture drawing. The problem is the current management of binding / unbinding the textures. It is bad practice on almost all GPUs, as pointed out by the OEMs themselves (please refer to link (2) again, where someone quotes actual Qualcomm Adreno datasheets).

SAMPLE URL:
https://drive.google.com/open?id=0B3hrBcUluiccaGZOdGtqazJPbTQ

STEPS TO REPRODUCE:

Please try multiple devices, as very fast ones such as the Nexus 5X do NOT (always) experience the same slowdown caused by FBO memory transfers as much as lower-end devices.
This should not affect the reproducibility rating, however, since the issue is not that all phones reach 20 FPS from 60, but rather, that an FBO management function is missing, which cripples any phone that does not have a high-end chipset.
Devices running the Snapdragon 400, 410, 600, 615, ULP GeForce + 720p or 1080p displays are good candidates for the test. They all run the demo at full 60FPS without texture rendering, and turn into a joke when enabling rendering to a surface.
Example A) Full rendering
1) Run attached example on Android
2) Admire the pretty demo
3) Press Back on device to switch to texture render target (surface)
4) Notice the slowdown

Example B) Bind / Unbind slowdown test
This second test shows that the slowdown does not occur from rendering to surfaces nor from rendering surfaces to the screen as normal textures (if that had been the case, simple sprite / background rendering would've killed the GPU).
1) Open attached example, switch the room order so that the game starts up in rm_test2
2) You should get a black screen with nothing rendered to it other than some text
3) Press Back on device to activate the surface target
4) Notice slowdown
The 2nd test does not even clear the surface. It just binds and unbinds it. This is enough to still cause the slowdown, thus further demonstrating the issue with the missing function for discarding the contents as pointed out in the previous links and paragraphs.
 

kurtwaldo

Guest
Very thoughtful and thorough investigation of the error. We need a lot more of this around here rather than "my game is slow WTF?!?". Inspiring. Anywho, I hope this gets some attention as I too have found myself disabling application surfaces altogether. I released a basic game (in terms of graphical demands) recently that used the application surface, and disabled it if it caused the FPS to drop below 50....and I also added analytics hooks to monitor how often this happened. Turns out, it happened A LOT just as you say. Only high-end devices were able to run the game at >50fps. And at a resolution of 1280x720, I was really hoping for better. I really hope I can start utilizing surfaces soon!!!
 

tacha

Guest
Yeah, very thoughtful and thorough investigation of the error!

"Even I wrote a tutorial showing how exactly you could scale your screen for different aspect ratios WITHOUT using the application_surface, thus getting a game that is actually playable."
Would you be able to post a link to that tutorial? I'm doing an Android game and I have seen some problems with the speed. I'm pretty new to coding, so it would be very helpful to know how to avoid this issue.

Thanks in advance!
 

Drewster

Guest
We've all seen these surfaces-related slowdowns, particularly with Android -- I've done the same as Kurtwaldo -- monitoring and disabling as needed -- in one of my games.

Keep us updated on your bug link.
 

deni

Guest
Hi, sorry for bumping this older thread, but did YoYo solve this problem? Did they even answer you?
 

Stuart

Guest
I'm developing a game that has unlockable palette swaps (like downwell) and am using draw_surface and application_surface to redraw the surface with the selected palette. I'm using Pixelated_Pope's Retro Palette Swapper Shader https://forum.yoyogames.com/index.php?threads/retro-palette-swapper.7498/#post-130650. Here's a video that shows the use:
[embedded video]

Has there been any update on this, or can anyone suggest an alternative method of swapping palettes? I'd really like to keep this as an unlockable feature in my game, and it seems a real shame to lose it for Android after the work that has gone into implementing it.
 

zbox

Member
GMC Elder
Well this is a nice writeup indeed glad I've seen it. Shame nothing will ever be done though, surfaces are f i n g useless on mobile
 

jva

Guest
Hi, Thanks to the original poster for the research into this.
I have found a way to manually input glclear by modifying build cache files. Here is how (on android):

1. Create a script that has draw_clear_alpha(c_black,0.0) in it. Use this script in a place you want the glclear to occur. Now build and run your project (You will need to use YYC).

2. Go to your cache folder (Asset Cache Directory in GM preferences) and find your clear script as a .cpp file.
Open the file and comment out everything inside the brackets. Replace with:
Code:
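_result = 0;
// Disable the scissor test so the clear covers the whole render target
glDisable(GL_SCISSOR_TEST);
// Enable writes to colour, depth and stencil so every buffer really gets cleared
glColorMask(GL_TRUE,GL_TRUE,GL_TRUE,GL_TRUE);
glDepthMask(GL_TRUE);
// The stencil mask is a bit field, so GL_TRUE (1) would only enable the lowest bit
glStencilMask(0xFFFFFFFF);
// Clear everything to transparent black
glClearColor(0.0,0.0,0.0,0.0);
glClear(GL_COLOR_BUFFER_BIT|GL_DEPTH_BUFFER_BIT|GL_STENCIL_BUFFER_BIT);
return _result;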
Add one more line to the top part of the .cpp script and save the file:
Code:
#include "gl.h"
3. Save https://www.khronos.org/registry/OpenGL/api/GLES/gl.h to the same folder.

4. Make the .cpp file and gl.h file READ ONLY so that they will not be replaced/deleted.

5. Build and run your game!

Let me know how it works, I'm seeing a 20-40% increase in FPS.
From what I have read, another important optimization closely related to this would be to use glDiscardFramebufferEXT (in glext.h) to hint the driver what buffers are no longer needed (Depth buffer after drawing for example).
 

zbox

Member
GMC Elder
Wow... That's super cool if true I'd love to test it out. Question being, if a 40% increase is possible why haven't YYG implemented the seemingly simple solution?
 

gnysek

Member
Maybe they haven't yet found out that there is an error in their engine. They have had bugs before with blending modes / alpha tests / linear interpolation, where if you didn't repeat the "enable" function each step, it was disabled by the engine at the end of the frame pipeline.
If it really improves things, try reporting it through the helpdesk; maybe they will finally find that one line which is wrong and speed things up :)
 

jva

Guest
I traced the OpenGL commands with the Mali Graphics Debugger and found that:
1. GM clears with glClear but the scissor test is enabled (slow)
2. All surfaces are always created to power of 2 sizes, so with a 1280x720 screen you will have a 2048x1024 application surface. 4096x2048 application surface on a 2560x1536 resolution device! Not sure, but I don't think power of 2 sizes are needed from a compatibility standpoint.
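To show how much memory that padding costs, here is a small sketch. It assumes each side is rounded up independently to the next power of two, which matches the sizes I saw in the trace; next_pow2 and padding_ratio are my own helper names.

```cpp
#include <cstdint>

// Smallest power of two that is >= n (valid for 1 <= n <= 2^31).
inline std::uint32_t next_pow2(std::uint32_t n) {
    std::uint32_t p = 1;
    while (p < n) p <<= 1;
    return p;
}

// Ratio of allocated pixels to visible pixels when both sides are padded
// up to a power of two.
inline double padding_ratio(std::uint32_t w, std::uint32_t h) {
    return static_cast<double>(next_pow2(w)) * next_pow2(h) /
           (static_cast<double>(w) * h);
}
```

padding_ratio(1280, 720) comes out at about 2.28 and padding_ratio(2560, 1536) at about 2.13, so in both cases the surface holds more than twice the pixels that are actually shown, and any Resolve / Restore has that much more data to move.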

Reported these to YYG, hope they fix these soon as they should be quick and easy fixes on their side.

One more thing would be to get glDiscardFramebufferEXT, but let's see what happens with the first two.
 

rwkay

GameMaker Staff
GameMaker Dev.
Thank you for the help...

we had not realised that the scissor test was on during the clear but now you have pointed it out, it's obvious (amazing how fresh eyes can see the answer when we have looked so long at that code)...

anyway thanks to this insight we now see the slowdown disappear on the affected platforms (mainly Android based, but the fix will be across OpenGL platforms).

this should be in the next public builds (on both GMS2 and GMS1)

Russell
 

rwkay

GameMaker Staff
GameMaker Dev.
We have been working on this for quite some time (since application_surface was added), and we have been aware of it (I have had several discussions with driver teams in various companies trying to get to the bottom of it) but no one had pointed out the (now) obvious.

The fix should help across the board on multiple platforms.

The issue with swapping surface render targets is that if you are doing a clear first then the read back from memory will not need to happen now - if you do not do the clear then it will still have to do the read back to maintain the integrity of the surface... I will see if we can do some more tests on that scenario to see if we can make that faster.

OK... the bit below I will leave in because it shows how stupid I can be... (inside the spoiler)

As long as you do...

Code:
surface_set_target( blah );
   draw_clear( c_white );   // important that first thing done is the clear

   // important that you should do your rendering to the surface after the surface has been cleared
   // do rendering here...

surface_reset_target();
This will stop the surface from being read back from memory (as it has been cleared so the driver knows that the read back is not required).

Hmmm just been looking at our surface functions and we don't have an explicit surface_clear which is what you would need to do in that scenario so I doubt that the current fix would affect that... I will discuss internally.

Actually it would be better to have a parameter to surface_set_target which allowed you to say just clear the contents, so the select itself would not incur anything.

Russell
 

jva

Guest
Good to hear glclear is now working, glad I could help!

The non-power of two surfaces seem tricky and might lead to compatibility problems, but glDiscardFramebufferEXT() is still needed to get the best performance:
 

rwkay

GameMaker Staff
GameMaker Dev.
Ah yes... we fixed the non-power of two issue at the same time and will discuss the issue with glDiscardFramebufferEXT - I suspect this will be down to setting the surface properly (i.e. if interested in keeping the depth / stencil portion)

Russell
 

gnysek

Member
A LOT of improvements. I hope that we don't need to wait 4-5 months to get it in 1.4 (finally, as the stable one hasn't been updated for 8 months!).
 

acidemic

Member
It looks as if this problem was fixed in the 1.99.551 Early Access build, but the fix seems to have introduced a new bug in the function "sprite_create_from_surface", which now creates a distorted image when it is drawn back to the screen. The problem appears only on Android (Windows 7 and 10 are OK). Other platforms were not tested.
 