Customer Support
Hello!
First things first, I am in no way affiliated with the tech support team; I just use this name for fun.
Let's get to the good stuff:
So there have been several topics about using the application_surface (or surfaces in general) on Android and iOS, most of them saying that devices lag a lot when surfaces are used - and they were right! The lag is pretty unbearable.
There have been brilliant* suggestions such as limiting your game to 30FPS and a resolution of no more than 800x480 - good luck with that in 2016.
* Sarcasm
For a while we have all agreed that although we could use surfaces for awesome effects, nobody would use them because the game would just lag too much. Whooptie-freaking-doodle.
I even wrote a tutorial showing exactly how to scale your screen for different aspect ratios WITHOUT using the application_surface, just to get a game that is actually playable.
What was weird is that other engines did not suffer from this. Most of us can name plenty of smartphone games that obviously render to a texture for post-processing, yet for some reason doing the same in Game Maker kills the GPU.
I did some research.
As it turns out, the draw pipeline itself is fine; we are just missing a function. Specifically, we have no way to forcibly clear (discard) a surface's contents, which would spare the graphics hardware some nasty, time-consuming memory transfers. Most, if not all, other engines already have such a function.
This post is intended to be good news. I have filed a bug report (it hasn't gone through just yet, as it is still the weekend), and hopefully we'll soon have a fix that lets us use our smartphones' GPUs properly with GM.
Here's my full bug report: (This part will be deleted and replaced with a link to the report on Mantis once it has gone through)
TITLE
Missing Surface / Frame Buffer Object discard function causes crippling slowdown on almost all devices that are not high-end gaming PCs or current-gen game consoles
DESCRIPTION
I have discovered an issue in the Draw pipeline that not only affects almost every mobile device in existence, but also makes a big part of GM's rendering functionality useless on any device without a dedicated, powerful GPU (i.e. gaming PCs). This makes effects that other engines handle with ease impractical or impossible on devices such as Android handsets, iOS devices, the Xbox 360 and even most laptops.
This description is longer than usual, but I'll try to be as specific as possible to avoid the problem being incorrectly resolved as "No Change Required", as happened with:
http://bugs.yoyogames.com/view.php?id=3471
The bug above reported the same issue as this one. However, its example file seemed to be "broken" (it was missing surface memory checks), so the report was resolved on the assumption that the real problem was the missing surface_exists calls.
That, however, is not true.
Let us begin with what the above link mentioned as the main issue: [Surface performance on Android is slow].
This description, albeit correct, is not sufficient: the actual issue is neither the mere use of surfaces nor Android specifically. It is present in almost every Android device out there, in 100% of iOS devices (since they all use Imagination Technologies GPU cores) and even in some laptop IGPs, such as the Intel HD Graphics HDn000 series.
To be more specific, we need to talk about how GPUs work in systems with no dedicated high-speed video memory. Here, we'll look at Tile-based (TBDR, or Tile Based Deferred Rendering) GPU architectures. These are present in all iOS devices and most Android devices, and even some laptop GPUs implement a somewhat similar architecture.
TBDR GPUs, and those with similar architectures, run into trouble when rendering to different Frame Buffer Objects (FBOs) is managed incorrectly. Please notice that I did not say "incorrectly rendering", but incorrectly [managing] the rendering.
These GPUs split their memory into Slow / Normal memory and Fast memory. The Fast memory is where the actual rendering takes place. After rendering to an FBO, its contents are Resolved (transferred from the Fast to the Slow memory). When that FBO is re-bound, its contents are Restored (transferred from the Slow to the Fast memory).
This adds some overhead and is standard behaviour. However, you do NOT want unnecessary Resolves or Restores, since they WILL slow down the game on big FBOs (or, as we call them in GM:S, Surfaces).
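To make the Resolve / Restore cost concrete, here is a toy model (a sketch, not how any real driver is implemented). It assumes every bind Restores the whole FBO and every unbind Resolves it, with a 4-byte RGBA8 colour buffer and no depth/stencil buffer. The TiledGPU class and its methods are hypothetical names used only for illustration.

```python
# Toy model of a tile-based GPU's Resolve / Restore behaviour.
# Assumption: every bind Restores the whole FBO from slow memory and every
# unbind Resolves it back; real drivers work per tile and use heuristics.

BYTES_PER_PIXEL = 4  # assumed RGBA8 colour buffer, no depth/stencil


class TiledGPU:
    def __init__(self):
        self.bytes_restored = 0  # slow -> fast copies
        self.bytes_resolved = 0  # fast -> slow copies
        self.bound = None

    def bind(self, width, height):
        """Bind an FBO: its previous contents are Restored into fast memory."""
        self.bound = (width, height)
        self.bytes_restored += width * height * BYTES_PER_PIXEL

    def unbind(self):
        """Unbind the FBO: the rendered contents are Resolved to slow memory."""
        width, height = self.bound
        self.bytes_resolved += width * height * BYTES_PER_PIXEL
        self.bound = None


gpu = TiledGPU()
gpu.bind(1920, 1080)
# ... draw calls would happen here ...
gpu.unbind()
print(gpu.bytes_restored, gpu.bytes_resolved)
```

Even with no draw calls at all, one bind/unbind pair of a 1080p surface moves about 8.3 MB each way in this model.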
Let's recap what we know:
On Tile-based and similar GPUs, the graphics API Resolves an FBO after it has been unbound and Restores it when it is re-bound. This is very useful when you switch from drawing to a surface to drawing to the screen's Back buffer (main buffer) and then back to the surface later on, without losing the surface's contents (e.g. progressively rendering to a surface over more than one frame while still writing to the screen buffer).
THE PROBLEM that GM:S currently has, however, is that Resolve and Restore are done regardless of the use case.
A great example is the application_surface.
What currently happens is:
1) Pre-draw: the application_surface render target is bound; all further drawing happens on it.
2) ***The surface is then Restored by the graphics API.
3) The surface is then cleared with the background colour.
4) Draw events render everything to the application_surface.
5) The render target is reset and the application_surface is unbound; the GPU now renders to the back buffer directly.
6) ***The application_surface is now Resolved by the graphics API.
7) The back buffer is optionally cleared (if set), and the application_surface is rendered to it (if set).
The problems appear at the steps marked with ***. The Restore and Resolve happen because this approach is universal: if the user does not enable background drawing, the two steps make sense, since we probably want to keep drawing on top of what the surface held in the previous frame.
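The reasoning above can be checked mechanically. This sketch encodes the step sequence as hypothetical string labels and flags a Restore as wasted when the surface is cleared before anything is drawn on top of the old contents. It illustrates the argument only; it is not GM:S runner code.

```python
# Sketch: flag a Restore as wasted when the surface is fully cleared before
# anything draws on top of the restored contents. Step names are illustrative
# labels for the application_surface sequence described above.

CURRENT_FRAME = [
    "bind",     # application_surface set as render target
    "restore",  # *** slow -> fast copy done by the graphics API
    "clear",    # cleared with the background colour
    "draw",     # Draw events
    "unbind",   # render target reset
    "resolve",  # *** fast -> slow copy done by the graphics API
    "blit",     # application_surface drawn to the back buffer
]


def wasted_restores(steps):
    """Return indices of Restores whose data is overwritten by a clear before use."""
    wasted = []
    for i, step in enumerate(steps):
        if step != "restore":
            continue
        # Look at the first later step that touches the surface contents.
        for later in steps[i + 1:]:
            if later == "clear":
                wasted.append(i)
                break
            if later == "draw":
                break  # drawing on top of old contents: the Restore was needed
    return wasted


print(wasted_restores(CURRENT_FRAME))  # [1]
print(wasted_restores(["bind", "restore", "draw", "unbind", "resolve"]))  # []
```

The second call is the progressive-rendering case with no background clear, where keeping the Restore is the right call.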
However! In a lot of cases you do not need these steps. If, say, a game uses the application_surface and re-renders the whole scene every frame (which is what most games do anyway), we do NOT need to Restore the surface every frame, because it gets cleared right after re-binding anyway! Skipping that slow memory transfer saves a huge amount of time per frame. Other engines already do this and GM:S does NOT yet, which causes slowdown on pretty much anything that is not a desktop PC (which is bad. Very bad.).
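A rough estimate of what the two *** copies cost per second follows. It assumes an RGBA8 surface (4 bytes per pixel), no depth/stencil buffer, and that the whole surface is copied on every Restore and Resolve. The function name and parameters are illustrative.

```python
# Back-of-the-envelope memory traffic from the *** steps, assuming an RGBA8
# surface, no depth/stencil buffer, and a full-surface copy per Restore/Resolve.

def transfer_mb_per_second(width, height, fps, restore=True, resolve=True,
                           bytes_per_pixel=4):
    surface_bytes = width * height * bytes_per_pixel
    copies_per_frame = int(restore) + int(resolve)
    return surface_bytes * copies_per_frame * fps / 1_000_000


for w, h in [(1280, 720), (1920, 1080)]:
    both = transfer_mb_per_second(w, h, 60)
    resolve_only = transfer_mb_per_second(w, h, 60, restore=False)
    print(f"{w}x{h}: {both:.1f} MB/s with Restore, {resolve_only:.1f} MB/s without")
```

In this simplified model, skipping the Restore alone halves the traffic: a 1080p application_surface at 60 FPS goes from roughly 995 MB/s to roughly 498 MB/s of copies.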
The weird thing is that this issue has been around ever since the application_surface was introduced (v1.3!).
Please check the following relevant links for more information regarding this issue:
1) Same issue in Irrlicht: http://irrlicht.sourceforge.net/forum/viewtopic.php?f=4&t=49634
As pointed out in the following quote: "Is there a way to make sure that no data is ever copied back from fast video memory into slow memory after rendering a framebuffer?"
2) Native OpenGL application experiencing the same problem: http://stackoverflow.com/questions/...e-is-very-slow-using-opengl-es-2-0-on-android
Resolution to the problem pointed out in ZZZ's answer: "According to qualcomm docs, you need to glclear after every glbindframebuffer, this is a problem related to tiled architecture, if you are switching framebuffers, data need to get copied from fastmem to normal memory to save current framebuffer and from slowmem to fast mem to get contents of newly binded frame, in case you are clearing just after glbind no data is copied from slowmem to fastmem and you are saving time"
The GPU in question is tile-based, like most other mobile GPUs, as mentioned earlier.
3) RenderTextures (Unity's Surfaces / FBOs) causing exactly the same issue when badly managed: http://forum.unity3d.com/threads/drawing-to-render-texture-is-very-slow-on-android-and-ios.417007/
Resolution to the problem pointed out by jimCheng: "Maybe you should invoke RenderTexture.DiscardContents manually, otherwise you will get a worse performance on Tile-based deferred rendering (TBDR) GPU Architecture."
Let us look at point (3) above. Unity has a special function for its Surface equivalent (the RenderTexture) that lets the developer manually discard the contents when they know the surface does not need to be reloaded every frame (because it will be cleared anyway).
Specifically, let's look at what that function does for them:
"1. it marks appropriate render buffers (there are bools in there) to be forcibly cleared next time you activate the RenderTexture
2. IF you call it on active RT (render target) it will discard when you will set another RT as active."
Quoted from http://forum.unity3d.com/threads/where-to-call-rendertexture-discardcontents.215555/
This is what GM:S is missing. The draw pipeline itself is not broken in any way; it just lacks a simple function that calls glClear() when the developer does not need to keep the contents of a frame buffer (surface) for more than one frame.
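Calling glClear() right after binding an FBO drops its old contents, so the API does NOT copy them from slow to fast memory. Skipping the copy back to slow memory before unbinding is what the discard calls are for (glDiscardFramebufferEXT from the EXT_discard_framebuffer extension on OpenGL ES 2.0, or glInvalidateFramebuffer on OpenGL ES 3.0).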
The fix would be an equivalent function in GM. My proposal is something like:
surface_discard(surfaceID);
I) Marks the appropriate surface to be forcibly cleared next time it is set via surface_set_target, thus eliminating a useless Restore from slow memory in case the game does not require multiple-frame rendering to the same FBO (surface).
II) If called on a currently active surface (i.e. one that is on the surface target stack), it forcibly discards the contents once the render target is reset via surface_reset_target(), eliminating a useless Resolve to slow memory when the contents are not needed afterwards.
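The two rules above can be sketched as a small state machine. This is a hypothetical Python model: SurfaceStack and its methods only mirror the names of the GML functions and the proposed surface_discard(); they are not an existing GameMaker API. The model ignores nested targets and treats "currently active" as the top of the stack.

```python
# Hypothetical model of the proposed surface_discard() rules. Method names
# mirror GML functions for readability only; this is not a GameMaker API.
# Simplification: "currently active" means top of the target stack.

class SurfaceStack:
    def __init__(self):
        self.stack = []
        self.discard_on_set = set()    # rule I: clear instead of Restore on next set
        self.discard_on_reset = set()  # rule II: drop contents instead of Resolve
        self.restores = 0
        self.resolves = 0

    def surface_set_target(self, surf):
        self.stack.append(surf)
        if surf in self.discard_on_set:
            self.discard_on_set.remove(surf)  # forced clear: nothing copied in
        else:
            self.restores += 1                # slow -> fast copy

    def surface_reset_target(self):
        surf = self.stack.pop()
        if surf in self.discard_on_reset:
            self.discard_on_reset.remove(surf)  # contents dropped: nothing copied out
        else:
            self.resolves += 1                  # fast -> slow copy

    def surface_discard(self, surf):
        if self.stack and self.stack[-1] == surf:
            self.discard_on_reset.add(surf)  # rule II: surface is currently active
        else:
            self.discard_on_set.add(surf)    # rule I: applies on the next set


stack = SurfaceStack()
app = "application_surface"
for _ in range(60):
    stack.surface_discard(app)  # rule I: the whole scene is redrawn anyway
    stack.surface_set_target(app)
    # ... clear and draw the whole scene ...
    stack.surface_reset_target()
print(stack.restores, stack.resolves)  # 0 60
```

Discarding before surface_set_target removes the Restore every frame while keeping the Resolve, which is still needed because the application_surface is drawn to the back buffer afterwards.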
Such a function would solve the crippling slowdown that GM currently causes on 99.9% of mobile devices (and laptops with IGPs), finally allowing developers to actually use shaders and surfaces in their games.
This function should not be limited to the application_surface; it should be a universal function that works on any surface.
Also, the problem here is NOT the drawing of surfaces to the screen; that is handled like any normal texture drawing. The problem is how the binding / unbinding of surfaces is currently managed. This is bad practice on almost all GPUs, as the vendors themselves point out (please refer to link (2) again, where someone quotes Qualcomm's own documentation).
SAMPLE URL:
https://drive.google.com/open?id=0B3hrBcUluiccaGZOdGtqazJPbTQ
STEPS TO REPRODUCE:
Please try multiple devices: very fast ones such as the Nexus 5X do NOT (always) suffer from the FBO memory-transfer slowdown as much as lower-end devices do.
This should not affect the reproducibility rating, however: the issue is not that every phone drops from 60 FPS to 20, but that an FBO management function is missing, which cripples any phone without a high-end chipset.
Devices with a Snapdragon 400, 410, 600 or 615, or a ULP GeForce, combined with a 720p or 1080p display are good candidates for the test. They all run the demo at a full 60 FPS without rendering to a texture, and turn into a joke once rendering to a surface is enabled.
Example A) Full rendering
1) Run attached example on Android
2) Admire the pretty demo
3) Press Back on device to switch to texture render target (surface)
4) Notice the slowdown
Example B) Bind / Unbind slowdown test
This second test shows that the slowdown comes neither from rendering to surfaces nor from drawing surfaces to the screen as normal textures (if it did, simple sprite / background rendering would've killed the GPU).
1) Open the attached example and change the room order so that the game starts in rm_test2
2) You should get a black screen with nothing rendered to it other than some text
3) Press Back on device to activate the surface target
4) Notice slowdown
The second test does not even clear the surface; it just binds and unbinds it. That alone is enough to cause the slowdown, which further demonstrates that the problem is the missing function for discarding surface contents, as described in the links and paragraphs above.