
Question - IDE Very low FPS on another computer - Help !

izark

Guest
Hi there. I hope you can help me with this; I really don't know what the problem is.

My project runs at 60 fps on laptop A (Windows 7).
Now I have a new laptop (i7-8750, RTX 2060, Windows 10), and the game runs at 45 fps on it.

I changed the sleep margin from 10 to 5, 1, 20... Nothing.
I turned vsync on and off, and nothing changed.

The code is not the problem: I get 2500+ fps with the debugger on my old computer.
This also happens with another, unrelated project of mine.
I installed the latest release of GM2, then an older one, with no change.
So, are projects broken on Windows 10?

What could be happening? Other games run fine. Could you help?
 
Lonewolff

Guest
45 could be the refresh rate of the new computer.

You can test this by setting display_reset(0,0) in your create event. If it suddenly speeds up, then you have your answer. :)
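
A minimal sketch of that test, assuming it goes in the Create event of an object in the first room. display_get_frequency() reports the refresh rate of the display, and the second argument of display_reset() switches vsync off; the message text is only illustrative.

```gml
/// Create event (sketch)
// Log the refresh rate the display reports, to compare against the 45 fps figure.
show_debug_message("Display refresh rate: " + string(display_get_frequency()) + " Hz");

// First argument: anti-aliasing level (0 = off). Second argument: vsync (false = off).
display_reset(0, false);
```

If the logged refresh rate is 60 or higher and the game still sits at 45 fps with vsync off, the display refresh rate is probably not the limiting factor.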
 
izark

Guest
No, it still runs at 44-45 fps. But thanks anyway.
 
MishMash

Guest
Could be lots of things, not enough information to really say much other than complete shots in the dark.


- Latest drivers? Are you 100% sure that your RTX 2060 is actually running, and that the game is running on it? You can verify this in Task Manager by enabling the GPU column on processes, and by dumping the results of "dxdiag" (Win + R).
- How are you measuring FPS? The GM "fps" variable? What about fps_real? Could be CPU related.
- Are you building the game on the new laptop, or transferring a compiled exe? (Want to ensure an apples-to-apples comparison.)
- Has your laptop got any of the following:
A) Special battery saver software (often comes installed on Acer computers and is a pile of crap)
B) Anti-virus software, e.g. Norton or McAfee, which could be interfering? (Try disabling them!)
C) Different power configuration profiles -- always make sure it's set to high performance when verifying games.

- What about a blank GM project?

"The code is not the problem"
Don't necessarily assume this; you could be doing something crazy. Some operations can suddenly become very expensive on certain platforms due to emulation. (I recently had a run-in with a get_time_of_day function on Android that took 8 ms per call. Crazy, right? It had no "last time" saved, so it did a network lookup to get the result and hung the program in the meantime.) Run the profiler and see what the CPU workload is like.
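
For the fps versus fps_real question above, here is a small sketch that shows both values on screen at once. It assumes a Draw GUI event in some debug object (the object itself is hypothetical); fps and fps_real are the built-in read-only variables.

```gml
/// Draw GUI event of a hypothetical debug object
// fps      : frames per second the game is actually delivering (capped by the game speed).
// fps_real : roughly how many steps per second the CPU side could manage if uncapped.
draw_text(8, 8,  "fps: "      + string(fps));
draw_text(8, 24, "fps_real: " + string(fps_real));
```

A low fps together with a high fps_real suggests the CPU-side game code is not the bottleneck, which points toward the GPU, vsync or frame pacing instead.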
 
izark

Guest
- Yeah, the latest drivers are installed and working.
- fps are 44-45. Real fps are over 700.
- I used an exported version of the game. I also used a copy of the folder where the game is. Both times the fps were poor. On the old computer there were no performance issues at all, and execution of the code was very light.
- There is no battery saver software.
- I disabled Avast, but it's the same.
- Yeah, I set the battery to high performance and activated Game Mode on W10. (This is the first time I have used W10, so I don't really know much about all the options.)
- I created an executable of the game and I get the same 44 fps.
- When running the game, GPU usage is 98-100%, which is too much. CPU usage is 0.1-0.5%.

Make sure your game is actually using your fancy GPU and not rendering the graphics on the CPU's onboard graphics:

https://www.howtogeek.com/351522/how-to-choose-which-gpu-a-game-uses-on-windows-10/
Thanks, but I tried this with GM2 and the fps don't change. When I tried it on an executable of the game, the fps went to 144, even with vsync on. Strange.
 
Lonewolff

Guest
- fps are 44-45. Real fps are over 700.
That seems extremely low for fps_real. If the RTX-2060 was really handling the graphics, I'd expect beyond 10,000 FPS on an empty project.

My GTX-1050 which is a third of the speed gets ~6000 - 7000 FPS on an empty project.
 
izark

Guest
Thanks for the info.
I get 50 fps on an empty project (room_speed set to 60) and 1500-2500 real fps on an empty project, which is much lower than expected.
 

Hyomoto

Member
I'm trying to rack my brain, there was a really old bug that was supposedly fixed with the sleep timer that would cause behaviors like this. I also used to have this kind of an issue. The symptoms were that it would always run under the game speed. So if you reduced your game speed to 45, it would run at, say, 37 instead. It seems like you might be experiencing something like that here, though other than adjusting the sleep timer I don't remember if there were any other fixes.
 
izark

Guest
Thanks for the help but, in the end, the problem is half solved:

- When the game is in full screen and uses a certain shader (on the application surface) in the Draw GUI event, fps drop to 45.
- When windowed, using that same shader, fps go back to 60.
- If I stop using the shader, fps are always 60, windowed or full screen.

So it seems the shader performs very badly in full screen, but this didn't happen on my old computer. I don't know the reason.
Why is this shader causing the fps drop?
The resolution of the game is 320 x 180.

Code:
varying vec2 v_vTexcoord;
varying vec4 v_vColour;

//uniform float iGlobalTime;
uniform vec2 iResolution;
//uniform float v_shape;
//uniform float t2;
//uniform float t3;

// Emulated input resolution.
// Fix resolution to set amount.
#define resX 321.0
#define resY 181.0
const vec2 res = vec2(resX, resY);

// Hardness of scanline.
//  -8.0 = soft
// -16.0 = medium
#define hardScan -16. //18

// Hardness of pixels in scanline.
// -2.0 = soft
// -4.0 = hard
#define hardPix -7. //-8.2.

// Hardness of short vertical bloom.
//  -1.0 = wide to the point of clipping (bad)
//  -1.5 = wide
//  -4.0 = not very wide at all
#define hardBloomScan -1.45 // -2

// Hardness of short horizontal bloom.
//  -0.5 = wide to the point of clipping (bad)
//  -1.0 = wide
//  -2.0 = not very wide at all
#define hardBloomPix -1.15 // -1.5

// Amount of small bloom effect.
//  1.0/1.0 = only bloom
//  1.0/16.0 = what I think is a good amount of small bloom
//  0.0     = no bloom
#define bloomAmount 1.0/16.0

vec2 warp=vec2(1.0/64.,1.0/24.);  //(1.0/64.0,1.0/24.0);

// Amount of shadow mask.
#define maskDark 1.2
#define maskLight 1.2

//------------------------------------------------------------------------

vec2 Warp(vec2 pos){
  pos=pos*2.0-1.0;
  pos*=vec2(1.0+(pos.y*pos.y)*warp.x,1.0+(pos.x*pos.x)*warp.y);
  return pos*0.5+0.5;}
 
// sRGB to Linear.
// Assuming sRGB-typed textures are used, this should not be needed.
float ToLinear1(float c){return(c<=0.04045)?c/12.92:pow((c+0.055)/1.055,2.4);}
vec3 ToLinear(vec3 c){return vec3(ToLinear1(c.r),ToLinear1(c.g),ToLinear1(c.b));}

// Linear to sRGB.
// Assuming sRGB-typed textures are used, this should not be needed.
float ToSrgb1(float c){return(c<0.0031308?c*12.92:1.055*pow(c,0.41666)-0.055);}
vec3 ToSrgb(vec3 c){return vec3(ToSrgb1(c.r),ToSrgb1(c.g),ToSrgb1(c.b));}

// Nearest emulated sample given floating point position and texel offset.
// Also zero's off screen.
const vec3 black = vec3(0.0,0.0,0.0);
vec3 Fetch(vec2 pos,vec2 off){
  pos=floor(pos*res+off)/res;
  if(max(abs(pos.x-0.5),abs(pos.y-0.5))>0.5)return black;
  return ToLinear(texture2D(gm_BaseTexture,pos.xy,-16.0).rgb);}

// Distance in emulated pixels to nearest texel.
vec2 Dist(vec2 pos) {return -(fract(pos*res)-vec2(0.5));}
 
// 1D Gaussian.
float shape=2.; //2.25
float Gaus(float pos,float scale){return exp2(scale*pow(abs(pos),shape));}
//float Gaus(float pos,float scale){return exp2(scale*pos*pos);}

// 3-tap Gaussian filter along horz line.
vec3 Horz3(vec2 pos,float off){
  vec3 b=Fetch(pos,vec2(-1.0,off));
  vec3 c=Fetch(pos,vec2( 0.0,off));
  vec3 d=Fetch(pos,vec2( 1.0,off));
  float dst=Dist(pos).x;
  // Convert distance to weight.
  float scale=hardPix;
  float wb=Gaus(dst-1.0,scale);
  float wc=Gaus(dst+0.0,scale);
  float wd=Gaus(dst+1.0,scale);
  // Return filtered sample.
  return (b*wb+c*wc+d*wd)/(wb+wc+wd);}

// 5-tap Gaussian filter along horz line.
vec3 Horz5(vec2 pos,float off){
  vec3 a=Fetch(pos,vec2(-2.0,off));
  vec3 b=Fetch(pos,vec2(-1.0,off));
  vec3 c=Fetch(pos,vec2( 0.0,off));
  vec3 d=Fetch(pos,vec2( 1.0,off));
  vec3 e=Fetch(pos,vec2( 2.0,off));
  float dst=Dist(pos).x;
  // Convert distance to weight.
  float scale=hardPix;
  float wa=Gaus(dst-2.0,scale);
  float wb=Gaus(dst-1.0,scale);
  float wc=Gaus(dst+0.0,scale);
  float wd=Gaus(dst+1.0,scale);
  float we=Gaus(dst+2.0,scale);
  // Return filtered sample.
  return (a*wa+b*wb+c*wc+d*wd+e*we)/(wa+wb+wc+wd+we);}

// 7-tap Gaussian filter along horz line.
vec3 Horz7(vec2 pos,float off){
  vec3 a=Fetch(pos,vec2(-3.0,off));
  vec3 b=Fetch(pos,vec2(-2.0,off));
  vec3 c=Fetch(pos,vec2(-1.0,off));
  vec3 d=Fetch(pos,vec2( 0.0,off));
  vec3 e=Fetch(pos,vec2( 1.0,off));
  vec3 f=Fetch(pos,vec2( 2.0,off));
  vec3 g=Fetch(pos,vec2( 3.0,off));
  float dst=Dist(pos).x;
  // Convert distance to weight.
  float scale=hardBloomPix;
  float wa=Gaus(dst-3.0,scale);
  float wb=Gaus(dst-2.0,scale);
  float wc=Gaus(dst-1.0,scale);
  float wd=Gaus(dst+0.0,scale);
  float we=Gaus(dst+1.0,scale);
  float wf=Gaus(dst+2.0,scale);
  float wg=Gaus(dst+3.0,scale);
  // Return filtered sample.
  return (a*wa+b*wb+c*wc+d*wd+e*we+f*wf+g*wg)/(wa+wb+wc+wd+we+wf+wg);}

// 5-tap Gaussian filter along horz line.
vec3 Horz5Bloom(vec2 pos,float off){
  vec3 b=Fetch(pos,vec2(-2.0,off));
  vec3 c=Fetch(pos,vec2(-1.0,off));
  vec3 d=Fetch(pos,vec2( 0.0,off));
  vec3 e=Fetch(pos,vec2( 1.0,off));
  vec3 f=Fetch(pos,vec2( 2.0,off));
  float dst=Dist(pos).x;
  // Convert distance to weight.
  float scale=hardBloomPix;
  float wb=Gaus(dst-2.0,scale);
  float wc=Gaus(dst-1.0,scale);
  float wd=Gaus(dst+0.0,scale);
  float we=Gaus(dst+1.0,scale);
  float wf=Gaus(dst+2.0,scale);
  // Return filtered sample.
  return (b*wb+c*wc+d*wd+e*we+f*wf)/(wb+wc+wd+we+wf);}

// Return scanline weight.
float Scan(vec2 pos,float off){
  float dst=Dist(pos).y;
  return Gaus(dst+off,hardScan);}

// Return scanline weight for bloom.
float BloomScan(vec2 pos,float off){
  float dst=Dist(pos).y;
  return Gaus(dst+off,hardBloomScan);}

// Allow nearest three lines to affect pixel.
vec3 Tri(vec2 pos){
  vec3 a=Horz3(pos,-1.0);
  vec3 b=Horz5(pos, 0.0);
  vec3 c=Horz3(pos, 1.0);
  float wa=Scan(pos,-1.0);
  float wb=Scan(pos, 0.0);
  float wc=Scan(pos, 1.0);
  return a*wa+b*wb+c*wc;}

// Small bloom.
vec3 Bloom(vec2 pos){
  vec3 a=Horz5(pos,-2.0);
  vec3 b=Horz7(pos,-1.0);
  vec3 c=Horz7(pos, 0.0);
  vec3 d=Horz7(pos, 1.0);
  vec3 e=Horz5(pos, 2.0);
  float wa=BloomScan(pos,-2.0);
  float wb=BloomScan(pos,-1.0);
  float wc=BloomScan(pos, 0.0);
  float wd=BloomScan(pos, 1.0);
  float we=BloomScan(pos, 2.0);
  return a*wa+b*wb+c*wc+d*wd+e*we;}

// Very compressed TV style shadow mask.
vec3 Mask(vec2 pos){
  float line=maskLight;
  float odd=0.0;
  if(fract(pos.x/6.0)<0.5)odd=1.0;
  if(fract((pos.y+odd)/2.0)<0.5)line=maskDark;
  pos.x=fract(pos.x/3.0);
  vec3 mask=vec3(maskDark,maskDark,maskDark);
  if(pos.x<0.333)mask.r=maskLight;
  else if(pos.x<0.666)mask.g=maskLight;
  else mask.b=maskLight;
  mask*=line;
  return mask;}

// Entry.
void main(){
  vec2 fragCoord= v_vTexcoord.xy * iResolution;
  vec2 pos=Warp(fragCoord.xy/iResolution.xy);
  gl_FragColor.rgb=Tri(pos)*Mask(fragCoord.xy);
  gl_FragColor.rgb+=Bloom(pos)*bloomAmount;
  gl_FragColor.a=1.0;
  gl_FragColor.rgb=ToSrgb(gl_FragColor.rgb);}
 
MishMash

Guest
Right, so we still need a bit of clarification on the case where you were testing an empty project. <-- I never trust these tests when people do them as for some reason they have a tendency to share only the results that corroborate their original claim.

However, putting that aside, this shader is quite inefficient. Per pixel, there seem to be:
  • 42 texture fetches (ouch) w/ cache-inefficient sampling patterns
  • Lots of branching (if statements)
If this is being done on the final full-screen surface (i.e. you are using the GUI layer and doing a full-screen surface effect at native resolution, rather than at just 320x180), then this is understandable. From this, a likely conclusion is that your new laptop has a higher-resolution screen than the previous one?

Now, there are two reasons this ends up being slow:
1) The sheer number of shader instructions and memory operations is heavy
2) You will often find a point when doing effects like blurring where the shader suddenly gets a lot slower. It's quite common with blurs that people use a "relative" radius in UV coordinates rather than an actual pixel radius. I see you have an "iResolution" value here, but it depends on whether that accurately reflects the resolution or not. Basically, the point at which it gets slow is known as a texture-cache-crash. When you perform a texture fetch, a surrounding block of pixels is also collected and loaded into the texture cache: a faster piece of memory that is local to the shader cores on your GPU, as opposed to the global GPU memory where the big data is stored (that block of 4GB data).

However, this texture cache is small, so the amount of local texture data that can be stored is minimal. If you sample outside of this region, or sample with an odd texture pattern, the cache gets flushed and new texture data is loaded in (around that new sample). This then becomes significantly slower. (Note that a cluster of shader cores shares this local memory, and once one of them does a texture fetch, all of them benefit from the cache data locality. The GPU also distributes shading tasks based on locality to maximise this.)


This is commonly referred to as texture cache thrashing.
------
So, the thing I'd like you to check is what the effective resolution of each of your laptops is. That is, the resolution of the GUI layer in GameMaker, as this is ultimately the one that matters, rather than factoring any weird device scaling going on. 4k laptops often have some scaling magic so that things are legible, but this won't apply to fullscreen apps that just scale up to native resolution.
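
One way to collect those numbers is sketched below. It assumes the code is placed in an event that runs after the game has entered fullscreen (a key press event is used here purely as an example); all functions are built-in GML, and the message labels are illustrative.

```gml
/// Key Press - F1 event (any event that runs once fullscreen is active will do)
show_debug_message("Display:     " + string(display_get_width()) + "x" + string(display_get_height()));
show_debug_message("Window:      " + string(window_get_width()) + "x" + string(window_get_height()));
show_debug_message("GUI layer:   " + string(display_get_gui_width()) + "x" + string(display_get_gui_height()));
show_debug_message("App surface: " + string(surface_get_width(application_surface)) + "x" + string(surface_get_height(application_surface)));
```

Running this on both laptops and comparing the four lines should show whether the new machine is shading noticeably more pixels per frame.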
 
izark

Guest
- Thank you for your answer and explanation; it is very useful.
- The resolution of the laptop is 1920x1080. The resolution of the game is 320x180, and it is played in fullscreen.
- I think I have found the problem: idle CPU. The results I got before (50 fps on an empty project, 45 fps while playing my game) happened while CPU usage was around 0.1-0.4%.
- I tried the game later and CPU usage was about 10%. Fps were 60 then. GPU usage was around 40% both before and now.
- I am now using a simplified version of the shader, with 16 texture fetches, and it works well in fullscreen. Fps are 60 and real fps are above 2800.
- So, something on my laptop is telling the CPU to go to sleep while GameMaker projects are running.

Code:
/// CREATION CODE

shaderOn = true;

shader_to_use = s_CRT2;
display_set_gui_size(x_res, y_res); // macros: 320 and 180.
// Turn off automatic drawing of the application surface; it is drawn manually in Draw GUI.
application_surface_draw_enable(false);

// Window size in pixels, read once here and later passed to the shader as iResolution.
surface_width  = window_get_width();
surface_height = window_get_height();

iResolution = shader_get_uniform(shader_to_use, "iResolution");
iTime = shader_get_uniform(shader_to_use, "iTime"); // fetched but not used by this shader

/// DRAW GUI EVENT

if (shaderOn)
   {
   shader_set(shader_to_use);
     shader_set_uniform_f(iResolution, surface_width, surface_height);
     draw_surface(application_surface, 0, 0);
     // time is not necessary for this shader
   shader_reset();
   }
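
As a rough worked example of the figures in this thread, the sketch below counts texture fetches per frame. It assumes the fragment shader runs once per output pixel (1920x1080 in fullscreen, and 320x180 only if a window matches the game resolution); the variable names are illustrative, and the 16-fetch figure is the one reported for the simplified shader.

```gml
// Texture fetches per pixel in the original shader:
// Tri()   = Horz3 + Horz5 + Horz3                 = 3 + 5 + 3         = 11
// Bloom() = Horz5 + Horz7 + Horz7 + Horz7 + Horz5 = 5 + 7 + 7 + 7 + 5 = 31
var fetches_original   = 11 + 31; // 42
var fetches_simplified = 16;      // figure reported for the simplified shader

var pixels_fullscreen = 1920 * 1080; // 2,073,600 output pixels
var pixels_game       = 320 * 180;   // 57,600 output pixels

// Fullscreen shades 36 times as many pixels as a 320x180 target.
show_debug_message("Pixel ratio: " + string(pixels_fullscreen / pixels_game));

// Fetches per frame in fullscreen: 87,091,200 (original) versus 33,177,600 (simplified).
show_debug_message("Original:   " + string(pixels_fullscreen * fetches_original));
show_debug_message("Simplified: " + string(pixels_fullscreen * fetches_simplified));
```

Under these assumptions, going from 42 to 16 fetches cuts the per-frame texture work in fullscreen by a bit more than half, which fits with the simplified shader holding 60 fps.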
 