Shaders What's the biggest uniform a shader can handle?

RujiK

Member
(Google has given me a lot of conflicting results.) How big of an array can I safely pass into a shader?

What if the array is full of integers instead of floats? Can I pass 4x more information? Does the size limit vary from computer to computer? If so, what's a safe limit?

Am I living dangerously if I do this?
uniform ivec3 int_array[500];

How about this?
uniform vec4 float_array[250];

THANKS!
 

kburkhart84

Firehammer Games
I don't know the exact values, but I DO know that this is the type of thing that textures are often used for. You basically encode your data into textures and then read the texels in the shader. It's the method used, for example, in UE4 to control data on GPU particles.
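
For illustration, here is a minimal sketch of the texture approach as a GLSL ES 1.00 fragment shader. The names u_dataTex, u_dataSize and fetchData are made up for this example (they are not GameMaker built-ins), and it assumes one entry per RGBA texel with texture filtering turned off:

```glsl
// Fragment shader sketch (GLSL ES 1.00).
// u_dataTex, u_dataSize and fetchData are hypothetical names.
#ifdef GL_FRAGMENT_PRECISION_HIGH
precision highp float;
#else
precision mediump float;
#endif

uniform sampler2D u_dataTex;  // data texture, one entry per RGBA texel
uniform vec2 u_dataSize;      // texture width and height in texels

// Read entry number `index` (0, 1, 2, ...), stored row by row.
vec4 fetchData(float index) {
    // The +0.5 keeps the floor stable if the division is slightly inexact.
    float y = floor((index + 0.5) / u_dataSize.x);
    float x = index - y * u_dataSize.x;
    // Sample at the texel centre so neighbouring entries are not blended.
    return texture2D(u_dataTex, (vec2(x, y) + 0.5) / u_dataSize);
}

void main() {
    // Example use: output entry 0 as a colour.
    gl_FragColor = fetchData(0.0);
}
```

Each entry is limited to what one RGBA texel can hold, so wider values would have to be split across channels or texels.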
 

Joe Ellis

Member
^Yeah you get literally a million more values by putting them into textures.
I'm not sure of the exact amount with DX11. I remember in GMS 1.4 the limit of uniform values (floats) was about 600, and vecs add up to the same amount of individual values. Once you pass the limit it says "no, too many, shader exiting" and the program stops. I know the limit is a lot more in DX11, but I don't know by how much. I don't think it varies between computers. I remember reading something saying "the max amount of uniforms in dx9c is *** but in more recent versions it's a lot more". I think that was from a reliable source like Microsoft, or some kind of tech news report about new advances in graphics. I don't actually read about stuff like that, I just found it when I was wondering about the same thing as you lol. I think I read a lot of conflicting things as well, but I remember trusting that source because it seemed the most reliable.
 

Bart

WiseBart
In GLSL ES the value for that is defined in so-called shader constants.
There's more info on those in the GLSL ES spec under 7.4.

For uniforms, it's gl_MaxVertexUniformVectors and gl_MaxFragmentUniformVectors you'll be looking for.
The minimum required values for those in a GLSL ES implementation are defined as 1024.
And that implementation part seems to be important since some values seem to be different in WebGL when using HTML5.
Not entirely sure about this, but while it says 'vectors' it's probably 'components' since at one point in the past the variable seems to have been renamed (the two used to be called gl_MaxVertexUniformComponents and gl_MaxFragmentUniformComponents)
So, if that assumption is correct (I haven't verified that yet), it would mean that the largest uniform array you can send to a shader is 1024 floats.

In case that it could be of any use, I wrote a script that you can use to retrieve those values using a shader: Get Shader Constants

To get an exact value, however, you'll still need to subtract the size of the uniforms that GameMaker itself passes in by default (matrices, light positions/colours, etc.)
It should be possible to retrieve that using a tool called RenderDoc.

If you know the exact values of the number of uniforms available it should be possible to know exactly when you're out of uniforms and when it's necessary to send another batch to the shader.
 

RujiK

Member
@kburkhart84 Unfortunately this information is going into the vertex shader which doesn't support texture fetching 🙁

@Joe Ellis 600 is such a weird number for a limit, but I haven't been able to hit 1024 so it seems pretty accurate.

@Bart I think GMS2 uses GLSL ES 2.0 though? I think I remember seeing Mike Dailly post that somewhere ages ago. Your asset looks interesting though, I'll try it out later today. I'm curious how you could pull those values from the GPU.
 

Bart

WiseBart
GameMaker does indeed use OpenGL ES 2.0. But the version of the shading language that corresponds to that is 1.00, or 100 as it's defined in the shader.
So OpenGL ES 2.0 actually uses shading language 1.0.
It may seem confusing at first but it's consistent with how it's shown in the Quick Reference Card.

EDIT: I just gave it a quick try with RenderDoc and it seems you'll have to subtract roughly 50 floats for the vertex shader (3 x 16 for the matrices and a couple of other values).
So the number of floats you can send using uniforms is probably somewhere close to 1000 (9XX).
 

GMWolf

aka fel666
It's dependent on your driver implementation and GPU.

In OpenGL the value is guaranteed to be at least 128 vec4s. So that's a good starting point.
 

Yal

🐧 *penguin noises*
GMC Elder
The memory allotted for uniforms is often also used for varyings, the built-in matrices (those are actually uniforms that GM handles for you to make things easier; shaders are more generic than "apply this transformation on the input data to represent it in a 3D space"), and potentially also for intermediate variables. So even if it's supposed to be 128 vec4s, you're probably not going to be able to use that much, and it's always better to use as few uniforms as possible (and encode data in textures instead if you know you're gonna need a lot of data).

Another important thing: multibyte data types usually need to be aligned so the bytes for each variable are all in the same word (=vec4) (since memory accesses read one word at a time and just discard the parts you don't need if you read less than one word) so you probably can't fit in 129 vec3s despite there being lots of free space in theory, because there's only 128 words and there's no place to fit in the 129th vector.

Both of these things are implementation-dependent, so they're not an issue in every GPU, but your code gets more compatible if you assume they're always true.
 

flyingsaucerinvasion

On my computer I can add 192 float uniform components for the vertex shader, and a different number for the fragment shader. As far as I know, there is no way to determine the correct value from within GameMaker.

Gamemaker uses a ton of uniform components for the matrices, the lighting, and fog, as well as alpha testing, and probably a couple other things I'm forgetting about.

You can sample textures from the vertex shader using an extension. Don't have a link handy for you at the moment. But I know people have implemented that.
 

RujiK

Member
@flyingsaucerinvasion Can I ask what your setup is and which directx version you have? I was hoping the 128-192 limit would be only for the really ancient laptops.
Also, is the 192 limit consistent with ivec4, vec4, float, and int uniforms?

@GMWolf do you have any links where you got that number? Or do you remember if it changes with directx version?

According to STEAM SURVEY 95% of all steam users have windows 7 or higher, which will have a minimum of directx 11. I'm wondering if I really need to worry about any systems that run directx 9 or lower...

EDIT: Actually does directX version have any effect at all? I'm not sure where I got that idea now that I think about it...
 

Yal

🐧 *penguin noises*
GMC Elder
As far as I know, there is no way to determine the correct value from within gamemaker.
The shader is submitted in plaintext to the GPU driver (which actually compiles it) so the only thing you can check from within GM is whether the shader compiles properly or not. (This is the reason why you can't get any shader compile errors until AFTER you try using the shader - it's compiled at runtime). I guess it would've been nice to read GL constants and stuff, though...

Though this might be possible with a special hack shader: the shader encodes the various constants you're looking for (e.g. gl_MaxFragmentUniformComponents) in the output texture, allowing you to read them from GM after the shader has been run once if you use it to draw to a surface.
(The tricky bit here is that you don't have full control of the execution - you can only control a single pixel per shader iteration - so you need to use texture coordinates to tell what constant to read for this pixel, and which byte(s) of the constant to encode)
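
A rough sketch of what such a hack shader could look like in GLSL ES 1.00 (this is not the script linked earlier in the thread; encode16 is a made-up helper, and it assumes a vertex shader that passes texture coordinates in a varying called v_vTexcoord while a quad is drawn over a 2x1 surface):

```glsl
// Fragment shader sketch (GLSL ES 1.00); encode16 is a hypothetical helper.
#ifdef GL_FRAGMENT_PRECISION_HIGH
precision highp float;
#else
precision mediump float;
#endif

varying vec2 v_vTexcoord;  // assumed to run from 0 to 1 across the surface

// Split an integer into a low and a high byte, each scaled to the 0..1
// range that an 8-bit colour channel stores.
vec2 encode16(int value) {
    float v = float(value);
    float hi = floor(v / 256.0);
    float lo = v - hi * 256.0;
    return vec2(lo, hi) / 255.0;
}

void main() {
    // Left pixel: vertex shader limit. Right pixel: fragment shader limit.
    int limit = (v_vTexcoord.x < 0.5) ? gl_MaxVertexUniformVectors
                                      : gl_MaxFragmentUniformVectors;
    // Red = low byte, green = high byte. Alpha stays at 1 so that normal
    // blending leaves the encoded bytes untouched.
    gl_FragColor = vec4(encode16(limit), 0.0, 1.0);
}
```

After reading a pixel back with its channels as 0..255 values, the constant would be red + 256 * green. Both constants count vec4-sized vectors, not single floats.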

Also, is the 192 limit consistent with ivec4, vec4, float, and int uniforms?
vec4s take up 4 times as much space as floats, so if you have the limit for floats, divide it by 4 to have a rough estimate for how many vec4s you can use (the problem is that floats have size 1, so you can fit them into any gap left around larger data types - chances are the usable space is lower depending on alignment)
 

Bart

WiseBart
@GMWolf is right about that minimum value of 128. It's the minimum value of gl_MaxVertexUniformVectors that's given in the specification.
That 1024 is the value on my system so that may have made this more confusing than it had to be.

I gave it another try yesterday and it seems like GM only

Though this might be possible with a special hack shader: the shader encodes the various constants you're looking for (e.g. gl_MaxFragmentUniformComponents) in the output texture, allowing you to read them from GM after the shader has been run once if you use it to draw to a surface.
(The tricky bit here is that you don't have full control of the execution - you can only control a single pixel per shader iteration - so you need to use texture coordinates to tell what constant to read for this pixel, and which byte(s) of the constant to encode)
That's how I did it in that script I posted. It is possible :)
 

GMWolf

aka fel666
@GMWolf do you have any links where you got that number? Or do you remember if it changes with directx version?
GL_MAX_VERTEX_UNIFORM_COMPONENTS

Turns out since OpenGL 3.0 the guaranteed minimum is 1024.
So any hardware that is OpenGL 3.0+ Level will have at least 1024. The vast majority of hardware is at least OpenGL 3.0 Level.

Even if you are running DX11, you should expect 1024+ uniforms from OpenGL 3.0+ level hardware. It's a hardware limitation really.
I also can't be bothered looking through DX documentation tbh.
 

GMWolf

aka fel666
Another important thing: multibyte data types usually need to be aligned so the bytes for each variable are all in the same word (=vec4) (since memory accesses read one word at a time and just discard the parts you don't need if you read less than one word) so you probably can't fit in 129 vec3s despite there being lots of free space in theory, because there's only 128 words and there's no place to fit in the 129th vector.
That's really only the case on old hardware. Anything by NVidia or AMD from the last 10 years will be able to use just 3 slots for a vec3. (Idk about intel but probably the same too)
 

Yal

🐧 *penguin noises*
GMC Elder
That's really only the case on old hardware. Anything by NVidia or AMD from the last 10 years will be able to use just 3 slots for a vec3. (Idk about intel but probably the same too)
I'm not saying that the last byte is wasted entirely, I'm saying that you can't have it spread over multiple words (the entire vec needs to be in the same word)

It's probably more illustrative if I illustrate it:

This works:
word 1: vec3A vec3A vec3A floatD
word 2: vec3B vec3B vec3B floatE
word 3: vec3C vec3C vec3C floatF


This doesn't work:
word 1: vec3A vec3A vec3A vec3D
word 2: vec3B vec3B vec3B vec3D
word 3: vec3C vec3C vec3C vec3D
 

GMWolf

aka fel666
I'm not saying that the last byte is wasted entirely, I'm saying that you can't have it spread over multiple words (the entire vec needs to be in the same word)

It's probably more illustrative if I illustrate it:

This works:
word 1: vec3A vec3A vec3A floatD
word 2: vec3B vec3B vec3B floatE
word 3: vec3C vec3C vec3C floatF


This doesn't work:
word 1: vec3A vec3A vec3A vec3D
word 2: vec3B vec3B vec3B vec3D
word 3: vec3C vec3C vec3C vec3D
You are right.
Of course a vec3/vec4 only uses a single uniform location.

However the value returned by GL_MAX_VERTEX_UNIFORM_COMPONENTS refers to the amount of storage available.
On most relatively modern hardware, a vec3 will only use 3 of those components.
A vec4 will use 4.

GL_MAX_VERTEX_UNIFORM_COMPONENTS is at least 1024, which lets you fit 341 vec3s.

So even if you have 1024 uniform locations, you cannot fit 342 vec3s.

In this case the limiting factor is not the number of locations but the number of components used.


Older vector based GPUs will not be able to use just 3 components for a vec3 and will pad them out to vec4s.
That's because older hardware would literally use a 4 wide register for each uniform.
Modern hardware uses buffer backed uniforms.



To illustrate:
Register slots (older hardware):
Slot 0: vec3Ax vec3Ay vec3Az unused
Slot 1: vec3Bx vec3By vec3Bz unused
...
Uniform buffer (modern hardware):
vec3Ax vec3Ay vec3Az vec3Bx
vec3By vec3Bz
...


Of course the actual layout it will use is unspecified. And you probably want to target as much hardware as possible, so if you can pad your vec3s with some useful data then do that (or assume each vec3 uses 4 components).
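
As a sketch of the padding idea, here is a GLSL ES 1.00 vertex shader that keeps a radius in the otherwise unused w of each light position. u_lights, LIGHT_COUNT and v_light are hypothetical names, and the radius is assumed to be greater than zero:

```glsl
// Vertex shader sketch (GLSL ES 1.00); all names are hypothetical.
#define LIGHT_COUNT 8

// xyz = light position, w = light radius. One vec4 per light costs
// 4 components whether or not the hardware pads a vec3 to a vec4.
uniform vec4 u_lights[LIGHT_COUNT];

attribute vec3 in_Position;
varying float v_light;

void main() {
    float total = 0.0;
    for (int i = 0; i < LIGHT_COUNT; i++) {
        vec3 toLight = u_lights[i].xyz - in_Position;
        // Linear falloff that reaches zero at the light's radius.
        total += max(0.0, 1.0 - length(toLight) / u_lights[i].w);
    }
    v_light = total;
    // No transform applied here, to keep the sketch short.
    gl_Position = vec4(in_Position, 1.0);
}
```

The alternative, a vec3 array plus a separate float array, costs 4 components per light on hardware that packs tightly and up to 8 on hardware that pads every array element to a full vec4.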

[Edit]
I realize I used the word "location" earlier.
To be honest I was a little confused about this whole mess myself and for a moment thought component limits and slot limits were the same thing.
 

GMWolf

aka fel666
Oh boy here we go:

So here is how things work in OpenGL:

First we have uniform locations.
Every uniform has a location, which is how you refer to that uniform between the GPU and client code.

Scalar, vector, and matrix types all use one location.
Array types and struct types use multiple locations: basically each element uses a location. So an array of 10 floats uses 10 locations. An array of 10 matrices also only uses 10 locations, despite matrices being bigger than floats.

So if you have 1024 locations, that's enough locations to store 1024 matrices.
The limit is guaranteed to be at least 1024 locations.
The actual value is returned by GL_MAX_UNIFORM_LOCATIONS.

Note: the location limit is per shader program (all stages combined). Not per stage.

However, there is another limit: the number of components each shader stage can have.
A component is basically a 4-byte word.
A float uses one component. A vec4 uses 4 components. A vec3 probably uses only 3 components.
A mat4 will use 16 components.

OpenGL 3+ guarantees at least 1024 components per stage, but in practice, especially on newer hardware, this is higher.
The actual value is returned by GL_MAX_<STAGE>_UNIFORM_COMPONENTS.


So, on OpenGL 3 capable hardware, you are guaranteed to be able to store 1024 floats per stage, provided you lay them out in a way that doesn't use more than 1024 locations.
(This is possible because each stage has its own separate component limit, but locations are shared between stages).
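
To make the two counts concrete, here is a small desktop GLSL sketch with hypothetical uniform names. The numbers in the comments follow the rules above and assume every uniform stays active (compilers may drop unused ones) and that vec3s are not padded:

```glsl
#version 330 core
// Sketch only: all uniform names are hypothetical.
uniform float u_weights[10];  // 10 locations,  10 components
uniform mat4  u_bones[10];    // 10 locations, 160 components
uniform vec3  u_lightDir;     //  1 location,    3 components (4 if padded)
uniform vec4  u_tint;         //  1 location,    4 components
// Total:                        22 locations, 177 components

void main() {
    // Placeholder body; a real shader would actually use the uniforms.
    gl_Position = vec4(0.0);
}
```

The matrix array is what shows the difference: it is cheap in locations but expensive in components.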

With this whole mess you can see why we moved away from uniforms and use interface blocks (uniform buffers) instead.
It's much easier to just declare a block and bind a buffer rather than mess with uniform locations and components.
Code:
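// A uniform block needs a block name (ExampleBlock is a placeholder) and a closing semicolon.
layout(std140, binding = 0) uniform ExampleBlock {
   float anArray[1024];   // std140 gives each array element a 16-byte stride, so this is 16 KB
   vec4 bunchOfVecs[10];
};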
Just one thing to keep track of: the binding of the buffer (and make sure your client side buffer layout matches std140).
I really wish GM would modernize a bit and support interface blocks. They were introduced over 11 years ago!
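
On the "make sure your client side buffer layout matches std140" point, here is a small sketch of where the members of a block end up under the std140 rules. ExampleBlock and its member names are hypothetical, and the offsets in the comments are what the client-side buffer would need to match:

```glsl
#version 420 core
// Sketch only: block and member names are hypothetical.
layout(std140, binding = 0) uniform ExampleBlock {
    float scale;       // offset  0, 4 bytes
    vec3  direction;   // offset 16 (a vec3 aligns to 16 bytes), 12 bytes
    float weights[4];  // offset 32, 16-byte stride per element, 64 bytes
    vec4  colours[2];  // offset 96, 32 bytes
};                     // total block size: 128 bytes

void main() {
    // Placeholder use of the block so the shader is complete.
    gl_Position = colours[0] * scale;
}
```

The float array is the part that tends to surprise people: four floats take 64 bytes rather than 16, because std140 rounds every array element up to the size of a vec4.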
 