Shaders: Using shaders for parallel computing



Hey guys!
I've been working on a project that involves something along the lines of cellular automata.
Simply put, I need to perform steps on every cell in a grid that may be as large as 100x100.

Sure, I could run a loop of some kind and perform operations on every cell that way (each cell is a ds_map containing cell properties); however, doing that on as many as 10,000 cells every step is fairly CPU-intensive.

Another method I could use would be to make every cell an object and use the built-in step event to run the logic. The issue I have with this approach is that it seems wasteful in terms of memory, and there's the CPU-intensive aspect here as well.

I know that shaders in Game Maker are largely specialized for image processing. That said, I also know that in other projects shaders and GPU programming have been used for other things, such as encryption breaking and bitcoin mining. So I know that shaders can and have been used for non-graphical tasks.

I am not a shader expert. I don't use them very often and am not super comfortable with them like I am with GML. I'm wondering if anybody knows whether Game Maker can use shaders for such a purpose, and if so, where I can go to get headed in the right direction?
Yeah, you can do this to some extent. If your calculations can be performed on a per-fragment basis, you can encode the results in some way into the render surface. You'll only have 4 bytes to work with for each fragment (rgba). Getting the data from the surface though is not cheap, performance-wise. You'll have to use either surface_getpixel, or else turn the surface into a buffer and read from the buffer. If you don't have a lot else going on, the performance hit may be acceptable. Only read from the surface what must be read, put off until later whatever you don't need to read right away.
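To make the readback step concrete, here is a hypothetical GML sketch of pulling a shader's output off a surface through a buffer. The names sf_cells, cells_w, and cells_h are assumptions for illustration, and note that the buffer_get_surface argument list differs between GameMaker versions (this assumes the newer three-argument form); channel order in the buffer can also vary by platform, so treat the decoding as a sketch:

```gml
// Hypothetical sketch: read the shader's packed rgba output back on the CPU.
// Assumes sf_cells is a cells_w x cells_h surface, one cell per pixel.
var bytes = cells_w * cells_h * 4;
var buff = buffer_create(bytes, buffer_fixed, 1);
buffer_get_surface(buff, sf_cells, 0);      // GPU -> CPU copy (the slow part)
buffer_seek(buff, buffer_seek_start, 0);
for (var i = 0; i < cells_w * cells_h; i++) {
    // Rebuild one value from the four 8-bit colour channels
    // (channel order may differ by platform).
    var r = buffer_read(buff, buffer_u8);
    var g = buffer_read(buff, buffer_u8);
    var b = buffer_read(buff, buffer_u8);
    var a = buffer_read(buff, buffer_u8);
    var value = r | (g << 8) | (b << 16) | (a << 24);
    // ... write "value" back into whatever structure holds the cell data ...
}
buffer_delete(buff);
```

The one bulk buffer_get_surface call is the key design point: it is far cheaper than calling surface_getpixel once per cell.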

I can't say whether it is a good idea to attempt this. Alternative approaches might or might not be superior.


@flyingsaucerinvasion Okay, so the problem I'm trying to solve is for a strategy game. Each cell on the game board has a population, and population growth is modeled on the logistic equation: dN/dt = rN((K - N)/K). The problem I'm trying to solve is calculating the population growth of each cell and updating the corresponding cell's value (stored in a ds_map).
To see if I understand what you're proposing, you're suggesting that I encode some value as rgba in the surface, pass it to a shader, do the calculations in the shader, and return the value encoded yet again as an rgba pixel.
It's a good idea, though what I'm also getting from this is that ultimately I'm going to have to do a loop of some kind to write the returned values back to each cell because the shader has no access to gml variables.
Is that accurate?
Hey, do I have this right? Are you trying to find the population like this?

x = 1 / (1 + (1/x0 - 1)*exp(-r*t))

x = population at time t (as a fraction of the carrying capacity), x0 = initial population, r = Malthusian parameter.

I don't really understand how that works, but it occurs to me, do you need to update the population size all the time? Can you not just calculate it when you need to know it?
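The "calculate it when you need it" idea can be sketched in GML using the closed-form solution quoted above. This is a hypothetical sketch; pop, pop_limit, r, and t are assumed names for the cell's stored values, the growth rate, and the number of turns elapsed since the last update:

```gml
// Hypothetical sketch: jump the population ahead t turns in one step,
// using the closed-form logistic solution instead of iterating every turn.
var x0 = pop / pop_limit;                       // normalise to [0, 1]
var x  = 1 / (1 + (1/x0 - 1) * exp(-r * t));    // closed-form logistic curve
pop = x * pop_limit;
```

Done this way, a cell's population only needs recomputing when something actually looks at it, no matter how many turns have passed.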


@flyingsaucerinvasion To be completely honest, what you typed there looks completely foreign to me.
I'm basically doing this:
population += rate_modifier * ((population_limit - population) / population_limit) * population (with rate_modifier currently 1).
What that means is that instead of the population growing exponentially until it reaches the limit, it follows a more realistic 'S'-shaped curve: it starts slow (because small populations don't grow quickly), then grows rapidly, and slows again (because of limited resources) as it approaches the limit.
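That per-cell update could be written as a plain CPU loop, sketched below in GML. The names board, cells_w, and cells_h are assumptions for illustration, and this assumes each board entry is the ds_map the original post describes:

```gml
// Hypothetical sketch of the per-cell CPU update described above.
// Assumes board is a cells_w x cells_h 2D array of ds_maps.
for (var i = 0; i < cells_w; i++) {
    for (var j = 0; j < cells_h; j++) {
        var cell  = board[i][j];             // ds_map for this cell
        var pop   = cell[? "population"];
        var limit = cell[? "population_limit"];
        var rate  = 1;                       // rate modifier, currently 1
        pop += rate * ((limit - pop) / limit) * pop;
        cell[? "population"] = pop;
    }
}
```

This is the linear-time baseline the shader approach would be measured against.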

And yes, what I'm going to end up doing is calculating it once at the beginning of each player's turn, and to compensate I'm going to have to increase the growth rate.
That said, I was hoping that I could make the GPU do all the cell calculations in parallel, which would save cycles on slower devices.

The other thing about this problem is that I'm using the cellular automata cave algorithm here to generate more natural-looking patches of land, stacking them on top of each other to generate semi-natural-looking "biomes" that determine the population limit for each square. As it currently stands, the game board generation happens reasonably fast on my tower, but I'm a little worried about a less beefy, much older computer.

If I could get the shader method to work, it should make the generation and the calculation a *constant-time* process rather than a *linear* one.

Chances are high that I'm being a little too worried about performance, but personally I'd rather over-optimize and have amazing performance than under-optimize and irritate the user.
I think the potential advantage from using a shader to do this would be outweighed by the cost of setting up surfaces / reading data from them. And since you're only updating at the start of turns, there will only be a momentary hiccup, if any at all.


Wanted to chip in here. First of all, implement your algorithm on the CPU using grids, just so you know it works how you want. It doesn't really matter how slow it is; the worst thing you can do is immediately dive into trying to program a shader for it, as shaders are objectively much more difficult to debug than CPU code. Once you have an algorithm working, if it is simple and only relies on data from neighbouring cells, it should be relatively easy to port over.

On the question of performance: depending on your simulation space size, you can get a huge performance benefit from doing the processing in a shader vs. on the CPU. It is true that the transfer is slow; however, the buffer_get_surface and buffer_set_surface functions are a very efficient way of transferring data between the GPU and CPU. (The transfer speed isn't significantly slower than what you would find with OpenCL or CUDA; it's very similar, in fact.)

The harder part with shaders vs. a more conventional GPGPU programming API is that you are limited by data types. You can use multiple surfaces if you need more data, as generally you don't need to copy all the data back, just the final results. I use a similar approach for the water simulation in my game: it uses a number of surfaces to store volume, pressure, and velocity, but the volume surface is the only one that contains data I am interested in reading. I can still update the other surfaces by rendering to them if I need to rewrite data, however.

The whole premise of high-performance computing is to try to limit the amount of data moving around your system. Regardless of the environment you use, minimising data transfer always helps, as it is normally a bottleneck to performance in any situation. You should also bear in mind that shader performance is relative to the power of the GPU: integrated graphics processors will struggle to perform as well. They will still likely run faster than a GML implementation, given their execution environment and the simplistic nature of the shaders, but it is something to keep in mind.

The other question is what your ds_map contains. If for each cell you have lots of changing information that needs to be constantly updated on the surfaces so that the shader works correctly, there is likely going to be too much transfer going on. If, however, you can re-model the problem so that the data stays on the GPU and is modified only through shaders (using multiple render targets), then it is possible. It's only that GPU-CPU transfer bottleneck you need to be careful of. It is definitely possible, though it may be hard without experience, as you may end up just creating something that is slower.
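The keep-it-on-the-GPU idea usually means ping-ponging between two surfaces, so each step reads the previous state and writes the next one without any CPU round trip. A hypothetical GML sketch, where sf_a, sf_b, and sh_step are assumed names for the two state surfaces and the update shader:

```gml
// Hypothetical sketch: advance the simulation one step entirely on the GPU
// by ping-ponging between two surfaces.
surface_set_target(sf_b);
shader_set(sh_step);
draw_surface(sf_a, 0, 0);   // shader reads sf_a, writes the next state to sf_b
shader_reset();
surface_reset_target();

// Swap so sf_a always holds the latest state.
var tmp = sf_a;
sf_a = sf_b;
sf_b = tmp;
```

Only when a final result is actually needed would sf_a be copied back to the CPU side.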

So yeah, to begin with, just implement it on the CPU, see how it performs, and go from there. Options like YYC or a DLL may be more appropriate if you are targeting Windows.


@MishMash Thank you. I've done some tests on slower, less beefy computers and have decided that the cpu approach isn't a huge problem. That said, I'd like to attempt the gpu approach for the sake of science.
Does anybody have any tutorials or articles that I could use to get started? As I have stated before, I don't use shaders much and am a novice at writing them. The most I have ever done on my own was simple pixel shaders drawn over the whole of a surface. Thanks again for your help.


Bear in mind that slower machines very often have only integrated low-end graphics... and it's also common in non-gaming laptops and notebooks for there to be no dedicated GPU, even though the CPU is mid/high-end...
So I would say that leaving your calculations in CPU scope is the best solution...
Additionally, the graphics pipeline is very tight in GMS... and usually it's the draw event that needs the most optimization...

With surface drawing, swapping, etc., it could end up that you'll have more problems with graphics performance than CPU performance on most of the low-end machines.

There are some things you can do if you see that your engine needs optimisation...
E.g., I doubt that your calculation needs to be done 30-60 times per second? A 1 Hz refresh rate is fair enough... Then you can divide the execution of your loop across a couple of frames...
You can also factor delta time into the calculation so any FPS drop won't affect the speed of your simulation...
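Splitting the loop across frames can be sketched like this in GML. It's a hypothetical sketch: update_index, cells_per_frame, cells_total, and update_cell are assumed names for the resume position, the per-frame budget, the cell count, and a helper that updates one cell:

```gml
// Hypothetical sketch: run in the step event to spread the cell updates
// over several frames instead of doing them all at once.
var stop_at = min(update_index + cells_per_frame, cells_total);
while (update_index < stop_at) {
    update_cell(update_index);   // assumed helper that updates one cell
    update_index++;
}
if (update_index >= cells_total) update_index = 0;   // full pass complete
```

Tuning cells_per_frame trades latency of a full pass against the per-frame CPU cost.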


@Smiechu Yes, thank you. At this point I have decided to keep the process on the CPU for the purposes of my game. I want to try the GPU method as an experiment, to see how it compares.