GameMaker Buffers and Voxels

Kealor

Member
edit: I feel like this may be more suited to the advanced section, but I am not a member of the advanced superior life forms

tl;dr: I want to run my voxel engine on buffers instead of data structures (DS). How much more efficient is that likely to be, and what is the best way to do it?

So I have a thing. I'm going to start refactoring a system from a DS-based one to a buffer-based one, but I haven't worked with buffers much in the past. I know the gist, but I haven't done anything like this yet. First I'll lay out my plan for the system and see if anyone familiar can poke holes in it, or let me know if what I'm doing is less efficient than it could be. It's tied to 3D, but it's much more of a buffer/data/CPU management issue than a graphical one.

So I'm working on making a voxel environment. I have the whole thing generating the chunks/voxels and running, but the performance is bad: 60 fps for a simplified room is far, far below what I need. The previous non-voxel system I was using was POM (parallax occlusion mapping) shader based, simulating the voxels rather than storing real voxel data. The new method is already far less GPU intensive, but far more CPU/load-time intensive. Most of the processing time is in the DS accessors and iteration methods, so I'm hoping that by moving to buffers things will get a lot better; if not, I may have to revert to the previous method. The room I'm trying to set up is 11x20x10 chunks (2,200) of 16x16x16 voxels (9,011,200 potential voxels, yikes). There are pretty much 5 stages to this:

roomStart
A: tilemap-to-point-cloud setup to allow for loading the voxels from the tile layout of the room
B: create the empty chunk set; this sounds like it should be fast, but it actually takes a whopping 20 seconds to iterate over the whole room
C: load tiledata from room and use (A) to populate chunks
draw event
D: refresh needed chunk models
E: draw chunk models

A and C can be eliminated by pre-generating the data and exporting it to a file so it can be shipped with the game, which will negate a lot of the load time, but that file would still need to be loaded, so fixing this stuff up still matters.
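For the pre-generated file route, GameMaker's built-in buffer save/load functions should cover it directly. A minimal sketch, assuming the voxel data already lives in one buffer; `chunk_data`, the filename, and `build_voxels_from_tiles` are placeholder names:

```gml
// One-time export, run after stages A and C have built the data
// (chunk_data is a placeholder for the big voxel buffer):
buffer_save(chunk_data, "room_voxels.dat");

// Room start in the shipped game: skip A and C entirely if the file exists
if (file_exists("room_voxels.dat")) {
    chunk_data = buffer_load("room_voxels.dat"); // returns a new buffer
} else {
    chunk_data = build_voxels_from_tiles(); // hypothetical fallback to A+C
}
```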
D and E are the parts I'm most focused on fixing up, since they make up the bulk of the active, unavoidable performance cost. The (big) plus side is that active chunk loading isn't actually mandatory; I'm happy with the environment being completely static, since it's the backdrop for the game rather than having any mechanical utility (this isn't Minecraft). Voxels also have no texture, they are each a single colour, so I've already done the optimization of merging like voxels into cuboids (a 4x4x4 block of the exact same colour is rendered as 1 cube rather than 64). All of the data is set up using 3D lists and maps: the main data is a map of lists of lists of lists of voxels, where each voxel is a size-9 array (r, g, b, spec, glow, alpha, xSize, ySize, zSize; the sizes are for the cuboid optimization), and the maps contain the position/empty status/refresh flag/model data for each chunk. There is a final 3D list grid which holds the map key of each chunk, sorted by position.

That's the outline for the engine as it currently stands. A completely empty room is still only ~400 fps, the simplified one is ~60, and the more developed room is ~40. This is on a GTX 1080/Intel i7. If I'm understanding this correctly, it could be much faster if set up with buffers, since scrolling through one is much cheaper than iterating over/accessing a DS. A buffer is essentially a linear sequence of bit data, so is it just a matter of "indexing" the buffer in accordance with a format I define? For example:


If we ignore the chunk position/empty/refresh flags and just store the basic voxel data, set up in a similar-ish way to before, there is a 7-dimensional structure that looks like chunkX-chunkY-chunkZ-voxelX-voxelY-voxelZ-data, where the size of each part is known (number of chunks along X, number of voxels along X within a chunk, and so on, with 9x 8-bit ints of data per voxel). I could make the sizes 4-bit, but buffers are byte aligned, so that would mean packing and unpacking two values per byte, which I believe would be inefficient; please let me know if this is incorrect.
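If the layout is fixed like that, addressing the buffer is just one multiply-and-add chain. A minimal sketch, assuming row-major order and 9 bytes per voxel; the dimensions come from the numbers above, while `voxel_offset` and `vox_buf` are made-up names:

```gml
// Room: 11x20x10 chunks, each 16x16x16 voxels, 9 bytes per voxel
// (r,g,b,spec,glow,alpha,xSize,ySize,zSize as u8s)
function voxel_offset(cx, cy, cz, vx, vy, vz) {
    var chunk_index = (cx * 20 + cy) * 10 + cz;    // flatten chunk coords
    var voxel_index = (vx * 16 + vy) * 16 + vz;    // flatten voxel coords
    return (chunk_index * 4096 + voxel_index) * 9; // 4096 = 16^3
}

// Random access without moving the read position:
var off = voxel_offset(3, 5, 2, 0, 15, 7);
var red = buffer_peek(vox_buf, off, buffer_u8);  // read one channel
buffer_poke(vox_buf, off + 5, buffer_u8, 255);   // e.g. set alpha to 255
```

Since every field is a u8, `buffer_peek`/`buffer_poke` with a computed byte offset avoids any seek bookkeeping.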

That gives us, by my count, a buffer which is (11x20x10x16x16x16) voxels x 9 bytes = 81,100,800 bytes: an ~81 MB buffer. Wowzers. This, however, only needs to be parsed on room load (still kind of cumbersome) and in small batches whenever a chunk is refreshed (not even planned as a feature, but for fun I'll be testing it).
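Allocating that buffer up front should also kill most of stage B, since a freshly created buffer comes back zero-filled. A sketch, with `vox_buf` as a placeholder name:

```gml
// 11*20*10 chunks * 16^3 voxels * 9 bytes = 81,100,800 bytes (~81 MB)
var total_voxels = 11 * 20 * 10 * 16 * 16 * 16; // 9,011,200
vox_buf = buffer_create(total_voxels * 9, buffer_fixed, 1);
// buffer_fixed because the size never changes; alignment 1 since every
// field is a u8. The buffer is zeroed on creation, so "all chunks empty"
// is the starting state for free -- no 20-second iteration needed.
```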


The draw event is then iterating across a 2,200-entry structure holding the index of each created model, which is already loaded on the GPU. Empty chunks don't need a model, of course, and there will be a decent number of empty chunks.
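The draw side then reduces to submitting whatever models exist. A sketch, assuming the stage-D models are vertex buffers stored in an array `chunk_vb`, with -1 marking an empty chunk (all names hypothetical):

```gml
// Draw event: one vertex_submit per non-empty chunk
var n = array_length(chunk_vb); // 2,200 entries, one per chunk
for (var i = 0; i < n; i++) {
    if (chunk_vb[i] == -1) continue;                 // empty chunk, no model
    vertex_submit(chunk_vb[i], pr_trianglelist, -1); // -1 = untextured
}
```

Since the voxels are untextured single colours, the colour can live in the vertex data itself, so no texture swaps are needed between chunks.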

(I also haven't experimented with chunk sizes yet; it may well end up being better to have 11x20x1 chunks of size 16x16x160, but I'll be testing that after this process, so just take it as-is for now.)

Alternatively, and I think this may be better but harder, the buffer could just be a sequence of active voxels (post cuboid optimization), where the data is encoded so as to signal jumps in the sequence: a certain kind of record indicates skipping a certain number of empty voxels, rather than describing a voxel. The encoding would mean bumping each record up from 8 bits, so the worst case is a larger buffer and more work, but in the vast majority of cases it'd be far less data to parse. This would also mean that if I wanted to add voxel updates in the future (which would change the size of the buffers), I would need a 2D structure: the main buffer would contain references to a set of buffers, one per chunk, and those chunk buffers would contain the data as just described.
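One possible record layout for such a per-chunk, run-length-style buffer; this format is purely an assumption to illustrate the idea, not something tested, and `cbuf` is a placeholder for one chunk's buffer:

```gml
// Hypothetical record layout inside one chunk buffer cbuf:
//   u8 flag: 0 = skip record, 1 = voxel record
//   skip record  -> u16: number of empty cells to jump over
//   voxel record -> 9x u8: r,g,b,spec,glow,alpha,xSize,ySize,zSize
buffer_seek(cbuf, buffer_seek_start, 0);
var cell = 0;       // linear cell index within the 16^3 chunk
while (cell < 4096) {
    if (buffer_read(cbuf, buffer_u8) == 0) {
        cell += buffer_read(cbuf, buffer_u16); // skip a run of empties
    } else {
        var vox = array_create(9);
        for (var k = 0; k < 9; k++) vox[k] = buffer_read(cbuf, buffer_u8);
        // ...emit a cuboid for vox at this cell, then advance...
        cell += 1;
    }
}
```

Sequential `buffer_read` calls like this keep the parse a single forward pass, which is exactly the access pattern buffers are cheap at.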
 