SOLVED Buffers?

Neptune

Member
Trying to wrap my head around buffers... As I understand buffer_tell returns the last write position in bytes for the buffer.

Why would it be necessary to write to a buffer with byte gaps between the data?

And why must you call buffer_seek_start before working with a buffer?

Any information is much appreciated.
 

TsukaYuriko

☄️
Forum Staff
Moderator
Neptune said: "Trying to wrap my head around buffers... As I understand buffer_tell returns the last write position in bytes for the buffer. Why would it be necessary to write to a buffer with byte gaps between the data?"

Individual pieces of data may take up a varying number of bytes (e.g. the number 255 fits in 1 byte (11111111), while 65535 needs 2 bytes (11111111 11111111)). A buffer is just a continuous stream of bytes with no built-in way to mark where one value ends and the next begins, so in an aligned buffer every "cell" has to be the same size.

Imagine it like having a list of names where every name is exactly 8 characters long, and the list is really just a long string of characters. There are no newlines to separate the names. All you know is that every name is exactly 8 characters long.

Your list may look like this:

Code:
IsabellaJenniferMadelineVeronica
We have four people here, named Isabella, Jennifer, Madeline and Veronica.

Now let's add Neptune to the list...

Code:
IsabellaNeptuneJenniferMadelineVeronica
We now have five people, named Isabella, NeptuneJ, enniferM, adelineV and eronica. Uhh... (eronica probably also has some weird character at the end because there's no 8th character in that name, so who knows what that trailing null byte will be displayed as.)

Code:
Isabella NeptuneJenniferMadelineVeronica
By adding a space in front of the 7-letter name (and discarding leading spaces when reading the data back), we can successfully store and retrieve the correct names.


To fully understand buffers, we'll have to delve into binary arithmetic for a bit.
If you store 65535 (11111111 11111111) and 255 (11111111) by just tacking their binary values together, you'd end up with "11111111 11111111 11111111". Storing that in a 2-byte-aligned buffer bit for bit would pad the end with zeroes, so we'd end up with "11111111 11111111 11111111 00000000". If you then read that back, you'd get the misaligned result of 65535 (11111111 11111111) and 65280 (11111111 00000000) instead of 65535 and 255.

The correct way to store this is therefore to pad 255 with an additional byte of zeroes as "empty space" - as in "00000000 11111111", which, since leading zeroes are insignificant, is still 255. The buffer will now contain "11111111 11111111 00000000 11111111", which will be read back as 65535 (11111111 11111111) and 255 (00000000 11111111, or just 11111111).

Setting the buffer alignment will make it so this padding is automatically done when you write values to the buffer that require less space to be stored than what's available.
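
A minimal sketch of what that looks like in GML, assuming a grow buffer with an alignment of 2 (the variable name is made up for illustration). It prints buffer_tell after each write so you can watch where the write position lands. As far as I understand the manual, the padding is applied when the next value is written, so the position should jump ahead to the next multiple of the alignment at that point.

GML:
// Hypothetical example: a grow buffer with an alignment of 2 bytes.
var aligned = buffer_create(16, buffer_grow, 2);

// A 2-byte value fills one aligned slot exactly.
buffer_write(aligned, buffer_u16, 65535);
show_debug_message(buffer_tell(aligned));

// A 1-byte value only fills half a slot...
buffer_write(aligned, buffer_u8, 255);
show_debug_message(buffer_tell(aligned));

// ...so the next write should first skip ahead to the next multiple of 2.
buffer_write(aligned, buffer_u8, 255);
show_debug_message(buffer_tell(aligned));

// Free the buffer when done.
buffer_delete(aligned);

Running the same writes against a buffer created with an alignment of 1 and comparing the printed positions is a quick way to see how much space the padding costs.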

Neptune said: "And why must you call buffer_seek_start before working with a buffer?"

buffer_seek_start is not a function itself but the constant you pass to buffer_seek to move to a position relative to the start of the buffer. Buffers have something similar to a hard drive's read/write head that indicates the position at which data is read or written, and every read and write moves this head forward. If you don't explicitly set the position, you could end up reading from the middle of the buffer (e.g. right after writing data that filled half of it) when you meant to read from the start. To be safe, always seek to the position you want before reading or writing so that you're manipulating the correct data.
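
To make the read/write head visible, here is a small sketch that uses buffer_tell before and after seeking. It assumes a grow buffer with an alignment of 1, and the variable names are just placeholders.

GML:
// Hypothetical example: watch the read/write position move.
var buf = buffer_create(8, buffer_grow, 1);

// Each 1-byte write advances the position by 1.
buffer_write(buf, buffer_u8, 10);
buffer_write(buf, buffer_u8, 20);
show_debug_message(buffer_tell(buf)); // 2

// Move the head back to the first byte before reading.
buffer_seek(buf, buffer_seek_start, 0);
show_debug_message(buffer_tell(buf)); // 0

// Reads advance the position in the same way writes do.
var first = buffer_read(buf, buffer_u8);  // 10
var second = buffer_read(buf, buffer_u8); // 20

// Free the buffer when done.
buffer_delete(buf);

Without the buffer_seek call, the first read would start at position 2, past everything that was written.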


Edit: Clarity.
 

Neptune

Member
Wow, thanks for the in-depth info, @TsukaYuriko.
I think I'm following - so buffers will automatically pad the smaller pieces of data to keep everything a uniform size. I assume grouping like-sized data together is most efficient, especially if the intention is to send it over a network...
 

Neptune

Member
Hmmm, so if I wrote int 7, int 255, int 1, int 65535 and then a 1000-character-long string (can I even write mixed types to a "grow" buffer?), would this cause an issue?

Would this make the buffer excessively large, because the ints are padding to accommodate the string size?
 

TsukaYuriko

☄️
Forum Staff
Moderator
Neptune said: "Hmmm so if I wrote int 7 int 255 int 1 int 65535 and then a 1000 character long string (can I even write mixed types to a "grow" buffer?) would this cause an issue?"

You can write mixed data to grow buffers with no issues. You are only limited to a specific data type (buffer_u8) when using the fast buffer type (buffer_fast). Source (and potentially helpful for further reading): Guide To Using Buffers

Basically, as long as you can ensure that whatever receives the buffer data will be able to read it properly, you're good to go, since you can (or have to) specify how many bytes to read/write when performing operations on buffers. In the above example, you'll know that the buffer's format is:

1 byte (7)
1 byte (255)
1 byte (1)
2 bytes (65535)
1000 bytes (string) + 1 byte (null terminator of the string)

You could even have the string be variable length, as long as it contains no null bytes and the receiver will interpret the null byte as the end of the string. This also means you can have multiple strings of variable length.
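
As a rough sketch of that format in practice, the snippet below writes the values in the order listed above and reads them back in the same order. It assumes a grow buffer with an alignment of 1, uses a short string instead of a 1000-character one, and the variable names are made up.

GML:
// Hypothetical example: mixed data types in one grow buffer, no padding.
var packet = buffer_create(64, buffer_grow, 1);

// Write the values in a fixed, known order.
buffer_write(packet, buffer_u8, 7);
buffer_write(packet, buffer_u8, 255);
buffer_write(packet, buffer_u8, 1);
buffer_write(packet, buffer_u16, 65535);
buffer_write(packet, buffer_string, "hello"); // also writes the null terminator

// Seek back to the start, then read using the same types in the same order.
buffer_seek(packet, buffer_seek_start, 0);
var a = buffer_read(packet, buffer_u8);       // 7
var b = buffer_read(packet, buffer_u8);       // 255
var c = buffer_read(packet, buffer_u8);       // 1
var d = buffer_read(packet, buffer_u16);      // 65535
var text = buffer_read(packet, buffer_string); // "hello"

show_debug_message(text);

// Free the buffer when done.
buffer_delete(packet);

The reader never needs to know the string's length in advance, since reading a buffer_string stops at the null terminator.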


The thing with mixing data like this is that it's a lot more error-prone than working with fixed alignments. Read one byte too many? Oops, the value you just read, as well as the entire rest of your data, will now come out as garbage.
That kind of stuff won't happen if you use a fixed alignment for writing and reading. It may "waste" more space than necessary, but whether that matters depends on what you're using this for.

I generally recommend making things work first. Then, if optimization will have substantial benefits, you can think about how to optimize, and you will probably notice right away if you broke something. Starting out by packing everything as tightly as possible can make it difficult to pinpoint where things are going wrong, though, especially when operating on the "bit" level where you can't tell one zero apart from another without context.

Another thing to keep in mind is that while it's certainly possible to mix data of different sizes, it may be overall easier and more comfortable to split the data into two separate buffers.

Neptune said: "Would this make the buffer excessively large, because the ints are padding to accommodate the string size?"

That depends on the buffer's alignment. With an alignment of 1, no automatic padding is applied, and you essentially have the gapless stream of data I described in my previous post.

Note that using mixed-size data with an alignment of 1, and therefore having no padding/gaps, means that you have to ensure that the receiver of a buffer knows its format. There's no way to tell whether "11111111 11111111 11111111" is supposed to be (255, 255, 255), (255, 65535), (65535, 255) or (16777215) just by looking at the buffer's contents.
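
To illustrate the ambiguity, here is a sketch that reads the same three bytes back under three different assumed formats. It assumes a fixed 3-byte buffer with an alignment of 1, and the variable names are placeholders. (There is no 3-byte integer type to read, so the 16777215 case is left out.)

GML:
// Hypothetical example: three bytes of 11111111, read three different ways.
var raw = buffer_create(3, buffer_fixed, 1);
buffer_write(raw, buffer_u8, 255);
buffer_write(raw, buffer_u8, 255);
buffer_write(raw, buffer_u8, 255);

// Interpretation 1: three 1-byte values.
buffer_seek(raw, buffer_seek_start, 0);
var a1 = buffer_read(raw, buffer_u8);  // 255
var a2 = buffer_read(raw, buffer_u8);  // 255
var a3 = buffer_read(raw, buffer_u8);  // 255

// Interpretation 2: a 1-byte value followed by a 2-byte value.
buffer_seek(raw, buffer_seek_start, 0);
var b1 = buffer_read(raw, buffer_u8);  // 255
var b2 = buffer_read(raw, buffer_u16); // 65535

// Interpretation 3: a 2-byte value followed by a 1-byte value.
buffer_seek(raw, buffer_seek_start, 0);
var c1 = buffer_read(raw, buffer_u16); // 65535
var c2 = buffer_read(raw, buffer_u8);  // 255

// Free the buffer when done.
buffer_delete(raw);

All three reads succeed without complaint, which is exactly why the sender and receiver have to agree on the format beforehand.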

This would be a different story if the buffer had an alignment of 2 and had the contents "11111111 11111111 00000000 11111111". This is unmistakably (65535, 255).
 

Neptune

Member
Interesting, so say I packed 11111111 11111111 11111111 into a buffer (intended as 255 255 255), and I knew the contents, and the alignment was 1, how would I retrieve each of the three pieces of data from the buffer?
Like how do I say
Code:
var data1 = get datas1 from buffer;
var data2 = get datas2 from buffer;
var data3 = get datas3 from buffer;
 

TsukaYuriko

☄️
Forum Staff
Moderator
You'd use buffer_read with the type set to buffer_u8 (unsigned 8 bit / 1 byte value, 0~255) three times (as each read advances the read/write head by as many bytes as you're reading).

For the sake of visualization, let's take a look at a small example.

GML:
// Create a fixed-size buffer with a size of 3 bytes and an alignment of 1.
mybuffer = buffer_create(3, buffer_fixed, 1);

// Write three unsigned integers to the buffer.
buffer_write(mybuffer, buffer_u8, 255);
buffer_write(mybuffer, buffer_u8, 255);
buffer_write(mybuffer, buffer_u8, 255);

// Seek back to the start.
buffer_seek(mybuffer, buffer_seek_start, 0);

// Read the unsigned integers back into variables.
var data1 = buffer_read(mybuffer, buffer_u8);
var data2 = buffer_read(mybuffer, buffer_u8);
var data3 = buffer_read(mybuffer, buffer_u8);

// Output the variables to verify that everything worked as intended.
show_debug_message(data1); // 255
show_debug_message(data2); // 255
show_debug_message(data3); // 255

// Free the buffer once it is no longer needed to avoid a memory leak.
buffer_delete(mybuffer);
I hope this makes it as clear as possible! :)
 

TsukaYuriko

☄️
Forum Staff
Moderator
There's no "u8 buffer" - buffer_u8 means "the data type when writing to this buffer is an unsigned 8 bit integer". Buffers can hold any kind of data - all they store is a stream of binary digits. The whole signed vs. unsigned ordeal only comes into play when you're reading or writing data.

Another little binary lesson:
An 8 bit integer can have 256 different values (as there are 2^8 different combinations of 0s and 1s). This is why an unsigned 8 bit integer has a value range of 0~255.
A signed 8 bit integer, on the other hand, can also be negative. However, its maximum is considerably smaller: its range is -128 to 127. This is because we still only have 256 different combinations in total to represent a number. Half of them (128) are used for negative numbers, one for zero, and the remaining 127 for positive numbers.

Storage-wise, there is no real difference - it's all about interpretation. The unsigned integer 128 would be stored as "10000000". If you tried to save 128 as a signed integer and then read it back, you'd be met with a surprise - it'll have magically gained a minus sign in front of it and turned into -128. There's a bit more to this and some actual logic behind how it's stored and interpreted other than "if the first bit is 1, it's negative", but that's the general gist of it. If you ever played a game, got a really high score and then watched it suddenly turn negative, you now know why! The developer used a signed value for the score, and it just rolled over from its positive range into its negative range.


If you write your value as unsigned and read it back as signed (or vice versa), you may get unexpected results. To write a signed 8 bit integer, specify the data type as buffer_s8.
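
A short sketch of that mismatch, assuming a fixed 2-byte buffer with an alignment of 1 (the variable names are made up): the same bytes are written as unsigned and then read back once as signed and once as unsigned.

GML:
// Hypothetical example: same bits, different interpretation.
var sbuf = buffer_create(2, buffer_fixed, 1);
buffer_write(sbuf, buffer_u8, 128); // stored as 10000000
buffer_write(sbuf, buffer_u8, 255); // stored as 11111111

// Read the bytes back as signed 8 bit integers.
buffer_seek(sbuf, buffer_seek_start, 0);
var s1 = buffer_read(sbuf, buffer_s8); // -128
var s2 = buffer_read(sbuf, buffer_s8); // -1

// Read them again as unsigned to get the original values.
buffer_seek(sbuf, buffer_seek_start, 0);
var u1 = buffer_read(sbuf, buffer_u8); // 128
var u2 = buffer_read(sbuf, buffer_u8); // 255

// Free the buffer when done.
buffer_delete(sbuf);

Values from 0 to 127 come out the same either way, so a mismatch like this can go unnoticed until a value crosses 127.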
 