Data saving methods discussion (JSON vs DS strings vs token separated strings vs binary)

kburkhart84

Firehammer Games
I'm familiar with all of these methods of saving data. However, I have not been able to find a nice full-fledged discussion on the merits of each of these, and what people have against certain ones. I will start off with what I know of them, and hopefully this sparks some opinions from other people.

JSON...This has become a nice standard for data and works great for anything that gets sent over networks, and Gamemaker has nice support for it as well. Compared to flat binary data, and even to token separated strings, JSON strings are typically quite a bit longer, but in exchange you get something that is much closer to human-readable and is a "standard" all over. With Gamemaker's built-in support, you also get automatic destruction of nested maps and lists (if you mark them properly) simply by destroying the root ds_map.

DS strings...I'm referring to the strings you get with the read/write string functions for data structures. These strings are very much NOT human readable, and look almost like some sort of encryption, even though they aren't as far as I know. I have used these quite a bit anywhere that I don't care if you can understand the strings, and the support with Gamemaker is also great. This is a good way to go in my opinion if your data can easily be put into data structures or already is. You can also save multiple strings in the same file simply by adding a line ending to it. It seems I've seen somewhere that it is not recommended to use these strings, but I really don't know who said it and why that recommendation was made. Hopefully someone here can clarify if there is a good reason to NOT use them.

Token Separated Strings...these are strings with the data converted to string (from numerical values) and each value separated by a token character you choose (like ~ or | or &, but it can be whatever you want as long as it won't be part of any of the actual stored strings, and it can be multiple characters together too). From there, this data can be saved to text files. It is partially human readable: more so than DS strings, but less so than JSON, because a reader would have to know what order the data is written in to make sense of it. This method can work, and as far as strings go it is going to be the smallest of the string methods, only being larger than binary files for the same data overall. The functions you would need to make/find are also pretty short (like 10 to 15 lines I think), so there isn't any massive coding needed. You will need to make sure you write the data in the same order as you load it, though. That last point applies to the other methods somewhat too, depending on how the data is originally stored.
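
As a rough illustration of how short those helper functions can be, here is a minimal sketch. It is written in Python only as a neutral stand-in (it is not GML), and the names `TOKEN`, `pack_fields` and `unpack_fields` are made up for this example. It assumes the separator never appears inside a stored value and that the reader knows the field order and types.

```python
TOKEN = "|"  # hypothetical separator; must never occur inside a stored value


def pack_fields(values, token=TOKEN):
    """Join values into one token-separated string, converting numbers to text."""
    parts = [str(v) for v in values]
    for part in parts:
        if token in part:
            raise ValueError(f"value {part!r} contains the separator {token!r}")
    return token.join(parts)


def unpack_fields(line, types, token=TOKEN):
    """Split a line and convert each field back, using the expected type order."""
    parts = line.split(token)
    if len(parts) != len(types):
        raise ValueError("field count does not match the expected layout")
    return [convert(part) for convert, part in zip(types, parts)]


# Writer and reader must agree on the order: name, level, x, y.
saved = pack_fields(["Hero", 12, 103.5, 64.0])
loaded = unpack_fields(saved, [str, int, float, float])
```

The type list passed to `unpack_fields` is where the "same order as you load it" requirement shows up: nothing in the string itself says which field is which.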

Binary...this is generally going to be the most performant way, and also the hardest to work with. With most of the data we are saving, this method isn't really very good in my opinion, simply because there are no performance issues with the other methods, and the data is usually so small that it doesn't matter even if it is in text format. If we are going with something much bigger, like a file format for 3d models or similar, then binary could indeed be a much quicker option.
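
To put rough numbers on the size claims above, here is a small sketch that encodes one made-up record three ways and prints the byte counts. Python is used as a neutral stand-in, and both the record and the binary layout are assumptions for illustration only. DS strings are left out because they are produced by GameMaker's own write functions.

```python
import json
import struct

# Hypothetical save record: name, level, x, y.
name, level, x, y = "Hero", 12, 103.5, 64.0

as_json = json.dumps({"name": name, "level": level, "x": x, "y": y})
as_tokens = "|".join(str(v) for v in (name, level, x, y))

# Binary layout (an assumption for this sketch): 1-byte name length,
# the UTF-8 name, an unsigned 16-bit level, then two 32-bit floats.
raw_name = name.encode("utf-8")
as_binary = struct.pack(f"<B{len(raw_name)}sHff", len(raw_name), raw_name, level, x, y)

for label, data in (
    ("JSON", as_json.encode("utf-8")),
    ("token-separated", as_tokens.encode("utf-8")),
    ("binary", as_binary),
):
    print(f"{label}: {len(data)} bytes")
```

For a record this small the absolute difference is a few dozen bytes, which is the point made above: the gap only starts to matter when there is a lot more data.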

So, I'm hoping people will chime in as far as to what they prefer for different situations and why. I haven't seen any such discussion anywhere. And yes, I'm aware that in many cases, there are going to be multiple ways to get things done and it won't matter in the end what gets used, but like I said, I'm hoping to spark a discussion. Also, feel free to add ideas as well, along with pros/cons of those ideas.
 

FrostyCat

Redemption Seeker
JSON:
+ Built-in support in GML
+ Built-in support for nested data
+ No "taboo" characters; all escape sequences are perfectly defined
+ Excellent interoperability on all exports
+ Excellent interoperability with other programming languages and text editors
+ Viewable using browsers and other easily obtainable tools
+ Does not require binary safety
- Verbose, lots of filler characters supporting the format (especially commas and quotes)
- Default form is not source-control friendly (e.g. trailing comma conflicts)
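
A minimal sketch of a few of the JSON points above, in Python as a neutral stand-in (in GML the thread's built-in support would be the equivalent path). The save data is made up. It shows nesting surviving a round trip, a comma inside a string needing no special treatment, and one common way to make the output friendlier to diffs: one value per line with a fixed key order. That layout reduces merge noise but does not remove it, since appending an item still touches the previous line's comma.

```python
import json

# Hypothetical save data with a nested map and a nested list.
save = {
    "player": {"name": "Hero", "hp": 42},
    "inventory": ["sword", "potion", "key, rusty"],
}

# Default form: everything on one long line.
compact = json.dumps(save)

# Diff-friendlier form: one value per line, keys in a fixed order.
stable = json.dumps(save, indent=2, sort_keys=True)

# Both forms decode back to the same nested structure.
restored = json.loads(stable)
```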

DS strings:
+ Built-in support in GML
+ No "taboo" characters
+ Interoperable between native exports
+ Restores directly to GML-native data structures without further processing
+ Does not require binary safety
- Very verbose
- No built-in nesting
- Not interoperable with the HTML5 export (it comes out as JSON in HTML5, so you can't take a native-made DS string and port that over to HTML5 or vice versa)
- Not interoperable with other programming languages or editors
- Not source-control friendly

Character-separated values
+ Easy to create using text editors or spreadsheet programs
+ Interoperable with other programming languages (most basic case, at least)
+ In case of comma-separated values, built-in support with GMS 2's load_csv()
+ Source-control friendly
+ Usually less verbose than comparable JSON or DS strings
+ Does not require binary safety
- No built-in support for nested structures
- No formal support for escaping "taboo" characters (e.g. can't tell a comma that's a separator from a comma used as content)
- Sometimes inconsistent standards for quoting and escaping "taboo" characters
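
The "taboo" character problem can be seen in a few lines. This is a Python sketch with a made-up row, not GML. The quoting shown is the convention used by Python's `csv` module, which is one of several in circulation; whether another CSV reader accepts it is something to check rather than assume.

```python
import csv
import io

# Hypothetical row: item name, description, price.
row = ["Potion", "Heals 20 HP, then 5 more", "15"]

# Naive join/split: the comma inside the description becomes a separator,
# so three fields go in and four come out.
naive = ",".join(row)
naive_fields = naive.split(",")

# csv module: fields containing the delimiter are wrapped in quotes on write
# and unwrapped on read.
buffer = io.StringIO()
csv.writer(buffer).writerow(row)
quoted = buffer.getvalue()
parsed = next(csv.reader(io.StringIO(quoted)))
```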

Binary
+ Built-in support in GML via buffer functions
+ Very compact and efficient form
+ Built-in one-line loading via buffer_load() and buffer_save()
+ Built-in asynchronous loading via buffer_load_async() and buffer_save_async()
+ Interoperable with other programming languages (as long as the same read/write algorithm is implemented on the other side)
- Highly improvised format, everything requires manual design from the ground up
- Nested structures and flex-length data require manual handling
- Needs base64 as a middleman if embedded in non-binary-safe contexts
- Risk of buffer overflows and other low-level data integrity violations
- Not source-control friendly
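
As a sketch of what "manual handling" of flex-length data looks like, here is a length-prefixed record written and read back with bounds checks, then wrapped in base64 for a text-only context. It is Python rather than GML buffers, and the layout (16-bit length, UTF-8 name, 16-bit count, 32-bit signed scores) and the function names are assumptions made up for this example.

```python
import base64
import struct


def write_record(name, scores):
    """Serialize a name and a list of scores with explicit length prefixes."""
    raw_name = name.encode("utf-8")
    out = struct.pack("<H", len(raw_name)) + raw_name
    out += struct.pack("<H", len(scores))
    out += struct.pack(f"<{len(scores)}i", *scores)
    return out


def read_record(data):
    """Read the record back, checking bounds before every read."""
    offset = 0

    def take(fmt):
        nonlocal offset
        size = struct.calcsize(fmt)
        if offset + size > len(data):
            raise ValueError("truncated or corrupt record")
        values = struct.unpack_from(fmt, data, offset)
        offset += size
        return values

    (name_len,) = take("<H")
    (raw_name,) = take(f"<{name_len}s")
    (count,) = take("<H")
    scores = list(take(f"<{count}i"))
    return raw_name.decode("utf-8"), scores


blob = write_record("Hero", [100, 250, -3])

# base64 as the middleman when the bytes must travel through text.
text_safe = base64.b64encode(blob).decode("ascii")
```

The bounds check in `take` is the guard against the overflow risk listed above: a truncated file fails with a clear error instead of reading past the end of the data.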

Notice that I didn't talk about tamper-proofing, user readability or perceived difficulty, while a lot of other people around here would. I think all three of these, in terms of save file formats, are complete non-points.
  • If you want tamper-proofing, you need to add encryption or signatures on top of whatever base format you are using. Everything you have been talking about thus far are encodings (i.e. ensuring structural integrity of data in serialized form), not encryptions (i.e. ensuring non-trivial readability of data).
  • User readability is just a matter of having the right tools, be it a text editor, hex editor, or a copy of GMS 2.
  • The "hard" label that gets liberally applied on virtually every elementary skill around here is subjective, arbitrary, and unjustifiable. It's bad enough when slapped on JSON and binary files, but more generally also on other basic things like trigonometry, linear algebra and common algorithms. Everything can be arguably hard for someone who is sufficiently uneducated in the particular subject.
 
Well, personally, I too like JSON, it gives you a nice balance between readability and flexibility, getting the maps and lists automatically created (whereas most other save systems require you to write an interpreter of some sort). I haven't necessarily heard anything bad about using the ds write functions to turn them into strings, but I also only ever use that function for grids and only then specifically when saving data to disk. There probably is some esoteric flaw there (the in-built GMS functions like game_save, solid, room persistence etc, all often have a reason why they should not be used) but I have not run into any problems yet.

Token Separated Strings, I tend to use those for spreadsheet data (otherwise known as CSV), and using a comma as your token allows you to use the in-built function load_csv (which I discovered after writing my own interpreter a long time ago, lol).

Here:
With most of the data we are saving, this method isn't really very good in my opinion, simply because there are no performance issues with the other methods, and the data is usually so small that it doesn't matter even if it is in text format.
I think you do not realise, but to some people on this forum those are going to be actual fighting words, lol. There's a lot of passion in a subset of the community for binary. Personally, I've never needed to use binary for performance, but then again, I'm rarely dealing with data flow at rates that require the performance of binary, so the tradeoff of performance vs ease of handling has never swung into binary for me. However, working with 3D, saving relatively large amounts of data continually, sending compressed data across a network, all good uses for binary (and of course, not the only uses). Interested to see the thoughts of other people.
 

kburkhart84

Firehammer Games
FrostyCat, you bring up a few more pros/cons than I specified, very good points. I hadn't thought about actual nesting of data structures(which makes ds_strings fail really quick if you are using them with nested data), and I hadn't thought of what works with source control either.

Notice that I didn't talk about tamper-proofing, user readability or perceived difficulty, while a lot of other people around here would. I think all three of these, in terms of save file formats, are complete non-points.
About those things, tamper-proofing is probably the only one that might give me pause, but only in the sense of the low-hanging fruit. DS strings and binary formats fit the "low-hanging fruit" category since they are by nature not human readable. It's another discussion whether something is hackable or not, and I realize anything and everything can and will be hacked, but with those two there is at least an automatic first step a hacker has to get around. User readability...you could easily encrypt JSON to get around that if you really cared. Difficulty...realistically, on a scale between 1 and 10 (10 being most difficult), all of these things are between 1 and 2 IMO. Binary is more difficult for many people since there isn't anything automatic, but compared to other things none of this is difficult.

I think you do not realise, but to some people on this forum those are going to be actual fighting words, lol. There's a lot of passion in a subset of the community for binary.
I personally haven't noticed a massive love for binary here, I must have simply lucked out. I'm with you on the realism of that though. Binary has a place, but I don't think it's necessary for anything like game saves, configuration, etc. Like we both said, with bigger amounts of data it can start to make more sense of course.
 

FrostyCat

Redemption Seeker
It shouldn't give you any pause if adding the encryption or signature takes you just a line or two more, which is easily doable if you get a ready-made RC4, AES or SHA1-HMAC script library. They're out there for you to easily reuse behind the scenes, on top of any of the formats you cited. That's why I said tamper-proofing is a non-point.
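
To show how little code a signature adds on top of any of the formats, here is a minimal sketch using Python's standard `hmac` module rather than a GML script library. It uses HMAC with SHA-256 instead of the SHA1-HMAC mentioned above, and the key, the tag-then-newline file layout and the function names are all assumptions for illustration. A key shipped inside a game can still be extracted by a determined user, so this raises the bar rather than making tampering impossible.

```python
import hashlib
import hmac

SECRET_KEY = b"replace-with-your-own-key"  # hypothetical key for this sketch


def sign(payload: bytes) -> bytes:
    """Prefix the payload with its hex HMAC tag on a line of its own."""
    tag = hmac.new(SECRET_KEY, payload, hashlib.sha256).hexdigest().encode("ascii")
    return tag + b"\n" + payload


def verify(blob: bytes) -> bytes:
    """Return the payload if the tag matches, otherwise raise ValueError."""
    tag, _, payload = blob.partition(b"\n")
    expected = hmac.new(SECRET_KEY, payload, hashlib.sha256).hexdigest().encode("ascii")
    if not hmac.compare_digest(tag, expected):
        raise ValueError("save data was modified or is corrupt")
    return payload


# The payload can be any of the formats discussed: JSON text here.
signed = sign(b'{"gold": 10}')
original = verify(signed)
```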
 

kburkhart84

Firehammer Games
Yeah, when I say it gives me pause, I don't really mean so much that it would make me choose one over the other so much, rather that at the least I think it is worth adding the points to the discussion. I see your point though that the addition of encryption is quite trivial.

I'm glad I got at least a couple responses out of this. It actually surprises me that there has not been any discussion on this before(at least that I could find). There is probably a tutorial or something out there that compares these things, but I couldn't find one. But I think this community deserves to have this type of topic about a lot of things, makes it easier for someone who doesn't know how these things work.
 
Deleted member 16767

Guest
I use them all in my game. JSON for objects, ds for items, game_save for everything else (it's weak and may not load everything, but it saves some important stuff actually).
 

TheouAegis

Member
There's a lot of passion in a subset of the community for binary.
I'm a binary guy. lol But it also depends on the data. For user settings, I just use INI. I just like the relative compactness of binary and I only work with integers or values savable as integers. Also, my form of encryption with binary is to save it as a GIF or PNG. 😗 You can make binary fun!
 

kburkhart84

Firehammer Games
I'm a binary guy. lol But it also depends on the data. For user settings, I just use INI. I just like the relative compactness of binary and I only work with integers or values savable as integers. Also, my form of encryption with binary is to save it as a GIF or PNG. 😗 You can make binary fun!
It seems we agree on this at least partially. I've never liked INI for some reason, so I typically just use plain text or ds strings for simple configuration/saves kind of things. But binary is handy when there is lots more data of course. I've seen the idea of putting data into images all over the place...never done it myself though.
 