GMS 2 What is best practice for String compares?

EvanSki

Raccoon Jam Host
Not sure if this qualifies as advanced programming but the way I see it the normal user wouldn't worry about this. Anyways...

I was watching a GDC talk (3:40) and the speaker mentioned that they avoid string-to-string compares, but didn't go into details.
After some research, my understanding is that a string literal used in a comparison is effectively a "magic number", and that's not good(?)
What people normally do in this situation is have an enum that holds a constant number standing in for the actual string, and just compare those constants.
In other cases, some people hash the strings and compare the hashes.

An example of my code currently:
GML:
var text = "Cakes made with bees are not tasty";

switch (text)
{
    case "Cakes made with bees are not tasty": game_end(); break;

    case "Everything is cake": game_restart(); break;

    default: break;
}
My question is: do I need to care about this, or is just comparing strings fine? So far I've never encountered an issue comparing strings.
And if I do need to care, what is the best way to go about solving this?
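For reference, the enum approach described above might look something like the sketch below. The enum name `DEBUG_CMD` and its members are hypothetical, made up purely for illustration; the point is only that the `switch` compares numeric constants instead of string literals.

```gml
// Hypothetical enum: the names are illustrative and not from the original code.
enum DEBUG_CMD
{
    END_GAME,
    RESTART_GAME,
    NONE
}

// The value being tested is now a numeric constant, not a string.
var cmd = DEBUG_CMD.END_GAME;

switch (cmd)
{
    case DEBUG_CMD.END_GAME:     game_end();     break;

    case DEBUG_CMD.RESTART_GAME: game_restart(); break;

    default: break;
}
```

A misspelled enum member is normally reported when the project is compiled, whereas a misspelled string literal simply never matches at runtime.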
 

kburkhart84

Firehammer Games
I haven't done any actual tests, but I don't think comparing strings is an issue here in GML land. The reason is simply that we already take a certain performance hit compared to things like raw C++, so it isn't really going to make a difference IMO. If it isn't something you are doing all that often, it's irrelevant anyway; don't fix something that isn't broken.

That said, maybe you could run some tests, comparing strings vs comparing numbers, and see how much difference there is, if any at all. If you do, please let us know how it goes.
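A rough way to run that kind of test is sketched below. It times a loop of string compares against a loop of number compares using `get_timer()`, which returns microseconds since the game started. The iteration count and test values are arbitrary assumptions, and the results will vary by target platform and by VM versus YYC builds, so treat any numbers as indicative only.

```gml
// Rough micro-benchmark sketch; iteration count and values are arbitrary assumptions.
var iterations = 1000000;
var hits = 0;

// Time the string comparisons.
var s1 = "Cakes made with bees are not tasty";
var s2 = "Cakes made with bees are not tasty";
var t0 = get_timer();
for (var i = 0; i < iterations; i++)
{
    if (s1 == s2) hits++;
}
var string_time = get_timer() - t0;

// Time the number comparisons.
var n1 = 42;
var n2 = 42;
t0 = get_timer();
for (var j = 0; j < iterations; j++)
{
    if (n1 == n2) hits++;
}
var number_time = get_timer() - t0;

show_debug_message("string compares: " + string(string_time) + " us");
show_debug_message("number compares: " + string(number_time) + " us");
```

Running it several times and comparing the averages gives a steadier picture than a single run.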
 

EvanSki

Raccoon Jam Host
Well, right now a bunch of string compares is all my current project does. If we're talking about performance hits, I haven't done any tests, but perceptually I haven't noticed any significant difference between an integer compare and a string compare.
 

Deleted member 45063

Guest
Just thought I'd drop by and leave my two cents on the subject.

Regarding performance: a string can be arbitrarily long, so in general testing two strings for equality means comparing more data than a simple number comparison (a number is often just a few bytes in length). One can therefore assume that comparing strings will generally be slower than comparing numbers (although when the two strings have different lengths it might boil down to a single number comparison, depending on how the strings are implemented). That said, this is probably only relevant if you do a lot of string-based comparisons in tight loops and/or you compare extremely long strings.

Not relevant for GML (I believe), but there is an extra caveat when comparing strings: the semantics of the comparison. If you are comparing two strings with different character encodings, both can refer to the same string of text but have extremely different memory representations. In this case a simple byte-by-byte comparison might not suffice, so you need to add the cost of character conversion to the overall cost of the string comparison.

Personal preference: I avoid logic based on strings unless I have to. This is for a few reasons (the performance points noted above being one of them). Another reason is that "stringly typed" data is highly prone to errors. If you have (for example) a string dictating what operation happens at a given point in time, then a simple typo can lead to completely different outcomes or errors (see spoiler below for a real-life story about this). Another reason is that strings aren't auto-complete friendly since they are highly contextual, so you always need to know exactly what string to use for a comparison, which can slow down development overall if a lot of your logic is based on string comparisons. A final reason is memory usage. If I can base all of my logic on simple number comparisons then I can fine-tune how much memory those numbers use (although in GM you'd probably always deal with doubles), whereas if I base my comparisons on strings then each comparison uses a different amount of memory. Note that this last point doesn't necessarily mean runtime memory usage; it can also translate into overall executable size, since the strings used for the comparisons need to be encoded in the final executable.
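When the strings genuinely come from outside the code (typed into a debug console, for example), one option is to convert each string to a number once at the boundary and use only numbers after that. The sketch below assumes a hypothetical `CMD` enum and made-up command names; it uses a `ds_map` as the lookup table so that a typo falls through to an explicit "unknown" value instead of silently matching nothing.

```gml
// Hypothetical enum and command strings, for illustration only.
enum CMD
{
    END_GAME,
    RESTART_GAME,
    UNKNOWN
}

// Build the lookup once (for example in a Create event).
cmd_lookup = ds_map_create();
ds_map_add(cmd_lookup, "end", CMD.END_GAME);
ds_map_add(cmd_lookup, "restart", CMD.RESTART_GAME);

// Convert the incoming string to an enum value a single time.
var input = "restrat"; // deliberate typo
var cmd = CMD.UNKNOWN;
if (ds_map_exists(cmd_lookup, input))
{
    cmd = cmd_lookup[? input];
}

// Everything after this point compares numbers only.
if (cmd == CMD.UNKNOWN)
{
    show_debug_message("Unknown command: " + input);
}
```

The map should be freed with `ds_map_destroy(cmd_lookup)` once it is no longer needed (for example in a Clean Up event).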

I used to work with an insurance company core system that was built on top of JVM-based technologies. This system was quite big and was distributed with a lot of base functionality; the idea was that you would either configure it or extend it with the proprietary business processes of the particular insurance company. It also used a lot of custom tools, editors and a proprietary programming language (built on Java). Due to all the code generation magic, the extended feature set of the custom programming language and the overall age of the (very big) codebase, the core system relied on a lot of string-based comparisons at times.

Now, we are talking about a very big and global insurance core system, so it obviously was developed by people of different nationalities. It just so happens that a specific source file that used some string-based comparison was developed by a person with a Cyrillic keyboard and the associated functionality obviously worked. However, when the file was extended with some custom logic for the insurance company I worked in it didn't work as expected. I can't remember which exact string caused the problem, but assume the comparison performed was between "Some Variable" and "Some Vаriаble".

Now, in case you couldn't spot the difference between the two strings above, one of them uses the Latin A while the other one uses the Cyrillic A. Chances are that both of these will be rendered identically, or at least in an almost indistinguishable way (which was the case in the IDE we were using), while they have widely different Unicode representations. Now imagine the amount of debugging effort that went into this topic, at its core because of a string-based comparison that could have just used simpler comparisons to begin with.
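The same trap can be reproduced in GML. The sketch below is only an illustration under assumptions: it builds the second string with `chr(1072)` (the Cyrillic small "a", U+0430) so the difference is visible in the source, and then uses `ord()` to flag any character outside the ASCII range. Flagging everything above 127 is a crude heuristic chosen for this example, not a general homoglyph detector.

```gml
// Two strings that look the same on screen but differ in their code points.
var latin    = "Some Variable";
var cyrillic = "Some V" + chr(1072) + "ri" + chr(1072) + "ble"; // 1072 = Cyrillic small "a"

if (latin != cyrillic)
{
    show_debug_message("The strings look alike but are not equal.");
}

// Report every character outside the ASCII range (crude heuristic).
var len = string_length(cyrillic);
for (var i = 1; i <= len; i++) // GML string positions start at 1
{
    var code = ord(string_char_at(cyrillic, i));
    if (code > 127)
    {
        show_debug_message("Non-ASCII character at position " + string(i) + ": code " + string(code));
    }
}
```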
 

EvanSki

Raccoon Jam Host
That's really useful info, thanks. The way I use my strings, I have error catches for typos and custom code for auto-completion of the required strings, so in this case those two are not big issues. The memory part, however, is something I should look into and pin down. At the same time, all of the code that handles these strings is for a debug tool that I consider temporary for the project. Still, I should look into the memory usage of all this.

As for your story, I too have run into some Unicode problems along the way. I managed to do away with worrying about most of it, but it will most likely be an issue later on down the line. Nonetheless, a very interesting story, thank you for sharing.
 