Getting a Line and Parsing a .CSV File without For loops or DS

GarbageHaus

Member
I've been searching the forums/help guide/googling this to no avail.

I have a .CSV file, it's a no frills .CSV file with the first column being a unique identifier (1 through 9), the second column being a list of names, and the third column being a color. I suppose it doesn't really matter that it is a .CSV, it could be any form of database-like plain text file. However, GMS seems incredibly obtuse when it comes to file handling.

I want to pull a random number, then use that random number to select an entry/line in my .CSV file. Later I'd use the name and color from that line in other parts of the game.

Everywhere I look seems to suggest these workarounds:
  • Create double-nested "for" loops (WHY!?!)
  • Use the ds maps/grids (At that point, why bother having any file at all?)
  • Use a function that is deprecated in GMS2.3+
I'm trying to accomplish this without using a "for" loop. I have my own reasons for avoiding "for" loops, so I won't discuss that aspect here. But it seems like handling an external file in this way is not feasible.

Is it really just impossible to do this normally? Should I just hard-code the .CSV data as just 1 big array/object and do it this way?
 

Gradius

Member
That is normal. The only way you're pulling something out of a comma delimited list is by iterating through it. Which involves a loop. The best another language is gonna get you is 'foreach'. Anything else is just hiding the loop from view, but there's just no other practical way to extract a specific line. If you really want to avoid iterating, then you'd be better using an Ini file or Json, which GM has decent native handling for (though you better believe it's just looping through each value behind the scenes).
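For what it's worth, a minimal sketch of the INI route might look like the following. It assumes a hypothetical "entries.ini" in which each section is named after the unique identifier (1 through 9) and holds a "name" and a "color" key; the file name, section names and keys are made up for illustration.

GML:
// Assumed layout of entries.ini (hypothetical):
// [1]
// name=Ada
// color=red
ini_open("entries.ini");
var _section = string(irandom_range(1, 9)); // random id from 1 to 9 inclusive
var _name = ini_read_string(_section, "name", ""); // "" if the key is missing
var _color = ini_read_string(_section, "color", "");
ini_close();

There is no loop in your own code here, but the lookup by section and key still has to search the file contents somewhere inside the runtime.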
 

Roldy

Member
Unless your lines are a fixed width, you will have to loop through and find the end of each line. If your lines are fixed width, then you can advance the read position and randomly access the file.
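A rough sketch of the fixed-width idea, using the binary file functions because, as far as I know, the file_text_* functions cannot seek. It assumes every record is padded to exactly the same number of bytes (line break included) and contains only single-byte characters; the function name and parameters are hypothetical.

GML:
/// @desc Reads record number _index (0-based) from a file of fixed-width records
function read_fixed_record(_filename, _index, _width) {
  var _file = file_bin_open(_filename, 0); // 0 = open for reading
  file_bin_seek(_file, _index * _width); // jump straight to the start of the record
  var _text = "";
  repeat (_width) {
    _text += chr(file_bin_read_byte(_file)); // assumes one byte per character
  }
  file_bin_close(_file);
  return _text;
}

Something like read_fixed_record("entries.dat", irandom(8), 32) would then fetch one of nine 32-byte records without scanning the earlier ones. The padding and the line break stay in the returned string, so it would still need trimming.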

Alternatively you may extend your file format and place an index at the beginning of the file, that stores the starting locations of each line. Then once you read the index you know where each line starts in the file and can immediately jump to the location.

However, unless your data is very large or specifically needs to be user editable or external tool editable, there isn't really a reason not to 'hard code' it in a script file as a data/array object.
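As a sketch of the hard-coded option, assuming GMS2.3+ (for struct and array literals) and placeholder data; the variable name, field names and values are made up for illustration.

GML:
// Hypothetical data: unique id, name and colour for each entry
global.entries = [
  { uid: 1, name: "Ada", tint: "red" },
  { uid: 2, name: "Bo", tint: "green" },
  { uid: 3, name: "Cy", tint: "blue" }
];

// irandom(n) returns a whole number from 0 to n inclusive
var _pick = global.entries[irandom(array_length(global.entries) - 1)];
show_debug_message(_pick.name + " / " + _pick.tint);

Picking a random entry is then a single array access, with no file handling and no loop.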
 

rytan451

Member
You should be able to use file_text_readln(), alongside a while loop and file_text_eof(). Then, you can parse the comma-separated values to extract the number in the first column. Finally, if the value in the first column matches your unique ID, then you can parse the rest of the row and return the data.
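Roughly, that could look like the sketch below. It assumes a plain CSV with no quoted fields and the id in the first column; the function name csv_find_row is hypothetical.

GML:
/// @desc Returns the first line whose first column equals _id, or undefined
function csv_find_row(_filename, _id) {
  var _file = file_text_open_read(_filename);
  if (_file == -1) return undefined; // the file could not be opened
  var _wanted = string(_id);
  var _result = undefined;
  while (!file_text_eof(_file)) {
    var _line = file_text_readln(_file);
    var _comma = string_pos(",", _line); // 0 if the line has no comma
    if (_comma > 0 && string_copy(_line, 1, _comma - 1) == _wanted) {
      _result = _line;
      break;
    }
  }
  file_text_close(_file);
  return _result;
}

The returned line may still end with a line break, so strip that before using the last column.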

Of course, for loops are extremely useful when you're parsing the CSV row. At least there, unlike with files, it's easy to get a length of the line.
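For instance, a row could be split with a for loop along these lines. This is a sketch that assumes GMS2.3+ (for array_push) and no quoted fields or embedded commas; the function name csv_split_row is hypothetical.

GML:
/// @desc Splits one CSV line into an array of field strings
function csv_split_row(_line) {
  var _fields = [];
  var _current = "";
  var _len = string_length(_line);
  for (var i = 1; i <= _len; i += 1) { // GML strings are indexed from 1
    var _ch = string_char_at(_line, i);
    if (_ch == ",") {
      array_push(_fields, _current); // a comma ends the current field
      _current = "";
    } else if (_ch != "\r" && _ch != "\n") {
      _current += _ch; // skip line-break characters, keep everything else
    }
  }
  array_push(_fields, _current); // the last field has no trailing comma
  return _fields;
}

With the three-column file from the question, the name would be at index 1 of the returned array and the color at index 2.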

As for why you would use a map or grid data structure when you have a file, the answer is probably persistence and accessibility. I... ended up writing a lot about it, but it isn't really pertinent to the main thrust of the question, so I've put it in a spoiler.

It's really easy to access data in data structures; that's what data structures are for. However, RAM is limited, and when your application closes, the RAM it uses is freed for other uses and the data stored there is lost. Thus, if you want to have data persist between runs of the game, you need to store it in persistent storage (usually referred to as the "hard drive", even if it's an SSD). Also, at least for now, most GMS games are limited to 2 GiB of RAM; this can be easily eaten up by long sounds, large images, or huge chunks of data. If you want to use more data, then you need to store the extra data on the hard drive, in a file, and only load it into RAM as needed.

Thus, if you have a big chunk of data you don't want to have loaded into your limited RAM, or some data that needs to be accessible between launches of an application, you'd put the data into a file (or multiple files).

But it isn't easy to access data in files; this is true for most languages, and I think that's in part due to how hard drives work (though I'm not entirely certain). It's hard to read data in files, and it's hard to find data in files. In fact, the way things work, it's usually easier to load a big chunk of the file into RAM, and then work on the file there. If you think of files as big chunks of data, however, it's still very limited. It's just a long string of bytes. This isn't easy to process or understand, which is intentional so that the hard drive can store all sorts of data. So, oftentimes, the data is formatted in a manner that's much easier to process and use. The data is changed from an unstructured string of bytes to some structured data (giving rise to the term "Data Structure", which is where the ds_ prefix comes from). That is why data in files is often loaded into data structures: to make the data easier to use and process.
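To make that concrete, here is a small sketch of loading a whole CSV into a grid with load_csv() and then reading one random row. It assumes a hypothetical "entries.csv" with the columns id, name, color and no header row.

GML:
// load_csv() returns a ds_grid with one column per CSV column and one row per line
var _grid = load_csv("entries.csv");
var _row = irandom(ds_grid_height(_grid) - 1); // random row index, 0-based
var _name = ds_grid_get(_grid, 1, _row); // second column: name
var _color = ds_grid_get(_grid, 2, _row); // third column: color
ds_grid_destroy(_grid); // data structures must be freed manually

The file is only touched once; after that, every lookup is a direct grid access.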

This is the answer to your question, "At that point, why bother having any file at all?" Files are really good at storing lots of data that is persistent even when the application is closed. Data structures are really good at storing smaller amounts of data and quickly and easily returning useful results, but most are bad at storing loads of data, and the data structures in GMS are not persistent when the application is closed.

I actually agree that for loops are usually not the best tool for going through files, mostly because of the highly flexible nature of files. If you don't want to use for loops at all, however, I'll have to strongly question why. Of course, it's entirely possible to use for loops to go through files; here's an example of how:

GML:
/// @desc Returns the line in a CSV file whose first column matches the given id
function getLineById(filename, _id) {
  // Assumes the ids are between -9 and 99 inclusive, or are strings of length 1 or 2
  var strId = string(_id);
  if (string_length(strId) == 1) {
    strId += ","; // so that "1" does not also match "10", "11", ...
  }
  var f = file_text_open_read(filename);
  if (f == -1) return undefined; // the file could not be opened
  var result = undefined;
  // Test for end of file before each read, so the last line is checked too
  for (var lineNo = 0; !file_text_eof(f); lineNo += 1) {
    var str = file_text_readln(f);
    if (string_copy(str, 1, 2) == strId) {
      result = str;
      break;
    }
  }
  file_text_close(f); // always release the file handle
  return result;
}
 

GarbageHaus

Member
Thanks for all your replies. I'm not against loops as a whole, and thank you especially for the eof method when in a While loop.

@rytan451 I actually found your spoiler to be quite informative. I have some limited experience in Python and JavaScript, so those sorts of patient explanations help a lot. I didn't know the RAM limit was only 2gigs, still this will be large enough for any project I have in the immediate future.

As for why For Loops are traumatic to me...
They're basically nonsense from a Procedural Programming perspective. I get that almost all programming languages are Object Oriented and so this form of loop is exceptionally common, but I still feel it really should be avoided. The reason has to do with the structure and arguments within the for loop. Each one operates on an entirely different principle. The last argument is run over and over, but the first argument is only run once. Consider: for (var i = 0; i < 5; i++) I know that this loops 5 times. But if you look at each argument individually, I would expect that the i = 0 gets executed on every iteration, thus making an infinite loop. This is obviously not the case, but you can see why this would be confusing. In addition, I wouldn't define a variable within an argument; what if I need the value of "i" later? The second argument is a boolean condition; this isn't executed but evaluated. Lastly, the i++ also follows its own rules, executing upon each loop. Plus, I tend to prefer x = x + 1 as syntax.

Not saying that no function should ever have multiple required arguments. I'm just saying that arguments should be consistent within the same function and not follow 3 vastly different "rules". Something like draw_text(x,y,string) is certainly more straightforward as you can tell what is being drawn and where. It's a "Do this" not a "Do this thing only once and then do another thing any number of times depending on this other thing between them"

Compare this with a While, When, If/Goto or any other loop method I may have missed. There's no guesswork as to what the arguments are doing and in what order. It is much easier to read and to understand with the added benefit of being consistent. I understand that For will be more compact, but unless my code that is within the loop is so absurdly large that I won't be able to find the terminating condition for the loop, I just can't recommend this method. Especially when it isn't clear what argument does what and why. GMS2 doesn't show helper text when trying to set up a "For" loop.

Maybe all this can be chalked up to the fact that I was trained on computer programming using QBASIC, so order of operations has been hammered into me because the code was executed line by line.


I've found GMS2 to be exceptionally difficult doing the really basic stuff, so any way I can avoid more levels of confusion is always helpful.
 

rytan451

Member
Yeah, I can see how for loops are counterintuitive in that light. To be fair, it isn't GMS's fault; the canonical for loop came from C. Feel free to ask if you're confused about anything you consider "really basic".

Here's an isomorphism between for loops and while loops:

GML:
for (START; STOP; STEP) {
  do_something();
}

// Is equivalent to:

START;
while (STOP) {
  do_something();
  STEP;
}
For example:

GML:
// START => var i = 0
//  STOP => i < 100
//  STEP => i += 1
for (var i = 0; i < 100; i += 1) {
  array[i] = 0;
}

// Is equivalent to:

var i = 0;
while (i < 100) {
  array[i] = 0;
  i += 1;
}
There's a big advantage and a small advantage of for loops over while loops. The big advantage is this: your start, end, and step are defined at the top of the loop. This prevents problems for when you use a while loop and accidentally forget the step part (since it's at the end). With for loops, all three are at the top of the loop, so you can tell at a glance if the step is missing.

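The small advantage is that the iterator (i in my example) is declared in the loop header itself, right next to the only code that is meant to use it, which helps prevent accidental problems down the line. (In GML a var-declared variable still exists until the end of the script or event, so this is a readability benefit rather than a true scoping one.)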

(There's a semantic difference between them, too. For loops are always finite; either they have a fixed length (i < array_length(array)) or they stop at an ending sentinel (!file_text_eof(f)).)
 

Nidoking

Member
rytan451 said:
For loops are always finite; either they have a fixed length (i < array_length(array)) or they stop at an ending sentinel (!file_text_eof(f)).
Well-formed for loops, sure, but C-style languages let you leave out any of the clauses, and nothing says that any of the defined clauses have to lead to termination. Here are a couple of trivial examples, with all three clauses defined because I don't want to check to see whether GML requires them all or not at the moment:

GML:
for (var i = 0; true; i++) { } // will probably eventually throw an exception when i overflows, at least

for (var i = 0; i < 5; i--) { } // same thing with underflow

for (var i = 0; i < 5; j++) { } // same as long as j is defined

for (var i = 0; i < 5; i = i) { } // truly infinite
Some programmers use for ( ; ; ) instead of while (true) and I don't know why. The point is that there's going to be a break in there somewhere, or the loop manipulates the index variable internally to force termination. The issues described above, though, are something akin to the problems people have with the ?: operator. Okay, you don't have to like it, but it does great stuff when used properly and makes certain things clearer to the people who are comfortable with them. And for loops are permitted in every coding standard I've ever seen.
 

Paskaler

Member
GarbageHaus said:
Not saying that no function should ever have multiple required arguments. I'm just saying that arguments should be consistent within the same function and not follow 3 vastly different "rules". Something like draw_text(x,y,string) is certainly more straightforward as you can tell what is being drawn and where. It's a "Do this" not a "Do this thing only once and then do another thing any number of times depending on this other thing between them"
From reading this, you definitely should use a for loop to parse through a file, if only for the purpose of learning more about loops and getting some practice. You're mixing up functions and language statements here. While both begin with a keyword followed by a part in parentheses, they have nothing else in common (in terms of what they are actually doing).
 