[Solved-but-Slow] txt search and replace-type function

whale_cancer

Guest
Hello!

I'm working on a utility for finding and replacing graphics in .NES ROM files. The ROMs are just .bin files for my purposes.

What I have done so far:
1. Take a strip of sprites and turn it into a .txt file in a more manageable format. In my case, this is a .txt file that contains a series of 8 bytes on each line, with 2 lines forming a single tile image.
2. Do the above with a second strip.
3. Do the above with the original ROM file's graphics.

This all works. I can convert the files into my .txt format and output them as graphics again. My problem is that I want to take the text of the first strip (spr_marioStrip.txt), find all lines in the original ROM file's graphics (bin.txt) that match that first strip, and replace them with the values of the second strip (spr_warioStrip.txt).

However, I find no matches. If I manually look at the .txt files, I can see that it should be finding values from spr_marioStrip.txt in bin.txt, but my code fails to do so.

For example, here are the first two lines of spr_marioStrip.txt:
Code:
0000001100001111000111110000101000111010010000000100111100111111
0000001100001100000100000000111100111111011111110111111100110000
Here are lines 10689 and 10690 in bin.txt:
Code:
0000001100001111000111110000101000111010010000000100111100111111
0000001100001100000100000000111100111111011111110111111100110000
You can see they are the same, but my code doesn't find this. Here is my code:
Code:
for (var j = 0; j < 64; j += 1)
{
   mario_binRowOne = file_text_readln(mario_file);
   mario_binRowTwo = file_text_readln(mario_file);

   wario_binRowOne = file_text_readln(wario_file);
   wario_binRowTwo = file_text_readln(wario_file);
 
   show_debug_message("Searching through strip files at line "+string(j*2));

   for (var i = 0; i < 8192; i += 1)
   {


       bin_binRowOne = file_text_readln(bin_file);
       bin_binRowTwo = file_text_readln(bin_file);

       if bin_binRowOne == mario_binRowOne
       && bin_binRowTwo == mario_binRowTwo
       {
           bin_binRowOne = wario_binRowOne;
           bin_binRowTwo = wario_binRowTwo;
           show_debug_message("Match found!");
       }
       file_text_write_string(binNew_file, bin_binRowOne)
       file_text_write_string(binNew_file, bin_binRowTwo)
   }
}
Any ideas on what could be going wrong?
 
Homunculus

Guest
I don't get it. Why are you talking about txt files when you are clearly using binary data? It makes no sense. A binary file is not a txt file with a series of 0s and 1s, it's not even close. You cannot replace bits by using a string replace function.
 
whale_cancer

Guest
> I don't get it. Why are you talking about txt files when you are clearly using binary data? It makes no sense. A binary file is not a txt file with a series of 0s and 1s, it's not even close. You cannot replace bits by using a string replace function.
I am literally converting data from a bin into txt files as I indicate in my post. Not sure how that doesn't make sense.

I do this because the data I get out requires me to perform a comparison I find much easier to do with strings.

Edit: Put another way, I have .txt files that contain binary data I intend to later convert into actual .bin files for re-insertion into a rom.

My .txt files have 8 bytes on each line, with two lines being paired to produce one tile. This is how the NES stores tile data.
 
Homunculus

Guest
Ok, I see what you are doing. I'm not sure you are using file_text_readln correctly though. I generally use file_text_read_string(file); followed by file_text_readln(file); to read lines. I can't remember whether this is enforced or not, but it's worth checking out.

EDIT: If I remember correctly, file_text_readln also returns the cr / lf of the line. Try using the method described above and see if it works
 
whale_cancer

Guest
> Ok, I see what you are doing. I'm not sure you are using file_text_readln correctly though. I generally use file_text_read_string(file); followed by file_text_readln(file); to read lines. I can't remember whether this is enforced or not, but it's worth checking out.
>
> EDIT: If I remember correctly, file_text_readln also returns the cr / lf of the line. Try using the method described above and see if it works
I made the change you suggested and I get the same result. That false positive is actually not false; I wasn't accounting for the fact I was parsing two lines at a time. I can find the very first two lines in spr_marioStrip.txt in bin.txt.

Edit: There should be multiple matches per pair of lines/8-byte values. I know this from manually editing the tiles in the very recent past in preparation for writing this utility.

Edit2: I can also find the next pair of lines from spr_marioStrip.txt manually in bin.txt (they are only a few lines down, as expected)

Code:
-----
Match found at line 10689!
Mario
0000001100001111000111110000101000111010010000000100111100111111
0000001100001100000100000000111100111111011111110111111100110000
bin
0000001100001111000111110000101000111010010000000100111100111111
0000001100001100000100000000111100111111011111110111111100110000
-----
Here is the updated code:

Code:
/// @description Zultimate

//replace mario with wario

mario_file = file_text_open_read("spr_marioStrip.txt")
wario_file = file_text_open_read("spr_warioStrip.txt")
bin_file = file_text_open_read("bin.txt")
binNew_file = file_text_open_write("binNew.txt")

for (var j = 0; j < 64; j += 1)
{
    mario_binRowOne = file_text_read_string(mario_file);
    file_text_readln(mario_file);
    mario_binRowTwo = file_text_read_string(mario_file);
    file_text_readln(mario_file);
    show_debug_message("Searching bin_file for... ")
    show_debug_message(mario_binRowOne)
    show_debug_message(mario_binRowTwo)

    wario_binRowOne = file_text_read_string(wario_file);
    file_text_readln(wario_file);
    wario_binRowTwo = file_text_read_string(wario_file);
    file_text_readln(wario_file);
 
    show_debug_message("Searching through strip files at line "+string((j*2)+1));
    for (var i = 0; i < 8192; i += 1)
    {
        bin_binRowOne = file_text_read_string(bin_file);
        file_text_readln(bin_file);
        bin_binRowTwo = file_text_read_string(bin_file);
        file_text_readln(bin_file);

        if bin_binRowOne == mario_binRowOne
        && bin_binRowTwo == mario_binRowTwo
        {
            show_debug_message("-----")
            show_debug_message("Match found at line "+string((i*2)+1)+"!")
            show_debug_message("Mario")
            show_debug_message(mario_binRowOne)
            show_debug_message(mario_binRowTwo)
            show_debug_message("bin")
            show_debug_message(bin_binRowOne)
            show_debug_message(bin_binRowTwo)
            show_debug_message("-----")
            bin_binRowOne = wario_binRowOne;
            bin_binRowTwo = wario_binRowTwo;
        }
        file_text_write_string(binNew_file, bin_binRowOne)
        file_text_writeln(binNew_file)
        file_text_write_string(binNew_file, bin_binRowTwo)
        file_text_writeln(binNew_file)
    }
}

file_text_close(mario_file)
file_text_close(wario_file)
file_text_close(bin_file)
file_text_close(binNew_file)
show_message("Done!");
 
Homunculus

Guest
So does it work now? The problem I can see with this approach is that you are not considering the fact that the same binary sequence of your sprite may be present multiple times in the data, representing something that's not related at all to the sprite itself. Unless of course you are already reading only the parts of the rom containing images.

That's also why I was questioning the approach in my first message. I don't know the structure of a nes rom file, but there has to be some kind of pattern or index that lets you identify exactly where the sprite data is stored in the raw data.

Also, isn't it faster and cleaner to just compare the binary data instead of converting everything to txt? I agree that it's easier to debug this way though.
 
whale_cancer

Guest
> The problem I can see with this approach is that you are not considering the fact that the same binary sequence of your sprite may be present multiple times in the data representing something that's not related at all to the sprite itself. Unless of course you are just reading the parts of the rom containing images.

I am only reading the tile data. As the pairs of lines each represent an entire tile, I don't think it could possibly match with the game data even if I included it (since such data just looks like static when viewed with a tile viewer). Multiple replacements per tile are expected, at least in the smb3 ROM.

> That's also why I was questioning the approach in my first message. I don't know the structure of a nes rom file, but there has to be some kind of pattern or index that lets you identify exactly where the sprite data is stored in the raw data.

I am comfortable with the format and, as I indicated, I can read the data from the .NES file, turn it into my format, and then read it from my format, producing the exact same image as a sprite. That sprite just looks like a dump of all the tile data in the ROM (since that is what it is).

> Also, isn't it faster and cleaner to just compare the binary data instead of converting everything to txt? I agree that it's easier to debug this way though.

I am actually not super comfortable working with binary data in GM, but it probably would be for someone experienced in that area. My main concern was that I need to compare the first bit of the 8-byte sequence to the first bit of the second 8-byte sequence and then iterate through the entire thing making these comparisons (this is how color data is stored in the .NES format). I knew how to do this easily with strings, but not with the byte data directly.

> So does it work now?

It works exactly the same as before (save for the additional debugging output I added). It can find the very first pair of lines from spr_marioStrip.txt in bin.txt, but after that it craps out and finds 0 additional matches. I am able to find additional matches by looking at the .txt files directly, so I don't think there is any problem with the datasets (especially as I can convert both the original bin data and my reformatted .txt data into the same images).

Edit: Dump of my debugging log: https://pastebin.com/CLPg95U5
 
Homunculus

Guest
Is it possible for the first row of the sprite you are looking for to start at an even line number? Because you read two lines at a time, but what if line 2 is actually the start of the sprite binary sequence? I don't think this case will be considered a match in your setup
 
> I am actually not super comfortable working with binary data in GM, but it probably would be for someone experienced in that area. My main concern was I need to compare the first bit of the 8 byte sequence to the first bit of the second 8 byte sequence and then iterate through the entire thing making these comparisons (this is how color data is stored in the .NES format). I knew how to do this easily with strings, but not with the byte data directly.
I really, really recommend learning how binary arithmetic works before doing this. What you're trying to do is far simpler when you don't introduce multiple failure points via conversion to and from strings.
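
For anyone wondering what the byte-level version of that bit comparison might look like, here is a minimal sketch. It assumes the usual NES tile layout (8 bytes for the low bit plane followed by 8 bytes for the high bit plane, one byte per pixel row, leftmost pixel in the most significant bit). The byte values are the first tile quoted earlier in the thread, converted to decimal; the variable names are made up for illustration.

```gml
// First tile from the thread: first .txt line = plane 0, second line = plane 1.
var plane0_bytes = [3, 15, 31, 10, 58, 64, 79, 63];
var plane1_bytes = [3, 12, 16, 15, 63, 127, 127, 48];

// Pack the 16 bytes the way the tile would sit in the ROM.
var tile = buffer_create(16, buffer_fixed, 1);
for (var p = 0; p < 8; p += 1) buffer_write(tile, buffer_u8, plane0_bytes[p]);
for (var q = 0; q < 8; q += 1) buffer_write(tile, buffer_u8, plane1_bytes[q]);

// Decode each pixel row into palette indices 0..3.
for (var row = 0; row < 8; row += 1)
{
    var low_byte  = buffer_peek(tile, row, buffer_u8);     // plane 0, this row
    var high_byte = buffer_peek(tile, row + 8, buffer_u8); // plane 1, this row
    var row_text = "";
    for (var col = 0; col < 8; col += 1)
    {
        var shift = 7 - col;                 // leftmost pixel is bit 7
        var lo = (low_byte >> shift) & 1;
        var hi = (high_byte >> shift) & 1;
        row_text += string((hi << 1) | lo);  // combine the two planes
    }
    show_debug_message(row_text);
}

buffer_delete(tile);
```

With a real ROM you would presumably load the file with buffer_load and add the tile's byte offset to both buffer_peek positions; locating that offset is outside the scope of this sketch.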
 
whale_cancer

Guest
> Is it possible for the first row of the sprite you are looking for to start at an even line number? Because you read two lines at a time, but what if line 2 is actually the start of the sprite binary sequence? I don't think this case will be considered a match in your setup
I considered this when manually checking for matches. Also, the .NES format would break if this were the case.

That being said, I am now trying a very slow method where I convert the text files to arrays and then compare every pair of lines rather than just those that should be tiles (e.g. try to find lines 1 & 2 in bin.txt, try to find lines 2 & 3 in bin.txt, try to find lines 3 & 4 in bin.txt).

Edit: Woah. Two matches. Still chugging along... The second match was at line 3 & 4, which suggests something was getting loaded incorrectly at some point in my original code? As in, the line numbers weren't corresponding to what my program thought the line numbers were?

> I really, really recommend learning how binary arithmetic works before doing this. What you're trying to do is far simpler when you don't introduce multiple failure points via conversion to and from strings.
I am pretty confident there is no failure, given I can convert both the binary data from the ROM itself and my own txt format into the same graphical output.
 
Homunculus

Guest
Can you tell us what those 64 and 8192 represent? I assume you are iterating over sprites in the mario strip (for a total of 64*2 lines) and looking for a match for each of those in the strip data of the nes file, which is 8192*2 lines long.

If that’s the case, you are supposed to iterate over the strip data only once, since at the end of iteration j == 0 you are already at the end of the file.
 
whale_cancer

Guest
> Can you tell us what those 64 and 8192 represent? I assume you are iterating over sprites in the mario strip (for a total of 64*2 lines) and looking for a match for each of those in the strip data of the nes file, which is 8192*2 lines long.
>
> If that’s the case, you are supposed to iterate over the strip data only once, since at the end of iteration j == 0 you are already at the end of the file.
Your assumptions are correct.

I switched over to loading everything into arrays and comparing array indices, and that works. This is quite slow, however.

Edit: Whoops, I just noticed I didn't switch a 64 into a 128... 127? There goes all the time it spent running...
Code:
/// @description Zultimate

//replace mario with wario

mario_file = file_text_open_read("spr_marioStrip.txt")
wario_file = file_text_open_read("spr_warioStrip.txt")
bin_file = file_text_open_read("bin.txt")
binNew_file = file_text_open_write("binNew.txt")

//go through each line of the .txt that contains those tiles we want to replace (in this case spr_marioStrip.txt)
//actual file is 128 lines long, but we are going by line pairs
for (var b = 1; b < 128; b += 1)
{
    mario_binRow[b] = file_text_read_string(mario_file);
    file_text_readln(mario_file);
    wario_binRow[b] = file_text_read_string(wario_file);
    file_text_readln(wario_file);
}
for (var a = 1; a < 16384; a += 1)
{
    bin_row[a] = file_text_read_string(bin_file);
    file_text_readln(bin_file);
}
for (var j = 1; j < 63; j += 2)
{
    show_debug_message("########################")
    show_debug_message("Searching through strip files at line "+string(j)+" and "+string(j+1));  
    show_debug_message("Searching bin_file for... ")
    show_debug_message(mario_binRow[j])
    show_debug_message(mario_binRow[j+1])

   
    //go through all 16384 lines and see if we find the pair we are looking for
    //i is only 8192 which is half of 16384 since we are looking at pairs of lines

   
    for (var i = 1; i < 16383; i += 2)
    {
        if (bin_row[i] == mario_binRow[j])
        && (bin_row[i+1] == mario_binRow[j+1])
        {
            show_debug_message("-----")
            show_debug_message("Match found at line "+string(i)+"!")
            show_debug_message("Mario")
            show_debug_message(mario_binRow[j])
            show_debug_message(mario_binRow[j+1])
            show_debug_message("bin")
            show_debug_message(bin_row[i])
            show_debug_message(bin_row[i+1])
            show_debug_message("-----")
            bin_row[i] = wario_binRow[j];
            bin_row[i+1] = wario_binRow[j+1];
        }
        file_text_write_string(binNew_file, bin_row[i])
        file_text_writeln(binNew_file)
        file_text_write_string(binNew_file, bin_row[i+1])
        file_text_writeln(binNew_file)
    }
    show_debug_message("########################")
}

file_text_close(mario_file)
file_text_close(wario_file)
file_text_close(bin_file)
file_text_close(binNew_file)
show_message("Done!");
So slow, I haven't even gone through it once yet. Here is my debug log as of this moment... https://pastebin.com/hC8TB1KD
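
Part of the slowness in the code above likely comes from writing all of binNew.txt inside the j loop, so the output is written once per strip pair rather than once overall; the j < 63 bound also covers only the first 62 strip lines. One possible single-pass sketch is below. It keeps the strip lines in arrays, streams bin.txt exactly once, and writes each pair once. It assumes the file names from the thread, an even number of lines in every file, and no trailing blank lines; it is an illustration, not a drop-in replacement.

```gml
// Load both strips into arrays, line by line (index 0 = first line).
var mario_file = file_text_open_read("spr_marioStrip.txt");
var wario_file = file_text_open_read("spr_warioStrip.txt");
var mario_rows = [];
var wario_rows = [];
var strip_count = 0;
while (!file_text_eof(mario_file) && !file_text_eof(wario_file))
{
    mario_rows[strip_count] = file_text_read_string(mario_file);
    file_text_readln(mario_file);
    wario_rows[strip_count] = file_text_read_string(wario_file);
    file_text_readln(wario_file);
    strip_count += 1;
}
file_text_close(mario_file);
file_text_close(wario_file);

// Stream bin.txt once; each pair of lines is one tile.
var bin_file = file_text_open_read("bin.txt");
var binNew_file = file_text_open_write("binNew.txt");
while (!file_text_eof(bin_file))
{
    var row_one = file_text_read_string(bin_file);
    file_text_readln(bin_file);
    var row_two = file_text_read_string(bin_file);
    file_text_readln(bin_file);

    // Compare this tile against every mario tile; swap on the first match.
    for (var j = 0; j + 1 < strip_count; j += 2)
    {
        if (row_one == mario_rows[j] && row_two == mario_rows[j + 1])
        {
            row_one = wario_rows[j];
            row_two = wario_rows[j + 1];
            break;
        }
    }

    // Write the (possibly replaced) tile exactly once.
    file_text_write_string(binNew_file, row_one);
    file_text_writeln(binNew_file);
    file_text_write_string(binNew_file, row_two);
    file_text_writeln(binNew_file);
}
file_text_close(bin_file);
file_text_close(binNew_file);
```

Because each bin tile is compared against the original mario lines and then written straight out, a replaced tile can never be matched and replaced a second time by a later strip pair.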
 
Homunculus

Guest
As said before, if the file is 16384 lines long, and you are supposed to go through the file every iteration of j, you are doing something wrong, since after reading the data the first time, you never “rewind” it. I didn’t look at the array solution though.
 
whale_cancer

Guest
> As said before, if the file is 16384 lines long, and you are supposed to go through the file every iteration of j, you are doing something wrong, since after reading the data the first time, you never “rewind” it. I didn’t look at the array solution though.
It "rewinds" because it is in a for loop that looks at each pair of lines of spr_marioStrip.txt (i.e. controlled by "j")
 
Homunculus

Guest
Not talking about the “j” loop, but the “i” loop. At the first iteration of “j”, you read ALL the data in the “i” loop. Next “j” iteration, there is no more data in “i” to read. You are already at the end. Resetting “i” has no effect on the file.
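
A small sketch that makes the file-position behaviour visible, assuming bin.txt from the thread is present. As far as I know, GML's text-file functions have no rewind call, so the sketch closes and reopens the file to start again from the first line.

```gml
var f = file_text_open_read("bin.txt");

// First pass: consume every line until the end of the file.
var first_pass = 0;
while (!file_text_eof(f))
{
    file_text_readln(f);
    first_pass += 1;
}

// Second pass on the same handle: the position is still at the end,
// so this loop body never runs and second_pass stays 0.
var second_pass = 0;
while (!file_text_eof(f))
{
    file_text_readln(f);
    second_pass += 1;
}

show_debug_message("First pass read " + string(first_pass) + " lines");
show_debug_message("Second pass read " + string(second_pass) + " lines");

// To read from the top again, close the handle and open the file anew.
file_text_close(f);
f = file_text_open_read("bin.txt");
// ... read again here ...
file_text_close(f);
```

This is the same situation as the inner "i" loop in the original code: resetting the loop counter does nothing to the file handle, so every "j" iteration after the first has nothing left to read.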
 
whale_cancer

Guest
> Not talking about the “j” loop, but the “i” loop. At the first iteration of “j”, you read ALL the data in the “i” loop. Next “j” iteration, there is no more data in “i” to read. You are already at the end. Resetting “i” has no effect on the file.
OHHHH, I see what you mean! So it was just me doing something dumb with file handling. Makes sense why the array solution works, since I don't need to go back to the beginning of the file at all.

Argghh, thanks!
 