Android Efficient ds_list searching & getting

Mert

Member
Hi. I have a ds_list that contains 680.000 words. I alphabetically listed them (from A > Z)
I also have a list that contains the words used before. Whenever I get a word from the main list, I add this word's position to my second list so that I know that this word was used before.

What I want to do is;
  1. Get a word that starts with the letter "B".
  2. Check if I used this word before.
  3. Confirm that the word can be used, otherwise get back to the loop where I left off.
Currently, I wrote this piece of code.
Code:
//wordList & usedWords are ds_lists

var list_length = ds_list_size(wordList);

for (var i = 0; i < list_length; ++i) {
    var l = wordList[| i];
 
    if (ds_list_find_index(usedWords,i)==-1) //This word was never used before.
    {
    if (string_char_at(l,1)=="B") //Check if it starts with B
    {
        ///Use the Word.
        ///Do the Operation
       ds_list_add(usedWords,i);
       return l;
    }
    }
}
I'm worried about the performance because I'll be serving the game on Mobile & Web platforms. I wonder if you have better solutions for this ?
 

TheouAegis

Member
Well, firstly I'd index your list. Specify the range for all words starting with A, the range of all words with B, etc. That will narrow your search/random down a bit.
 

GMWolf

aka fel666
I agree with @TheouAegis , indexing will greatly speedup your search.
Why not even have one list per letter?

Then, of you use a word, remove it from the list.
If you have one list per letter, you can efficiently remove an element by swapping that element and the last element in the array, and the shrink the array by one.
This reduces the amount of copying.
 

NightFrost

Member
Since your list is in alphabetic order, you can binary search. Start in the middle of the list (320 000). If the found word is earlier in order, divide the step value by two (into 160 000) and add that to position; if the found word was later in order, subtract from position instead. Keep halving and stepping until you find the word. You can combine this with indexing since you'll know the size of each block: if there are 40 000 words starting with 'A' and 30 000 starting with 'B', when you search for 'Banana' you start at 40 000 + 30 000 / 2 = 55 000 and keep halving the 15 000.

Edit: I've never tested, so I don't know if GML can do alphabetic order comparisons simply by doing if(String1 < String2).
 

TheouAegis

Member
Actually, he doesn't need to search the list at all. All he needs to know is if the word picked at random has already been used before. In which case, all he needs to do is just pick a random number from 0 to ds_list_size()-1. If that number is in his list of already used numbers, then he picks a different number. Then he just fetches the word at that chosen index.
 

Mert

Member
Well, firstly I'd index your list. Specify the range for all words starting with A, the range of all words with B, etc. That will narrow your search/random down a bit.
Good call. I'll try that


I agree with @TheouAegis , indexing will greatly speedup your search.
Why not even have one list per letter?

Then, of you use a word, remove it from the list.
If you have one list per letter, you can efficiently remove an element by swapping that element and the last element in the array, and the shrink the array by one.
This reduces the amount of copying.
Hi. I will also have other lists for other languages. It will make things really messier to make A>Z lists for all languages(And some of them have non-latin letters and weird latin ones. German Ü for example). @TheouAegis 's suggestions seem more appealing. Also, the main list must be something like "read-only" as I'll have to clear my second list once the game ends(So that user can use the same words for another round)

Is there any other option for the main list other than using ds_list ? Maybe using buffers would be faster ?

Since your list is in alphabetic order, you can binary search. Start in the middle of the list (320 000). If the found word is earlier in order, divide the step value by two (into 160 000) and add that to position; if the found word was later in order, subtract from position instead. Keep halving and stepping until you find the word. You can combine this with indexing since you'll know the size of each block: if there are 40 000 words starting with 'A' and 30 000 starting with 'B', when you search for 'Banana' you start at 40 000 + 30 000 / 2 = 55 000 and keep halving the 15 000.

Edit: I've never tested, so I don't know if GML can do alphabetic order comparisons simply by doing if(String1 < String2).
Perhaps, noting the order by numbers would be a better choice(The last letter that ends with A is 34000th one etc.). I'll try this too.

Actually, he doesn't need to search the list at all. All he needs to know is if the word picked at random has already been used before. In which case, all he needs to do is just pick a random number from 0 to ds_list_size()-1. If that number is in his list of already used numbers, then he picks a different number. Then he just fetches the word at that chosen index.
But picking the word with the starting letter is important too, that's why I completely skipped the operation in the loop with
if (ds_list_find_index(usedWords,i)==-1) //This word was never used before.
 

GMWolf

aka fel666
Hi. I will also have other lists for other languages. It will make things really messier to make A>Z lists for all languages
How so? just wrap your data structures with scripts to access them neatly.
Also, the main list must be something like "read-only" as I'll have to clear my second list once the game ends(So that user can use the same words for another round)
Just clear and repopulate your lists when you reset.
Is there any other option for the main list other than using ds_list ? Maybe using buffers would be faster ?
Lists or arrays are probably the fastest options. Buffers have a lot of overhead in GM.
 

Ido-f

Member
Also, the main list must be something like "read-only" as I'll have to clear my second list once the game ends(So that user can use the same words for another round)
You can have a main read-only list and a list that starts as a copy of that main list and has each word used deleted from it.
That way you can use the updated list to easily pick unused words and still keep that complete list.
 

GMWolf

aka fel666
One thing to remember is that having duplicate data is not necearily expensive. What is important is to keep 'Hot' memory (that is memory you access a lot) as small as possible. You can have as much cold data as you like as long as you dont touch it too often.

Ho wmuch this applies to GM? well, in GM its hard to make things use up little memory, and memory management is tricky, but things like lists and buffers are generally a good way to keep your structures efficient. (Avoid maps like the plague).

But before you do all that, consider: Is efficiency important, or is maintainability and expandibility more important?
If this code runs every frame, then make if efficient. If the code only happens during an interaction, for example, then you can get away with less efficient code.
 
Last edited:
Top