1. Hello Guest! It's with a heavy heart that we must announce the removal of the Legacy GMC Archive. If you wish to save anything from it, now's the time! Please see this topic for more information.
    Dismiss Notice

ds_map_exists() question

Discussion in 'Programming' started by csanyk, Dec 19, 2016.

Tags:
  1. Mike

    Mike nobody important GMC Elder

    Joined:
    Apr 12, 2016
    Posts:
    2,412
    Try yourself.... create an array.... do a loop, look up a load of the, Now use a ds_map, look up a load of them.
    ds_maps are MUCH slower than simple arrays because it has to do LOADS of work to find each element, where arrays have the index at once.

    Now if you start doing this, then things like grids and lists where you do sit in loops and process stuff will be adversely affected. I use grids a lot not just for maps etc but for data structures as they are slightly faster than arrays. if this changes, all this code will slow down hugely. My C64 emulator which grids it massively would be a third of the speed. every lookup hit by this - why? Because you can't clear a variable?

    On the subject of this.... If your passing around list handles and don't check that it's free'd then your still asking for bugs, unless you're going to check it exists every time you use it of course, which is just a waste of CPU time.

    And this isn't about being stuck in the past, it's about not wasting CPU time on stupid things that will cripple performance in a lot of cases. CPU performance with data structures is the ONLY reason I don't want to change them, it's ALL about performance and how much it would hurt.

    What a load of tosh. I've hired a lot of them, most of the ones we interview barely know how many bits are in an INT, or even what a "bit" is - I kid you not. There's also a reason I've been in this industry for 27 years, so try not be quite so insulting.


    Actually...... it IS technically possible that we could (now) turn handles into 64bit numbers (rather than a 32bit INT), as the runner now has proper support for them (I think), and we could store the index in the lower 32bits, and a unique number in the upper 32bits. Then it would just be an AND to get the index back. I'm not sure INT64's are complete throughout the VM yet, but if they are, this might work. let me file this as a suggestion, and in the new year we'll discuss it, because it would solve the problem without crippling everything - but it is totally dependant on the VM internals.
     
    csanyk, renex and fxokz like this.
  2. icuurd12b42

    icuurd12b42 TMC Founder GMC Elder

    Joined:
    Apr 22, 2016
    Posts:
    1,840
    I'm going to agree with mike on the "a new graduated engineer can think out the box and improve things..." quote...

    It is my experience as well that new graduates don't have enough experience to think outside the box. they think they do because they have not seen the walls of their own box yet... These new concepts your learned in college, it's old fogies like us who invented those.
     
    Last edited: Dec 22, 2016
  3. HammerOn

    HammerOn Member

    Joined:
    Jun 22, 2016
    Posts:
    92
    It makes perfect sense.
    We allocate memory, use it, delete and set the variable to a null value. This is common practice (actually a need) even in dynamic typed languages or when the "delete" piece is automatic handled by a garbage collector.
    The only cases where you don't set it to a null value is when the allocated memory lasts the lifetime of the object that holds it or when you immediately set other handle to it.
    Also, we can't query the actual value of the handles in most languages and there is a chance that a new allocation would be done on top of a freed one so it makes sense that a VM can reuse them.
    If all your code follow this concept, ds_exists is in fact not necessary since you can do "ds == undefined" which is how you would do in other languages.

    Now that we have an undefined value, the manual could add this to the example codes and helps make it a common practice in GML too.
    Ds handles could be special types like arrays to avoid further misconceptions. But I feel like GML development is too much restricted by "backward compatibility" and C-like handles for this...
     
    Last edited: Dec 22, 2016
  4. GMWolf

    GMWolf aka fel666

    Joined:
    Jun 21, 2016
    Posts:
    3,470
    In fact, according to my tests, using a hashmap with quadratic probing gives very good performance with unique identifiers. Yes, not quite as fast as using a simple table, but the advantages of using unique IDs far outweigh it IMO.

    It can even lead to better performance, if you use nested data structures. When usung a list of lists, if you delete a list, as it stands, you have to iterate through the whole array to set the entries to undefined if the equal the list being destroyed. That becomes O(N) performance as opposed to O(1) performance if using unique IDs with a hashmap.
     
  5. GMWolf

    GMWolf aka fel666

    Joined:
    Jun 21, 2016
    Posts:
    3,470
    Well, yeah. Number of bits in an INT is system dependant....
    But if they can't tell you that, you probably haven't been looking at graduates from respectable universities ^^
     
  6. Timothy

    Timothy Member

    Joined:
    Aug 7, 2016
    Posts:
    64
    Lol, you can learn more online than you ever can at a school... Any school. Some say you can't trust online sources (which is true for some) but truthfully you can't trust instructors either. All voice opinion over fact.

    As for the topic. I recently had a bug because of this and it stumped for a while but lesson was learned. Though it would be nice to have unique identifiers, I don't think it's completely necessary if it isn't able to be implemented in GMS current state. (I would much rather see simple c struct like data containers, I know, off topic)
     
  7. Llama_Code

    Llama_Code Member

    Joined:
    Jun 29, 2016
    Posts:
    267
    Actually, I have seem graduates, even from respectful universities, that are taught less than I was taught 20 years ago, simply because they take for granted what they have available. Things like bit shifting are lost on these people because they are taught to just do the arithmetic since computers have enough power now so it doesent matter.

    In most cases, that probably true, but I think they should still know this stuff because they are likely to encounter it (which is how we found our 4 month out of school new hire didn't understand it) and in some cases it can still be needed depending on where you code is being used.

    You can bet they think they know it all, but most are taught in very controlled circumstances and the code they are working with is the same code everyone else worked with, and there is very little problem solving. They don't hit the same walls a real work developer does, and they don't have to work within the same limitations of old school developers. Imagine making your platformer and trying to cram it in to 384 kB. Most of them would look at you like your absolutely insane.

    My Nephew recenty graduated from a very accredited school, and was bored most of the way through because he started coding at 8, and had self taught himself most things and made several programs before he was 15. They were teaching him very little he had not already learned himself, but still landed a job and hit some trouble because he was faced with a problem outside the realm of what even he had dealt with.

    The only real way to get seasoned is to do it in real world situations, and solve the problems that slap you in your face you never knew could exist until you hit them, because they won't teach you that in school.

    I will give the YoYo guys credit for one thing, no way could I take on a code base like GameMaker, that I am sure was all patched up and hacked to hell from Mark's late night coding sessions, and make what they did out of it.
     
  8. GMWolf

    GMWolf aka fel666

    Joined:
    Jun 21, 2016
    Posts:
    3,470
    I dont know what universities you are talking about. I am currently at Newcastle University. Not top ranked, but still respectable IMO. And we learn all of that. Sure, some students dont qute get it, but its still in our curriculum...

    [edit] also, about using linear/quadratic probing, it does devolve a little as time goes on unless you rehash. doing more tests
     
    Last edited: Dec 22, 2016
  9. Mike

    Mike nobody important GMC Elder

    Joined:
    Apr 12, 2016
    Posts:
    2,412
    Well...we were just talking about windows, and "normal" word sizes, so you'd think they'd have a stab at it - 32bit machines (at the time) and all....

    Actually.... we've interviewed loads that don't even know how negative integers are stored - this is across the board, not just from one place. They just don't teach the low levels any more, it's horrible. It's also due to the fact many go in looking for a 9 to 5 job so don't play at home with ideas/demos etc. You lose so much when you're not passionate about your work and don't try things at home.

    Aside from a pointer, you'll never beat a straight index. Hashing code is at least 3 or 4 times longer (if you're lucky), not to mention the all the extra memory it touches. It is dependant on how much you use them, but certain things - like my emulator, would suffer massively because all the emulated memory is held in a ds_grid, so that would cripple it I suspect. For "casual" use, you probably wouldn't notice, as the slight differences in a single lookup would be swallowed by the "app", but heavy usage would suck.
     
  10. rui.rosario

    rui.rosario Guest

    My University is not taken in a very high regard and even there they taught all those items (two's complement, bit shifting, bits in general xD, etc).

    Maybe it's just the sample your job offers' attracted that are just.... "flawed" (can't think of a better word)
     
    GMWolf likes this.
  11. GMWolf

    GMWolf aka fel666

    Joined:
    Jun 21, 2016
    Posts:
    3,470
    Again, what universities? This was in our first years of study...
    Though i must admit, I cant quite remember exactly what systems uses what, with 2's complements and all...

    Yeah, My tests indicate a 2x loss in performance. Though im sure this can be improved greatly: The test was written in java, using tons of objects on the hashing side of things. Im sure if i where to rewrite it in C to use promitive types, the difference would drop down to about 1.5x
    Its true that its not great when simply accessing a data structure, but there are scenarios where you loose a ton of performance, such as nested data structures. having to itterate through data structures to set values to undefined seems like a massive loss in time.

    Also, I would like to point out that if GM used data structure types, then would we not get a massive gain in performance?
     
  12. Alice

    Alice Toolmaker of Bucuresti Forum Staff Moderator

    Joined:
    Jun 20, 2016
    Posts:
    792
    Wait, wait, let me get this straight.

    The problem here is that right now data structures are stored in re-usable number indices, which makes for various oddities and such, right?

    ...isn't it something that will be fixed altogether when a proper type system finally gets implemented, as Russell outlined here? ^^'

    I dunno, I just feel like there's a somewhat large discussion going on, over something that will eventually become outdated anyway...
     
    csanyk and GMWolf like this.
  13. Mike

    Mike nobody important GMC Elder

    Joined:
    Apr 12, 2016
    Posts:
    2,412
    tut-tut..... binary and how data is stored should be something that's always on instant recall :D

    It might do depending on how it was all put together, but again.... this would be a long way off - if ever. It's still very much a "wish" at the moment. We've been wanting it here for almost as long as we've been working here.... that and proper structures :)

    This might go away over time, but that time could be a looooong time. If we can do a short term fix that keeps performance, and gives mostly what folk are after, then i have no problems with it.
     
  14. GMWolf

    GMWolf aka fel666

    Joined:
    Jun 21, 2016
    Posts:
    3,470
    Honestly I don't understand how this isnt priority number one right now!
    Structs would help GM so much! I can't believe how much time I loose using arrays and enums to emulate them!
     
  15. Mike

    Mike nobody important GMC Elder

    Joined:
    Apr 12, 2016
    Posts:
    2,412
    yes..... but adding it is a massive task, and there's always been even more important things on the plate.
     
  16. csanyk

    csanyk Member

    Joined:
    Jun 20, 2016
    Posts:
    821
    Well that's interesting! And promising... ish... but @rwkay said it's in the indefinite future, so who knows when or if it'll really happen?

    I wouldn't want to waste time on a solution to a problem for which an official fix will be forthcoming. But it'd be good to have some idea of when that might be.
     
  17. csanyk

    csanyk Member

    Joined:
    Jun 20, 2016
    Posts:
    821
    I'll buy this... I have no counter-argument against it, and no reason to believe it isn't true.

    That said, WHY *recycle* the index, using the first available index that is not currently assigned? That's what seems crazy to me.

    It would be so much better if you just auto-incremented the index counter every time a ds_*_create() returned an index. You could still have lightning fast access to the index, and the problem code that we've been talking with would not be a problem any longer:

    Code:
    a = ds_stack_create(); //returns index 0; index_counter++
    b = a; copies 0 to b
    ds_stack_destroy(a); releases index 0 
    c = ds_stack_create(); returns index 1
    
    ds_exists(a, ds_type_stack); //false; 0 is still a destroyed index; 0 was not reused.
    ds_exists(b, ds_type_stack); //false; as above.
    ds_exists(c, ds_type_stack); //true
    True, you might have to recycle the indexes *eventually* after the game runs a really long time, but if you use int64 for the index counter, that's going to happen sometime after the sun turns into a red giant...
     
  18. GMWolf

    GMWolf aka fel666

    Joined:
    Jun 21, 2016
    Posts:
    3,470
    That would make for a huge lookup table!
    Its far better to use an array, paired with a queue.
    The array would map the index to the Ds, whilst the queue would hold the index of indices to recycle. This leads to O(1) time.
    Doing it your way would require searching for a free space, resulting in o(n) time....
     
  19. Alice

    Alice Toolmaker of Bucuresti Forum Staff Moderator

    Joined:
    Jun 20, 2016
    Posts:
    792
    @csanyk: I think the reason why indices should be re-used is so that the collection of structures doesn't grow too large. Imagine creating and destroying ~100 lists every step, or something. After 10 seconds on 50 FPS, you have a list with 50k entries, out of which only 100 refer to the valid structures at a time. What's worse, the more structures you create, the larger memory you use to keep that overly long list of non-reusable indices.

    ----------

    Of course, that could be avoided by accessing structures via a hash map instead of list, so you wouldn't have list with 100 valid indices and 49900 effectively useless indices, but instead a map with 100 entries. However, I think what Mike is saying that such a hash map performs significantly poorer than a list, and in some performance-critical applications using structures to great extent it would cause a big issue.

    At the same time, I see fel's point about hash map being useful in many cases, and if the map was common for all data structures, it means we would be able to tell structures apart from each other (still no ability to tell integers apart from structures, though). Perhaps that could be implemented as some sort of option (i.e. developer could choose whether a game should use high-performing structures list or strongly-typed structures map; kind of like people could choose whether to short-circuit or not) but it might be too much effort to bother.

    A little quality of life upgrade I could suggest: make ds_*_destroy(...) functions return undefined. That way instead of something like:
    Code:
    a = ds_list_create();
    ...
    ds_list_destroy(a);
    a = undefined;
    it could be instead:
    Code:
    a = ds_list_create();
    ...
    a = ds_list_destroy(a);
    It doesn't seem like a massive improvement (and it doesn't fix some other issues coming with clashing indices, like arbitrary JSON structures), but it does help encouraging the good practices at least a little (and aside from legacy reasons, I don't see why it should return 0 as it does now, anyway).
     
  20. csanyk

    csanyk Member

    Joined:
    Jun 20, 2016
    Posts:
    821
    No, that's basically what I had in mind. I just didn't speak to the precise implemenation details.

    I'm not trying to propose a fully engineered idea; I'm just stating what I feel are ideal requirements. I have no idea how the indexes for data structures are stored/tracked internally within the runtime. I just don't see how it slows the engine down to have to look up data structure zero vs. data structure one thousand, regardless of how many data structures currently exist. Accessing a particular data structure by index should always take the same amount of time, given a proper implementation (whatever that is.)

    Maybe I don't fully understand some key points, but as far as I can see, it shouldn't take the CPU any longer to address byte 0 in RAM than byte 2^64-1 -- if you provide it the address, it retrieves it. By similar logic, I don't see why it should slow down the GM engine to access a data structure who's got a GUID rather than a data structure who always has the smallest available ID at the time it was created.
     
  21. GMWolf

    GMWolf aka fel666

    Joined:
    Jun 21, 2016
    Posts:
    3,470
    The problem here is with memory usage...
    If each Id is a 32 bit integer, and you allow ids to range fro 0 to INT_MAX, then you are using all available memory just to have a table of indices! That's simply not a viable solution.

    If you use a simple table, then you must reuse indices.
    If you want to have unique ID's, then you cannot effectively use a simple table.

    Unless you use a far more complex data structure (perhaps a heap or something would do), then those two statements hold true.

    Actually, would a heap not do? Isnt that how OO languages handle their object identifiers?
     
  22. icuurd12b42

    icuurd12b42 TMC Founder GMC Elder

    Joined:
    Apr 22, 2016
    Posts:
    1,840
    It's always a question of balance between memory size and access speed (defined by the method)

    you could have a 32k array each element holding a list and every time you create a list the index climbs. but what happens when you reach the last number, increase the array size to 64 k? how long until you run out? fall back to 0 and then proceed on looping to find the entry that's free? that's basically what's going on now I guess. but in reverse. is there a free spot in the allocated array? no? resize the array and put the new ds in there

    a hash table for lookup.. (ds_map) whatever is the faster method that takes less memory
     
  23. csanyk

    csanyk Member

    Joined:
    Jun 20, 2016
    Posts:
    821
    I don't get it. I agree with you that what you just said is basically what I have in mind, and now you're saying there's a problem with it.

    I don't get the skepticism. GUID is a thing. There have been implementations. It is a solved problem.

    I don't see much point in continuing to discuss, for me personally, until I actually get to a point where I'm having problems, or until I can actually address the issue with a fix.

    I gather that others are already at this point. Me, I'm glad to have gained some understanding and awareness of the issue.
     
  24. GMWolf

    GMWolf aka fel666

    Joined:
    Jun 21, 2016
    Posts:
    3,470
    As far as I'm aware, GUID or UUID is generally done with maps or heaps.

    @Mike says that Maps are too slow, and that using a simple lookup table is far faster.
    This is true.
    However, using a simple lookup table means the table must be as large as the largest ID. Therefore, to avoid eating up all available memory, YYG decided to reuse IDs.
    I agree this is a sensible solution, when using a simple lookup table. Though I do feel a map with unique ids would be nicer.

    Essentially, I was disagreeing with you that both the simple lookup table and unique IDs could be used. If unique IDs are used, a simple lookup table simply uses too much memory.

    Much like you however, I would prefer GM to adopt a unique ID system at the cost of performance. I was not disagreeing on that.
     
  25. JaimitoEs

    JaimitoEs Member

    Joined:
    Aug 9, 2016
    Posts:
    175
    Right, i don´t want to be unsulting, sorry about that, but let me disagree with new engineer, yes, maybe they does´t know what is a bit, but can create good things in the principal languages like java, c, c#.. Why learn the begining of computing when you can focus your skills on the present and learn in the near future the principles of computing?? Are you coded GMS2 "bit to bit"?? My best friend is newly graduated , he knows what is a bit, is passionate of coding, and now, he´s working with AirBus like software engineer with no experience , one of the best company creating aircrafts. Surely the company knows "he is not prepared", but this is a team, a team to teach and give lessons when new staff does not know, and maybe one day this engineer can teach a lesson of his masters. Open your mind, maybe a new staff are not productive instantly, like a new staff with his new job, the team must to give the push toward new talents.

    And, what is the problem if your Knowledge are top than a new staff, you are the director to control your staff, is because this you are a CEO or another better rank than newest incorporations, control your team to do what you want and distribute the work.

    Well i am nobody to tell suggestions for a company. So the better for me is run and not look back..

    Again, sorry about if you feel insulted. I Know you are a big of the industrie with an a big weight over....
     
    Last edited: Dec 23, 2016
  26. csanyk

    csanyk Member

    Joined:
    Jun 20, 2016
    Posts:
    821
    I appreciate the clarification :) I understand better now.
     
  27. csanyk

    csanyk Member

    Joined:
    Jun 20, 2016
    Posts:
    821
    I'm OK with id's being recycled... eventually. I recognize that you can't use all the ram just to hold the address space of every possible id. But could they at least not re-use them *immediately*? Is there a middle ground solution, whereby we use an id, still in a lookup table, and it can't be re-apportioned for a time after being destroyed?
     
  28. GMWolf

    GMWolf aka fel666

    Joined:
    Jun 21, 2016
    Posts:
    3,470
    All that would do is make it harder to notice a bug!
    If your code currently has bugs because of this, then changing it to have ids recycled after will only delay the bugs, not fix them.
    With the current system, you have to do as mike said and set all your references to undefined. At which point it doesnt matter if ids are reused.

    Even if ids are recycled later, you should still set your references to undefined.

    This is, unless you can intrinsically know if a DS exists or not.
     
    csanyk likes this.
  29. csanyk

    csanyk Member

    Joined:
    Jun 20, 2016
    Posts:
    821
    Well, granted. But it would also make it less likely that a bug would cause a problem that would be noticed... so hopefully, the player could enjoy the game and never notice there was a potential bug. It cuts both ways. Although, I would agree with you, anything that makes bugs harder to find is something that should be considered very carefully.

    I'm just saying that if it's inevitable that the unique id's for data structures will be exhausted at some point, rather than running out of data structures, I'd accept recycling the id's.
     
  30. Mike

    Mike nobody important GMC Elder

    Joined:
    Apr 12, 2016
    Posts:
    2,412
    Reuse will just depend on how big the table is, chances are it's (currently) a simple loop to find one, so the 1st free one (I'd guess) is what's returned. But being a simple array, the array size will be the maximum number you've had allocated at once (give or take). So if you allocate 4, then free them, the array size will only be (around) about that number. If you have thousands, then the array size will be that. We tend not to shrink the array as this is wasted time - usually if you've allocated this amount before, you will do so again.

    Finding a chunk of memory from a handle can be done in several ways. Either with a pointer, an index, or some kind of "handle/GUID". The index and the pointer will both be subject to the reuse as the if structures are ever pooled for faster allocation, then you may well get the same pointer back. Indices we know about. For true handles/GUIDs you have to do look up a table to find out the real index/pointer. This is slower as you are doing more work.

    As I said earlier.... it is possible we might be able to use a 64bit INT as the handle, so the lower part is the index, and the upper part the "unique" identifier, but this will depend on the VM internals which I will with folk here discuss next year. This would solve the issue as we could throw away the unique part and just keep the index. This shouldn't really be any slower than what we have, but will need testing and a chunk of code rewritten so might take time regardless.
     
    csanyk likes this.
  31. GMWolf

    GMWolf aka fel666

    Joined:
    Jun 21, 2016
    Posts:
    3,470
    And thats supposed to be efficient? please, at least use a queue to keep track of ids.
    Code:
    ///finding next id
    if (queue.size > 0) {
        return queue.dequeue();
    } else {
       return nextID++;
    }
    Code:
    ///Delete an id
    queue.enqueue(id_to_delete);
    Thats O(1) rather that O(n).


    Thats would be great! :)
     
    csanyk likes this.
  32. Mike

    Mike nobody important GMC Elder

    Joined:
    Apr 12, 2016
    Posts:
    2,412
    Ideally it should be a stack allocater, but really you shouldn't be allocating a lot of them for this to make a massive difference. When you allocate anything, you have to assume its going deep down into memory allocation and doing all kinds of nasty things. Trust me.....a small loop to get an index will be the least of your issues.

    If you've allocated tens of thousands and are freeing/allocating the 20,000th one over and over, you'll probably notice, but for normal small numbers it'll not be much.
    On the odd occasion you DO run into this, a simple pool on you end would solve it. yes, it'd be nice to have internally - I don't disagree, but it's not a big enough issue "at the moment" to really warrant the time/testing to change it.
     
  33. GMWolf

    GMWolf aka fel666

    Joined:
    Jun 21, 2016
    Posts:
    3,470
    Unique IDs is something I feel very strongly about.
    But thinking about it some more, and how i deal with memory in other languages, this isnt actually such a big deal.
    Yes I got some bugs because I assumed ids where unique, but that was mostly my fault.


    :)
    Also I'm starting to understand what some people mean when they say OO makes for bad programmers ^^ I'm totally stuck in that way of thinking now.
     
  34. Startoz

    Startoz Member

    Joined:
    Nov 22, 2016
    Posts:
    10
    @Mike , I know it's 9 months after the discussion, but I would like to point out that a unique identifier would have the added benefit that the engine could differentiate between different kinds of data structures:
    • using ds_map_destroy() instead of ds_list_destroy() by mistake can currently destroy a "random" ds_map, because the indexes of ds_maps and ds_lists are separate (e.g. both the first ds_map and the first ds_list that you create have index 0). with a unique identifier, the function will throw an error instead of deleting a "random" structure.
    • adding a list to a map or viceversa could automatically add them as nested data structures, so functions like ds_map_add_list() would not be needed anymore
     
  35. GMWolf

    GMWolf aka fel666

    Joined:
    Jun 21, 2016
    Posts:
    3,470
    It's already been confirmed that YYG are looking to add actual data types for data structures (much like how arrays currently work).
    Check out the roadmap.
     
    Startoz and Yal like this.
  36. Yal

    Yal GMC Memer GMC Elder

    Joined:
    Jun 20, 2016
    Posts:
    3,930
    Lol, when I saw this pop up again I assumed the discussion had been going on all this time instead of the topic getting necrobumped :p

    Great news about actual types, though... I've been looking forward to that~
     
  37. Startoz

    Startoz Member

    Joined:
    Nov 22, 2016
    Posts:
    10
    That sounds great, thank you!
     
  38. GMWolf

    GMWolf aka fel666

    Joined:
    Jun 21, 2016
    Posts:
    3,470
    But given current development speeds, if you truly need unique ids, I would recommend you wrap the ds functions into something that would use unique ids and keep track of types.
     

Share This Page

  1. This site uses cookies to help personalise content, tailor your experience and to keep you logged in if you register.
    By continuing to use this site, you are consenting to our use of cookies.
    Dismiss Notice