Well... I can't tell you whether you're wrong, because I don't know that for sure. As I said, my input could be wrong too.
The number of draw calls would be the same if it used instances with their own sprites, but then you'd have to reference those instances. Orchestrating the inventory across multiple objects could add some cost, or at least some hassle, compared to one object handling it all.
For example, suppose they "snapshotted" the inventory onto a surface and used that to show the whole thing. There would be the initial cost of drawing everything once, and after that (probably) the much smaller cost of just drawing the surface. If the inventory doesn't change for a whole second, why make all those extra draw calls every step when you could "scoop" them up into one graphic, hold it in memory, and display that instead? One object and one draw call, versus several instances each with their own draw calls.
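To put rough numbers on the "snapshot" idea, here is a minimal, language-neutral sketch in Python rather than GML. The class `InventoryView`, the `dirty` flag and the draw-call counter are all hypothetical; the list called `snapshot` only stands in for a GameMaker surface, and "one draw call per slot" is a simplifying assumption, not a measurement of what the engine actually batches.

```python
class InventoryView:
    """Renders an inventory once into a cached 'snapshot' and reuses it."""

    def __init__(self, items):
        self.items = list(items)
        self.snapshot = None     # stands in for a GameMaker surface
        self.dirty = True        # True when the snapshot is out of date
        self.draw_calls = 0      # counts simulated draw calls

    def set_item(self, slot, item):
        self.items[slot] = item
        self.dirty = True        # contents changed, so rebuild on next draw

    def _draw_slot(self, item):
        self.draw_calls += 1     # one draw call per slot sprite
        return item

    def draw(self):
        """Called once per step."""
        if self.dirty:
            # Rebuild the snapshot: one draw call per slot
            self.snapshot = [self._draw_slot(item) for item in self.items]
            self.dirty = False
        self.draw_calls += 1     # draw the whole snapshot as one image
        return self.snapshot


view = InventoryView(["empty"] * 12)
for _ in range(60):              # one second at 60 steps per second
    view.draw()
print(view.draw_calls)           # 72: 12 to build the snapshot + 60 to show it
```

Drawing the 12 slots directly every step for the same second would be 12 × 60 = 720 calls under this model. The snapshot is only rebuilt when `set_item` marks it dirty, which is the "if the inventory doesn't change" case above.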
Also, unlike sprites, I believe a surface isn't held on a texture page, so maybe drawing it doesn't carry an engine cost like flushing the graphics would? That's beyond my level of capability to say with 100% certainty.
Instead of using instances, running collision events to see whether the mouse is over a particular inventory object, and then having that object communicate some information back, they just worked from the mouse position and figured out the rest. By defining the layout of the UI / inventory beforehand, they can cut out unnecessary work simply by sticking to a framework whose proportions they know.
If you did the above with instances instead, it might go like this:
1) Setting all 12 instances visible / invisible whenever you want to show or hide the inventory
2) 12 draw calls per step while they are visible
3) 12 instances to check if you use some form of collision to find out which one the mouse is currently over
4) If you go with point 3, it probably means passing info back to the inventory object about what was clicked, and then another piece of data from the inventory object back to the "clicked" instance about what it should do.
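Here is a minimal Python sketch of the instance-based flow in points 1 to 4, to show where the extra work comes from. `SlotInstance`, `Inventory`, `handle_click` and the 4 × 3 grid of 32-pixel slots are all hypothetical; a GameMaker project would use objects and collision/mouse events instead, and this only models the bookkeeping.

```python
class SlotInstance:
    """One inventory slot as its own 'instance' with a bounding box."""

    def __init__(self, slot, x, y, w, h, inventory):
        self.slot = slot
        self.x, self.y, self.w, self.h = x, y, w, h
        self.inventory = inventory   # reference back to the controller
        self.visible = False
        self.action = None

    def collides(self, mx, my):
        return self.x <= mx < self.x + self.w and self.y <= my < self.y + self.h

    def on_click(self):
        # Message 1: tell the inventory which slot was clicked.
        # Message 2: the inventory replies with what this slot should do.
        self.action = self.inventory.handle_click(self.slot)


class Inventory:
    def __init__(self, cols, rows, w, h):
        self.slots = [SlotInstance(r * cols + c, c * w, r * h, w, h, self)
                      for r in range(rows) for c in range(cols)]
        self.collision_checks = 0

    def set_visible(self, visible):
        for s in self.slots:         # point 1: touch every instance
            s.visible = visible

    def handle_click(self, slot):
        return f"use slot {slot}"

    def click(self, mx, my):
        for s in self.slots:         # point 3: check each instance in turn
            self.collision_checks += 1
            if s.visible and s.collides(mx, my):
                s.on_click()         # point 4: back-and-forth messages
                return s.slot
        return None


inv = Inventory(4, 3, 32, 32)
inv.set_visible(True)
print(inv.click(40, 70), inv.collision_checks)   # 9 10
```

The click on slot 9 took 10 bounding-box tests plus two messages between objects; a click outside the grid tests all 12.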
Take the same setup and do it with one drawing (a surface), or with one object doing all the drawing:
1) One instance set to visible / invisible whenever you want the inventory shown or hidden
2) 12 draw calls for, say, one step, and then one draw call per step after that if you use a surface to show everything
3) No collision checking at all. Because you know where the inventory is displayed on screen, what each block will be, and what the dimensions are, you use the mouse coordinates instead:
// Which grid cell is the mouse over? Pure arithmetic, no collision check.
var _col = mouse_x div block_width;
var _row = mouse_y div block_height;

if (_col == 5) {
    if (_row == 10) {
        is_a_carrot = true;
        // do whatever
    } else if (_row == 20) {
        is_a_cauliflower = true;
        // do whatever
    } else if (_row == 30) {
        is_a_carrot = true;
        // do whatever
    }
}
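The check above hard-codes one column and a few rows. A more general version turns the mouse position into a single slot index, which can then index a list of items. This is a Python sketch, not GML; `slot_at`, `inv_x`, `inv_y`, `cols` and `rows` are hypothetical names, and it assumes the inventory's top-left corner is drawn at (`inv_x`, `inv_y`) with equally sized blocks.

```python
def slot_at(mx, my, inv_x, inv_y, cols, rows, block_width, block_height):
    """Return the slot index under (mx, my), or None if outside the grid."""
    # Position relative to the inventory's top-left corner
    rel_x = mx - inv_x
    rel_y = my - inv_y
    if rel_x < 0 or rel_y < 0:
        return None              # left of / above the inventory
    # Integer division picks the column and row, like GML's `div`
    # (the two agree for the non-negative values that reach this point)
    col = rel_x // block_width
    row = rel_y // block_height
    if col >= cols or row >= rows:
        return None              # right of / below the inventory
    return row * cols + col      # row-major slot number


items = ["carrot", "cauliflower"] + ["empty"] * 10   # hypothetical contents
slot = slot_at(105, 55, inv_x=100, inv_y=50, cols=4, rows=3,
               block_width=32, block_height=32)
print(slot, items[slot] if slot is not None else None)   # 0 carrot
```

The early return for negative offsets matters because Python's `//` floors toward negative infinity while GML's `div` truncates toward zero, so the two would disagree just outside the top-left edge.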
You wouldn't have to interact with any other instances, and I think a purely mathematical lookup is faster than any collision checking that achieves the same result. It takes some thought to set up, and it's a little rigid in its execution, but ultimately it takes off some of the strain and performance cost.
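One way to sanity-check the "same result, less work" claim is to compare both lookups on every pixel of a grid. This Python sketch models collision checking as testing slot rectangles one by one, which is a simplification (GameMaker's own collision system may be smarter than a linear scan); the function names and the 4 × 3 grid of 32-pixel blocks are hypothetical.

```python
def rects(cols, rows, w, h):
    """Bounding boxes (x, y, w, h) for each slot, in row-major order."""
    return [(c * w, r * h, w, h) for r in range(rows) for c in range(cols)]


def scan_hit(mx, my, boxes):
    """Collision-style lookup: test rectangles one by one."""
    checks = 0
    for i, (x, y, w, h) in enumerate(boxes):
        checks += 1
        if x <= mx < x + w and y <= my < y + h:
            return i, checks
    return None, checks


def grid_hit(mx, my, cols, rows, w, h):
    """Arithmetic lookup: two integer divisions, no loop."""
    col, row = mx // w, my // h
    if 0 <= col < cols and 0 <= row < rows:
        return row * cols + col
    return None


COLS, ROWS, W, H = 4, 3, 32, 32
boxes = rects(COLS, ROWS, W, H)
total_checks = 0
for my in range(ROWS * H):
    for mx in range(COLS * W):
        slot, checks = scan_hit(mx, my, boxes)
        assert slot == grid_hit(mx, my, COLS, ROWS, W, H)   # same answer
        total_checks += checks

print(total_checks / (COLS * ROWS * W * H))   # 6.5 rectangle tests per lookup
```

Both methods agree on every pixel, but the scan averages 6.5 rectangle tests per lookup on 12 slots (and grows with the slot count), while the arithmetic version is always one calculation.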
Without that developer explicitly stating why they did certain things, you can only guess at their intentions. If my best guesses are wrong... hopefully someone more knowledgeable than me will answer this thread.