A few years ago I wanted to make a shoot-em-up game using C# and XNA that I could play with my kids on an Xbox. It worked fine except for slight pauses for GC every once in a while, which ruined the experience.
The best solution I found was to get rid of objects and allocation entirely and instead store the state of the game in per-type tables like in this article.
Then I realized that I didn't need per-type tables at all, and if I used what amounted to a union of all types I could store all the game state in a single table, excluding graphical data and textures.
Next I realized that since most objects had properties like position and velocity, I could update those with a single method by iterating through the table.
That led to each "object" being a collection of various properties acted on by independent methods which were things like gravity and collision detection. I could specify that a particular game entity was subject to gravity or not, etc.
The resulting performance was fantastic and I found I could have many thousands of on-screen objects at a time with no hiccups. The design also made it really easy to save and replay game state; just save or serialize the single state table and the input for each frame.
The final optimization would have been to write a language that would define game entities in terms of the game components they were subject to and automatically generate the single class that was union of all possible types and would be a "row" in the table. I didn't get around to that because it just wasn't necessary for the simple game I made.
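Roughly, the "row" idea looks like this (a minimal C++ sketch with invented names, not the parent's actual C#/XNA code): a single struct that unions all possible properties, plus a flag bitmask saying which systems apply to each entity.

    #include <cstdint>
    #include <vector>

    // Hypothetical "row" type: the union of every property any entity might need.
    struct Entity {
        std::uint32_t flags = 0;   // which behaviors apply to this entity
        float x = 0, y = 0;        // position
        float vx = 0, vy = 0;      // velocity
        int frame = 0;             // index into animation data stored outside the table
        int hp = 0;                // unused for entities that can't take damage
    };

    enum : std::uint32_t { HAS_GRAVITY = 1u << 0, HAS_MOTION = 1u << 1 };

    // The entire game state: one flat, preallocated table, so no heap churn per frame.
    std::vector<Entity> world(10000);

    void update(float dt) {
        for (Entity& e : world) {
            if (e.flags & HAS_GRAVITY) e.vy += 9.8f * dt;
            if (e.flags & HAS_MOTION)  { e.x += e.vx * dt; e.y += e.vy * dt; }
        }
    }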
I was inspired to do this from various articles about "component based game design" published at the time that were variations on this technique, and the common thread was that a hierarchical OO structure ended up adding a lot of unneeded complexity for games that hindered flexibility as requirements changed or different behaviors for in-game entities were added.
The general approach described by the author is the one used by many game engines. The main reason, however, that the refactor you've described here isn't used is that it bloats memory by a significant factor (which thereby reduces performance). Since you have to allocate, for each array index, the size of the largest item in the union, there is a lot of wasted space when you have lots of smaller items. If you have fewer than a couple billion points, a triangle will fit in 12 bytes, while a game object can easily have 10 fields for a total of 30 bytes. That's a massive expansion of your triangle array, which will result in more cache misses.
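To make the bloat concrete, here's a hedged sketch (the struct layouts are invented for illustration, not taken from any engine): a tight per-type array stores only what each item needs, while a union-of-everything row pays for the largest member on every element.

    #include <cstdint>
    #include <cstdio>

    struct Triangle { std::uint32_t v0, v1, v2; };   // 12 bytes: three vertex indices

    struct FatRow {                                  // union-of-everything "row"
        std::uint32_t flags;
        float x, y, z, vx, vy, vz;
        std::uint32_t v0, v1, v2;
        // ...plus whatever the largest entity type needs
    };

    int main() {
        // Every triangle stored in the fat table wastes sizeof(FatRow) - sizeof(Triangle)
        // bytes, which also means fewer triangles per cache line.
        std::printf("%zu vs %zu bytes per element\n", sizeof(Triangle), sizeof(FatRow));
    }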
Of course, this isn't a general solution to every problem and I wouldn't use it if there was a huge distribution in entity sizes like you described.
The reason it worked in my case was that for a 2d game, the state of even the maximum sized entity was very small. For animated objects I had a per-entity integer that stored what frame to display. The texture and animation data was stored elsewhere outside the table, but that was also loaded before the game started so there was no allocation after that point.
For a 3d game with complex animations it would be foolish to try to store all the animation and texture data in the same table, but I'd probably do something very similar for the game state.
I've tried a lot of ways of structuring game data but I basically agree with this assessment. Simple game logic can usually be premised on a single "God Object" type of deal that contains every possible property, or if you're in a language that supports it, is just a form of dynamic type. In old 8-bit games bit packing and unions would be used to conserve memory, but the games generally weren't complex enough in architectural scope to require a generalized approach to custom data structures and containers - just a few for optimization's sake.
It took until the late 90's and some overenthusiastic misfires with class inheritance-based designs (they don't compose cleanly) for the modern component-driven styles to start emerging. The tradeoff being made in current engines is basically one of late vs early binding - it's easier to write tooling against a late-binding system where you can attach more nodes arbitrarily (e.g. Godot's scene system), but your optimization potential is better if you can compute exactly the data structures and containers needed for each category of entity and asset (there's a talk from HandmadeCon about how Destiny went too far down this path and worked themselves into a corner).
Edit: Also, the biggest sticking point for working with a GC in games isn't really the entity and asset management, since pooling will suffice for guaranteeing the quantities of elements needed for any given scene; it's the algorithms that rely on standard functions that expect a GC environment, like string manipulation.
Shows what I know... I came around to the same conclusion as drblast with my own ECS project and assumed a single array of one union type would perform better than multiple arrays.
Of course, I also didn't consider any extra space wasted, since it could potentially be reused when recycling slots for new components.
Reminds me of some assembly (!) source code I read once for, I think it was, Super Mario Bros. as ported to the TI-89 calculator. Every object in the game had the same bit structure (position, velocity, weight, width, some others), with the final word being a pointer to an "update" function. The game loop was a very straightforward physics update on each object followed by a call to its update function.
Pretty much. The fact that it solved a specific performance problem (avoiding GC pauses) in a tiny game I was making, along with the ease of reasoning about and debugging game state was really interesting to me. Most of the articles about it were about managing large code bases which didn't apply to my little toy game project.
I then took it to the furthest extreme I could (everything in a single table) just to see if it would work well, and it did. Almost a purely functional approach to game architecture.
Frame State 0 -> Collection of composable functions to modify state -> Frame State 1
I found that to be vastly simpler and more effective than the OO approach of segregating state and functions into individual opaque objects. At that point OO seemed like such a brain-dead way to make things more difficult for myself I wondered if the entity-component idea wouldn't be more universally applicable to systems other than games. It is.
This was one of those rare cases where taking something to a logical extreme had an architectural benefit (all state in one easy-to-reason about table) that matched up really well with the performance benefit (memory locality). For systems where the system state is too large or varied to fit in a human's head, the benefits wouldn't be as apparent. But game states are, by definition, something that should be fairly easy to reason about because that's what the player is seeing and interacting with.
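As a rough sketch of that pipeline (a toy C++ example with invented names; each "system" is just a state-in, state-out function):

    #include <cstddef>
    #include <vector>

    struct GameState { std::vector<float> y, vy; };   // toy state: one table of positions/velocities

    // Each composable function takes the frame state and returns the next one.
    GameState applyGravity(GameState s, float dt) { for (float& v : s.vy) v += 9.8f * dt; return s; }
    GameState applyMotion (GameState s, float dt) { for (std::size_t i = 0; i < s.y.size(); ++i) s.y[i] += s.vy[i] * dt; return s; }

    // Frame State 0 -> collection of composable functions -> Frame State 1
    GameState step(GameState s, float dt) { return applyMotion(applyGravity(s, dt), dt); }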
This is a bit pedantic, but this type of entity component architecture isn't really the architecture that "Entity Component System" has come to describe. ECS usually refers to storing the components separately and looping over each one (or small groups of them) separately.
For instance, you could have a physics system that loops over all of the velocity and position components and increments the positions by the velocities, and a collision system that loops over the position components and handles collisions.
>memory locality
The system you're describing has less memory locality than the ECS system described above. That's the main benefit of the added complexity of storing components separately. Another big benefit is that you can store different components in different types of data structures depending on how you need to access them.
Don't get me wrong, I'm not saying the ECS I'm talking about is necessarily better. It's not normally needed, and it's almost definitely overused by games that don't need it.
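For contrast, a bare-bones sketch of the separate-arrays style (structure-of-arrays; names are illustrative, not from any particular engine): each system walks only the tightly packed component arrays it actually needs.

    #include <cstddef>
    #include <vector>

    // Components stored separately, one tightly packed array per component type.
    struct Positions  { std::vector<float> x, y; };
    struct Velocities { std::vector<float> x, y; };

    // The physics system touches only positions + velocities; a collision system
    // could walk positions alone, or use an entirely different data structure.
    void physicsSystem(Positions& p, const Velocities& v, float dt) {
        for (std::size_t i = 0; i < p.x.size(); ++i) {
            p.x[i] += v.x[i] * dt;
            p.y[i] += v.y[i] * dt;
        }
    }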
Quick rant: it's "Entity Component Systems". The "System" doesn't describe the approach in general, it describes an object called a "System". Entities are IDs (equivalent to indexes into his array), Components are property bags (values in his array), Systems are things that act on components (his update loop could be considered a System).
The idea behind it is that you can have a "WorldPosition" Component which stores all world positions as a flat array, and a "Gravity" System which acts upon WorldPosition components and makes them move. The idea is that since your data is tightly packed, it's easier to update.
A lot of people get this confused, though, so no worries.
> The final optimization would have been to write a language that would define game entities in terms of the game components they were subject to and automatically generate the single class that was union of all possible types and would be a "row" in the table
> polymorphic django models using automatic type-field downcasting
> The actual type of each object is stored in the database, and when the object is retrieved it is automatically cast to the correct model class
...
> the common thread was that a hierarchical OO structure ended up adding a lot of unneeded complexity for games that hindered flexibility as requirements changed or different behaviors for in-game entities were added.
So, in order to draw a bounding box for an ensemble of hierarchically/tree/graph-linked objects (possibly modified in supersteps for reproducibility), is an array-based adjacency matrix still fastest?
Are sparse arrays any faster for this data architecture?
Interesting, never came across it. But I've got a Django GitHub library (not yet open source, because it's in a project I've been working with on and off) that does the same for managing GitHub's accounts. An Organization and User inherit the same properties and I downcast them based on their polymorphic type field.
IDK how to do partial indexes with the Django ORM? A simple filter(bool, rows) could probably significantly shrink the indexes for such a wide table.
Arrays are fast if the features/dimensions are known at compile time (if the TBox/schema is static). There's probably an intersection between object reference overhead and array copy costs.
Arrow (with e.g. parquet on disk) can help minimize data serialization/deserialization costs and maximize copy-free data interoperability (with columnar arrays that may have different performance characteristics for whole-scene transformation operations than regular arrays).
Many implementations of SQL ALTER TABLE don't have to create a full copy in order to add a column, but do require a permission that probably shouldn't be GRANTed to the application user and so online schema changes are scheduled downtime operations.
If you're not discovering new features at runtime and your access pattern is generally linear, arrays probably are the fastest data structure.
Types in RDF are additive: a thing may have zero or more rdf:type property instances. RDF quads can be stored in one SQL table like:
_id,g,s,p,o,xsd:datatype,xml:lang
... with a few compound indexes that are combinations of (s,p,o) so that triple pattern graph queries like (?s,?p,1) are fast. Partial indexes (SQLite, PostgreSQL) would be faster than full-table indexes for RDF in SQL, too.
Under XNA, the garbage collector on the Xbox 360 was rudimentary compared to the generational garbage collector on PC. I had the same problems when working on a game in 2008.
Ugh yeah, you're right - I had to look this up again, it's been that long. Makes sense, as we always ran it on the console; running it on PC was fine.
ECS style is nice from a design perspective, but as far as GC goes, it was pooling objects into tables that really solved your issue. You can pool objects in many different styles. Preallocation is a good idea, as is stack-allocating what you can.
Centralised memory management doesn't inherently conflict with a decentralised OO design.
If you refactor your object construction into factory functions, you can make the factory responsible for implementing the memory management, and the objects themselves responsible for implementing the memory use. The objects can be ignorant about the handle array. The objects can delegate any memory-related functions to the factory methods.
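A hedged sketch of that split (widget and widget_factory are invented names): the factory owns the backing array and hands out handles; the widget code itself never sees the array.

    #include <cstdint>
    #include <vector>

    struct widget { float x = 0, y = 0; /* memory *use* lives here */ };

    using widget_handle = std::uint32_t;

    // Memory *management* lives here; construction goes through the factory.
    class widget_factory {
        std::vector<widget> slots_;
    public:
        widget_handle create()                 { slots_.emplace_back(); return widget_handle(slots_.size() - 1); }
        widget&       resolve(widget_handle h) { return slots_[h]; }
    };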
> If you refactor your object construction into factory functions
Given this article is about c++, no need for factory patterns or anything like that. You can overload the new operator on a global, and a per type basis and do it that way.
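For instance (a toy sketch, not a production allocator): a per-type operator new that serves every `new Particle` from a fixed pool, with nothing special at the call site.

    #include <cstddef>
    #include <new>

    struct Particle {
        float x, y, vx, vy;

        // Per-type overload: the allocation policy is attached to the type itself.
        static void* operator new(std::size_t) {
            alignas(Particle) static unsigned char pool[sizeof(Particle) * 1024];
            static std::size_t next = 0;
            if (next == 1024) throw std::bad_alloc{};
            return pool + (next++) * sizeof(Particle);
        }
        static void operator delete(void*) noexcept { /* a real pool would recycle the slot */ }
    };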
A potential problem with "operator new" is that you can only have one allocator per type for the entire process. When you explicitly pass a factory, you can have separate factory instances.
Agreed that there's not an inherent conflict, but it seems the author is also arguing against an OO kind of design. In typical OO, all components of a given object are "owned" by that object which provides an interface through which they are accessed. This article seems to propose that the different components of an object are owned and accessed via different systems.
It's an abstraction used to create objects and manage their lifetime. There is no inherent opposition to managing memory in that pattern. Also, it's been done before and it works, so what is the issue specifically (i.e. not some religious nonsense about separation of concerns)?
I use handles quite a bit in some Rust code because you can't have self-referential objects yet, since Rust does not have immovable objects. While it felt like a huge hack in the beginning, I have really warmed up to the idea now.
Using handles instead of proper references is still a huge hack IMO, because you lose all the soundness checks that the compiler can do for references. It's a good thing that immovable objects are coming soon.
You can, but that only solves part of the problem: keeping pointer types from being used as the wrong type. You still have the equivalent of use-after-free to guard against which the lifetime system tracks for you for pointers, but not for handles.
I'm old enough to remember programming for the original Macintosh which used handles and relocatable memory. It was a nightmare. Constantly locking, unlocking handles was a chore and error prone. I also remember early Windows APIs that had opaque handle-based memory allocations. Sometimes in life it's useful to check your history books and understand why some things were consigned to history.
Constantly locking and unlocking handles was the problem. The whole point of having handles was so they could be moved around, and locking them defeated the purpose. People just created their own problems by dereferencing a handle and then wondering why their pointer became invalid and then locking stuff willy-nilly to prevent it.
And sometimes, the reasons themselves are historical and the technique can get a resurgence.
Specifically, this can easily be used to create a narrative for many functional idioms. The cost of memory and compute has been a large part of the ride in many modern tricks.
I've done this when I've had heavily interlinked structures. I had a 3D physics engine with elaborate data structures for all the polyhedra objects. Vertices link to edges, edges link to faces, faces link to edges and vertices, and the collision detection system follows those links to compute the closest points on two nearby objects efficiently.
All those data items were structures in arrays, part of the "Polyhedron" object. All interconnections between them were array indices. The Polyhedron object owned everything for that polyhedron. Delete that and all the parts went away.
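Roughly this shape, as a hedged sketch (invented field names, not the actual engine code): every cross-reference is an index into an array owned by the Polyhedron, so destroying the Polyhedron frees the whole mesh at once.

    #include <cstdint>
    #include <vector>

    struct Vertex { float x, y, z; std::uint32_t firstEdge; };   // index into edges
    struct Edge   { std::uint32_t v0, v1, face; };               // indices into vertices/faces
    struct Face   { std::uint32_t firstEdge; };                  // index into edges

    struct Polyhedron {
        std::vector<Vertex> vertices;   // the Polyhedron owns every part;
        std::vector<Edge>   edges;      // interconnections are array indices,
        std::vector<Face>   faces;      // so deleting it releases everything
    };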
Given the fixed amount of memory available back then, handles were the only way to avoid fragmentation, although you had to be careful if you required locking them temporarily. I even wrote a tech note on this when I was at Apple back then. How far we have come since then.
This was my first thought too. For extra confusion, Mac handles were pointers to pointers managed by the OS while on Windows, a "handle" just meant a pointer to a struct.
Originally, a handle was an index in a table of allocated memory blocks; the idea was to be able to move these blocks around without affecting the program.
It worked out fine for 16 bit systems and the like, just as it did on the Mac. As machines became more powerful, the need for this technique disappeared.
yes this was my immediate thought after reading about handles with upper bits used for meta-data .... right before a shudder passed down my spine .... it brought back all those early bad mac memories
Early versions of Mac OS ran on the 68000 and 68010, which had a 24-bit address bus. The upper 8 bits of the address were completely ignored, so some parts of the Memory Manager (documented in my link above!) would use those bits to store flags for handles, like "locked", "purgeable", and "resource". Seeing that Apple used these bits, some third-party developers followed suit in their own applications (or accessed those bits directly as a questionable optimization).
When Apple released computers which used 68030 processors, which had a 32-bit address bus, this trick stopped working. There was an awkward period where "32-bit addressing" was an option which could be turned on and off in the Memory control panel for compatibility with older software (and hardware!) which wasn't yet "32-bit clean".
Sure, they used upper bits of addresses that the hardware of the day ignored; pretty predictably, the very next hardware release (the 68020) supported 32-bit addresses, potentially breaking just about every Mac binary (initially they faked it out with a PAL in the address path, and eventually faked it out in the MMU, resulting in a bit of TLB thrashing).
This was definitely a case of someone not planning to be successful and not looking far enough into the future
Are we seeing the emergence of a generic software best practice of centralised state management across the industry? This table-based storage pattern is now common to:
- functional frontend apps via redux
- classical apps and server backends via relational databases
- game engines like the one mentioned here, and Unity's Entity Component System
- probably more systems I don't know about
It seems to me like separation of state and behaviours is one of those fundamental rules that should apply to most large software applications.
> However, the underlying design philosophy doesn’t fit very well into a classical OOP world
These problems indeed belong to a non-OOP view of programming. In OOP, the behavior is concentrated in objects while here a significant part of the behavior is concentrated in references.
I have been working on one possible generic solution which is called concept-oriented programming [1] and where references are first-class elements of the program where a great deal of the program behavior is performed during object access using these references.
One of the main ideas is that each object class is accompanied by a reference class. Such a pair is referred to as a concept (hence the name of the model). If the reference class is empty then we get normal OOP. If the object class is empty then we can model normal values (which are passed by copy). Of course, we can build a hierarchy by inheriting the behavior of references and objects (which is different from classical inheritance but generalizes it in the case where we have only object classes).
COP is still at an early stage and cannot be used yet because there are other quite serious unsolved problems. But I see that the problems described in this post are not exceptions - describing how objects are represented and how they are accessed is an important piece of custom functionality in any complex system, and it is precisely what COP tries to formalize (along with some other tricky problems).
I've always wondered what the performance impact would be on 64 bit, assuming 32 bit indexes: on the one hand, half-size for most pointers, on the other hand extra indirection and the size of the table.
> Most of the following blog post is written from a game developer’s perspective, but should also apply to other areas ...
Video games are a special case where allocations and deallocations are often well scheduled and predictable. For the more general case, using a data type like ivector[1] for the object arrays would provide the desired memory safety. It gives you the flexibility of safely accessing an object via index, iterator or (zero-overhead smart) pointer. Even though it's a vector (like std::vector) its iterators behave like list iterators. That is, after an insertion or removal operation, unlike indexes, they continue to point to the same item (not necessarily the same position) and they only become invalid when the item they point to is removed. In particular, they don't become invalid after a resize/reallocation operation.
So using ivector's iterators instead of index handles would allow you to safely perform insertion/removal operations if necessary.
Also, for performance reasons you may sometimes want to temporarily obtain a direct pointer to the item. ivector allows you to obtain a zero-overhead smart pointer to the item that is guaranteed not to become invalid. [2] It does this by disabling "resize" operations on the vector while it exists.
The next step is to use the old structure called gap buffer, piece table or the more general zipper... And drop the pretense you are inventing something new. :)
Pretense not intended, thanks for clarifying. :) Rather than algorithmic novelty, the point is more that ivector has an API and implementation that is memory safe (memory safety seemed to be an emphasis of the blog post).
As for those next steps, I'm under the vague impression that they aren't as popular these days because larger cpu caches have greatly reduced the cost of moving contiguous data in most real-world scenarios. So that more often than not the cache coherency benefits of simple contiguous data outweigh the avoidance of shifting some of that (contiguous) data.[1]
Exactly why a gap buffer, piece table and zipper are a thing. These structures are optimized exactly for local modification, especially chunked or contiguous. These are semi-contiguous structures.
At some point with bigger data you will hit the brick wall where the memory move is too expensive. This is typically at a bunch of megabytes in size for localized block inserts. The idea of these only slightly more advanced structures is to amortize the cost or defer it.
For instance, a gap buffer is essentially 3 vectors/arrays. Piece table is a few more, merged as needed, usually few. A zipper can mix contiguous (potentially heap) access with tree structuring for rarely modified or accessed parts.
As for iterator stability: either the structure knows about active iterators (expensive) and fixes them up, or you should probably use plain old pointers as such.
Bet you didn't know you can and should use a pointer as an iterator into contiguous memory... And if you really, for some reason, do need to hold stable iterators to anything, std::array is good enough. Remember you cannot safely resize due to the potential for memory copies on reallocation. (Yes, there are tricks; no, they're not cheap, threadsafe, or concurrent.)
The consistency model of ivector iterators is not quite explicit.
From what I can tell, while this memory-management strategy can stop a program from seg-faulting or from executing remote code, the style of programming is the same as with `malloc` and `free`. You still have to `alloc_texture()` and `free_texture()` in your code even if those functions return a handle instead of a raw pointer.
Some observations about the strategy based on the article:
- It does *not* prevent memory leaks. It uses fixed-size arrays, so once you allocate more memory than there is room in the array, the program has to exit.
- It does not prevent use-after-free. It just makes it easier to detect when this has happened. This could make debugging easier in some situations. I do wonder, though, how it interacts with code analyzers.
Also this could work in any language. I wonder if it's common in game development for Android, because of Java and garbage collection spikes.
Oh, and another observation: I think this strategy is fully compatible with RAII and smart pointers. Smart pointers are implemented using malloc/free, and this strategy is isomorphic to malloc/free, so combining them should work fine.
I started using Rust recently and I noticed my programs tend to use the techniques described in this article. Using an "object" pool that is responsible for the creation / management / destruction of values and their referencing is a good way to reason about lifetimes when you need to manage DAGs or graphs (where reference ownership and lifecycles are hard to reason about).
Centralized resource management, public index handles... So basically OpenGL? (They're called "object names" rather than "index handles" in OpenGL, but the principle seems to be the same.)
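As a concrete illustration (a minimal sketch; assumes a loader such as GLAD has set up the GL function prototypes): the GLuint "name" is an opaque handle owned by the driver, never a pointer you dereference.

    #include <glad/glad.h>   // assumption: GLAD (or similar) provides the GL prototypes

    GLuint makeVertexBuffer(const float* data, GLsizeiptr bytes) {
        GLuint vbo = 0;
        glGenBuffers(1, &vbo);                                      // the driver allocates and returns a "name"
        glBindBuffer(GL_ARRAY_BUFFER, vbo);                         // all access goes through the handle
        glBufferData(GL_ARRAY_BUFFER, bytes, data, GL_STATIC_DRAW); // storage managed centrally by the driver
        return vbo;                                                 // pass the name around like an index
    }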
I had envisaged my 'manager' variable above as type widget_manager. Having a global manager of all things, or independent managers, makes no difference to my argument.
The manager has to live in file-scope or above since there is no room in the handle to save a reference to it. At that point, you're creating a singleton and reducing the flexibility of the module. You now have all the usual issues with globals and singletons.
Unless of course you're passing around structures containing a handle and pointer to its manager...
There is nothing wrong with globals and singletons. They are very good patterns. Some areas of code deal with global state, and there is no sense in pretending it is not so. On the other hand, some other areas definitely need a closure of their own (like a workload dumped to a worker thread).
The usual issues with globals pop up when people want to take shortcuts and never implement the separation of data from global to local when they should, think globals can be used to replace a database-like system, or - the horrors - use some multi-level stack-based system to push and pop global state as they merrily progress along the call stack, i.e. run computations while using the global state as some sort of auxiliary storage unit.
Generally it is better to have a context with a well defined lifetime rather than the other two options.
Additionally, thread safety puts very stringent requirements on global state, further making singletons and language-level globals worthless with few exceptions.
I think OO would be way faster if, instead of keeping all of an object's properties together in memory and dealing with one object at a time, you had an object collection as your base type (you rarely have just one of anything in your program). It would have an array for each object property, and you would never access, change, or pass a whole object, just specific properties of specific objects. This should be very cache friendly.
If you just want to inherit from an object collection you already have, to get a new object collection that behaves slightly differently, then it should work exactly the same way as usual.
If you wanted to have different types of objects in one collection, first you'd have to ask yourself if you really need to keep them together. But if so, I suppose you'd put the base fields in one collection and the new additional fields in another. Then a handle to a derived object would consist of two indexes: one into the base collection and the other into the collection of additional fields. Methods that access base fields would use the first index, and methods that access the additional fields would use the second.
Not sure if I'm being clear enough. Imagine you want to store a collection of various subtyped objects in a database; this is pretty much the same problem. You keep the base fields in one table and the subtype-specific fields in other tables (one per subtype). If you want to operate just on base fields you use the base table, but if you want to operate on fields specific to a subtype you join the base table with the table that contains that subtype's fields.
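Something like this, as a rough sketch (types invented for illustration): base fields in one collection, subtype fields in another, and a "derived" handle is just a pair of indexes.

    #include <cstdint>
    #include <vector>

    struct BaseFields  { float x, y; };            // shared by every object
    struct EnemyFields { int hp; float aggro; };   // only for the "enemy" subtype

    struct World {
        std::vector<BaseFields>  base;    // the "base table"
        std::vector<EnemyFields> enemy;   // one extra "table" per subtype
    };

    // Handle to a derived object: one index per table it participates in,
    // much like joining a base table with a subtype table in a database.
    struct EnemyHandle { std::uint32_t base, enemy; };

    float& positionX(World& w, EnemyHandle h) { return w.base[h.base].x; }
    int&   hitPoints(World& w, EnemyHandle h) { return w.enemy[h.enemy].hp; }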
The method of filling spare bits with random values to guard against stale object access seems like it would probably work well in practice, but is there a way to make it 100% air-tight and guarantee that stale object access will always be prevented? Otherwise it seems like that could end up as the basis of a security vulnerability in a program that uses this pattern.
Use a lot of bits, increment upon re-use of the slot, and throw when the pattern reaches its maximum.
Alternatively, give each item a unique handle (again counting up) and implement some sort of lookup table to convert those handles into actual indexes.
If you don't mind the handle being rather large, make the sequence number 64 bits, and store it alongside the index. 2^64 nanoseconds in years = 585... computers just aren't fast enough to wrap it around in any reasonable time frame.
Depending on how many index bits you want, you might be good with only 64 bits in total. 2^56 ns in years = 2.3; 2^52 ns in days = 52; 2^48 ns in days = 3.3; and so on.
All of these figures are worst case scenarios, anyway. Your program probably simply can't actually be made to allocate objects at that rate.
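A sketch of the counting variant (names invented; use a wider generation field if you want the figures above): pack a generation number next to the index, bump it whenever a slot is freed, and refuse to resolve handles whose generation no longer matches.

    #include <cstdint>
    #include <vector>

    struct Handle { std::uint32_t index; std::uint32_t gen; };

    template <typename T>
    class Pool {
        struct Slot { T value{}; std::uint32_t gen = 0; bool alive = false; };
        std::vector<Slot> slots_;
    public:
        Handle create() {
            for (std::uint32_t i = 0; i < slots_.size(); ++i)
                if (!slots_[i].alive) { slots_[i].alive = true; return {i, slots_[i].gen}; }
            slots_.emplace_back();
            slots_.back().alive = true;
            return {std::uint32_t(slots_.size() - 1), 0};
        }
        void destroy(Handle h) {
            slots_[h.index].alive = false;
            ++slots_[h.index].gen;            // any outstanding handle now fails the check
        }
        T* resolve(Handle h) {                // nullptr instead of a silent use-after-free
            Slot& s = slots_[h.index];
            return (s.alive && s.gen == h.gen) ? &s.value : nullptr;
        }
    };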
It's only a security vulnerability if there is a bug already present in the code. This mechanism is not introducing the vulnerability. In fact, it's helping find out the bug during development. At release runtime, the spare bits should always match.
I mean that if an object is deleted and its space in the array is reused for a new object, there's a small but non-zero chance that the spare bits will be identical, in which case the code would gain access to the new object when it is not supposed to. Or am I misunderstanding how the handle concept works?
Yeah, I think you are misunderstanding something here.
Yes, there is a non-zero chance that spare bits will be identical. But that's only relevant, if you have a bug in your code wherein you are trying to use a destroyed handle. If you are never using a destroyed handle, then the spare bits being identical is completely okay.
Another way to think of this is that the spare bits are optional. The interface is good even if there were no spare bits. The spare bits only add an additional probabilistic security check.
The fact that it's probabilistic is what makes it unreliable as a real security check. That's how I understood the original poster's concern. It might also lend a false sense of security, like most other solutions that don't actually solve the full problem.
> A good example for this ‘system ___domain knowledge’ is the destruction of rendering resource objects with modern 3D APIs: a resource object can not simply be destroyed immediately when the user code says so because the resource might still be referenced in a command list waiting to be consumed by the GPU. Instead the rendering system would only mark the resource object for destruction when the user code requests its destruction, but the actual destruction happens at a later time when the GPU no longer uses the resource.
This is exactly the same behavior you get with shared_ptr: both the user code AND the rendering system would hold a reference to the object, and it would only be deallocated once BOTH of them destroy their shared_ptr. weak_ptr allows even more fine-grained control.
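For what it's worth, a minimal sketch of that shared ownership (the types are invented): the texture survives until both the caller and the pending command list drop their references.

    #include <memory>
    #include <vector>

    struct Texture { /* wraps a GPU resource */ };

    struct CommandList {
        std::vector<std::shared_ptr<Texture>> referenced;   // keeps resources alive until the GPU is done
    };

    int main() {
        auto tex = std::make_shared<Texture>();
        CommandList pending;
        pending.referenced.push_back(tex);   // the rendering system holds one reference

        tex.reset();                         // user code "destroys" its handle; the object survives
        pending.referenced.clear();          // GPU finished: last reference dropped, now it's freed
    }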
You still need to manage your resource lifetime somehow, with either some equivalent of unique_ptr or shared_ptr. Otherwise you've just reinvented global variables. How else would you know when to free an object from your object pool? A "least-used" pattern could be useful in some scenarios but then you also need to be able to reconstruct the object again in case someone asks for it after a long time of being idle (like swapping to disk).
I am actually trying to implement generic pointers in Rust, that could safely work with object pools and/or preallocated memory regions of arbitrary bitness.
Deterministic performance hasn't been a thing since the early 90s. I'll settle for just performance.
But sure, there are cases where you want the level of predictability of reference counting, in which case I'd rather take the performance penalty than spend time and pollute the codebase doing a lot of tricks to arrange memory as a compacting garbage collector would.
Edit: This is a good article on the approach. http://cowboyprogramming.com/2007/01/05/evolve-your-heirachy...