Game Data Tools: Object-Oriented Design, but Data-Driven at Run-Time

Fragment of an XML document used in a Civilization game

About twelve months ago, I had a vertical slice of Huscarlas running. Up until that point, I had been able to create the small amount of game data required by entering it directly into the IDE. However, I now needed to author large amounts of content, expanding the scope of that vertical slice.

Most games have complete levels hand-authored in their own, game-specific level editor. Each atomic element in the game engine must be placed, and then customised, by a level designer. These elements may vary: from interactive objects, like enemy spawn points, to static props, like background textures. At its most basic, creating elements may involve typing lines of data into a text file, which can then be read in and parsed at run-time by the engine. This approach is sufficient for game prototypes that require only a small amount of data. Its benefit is that only a simple text editor needs to be incorporated into a game's tool-chain.
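As a minimal sketch of that most basic approach, the Python below parses one element per line from a text file's contents. The line format ("type x y"), the function name `parse_elements` and the sample data are all assumptions made for illustration; they are not taken from Huscarlas.

```python
# Hypothetical line format: "<element_type> <x> <y>", one element per line.
def parse_elements(text):
    """Parse whitespace-separated element records, skipping blanks and comments."""
    elements = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        kind, x, y = line.split()
        elements.append((kind, int(x), int(y)))
    return elements


# Sample data standing in for the contents of a hand-typed level file.
SAMPLE = """\
# type x y
spawn_point 12 4
prop 3 7
"""

elements = parse_elements(SAMPLE)
```

Note that nothing here stops a designer from typing a malformed line; the parser would simply raise an error at load time.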

A text (or hex) editor is a very general, off-the-shelf data-entry application, but it performs no data-entry checks to ensure that data is correctly formatted, or that it falls within acceptable bounds. Also, a text editor offers no guidance on what that data will mean in the context of a game's engine. Without any contextual information about how the game will treat the data, a designer cannot exercise qualitative judgement. For example, how many milliseconds should there be between two adjacent frames in an animation sequence? This is a subjective value that requires the resultant animation to be observed whilst it is tweaked. Without visual feedback showing how a data-set manifests in a game engine, we are left to pick numbers out of thin air.
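The kind of data-entry check that a plain text editor lacks might look like the following sketch. The bounds of 10 to 1000 milliseconds and the function name are hypothetical; suitable limits would depend on the engine.

```python
# Hypothetical bounds for the delay between animation frames, in milliseconds.
FRAME_DELAY_MS_RANGE = (10, 1000)


def validate_frame_delay(raw):
    """Return the delay as an int, or raise ValueError with a readable message."""
    try:
        value = int(raw)
    except ValueError:
        raise ValueError(f"frame delay must be a whole number, got {raw!r}") from None
    low, high = FRAME_DELAY_MS_RANGE
    if not low <= value <= high:
        raise ValueError(f"frame delay {value} ms is outside {low}-{high} ms")
    return value
```

A check like this catches malformed or out-of-range values, but it still cannot say whether 80 ms looks right; that judgement needs the visual feedback described above.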

A more sophisticated approach is to build a custom, visual editor, which exposes a toolbox of graphical widgets to level designers, each widget representing a game element. These widgets can be placed onto a canvas, and then the editor outputs the represented elements' data into static files, derived from its own, internal model of a game level. The bespoke editor can be coded to sanitise data entry, and the widgets' appearances can be used to provide contextual feedback, showing how the data will be treated within a game's engine.

The amount of contextual information can be improved further, by having the editor interface directly with a game's engine. By allowing an element's data to be placed into an engine's run-time memory, the designer can observe exactly how the data manifests.

Whatever form the authorship tools for a game may take, they themselves must be written. If you are able to repurpose a packaged tool as your editor, then this will save you a great deal of effort. However, I have found the existing tools available to me to be inappropriate, as they are either too general (not providing me with enough context about how my engine will interpret the data) or too specific (tightly coupled to a related engine, requiring its use).

Coming from a software engineering background, I find this odd. There are several ubiquitous, contemporary data formats on which to build our game editors. My preference is XML, which has been around for over 15 years. It allows for the expression of most object-oriented concepts, is human-readable, and has proven its reliability in a large number of applications. It would be logical to assume that these commonplace data formats are widely used to store game data.

In addition, as there are similar, atomic elements shared between most game engines (sprites, 3D models, animations etc.), it follows that the data representing these elements could be manipulated for most games with a single editor. Reusable tools should exist to author similar data, across a range of game engines, storing it in an open format.

This is not the case.

For example, I trialled dozens of editors in order to compose 2D sprite animations from a texture atlas, but found none fit for this commonplace task. Despite this, tools that could be reused across a wide range of game projects must exist; I just think that they have been created multiple times over, behind closed doors. This is frustrating, as it means that we have to replicate this work once again, for our own purposes.

Huscarlas' levels are procedurally generated; an algorithm determines the terrain and enemy placement each time a new level is started. However, there are still manually defined, atomic elements that the engine needs to be provided with. These are the building blocks for the engine to piece together at run-time. In essence, what I needed was an editor that allowed me to control which GUI widgets should be used to input each element's data (for most data, this would be a simple text entry box, with validation). Specifically, using XML terminology, I should be able to define a schema for a document, and then have an editor configure itself by parsing that schema.
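To illustrate the idea of an editor configuring itself from a schema, here is a small sketch that reads an XML Schema fragment and chooses a widget for each attribute. The schema, the widget names and the type-to-widget mapping are all hypothetical, and this is not how XAmple or any other editor is implemented.

```python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"

# Hypothetical mapping from XML Schema built-in types to editor widgets.
# The "xs:" prefix is matched literally; a real tool would resolve the prefix.
WIDGET_FOR_TYPE = {
    "xs:string": "text_box",
    "xs:integer": "number_spinner",
    "xs:boolean": "check_box",
}

# Hypothetical schema describing a single game element.
SCHEMA = """\
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="sprite">
    <xs:complexType>
      <xs:attribute name="name" type="xs:string"/>
      <xs:attribute name="frame_delay_ms" type="xs:integer"/>
      <xs:attribute name="looping" type="xs:boolean"/>
    </xs:complexType>
  </xs:element>
</xs:schema>
"""


def widgets_for_schema(schema_text):
    """Return (attribute name, widget) pairs in schema order."""
    root = ET.fromstring(schema_text)
    form = []
    for attribute in root.iter(XS + "attribute"):
        # Unknown types fall back to a plain text box.
        widget = WIDGET_FOR_TYPE.get(attribute.get("type"), "text_box")
        form.append((attribute.get("name"), widget))
    return form


form = widgets_for_schema(SCHEMA)
```

Adding a new attribute to the schema would then add a field to the form, with no change to the editor's code.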

After using extensive search-fu, and systematically working my way through xml.com's list of XML editors, I found just two existing projects that create self-generating GUIs:

XAmple (developed by Felix Golubov in Java, distributed in the public domain)
A Dynamically Generated XML Editor (developed by Marc Clifton in C#, distributed under a permissive licence)

Although neither contained GUI widgets to contextualise game engine elements, both offered a starting point, and had the potential to become the editor that I needed. Also, as I have described above, they could become an editor for the wider industry. Currently, I'm working with XAmple, as it was built with extensibility in mind, is well documented, and I have experience with Java.

Why, though, am I so focussed on storing game data in XML? Why should it be the data format of choice for the games industry?

My preference is mainly due to XML's ability to represent object-oriented data models. It is easiest to describe a game's elements as objects, listing their attributes and interactions. This approach is grounded in how we describe the world around us, in our own natural languages. XML has a grammar that we, human beings, can use to communicate easily with one another, whilst still being a useful representation for computing.

However, the considerations we have when composing our data are different to those when we are computing the data in our game engine. The data will no longer be used to communicate amongst humans, and so we can store it in a format that is better suited for computation only: optimised for traversal, access and manipulation. It doesn't often make sense to instantiate the data at run-time as objects, with all of their individual attributes held adjacent in memory. If your code processes large sets of objects, accessing only a small number of their attributes at any time, then it would be more efficient to break out similar attributes into their own data structures. In Huscarlas' code, attributes of the same type are stored in arrays. By tailoring our data structures, based on the logic that will process them, we improve the run-time performance. This is a data-driven approach.
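The layout can be sketched as one array per attribute, with the same index referring to the same object in every array. The example below uses Python's `array` module purely for illustration; the attribute names and values are hypothetical, and in a C code-base such as Huscarlas' the equivalent would be plain C arrays.

```python
from array import array

# Hypothetical enemy attributes: one array per attribute, index i is object i.
health = array("i", [100, 40, 75])
pos_x = array("f", [0.0, 2.5, 9.0])
pos_y = array("f", [1.0, 1.0, 4.5])


def apply_damage(health, amount):
    """Walk only the health array; the position arrays are never touched."""
    for i in range(len(health)):
        health[i] = max(0, health[i] - amount)


apply_damage(health, 50)
```

The loop visits a single contiguous array, rather than stepping over each object's unrelated attributes to reach the one it needs.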

When storing objects' attributes in their own, break-out data structures, each must have an index, to indicate which object they belong to. Once an object has a unique identifier, then all of its attributes can share this index wherever they are stored.

We humans tend to give our objects names that are descriptive, and easily remembered, but these are costly to use as indices. If we use strings as identifiers, they must go through a hash function, in order to reveal their underlying index values. When using arrays, it makes sense to assign a group of objects unique identifiers from a sequential range of numbers, starting from zero. This way, each of an object's attributes can be accessed directly in its respective array, using the object's numeric identifier as an index.

Manually maintaining sequential, numeric identifiers is unsustainable for a moderate to large data set. In order to facilitate a data-driven model at run-time, one of the first extensions I needed to make to my XML editor was the ability to generate unique, numeric identifiers for objects.
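A minimal sketch of such identifier generation, assuming a hypothetical document of `sprite` elements and a hypothetical `id` attribute, might number the elements in document order:

```python
import xml.etree.ElementTree as ET

# Hypothetical game data: the element and attribute names are assumptions.
DOC = """\
<sprites>
  <sprite name="huscarl_walk"/>
  <sprite name="huscarl_attack"/>
  <sprite name="raven_fly"/>
</sprites>
"""


def assign_ids(root, tag, attr="id"):
    """Number every <tag> element 0..n-1 in document order."""
    for index, element in enumerate(root.iter(tag)):
        element.set(attr, str(index))
    return root


root = assign_ids(ET.fromstring(DOC), "sprite")
ids = [(s.get("name"), s.get("id")) for s in root.iter("sprite")]
```

One consequence of numbering by position is that reordering or deleting elements changes the identifiers, so anything derived from them would need regenerating at the same time.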

However, our memorable names for objects must also be preserved. This is so that specific objects can be referenced in code, without having to know their numeric identifier. By storing the identifiers as named values, using the native syntax of the code base, objects' indices can be easily recalled by name. In the case of Huscarlas, the code-base is written in C, and named values can be stored in enumeration blocks, output to header files.
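As a sketch of that last step, the function below renders a list of object names as a C enumeration, with each name's position in the list serving as its numeric identifier. The enum name, the object names, the include guard and the trailing `_COUNT` entry are hypothetical choices, not the output format of any existing script.

```python
def enum_header(enum_name, names):
    """Render names as a C enum; position in the list is the numeric id."""
    guard = enum_name.upper() + "_H"
    lines = [f"#ifndef {guard}", f"#define {guard}", "", f"enum {enum_name} {{"]
    for index, name in enumerate(names):
        lines.append(f"    {name.upper()} = {index},")
    # A trailing count entry is handy for sizing the attribute arrays.
    lines.append(f"    {enum_name.upper()}_COUNT = {len(names)}")
    lines += ["};", "", f"#endif /* {guard} */"]
    return "\n".join(lines) + "\n"


header = enum_header("sprite_id", ["huscarl_walk", "huscarl_attack", "raven_fly"])
```

C code that includes the generated header could then index an attribute array with `HUSCARL_ATTACK`, at no more run-time cost than writing the literal 1.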

I continue to add functionality to my XAmple-based editor, and intend to distribute it for wider use once it has matured. In the meantime, I've written Perl scripts to assign numeric identifiers to XML elements, as described above, and have uploaded the code to GitHub. If you would like to implement data-driven, run-time structures, derived from static XML documents, then you can find these helper scripts here: