Way back when I started work on GOAL! (or Kick Off 3 as it was then) I realised that I needed something a little more robust than a flat state machine.
Picture a football player a moment. If you want to simulate a football player, it is quite logical to think of states. In fact if you have been following my train of thought in the previous post on this subject, you will know that thinking in states is actually something programmers do all the time, because each line of code is a state:
StandInLineUp();
WaitUntilAnthemsFinished();
if(isTeamCaptain()) {
WalkToCentreCircle();
WaitForRefereeCoinToss();
.....
}
You can also show the above as a state diagram, when suddenly people start thinking of it as a state machine, instead of just lines of code. But there’s no difference, really.

- Code as a state diagram
The reason for this kind of split thinking has to do with the need to multitask, i.e. do many things at once. The above code, if written as straight C++ code, has high level concepts such as “Wait”. It is logically impossible for a piece of traditional C/C++ code to “Wait” without devoting the entire program to that single task. If it is running in its own thread, one of many processes, then it can ‘wait’, whatever that actually means in practice (loop? Stopped process waiting for a message?). However we do not usually use operating system threading, or multiple processor threading for this kind of simple stuff. It’s not portable. It’s usually pre-emptive (meaning that you cannot control when the CPU switches between threads, such a switch can happen at any time and that’s usually not desirable in game code). It can be costly if you have thousands of threads.
So… what we do as programmers is not encode the above state graph as straight code; instead we create a state machine that is effectively a virtual machine running inside our program that takes each of the boxes (aside from the decision represented by the diamond) and labels them as a state, represented by some code number.

- Labeled states
We then have a variable associated with the player, called something like “state”, which holds the value of the current state, so that the player ‘knows’ what he is doing at all times. According to the state, the appropriate piece of code is executed. In this example, let’s start with state ’1′. ”Stand up in line” sets the direction and speed of the player to move, one game loop at a time, to the correct position in the line up, for instance. When the player gets to the right position, it then changes state to ’2′.
“Wait until anthems finished” could simply be a test to see if the music of the anthems are complete. This could be a thread somewhere else… maybe when the anthems are finished, this thread communicates that by setting a flag. The player would be looking at the flag every game loop until it shows that the anthems are finished. When this happens, it then checks to see if the player is the captain or not, and if yes sets the state to ’3′. And so on.
It’s a far cry from simple code. It’s messy, in fact it’s a bit like coding in assembler, only worse… as you shall see shortly.
Well with the original football games, I actually did program them in assembler, and in Kick Off, Kick Off 2 and Player Manger I used precisely this kind of system. There were 23 threads (22 players, and one game control thread) and each of these had a flat state machine. There were around 55 states for players and 45 states for the control thread, with different variations for player controlled players and the goalkeeper. The state variable was used as a lookup into a table of pointers to subroutines. Pause states were done by a general pause function that could be applied at any time and would return the player back to what he was doing when complete. This last point is interesting because it is a kind of ‘hardwired’ nesting of states: a crude hierarchy. Pausing was in effect a subtask, although this really did not make the system a HFSM (Hierarchical Finite State Machine). It was more of a hack addition for a general state applied everywhere.
All of these states transitioned between each other in much the same way as a goto statement can transition among lines of code. Funny that, isn’t it? When you type goto in C or C++, you hear this kind of booming voice overhead, accompanied with a crash of thunder “Thou Shalt Not Use The Goto!”. Yet a lot of programmers don’t think twice about the fact that when you make your typical state machine, say, using switch statements and jump between states, then you are in fact using a goto within your virtual state machine!
Have you heard of spaghetti code? Well I have a different anti-pattern, which is actually the same one really: spaghetti states!
But it gets worse: without the ability to nest states (i.e. have a sub state) then you are not only coding by use of your own goto statements, but you are coding without the aid of the humble subroutine. Yes, that’s right: imagine for a moment trying to code without functions. Ludicrous to attempt it? Well… that is in effect what you are doing when you work with a flat state machine, because it is like working with a microprocessor that has no stack. Think about that, and chew on it for a while….
Hierarchical Finite State Machines (or having a stack)
A long time ago, the creators of computers figured out that just being able to execute a stream of instructions and jump around, while useful, was a bit limiting because there was no way to create any form of modularity. If you had some code and wanted it usable by other bits of code, this was tricky to achieve, because although you could jump to it, there would need to be a way of returning to the calling code.

- Subroutines: the first important abstraction of programming
Now you could solve this by storing the return address somewhere, of course. If you look at the above example, you will see a very basic behavior. The entity goes to a series of waypoints. After reaching it waypoint it pauses for a certain length of time, then scratches its nose, then pauses again and then moves to the next waypoint. Clearly breaking down the code into three subroutines (Pause, Scratch Nose, Goto Next Waypoint) is a great way to manage complexity (which is the main purpose of abstraction). To achieve this, the code to call, let’s say ‘Pause’ could be stored some place where the Pause routine should return to when completed. In this way we get a useful modularity; we only need to write one Pause routine, and we can call it anywhere and then continue with the next instruction following that call.
“OK, OK, Dino” (say’s a group of programmers with a quizzical look) “We know all this”. Yes, quite. But I really want to hammer home an important point here. This revisiting of the most basic of programming principles is relevant because it seems to often get completely forgotten when it comes to implementing AI, to the extent that we have scripting languages at one end and encoding of the entire ‘code base’ in a data structures on the other (for example, in Behavior Trees). Let’s be clear why we jump through hoops: we are creating a virtual machine, to provide the same functionality that we take for granted when we write simple, straight forward code. When we create a finite state machine in C++ (or other language), if it is a flat state machine then make no mistake about it: we are creating a virtual execution environment (similar to what you find in a microprocessor) that does not have a stack, and cannot easily support sub-states (which are in fact subroutines).
The stack is a wonderful invention of course. The problem of where to store return addresses is non trivial. If we only have the memory to remember a single return address, then we cannot call a subroutine that calls another subroutine. That requires remembering two addresses. So this is why the invented stacks: so that you could call subroutines that themselves call subroutines. The stack is implemented often in hardware on microprocessors for the main purpose of remembering return addresses.
It is also important to note that if we encode the AI algorithms in a data driven virtual machine, then all we are doing is creating a virtual machine using complex data structures that replaces compiled code as we normally know it. Does that really make sense? I mean what happened to the idea of writing code in a language? Does it make sense to ditch that concept and replace it with some magnificent, complex system that basically does everything that a compiler does, except it stores the virtual ‘code’ in a tree? The only really valid direct advantage is the ability to modify the code while the code is running. That of course is valuable, but for most projects it is not *that* relevant. Most development still takes place with the good old compile and run approach.
What I am saying, in a nutshell, is:
- Scripting languages; implemented state machines; data trees (e.g. behavior trees) and so on are solutions that replace simply writing the code in C++.
- We do this mostly because C++ does not support multi tasking.
- We engineer a virtual machine that allows us to multitask in C++.
- We then implement much of the behavior at the virtual machine state level, instead of directly in C++.
Now I am not saying that this is wrong. What I am saying is that we should be clear about why we do it. However having said that, I do feel there are some major disadvantages to typically observed solutions to this problem:
- If you use a scripting language, you now have to work in two codebases and two languages instead of one
- If you use data trees, you have to develop a lot of tools just to be able to visualize what is going on, or use a domain specific language to populate the data tree.
- It adds a lot of complexity to your project
- It can have major performance costs (I do wonder sometimes about programmers who obsess about virtual function calls, but are quite happy to have the gameplay code running in Lua, Lisp or Python)
- You need to inevitably duplicate some code between the two languages
- Debugging can often be difficult if you do not spend time developing debugging tools
- If you do this without having an architecture to manage it, the code will quickly become a spaghetti.
What I want to do is simply write everything in C++… or to get as close to this as possible… and still have my codebase manageable. This is what my PROC system aims to do.
Coroutines: The Starting Point of PROC
Around 1992, I invented for myself a concept actually previously invented in the 1960′s called coroutines. I had no idea of this of course. The internet as we know it today did not exist, and it was actually quite hard to get information on programming other than with paper books. Research was difficult, and so many of us pioneers (you can forgive me labeling myself as such I hope) had to invent things for ourselves.
The assembler I used was that which came with the AZTEC C compiler. This was a macro assembler; this meant it had a macro capability similar to that of the C pre-processor. I used this to solve a problem. It was really bugging me that if I had a sequence of states that need to be executed one after the other (for example the loop in the previous diagram) I would have to encode this with a unique state number for each stage. It made for a lot of clumsy code.

- Part of a flat state machine for handling animations
In the diagram above, you can see an example of what I mean. I am showing three states of the flat state machine for the AI of a game entity which are all concerned with playing an animation. It’s a fictitious example to demonstrate a point. Note that there are two levels of state here: the state of the CPU (it’s a flowchart and in effect where you are pointing with your finger as you trace it is where the program counter is). The second level of state is the state variable. To start playing the animation, some other code in the game will set that state to 5 to commence playing the animation.
Let’s work through it. State 5 is all well and good, but we must make sure to store the parameter (the ID of the animation to be played) somewhere. The next time that the AI tick is called, this will be forgotten otherwise. So that’s what happens first. Then we have to set the state to 6, because we do not want to do 5 again (it would corrupt the animation ID).
We then check to see if a non interruptible animation is playing. If yes, then we just return (i.e. that is the end of the AI for this entity for this game tick).
When we have the all-clear, we remember the current animation (if one is playing). We ask the animation system to start playing the new animation, and then set the state to 7, because we do not want to call 6 again (it would invalidate the save of the current animation and re-trigger the animation from the beginning again).
If the animation is not finished, we return… so we are waiting until the animation is completed. When it is completed, we restore the previous animation, and then go on to do something else (not indicated in the diagram).
Simple, eh? Well, not really. It’s a bit of a mess. If we could, we would want to to write this code something like this:
AI {
State: PlayAnimation(_anim) {
anim = _anim;
WaitUntilInterruptable();
previousAnim = getCurrentAnim();
StartAnim(anim);
WaitUntilAnimFinished();
retsoreAnim(previousAnim);
}
Let’s compare that with C code for the above flowchart:
AI() {
switch(state) {
case 5:
anim = _anim; // where are these stored?
state = 6;
case 6:
if(isAnimInterruptable()) {
previousAnim = getCurrentAnim(); // where is this stored?
startAnim(anim);
state = 7;
} else {
return;
}
case 7:
if(isAnimFinished()) {
restoreAnim(previousAnim);
} else {
return;
}
// etc
}
}
This is quite a simple example, but as you can see it’s much messier and harder to work with in C++ than our ‘ideal’ language solution. Even in assembler, where I could make jump tables rather than use switch statements, it was not much better than the C solution. The particular thing that is really painful is how simple transitions between sequential states require a named state and code to set the that state after every stage. What a pain!
Then suddenly, I came up with an idea, and something that cannot be readily done in C++ (actually there are ways by using setjmp or inline assembler… but… they only take you so far as we will see later). I created a macro called “FromNowOn”. This macro simply took the program counter address of the first instruction following it, and stored that as a jump target. Instead of the switch, the address of the code to execute for the current state was stored in memory, so as soon as you executed FromNowOn, only the code following this macro would be called on the next tick. It is in effect a virtual program counter implementation. I found out a few years ago that this idea was known as a coroutine.
To conclude this part of the series, here is a version of the above in pseudo C++ that implements a FromNowOn instruction in a macro. It is definitely an improvement, as we no longer have to create labelled states for every stage of a sequence!
AI() {
switch(state) {
case 5:
anim = _anim; // where are these stored?
FromNowOn;
if(isAnimInterruptable()) {
previousAnim = getCurrentAnim(); // where is this stored?
startAnim(anim);
} else {
return;
}
FromNowOn;
if(isAnimFinished()) {
restoreAnim(previousAnim);
} else {
return;
}
// etc
}
}