Threading and Concurrency

Posted in The IOs of Game Netcode on 09/04/2014 at 09:00

In the last post of The IOs of Game Netcode, I talked about a few general rules of thumb I usually follow when starting a new netcode framework. Over the next few posts I’m going to go a little deeper into the technical options available to us as netcode developers and what routes I take based on different scenarios. The first topic I’m going to start with is threading, and the resulting issue – concurrency. This post is quite long so I’ve only addressed the issues the developer should keep in mind while working with threads. The solutions to these issues will be covered in the next post 🙂

Threading

As any netcode developer will know, the first hurdle you will hit when writing netcode or working with sockets is the threading issue. Normally, you can only execute a single piece of program code at a time. Threading allows you to execute multiple pieces program code simultaneously by running it on different threads. By default, reading and writing via sockets will block current execution of program code until it is complete. This isn’t necessarily an issue with writing to a socket unless you are writing faster than the hardware can handle (network card or modem), but if you’re reading, the operation will block until there is data to be read. One of the ways around this is to check the number of available bytes to be read before reading and if there are bytes available, only read that much. However, even with using this method, the actual reading and writing of bytes will block, even if it’s just a moment, and you don’t want this in your main game or render loop. Another way around this is to use asynchronous read and write operations. These run in their own threads automatically (provided by the socket library you are using) and pass the data back via a callback when complete. They have their uses, such as web services, but for real-time game netcode they can begin to cause problems as you cannot be sure of the order of transmission and you start to encroach of race condition territory.

So, in order to effectively read and write across a network with sockets, without causing the main program to hang while it is doing so, you’ll need at least one additional thread to perform the socket operations on. The reason I say at least one additional thread is because when developing your game client, you only really need a single socket to connect to your server. However, for the server, you’ll need a thread for every connecting client to handle each of the socket operations. For those of you who are now thinking “Why not use non-blocking sockets?”, I’m aware of this and it will be covered in a future post, but for the time being I’m focusing on the standard variety of sockets as there’s a lot more to consider with using non-blocking sockets 🙂

Anyway, as soon as you start working with multiple threads, it opens up a whole new bag of worms in the form of concurrent modifications.

Concurrency

Concurrent modifications are when you are reading a value of a variable in one thread, while it is being modified in another. Another example is looping over a collection while different thread is adding or removing an element from the collection. Most languages languages allow this type of access with unpredictable results. This is due to two main reasons.

Race Conditions

The first is the race condition, you just don’t know what thread is going to access the variable first. There are three scenarios for a race condition:

Read & Read – Both threads want to read the value of a variable. It is unknown which thread reads the variable first but it doesn’t matter as it does not change. Both threads read the same value.
Read & Write – One thread reads the variable, while the other writes to it. The final value of the variable will always be what is written, but the value read by the reading thread may be that of the variable before the write, or after. This can lead to the aforementioned unpredictable behaviour and potential crashes.
Write & Write – Both threads want to write. No read operations are carried out, but that does not eliminate an unpredictable value being read later. This is because the final value of the variable is unknown. It is the value of whichever thread wrote to the variable last.

The above scenarios are very specific and only show two threads accessing a single variable. However, in reality, these threads would be doing more than just reading and writing to a variable. For example, we have a shared (global or static) float variable called currentSpeed accessed by both threads:

Thread 1 – Anti speed-hacking protection

currentSpeed = player.getVelocity().getMagnitude();
if (currentSpeed > MAX_PLAYER_SPEED) {
    player.disconnect();
}

Thread 2 – Find the fastest moving entity

Entity fastestEntity = null;
currentSpeed = 0;
for (Entity entity : entities) {
    if (entity.getVelocity().getMagnitude() > currentSpeed) {
        currentSpeed = entity.getVelocity().getMagnitude();
        fastestEntity = entity;
    }
}
return fastestEntity;

For the record, you should never share a variable between two different tasks like this, but if you did this is how it might play out. For this example we are going to assume that the player is moving at a speed of 4 and there is one other entity in the world moving at a speed of 10:

// Start with Thread 2
Entity fastestEntity = null;
currentSpeed = 0;
for (Entity entity : entities) {
    if (entity.getVelocity().getMagnitude() > currentSpeed) {
// Switch to Thread 1
currentSpeed = player.getVelocity().getMagnitude();
// Switch to Thread 2
        currentSpeed = entity.getVelocity().getMagnitude();
        fastestEntity = entity;
    }
}
// Switch to Thread 1
if (currentSpeed > MAX_PLAYER_SPEED) {
    player.disconnect();
}

In this scenario, the first time currentSpeed is assigned is after the first switch, where it gets set to 4. Before it can test the value of currentSpeed, the process switches to thread 2, where currentSpeed is set to the value of the fastest moving entity’s speed, which is 10. Then the process switches back to thread 1 to perform the test. Oh look, the player is moving at a speed of 10, they must be speed-hacking, better disconnect them!

This occurs because the threads can switch at any point in during normal processing and is always something you need to keep in mind while working with multiple threads. There are mechanisms to get around these issues, but first…

Non-Atomic Operations

The second reason concurrent access can cause unpredictable results is due to non-atomic load and store operations. This is a bit more low level and might be harder to grasp for those who aren’t familiar with CPU architecture. There are a couple of definitions when it comes to the atomicity of an operation. It can refer to a single instruction or an operation of multiple instructions. Essentially, an operation is considered atomic if it completes in a single step relative to other threads. Therefore, a non-atomic operation can also result in a race condition as described above, but for the purposes of this section, we’ll be focusing on single instructions.

When you want to run your game (or program), you need to compile it into machine code first. Every developer knows this. During the compilation process, the compiler reads our source code and optimises it internally before outputting machine code, therefore the machine code doesn’t directly reflect the logic that we’ve defined. Most developers know this. One of the optimisations compilers do is to maximise CPU register usage. General purpose CPU registers typically have a size equal to the bit-architecture of the system. Modern day systems are 64 bit architecture and have 64 bit general purpose CPU registers. If you have two 32 bit integers that have some operation performed on them, the compiler will attempt to optimise the machine code to load them both into the same 64 bit register to perform the operation more efficiently. Some developers know this.

Now, the problem lies in the scenario where you attempt to perform an operation on a data type that has a larger bit requirement than the CPU register can handle, or the register already has some active data in it. The data ends up being split into multiple machine code instructions – and this is what causes the problem. Can you remember when I mentioned that threads can switch at any point in normal processing? Well, this happens at the machine code instruction level. So, a simple variable assignment such as:

long timestamp = 1L;

Can be split into two machine code instructions, with a thread switch right in the middle.

Not many developers know this.

This is the very essence of non-atomic operations. If processing switches to another thread during a multi-instruction load or store, the race condition is the least of your worries. Depending on your operation, you’ll either end up with a torn-read or a torn-write. One thread attempts to write a 64 bit integer to a variable but only gets as far as the first 32 bit store instruction, another thread reads the full 64 bit contents of the variable, then the first thread writes the second 32 bit store instruction. What the second thread ends up reading is one-half correct data, one-half bad data and one-whole big problem.

Some of you may now be thinking “Holy crap, threads are dangerous, how the hell do programs even function without exploding into a flaming ball of random corruption!?” Well, the answer is yes, they are dangerous, but there are also certain principles you can abide by and mechanisms you can use that prevent this pseudo-random behaviour. However, these topics will be addressed in the next post 😉

Like before, if you have any questions about this post or just want to chat netcode, please comment below or fire me a tweet at @Jargon64.

Thanks for reading! 🙂