Race Conditions

A race condition occurs when the behaviour of code depends on how the operations of multiple threads happen to interleave. This is perhaps the most fundamental problem with multi-threaded programming.

When analysing or writing single-threaded code, we only have to think about the sequence of statements right in front of us; we can assume that data will not magically change between statements. However, with improperly written multi-threaded code, non-local data can change unexpectedly due to the actions of another thread.

A race condition can cause a high-level logical fault in your program or (more excitingly) it may even pierce C++'s statement-level abstraction. That is, we cannot even assume that a single C++ statement executes atomically, because it may compile to multiple assembly instructions. In short, we cannot guarantee the outcome of a statement such as foo += 1; if foo is non-local and may be accessed from multiple threads.
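To make the foo += 1; problem concrete, here is a minimal single-threaded sketch that replays one unlucky interleaving by hand. It assumes the statement compiles to a separate load, add and store, which is typical but not guaranteed. SimulatedThread and the function names are hypothetical, introduced only for illustration; no real threads are involved, so the result is reproducible.

```cpp
#include <cassert>

// Hypothetical model of what `foo += 1;` may compile to: three separate steps.
// SimulatedThread is plain data, not a real thread; the steps are driven by
// hand so that the interleaving is the same on every run.
struct SimulatedThread
{
    int reg = 0;                                  // stands in for a CPU register
    void load(const int& foo)  { reg = foo; }     // 1. read foo from memory
    void add()                 { reg += 1; }      // 2. increment the private copy
    void store(int& foo) const { foo = reg; }     // 3. write the result back
};

// Two simulated increments whose steps overlap; returns the final foo.
int interleavedIncrements()
{
    int foo = 0;
    SimulatedThread a, b;
    a.load(foo);    // A reads 0
    b.load(foo);    // B also reads 0, before A has written anything back
    a.add();
    b.add();
    a.store(foo);   // foo becomes 1
    b.store(foo);   // foo is set to 1 again: A's update is lost
    return foo;
}

// The same two increments, each running to completion before the next starts.
int sequentialIncrements()
{
    int foo = 0;
    SimulatedThread a, b;
    a.load(foo); a.add(); a.store(foo);
    b.load(foo); b.add(); b.store(foo);
    return foo;
}

// Usage: two increments should give 2, but the overlapping order gives 1.
void checkLostUpdate()
{
    assert(sequentialIncrements() == 2);
    assert(interleavedIncrements() == 1);
}
```

With real threads the scheduler picks the interleaving, so the lost update appears only some of the time, which is what makes this kind of bug hard to reproduce.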

A contrived example follows.

Listing 1. A logical race condition

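int sharedCounter = 50;

void doSomeWork();   // assumed to be defined elsewhere

void* workerThread(void*)
{
    // The test here ...
    while(sharedCounter > 0)
    {
        doSomeWork();
        --sharedCounter;   // ... and the update here are separate steps
    }
    return 0;   // a non-void function must return a value
}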

Now imagine that we start a number of threads, all executing workerThread(). If we have just one thread, doSomeWork() is going to be executed the correct number of times (whatever sharedCounter starts out at).

However, with more than one thread doSomeWork() will most likely be executed too many times. Exactly how many times depends on the number of threads spawned, computer architecture, operating system scheduling and...chance. The problem arises because we do not test and update sharedCounter as an atomic operation, so there is a period where the value of sharedCounter is incorrect. During this time other threads can pass the test when they really shouldn't have.

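The value of sharedCounter on exit tells us how many extra times doSomeWork() was called, provided no decrement is lost to the statement-level race described above. With a single thread, the final value of sharedCounter is of course 0. With multiple threads running, it will lie between 0 and -(N - 1), where N is the number of threads: at most N threads can have passed the test while the counter was still 1, and each of them then decrements it once.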
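The effect can be observed with a small harness along the following lines. This is a sketch under assumptions, not the article's own code: it uses C++11 std::thread and std::atomic rather than the thread API implied by the listing's signature, it replaces doSomeWork() with a simple call counter, and the names RaceResult and runRacyWorkers are hypothetical. Each individual read and decrement is made atomic so that only the logical race between test and update remains.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

struct RaceResult
{
    int workCalls;      // how many times the stand-in for doSomeWork() ran
    int finalCounter;   // value of the shared counter after all threads exit
};

// Runs the loop from Listing 1 on numThreads threads, starting at initial.
RaceResult runRacyWorkers(int numThreads, int initial)
{
    std::atomic<int> sharedCounter(initial);
    std::atomic<int> workCalls(0);

    auto worker = [&]() {
        while (sharedCounter.load() > 0)    // the test ...
        {
            ++workCalls;                    // stand-in for doSomeWork()
            std::this_thread::yield();      // widen the window between test and update
            --sharedCounter;                // ... and the update, as a separate step
        }
    };

    std::vector<std::thread> threads;
    for (int i = 0; i < numThreads; ++i)
        threads.emplace_back(worker);
    for (std::thread& t : threads)
        t.join();

    return RaceResult{workCalls.load(), sharedCounter.load()};
}
```

Calling runRacyWorkers(4, 50) repeatedly should give a finalCounter that is never positive and that varies from run to run, with workCalls exceeding 50 by exactly the amount that finalCounter falls below zero. On some toolchains the program must be built with thread support enabled (for example -pthread with GCC).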

Moving the update adjacent to the test will not make these two operations atomic. The window during which sharedCounter is out of date will be smaller, but the race condition remains. An illustration of this non-solution follows:

Listing 2. Still a race condition

void* workerThread(void*)
{
    while(sharedCounter > 0)
    {
        // Another thread can still run between the test above and this update
        --sharedCounter;
        doSomeWork();
    }
    return 0;
}

The solution is to use a mutex to synchronise the threads with respect to the test and update. Another way of saying this is that we need to define a critical section in which we both test and update sharedCounter. The next section introduces mutexes and solves the example race condition.
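As a preview of the idea only, the critical section might look roughly like the sketch below. It assumes C++11 std::mutex and std::lock_guard, which may differ from the mutex API the next section uses, and runLockedWorkers is a hypothetical name. The stand-in for doSomeWork() is again a call counter, kept under the same lock for simplicity.

```cpp
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

// Returns how many times the stand-in for doSomeWork() ran.
int runLockedWorkers(int numThreads, int initial)
{
    int sharedCounter = initial;    // protected by counterMutex
    int workCalls = 0;              // protected by counterMutex
    std::mutex counterMutex;

    auto worker = [&]() {
        for (;;)
        {
            {
                // Critical section: test and update happen as one unit.
                std::lock_guard<std::mutex> lock(counterMutex);
                if (sharedCounter <= 0)
                    return;
                --sharedCounter;    // claim one unit of work
                ++workCalls;        // stand-in bookkeeping for doSomeWork()
            }
            // The real doSomeWork() would run here, outside the lock.
        }
    };

    std::vector<std::thread> threads;
    for (int i = 0; i < numThreads; ++i)
        threads.emplace_back(worker);
    for (std::thread& t : threads)
        t.join();

    return workCalls;   // safe to read: all workers have been joined
}
```

Because a thread can only claim work while holding the lock, runLockedWorkers(4, 50) should report exactly 50 calls regardless of how the threads are scheduled. Doing the actual work outside the lock keeps the critical section short, so the threads can still run in parallel.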