The tragic tale of the deadlocking Python queue

This is a story about how very difficult it is to build concurrent programs. It’s also a story about a bug in Python’s Queue class, a class which happens to be the easiest way to make concurrency simple in Python. This is not a happy story: this is a tragedy, a story of deadlocks and despair.

This story will take you on a veritable roller coaster of emotion and elucidation, as you:

  • Shiver at the horror that is concurrent programming!
  • Bask in the simplicity of using Queue!
  • Frown at a mysteriously freezing program!
  • Marvel as you discover how to debug deadlocks with gdb!
  • Groan as reentrancy rears its ugly head!
  • Gasp as you realize that this bug is not theoretical!
  • Weep when you read the response of Python’s maintainers!
  • Rejoice when you hear the Python maintainers’ new response!

Join me, then, as I share this tale of woe.

Concurrency is hard

Writing programs with concurrency, programs with multiple threads, is hard. Without threads code is linear: line 2 is executed after line 1, with nothing happening in between. Add in threads, and now changes can happen behind your back.

Race conditions

The following counter, for example, will become corrupted if increment() is called from multiple threads:

from threading import Thread

class Counter(object):
    def __init__(self):
        self.value = 0
    def increment(self):
        self.value += 1

c = Counter()

def go():
    for i in range(1000000):
        c.increment()
    
# Run two threads that increment the counter:
t1 = Thread(target=go)
t1.start()
t2 = Thread(target=go)
t2.start()
t1.join()
t2.join()
print(c.value)

Run the program, and:

$ python3 racecondition.py
1686797

We incremented 2,000,000 times, but that’s not what we got. The problem is that self.value += 1 actually takes three distinct steps:

  1. Getting the attribute,
  2. incrementing it,
  3. then setting the attribute.
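
If you want to see those three steps for yourself, here is a quick sketch of my own (not from the original post): disassemble increment() with the dis module. The exact opcodes vary by Python version, but the attribute load, the add, and the attribute store are all there.

import dis

class Counter(object):
    def __init__(self):
        self.value = 0
    def increment(self):
        self.value += 1

dis.dis(Counter.increment)
# Among the output you should see something like (exact opcodes
# depend on your Python version):
#   LOAD_ATTR     value    <- 1. get the attribute
#   INPLACE_ADD            <- 2. increment it
#   STORE_ATTR    value    <- 3. set the attribute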

If two threads call increment() on the same object around the same time, the following series of steps may happen:

  1. Thread 1: Get self.value, which happens to be 17.
  2. Thread 2: Get self.value, which happens to be 17.
  3. Thread 1: Increment 17 to 18.
  4. Thread 1: Set self.value to 18.
  5. Thread 2: Increment 17 to 18.
  6. Thread 2: Set self.value to 18.

An increment was lost due to a race condition.

One way to solve this is with locks:

from threading import Lock

class Counter(object):
    def __init__(self):
        self.value = 0
        self.lock = Lock()
    def increment(self):
        with self.lock:
            self.value += 1

Only one thread at a time can hold the lock, so only one increment happens at a time.

Deadlocks

Locks introduce their own set of problems. For example, you start having potential issues with deadlocks. Imagine you have two locks, L1 and L2, and one thread tries to acquire L1 followed by L2, whereas another thread tries to acquire L2 followed by L1.

  1. Thread 1: Acquire and hold L1.
  2. Thread 2: Acquire and hold L2.
  3. Thread 1: Try to acquire L2, but it’s in use, so wait.
  4. Thread 2: Try to acquire L1, but it’s in use, so wait.

The threads are now deadlocked: no execution will proceed.
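
Here is a minimal sketch of that interleaving (my own illustration; the sleep() calls are only there to make the unlucky ordering nearly certain):

import time
from threading import Lock, Thread

L1 = Lock()
L2 = Lock()

def thread_1():
    with L1:               # 1. Acquire and hold L1.
        time.sleep(0.1)    # Give the other thread time to grab L2.
        with L2:           # 3. L2 is in use, so wait... forever.
            pass

def thread_2():
    with L2:               # 2. Acquire and hold L2.
        time.sleep(0.1)
        with L1:           # 4. L1 is in use, so wait... forever.
            pass

# Starting both threads hangs the program almost every time:
Thread(target=thread_1).start()
Thread(target=thread_2).start()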

Queues make concurrency simpler

One way to make concurrency simpler is by using queues, and trying to have no other shared data structures. If threads can only send messages to other threads using queues, and threads never mutate data structures shared with other threads, the result is code that is much closer to single-threaded code. Each function just runs one line at a time, and you don’t need to worry about some other thread interrupting you.

For example, we can have a single thread whose job it is to manage a collection of counters:

from collections import defaultdict
from threading import Thread
from queue import Queue

class Counter(object):
    def __init__(self):
        self.value = 0
    def increment(self):
        self.value += 1


counter_queue = Queue()


def counters_thread():
    counters = defaultdict(Counter)
    while True:
        # Get the next command out of the queue:
        command, name = counter_queue.get()
        if command == "increment":
            counters[name].increment()

# Start a new thread:
Thread(target=counters_thread).start()

Now other threads can safely increment a named counter by doing:

counter_queue.put(("increment", "shared_counter_1"))

A buggy program

Unfortunately, queues have some broken edge cases. Consider the following program, a program which involves no threads at all:

from queue import Queue

q = Queue()


class Circular(object):
    def __init__(self):
        self.circular = self

    def __del__(self):
        print("Adding to queue in GC")
        q.put(1)


for i in range(1000000000):
    print("iteration", i)
    # Create an object that will be garbage collected
    # asynchronously, and therefore have its __del__
    # method called later:
    Circular()
    print("Adding to queue regularly")
    q.put(2)

What I’m doing here is a little trickery involving a circular reference, in order to add an item to the queue during garbage collection.

By default CPython (the default Python VM) uses reference counting to garbage collect objects: each object keeps a count of references to it, incremented when a new reference is created and decremented when one is removed. When the reference count hits zero the object is removed from memory and __del__ is called on it.

However, an object with a reference to itself—like the Circular class above—will always have a reference count of at least 1. So Python also runs a garbage collection pass every once in a while that catches these objects. By using a circular reference we are causing Circular.__del__ to be called asynchronously (eventually), rather than immediately.
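
To make that “eventually” concrete, here is a small demonstration of my own (not part of the buggy program above). Deleting the only external reference to a self-referential object does not run __del__ immediately; it only runs once the cycle collector gets to it. In the buggy program the collector is triggered automatically by allocations; here I force it with gc.collect():

import gc

class Circular(object):
    def __init__(self):
        self.circular = self
    def __del__(self):
        print("__del__ called")

obj = Circular()
del obj           # Nothing is printed: the self-reference keeps it alive.
print("running the cycle collector...")
gc.collect()      # Now the collector finds the cycle and __del__ runs.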

Let’s run the program:

$ python3 bug.py 
iteration 0
Adding to queue regularly
Adding to queue in GC

That’s it: the program continues to run, but prints out nothing more. There are no further iterations, no progress.

What’s going on?

Debugging a deadlock with gdb

Modern versions of the gdb debugger have some neat Python-specific features, including the ability to print out a Python traceback. Setup is a little annoying (see here and here, and maybe do a bit of googling), but once you have it set up it’s extremely useful.

Let’s see what gdb tells us about this process. First we attach to the running process, and then use the bt command to see the C backtrace:

$ ps x | grep bug.py
28920 pts/4    S+     0:00 python3 bug.py
$ gdb --pid 28920
...
(gdb) bt
#0  0x00007f756c6d0946 in futex_abstimed_wait_cancelable (private=0, abstime=0x0, expected=0, futex_word=0x6464e96e00) at ../sysdeps/unix/sysv/linux/futex-internal.h:205
#1  do_futex_wait (sem=sem@entry=0x6464e96e00, abstime=0x0) at sem_waitcommon.c:111
#2  0x00007f756c6d09f4 in __new_sem_wait_slow (sem=0x6464e96e00, abstime=0x0) at sem_waitcommon.c:181
#3  0x00007f756c6d0a9a in __new_sem_wait (sem=<optimized out>) at sem_wait.c:29
#4  0x00007f756ca7cbd5 in PyThread_acquire_lock_timed () at /usr/src/debug/Python-3.5.3/Python/thread_pthread.h:352
...

Looks like the process is waiting for a lock. I wonder why?

Next, we take a look at the Python backtrace:

(gdb) py-bt
Traceback (most recent call first):
  <built-in method __enter__ of _thread.lock object at remote 0x7f756cef36e8>
  File "/usr/lib64/python3.5/threading.py", line 238, in __enter__
    return self._lock.__enter__()
  File "/usr/lib64/python3.5/queue.py", line 126, in put
    with self.not_full:
  File "bug.py", line 12, in __del__
    q.put(1)
  Garbage-collecting
  File "/usr/lib64/python3.5/threading.py", line 345, in notify
    waiters_to_notify = _deque(_islice(all_waiters, n))
  File "/usr/lib64/python3.5/queue.py", line 145, in put
    self.not_empty.notify()
  File "bug.py", line 21, in <module>
    q.put(2)

Do you see what’s going on?

Reentrancy!

Remember when I said that, lacking concurrency, code just runs one line at a time? That was a lie.

Garbage collection can interrupt Python functions at any point, and run arbitrary other Python code: __del__ methods and weakref callbacks. So can signal handlers, which happen e.g. when you hit Ctrl-C (your process gets the SIGINT signal) or a subprocess dies (your process gets the SIGCHLD signal).

In this case:

  1. The program was calling q.put(2).
  2. This involves acquiring a lock.
  3. Half-way through the function call, garbage collection happens.
  4. Garbage collection calls Circular.__del__.
  5. Circular.__del__ calls q.put(1).
  6. q.put(1) tries to acquire the lock… but the lock is already held, so it waits.

Now q.put(2) is stuck waiting for garbage collection to finish, and garbage collection can’t finish until q.put(2) releases the lock.

The program is deadlocked.
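
The same reentrancy hazard applies to signal handlers. Here is a hypothetical sketch of my own, not from the original program: if Ctrl-C arrives while the main thread is partway through q.put(), the handler’s own q.put() can try to re-acquire the lock the main thread already holds.

import signal
from queue import Queue

q = Queue()

def on_sigint(signum, frame):
    # Signal handlers run in the main thread, between bytecode
    # instructions -- possibly in the middle of another q.put().
    q.put("interrupted")

signal.signal(signal.SIGINT, on_sigint)

while True:
    # If SIGINT lands while this put() holds the queue's internal lock,
    # the handler's put() blocks on that same lock: deadlock.
    q.put("working")
    q.get()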

Why this is a real bug…

The above scenario may seem a little far-fetched, but it has been encountered by multiple people in the real world. A common cause is logging.

If you’re writing logs to disk you have to worry about the disk write blocking, i.e. taking a long time. This is particularly the case when log writes are followed by syncing-to-disk, which is often done to ensure logs aren’t lost in a crash.

A common pattern is to create log messages in your application thread or threads, and do the actual writing to disk in a different thread. The easiest way to communicate the messages is, of course, a queue.Queue.

This use case is in fact directly supported by the Python standard library:

from queue import Queue
import logging
from logging.handlers import QueueListener, QueueHandler

# Write out queued logs to a file:
_log_queue = Queue()
QueueListener(
    _log_queue, logging.FileHandler("out.log")).start()

# Push all logs into the queue:
logging.getLogger().addHandler(QueueHandler(_log_queue))

Given this common setup, all you need to do to trigger the bug is to log a message in __del__, a weakref callback, or a signal handler. This happens in real code. For example, if you don’t explicitly close a file, Python will warn you about it inside file.__del__, and Python also has a standard API for routing warnings to the logging system.
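
For concreteness, here is a hedged sketch of that warnings-to-logging path, layered on the QueueHandler setup above (the example is mine, and assumes warnings are enabled, e.g. running with python -W default):

import logging

# logging.captureWarnings(True) is the standard API that routes warnings
# (including the ResourceWarning from an unclosed file's finalizer)
# through the logging system -- and therefore through the QueueHandler:
logging.captureWarnings(True)

def leak_a_file():
    # Never closed; when the file object is finalized it emits a
    # ResourceWarning (visible with warnings enabled), which is now
    # turned into a log message.
    open(__file__)

leak_a_file()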

It’s not just logging, though: the bug was also encountered by the SQLAlchemy ORM, among others.

…and why Python maintainers haven’t fixed it

This bug was originally reported in 2012, and in 2016 it was closed as “wont fix” because it’s a “difficult problem”.

I feel this is a cop-out. If you’re using an extremely common logging pattern, where writes happen in a different thread, a logging pattern explicitly supported by the Python standard library… your program might deadlock. In particular, it will deadlock if any of the libraries you’re using writes a log message in __del__.

This can happen just by using standard Python APIs like files and warning→logging routing. This happened to one of the users of my Crochet library, due to some logging in __del__ by the Twisted framework. I had to implement my own queuing system to ensure users weren’t impacted by this problem. If I can fix the problem, so can the Python maintainers. For example, Queue.get and Queue.put could be atomic operations (which can be done in CPython by rewriting them in C).

Now, you could argue that __del__ shouldn’t do anything: it should schedule stuff that is run outside it. But scheduling from reentrant code is tricky, and in fact not that different from mutating a shared data structure from multiple threads. If only there was a queue of some sort that we could call from __del__… but there isn’t, because of this bug.

… oh wait, they have fixed it! Thank you!

Update: after I wrote this blog post the Python dev team reopened the bug, and then fixed it. Python 3.7 has a new queue class! Thanks python-dev!

Some takeaways

  1. Concurrency is hard to deal with, but queue.Queue helps.
  2. Reentrancy is hard to deal with, and Python helps you a lot less.
  3. If you’re using queue.Queue on Python, beware of interacting with the queue in __del__, weakref callbacks, or signal handlers.
  4. In Python 3.7, you can use SimpleQueue, apparently.
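
If you’re curious what the fix looks like in practice, here is a minimal sketch of my own: the buggy program from earlier with queue.Queue swapped for queue.SimpleQueue, whose put() is documented as reentrant and therefore safe to call from __del__, weakref callbacks, and signal handlers.

from queue import SimpleQueue  # Python 3.7+

q = SimpleQueue()

class Circular(object):
    def __init__(self):
        self.circular = self
    def __del__(self):
        q.put(1)   # Safe: SimpleQueue.put() is reentrant.

for i in range(100000):
    Circular()
    q.put(2)       # No deadlock, even when GC fires mid-put.
print("finished without deadlocking")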

And by the way, if you enjoyed reading this and would like to hear about all the many ways I’ve screwed up my own software, sign up for my Software Clown newsletter. Every week I share one of my mistakes and how you can avoid it.

Thanks to Maciej Fijalkowski for suggesting actually demonstrating the race condition, and pointing out that __del__ probably really shouldn’t do anything. Thanks to Ann Yanich for pointing out a typo in the code. Thanks to the Python-dev team for fixing the problem after I wrote this.

