The tragic tale of the deadlocking Python queue
This is a story about how very difficult it is to build concurrent programs.
It’s also a story about a bug in Python’s
Queue class, a class which happens to be the easiest way to make concurrency simple in Python.
This is not a happy story: this is a tragedy, a story of deadlocks and despair.
This story will take you on a veritable roller coaster of emotion and elucidation, as you:
- Shiver at the horror that is concurrent programming!
- Bask in the simplicity of using
- Frown at a mysteriously freezing program!
- Marvel as you discover how to debug deadlocks with
- Groan as reentrancy rears its ugly head!
- Gasp as you realize that this bug is not theoretical!
- Weep when you read the response of Python’s maintainers!
Join me, then, as I share this tale of woe.
Concurrency is hard
Writing programs with concurrency, programs with multiple threads, is hard. Without threads code is linear: line 2 is executed after line 1, with nothing happening in between. Add in threads, and now changes can happen behind your back.
The following counter, for example, will become corrupted if
increment() is called from multiple threads:
from threading import Thread class Counter(object): def __init__(self): self.value = 0 def increment(self): self.value += 1 c = Counter() def go(): for i in range(1000000): c.increment() # Run two threads that increment the counter: t1 = Thread(target=go) t1.start() t2 = Thread(target=go) t2.start() t1.join() t2.join() print(c.value)
Run the program, and:
$ python3 racecondition.py 1686797
We incremented 2,000,000 times, but that’s not what we got.
The problem is that
self.value += 1 actually takes three distinct steps:
- Getting the attribute,
- incrementing it,
- then setting the attribute.
If two threads call
increment() on the same object around the same time, the following series steps may happen:
- Thread 1: Get
self.value, which happens to be 17.
- Thread 2: Get
self.value, which happens to be 17.
- Thread 1: Increment 17 to 18.
- Thread 1: Set
- Thread 2: Increment 17 to 18.
- Thread 1: Set
An increment was lost due to a race condition.
One way to solve this is with locks:
from threading import Lock class Counter(object): def __init__(self): self.value = 0 self.lock = Lock() def increment(self): with self.lock: self.value += 1
Only one thread at a time can hold the lock, so only one increment happens at a time.
Locks introduce their own set of problems. For example, you start having potential issues with deadlocks. Imagine you have two locks, L1 and L2, and one thread tries to acquire L1 followed by L2, whereas another thread tries to acquire L2 followed by L1.
- Thread 1: Acquire and hold L1.
- Thread 2: Acquire and hold L2.
- Thread 1: Try to acquire L2, but it’s in use, so wait.
- Thread 2: Try to acquire L1, but it’s in use, so wait.
The threads are now deadlocked: no execution will proceed.
Queues make concurrency simpler
One way to make concurrency simpler is by using queues, and trying to have no other shared data structures. If threads can only send messages to other threads using queues, and threads never mutate data structures shared with other threads, the result is code that is much closer to single-threaded code. Each function just runs one line at a time, and you don’t need to worry about some other thread interrupting you.
For example, we can have a single thread whose job it is to manage a collection of counters:
from collections import defaultdict from threading import Thread from queue import Queue class Counter(object): def __init__(self): self.value = 0 def increment(self): self.value += 1 counter_queue = Queue() def counters_thread(): counters = defaultdict(Counter) while True: # Get next command out the queue: command, name = counter_queue.get() if command == "increment": counters[name].increment() # Start a new thread: Thread(target=counters_thread).start()
Now other threads can safely increment a named counter by doing:
A buggy program
Unfortunately, queues have some broken edge cases. Consider the following program, a program which involves no threads at all:
from queue import Queue q = Queue() class Circular(object): def __init__(self): self.circular = self def __del__(self): print("Adding to queue in GC") q.put(1) for i in range(1000000000): print("iteration", i) # Create an object that will be garbage collected # asynchronously, and therefore have its __del__ # method called later: Circular() print("Adding to queue regularly") q.put(2)
What I’m doing here is a little trickery involving a circular reference, in order to add an item to the queue during garbage collection.
By default CPython (the default Python VM) uses reference counting to garbage collect objects.
When an object is created the count is incremented, when a reference is removed the count is decremented.
When the reference count hits zero the object is removed from memory and
__del__ is called on it.
However, an object with a reference to itself—like the
Circular class above—will always have a reference count of at least 1.
So Python also runs a garbage collection pass every once in a while that catches these objects.
By using a circular reference we are causing
Circular.__del__ to be called asynchronously (eventually), rather than immediately.
Let’s run the program:
$ python3 bug.py iteration 0 Adding to queue regularly Adding to queue in GC
That’s it: the program continues to run, but prints out nothing more. There are no further iterations, no progress.
What’s going on?
Debugging a deadlock with
Modern versions of the
gdb debugger have some neat Python-specific features, including ability to print out a Python traceback.
Setup is a little annoying, see here and here and maybe do a bit of googling, but once you have it setup it’s extremely useful.
Let’s see what
gdb tells us about this process.
First we attach to the running process, and then use the
bt command to see the C backtrace:
$ ps x | grep bug.py 28920 pts/4 S+ 0:00 python3 bug.py $ gdb --pid 28920 ... (gdb) bt #0 0x00007f756c6d0946 in futex_abstimed_wait_cancelable (private=0, abstime=0x0, expected=0, futex_word=0x6464e96e00) at ../sysdeps/unix/sysv/linux/futex-internal.h:205 #1 do_futex_wait (sem=sem@entry=0x6464e96e00, abstime=0x0) at sem_waitcommon.c:111 #2 0x00007f756c6d09f4 in __new_sem_wait_slow (sem=0x6464e96e00, abstime=0x0) at sem_waitcommon.c:181 #3 0x00007f756c6d0a9a in __new_sem_wait (sem=<optimized out>) at sem_wait.c:29 #4 0x00007f756ca7cbd5 in PyThread_acquire_lock_timed () at /usr/src/debug/Python-3.5.3/Python/thread_pthread.h:352 ...
Looks like the process is waiting for a lock. I wonder why?
Next, we take a look at the Python backtrace:
(gdb) py-bt Traceback (most recent call first): <built-in method __enter__ of _thread.lock object at remote 0x7f756cef36e8> File "/usr/lib64/python3.5/threading.py", line 238, in __enter__ return self._lock.__enter__() File "/usr/lib64/python3.5/queue.py", line 126, in put with self.not_full: File "bug.py", line 12, in __del__ q.put(1) Garbage-collecting File "/usr/lib64/python3.5/threading.py", line 345, in notify waiters_to_notify = _deque(_islice(all_waiters, n)) File "/usr/lib64/python3.5/queue.py", line 145, in put self.not_empty.notify() File "bug.py", line 21, in <module> q.put(2)
Do you see what’s going on?
Remember when I said that, lacking concurrency, code just runs one line at a time? That was a lie.
Garbage collection can interrupt Python functions at any point, and run arbitrary other Python code:
__del__ methods and weakref callbacks.
So can signal handlers, which happen e.g. when you hit Ctrl-C (your process gets the SIGINT signal) or a subprocess dies (your process gets the SIGCHLD signal).
In this case:
- The program was calling
- This involves acquiring a lock.
- Half-way through the function call, garbage collection happens.
- Garbage collection calls
q.put(1)trys to acquire the lock… but the lock is already held, so it waits.
q.put(2) is stuck waiting for garbage collection to finish, and garbage collection can’t finish until
q.put(2) releases the lock.
The program is deadlocked.
Why this is a real bug…
The above scenario may seem a little far-fetched, but it has been encountered by multiple people in the real world. A common cause is logging.
If you’re writing logs to disk you have to worry about the disk write blocking, i.e. taking a long time. This is particularly the case when log writes are followed by syncing-to-disk, which is often done to ensure logs aren’t lost in a crash.
A common pattern is to create log messages in your application thread or threads, and do the actual writing to disk in a different thread.
The easiest way to communicate the messages is, of course, a
This use case is in fact directly supported by the Python standard library:
from queue import Queue import logging from logging.handlers import QueueListener, QueueHandler # Write out queued logs to a file: _log_queue = Queue() QueueListener( _log_queue, logging.FileHandler("out.log")).start() # Push all logs into the queue: logging.getLogger().addHandler(QueueHandler(_log_queue))
Given this common setup, all you need to do to trigger the bug is to log a message in
__del__, a weakref callback, or a signal handler.
This happens in real code.
For example, if you don’t explicitly close a file, Python will warn you about it inside
file.__del__, and Python also has a standard API for routing warnings to the logging system.
It’s not just logging, though: the bug was also encountered among others by the SQLAlchemy ORM.
…and why Python maintainers haven’t fixed it
I feel this is a cop-out.
If you’re using an extremely common logging pattern, where writes happen in a different thread, a logging pattern explicitly supported by the Python standard library… your program might deadlock.
In particular, it will deadlock if any of the libraries you’re using writes a log message in
This can happen just by using standard Python APIs like files and warning→logging routing.
This happened to one of the users of my Crochet library, due to some logging in
__del__ by the Twisted framework.
I had to implement my own queuing system to ensure users weren’t impacted by this problem.
If I can fix the problem, so can the Python maintainers.
Queue.put could be atomic operations (which can be done in CPython by rewriting them in C).
Now, you could argue that
__del__ shouldn’t do anything: it should schedule stuff that is run outside it.
But scheduling from reentrant code is tricky, and in fact not that different from mutating a shared data structure from multiple threads.
If only there was a queue of some sort that we could call from
__del__… but there isn’t, because of this bug.
- Concurrency is hard to deal with, but
- Reentrancy is hard to deal with, and Python helps you a lot less.
- If you’re using
queue.Queueon Python, beware of interacting with the queue in
__del__, weakref callbacks, or signal handlers.
And by the way, if you enjoyed reading this and would like to hear about all the many ways I’ve screwed up my own software, sign up for my Software Clown newsletter. Every week I share one of my mistakes and how you can avoid it.
Update: Thanks to Maciej Fijalkowski for suggesting actually demonstrating the race condition, and pointing out that
__del__ probably really shouldn’t do anything. Thanks to Ann Yanich for pointing out a typo in the code.