Object ownership across programming languages
There are many skills you need to acquire as a programmer, and some of them are not part of the standard software engineering curriculum. Instead you’re expected to learn them by osmosis, or by working with someone more experienced. David MacIver covers one such skill: tracking which type a value has.
Another skill you need is an understanding of object ownership in your code: knowing which part of your code owns a particular object in memory, and what its expectations are for access. Lacking this understanding you might write code that causes your program to crash or to suffer from subtle bugs. Even worse, some programming languages won’t even provide you with facilities to help you in this task.
Learning by osmosis
Here’s how I learned this skill. When I was in university I once had to implement a Red-Black Tree in C. I’d basically skipped all the lectures for the intro to C class, and still gotten a perfect grade, so as far as the school was concerned I knew how to code in C.
In reality I had no clue what I was doing. I couldn’t write a working tree implementation: my code kept segfaulting and I couldn’t keep the structure straight. Eventually I turned in a half-broken solution and squeaked by with a grade of 60 out of 100.
I spent the next 5 years writing Python code, and then got a job writing C and C++. I did make some mistakes, e.g. my first project crashed every night (you can hear that story by signing up for my newsletter), but in general I was able to write working code even though I hadn’t really written any C or C++ for years.
What changed?
I believe one of the key skills I learned was object ownership, as a result of all the concurrent Python I was writing, plus the fact that C++ has a better model than C for object ownership. Let’s see what I learned over those years.
Object ownership for memory deallocation
Consider the following C function:
char* do_something(char* input);
Someone is going to have to deallocate input, and someone is going to have to deallocate the returned result of do_something().
But who?
If two different functions try to deallocate the same allocation your program’s memory will be corrupted.
If no one deallocates the memory your program will suffer from memory leaks.
This is where object ownership comes in: you ensure each allocation has only one owner, and only that owner should deallocate it. The GNOME project’s developer documentation explains how their codebase makes this work:
Each allocation has exactly one owner; this owner may change as the program runs, by transferring ownership to another piece of code. Each variable is owned or unowned, according to whether the scope containing it is always its owner. Each function parameter and return type either transfers ownership of the values passed to it, or it doesn’t. … By statically calculating which variables are owned, memory management becomes a simple task of unconditionally freeing the owned variables before they leave their scope, and not freeing the unowned variables.
GNOME has a whole set of libraries, conventions and rules for making this happen, because the C programming language doesn’t have many built-in facilities to deal with ownership.
C++, on the other hand, has built a broad range of utilities for just this purpose.
For example, you can wrap an allocation in a shared_ptr object.
Every time the shared_ptr is copied it increments a counter, and every time one of those copies is destroyed it decrements the counter.
When the counter hits zero the wrapped allocation is deallocated.
That means you don’t need to track ownership for purposes of deallocation: the shared_ptr is the owner, and will deallocate at the right time.
This can be simplified even further by using languages like Java or Python that provide garbage collection: the language runtime will do all the work for you. You never have to track ownership for purposes of deallocating memory.
Object access rights
Even when memory allocation is handled by the language runtime, there are still reasons to think about object ownership. In particular there is the question of mutation: modifying an object’s contents. Memory deallocation is the ultimate form of mutation, but normal mutation can also break your program.
Consider the following Python program:
words = ["hello", "there", "another"]
counts = wordcount(words)
print(words)
What do you expect to be printed?
Typically you’d expect to see ["hello", "there", "another"], but there is another option.
You may also get [] printed if wordcount() was implemented as follows:
from collections import Counter

def wordcount(words):
    result = Counter()
    while words:
        # pop() removes the word from the caller's list, not from a copy
        word = words.pop()
        result[word] += 1
    return result
In this implementation wordcount() is mutating the list it is given.
Reintroducing the concept of an object owner makes this clearer: each object is owned by a scope, and that scope might not want to grant write access to the object when it passes it to a function.
Unfortunately in Python, Java and friends there is no real way for a caller to tell whether a function will mutate an input, nor whether a parameter to a function can be mutated.
So you need to learn a set of conventions and assumptions about when this will happen and when it’s safe to do so: you build a mental model of object ownership and access rights.
I suspect most Python programmers wouldn’t expect wordcount() to mutate its inputs: it violates the mental model we have for object ownership and access.
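A version that matches that mental model might look like the following minimal sketch. It only reads from the list it is given, so the caller's copy is untouched. The name `wordcount_readonly` is made up for this illustration.

```python
from collections import Counter

def wordcount_readonly(words):
    """Count words by iterating, without modifying the caller's list."""
    result = Counter()
    for word in words:
        result[word] += 1
    return result

words = ["hello", "there", "another"]
counts = wordcount_readonly(words)
print(words)   # the list still holds all three words
print(counts)
```

If you have to call a function you don't trust, passing a copy such as `list(words)` is a cheap way to keep ownership of the original.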
The concept of private attributes (explicit in Java, by convention in Python) is one way access rights are controlled, but it doesn’t solve the problem in all cases. When conventions don’t help and you’re uncertain you have to refer to API docs, or sometimes even the implementation, to know who can or might modify objects. This is similar to how C programmers deal with memory allocation.
Interestingly, C and C++ have a mechanism that can often solve this problem: const.
You can define arguments as being const, i.e. unchangeable:
std::map<std::string, int> wordcount(const std::vector<std::string> &words);
If you try to mutate the argument the compiler will complain and prevent you from doing so.
Other approaches: Rust vs. functional languages
Outside of const, the C++ features for object ownership management grew over time, as library code.
The Rust programming language, in contrast, provides object ownership as a feature within the compiler itself.
This applies both to ownership for purposes of memory allocation and to ownership for purposes of mutation.
Rust attempts to provide these features while still providing the full power of C and C++, and in particular control over memory allocation.
Where Rust code requires the programmer to make explicit decisions about object ownership, functional languages take a different approach.
In functional languages like Haskell or Clojure if you call a wordcount function you know the arguments you pass in won’t be mutated, because objects are immutable and can’t be changed.
If objects can’t be mutated it doesn’t matter who owns them: they are the same everywhere.
The need to track object ownership in your head or in code for mutation control is obviated by making objects immutable. Couple this with garbage collection and you need to spend much less time thinking about object ownership when writing purely functional code.
Summary: what you need to know
Depending on the programming language you’re coding in you need to learn different models of thinking about object ownership:
- When you’re writing functional code you don’t have to think about it much of the time.
- When you’re writing Java/Ruby/Python/JS/Go you need to think about object ownership as it applies to mutation: who is allowed to mutate an object? Will a function mutate an object when you don’t expect it? If you’re writing concurrent code this becomes much more important: conventions no longer suffice, and access needs to be explicitly controlled with locks.
- When you’re writing Rust the compiler understands a broad range of explicit annotations for object ownership, and will enforce safe interactions.
- When you’re writing C++ you can rely on const for mutation control, up to a point, and on library code for automatic memory deallocation.
- When you’re writing C you can rely on const for mutation control, up to a point, and the rest is up to you.
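The point about concurrent code in the list above can be sketched in Python. This is one possible arrangement, not the only one: a shared Counter is paired with a lock, and every thread has to hold the lock while mutating it. The names `counts`, `counts_lock` and `count_words` are invented for this example.

```python
import threading
from collections import Counter

# Shared state: write access is granted only while holding counts_lock.
counts = Counter()
counts_lock = threading.Lock()

def count_words(words):
    for word in words:
        with counts_lock:  # explicit access control instead of convention
            counts[word] += 1

threads = [
    threading.Thread(target=count_words, args=(["hello", "there"] * 1000,))
    for _ in range(4)
]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()

print(counts["hello"])  # 4 threads x 1000 occurrences each = 4000
```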
Next time you’re writing some code think about who owns each object, and what guarantees the owner expects when it passes an object to other code. Over time you’ll build a better mental model of how ownership works.
And if you’re writing in a non-functional language consider using immutable (sometimes known as “persistent”) data structures. You can get many of the benefits of a functional language by simply reducing the scope of mutation, and therefore how much you need to track object ownership.
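As a small illustration in Python, and only a sketch: a built-in tuple is immutable, though it is not a full persistent data structure. Passing one to a function that tries to mutate its input fails loudly instead of silently emptying your data. The name `wordcount_destructive` is hypothetical and mirrors the mutating implementation shown earlier.

```python
from collections import Counter

def wordcount_destructive(words):
    """Counts words by popping them off the input, mutating it."""
    result = Counter()
    while words:
        word = words.pop()
        result[word] += 1
    return result

words = ("hello", "there", "another")  # a tuple cannot be modified in place
rejected = False
try:
    wordcount_destructive(words)
except AttributeError as error:
    # tuples have no pop() method, so the attempted mutation is refused
    rejected = True
    print("mutation rejected:", error)

print(words)  # still contains all three words
```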