Confused by testing terminology?

Have you ever been confused by the term “unit testing,” or heard it used in a way you didn’t expect? Or wondered what exactly “functional testing” means? Have you come up with an excellent way to test your software, only to be disdainfully told that’s not real unit testing? I believe testing is suffering from a case of confusing terminology, and I’d like to suggest a cure.

Consider a Reddit post I recently read where a programmer asked about “unit testing a web application.” What they very clearly meant was “automated testing”: how could they automate testing their application? Since they were using Python, this usage might have been encouraged by the fact that the Python standard library has a unittest module intended for generic automated testing. One of the answers to this question used “unit testing” in a different sense: automated tests for a self-contained unit of code that interact only with in-memory objects. The answer therefore talked about details relevant to that particular kind of testing (e.g. mocking) without any regard for whether they were relevant to the use case in the original post.

Now you could argue that one or the other of these definitions is the correct one, and that our goal should be to educate programmers about the correct definition. But a similar confusion appears with many other forms of testing, suggesting a deeper problem. “Functional testing” might mean black box testing of the specification of the system, as per Wikipedia. At my day job we use the term differently: testing of interactions with external systems outside the control of our own code. “Regression testing” might mean verifying software continues to perform correctly, again as per Wikipedia. But at a previous company regression testing meant “tests that interact with the external API.”

Why is it so hard to have a consistent meaning for these terms? I believe it’s because we are often ideologically committed to testing as a magic formula for software quality. When a particular formula proves not quite relevant to our particular project our practical side kicks in and we tweak the formula until it actually does what we need. The terminology stays the same, however, even as the technique changes.

Imagine a web developer who is trying to test an HTTP-based interaction with very simple underlying logic. Their thought process might go like this:

  1. “Unit testing is very important, I must unit test this code.”
  2. “But, oh, it’s quite difficult to test each function individually… I’d have to simulate a whole web framework! Not to mention the logic is either framework logic or pretty trivial, and I really want to be testing the external HTTP interaction.”
  3. “Oh, I know, I’ll just write a test that sends an HTTP request and make assertions about the HTTP response.”
  4. “Hooray! I have unit tested my application.”

When they post about this on Reddit they are then scolded about how this is not really unit testing. But whether or not it’s unit testing is irrelevant: it’s useful testing, and that’s what really matters! How then can we change our terminology to actually allow for meaningful conversations?

We must abandon our belief that particular techniques will magically result in High Quality Code™. There are no universal criteria for code quality; it can only be judged in the context of a particular project’s goals. Those goals should be our starting point, rather than our particular favorite testing technique. Testing can then be explained as a function of those goals.

For example, imagine you are trying to implement realistic looking ocean waves for a video game. What is the goal of your testing? “My testing should ensure the waves look real.” How would you do that? Not with automated tests. You’re going to have to look at the rendered graphics, and then ask some other humans to look at it. If you’re going to name this form of testing you might call it “looks-good-to-a-human testing.”

Or consider that simple web application discussed above. Its developers might call what they’re doing “external HTTP contract testing.” It’s more cumbersome than “unit testing,” “end-to-end testing,” “automated testing,” or “acceptance testing”… but so much more informative. There might eventually also be a need to test the public API of the library code the HTTP API relies on, if it grows complex enough or is used by other applications. That testing would then be “library’s public API contract testing.” And if they told you or me about it, we would know why they were testing, and we’d have a pretty good idea of how they were doing the testing.
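To make that concrete, here is a rough sketch of what “external HTTP contract testing” might look like in Python, using only the standard library. Everything specific in it is made up for illustration: the GreetingHandler application, the /greeting path, and the JSON body are hypothetical stand-ins, not code from any real project. Note that it happily uses the unittest module even though nothing about it is a “unit test” in the narrow sense.

```python
import http.client
import json
import threading
import unittest
from http.server import BaseHTTPRequestHandler, HTTPServer


class GreetingHandler(BaseHTTPRequestHandler):
    """Hypothetical stand-in for the web application under test."""

    def do_GET(self):
        if self.path != "/greeting":
            self.send_error(404)
            return
        body = json.dumps({"greeting": "hello"}).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the test output quiet


class ExternalHTTPContractTest(unittest.TestCase):
    """Named after the goal: verify what HTTP clients actually see."""

    def setUp(self):
        # Port 0 asks the operating system for any free port.
        self.server = HTTPServer(("127.0.0.1", 0), GreetingHandler)
        self.thread = threading.Thread(target=self.server.serve_forever)
        self.thread.daemon = True
        self.thread.start()
        self.addCleanup(self.stop_server)

    def stop_server(self):
        self.server.shutdown()
        self.server.server_close()
        self.thread.join()

    def test_get_greeting_returns_json(self):
        port = self.server.server_address[1]
        conn = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
        self.addCleanup(conn.close)

        # Send a real HTTP request and assert only on the response.
        conn.request("GET", "/greeting")
        response = conn.getresponse()

        self.assertEqual(response.status, 200)
        self.assertEqual(response.getheader("Content-Type"), "application/json")
        self.assertEqual(json.loads(response.read()), {"greeting": "hello"})
```

Saved to a file, this can be run with python -m unittest. The test knows nothing about the functions inside the application; it only checks the status code, headers, and body a client would receive, which is exactly the goal the name describes.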

So next time you’re thinking or talking about testing don’t talk about “unit testing” or “end-to-end testing.” Instead, talk about your goals: what the testing is validating or verifying. Eventually you might reach the point of talking about particular testing techniques. But if you start with your goals you are much more likely both to be understood and to reach for the appropriate tools for your task.

Agree? Disagree? Send me an email with your thoughts.
