Keeping tests focused: When one thing is broken, one test fails
This is the second article in the Effective Testing series. We will dive deep into the issue of duplicated test failures, false positive assertions and some aspects of System test transparency.
As I have explained in the Announcement, this article [hopefully] will be part of a book I am working on. You can see where this chapter fits in the “Contents” section of the Announcement.
In this chapter:
How to make and keep tests focused by applying the “DRA” principle to assertions.
Use the “setup” function to prevent a bug in “common logic” from failing multiple tests in a suite.
When the “DRA” principle has to be broken for Integration and System tests to prevent false positives.
How to help yourself debug failures in System tests by preemptively verifying dependencies.
Have you ever found yourself in a situation where you changed something in production code, ran the tests, and dozens of them failed? Knowing what happened and where to start in such a situation is challenging. Now imagine it was only one test: how much easier it would be to find the issue.
First, let’s clarify one thing: having one test fail for one underlying reason is impossible to achieve all the time. If you imagine any logic as having an arboreous (tree-like) structure, one must go over the tree’s trunk to reach its branches and leaves. So if logic in the “trunk” (the common logic) fails, all tests for the “branches” will fail too. Any control statement in your code (if, for, while, switch, etc.) produces branching, so we can’t avoid this problem, but we can minimize it.
The “setup” function
A long time ago, around 2013, I developed a test framework built around that idea. In that framework, every test is an atomic piece of potentially repeatable logic, and one piece of logic, or test, depends on another, creating a “tree” of dependencies. The aim was to reproduce complex scenarios using small building blocks with few assertions but a clear, humanly understandable intent. If one test fails, none of the tests that depend on it are executed. For example, suppose you have to test a social media system where a user has to be logged in and on the correct web page to make a post. With that framework, you would have a test for making a post, depending on a test for opening the page, depending on the login test. So each test is both a test and a setup function. It even outputs a nice diagram as a test result, showing all those branches and dependencies in vivid colors. Eh, good old times. Unfortunately, I haven’t open-sourced it, so there is no point in going into more details.
But even outside such elaborate frameworks, we have at least one layer of “setup” logic we can use. Every test framework I have worked with has a notion of “setup.” It might have a different name, but the principle is the same: run this piece of code before every test. If the setup fails, the tests are skipped, making it clear that something fundamental is broken, and you don’t have to stare down dozens of failed tests.
In short: “validate the common logic for a given test group in a setup function.”
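As a minimal sketch of what that looks like in JUnit 5 (the Catalog class and its methods are made up for illustration; the exact behavior on a setup failure varies by framework — some report the dependent tests as skipped, others as failed before their bodies run):

import org.junit.jupiter.api.BeforeEach;
import org.junit.jupiter.api.Test;

import static org.assertj.core.api.Assertions.assertThat;

// "Catalog" is a made-up class standing in for whatever common fixture the test group needs.
class CatalogSearchTest {

    private Catalog catalog;

    @BeforeEach
    void setUp() {
        // Common logic for the whole test group lives here, once.
        catalog = new Catalog();
        catalog.add("book", 12.50);
        catalog.add("pen", 1.20);
        // If this precondition breaks, every test below stops here,
        // pointing at the shared logic instead of producing a wall of
        // seemingly unrelated failures.
        assertThat(catalog.size()).isEqualTo(2);
    }

    @Test
    void findsItemByName() {
        assertThat(catalog.find("book")).isNotNull();
    }

    @Test
    void returnsNullForUnknownItem() {
        assertThat(catalog.find("laptop")).isNull();
    }
}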
I have to emphasize that this issue is exacerbated by the amount of logic in the System Under Test. Whereas for Integration and System tests this problem is hard to avoid, since we are testing the application as a whole, it is much easier to contain for Unit tests. In the latter case, try to write smaller functions and use dependency injection with mocking to minimize the logic within one unit (a function or a class). To put it simply, write clean and readable code. My “rule of thumb” is that a function does too much if it is hard to test, and therefore it has to be simplified. For Integration and System tests, we will discuss specific strategies later in this chapter.
a function does too much if it is hard to test and therefore has to be simplified
The breadth of a test
A thing that is absolutely in your control is the breadth of a test, in other words, how much one test verifies. In theory, one can test the whole application in a single test. In practice, though, that test would never survive the trial of time. The other extreme is to have one assert per test. That might be perfectly fine for minimal functions at the Unit test level but will be detrimental for Integration and System testing, as the breadth of tests at those levels is inevitably large and, in most cases, requires asserting multiple things.
Finding the right size and, more importantly, articulating it is no simple task, especially when you can’t simply use numbers to describe something. My mom makes outstanding pierogi[1]. When I moved to Canada, I really missed that dish. Even though I can buy it here, it is not the same. So I asked her for the recipe over a video call so I could make it myself. Even though she has a master’s degree in mechanical engineering and is very good with numbers, the way she described the recipe was funny and not at all usable. Yet it taught me a lesson. She said, “take that much flour,” showing her hands together, “... mix it with water, but not too much or too little. It has to be just right when you knead the dough.” At that point, I realized that the secret of her recipe would stay with her. The issue is that she has made the dish so many times that it is evident to her but almost impossible to articulate to others. And some things, like the “softness of the dough,” can only be felt, not told. I remember this story every time I try to explain the right size of a test to others, or any other problem you need personal intuition for. Let me try to explain it anyway.
So what is the “reasonable” breadth for a test? Here, we have to talk about Unit vs. Integration and System tests.
At the Unit level, you want to verify a maximum of one execution path per test. Say you are testing a function that has one “if” statement. In that case, you should have at least two tests, one for each “if” branch. Two “if” statements mean at least four tests, and three “if”s result in a minimum of eight tests. The trend is exponential, so the earlier argument for keeping functions simple gets mathematical backing. Of course, an “if” statement might not be nested or might not have an “else” clause, so it is not strictly 2 to the power of the number of “if” statements, but it might be, and you do not want to be on the wrong side of the exponential trend. Similar math applies to other control statements[2].
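To make the branch counting concrete, here is a small hypothetical example: one “if” statement, therefore at least two Unit tests, one per execution path (ShippingCalculator and its threshold are invented for illustration):

import org.junit.jupiter.api.Test;

import static org.assertj.core.api.Assertions.assertThat;

// A hypothetical function with a single "if": two execution paths, so at least two tests.
class ShippingCalculator {
    double shippingCost(double orderTotal) {
        if (orderTotal >= 100.0) {
            return 0.0;   // free shipping branch
        }
        return 9.99;      // flat-rate branch (the implicit "else")
    }
}

class ShippingCalculatorTest {
    private final ShippingCalculator underTest = new ShippingCalculator();

    @Test
    void freeShippingAboveThreshold() {
        // Covers the "if" branch.
        assertThat(underTest.shippingCost(150.0)).isEqualTo(0.0);
    }

    @Test
    void flatRateBelowThreshold() {
        // Covers the implicit "else" branch.
        assertThat(underTest.shippingCost(40.0)).isEqualTo(9.99);
    }
}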
The rule here would be: “do not assert the same thing twice.” In other words, “Don’t Repeat Yourself” (DRY). The usage of the DRY principle is a bit twisted here: in its original definition, you should avoid a redundant (repeated) declaration of the same logic; in the case of assertions, you should avoid redundant or duplicated assertion calls. So let’s call it the “DRA” principle, the “Don’t Repeat Assertion” principle.
Do not assert the same thing twice
The breadth of tests at the Integration and System level
For Integration and System tests, the story is more complex. If you have an application with a few thousand lines of code, you would have the equivalent of a few hundred nested “if” statements, and 2 to the power of 100 is a massive number of tests (1,267,650,600,228,229,401,496,703,205,376[3]). And that is if you are dealing with an unrealistically small application. If you try to test every possible code path at that level, you will end up on the wrong side of the exponential trend.
So at the System and Integration test level, one should focus on unique application interactions. Think about business requirements and user behavior, not how it is implemented. For example, suppose an application manages users, i.e., lets you create and retrieve a user. You would have tests for creation and tests for retrieval. Your “retrieve” tests would have to create a user first, but in the “retrieve” case, you would not validate the “create” logic (you would most likely put it into the “setup” function, as suggested above). Otherwise, when a “create” use case fails, the corresponding “retrieve” case will fail too, breaching the “one thing is broken, one test fails” rule.
Think about business requirements and user behavior, not how it is implemented
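Here is a rough sketch of that split, with the “create” prerequisite living in the setup and the test asserting only retrieval; UserManagementClient and the response types are hypothetical placeholders for whatever API your System Under Test exposes:

import org.junit.jupiter.api.BeforeEach;
import org.junit.jupiter.api.Test;

import static org.assertj.core.api.Assertions.assertThat;

class UserRetrievalTest {

    private UserManagementClient client;
    private UserId existingUserId;

    @BeforeEach
    void setUp() {
        client = new UserManagementClient();
        // The "create" use case is a prerequisite here, not the thing under test.
        UserCreateResponse created = client.createUser(new User("John", "Doe"));
        assertThat(created.statusCode())
                .as("Precondition failed: could not create a user")
                .isEqualTo(201);
        existingUserId = created.userId();
    }

    @Test
    void retrievesPreviouslyCreatedUser() {
        // Only the "retrieve" behavior is asserted; creation was validated once, in setUp.
        User retrieved = client.getUser(existingUserId);
        assertThat(retrieved.firstName()).isEqualTo("John");
        assertThat(retrieved.lastName()).isEqualTo("Doe");
    }
}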
When asserting the same thing is a necessity
There are, as usual, some exceptions to the rules here, namely to “DRA.” At the System and Integration level, you sometimes should assert the same thing, but in a minimal and specific way. We have already touched on the arboreous nature of logic: some parts of it have to be executed each time, in various scenarios, to reach the “leaves” of the “tree.” When we test the application as a whole, not just one Unit, the “tree” is large, and the path from the trunk to a leaf is much longer and has multiple turns. So many things might have to be done before you can perform the function you are testing. Going back to our user management example: you need to create a user to retrieve it. But what if you have to perform more than one preparatory action, with different input parameters? That would prevent you from encapsulating the logic in the “setup” function, since you can’t parameterize it. There is simply more than one setup.
Let’s extend our example. Now a user can have different permissions, and depending on the permissions, the user should be able or unable to perform specific actions. Say you are writing tests for a user posting a message, moderating a post, etc. But first you need to create a user with specific permissions. Since the permissions are different (for posting and moderation), you can’t rely on one type of user created in the setup. So you have to create a user “on the fly” within a test. The goal, or the focus, of the test is to validate the user’s action, “making a post,” but the user’s existence is a prerequisite. If we do not assert that the user is created correctly, our test of the user’s ability to post might fail. The problem is that this failure will be misleading. You expect that if a test of a user creating a post fails, it is due to a bug in the post-creation logic. But it can be because the user was not created, or was not created properly.
That has happened to me on countless occasions. And it is so painful to fall into this “rabbit hole” of debugging an issue that is not there: “The logic is correct; how can that happen? Am I that dumb? How does this system work at all? Is magic real? What is reality?” In such cases, you must assert all required states and functions that are prerequisites for the function you are testing[4].
To illustrate, let’s say we have the following test:
1 public void test_create_post_by_new_user() {
2 User user = new User("John", "Doe", new Permissions(REGULAR_USER));
3 UserCreateResponse userCreateResponse = userManagementClient.createUser(user);
4 Post post = new Post("Some amazing thing happened to me today");
5 CreatePostResponse actual = underTest.createPost(post);
6 assertThat(actual.StatusCode()).isEqualTo(HTTP_OK);
7}
It doesn’t matter what the implementation of the constructs in the example above is. The important part is that there is no assertion on the user being created successfully. I would change two things in this test: a) add an assertion that the user was created successfully, and b) extract the user creation into a method. Since “b” is going to be discussed in great detail in the next chapter, “Write the intent, not the implementation,” let’s focus on “a” for now. We simply need to add the following assertion after line 3:
assertThat(userCreateResponse.StatusCode())
        .as("Failed to create a user")
        .isEqualTo(HTTP_CREATED);
Now, if user creation fails, the test will go no further and will not confuse us with a false-positive failure at line 6.
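For completeness, this is roughly how the whole test reads with that guard assertion in place (same hypothetical types as in the listing above):

public void test_create_post_by_new_user() {
    User user = new User("John", "Doe", new Permissions(REGULAR_USER));
    UserCreateResponse userCreateResponse = userManagementClient.createUser(user);
    // Guard assertion: stop here, with a clear message, if the prerequisite failed.
    assertThat(userCreateResponse.StatusCode())
            .as("Failed to create a user")
            .isEqualTo(HTTP_CREATED);
    Post post = new Post("Some amazing thing happened to me today");
    CreatePostResponse actual = underTest.createPost(post);
    assertThat(actual.StatusCode()).isEqualTo(HTTP_OK);
}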
Note on System tests
For System tests only, I would also recommend asserting the state of data in other systems that your System Under Test relies on. Quite often, I have worked on applications that relied on other systems that were not so stable in the testing environment where System tests are usually executed. That instability often produced false-positive failures in my tests, every time prompting an investigation that ended in a frustrating “Oh, that was because of service X misbehaving again.” This activity is a time drain. But more dangerously, it erodes your trust in the test results. At some point, you will just assume that the test failed due to that unstable service, and that is going to be precisely the moment when the tests have found an actual bug that you will ignore.
So try to have explicit verification for all systems you depend on before running your tests. That could be as simple as calling an API of service X to verify it is up and running, in case your System Under Test depends on said service X. Do this, of course, in a “setup” or “before all” function, or in a particular test if it is the only one that depends on that system. It is important to make the call automatic and the failure evident to anyone who runs the tests and sees the results. Do not make it a manual step. Use your asserts with a clear message, and maybe add some info about who to contact if that situation occurs. Remember, the function of a test is to fail and, in failing, to be as informative and as helpful as possible.
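As an illustration, here is what such a preemptive check might look like with a plain HTTP call in a “before all” function; the URL and the existence of a “/health” endpoint are assumptions, so substitute whatever “are you alive?” call service X really provides:

import org.junit.jupiter.api.BeforeAll;
import org.junit.jupiter.api.Test;

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

import static org.assertj.core.api.Assertions.assertThat;

class OrderSystemTest {

    // Hypothetical health-check URL for a dependency of the System Under Test.
    private static final String SERVICE_X_HEALTH_URL = "http://service-x.test.local/health";

    @BeforeAll
    static void verifyDependenciesAreUp() throws Exception {
        HttpResponse<String> response = HttpClient.newHttpClient().send(
                HttpRequest.newBuilder(URI.create(SERVICE_X_HEALTH_URL)).GET().build(),
                HttpResponse.BodyHandlers.ofString());
        // A clear, actionable message is the whole point of this check.
        assertThat(response.statusCode())
                .as("Service X is down in the test environment. The System tests depend on it; "
                        + "contact the team that owns Service X before debugging further.")
                .isEqualTo(200);
    }

    // ... the actual System tests go below ...
}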
This kind of effort might feel redundant, or like too much work, or “those folks should get their s**t together and fix it.” But reality will always win, no matter how right you are. If you do not do it, your tests will drain your team’s time and energy, and you might abandon them altogether. That outcome is much more costly and frustrating than writing a few lines of code that will make your life easier. You can take it as far as collecting metrics on the offending service’s stability to help make a case for a budget of time to solve the stability issue. Or automatically sending an email to the manager of that service each time it is down. Automation is a powerful tool, and the opportunities are limitless.
What does that mean in practice (code)?
There was a lot of text until now but very little code to illustrate it, so let’s put our new knowledge into practice and see how we can further improve the test from the previous chapter. As a reminder, this is what we ended up with last time[5]:
@Test
void testCreate_afterAssertImprovements() {
    User user = new User(null, "a", "b", new Address("a", "b", "c", "d", "", "x"));
    UserRepository repo = Mockito.mock(UserRepository.class);
    Either<ValidationError, UserId> eitherValidationOrId = new UserManagementService(repo).createUser(user);
    UserId userId = eitherValidationOrId.get();
    Mockito.verify(repo).store(Mockito.eq(userId), Mockito.eq(user));
    assertThat(userId).isNotNull();

    user = new User(new UserId("a1"), "a", "b", new Address("a", "b", "c", "d", "", "x"));
    eitherValidationOrId = new UserManagementService(repo).createUser(null);
    assertThat(eitherValidationOrId).validationFailed()
            .validationMessageIs("User can't be null")
            .validationCodeIs("UCE0");

    eitherValidationOrId = new UserManagementService(repo).createUser(user);
    assertThat(eitherValidationOrId).validationFailed()
            .validationMessageIs("User ID should be empty")
            .validationCodeIs("UCE1");
}
We have introduced a custom assertion for Either<ValidationError, T> (if you are not familiar with generics, do not worry, it will not impact your understanding of the material) and now use it for the two validation-failure checks at the end of the test. We also used an explicit isNotNull assertion for the user ID.
Looking at this code now, we can see that the responsibility (breadth) of the test is too broad. The test validates three different things:
Create a user with no validation errors.
Create a user with a validation error because the user is null.
Create a user with a validation error because the user ID is not empty.
Refactoring here is pretty straightforward; split this test into three cases:
@Test
void testCreate() {
    User user = new User(null, "a", "b", new Address("a", "b", "c", "d", "", "x"));
    UserRepository repo = Mockito.mock(UserRepository.class);
    Either<ValidationError, UserId> eitherValidationOrId = new UserManagementService(repo).createUser(user);
    UserId userId = eitherValidationOrId.get();
    assertThat(userId).isNotNull();
    Mockito.verify(repo).store(Mockito.eq(userId), Mockito.eq(user));
}

@Test
void testCreate_validate_user_null() {
    UserRepository repo = Mockito.mock(UserRepository.class);
    Either<ValidationError, UserId> eitherValidationOrId = new UserManagementService(repo).createUser(null);
    assertThat(eitherValidationOrId).validationFailed()
            .validationMessageIs("User can't be null")
            .validationCodeIs("UCE0");
}

@Test
void testCreate_validate_userId_not_null() {
    UserRepository repo = Mockito.mock(UserRepository.class);
    User user = new User(new UserId("a1"), "a", "b", new Address("a", "b", "c", "d", "", "x"));
    Either<ValidationError, UserId> eitherValidationOrId = new UserManagementService(repo).createUser(user);
    assertThat(eitherValidationOrId).validationFailed()
            .validationMessageIs("User ID should be empty")
            .validationCodeIs("UCE1");
}
After these changes, whenever one of these tests fails, it clearly indicates what has failed. In future chapters, we will take these three tests and apply more concepts to improve them further[6].
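As a small aside, in the spirit of the “setup” function discussed earlier (and separate from the refactoring planned for future chapters), the mock repository and the service are constructed identically in all three tests, so that common logic could move into a setup method. A sketch:

import org.junit.jupiter.api.BeforeEach;
import org.junit.jupiter.api.Test;
import org.mockito.Mockito;

import static org.assertj.core.api.Assertions.assertThat;

class UserManagementServiceTest {

    private UserRepository repo;
    private UserManagementService underTest;

    @BeforeEach
    void setUp() {
        // Common construction logic shared by all three tests.
        repo = Mockito.mock(UserRepository.class);
        underTest = new UserManagementService(repo);
    }

    @Test
    void testCreate() {
        User user = new User(null, "a", "b", new Address("a", "b", "c", "d", "", "x"));
        Either<ValidationError, UserId> eitherValidationOrId = underTest.createUser(user);
        UserId userId = eitherValidationOrId.get();
        assertThat(userId).isNotNull();
        Mockito.verify(repo).store(Mockito.eq(userId), Mockito.eq(user));
    }

    // testCreate_validate_user_null and testCreate_validate_userId_not_null change the same
    // way: they drop their local "repo" and service construction and use the shared fields.
}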
Summary
In this chapter, we have learned that having multiple assertions for exactly the same thing is harmful and that we can solve this issue by applying the Don’t Repeat Yourself principle to assertions, i.e., the “DRA” (“Don’t Repeat Assertion”) principle.
We saw how using a “setup” function could prevent issues in common logic from generating duplicated failures.
We learned about the importance of asserting prerequisite state in tests with a large breadth of functionality, namely System and Integration tests, and how complexity in a unit can produce a monstrous number of Unit tests.
We also learned a simple and relatively “cheap” technique to improve the informativeness of System tests, one that can prevent hours of debugging and a loss of confidence in the tests: a few lines of code and assertions made directly against the applications on which the System Under Test depends.
And finally, we have refactored the code from the previous chapter to see some rules mentioned here in practice.
Please leave comments and let me know what you think about the format and the material itself.
[1] In Ukraine, we call them “varenyky,” but the Polish name came to the English-speaking world first, so there you go.
[2] You should not explicitly test every implementation branch but rather the requirements for a given System Under Test, regardless of the testing level (Unit, Integration, and System tests alike). The implementation has to be a “black box” to you when you write a test. You can have code in your system that is never executed (a.k.a. “dead code”), but it does not mean you have to test it if there is no business case for it. And it does not matter whether you write a test before, during, or after you write the production code. The reason I am talking about “if” statements and code paths is that they are an obvious and concrete way to explain computational complexity, instead of “functional and nonfunctional requirements’ permutations” or “cognitive and cyclomatic complexity.”
[3] Here again, this is not exactly how the number of unique code paths would be calculated. The number is there just to show the magnitude of the problem.
[4] I want to re-emphasize that if you find yourself doing this for a Unit test, it most likely means that the code you are testing is too complex and should be refactored. This rule should apply only to System and Integration testing, where the minimum breadth of a test is dictated by the scope of functionality of the application as a whole and can’t be easily changed.