Testing software
Pretty sure the first time I learned anything legitimate about testing software was in my senior-year ‘software engineering’ class.
It was 90% worthless shit like estimating function points to calculate project length, but I did learn a handful of interesting nuggets. One of those was the importance of having a good test suite. I'll never forget being asked about test inputs during our presentation and realizing we hadn't accounted for a lot of edge cases.
But there are many different kinds of testing, effort levels, and project lifecycle states to account for. It’s tough to strike the right balance of investment in testing against the return you get from that testing.
Unit tests
I first encountered unit testing during a training session on test-driven development at my first employer. It was mostly worthless. I strongly suggest you quit your job and start giving those trainings; there's lots of money in them.
One of the key things to remember is that code is read many, many more times than it's written, and it's likely to be modified by someone who did not originally write it. The primary benefit of unit tests is to make that maintainer comfortable that their changes won't break some behavior they didn't know about.
Of course, you have to balance the size and maturity of your project against the costs imposed by adding these tests. If you’re going to be the sole maintainer of a project, don’t invest a lot in this. If your project is young and your codebase is thrashing, similarly avoid significant unit testing.
In my mind, unit tests mostly fall into three categories.
Trivial tests
All modules in your project should be covered by a trivial unit test. Even if it doesn't assert anything meaningful, just making sure things wire together properly is a huge benefit. These tests are easy to write and maintain; in exchange, they only verify that you haven't made the dumbest possible mistakes.
In my experience that's a low cost for huge value, particularly in untyped languages.
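In Go terms, that kind of test is just a few lines. Here's a sketch; New, Config, and the package name are hypothetical stand-ins for whatever your module actually exposes:

```go
package server

import "testing"

// Trivial wiring test: construct the thing with a throwaway config and make
// sure nothing blows up. New and Config are hypothetical stand-ins for your
// module's real entry points.
func TestNewServerWiresUp(t *testing.T) {
	srv, err := New(Config{ListenAddr: "127.0.0.1:0"})
	if err != nil {
		t.Fatalf("constructing server: %v", err)
	}
	if srv == nil {
		t.Fatal("expected a non-nil server")
	}
}
```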
Unit test coverage
It's not uncommon for teams to mandate XX% code coverage. This forces devs to write a lot of stupid tests just to make the number go up. These tests are a huge waste of time, particularly in languages with verbose error handling.
Tests like this are unnecessary busywork at development time and unnecessary maintenance over the long run. If your coverage is over 60%, invest in better functional unit tests instead.
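To make that concrete, here's the kind of test a coverage mandate tends to produce in Go. ParseTimeout is a made-up helper (shown alongside its test for brevity); the test pokes the error branch so the line counts as covered, and proves nothing you couldn't see by reading the function:

```go
package config

import (
	"fmt"
	"testing"
	"time"
)

// ParseTimeout is a hypothetical one-liner with the usual Go error plumbing.
func ParseTimeout(s string) (time.Duration, error) {
	d, err := time.ParseDuration(s)
	if err != nil {
		return 0, fmt.Errorf("invalid timeout %q: %w", s, err)
	}
	return d, nil
}

// Coverage-chasing in action: this exists only to turn the error branch green.
func TestParseTimeoutBadInput(t *testing.T) {
	if _, err := ParseTimeout("not-a-duration"); err == nil {
		t.Fatal("expected an error")
	}
}
```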
Functional unit tests
Perhaps I've been corrupted by the golang mindset, but I'm a huge fan of:
Never using stubs
Minimal mocking
Rare fakes
Almost always using real dependencies, if it’s your code
The exception is to cut at the external network boundary
It used to be very common to hear people say ‘mock all dependencies, test in isolation’. I've come to believe (and I think this view is fairly mainstream now) that this makes for bad tests. It's always better to test using real code when you can.
Of course, this requires your code to be structured in a way that makes this possible. But I find that code that is easy to test generally ends up being better code.
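Here's a rough sketch of the shape I mean, with a made-up Ledger and Store. The test wires the real Store into the Ledger and checks observable behavior, rather than mocking the Store and asserting which of its methods got called:

```go
package ledger

import "testing"

// Store is your own code, so the test below uses the real thing.
type Store struct{ balances map[string]int }

func NewStore() *Store                          { return &Store{balances: map[string]int{}} }
func (s *Store) Add(account string, amount int) { s.balances[account] += amount }
func (s *Store) Balance(account string) int     { return s.balances[account] }

// Ledger depends on Store directly; no interface, no mock.
type Ledger struct{ store *Store }

func NewLedger(store *Store) *Ledger { return &Ledger{store: store} }

func (l *Ledger) Transfer(from, to string, amount int) {
	l.store.Add(from, -amount)
	l.store.Add(to, amount)
}

// Functional test: real dependencies, assertions on behavior you can observe.
func TestTransferMovesMoney(t *testing.T) {
	store := NewStore()
	store.Add("alice", 100)

	NewLedger(store).Transfer("alice", "bob", 40)

	if got := store.Balance("alice"); got != 60 {
		t.Fatalf("alice balance = %d, want 60", got)
	}
	if got := store.Balance("bob"); got != 40 {
		t.Fatalf("bob balance = %d, want 40", got)
	}
}
```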
The network boundary
The rare exception to this is the network boundary, but even then you can sometimes use real code. For example, if you're testing a gRPC server, don't test its methods by calling them directly; stand up a throwaway server instance and call it on localhost using the generated client code.
But if you’re calling some truly external dependency (e.g. a web3 endpoint), mocking it is fine.
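Here's a sketch of the localhost pattern, using the stock gRPC health service as a stand-in for your own generated service (and assuming a recent grpc-go; on older versions grpc.Dial plays the same role as grpc.NewClient). The server listens on a real port and the test talks to it through the generated client, never touching the handler directly:

```go
package server_test

import (
	"context"
	"net"
	"testing"

	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials/insecure"
	"google.golang.org/grpc/health"
	healthpb "google.golang.org/grpc/health/grpc_health_v1"
)

func TestHealthOverLocalhost(t *testing.T) {
	// Real listener on an ephemeral localhost port.
	lis, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		t.Fatalf("listen: %v", err)
	}

	// Real gRPC server with a registered service implementation.
	srv := grpc.NewServer()
	healthpb.RegisterHealthServer(srv, health.NewServer())
	go srv.Serve(lis)
	defer srv.Stop()

	// Generated client code pointed at the local server.
	conn, err := grpc.NewClient(lis.Addr().String(),
		grpc.WithTransportCredentials(insecure.NewCredentials()))
	if err != nil {
		t.Fatalf("dial: %v", err)
	}
	defer conn.Close()

	resp, err := healthpb.NewHealthClient(conn).Check(
		context.Background(), &healthpb.HealthCheckRequest{})
	if err != nil {
		t.Fatalf("health check: %v", err)
	}
	if resp.GetStatus() != healthpb.HealthCheckResponse_SERVING {
		t.Fatalf("status = %v, want SERVING", resp.GetStatus())
	}
}
```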
Integration tests
Unit tests give you some confidence while making changes, but in my mind, integration tests are what really make you feel comfortable about your code.
A good integration test suite should test your code in a realistic environment, cover all your critical user journeys thoroughly, and at least touch on every major feature you support. When adding a new feature, bake the cost of writing an integration test for it into the time estimate.
Unlike unit tests, an integration test is a good idea even in the early stages of a project. Having to stop and manually check every function of your application before pushing it live is a nightmare. Additionally, some use cases are just difficult to exercise by hand; scripting those is a huge win.
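As a sketch, scripting one of those journeys doesn't have to be fancy. The SERVICE_URL variable and the /orders endpoints here are hypothetical; swap in whatever your app actually exposes:

```go
package integration_test

import (
	"bytes"
	"encoding/json"
	"net/http"
	"os"
	"testing"
)

// One user journey, end to end, against a running instance of the service.
func TestCreateAndFetchOrder(t *testing.T) {
	base := os.Getenv("SERVICE_URL") // e.g. http://localhost:8080
	if base == "" {
		t.Skip("SERVICE_URL not set; skipping integration test")
	}

	// Step 1: create an order.
	body, _ := json.Marshal(map[string]any{"item": "widget", "qty": 2})
	resp, err := http.Post(base+"/orders", "application/json", bytes.NewReader(body))
	if err != nil {
		t.Fatalf("create order: %v", err)
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusCreated {
		t.Fatalf("create order: status %d, want 201", resp.StatusCode)
	}
	var created struct {
		ID string `json:"id"`
	}
	if err := json.NewDecoder(resp.Body).Decode(&created); err != nil {
		t.Fatalf("decode create response: %v", err)
	}

	// Step 2: read it back and make sure it's really there.
	got, err := http.Get(base + "/orders/" + created.ID)
	if err != nil {
		t.Fatalf("fetch order: %v", err)
	}
	defer got.Body.Close()
	if got.StatusCode != http.StatusOK {
		t.Fatalf("fetch order: status %d, want 200", got.StatusCode)
	}
}
```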
You can read the description of the integration tests I had for my Joepegs mint contest bot to get an idea of how useful this can be. It saved me infinite amounts of time vs doing similar tests manually.
Release validation testing
This is really just a subset of integration testing, but I've been devops-pilled HARD on release validation testing. If you're running a serious project without a release validation suite, you should make some time to review devops principles and think about adopting these practices.
A release validation suite is a series of automated integration tests that you run to validate a software artifact before promoting it to the next level in your release pipeline. If you’re a hobbyist developer you probably only have development (maybe even only local development) and production, but it’s very common to distinguish multiple stages with different levels of stability, such as:
autopush: an hourly build of your software
staging/qa/daily: promoted from autopush 1-2x per day
weekly/pre-prod/uat: promoted from staging 1-2x per week
production
I spent many years working at places that did long, slow, manual promotions and validations between environments. After experiencing a project that relies exclusively on automated release validation testing, I'm never going back.
The main downside is the large investment required to create and maintain the integration suites, and to set up the devops pipelines. But they are 100% worth the effort for any mature product. You will recoup the costs with reduced SWE hours on testing and with improved customer satisfaction due to fewer outages.
Production tests
Tests against production are frequently referred to as ‘probes’. These can range from simple uptime checks (‘does the page respond’) to complex workflows. If your service offers an API, you should be probing it periodically to ensure that it's returning expected results within an expected amount of latency.
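At the simple end, a probe can be as small as this sketch. The URL and the 500ms latency budget are made-up examples; in practice you'd run it on a schedule from your monitoring system and alert on failures:

```go
package main

import (
	"fmt"
	"net/http"
	"os"
	"time"
)

func main() {
	const target = "https://example.com/healthz" // hypothetical endpoint
	const latencyBudget = 500 * time.Millisecond

	client := &http.Client{Timeout: 5 * time.Second}

	// Time a single request and check both the response and the latency.
	start := time.Now()
	resp, err := client.Get(target)
	elapsed := time.Since(start)
	if err != nil {
		fmt.Fprintf(os.Stderr, "PROBE FAIL: %v\n", err)
		os.Exit(1)
	}
	defer resp.Body.Close()

	switch {
	case resp.StatusCode != http.StatusOK:
		fmt.Fprintf(os.Stderr, "PROBE FAIL: status %d\n", resp.StatusCode)
		os.Exit(1)
	case elapsed > latencyBudget:
		fmt.Fprintf(os.Stderr, "PROBE FAIL: latency %v over budget %v\n", elapsed, latencyBudget)
		os.Exit(1)
	default:
		fmt.Printf("PROBE OK: %d in %v\n", resp.StatusCode, elapsed)
	}
}
```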
In a good setup, there will be an enormous overlap between your release validation tests and your probes. Basically the only things that go into validation testing and not into probing are tests that would actually damage your service: extreme corner cases, load tests, etc.
The main difference between designing tests for release validation and designing production probes is that you generally don't expect the validation-style tests to fail in prod; why would they, when they've already passed X times in the release pipeline?
The production probes you add should be designed not just to confirm your features are working, but to validate that your dependencies are operating without error and within expected tolerances.
If your service depends on calling some AWS or GCP endpoint to get data in region X and that region is down, you need to know about it. Perhaps there’s some disaster recovery you can do to shift to another region. At minimum, you should be able to identify that the issue is occurring and communicate it to your customers, e.g. through a banner or something.
Bonus: Releases and rollbacks
A team at Google Cloud did some analysis and determined that an enormous percentage of outage incidents occurred immediately after a rollout of the service in question. Rollouts are unquestionably the riskiest time for a project.
It’s probably not that relevant to you, but their reaction was to immediately separate failure domains into regions, and move to a slow, multi-day rollout with automated rollbacks when error rates spike in newly-updated zones.
The part that is probably relevant to you is the shift in philosophy about rollbacks. Rollouts are risky because they're a change in state. Presumably the previous state was OK, since things weren't exploding until the rollout. So everyone in Google Cloud was told to update their mental model to “If something goes wrong and a release just happened, roll back”.
And this worked. Outages are rarer, less severe, and services recover more quickly. So my advice to you is to know how to roll back your service to a known-good state, and be prepared to do so immediately if things break (hopefully your probes are what tell you).
If you're following good deployment practices, this should be relatively easy to do. If you don't know how to roll back your service quickly, you should ask yourself why not, and what you need to do to get to a state where you can.