Introduction
Automated testing (I’m deliberately ignoring manual testing in this post) has become a hot subject on which many people have strong opinions, especially where unit testing and system testing are involved (integration and component testing seem to be a bit less controversial). The debate is typically about which kind of testing is more effective at finding and preventing software defects when developing applications. I think each type of automated testing has its place. However, in my experience, too much emphasis on system tests can slow a project down to a crawl or even kill it. This applies to systems of all sizes: from small ones developed by a single person to big distributed systems developed by several geographically distributed teams. The only exceptions I’ve seen were quick throwaway applications, which often have very few smoke tests, if any.
In particular, in the (quite a few) legacy systems I have worked on, the difficulty in maintaining and evolving them was never due to the absence of tests per se—system tests, in one form or another, were always present—but to the absence of more finely grained ones (e.g., integration tests, component tests, and, especially, unit tests), which made it extremely difficult or impossible to change some parts of those systems safely and in a reasonable amount of time.
In the following sections I’ll explain the reasons in more detail.
The Purpose Of Tests
The reasons for testing a software system can be summarised in two broad points:
- Verify that the system does what it is supposed to do, and doesn’t do what it is not supposed to do
- Support the development and the evolution of the system
The first point should be obvious, as it is what tests have traditionally been expected to help with. However, even though that point is very important, I think the second one is more interesting.
Successful software systems not only change and evolve for as long as they are used but, nowadays, are often delivered incrementally, with new functionality added in each increment. Development teams therefore need a mechanism to check that they haven’t broken any existing functionality (or any system quality) in the process.
Automated testing, despite having its limitations (e.g., according to some research, formal code inspections can find more bugs), can be a very convenient technique for that purpose, as it is easier to learn and apply, scales better, and is cheaper than other techniques (e.g., formal proofs and formal code inspections).
However, there are some caveats: in order to help with the development and evolution of the system, the tests need to
- Be reasonably simple to create—or people will not create enough of them, or will create only easy, but not necessarily high-value, ones
- Provide feedback at the right granularity level—running an entire distributed system to check the implementation of a single class or method is inconvenient, to say the least
- Be easy and fast to run—or they won’t be executed as often as they should be (if at all); the sketch after this list shows what a test with these properties can look like
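To make those points concrete, here is a minimal sketch of the kind of test that satisfies all three. The function and test names (apply_discount and friends) are hypothetical, invented for illustration only; such a test takes a minute to write, points at exactly one piece of code, and runs in milliseconds with nothing more than plain Python (a runner such as pytest would also pick the test_* functions up automatically).

```python
# Hypothetical example: a small pricing rule and the unit tests for it.
# The names (apply_discount, test_*) are illustrative only.

def apply_discount(price: float, percentage: float) -> float:
    """Return the price reduced by the given percentage (0-100)."""
    if not 0 <= percentage <= 100:
        raise ValueError("percentage must be between 0 and 100")
    return round(price * (1 - percentage / 100), 2)

# Simple to create, focused on a single function, and fast to run.
def test_apply_discount():
    assert apply_discount(100.0, 10) == 90.0
    assert apply_discount(19.99, 0) == 19.99

def test_apply_discount_rejects_invalid_percentage():
    try:
        apply_discount(100.0, 150)
        assert False, "expected ValueError"
    except ValueError:
        pass
```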
Unfortunately, those are areas where system tests have some serious limitations.
Some Limitations of System Tests
System tests are very useful—some qualities, such as performance, scalability, and usability, are in many cases better tested at the system level—however:
- They can be quite complicated to set up and run, especially for distributed systems
- They can be very slow, making the feedback loop quite long and the implementation of changes in the system very time consuming
- They may be very difficult or even impossible to run on a developer’s machine. In those cases they are often run in a shared test environment, forcing the members of the development teams to work serially, therefore slowing everybody down
- They don’t scale well compared to other kinds of automated tests. Increasing their number will slow down the testing cycle considerably (and running them in parallel may be infeasible, or may only solve the problem for a limited time)
- They are unhelpful in pinpointing the hiding places of bugs found in production (even if they can be very useful in reproducing the bugs in the first place)
- Their coarse-grained nature makes them unsuitable for checking that a single component, class, function, or method has been implemented correctly.
The last point is probably the most important one—and the one missed by many, who tend to focus on execution speed instead. Big systems are composed of smaller components, which, in turn, are composed of smaller components, and so on until we reach the class, function, and method level.
Using system testing to see whether a method or class works as expected is equivalent to checking that a small cog in the engine of a car works well by fully assembling the car first and then trying (I’m using “trying” because, if the various components have not been tested in isolation, many or all of them may not work at all) to run a few laps on a track with it.
That approach is both inefficient and ineffective. It is inefficient because every small change requires a longer-than-needed testing procedure (making code refactorings a major chore, and therefore something to be avoided as much as possible). It is ineffective because the system test may still pass or fail for reasons that have nothing to do with the small change being tested.
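To bring the analogy back to code, here is a minimal, hypothetical sketch of testing the “cog” directly. The OrderPricer class and its fake collaborator are invented for illustration; the point is that the small piece is exercised on its own, with its collaborator replaced by a trivial stand-in, instead of assembling and deploying the whole application first.

```python
# Hypothetical example: the "cog" is a small OrderPricer class that depends on
# a tax-rate provider. Instead of assembling the whole "car" (deploying the
# application, its database, and its other collaborators), the collaborator is
# replaced with a trivial fake and the class is exercised directly.

class OrderPricer:
    def __init__(self, tax_rate_provider):
        self._tax_rate_provider = tax_rate_provider

    def total(self, net_amount: float, country: str) -> float:
        rate = self._tax_rate_provider.rate_for(country)
        return round(net_amount * (1 + rate), 2)

class FakeTaxRateProvider:
    """Stands in for the real provider, which might live in another service."""
    def rate_for(self, country: str) -> float:
        return 0.20 if country == "GB" else 0.0

def test_total_applies_country_tax_rate():
    pricer = OrderPricer(FakeTaxRateProvider())
    assert pricer.total(100.0, "GB") == 120.0
    assert pricer.total(100.0, "US") == 100.0
```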
The limitations above have quite a big impact on what is tested in practice—system tests are generally used to cover some positive scenarios and a few negative ones, leaving the majority of the possible scenarios out—making over-reliance on system testing quite a dangerous practice. In fact, all the systems I’ve come across that were developed relying almost entirely on system testing were, without exception, bug-ridden and very difficult to maintain and evolve.
If we add multiple distributed teams to the mix, with each team developing one or more components of the entire system—something quite common these days—the problems above get orders of magnitude worse. I’ve seen those problems first-hand several times: I’ve been involved in quite a few distributed projects where the almost total reliance on system testing brought development activities (not to mention bug fixing and production support) almost to a standstill. The more teams there are, the worse the situation gets.
A Better Option
A better option, in my experience, is to rely mainly on lots of unit testing and a healthy amount of component and integration testing, and to use system testing mainly to make sure the system hangs together by running the main acceptance and performance tests.
For systems developed by multiple teams, unit and component tests become even more important. Teams working on components that depend on each other need to agree on how those components will communicate (they have to do this anyway), and use that information to create the necessary stubs in their own environments, so that each team can work (and run its tests) in isolation. When all the teams are ready, they can run some integration and system tests to verify that there were no misunderstandings in the definition of the communication protocols of the components involved.
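As an illustration (the component names and the reserve operation below are hypothetical, not a prescription), suppose two teams agree that the inventory component exposes a reserve(sku, quantity) operation that reports success or failure. The team building the ordering component can then develop and test against a local stub that honours the agreement, and only needs the real inventory component for the later integration and system tests.

```python
# Hypothetical sketch: the teams have agreed that the inventory component
# exposes reserve(sku, quantity) -> bool. The ordering team codes against that
# agreement and uses a local stub, so its tests run in isolation.

from typing import Protocol  # Python 3.8+

class InventoryClient(Protocol):
    def reserve(self, sku: str, quantity: int) -> bool: ...

class StubInventoryClient:
    """Local stand-in honouring the agreed contract."""
    def __init__(self, in_stock: dict):
        self._in_stock = in_stock

    def reserve(self, sku: str, quantity: int) -> bool:
        available = self._in_stock.get(sku, 0)
        if available < quantity:
            return False
        self._in_stock[sku] = available - quantity
        return True

class OrderService:
    def __init__(self, inventory: InventoryClient):
        self._inventory = inventory

    def place_order(self, sku: str, quantity: int) -> str:
        return "ACCEPTED" if self._inventory.reserve(sku, quantity) else "REJECTED"

def test_order_rejected_when_stock_is_insufficient():
    service = OrderService(StubInventoryClient({"ABC-1": 2}))
    assert service.place_order("ABC-1", 5) == "REJECTED"
    assert service.place_order("ABC-1", 2) == "ACCEPTED"
```

The integration and system tests that follow then only need to confirm that the real inventory component honours the same agreement the stub encodes.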
Conclusion
I’m not against system tests—I usually like to work outside-in, starting with a failing end-to-end test—but I think they are not the answer to all testing needs. An over-reliance on them can bring development and maintenance activities to a standstill first, and lead to a rewrite from scratch as the next step (which comes with its own problems and is very rarely a good idea).
Other types of testing like unit and component tests are more suitable than system ones in many situations.
A final observation. In the projects I’ve been involved with, the problems in the ones with worthless unit tests were not due to unit testing per se, but to a lack of design and automated-testing skills within the teams. In fact, the production code in those systems was invariably bad, and the system tests weren’t in good shape either.
So, if you are thinking about putting a lot of emphasis on system tests for your testing activities, my suggestion is to take a step back and think again, or your project may be killed as a result.