System Tests Can Kill Your Project

Introduction

Automated testing (I’m deliberately ignoring manual testing in this post) has become a hot subject about which many people have strong opinions, especially when unit testing and system testing are involved (integration and component testing seem to be a bit less controversial). The debate is typically about which kind of testing is more effective in helping to find and prevent software defects when developing applications. I think each type of automated testing has its place. However, in my experience, too much emphasis on system tests can slow a project down to a crawl or even kill it. This applies to systems of all sizes: from small ones developed by a single person, to big distributed systems developed by several geographically distributed teams. The only exceptions I’ve seen were quick throwaway applications, which often have very few, if any, smoke tests.

In particular, in the (quite a few) legacy systems I worked on, the difficulty in maintaining and evolving them was never due to the absence of tests per se—system tests, in one form or another, were always present—but to the absence of more finely grained ones (e.g., integration tests, component tests, and, especially, unit tests), which made it extremely difficult or impossible to change some parts of those systems safely and in a reasonable amount of time.

In the following sections I’ll explain the reasons in more detail.

The Purpose Of Tests

The reasons for testing a software system can be summarised in two broad points:

  1. Verify that the system does what it is supposed to do, and doesn’t do what it is not supposed to do
  2. Support the development and the evolution of the system

The first point should be obvious, as it is what tests have traditionally been expected to help with. However, important as that point is, I think the second one is more interesting.

Successful software systems not only change and evolve for as long as they are used, but, nowadays, are often delivered incrementally, with new functionality added with each increment. Therefore development teams need a mechanism to check that they haven’t broken any existing functionality (or any system quality) in the process.

Automated testing, despite its limitations (e.g., according to some research, formal code inspections can find more bugs), can be a very convenient technique for that purpose, as it is easier to learn and apply, scales better, and is cheaper than other techniques (e.g., formal proofs and formal code inspections).

However, there are some caveats: in order to help with the development and evolution of the system, the tests need to

  • Be reasonably simple to create—otherwise people will not create enough of them, or will create only some easy, but not necessarily high-value, ones
  • Provide feedback at the right granularity level—running an entire distributed system to check the implementation of a single class or method is inconvenient, to say the least
  • Be easy and fast to run—otherwise they won’t be executed as often as they should be (if at all)

Unfortunately, those are areas where system tests have some serious limitations.

Some Limitations of System Tests

System tests are very useful—some qualities, such as performance, scalability, and usability, are, in many cases, better tested at the system level—however:

  1. They can be quite complicated to set up and run, especially for distributed systems
  2. They can be very slow, making the feedback loop quite long and the implementation of changes in the system very time consuming
  3. They may be very difficult or even impossible to run on a developer’s machine. In those cases they are often run in a shared test environment, forcing the members of the development teams to work serially, therefore slowing everybody down
  4. They don’t scale well compared to other kinds of automated tests. Increasing their number will slow down the testing cycle considerably (and running them in parallel might be infeasible or will just solve the problem for a limited time)
  5. They are unhelpful in pinpointing the hiding places of bugs found in production (even if they can be very useful in reproducing the bugs in the first place)
  6. Their coarse-grained nature makes them unsuitable for checking that a single component, class, function, or method has been implemented correctly.

The last point is probably the most important one—and one missed by many, who tend to focus on execution speed instead. Big systems are composed of smaller components, which, in turn, are composed of smaller components, and so on until we reach the class, function, and method level.

Using system testing to check whether a method or class works as expected is equivalent to checking that a small cog in the engine of a car works well by fully assembling the car first and then trying (I’m using “trying” because, if the various components have not been tested in isolation, many or all of them may not work at all) to run a few laps on a track with it.

That approach is both inefficient and ineffective. It is inefficient because every small change requires a longer than necessary testing procedure (making refactoring a major chore, and therefore something to be avoided as much as possible). It is ineffective because the system test may still pass or fail for reasons that have nothing to do with the small change being tested.
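To make the contrast concrete, here is a minimal sketch of the kind of fine-grained check I have in mind (the class, its rules, and all the names are invented purely for illustration). It runs in milliseconds, on any developer’s machine, without assembling or deploying anything else:

    // discount_calculator_test.cpp: hypothetical example; the class and its
    // rules are invented purely to illustrate a fine-grained, isolated check.
    #include <cassert>

    class DiscountCalculator {
    public:
        // 10% discount for orders of 10000 cents (100.00) or more, none otherwise.
        long long discountFor(long long orderTotalCents) const {
            return orderTotalCents >= 10000 ? orderTotalCents / 10 : 0;
        }
    };

    int main() {
        DiscountCalculator calc;
        assert(calc.discountFor(5000) == 0);      // below the threshold: no discount
        assert(calc.discountFor(10000) == 1000);  // at the threshold: 10%
        assert(calc.discountFor(20000) == 2000);  // above the threshold: 10%
        return 0;                                 // all checks passed
    }

A system test exercising the same rule would have to drive the whole application (user interface, database, and all) just to observe the effect of those three inputs.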

The limitations above have quite a big impact on what is tested in practice—system tests are generally used to cover some positive scenarios and a few negative ones, leaving the majority of the possible scenarios out—making over-reliance on system testing quite a dangerous practice. In fact, all the systems I’ve come across that were developed relying almost entirely on system testing were, without exception, bug-ridden and very difficult to maintain and evolve.

If we add multiple distributed teams to the mix, with each team developing one or more components of the entire system—something quite common these days—the problems above get orders of magnitude worse. I’ve seen those problems first hand several times: I’ve been involved in quite a few distributed projects where the almost total reliance on system testing brought the development activities (not to mention bug fixing and production support) almost to a standstill. The more teams there are, the worse the situation gets.

A Better Option

A better option, in my experience, is to rely mainly on lots of unit testing and a healthy amount of component and integration testing, and to use system testing to make sure the system hangs together, by running the main acceptance and performance tests.

In the case of systems developed by multiple teams, unit and component tests become even more important. Teams working on components that depend on each other need to agree on how those components will communicate (they will have to do this anyway), and can use that information to create the necessary stubs in their own environments, so that each team is able to work (and to run its tests) in isolation. When all the teams are ready, they can run some integration and system tests to verify that there were no misunderstandings in the definition of the communication protocols of the components involved.
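As an illustration of this idea (all the names and the interface below are hypothetical; they are not taken from any specific project), a team that depends on another team’s pricing component can code against the agreed interface, and use a stub of it to work and test in isolation until the real implementation is available:

    // All names are hypothetical; the point is coding and testing against an
    // interface the two teams have agreed on.
    #include <cassert>
    #include <string>

    // The agreed contract between the two teams.
    class PricingService {
    public:
        virtual ~PricingService() = default;
        virtual long long priceCentsOf(const std::string& productId) const = 0;
    };

    // Stub used by the consuming team until the real component is delivered.
    class StubPricingService : public PricingService {
    public:
        long long priceCentsOf(const std::string&) const override { return 999; }
    };

    // The consuming team's code depends only on the interface, not on the
    // other team's implementation.
    long long totalCentsFor(const PricingService& pricing,
                            const std::string& productId, int quantity) {
        return pricing.priceCentsOf(productId) * quantity;
    }

    int main() {
        StubPricingService stub;
        assert(totalCentsFor(stub, "any-product", 3) == 2997);  // 3 * 9.99
        return 0;
    }

The integration and system tests that are run later then only need to confirm that the real component honours the same contract.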

Conclusion

I’m not against system tests—I usually like to work outside-in, starting with a failing end-to-end test—but I think that they are not the answer to all testing needs. An over-reliance on them can bring development and maintenance activities to a standstill first, and lead to a rewrite from scratch as the next step (which comes with its own problems, and is very rarely a good idea).

Other types of testing like unit and component tests are more suitable than system ones in many situations.

A final observation. In the projects I’ve been involved with that had worthless unit tests, the problems were not due to unit testing per se, but to a lack of design and automated testing skills within the teams. In fact, the production code in those systems was invariably bad, and the system tests weren’t in good shape either.

So, if you are thinking about putting a lot of emphasis on system tests for your testing activities, my suggestion is to take a step back and think again, or your project may be killed as a result.

Reinventing the Wheel Considered Useful

We programmers are prone to reinventing the wheel—for example, by creating frameworks and libraries instead of using existing ones (very common), or even by writing our own build tools from scratch (less common).

We do that despite paying lip service to the fact that reinventing the wheel is generally considered a bad thing. In fact, conventional wisdom says that, from a business point of view, reinventing the wheel can be very expensive—developers spend time both to create and to maintain artefacts that could be bought instead—and, from a technical point of view, using a third party product often means fewer bugs to take care of and less time spent doing maintenance.

I used to think that as well. Not anymore. Recently—after giving them some references to widely available, good quality alternatives—I’ve even found myself suggesting to a client that they keep their in-house C++ unit testing framework if they wanted to. I did that for two reasons: first, it wasn’t a drag on the project; second, I was pretty sure the team had some form of emotional attachment to it, and I wanted to keep them motivated. Forcing them to use something else would have probably upset them and made them less productive as a result.

In fact, I’m convinced that allowing a small amount of wheel reinvention in a project has some advantages: it is a cure against boredom, a booster for happiness and motivation, and a way to learn new things. The increased happiness and motivation, and the new learning, will improve productivity, which, in turn, will help recover the associated costs.

During my career, every single team I worked with had reinvented the wheel at least once and, every time, without exception, the programmers who did that were quite proud of their creation and spent quite a bit of time improving it. That also gave them something to look forward to, both when the projects were experiencing difficult times and when the times were boring and uninteresting—often, when things go very smoothly for a long time and the technical challenges are few and far between, boredom kicks in and the developers may struggle to stay interested and motivated.

Good programmers are curious by nature and always want to understand how things work, so reinventing the wheel can be a great learning experience. For example, writing your own unit testing framework (as many of us did) is a very good way to understand better how that kind of framework works, and to understand the reasons for some of the choices made by the creators of famous ones like JUnit, CppUnit, etc. The same can be said of any other kind of framework.
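As a taste of the kind of insight this can give, here is a minimal sketch of a home-grown test framework (all the names are invented, and real frameworks such as CppUnit or JUnit do considerably more): a registry of named test functions, a check macro that reports failures without aborting the run, and a runner:

    #include <functional>
    #include <iostream>
    #include <string>
    #include <vector>

    // A test is just a named function; the framework keeps a registry of them.
    struct Test { std::string name; std::function<void()> body; };
    static std::vector<Test> g_tests;
    static int g_failures = 0;

    void addTest(const std::string& name, std::function<void()> body) {
        g_tests.push_back({name, body});
    }

    // A failing check reports the expression and its location, then lets the
    // run continue so that one failure does not hide the others.
    #define CHECK(expr) \
        do { \
            if (!(expr)) { \
                ++g_failures; \
                std::cout << "FAILED: " << #expr << " (" << __FILE__ << ":" \
                          << __LINE__ << ")\n"; \
            } \
        } while (0)

    int runAllTests() {
        for (const auto& test : g_tests) {
            std::cout << "Running " << test.name << "\n";
            test.body();
        }
        std::cout << (g_failures == 0 ? "All tests passed\n" : "Some tests FAILED\n");
        return g_failures == 0 ? 0 : 1;
    }

    int main() {
        // In a real framework, registration would typically be automatic.
        addTest("addition works", [] { CHECK(1 + 1 == 2); });
        addTest("this one fails on purpose", [] { CHECK(2 + 2 == 5); });
        return runAllTests();
    }

Even a toy like this raises the questions the authors of the real frameworks had to answer: how to register tests automatically, how to isolate them from each other, how to report results, and so on.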

A final word of caution. If the amount of wheel reinvention is too great, the project may sink—costs as well as delivery times will go up, and the people in the team may find themselves spending too much time doing infrastructure work and not enough adding value for the other stakeholders—so you will have to find the right amount for your projects.

Methodology à La Carte

à la carte |ˌä lä ˈkärt, lə|
adjective
(of a menu or restaurant) listing or serving food that can be ordered as separate items, rather than part of a set meal.

I’ve been uncomfortable with the mainstream discussions about software methodology for quite some time. It seems to me that far too many, in the software development community, are in a wrongheaded quest to find The Methodology that will solve all our software development sorrows.

So far we’ve had (just to mention some popular ones): Waterfall, Spiral, Evo, RUP, DSDM, FDD, XP, Scrum, Kanban, Disciplined Agile Delivery; and also some cross-breeds, e.g., Scrum + XP, Scrumban (Scrum + Kanban), etc.

We keep finding that each of those methodologies has many strengths, but also several weaknesses that make each of them applicable in some contexts but not, easily, in others. I think this will always be the case.

Let me explain.

Let’s first look at what a methodology is. The definition I like the most is this one by Alistair Cockburn, found in [1]:

Your “methodology” is everything you regularly do to get your software out. It includes who you hire, what you hire them for, how they work together, what they produce, and how they share. It is the combined job descriptions, procedures, and conventions of everyone on your team. It is the product of your particular ecosystem and is therefore a unique construction of your organization.

According to the definition above, all the methodologies in the previous list are more accurately described as methodology frameworks—they impose some constraints, and make some assumptions about the surrounding context, but leave many (important) details to the specific implementations (note that these details include team dynamics and personal preferences; they are very important, but I’m going to leave them out for now).

Constraints and assumptions are both a strength and a weakness of every framework. If they are satisfied in the context where the framework is applied, then using the framework can save time, money, and grief. However, if they are not, using the framework can become difficult, if not detrimental.

For example, think of teams working in fixed length iterations that also have to deal with support issues and point releases outside their standard iteration cycle; I’ve encountered this problem several times with different teams, and the implementation of a solution has never been straightforward.

Another example is TDD. I’m a strong advocate of TDD, however, it is a practice that requires some level of proficiency, and, in some contexts, it is just too difficult to adopt straight away. Sometimes it is just better to start by writing unit tests without caring about when they are written—first or last—as long as they are there (and, before you lambast me on this, I know perfectly well that TDD is not only about testing, but also about design; however that’s not my point here).

I can give many more examples, but the point is that, whatever the methodology framework, some of its assumptions and constraints may not be valid in some contexts.

In my opinion, a better approach would be to create a methodology per project by mixing and matching sound practices, processes and tools—which can be borrowed from existing methodologies, or the literature, e.g., [2], [3], [4]—to fit the context and the needs of the project. This is what “à la carte” is about.

Mind you, I am not claiming to have invented or discovered anything—this is what effective teams have always done (and it’s an approach I’ve been promoting for quite some time [5])—but I think that we, as a community, need to have a different kind of discussion from one focused on promoting one methodology over the others.

Some people have pointed out to me that this approach looks like Crystal [4]. I’ve certainly been influenced by it; however, what I’m describing here is neither a methodology nor a methodology family (like Crystal Clear and Crystal, respectively), since it doesn’t impose any constraints or assume any particular context. All it requires is discipline, mindful choices, and the willingness to improve.

That said, I think that there is still a place for methodology frameworks like the ones mentioned before. In fact, you may be in the lucky position where one of them works for you straight out of the box; however, if you are not, and you fear you may run into some form of analysis paralysis, you can choose one as a starting point and then modify it as necessary—incidentally, this is what many Scrum and Kanban teams seem to be doing anyway.

I’ll be doing more work on this, and I’ll be speaking about methodology à la carte at the upcoming ACCU 2013 conference in Oxford, UK.

In the meantime, I welcome your feedback.


  1. Cockburn, Alistair, Agile Software Development: The Cooperative Game (2nd Edition) (Agile Software Development Series), Addison-Wesley Professional, 2006.

  2. Beck, Kent and Andres, Cynthia, Extreme Programming Explained: Embrace Change (2nd Edition), Addison-Wesley Professional, 2004.

  3. Coplien, James O. and Harrison, Neil B., Organizational Patterns of Agile Software Development, Prentice-Hall, Inc., 2004.

  4. Cockburn, Alistair, Crystal Clear: A Human-Powered Methodology for Small Teams, Addison-Wesley Professional, 2004.

  5. Asproni, Giovanni, Fedotov, Alexander and Fernandez, Rodrigo, An Experience Report on Implementing a Custom Agile Methodology on a C++/Python Project, Overload 64, December 2004.