Scaling Agile: How Many Teams Are Too Many?

This is the second instalment of a “Scaling Agile” blog series. The first instalment was “Scaling Agile: A Law And Two Paradoxes”. Before proceeding further, please read it, as it sets the context for what follows.

In this post I’ll suggest a way to find an answer to this question:

Q1: How many people and teams can be added to the project to deliver more features in less time?

In other words, how can the throughput of delivered features be increased? This isn’t always the right question to answer (in fact, it almost never is), as focusing on the throughput of deliverables is not the same as focusing on the value delivered to the customers. In what follows, however, I’ll make some simplifying assumptions and show that, even in ideal scenarios, there are some hard limits to how much a project can scale up. In particular, I’ll assume that the project meets all the prerequisites for scaling (described in “Scaling Agile: A Law And Two Paradoxes”), which, among other things, means:

  1. Requirements are subject to ruthless prioritisation—i.e., non-essential, low value features are aggressively de-prioritised or binned. In this scenario there is a clear positive relationship between features and value delivered
  2. All teams currently in the project are working at peak effectiveness and efficiency—i.e., the existing teams are already as good as they can be, and they might (but not necessarily will) be able to do more only by increasing their size or their number
  3. There are effective metrics in place to measure, among others, productivity, quality, and throughput

Being able to answer Q1 is important as “deliver more faster” seems to be the main reason for scaling up in most large software projects. As it happens, some time ago I was hired as a consultant on a very large-scale agile project precisely to answer that question.

The very first thing I did was to survey the literature to find out whether anybody had already answered Q1. In the process, I discovered that the scaled agile literature has quite a bit of information about the pros and cons of component vs feature teams, but—despite Q1 being a very obvious and important question in the context of scaling—I couldn’t find much information that would help in answering it.

Looking further, I re-read Fred Brooks’s “The Mythical Man-Month” and came across this quote (highlights mine):

The number of months of a project depends upon its sequential constraints. The maximum number of men depends upon the number of independent subtasks. From these two quantities one can derive schedules using fewer men and more months. (The only risk is product obsolescence.) One cannot, however, get workable schedules using more men and fewer months.

If you haven’t recognised it yet, that is Amdahl’s Law applied to teams. That made perfect sense to me. Here are a couple of important implications:

  1. The total time spent in sequential activities—anything that has to be done by one team at a time but affects most or all other teams, e.g., the creation of a common component or library, setting up some common infrastructure for testing and CI, etc.—is a lower bound for the time necessary to deliver the required functionality. The project cannot go faster than that
  2. The maximum number of independent sub-projects into which the main project can be split is an upper bound on the number of teams that can be added productively to the project. Note that “independent” in this context is a relative concept—sub-projects of a big project always have some dependencies among them, and “independent” ones have just a few

The picture below (which is meant to be descriptive, not mathematically accurate) shows what can be achieved in practice—as you can see, it’s far less than the theoretical limits discussed above:

  • The straight line labelled “What managers hope” describes the typical attitude I’ve seen in many projects: managers add teams expecting to achieve linear scalability for throughput
  • The line labelled “What Amdahl says” describes the upper bound given by Amdahl’s Law, which tends asymptotically to a finite maximum throughput (remember the amount of sequential activities? That’s why the asymptote exists). Therefore, even if the teams were completely independent from each other, after a certain point adding new teams would be pointless
  • The line labelled “Reality” describes what happens in reality. The throughput will increase much less than the theoretically predicted level, and will peak when the number of teams reaches a maximum k. That’s the point where communication and synchronisation issues start to become the predominant factors affecting throughput. Add any more teams than that and the overall throughput for the project will go down. If you have ever worked on a project with more than a (very) few teams, chances are you’ve seen this happen first hand

[Figure: throughput vs number of concurrent teams]
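For readers who prefer numbers to pictures, here is a minimal toy model of those three curves. It is my own illustrative sketch, not something taken from Brooks or from the project I worked on: it assumes a serial fraction s of the work (as per Amdahl) plus a small pairwise coordination cost c between teams, and both parameter values are made up. (With the coordination term set to zero, the “reality” formula reduces to Amdahl’s bound; with it, the shape matches what is sometimes called the Universal Scalability Law.)

```python
# Toy model of throughput vs number of concurrent teams (illustrative only).
# Assumptions (mine): a fraction `s` of the work is sequential (Amdahl), and
# there is a small pairwise coordination cost `c` between teams.

def hoped_throughput(n: int) -> float:
    """What managers hope: linear scalability."""
    return float(n)

def amdahl_throughput(n: int, s: float = 0.2) -> float:
    """Amdahl's upper bound: 1 / (s + (1 - s) / n), which tends to 1/s."""
    return 1.0 / (s + (1.0 - s) / n)

def real_throughput(n: int, s: float = 0.2, c: float = 0.01) -> float:
    """'Reality': Amdahl plus a quadratic coordination penalty, so the
    curve peaks at some k and then drops as more teams are added."""
    return n / (1.0 + s * (n - 1) + c * n * (n - 1))

if __name__ == "__main__":
    teams = range(1, 31)
    k = max(teams, key=real_throughput)  # the peak described above
    for n in teams:
        print(f"{n:2d} teams: hope={hoped_throughput(n):5.1f} "
              f"amdahl={amdahl_throughput(n):5.2f} reality={real_throughput(n):5.2f}")
    print(f"Throughput peaks at k = {k} teams (for these made-up parameters)")
```

With these arbitrary numbers the peak sits at around nine teams; the exact value is irrelevant, the shape of the curves is the point.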

There are three important things to notice.

The first is that the cost per deliverable will increase (or, equivalently, productivity will decrease) more than linearly with the number of people involved, and it may become unacceptable well before scaling to k teams.

The second is that the shape of the “Reality” curve above is independent of the type of teams—component or feature, or any mix of the two—and it will always lie below Amdahl’s curve.

The third is that, independently of any methodology or process used (agile or otherwise), the throughput curves will always resemble the ones above. In other words, those relationships are more fundamental than the methodologies used and cannot be eliminated or avoided.

Now, suppose cost is not a problem, and that time to market is more important. To answer Q1 we can either try to calculate the value of k analytically (which I don’t know how to do, and which may not even be possible in some contexts), or we can do something else—i.e., add people, measure the effects, and act accordingly. The second approach is the one I suggested to my client. Specifically:

  1. When increasing the size of an existing team do the following:
    • Check with the teams involved if they need help—they might already be working at peak throughput, with everybody busy, but not overloaded, in which case they are better left alone
    • If increasing the size of the team is a viable proposition, do it incrementally by adding a few people at a time. Measure the effects (using the metrics you’ve already got in place). There may be a small throughput drop in the short term, but throughput should increase again before too long (e.g., a couple of sprints if using Scrum). If it doesn’t, or if quality suffers, understand the reasons and, if necessary, revert the decision and remove the new members from the team
  2. When adding a new team to the project do the following:
    • Ensure that the scope of work is well understood and is sufficiently self-contained with minimal and clear dependencies on other teams
    • Start small. 3-4 people maximum with the required skills—including knowledge of the technologies to be used, and of the domain
    • Ensure the Product Owner for the new team is identified and available
    • Give the team all the necessary resources to perform their job—e.g., software, hardware, office space, etc.
    • Make an architect available to help the team proceed in the right direction according to the reference architecture of the system
    • Measure the effects (see the sketch after this list). There may be a small decrease in throughput in the short term, but it should increase again before too long (e.g., a couple of sprints if using Scrum). If it doesn’t, or if quality suffers, understand the reasons and, if necessary, revert the decision and remove the team from the project
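In both cases, the “measure the effects” step can be as simple as comparing average throughput before and after the change, ignoring the first couple of sprints while the change settles. Here is a minimal sketch, assuming you already have per-sprint throughput numbers; the function name and the figures below are hypothetical, and quality has to be watched separately:

```python
# Hypothetical check for the "measure the effects" step above.
# `before` and `after` are features completed per sprint, taken from the
# metrics assumed to be already in place.

def average(xs: list[float]) -> float:
    return sum(xs) / len(xs) if xs else 0.0

def scaling_paid_off(before: list[float], after: list[float],
                     settle_sprints: int = 2) -> bool:
    """Ignore the short-term dip, then check whether throughput went up."""
    settled = after[settle_sprints:]
    return average(settled) > average(before)

# Example: throughput per sprint before and after adding people to a team.
print(scaling_paid_off(before=[8, 9, 8, 10], after=[6, 7, 11, 12]))  # True
```

If the answer stays negative after a reasonable settling period, or quality drops, that is the signal to revert the decision.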

As you can see, adding people might, in some circumstances, make the project faster, but there are some hard limits to the number of people and teams that can be added, and the costs will increase more (usually much more) than linearly with the number of people—even in an almost ideal situation. As my friend Allan Kelly says: “Software has diseconomies of scale – not economies of scale”.

If you, despite all the dangers, decide to scale up your project, and try to do so by applying the recommendations above, I would love to hear your feedback about how it worked out.

The next instalment of this series will be about component teams vs feature teams.

Scaling Agile: A Law And Two Paradoxes

Scaling Agile is all the rage nowadays. Apparently many, mostly big, companies have “big projects” that need plenty of people to develop, and they want to do that in an Agile fashion in order to reap the benefits of quick feedback and responding to change, but on a grander scale. At least, this is the official reason. The real reasons, in my experience, are closer to “do more in less time”. But that’s another story—here, instead, I want to share an empirical law and two paradoxes I discovered while working on large-scale software projects, and then provide some anecdotal (i.e., based on my experience) justification for them.

Fundamental law of scaling: Scaling up amplifies the bad and makes the good more difficult.

First paradox of scaling: Most projects are scaled up because they don’t fulfil the prerequisites for scaling.

Second paradox of scaling: Projects fulfilling the prerequisites for scaling have a lesser need to scale.

Before explaining the law and the two paradoxes above, here is why I think I can claim some expertise. In the last few years, I’ve been hired as:

  1. Contract developer and Agile coach on several projects involving at least 4-5 teams
  2. Methodology consultant, coach and mentor (both for teams and management) on one project involving 10 teams with ~100 developers and testers and on another involving ~80 geographically distributed teams with ~700 developers
  3. Project reviewer with the task of giving some recommendations on how to (or not to) scale up in one project involving more than 1000 developers organised in more than 100 geographically distributed teams

Interestingly, despite the difference in size, all projects above had remarkably similar issues—e.g., lack of communication, lack of shared goals, lack of synchronisation, etc.—which required similar solutions (none of them involving the use of any of the scaled agile frameworks currently available on the market).

Now, let me explain the law and the two paradoxes.

The fundamental law of scaling will be obvious once you have read the following definitions of “bad” and “good”.

The “bad” represents problems of any kind that negatively impact the project—e.g., technical debt, lack of communication, bad management, bad planning, bottlenecks, etc. As the number of people and teams grows, the impact of those problems will become bigger, making the situation increasingly difficult (and expensive) to sustain.

The “good” represents the activities that make the project run smoothly—e.g., proper planning, automation, prioritisation, appropriate architecture and design, managing technical debt to keep it low, clear communication, high visibility, continuous delivery, etc. As the number of people and teams grows, performing those activities properly will become increasingly difficult. As if that were not enough, some of those activities will require fundamental changes—e.g., planning for one team is quite different from planning for two or more (or 120)—for which people may not be prepared.

Now, let’s have a look at the paradoxes. In them I mentioned some prerequisites for scaling. Here they are:

  1. Clear shared goals—without these, people pull in different directions and there will be little or no progress
  2. High quality standards and the ability to deal effectively with technical debt—if the quality is low to start with, things will only get worse as the number of people and teams grows
  3. An architecture suitable for scaling—without it the teams will interfere with each other, resulting in slower progress and more defects
  4. Availability of hardware and software resources. Lack of resources is a common and serious bottleneck
  5. Plenty of efficient and effective automation—any manual activity has the potential to become a major bottleneck in the presence of many teams (e.g., manual testing)
  6. Effective and efficient communication channels—e.g., face to face chats, information radiators, etc. With more people there will be many more potential communication channels (see the short sketch after this list) and an increased need for visibility and synchronisation
  7. People with the necessary management and technical skills—managing a single-team project is quite different from managing a multiple-teams one. Likewise, a multi-team project requires developers to improve their design, coding, and infrastructure skills
  8. Appropriate metrics in place to measure productivity and quality and other interesting aspects of the project. Without them it will be impossible to decide if scaling up was a good idea or not
  9. The ability to create good user stories with a clear description of what is needed, and clear and crisp acceptance criteria. Bad user stories, in my experience, are one of the main causes of pain, defects, misunderstandings, and wasted time
  10. Very good prioritisation and planning skills—they are very important for single-team projects, and even more so for multi-team ones. In particular, multi-team projects behave very similarly to concurrent distributed systems—therefore planning activities need to take into account that the teams will often be out of sync with each other, in order to avoid nasty surprises
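To put a number on point 6: the count of potential person-to-person communication channels grows quadratically with the number of people, n(n − 1)/2 for n people. A tiny illustration (the head counts are only examples, loosely based on the project sizes mentioned above):

```python
# Potential person-to-person communication channels for n people: n*(n-1)/2.
# This is why communication needs deliberate structure as a project scales.

def channels(people: int) -> int:
    return people * (people - 1) // 2

for n in (7, 14, 100, 700):  # from a single team up to the ~700-developer project above
    print(f"{n:4d} people -> {channels(n):7d} potential channels")
```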

Let’s now have a look at how the first paradox of scaling works. When one or more of the above prerequisites is not fulfilled, the result is that the team (or teams) involved will be perceived as “slow”. In that scenario, the typical reaction of management, in most companies I’ve worked with, is to increase the number of people or teams—usually without even asking the teams already on the project—and make a bad situation worse. I’ve seen this behaviour even in situations where point 8 above was satisfied—for some reason, managers wouldn’t look at the available data and would act based on their “past experience” instead.

As far as the second paradox of scaling is concerned, my experience is that teams that are good at all ten points above are already fast enough that scaling never even becomes an issue. Or, if the number of requirements grows to an unsustainable level, they take action to solve or mitigate the issue, e.g., by prioritising requirements more ruthlessly and binning low-priority ones. In other words, such teams will try to improve the way they work and avoid scaling up until it becomes really necessary.

This is the first in a series of posts about large scale Agile development. In the next post I’ll assume all the prerequisites above are fulfilled, and show how to decide whether the project can be scaled up or not.

System Tests Can Kill Your Project

Introduction

Automated testing (I’m deliberately ignoring manual testing in this post) has become a hot subject about which many people have strong opinions, especially when unit testing and system testing are involved (integration and component testing seem to be a bit less controversial). The debate, typically, is around what kind of testing is more effective in helping to find and prevent software defects when developing applications. I think each type of automated testing has its place. However, in my experience, too much emphasis on system tests can slow a project down to a crawl or even kill it. This applies to systems of all sizes: from small ones developed by a single person, to big distributed systems developed by several geographically distributed teams. The only exceptions I’ve seen were quick throwaway applications, for which there are often very few smoke tests, if any.

In particular, in the (quite a few) legacy systems I have worked on, the difficulty in maintaining and evolving them was never due to the absence of tests per se—system tests, in one way or another, were always present—but to the absence of more finely grained ones (e.g., integration tests, component tests, and, especially, unit tests), which made it extremely difficult or impossible to change some parts of those systems safely and in a reasonable amount of time.

In the following sections I’ll explain the reasons in more detail.

The Purpose Of Tests

The reasons for testing a software system can be summarised in two broad points:

  1. Verify that the system does what it is supposed to do, and doesn’t do what it is not supposed to do
  2. Support the development and the evolution of the system

The first point should be obvious as it is what, traditionally, tests are supposed to help with. However, even if that point is very important, I think the second one is more interesting.

Successful software systems not only change and evolve for as long as they are used, but, nowadays, are often delivered incrementally, with new functionality added with each increment. Therefore development teams need a mechanism to check that they haven’t broken any existing functionality (or any system quality) in the process.

Automated testing, despite having its limitations (e.g., according to some research, formal code inspections can find more bugs), can be a very convenient technique for that purpose, as it is easier to learn and apply, scales better, and is cheaper than other techniques (e.g., formal proofs and formal code inspections).

However, there are some caveats: in order to help with the development and evolution of the system, the tests need to

  • Be reasonably simple to create—or people will not create enough of them, or, perhaps, create just some easy, but not necessarily high value, ones
  • Provide feedback at the right granularity level—running an entire distributed system to check the implementation of a single class or method is inconvenient, to say the least
  • Be easy and fast to run—or they won’t be executed as often as they should be (if at all)

Unfortunately, those are areas where system tests have some serious limitations.

Some Limitations of System Tests

System tests are very useful—some system qualities, like performance, scalability, and usability, are, in many cases, better tested at the system level—however:

  1. They can be quite complicated to set up and run, especially for distributed systems
  2. They can be very slow, making the feedback loop quite long and the implementation of changes in the system very time consuming
  3. They may be very difficult or even impossible to run on a developer’s machine. In those cases they are often run in a shared test environment, forcing the members of the development teams to work serially, therefore slowing everybody down
  4. They don’t scale well compared to other kinds of automated tests. Increasing their number will slow down the testing cycle considerably (and running them in parallel might be infeasible or will just solve the problem for a limited time)
  5. They are unhelpful in pinpointing the hiding places of bugs found in production (even if they can be very useful in reproducing the bugs in the first place)
  6. Their coarse-grained nature makes them unsuitable for checking that a single component, class, function, or method has been implemented correctly.

The last point is probably the most important one—and missed by many, who tend to focus on execution speed instead. Big systems are composed of smaller components, which, in turn, are composed of smaller components, and so on and so forth until we reach class, function and method levels.

Using system testing to see if a method or class works as expected is equivalent to checking that a small cog in the engine of a car works well by fully assembling the car first and then by trying (I’m using “trying” because, if the various components have not been tested in isolation, many or all of them may not work at all) to run a few laps on a track with it.

That approach is both inefficient and ineffective. It is inefficient because every small change would require a longer-than-needed testing procedure (making code refactorings a major chore, and therefore avoided as much as possible). It is ineffective because the system test may still pass or break for reasons that have nothing to do with the small changes being tested.
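To make the granularity point concrete, here is a minimal, hypothetical example (the names and logic are mine, not from any particular project) of a test at the right level: a small pure function gets its own fast unit test, with no need to assemble and run the whole system first.

```python
# Hypothetical domain function and its unit tests: feedback in milliseconds,
# runnable on any developer's machine, no full system required.

def order_total(prices: list[float], discount: float = 0.0) -> float:
    """Total price after applying a percentage discount (0.0 to 1.0)."""
    if not 0.0 <= discount <= 1.0:
        raise ValueError("discount must be between 0 and 1")
    return sum(prices) * (1.0 - discount)

def test_order_total_applies_discount():
    assert order_total([10.0, 20.0], discount=0.5) == 15.0

def test_order_total_rejects_invalid_discount():
    try:
        order_total([10.0], discount=1.5)
    except ValueError:
        pass  # expected
    else:
        raise AssertionError("expected a ValueError for an invalid discount")
```

A failing test like this points straight at the function that broke, which is exactly what a failing end-to-end run of the whole system cannot do.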

The limitations above have quite a big impact on what is tested in practice—system tests are generally used to cover some positive scenarios and a few negative ones, leaving the majority of the possible scenarios out—making over-reliance on system testing quite a dangerous practice. In fact, all the systems I’ve come across that were developed relying almost entirely on system testing were, without exception, bug-ridden and very difficult to maintain and evolve.

If we add multiple distributed teams to the mix, with each team developing one or more components of the entire system—something quite common these days—the problems above get orders of magnitude worse. I’ve seen those problems first hand several times: I’ve been involved in quite a few distributed projects where the almost total reliance on system testing brought the development activities (not to mention bug fixing and production support) almost to a standstill. The more teams, the worse the situation.

A Better Option

A better option, in my experience, is to rely mainly on lots of unit testing and a healthy amount of component and integration testing, and to use system testing mainly to make sure the system hangs together, by running the main acceptance and performance tests.

In the case of systems developed by multiple teams, unit and component tests become even more important. Teams working on components that depend on each other need to agree on how those components will communicate (they will have to do this anyway), and use that information to create the necessary stubs in their own environments, so that each team is able to work (and to run its tests) in isolation. When all the teams are ready, they can run some integration and system tests to verify that there were no misunderstandings in the definition of the communication protocols of the components involved.
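Here is a minimal sketch of what that can look like in practice. All the names are hypothetical; the point is that the agreed contract is captured as an interface, and a hand-rolled stub stands in for the other team’s component during local tests.

```python
# A sketch of testing one team's component in isolation against an agreed
# contract, using a stub instead of the real service owned by another team.

from typing import Protocol

class PaymentGateway(Protocol):
    """The interface both teams agreed on (assumed to exist in this sketch)."""
    def charge(self, account_id: str, amount_cents: int) -> bool: ...

class StubPaymentGateway:
    """Stub standing in for the other team's component during local tests."""
    def __init__(self, succeed: bool = True) -> None:
        self.succeed = succeed
        self.calls: list[tuple[str, int]] = []

    def charge(self, account_id: str, amount_cents: int) -> bool:
        self.calls.append((account_id, amount_cents))
        return self.succeed

def checkout(gateway: PaymentGateway, account_id: str, amount_cents: int) -> str:
    """The code under test, owned by this team."""
    return "confirmed" if gateway.charge(account_id, amount_cents) else "declined"

def test_checkout_confirms_when_charge_succeeds():
    gateway = StubPaymentGateway(succeed=True)
    assert checkout(gateway, "acc-42", 1999) == "confirmed"
    assert gateway.calls == [("acc-42", 1999)]

def test_checkout_declines_when_charge_fails():
    assert checkout(StubPaymentGateway(succeed=False), "acc-42", 1999) == "declined"
```

The same stub is later replaced by the real component in the integration and system tests mentioned above.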

Conclusion

I’m not against system tests—I usually like to work outside-in, starting with a failing end-to-end test—but I think that they are not the answer to all testing needs. An over-reliance on them can bring development and maintenance activities to a standstill first, and lead to a rewrite from scratch as the next step (which comes with its own problems, and is very rarely a good idea).

Other types of testing like unit and component tests are more suitable than system ones in many situations.

A final observation. In the projects I’ve been involved with, the problems in the ones with worthless unit tests were not due to unit testing per se, but to a lack of design and automated-testing skills inside the teams. In fact, the production code in those systems was invariably bad, and the system tests weren’t in good shape either.

So, if you are thinking about putting a lot of emphasis on system tests for your testing activities, my suggestion is to take a step back and think again, or your project may be killed as a result.