We never have enough time for testing, so let’s just write the test first.

Kent Beck

What’s done, is done.

Shakespeare, Macbeth, Act III, scene II

Test-First Abstract

Agile testing differs from the big-bang, test-at-the-end approach of traditional development. Instead, code is developed and tested in small increments, often with the development of the test itself preceding the development of the code. In this way, tests serve to elaborate and better define the intended system behavior before the system is coded. Quality is built in from the beginning. This just-in-time approach to elaboration of intended system behavior helps avoid the need for lengthy, detailed requirements specifications, and the delay-inducing drafting and sign-offs that are often required in traditional software development. Even better, unlike traditional software requirements, these tests are automated wherever possible, and even when not, they serve as a definitive statement of what the system actually does, rather than what we thought it was supposed to do.

This article describes a comprehensive approach to agile testing based on Brian Marick’s four quadrant agile testing matrix. Quadrants 1 and 2—the tests that support the development of the system, and thereby define and determine the intended and actual system behavior—are described in depth in this article. Quadrants 3 and 4 are described in Release and Nonfunctional Requirements, respectively.


The Agile Testing Matrix

XP proponent and Agile Manifesto signer Brian Marick described a matrix that helps us reason about the unique role of testing in the Agile paradigm. This matrix was further developed in Agile Testing [1] and extended for the scaled agile paradigm in Agile Software Requirements [2]. With another modest extension with respect to when to test what, we come to the matrix in Figure 1.



Figure 1. Agile Testing Matrix with Test-First, Test-Continuously and Validation Test quadrants


The horizontal axis of the matrix distinguishes business-facing tests from technology-facing tests. Business-facing tests are understandable by the user and are described in words from the business domain.

Technology-facing tests are written in the language of the developer, and are used to evaluate whether the code delivers the behaviors the developer intended.

The vertical axis classifies tests as being either in “support of programming” (internal system tests we use to evaluate our code) or used to “critique the product” (tests that make sure the external behavior of the system meets the end user’s actual requirements).

As described below, mapping the various types of testing into these four quadrants provides a comprehensive testing strategy that can help assure quality in the delivered system.

Quadrant 1 contains Unit Tests and Component Tests. These tests are written by developers to test whether the system does what they intended it to do. Because there are a large number of them, they can and should be automated in the various unit testing environments, where they persist as a low-cost way of assuring that the system works as intended, both before and after changes to the code.

Quadrant 2 contains user-facing Story and Feature Acceptance tests. Story acceptance tests validate that each new Story works the way the Product Owner (customer, user) intended. Feature-level acceptance testing works in a similar manner, but tests the higher-level features of the system, which aggregate the behavior of many user stories. Many of these tests can be automated—the more the better—but some of these tests are likely to be manual.

Taken together, these two quadrants test the functionality of the system against the user stories and higher-level features. We can further describe Quadrant 1 and 2 as Test-First Tests, and as we will see below, we use the test-first approach to make sure we understand what the system is really supposed to do, before we commit the code. That makes the coding process more efficient and also serves as a protocol between the developer and the user, or user proxy, to establish the real system intent.

Quadrant 3 contains System Acceptance Tests, system-level tests used to determine whether the aggregate behavior of the system meets its usability and functionality requirements, including the many variations (scenarios) that may be encountered in actual use. These can include exploratory testing, user acceptance testing, scenario-based testing, final usability testing, and more. These tests are often largely manual in nature, because they involve users and testers using the system in actual or simulated deployment and usage scenarios. We classify these tests as validation tests, as there is usually some amount of final system validation required before delivery to the end user.

Quadrant 4 contains System Qualities Tests, which are used to determine whether the system meets its Nonfunctional Requirements. Such tests are typically supported by a class of testing tools, such as load and performance testing tools, which are designed specifically for this purpose. Here, we need to test continuously, because any change to the underlying system could accidentally violate one of the NFRs.

In this article, we’ll describe the tests in Quadrants 1 and 2. In Nonfunctional Requirements, we’ll describe the testing in Quadrant 4, wherein we will need to test continuously. We describe Quadrant 3 testing in Release, reminding us that there is virtually always some final system validation, and yes, even some documentation, that has to be readied before we deploy to the end user.

Test-Driven Development

Beck [3] and others have defined a set of XP practices described under the umbrella label of Test-Driven Development, or TDD. In TDD, the focus is on writing the unit test before writing the code. For many, TDD is simply an assumed discipline in agile development. The practice is straightforward in principle:

  1. Write the test first. Writing the test first assures the developer understands the required behavior of the new code.
  2. Run the test, and watch it fail. Because there is as yet no code to be tested, this may seem silly initially, but this accomplishes two useful objectives: it tests the test itself and any test harnesses that hold the test in place, and it illustrates how the system will fail if the code is incorrect.
  3. Write the minimum amount of code that is necessary to pass the test. If the test fails, rework the code or the test as necessary until a module is created that routinely passes the test.
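The three steps above can be sketched with Python’s standard-library unittest module. This is only an illustration under stated assumptions: the leap-year function, the test names, and the small run helper are hypothetical and do not come from the XP or SAFe material.

```python
import unittest


def run(test_case):
    """Run one TestCase class and return the unittest result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(test_case)
    result = unittest.TestResult()
    suite.run(result)
    return result


# Placeholder so the test can be loaded before the real code exists.
def is_leap_year(year):
    raise NotImplementedError("not written yet")


# Step 1: write the test first. It states the required behavior.
class LeapYearTest(unittest.TestCase):
    def test_year_divisible_by_four_is_leap(self):
        self.assertTrue(is_leap_year(2024))

    def test_century_year_is_not_leap(self):
        self.assertFalse(is_leap_year(1900))

    def test_year_divisible_by_400_is_leap(self):
        self.assertTrue(is_leap_year(2000))


# Step 2: run the test and watch it fail. This also exercises the harness.
red = run(LeapYearTest)
# red.wasSuccessful() is False here: all three tests report an error.


# Step 3: write the minimum code needed to pass the test.
def is_leap_year(year):
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)


green = run(LeapYearTest)
# green.wasSuccessful() is True here.
```

Keeping the failing run (red) next to the passing run (green) shows why step 2 matters: a test that cannot fail gives no information about the code.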

In XP, this practice was primarily designed to operate in the context of unit tests, which are developer-written tests (also code) that test the classes and methods that are used. These are a form of “white-box testing” because they test the internals of the system and the various code paths that may be executed. Pair programming is used extensively as well, so you can well imagine that when two sets of eyes have seen the code and the tests, it’s probable that the module is of extremely high quality. Even when not pairing, the test is “the first other set of eyes” that see the code, and developers note that they often Refactor the code in order to pass the test as simply and elegantly as possible. This is quality at the source, and it is one of the main reasons that SAFe is so reliant on TDD.

Unit Testing

Most TDD is done in the context of unit testing, which keeps QA and test personnel from spending most of their time finding and reporting on code-level bugs. This frees them to focus on system-level testing challenges, where more complex behaviors emerge from the interactions of the unit code modules. In support of this value system, the open source community has built unit testing frameworks for most languages and technologies a developer is likely to encounter, including Java, C, C#, C++, XML, HTTP, and Python. These frameworks provide a harness for the development and maintenance of unit tests and for automatically executing unit tests against the system under development.

Because the unit tests are written before or concurrently with the code, and because the unit testing frameworks include test execution automation, unit testing can be accomplished within the Iteration. Moreover, the unit test frameworks hold and manage the accumulated unit tests, so regression testing automation for unit tests is largely free for the team. Unit testing is a cornerstone practice of software agility, and any investments a team makes toward more comprehensive unit testing will be well rewarded in quality and productivity.

Component Testing

In a like manner, component testing is used to test larger-scale components of the system. Many of these are present in various architectural layers, where they provide services needed by features or other components. Testing tools and practices for implementing component tests vary according to the nature of the component. For example, unit testing frameworks can hold arbitrarily complex tests written in the framework language (Java, C, C#, and so on), so many teams use their unit testing frameworks to build component tests. They may not even think of them differently, as it’s simply part of their testing strategy. Acceptance testing frameworks (see below), especially those at the level of HttpUnit and XMLUnit, are also employed. In other cases, developers may use other testing tools or write fully customized tests in any language or environment that is most productive for them to test these larger system behaviors. These tests are automated as well, where they serve as a primary defense against unanticipated consequences of refactoring and new code.
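As a minimal sketch of a component test held in a unit testing framework, the example below tests a service together with the collaborator it depends on. The OrderService and InMemoryInventory classes and the stock figures are hypothetical, invented only to show the shape of such a test.

```python
import unittest


class InMemoryInventory:
    """Hypothetical lower-layer service, here an in-memory stand-in."""

    def __init__(self, stock):
        self._stock = dict(stock)

    def available(self, sku):
        return self._stock.get(sku, 0)

    def remove(self, sku, quantity):
        self._stock[sku] = self.available(sku) - quantity


class OrderService:
    """Hypothetical component under test, built on top of the inventory."""

    def __init__(self, inventory):
        self._inventory = inventory

    def place_order(self, sku, quantity):
        if quantity <= 0 or self._inventory.available(sku) < quantity:
            return False
        self._inventory.remove(sku, quantity)
        return True


class OrderServiceComponentTest(unittest.TestCase):
    def setUp(self):
        self.inventory = InMemoryInventory({"widget": 2})
        self.orders = OrderService(self.inventory)

    def test_order_within_stock_is_accepted_and_reduces_stock(self):
        self.assertTrue(self.orders.place_order("widget", 2))
        self.assertEqual(self.inventory.available("widget"), 0)

    def test_order_beyond_stock_is_rejected_and_leaves_stock_alone(self):
        self.assertFalse(self.orders.place_order("widget", 3))
        self.assertEqual(self.inventory.available("widget"), 2)


def run_component_tests():
    """Run the component tests and return the unittest result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(
        OrderServiceComponentTest
    )
    result = unittest.TestResult()
    suite.run(result)
    return result
```

The test asserts on the combined behavior of both classes, which is what distinguishes it from a unit test of either class alone, even though the framework is the same.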

Acceptance Test-Driven Development

In Quadrant 2 of the Agile Testing Matrix, we see that the philosophy of Test-First applies equally well to testing stories and features as it does to unit testing. After all, the goal is to have the story work as intended, not to simply have the code do what we intended it to do. This is called Acceptance Test-Driven Development (ATDD), and whether it is adopted formally or informally, many teams simply find it more efficient to write the acceptance test first, before developing the code. Pugh [4] notes that the emphasis here can be viewed more as expressing requirements in unambiguous terms than as a focus on the test, per se. He further notes that there are three alternative labels for this requirements-detailing process: ATDD, Specification by Example, and Behavior-Driven Design. There are some slight differences among these three versions, but they all emphasize understanding requirements prior to implementation. In particular, Specification by Example suggests that the Product Owner should be sure to provide examples, as they often do not write the acceptance tests themselves.

Whether it’s viewed as a form of requirements expression, or as a test, the understanding that results is the same. The acceptance tests serve to record the decisions made in the conversation (see user story card, conversation, confirmation) between the team and the product owner, so that the team understands the specifics of the intended behavior the story represents.

Story Acceptance Tests

Story acceptance tests are functional tests intended to assure that the implementation of each new user story delivers the intended behavior. If all the new stories work as intended, then it’s likely that each new increment of software will ultimately satisfy the needs of the users. Story Acceptance tests:

  • Are written in the language of the business domain
  • Are developed in a conversation between the developers, testers, and the product owner
  • Are black-box tests in that they verify only that the outputs of the system meet the conditions of satisfaction, without concern for how the result is achieved
  • Are implemented during the course of the iteration in which the story itself is implemented

Although everyone can write tests, the Product Owner, as business owner/customer proxy, is generally responsible for the efficacy of the tests. If a story does not pass its test, the team gets no credit for the story, and the story is carried over into the next iteration, where the code or the test, or both, are reworked until the test passes.
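A story acceptance test with the characteristics listed above might look like the following sketch. The story, the Cart class, and the SAVE10 discount rule are all hypothetical. The test is phrased in business terms and touches only the public interface, so it stays black-box.

```python
import unittest


class Cart:
    """Hypothetical system under test; only its public methods are used."""

    def __init__(self):
        self._items = []
        self._discount_percent = 0

    def add_item(self, name, price_cents):
        self._items.append((name, price_cents))

    def apply_discount_code(self, code):
        # Illustrative rule: one known code gives 10 percent off.
        if code == "SAVE10":
            self._discount_percent = 10
            return True
        return False

    def total_cents(self):
        subtotal = sum(price for _, price in self._items)
        return subtotal * (100 - self._discount_percent) // 100


class ShopperAppliesDiscountCode(unittest.TestCase):
    """Story (hypothetical): as a shopper, I can apply a discount code."""

    def test_valid_code_reduces_the_total(self):
        # Given a cart containing one item priced at 20.00
        cart = Cart()
        cart.add_item("book", 2000)
        # When the shopper applies a valid discount code
        accepted = cart.apply_discount_code("SAVE10")
        # Then the code is accepted and the total becomes 18.00
        self.assertTrue(accepted)
        self.assertEqual(cart.total_cents(), 1800)

    def test_unknown_code_leaves_the_total_unchanged(self):
        # Given a cart containing one item priced at 20.00
        cart = Cart()
        cart.add_item("book", 2000)
        # When the shopper applies a code the system does not know
        accepted = cart.apply_discount_code("BOGUS")
        # Then the code is rejected and the total stays at 20.00
        self.assertFalse(accepted)
        self.assertEqual(cart.total_cents(), 2000)


def run_story_tests():
    """Run the story acceptance tests and return the result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(
        ShopperAppliesDiscountCode
    )
    result = unittest.TestResult()
    suite.run(result)
    return result
```

Each Given/When/Then comment records one condition of satisfaction from the conversation, so the Product Owner can review the test without reading the Cart implementation.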

Feature Acceptance Tests

In a similar manner, feature acceptance testing is performed for all features that are implemented during the course of a Program Increment. The tools used are typically the same, but these tests operate at the next level of abstraction, typically testing how some number of stories work together to deliver a larger value to the user. Of course, we can easily have multiple feature acceptance tests associated with a more complex feature, and the same goes for user stories as well. In this manner, we have strong verification that the system works as intended, at both the feature and story levels, as is illustrated by Figure 2.


Figure 2. Acceptance Testing Verification


1. Features and stories cannot be considered done until they pass one or more acceptance tests.
2. Stories realize the intended features.
3. There can be more than one test associated with a particular feature or story.
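The relationships in the three notes above can be modeled as a small data structure. This is a sketch only: the class names are hypothetical, and the rule that a feature also needs all of its stories done is an assumption added for illustration, not something the notes state.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class AcceptanceTest:
    name: str
    passed: bool = False


@dataclass
class Story:
    title: str
    tests: List[AcceptanceTest] = field(default_factory=list)

    def is_done(self) -> bool:
        # Notes 1 and 3: one or more tests, and every one of them passes.
        return bool(self.tests) and all(t.passed for t in self.tests)


@dataclass
class Feature:
    title: str
    stories: List[Story] = field(default_factory=list)  # note 2
    tests: List[AcceptanceTest] = field(default_factory=list)

    def is_done(self) -> bool:
        own_tests_pass = bool(self.tests) and all(t.passed for t in self.tests)
        # Assumption: the feature is not done while any story is unfinished.
        return own_tests_pass and all(s.is_done() for s in self.stories)


# Hypothetical example data.
login = Story("log in", [AcceptanceTest("valid password", True),
                         AcceptanceTest("locked account", True)])
reset = Story("reset password", [AcceptanceTest("email link", False)])
access = Feature("account access", [login, reset],
                 [AcceptanceTest("end-to-end sign-in", True)])

done_before = access.is_done()   # False: one story test has not passed
reset.tests[0].passed = True
done_after = access.is_done()    # True: all story and feature tests pass
```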

Automating Acceptance Testing

Because acceptance tests run at a level above the code, there are a variety of approaches to executing these tests, including manual tests. However, manual tests pile up very quickly (the faster you go, the faster they grow, the slower you go), and eventually, the number of manual tests required to run a regression slows down the team and introduces major delays in value delivery.

To avoid this, teams know that they have to automate most of their acceptance tests. They use a variety of tools to do so: they may write the tests in the target programming language (Perl, Groovy, Java), in natural language as supported by specific testing frameworks such as Robot Framework or Cucumber, or in table formats as supported by the Framework for Integrated Test (Fit). The preferred approach is to take a high level of abstraction that works directly against the business logic of the application, and thereby not be as encumbered by the presentation layer or other implementation details.
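The table-style approach can be approximated in plain Python, as in the sketch below. It does not use Fit, Cucumber, or Robot Framework. It only imitates the idea of a table of examples checked directly against business logic. The shipping rule and its amounts are hypothetical.

```python
import unittest


def shipping_fee_cents(order_total_cents):
    """Hypothetical business rule: orders of 50.00 or more ship free;
    smaller orders pay a flat 4.95."""
    return 0 if order_total_cents >= 5000 else 495


# Each row: (order total in cents, expected shipping fee in cents).
# A Product Owner could review or extend this table without reading code.
SHIPPING_EXAMPLES = [
    (0, 495),
    (4999, 495),
    (5000, 0),
    (12000, 0),
]


class ShippingFeeAcceptanceTest(unittest.TestCase):
    def test_every_row_of_the_table(self):
        for order_total, expected_fee in SHIPPING_EXAMPLES:
            # subTest reports each failing row separately.
            with self.subTest(order_total=order_total):
                self.assertEqual(shipping_fee_cents(order_total), expected_fee)


def run_shipping_tests():
    """Run the table-driven test and return the unittest result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(
        ShippingFeeAcceptanceTest
    )
    result = unittest.TestResult()
    suite.run(result)
    return result
```

Because the test calls the business rule directly, it keeps running unchanged when the presentation layer is redesigned.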

Acceptance Test Template/Checklist

Whenever a story is selected for implementation, usually during backlog refinement, teams create an acceptance test that further refines the details of a new story and defines the conditions of satisfaction that constitute acceptance by the product owner. In addition, in the context of a team and a current iteration, the domain of the story is pretty well established, and certain patterns of activities result, which can guide the team to the work necessary to get the story accepted into the baseline. To assist in this process, it can be convenient for the team to have an ATDD checklist, a simple list of things to consider, to fill out, review, and discuss each time a new story appears. ASR [2] provides an example of such a story acceptance testing checklist.


Learn More

[1] Crispin, Lisa, and Janet Gregory. Agile Testing: A Practical Guide for Testers and Agile Teams. Upper Saddle River, NJ: Addison-Wesley, 2009.

[2] Leffingwell, Dean. Agile Software Requirements: Lean Requirements Practices for Teams, Programs, and the Enterprise. Addison-Wesley, 2011.

[3] Beck, Kent. Test-Driven Development. Boston, MA: Addison-Wesley, 2003.

[4] Pugh, Ken. Lean-Agile Acceptance Test-Driven Development: Better Software Through Collaboration. Addison-Wesley, 2011.


Last update: 24 July, 2014

The information on this page is © 2010-2014 Leffingwell, LLC. and is protected by US and International copyright laws. Neither images nor text can be copied from this site without the express written permission of the copyright holder.