Drive Requirements to Testing with BDD to Deliver AI

Subtitle: Successful AI Projects are Focused on Outcomes, Not Simply Outputs

For the past several years, we have used Behavior-Driven Development (BDD) to test the microservices we build in our consulting engagements. Initially these were Enterprise Search projects; more recently they have been AI-powered endpoints that drive Copilots and Agents.

We chose BDD because it helps bridge the gap between stakeholders: it lets us express requirements in a simple form that focuses on purpose. BDD's natural-language scenarios (Given-When-Then) map cleanly onto an LLM transaction:

  • Given – Context
  • When – Action
  • Then – Result
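As a concrete illustration, here is a minimal sketch of one Given-When-Then treated as an LLM transaction. The Given supplies context (here, a system prompt), the When sends the user's message, and the Then checks the result. `fake_llm` and `LLMTransaction` are hypothetical names; `fake_llm` stands in for whatever model client your project actually uses.

```python
from dataclasses import dataclass, field
from typing import Callable


def fake_llm(system_prompt: str, user_message: str) -> str:
    """Hypothetical stand-in for a real model call; replace with your own client."""
    if "refund" in user_message.lower():
        return "Refunds are issued within 14 days of purchase."
    return "I don't know."


@dataclass
class LLMTransaction:
    given: str                    # Given: context, e.g. a system prompt or retrieved documents
    when: str                     # When: the action, e.g. the user's question
    then: Callable[[str], bool]   # Then: a check applied to the model's response
    response: str = field(default="", init=False)

    def run(self, model: Callable[[str, str], str]) -> bool:
        # Send the context and action to the model, keep the response, then judge it.
        self.response = model(self.given, self.when)
        return self.then(self.response)


txn = LLMTransaction(
    given="You are a support assistant. Policy: refunds within 14 days.",
    when="How long do I have to request a refund?",
    then=lambda answer: "14 days" in answer,
)
print(txn.run(fake_llm))  # True
```

Keeping the response on the transaction makes it available for reporting when a Then check fails.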

BDD's notion of living documentation is especially useful for maintaining project requirements and for regression testing, because changes to models and data alter the AI's inference responses.

This approach gives an ever-evolving AI project a basis for both its requirements and its living documentation. And given AI's conversational nature, an LLM can itself be used within this process to revise scenarios and expand test coverage.

Requirements to Testing

Figure: A top-down approach to scoping and breaking down a project

As the diagram shows, the scope of the SOW (statement of work), or of an initiative, breaks down logically into BDD features. Each feature is a user story or use case.

Step 1 - Organize the Approach

Wikipedia defines a scenario as:

"In computing, a scenario is a narrative of foreseeable interactions of user roles (known in the Unified Modeling Language as 'actors') and the technical system, which usually includes computer hardware and software."

It defines actors as:

"Actors may represent roles played by human users, external hardware, or other subjects. Actors do not necessarily represent specific physical entities but merely particular facets (i.e., 'roles') of some entities that are relevant to the specification of their associated use cases. A single physical instance may play the role of several different actors, and a given actor may be played by multiple different instances."

For one of our projects, we have focused on the following themes:

  • Indexing
  • Interfaces
  • Analytics
  • Relevancy
  • Operations
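The breakdown from scope to themes to features to scenarios is a tree, and modeling it that way makes coverage gaps easy to spot. This is a minimal sketch: the `Feature` and `Theme` classes and the `scenario_counts` helper are hypothetical, and placing the Objective Coding feature (from Step 2) under Indexing is our assumption for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Feature:
    name: str                                           # a user story or use case
    scenarios: list[str] = field(default_factory=list)  # scenario titles


@dataclass
class Theme:
    name: str
    features: list[Feature] = field(default_factory=list)


def scenario_counts(themes: list[Theme]) -> dict[str, int]:
    """Count scenarios per theme so themes with thin coverage stand out."""
    return {t.name: sum(len(f.scenarios) for f in t.features) for t in themes}


project = [
    Theme("Indexing", [Feature("Objective Coding",
                               ["Identify the original author of a PDF document"])]),
    Theme("Interfaces"),
    Theme("Analytics"),
    Theme("Relevancy"),
    Theme("Operations"),
]
print(scenario_counts(project))
# {'Indexing': 1, 'Interfaces': 0, 'Analytics': 0, 'Relevancy': 0, 'Operations': 0}
```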

Step 2 - Write the Features

Features are written in Gherkin, and this is the form our requirements take: a feature plus its scenarios. A simple example follows. When writing a feature, remember the following:

  1. A feature is a user story or a use case.
  2. A scenario describes an observed behavior from the user's (or actor's) perspective.
  3. A Given step sets up the context.
  4. A When step is the action.
  5. A Then step is the outcome to be tested.

Final Form:

# -- FILE: features/objective_coding.feature
Feature: Objective Coding

  Objective coding tags documents 'objectively' (factually) rather than subjectively. As documents are processed, structured data is extracted and added to each document before it is indexed in the vector database.

  Rule: The original author should not come from the metadata if the PDF declares an author

    Scenario Outline: Identify the original author of a PDF document
      Given the document "<asset>"
      And the extension is ".pdf"
      When the document is processed
      Then the asset has a metadata field "metadata.oc.author" with the value of "<name>"
      And the author is not the "author" field of the document

      Examples:
        | asset                                                     | name         |
        | Electronic Media/Box 23/Folder 1/001_Technical_Report.pdf | David Rhodes |

Step 3 - Refactor to Match Existing Steps

With Gherkin and Cucumber, the text of each step is matched against a step definition function. Wherever possible, word new steps so they match existing step definitions: those functions are already in a well-defined, testable form, and reusing them is a best practice.
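To show why rewording pays off, here is a toy step matcher. It is not Cucumber's implementation: it simply registers functions under regular expressions and, like Cucumber, matches on the step text without the Given/When/Then keyword. The `step` decorator and `run_step` helper are hypothetical names.

```python
import re

# Registry of (compiled pattern, step function) pairs.
STEP_REGISTRY = []


def step(pattern):
    """Register a step function under a regex, in the spirit of a step definition."""
    def decorator(func):
        STEP_REGISTRY.append((re.compile(pattern), func))
        return func
    return decorator


def run_step(text, context):
    """Call the single step definition whose pattern matches the entire step text."""
    matches = [(p, f) for p, f in STEP_REGISTRY if p.fullmatch(text)]
    if not matches:
        raise LookupError(f"Undefined step: {text!r}")
    if len(matches) > 1:
        raise LookupError(f"Ambiguous step: {text!r}")
    pattern, func = matches[0]
    # Captured groups become the step function's arguments.
    func(context, *pattern.fullmatch(text).groups())


@step(r'the extension is "(.+)"')
def extension_is(context, ext):
    context["extension"] = ext


# Reusing existing wording binds to the existing, already-tested function...
ctx = {}
run_step('the extension is ".docx"', ctx)
print(ctx)  # {'extension': '.docx'}

# ...while a near-synonym finds no definition and would need new code.
try:
    run_step('the file type is ".pdf"', ctx)
except LookupError as err:
    print(err)  # Undefined step: 'the file type is ".pdf"'
```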

Step 4 - Write the Steps

For every step that does not yet have a matching definition, write the implementation.
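Here is a sketch of step implementations for the Objective Coding scenario, run by hand against one row of its Examples table. Everything other than the step wording, the `metadata.oc.author` field, and the example values is hypothetical: `load_document` and `process_document` stand in for a real document store and pipeline, and the assumption that the PDF declares its author on a "Prepared by:" line is ours. `context` is a plain dict playing the role of behave's shared `context` object.

```python
# Hypothetical stand-ins for the real document store and processing pipeline.
def load_document(asset):
    return {
        "path": asset,
        "author": "Scan Station 4",  # author field from PDF metadata (assumed unreliable)
        "text": "Technical Report\nPrepared by: David Rhodes\n",
        "metadata": {},
    }


def process_document(doc):
    """Toy objective coding: take the author declared in the document text."""
    for line in doc["text"].splitlines():
        if line.startswith("Prepared by:"):
            doc["metadata"].setdefault("oc", {})["author"] = line.split(":", 1)[1].strip()
    return doc


def get_field(doc, dotted_path):
    """Resolve a dotted path such as 'metadata.oc.author' against nested dicts."""
    value = doc
    for key in dotted_path.split("."):
        value = value[key]
    return value


# Step implementations; each takes the shared context first.
def given_document(context, asset):
    context["doc"] = load_document(asset)


def given_extension(context, ext):
    assert context["doc"]["path"].lower().endswith(ext)


def when_processed(context):
    context["doc"] = process_document(context["doc"])


def then_field_has_value(context, field_path, expected):
    actual = get_field(context["doc"], field_path)
    assert actual == expected, f"{field_path}: expected {expected!r}, got {actual!r}"


def then_author_not_from_metadata(context):
    assert get_field(context["doc"], "metadata.oc.author") != context["doc"]["author"]


# One row of the Examples table, run step by step.
ctx = {}
given_document(ctx, "Electronic Media/Box 23/Folder 1/001_Technical_Report.pdf")
given_extension(ctx, ".pdf")
when_processed(ctx)
then_field_has_value(ctx, "metadata.oc.author", "David Rhodes")
then_author_not_from_metadata(ctx)
print("scenario passed")
```

In a real project a framework such as behave or pytest-bdd would bind these functions to the step text and feed in each Examples row.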

Why Use BDD for AI Testing?

1 - Behavior Drives Development (Pun)

As the name implies, capturing the behavior you want to replicate and automate in a structured form helps your project team stay focused. AI projects often involve diverse teams, including data scientists, developers, business stakeholders, and domain experts. BDD's natural-language scenarios (Given-When-Then) help:

  • Align stakeholders on AI system goals and behaviors.
  • Ensure requirements are clear, measurable, and understood by all parties.

2 - Test Outcomes, Not Just Outputs

Your entire business is focused on producing better outcomes. The challenge in doing so with AI is that it can do "anything". As we discussed in our recent webinar, successful projects share characteristics you want to emulate. Your projects and outcomes are not as simple as "Is this a cat?" Because AI systems are probabilistic, they often produce variable outcomes rather than deterministic results. BDD is well suited for:

  • Describing the desired behavior in terms of business outcomes.
  • Defining acceptance criteria that account for variability, such as thresholds for accuracy.
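An acceptance criterion that tolerates variability can assert a threshold over a labeled set instead of an exact answer. This is a minimal sketch: `classify`, the labeled examples, and the 0.7 threshold are all hypothetical.

```python
from typing import Callable


def accuracy(predict: Callable[[str], str], examples: list[tuple[str, str]]) -> float:
    """Fraction of labeled examples where the prediction matches the expected label."""
    correct = sum(1 for text, expected in examples if predict(text) == expected)
    return correct / len(examples)


def then_accuracy_at_least(predict, examples, threshold: float) -> None:
    """A Then step that passes when accuracy meets the threshold, not on every item."""
    score = accuracy(predict, examples)
    assert score >= threshold, f"accuracy {score:.2f} below threshold {threshold:.2f}"


def classify(text: str) -> str:
    """Hypothetical classifier standing in for a model call."""
    return "invoice" if "amount due" in text.lower() else "report"


examples = [
    ("Amount due: $400", "invoice"),
    ("Quarterly technical report", "report"),
    ("Total amount due upon receipt", "invoice"),
    ("Invoice #12 attached", "invoice"),  # the toy classifier misses this one
]
print(accuracy(classify, examples))  # 0.75
then_accuracy_at_least(classify, examples, threshold=0.7)  # passes
```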

3 - Ensure Explainability

BDD scenarios can incorporate explainability requirements, ensuring the AI behaves in ways that are interpretable and align with business objectives.
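For example, assuming a retrieval-augmented endpoint that marks citations as `[doc-N]` (a hypothetical format), an explainability requirement can become a Then step: every answer must cite at least one source, and only sources that were actually retrieved. The helper names and sample answer are illustrative.

```python
import re


def cited_sources(answer: str) -> set[str]:
    """Extract citation markers such as [doc-3] from an answer (hypothetical format)."""
    return set(re.findall(r"\[(doc-\d+)\]", answer))


def then_answer_is_grounded(answer: str, retrieved_ids: set[str]) -> None:
    """Fail if the answer cites nothing, or cites documents that were not retrieved."""
    cited = cited_sources(answer)
    assert cited, "answer cites no sources"
    unknown = cited - retrieved_ids
    assert not unknown, f"answer cites documents that were not retrieved: {sorted(unknown)}"


answer = "Refunds are issued within 14 days [doc-1], excluding sale items [doc-2]."
then_answer_is_grounded(answer, retrieved_ids={"doc-1", "doc-2", "doc-7"})
print(sorted(cited_sources(answer)))  # ['doc-1', 'doc-2']
```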

Challenges and Considerations

While BDD is powerful, applying it to AI has unique challenges:

  • Probabilistic Nature of AI: Defining deterministic behaviors for non-deterministic systems can be complex.
  • Dynamic Changes: AI models evolve over time, requiring constant updates to scenarios.
  • Metric-Based Verification: Verifying thresholds (e.g., accuracy) requires robust metric tracking and reporting.
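Metric tracking does not have to start with heavy tooling. This sketch assumes a JSON file as the metric store (a real project might use an experiment tracker instead); the model names and the 0.02 tolerance are hypothetical. Each run is recorded, and a run fails if it falls more than the tolerance below the best previous run for that metric.

```python
import json
import tempfile
from pathlib import Path


def check_regression(history_file: Path, model_version: str, metric: str,
                     value: float, tolerance: float = 0.02) -> bool:
    """Record this run's metric; return whether it is within `tolerance` of the best prior run.

    Comparing against the best run so far (rather than the last run) keeps a slow
    downward drift from hiding behind a moving baseline.
    """
    history = json.loads(history_file.read_text()) if history_file.exists() else []
    previous = [run["value"] for run in history if run["metric"] == metric]
    ok = not previous or value >= max(previous) - tolerance
    history.append({"model": model_version, "metric": metric, "value": value, "passed": ok})
    history_file.write_text(json.dumps(history, indent=2))
    return ok


with tempfile.TemporaryDirectory() as tmp:
    metrics = Path(tmp) / "metrics.json"
    print(check_regression(metrics, "model-v1", "accuracy", 0.91))  # True (no baseline yet)
    print(check_regression(metrics, "model-v2", "accuracy", 0.90))  # True (within 0.02)
    print(check_regression(metrics, "model-v3", "accuracy", 0.85))  # False (regression)
```

The tolerance absorbs run-to-run noise from a probabilistic model while still flagging real regressions when models or data change.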

Tools for BDD in AI Testing

Several tools can support the implementation of BDD for AI systems:

  • Cucumber/Behave/SpecFlow: Popular BDD frameworks that can be adapted for AI testing.
  • pytest-bdd: A BDD plugin for pytest that offers flexibility for Python-based AI projects.

Conclusion

Using BDD to test AI systems is an innovative approach that combines the collaborative nature of BDD with the need for rigorous AI testing. By defining clear, understandable scenarios, BDD makes AI testing more accessible, structured, and aligned with business goals. While there are challenges, adopting BDD for AI testing is a step toward making AI systems more reliable, ethical, and effective.

As AI continues to integrate into software systems, leveraging methodologies like BDD ensures that these systems behave as intended, fostering trust and delivering real value. As always, reach out to our team if you would like assistance in your AI initiatives.
