Drive Requirements to Testing with BDD to Deliver AI
Subtitle: Successful AI Projects are Focused on Outcomes, Not Simply Outputs
We have been using Behavior-Driven Development (BDD) over the past several years to test microservices developed as part of our consulting engagements. Initially, these were for Enterprise Search projects, but more recently, they have been for AI-injected endpoints driving Copilots and Agents.
We chose BDD because it helps bridge the gap between various stakeholders by allowing us to express requirements in a simple form that focuses on purpose. BDD's natural language scenarios (e.g., Given-When-Then) map naturally onto an LLM transaction: the context supplied, the request made, and the response expected.
BDD’s notion of living documentation is quite useful for maintaining project requirements and performing regression testing as models and data change, causing changes to the AI inference responses.
This approach provides the basis for your requirements and living documentation in the ever-evolving AI project. Given AI’s somewhat chatty nature, AI itself can also be woven into this process to revise and expand test coverage.
Requirements to Testing
As the diagram shows, the scope of the SOW (or an initiative) breaks down logically into BDD features. Features correspond to “User Stories” or “Use Cases”.
Step 1 - Organizing the Approach
A Scenario is defined in Wikipedia as:
“In computing, a scenario is a narrative of foreseeable interactions of user roles (known in the Unified Modeling Language as ‘actors’) and the technical system, which usually includes computer hardware and software.”
Actors are defined in Wikipedia as:
“Actors may represent roles played by human users, external hardware, or other subjects. Actors do not necessarily represent specific physical entities but merely particular facets (i.e., ‘roles’) of some entities that are relevant to the specification of their associated use cases. A single physical instance may play the role of several different actors, and a given actor may be played by multiple different instances.”
For one of our projects, we have focused on the following themes:
Step 2 - Write the Features
Features can be written in Gherkin. An example of a simple feature is below. This is the form our requirements are written in (a feature plus scenarios). A simple way to write a feature is to remember the following:
Final Form:
# -- FILE: features/objective_coding.feature
Feature: Objective Coding
  Objective coding tags documents 'objectively' or factually as opposed to subjectively. As documents are processed, structured data is extracted and added to the document before indexing in the vector database.

  Rule: The original author should not come from metadata if the PDF declares an author

    Scenario Outline: Identify the original author of a PDF document
      Given an asset "<asset>"
      And the extension is ".pdf"
      When the document is processed
      Then the asset has a metadata field "metadata.oc.author" with the value of "<name>"
      And the author is not the "author" field of the document

      Examples:
        | asset                                                     | name         |
        | Electronic Media/Box 23/Folder 1/001_Technical_Report.pdf | David Rhodes |
Step 3 - Refactor to match previous steps
With Gherkin and Cucumber, the text of each step is matched against a function. Where possible, draw on existing step functions that are already in a well-defined, testable form, and reword new steps to match them. Reusing functions in this way is also a best practice.
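To make the matching concrete, here is a minimal, framework-free sketch of how step text can be bound to a function. The `step` decorator, `STEP_REGISTRY`, and `run_step` are hypothetical names used only for illustration; Cucumber and similar tools ship their own decorators and matchers, which this does not attempt to reproduce.

```python
import re
from typing import Callable

# Hypothetical registry of (compiled pattern, function) pairs, in registration order.
STEP_REGISTRY: list = []


def step(pattern: str) -> Callable:
    """Register a function for any step whose full text matches `pattern`."""
    def decorator(func: Callable) -> Callable:
        STEP_REGISTRY.append((re.compile(pattern), func))
        return func
    return decorator


def run_step(context: dict, text: str) -> None:
    """Call the first registered function whose pattern matches the step text."""
    for pattern, func in STEP_REGISTRY:
        match = pattern.fullmatch(text)
        if match:
            # Captured groups become the step function's arguments.
            func(context, *match.groups())
            return
    raise LookupError(f"No step definition matches: {text!r}")


@step(r'the extension is "([^"]+)"')
def check_extension(context: dict, extension: str) -> None:
    assert context["path"].endswith(extension), context["path"]


context = {"path": "Electronic Media/Box 23/Folder 1/001_Technical_Report.pdf"}
run_step(context, 'the extension is ".pdf"')
```

Because the pattern captures the quoted value, the same function serves any extension, which is why rewording a new step to fit an existing pattern is usually cheaper than writing another function.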
Step 4 - Write the Steps
For all of the Steps that have not been completed, write the implementation.
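As a sketch of what one such implementation might check, the functions below verify the two `Then` conditions from the Objective Coding scenario against a processed asset. The asset is assumed to be a nested dictionary, and `get_field` and `assert_original_author` are hypothetical helpers; in a real suite this logic would sit inside the step functions of whichever framework you use.

```python
from typing import Any


def get_field(asset: dict, dotted_path: str) -> Any:
    """Resolve a dotted path such as "metadata.oc.author" in a nested dict."""
    value: Any = asset
    for key in dotted_path.split("."):
        if not isinstance(value, dict) or key not in value:
            raise KeyError(f"Missing field {dotted_path!r} (stopped at {key!r})")
        value = value[key]
    return value


def assert_original_author(asset: dict, expected_name: str) -> None:
    """Check the extracted author and that it differs from the "author" field."""
    extracted = get_field(asset, "metadata.oc.author")
    assert extracted == expected_name, f"expected {expected_name!r}, got {extracted!r}"
    # Mirrors the step: the author is not the "author" field of the document.
    assert extracted != asset.get("author"), "author was copied from the author field"


# Assumed shape of a processed asset; the "author" value is illustrative only.
processed = {
    "author": "scanner-service",
    "metadata": {"oc": {"author": "David Rhodes"}},
}
assert_original_author(processed, "David Rhodes")
```

Keeping the assertion logic in plain functions like these makes it testable on its own, independent of the BDD runner.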
Why Use BDD for AI Testing?
1 - Behavior Drives Development (Pun)
As the name implies, focusing on the behavior you want to replicate and automate, and expressing it in a structured form, will help your project team focus. AI projects often involve diverse teams, including data scientists, developers, business stakeholders, and domain experts. BDD's natural language scenarios (e.g., Given-When-Then) help:
2 - Test Outcomes, Not Just Outputs
Your entire business is focused on producing improved outcomes. The challenge in doing so with AI is that it can do "anything". As we discussed in our recent webinar, there are characteristics of successful projects that you want to emulate. Your projects and outcomes are not as simple as "is this a cat?". AI systems, being probabilistic, often deliver outcomes rather than deterministic results. BDD is well-suited for:
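One way to assert on an outcome rather than an exact output is to run the same scenario several times and require a minimum pass rate. The sketch below assumes a model call that returns text; `fake_generate` is a deterministic stand-in for that call, and the canned responses and the 0.6 threshold are illustrative values, not recommendations.

```python
from typing import Callable


def pass_rate(generate: Callable[[str], str], prompt: str,
              is_acceptable: Callable[[str], bool], runs: int = 10) -> float:
    """Call the model `runs` times and return the fraction of acceptable responses."""
    passed = sum(1 for _ in range(runs) if is_acceptable(generate(prompt)))
    return passed / runs


# Stand-in for a model call: cycles through canned responses deterministically.
_canned = ["The author is David Rhodes.", "Author: David Rhodes", "I am not sure."]
_calls = {"count": 0}


def fake_generate(prompt: str) -> str:
    response = _canned[_calls["count"] % len(_canned)]
    _calls["count"] += 1
    return response


def names_expected_author(response: str) -> bool:
    # Outcome check: the right author is named, whatever the exact wording.
    return "david rhodes" in response.lower()


rate = pass_rate(fake_generate, "Who wrote the technical report?",
                 names_expected_author, runs=9)
assert rate >= 0.6, f"pass rate too low: {rate:.2f}"
```

The acceptance function encodes what the business cares about (the correct author is named), so changes in wording between model versions do not break the test, while a drop in the pass rate still does.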
3 - Ensure Explainability
BDD scenarios can incorporate explainability requirements, ensuring the AI behaves in ways that are interpretable and align with business objectives.
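As an illustration of an explainability requirement expressed as a check, the sketch below assumes a response dictionary with `citations` and `rationale` keys and a set of document IDs that were actually retrieved. That shape and the name `check_explainable` are assumptions made for this example, not a description of any particular system.

```python
def check_explainable(response: dict, retrieved_ids: set) -> list:
    """Return a list of problems; an empty list means every check passed."""
    problems = []
    citations = response.get("citations", [])
    if not citations:
        problems.append("response cites no sources")
    # Every cited document must be one the system actually retrieved.
    unknown = [c for c in citations if c not in retrieved_ids]
    if unknown:
        problems.append(f"cites documents that were not retrieved: {unknown}")
    if not response.get("rationale", "").strip():
        problems.append("response has no rationale text")
    return problems


# Hypothetical response and retrieval set, for illustration only.
response = {
    "answer": "David Rhodes",
    "citations": ["doc-1"],
    "rationale": "The PDF title page names the author.",
}
assert check_explainable(response, {"doc-1", "doc-2"}) == []
```

A `Then` step can call a check like this and fail with the returned problem list, which keeps the explainability requirement visible in the scenario text.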
Challenges and Considerations
While BDD is powerful, applying it to AI has unique challenges:
Tools for BDD in AI Testing
Several tools can support the implementation of BDD for AI systems:
Conclusion
Using BDD to test AI systems is an innovative approach that combines the collaborative nature of BDD with the need for rigorous AI testing. By defining clear, understandable scenarios, BDD makes AI testing more accessible, structured, and aligned with business goals. While there are challenges, adopting BDD for AI testing is a step toward making AI systems more reliable, ethical, and effective.
As AI continues to integrate into software systems, leveraging methodologies like BDD ensures that these systems behave as intended, fostering trust and delivering real value. As always, reach out to our team if you would like assistance in your AI initiatives.