The Myth of Continuous Performance Testing

Software development is speeding up. We build software in smaller chunks and release more often. In the software testing space, this means testing has to keep up, which has largely driven the growth of functional test automation. But what about performance testing?

Wherever I look online I see mention of 'continuous performance testing'. At a high level this is straightforward: we performance test more often to keep up with this velocity. Often this means building our performance testing into our deployment pipeline. But does automated performance testing actually deliver what it should?

Maintaining Test Suites

In general, load testing assets are fragile. They are more fragile than functional automated test assets because we are mostly simulating network traffic. Depending on the application we are testing, it can be hugely time-consuming and complicated to build a load testing suite.

The age-old problem with load testing assets is that when the application changes, our test suites have a habit of breaking. The effort required to repair or rebuild them can be so substantial that it is not cost-effective in a rapid, iterative life-cycle. From experience, this is especially true of off-the-shelf and legacy applications.

One or more of the following needs to be in place for us to succeed:

  • The application needs to be testable. For a performance tester, this means the network traffic needs to be consistent and simple enough to keep on top of. That's not something we are always in control of if we have purchased an off-the-shelf solution.
  • Otherwise we have to limit what we test. If it's not cost-effective to continually run and maintain a fully integrated test suite, we need to identify what we can test. APIs are often low-hanging fruit, but we need to be careful to keep the overall solution in mind; when we break our testing down to the component level, it loses a lot of its value (see the sketch after this list).
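
To make the second point concrete, here is a minimal sketch of what a lightweight, component-level API check might look like using Locust, a Python load testing tool. The host and the /api/orders endpoint are hypothetical placeholders:

```python
# Minimal component-level API check using Locust (pip install locust).
# The host and endpoint below are hypothetical placeholders.
from locust import HttpUser, task, between


class ApiUser(HttpUser):
    wait_time = between(1, 3)             # simulated users pause 1-3 s between requests
    host = "https://test.example.com"     # hypothetical test environment

    @task
    def list_orders(self):
        # A single API call: cheap to maintain, but it tells us nothing
        # about how the fully integrated solution performs end to end.
        self.client.get("/api/orders", name="GET /api/orders")
```

Run it headless with something like locust -f api_check.py --headless -u 20 -r 5 -t 5m to get a short, scriptable check that fits in a pipeline.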

Load test tool vendors are trying to tell a story about being DevOps and CI/CD ready. The reality is that from a technical viewpoint they have not significantly evolved in over a decade (with a few rare exceptions).

Automated Analysis

So we have some performance testing set up to run every time we deploy. A test is run and some results are recorded. Now what? How do we determine if the test 'passed' or 'failed'?

How do we define pass and fail criteria that mean something to the business? We can define NFRs, but in my experience these are often numbers plucked out of the air without any real connection to what matters. If we have an NFR that response time for all user actions should be 2 seconds or less at the 95th percentile, and one low-impact activity takes 2.1 seconds, should we fail the build?
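
To make the problem concrete, a hard NFR gate boils down to something like the sketch below. The result figures are made up for illustration; the point is that the gate has no notion of business impact:

```python
# Sketch of a hard NFR gate: fail the build if any action's 95th percentile
# response time exceeds 2.0 seconds. The results below are made-up figures
# standing in for whatever your load test tool reports.
NFR_P95_SECONDS = 2.0

results = {
    "login": 1.4,
    "search": 1.8,
    "export report (rarely used)": 2.1,  # low-impact action, just over the line
}

failures = {action: p95 for action, p95 in results.items() if p95 > NFR_P95_SECONDS}
if failures:
    # One marginal breach on a rarely used action blocks the whole deployment.
    raise SystemExit(f"Build failed NFR check: {failures}")
```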

A better approach would be to track performance over time: compare the current run against the past dozen runs and look for degradation. This is tricky to implement, and nothing on the market does it out of the box. Even then, how do you determine when to fail the build? Still, it is possible and a good avenue for future investigation.
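
As a rough sketch of what that comparison could look like, assuming we persist a 95th percentile figure per user action for each run (the history format, the median baseline, and the 20% tolerance are all assumptions for illustration, not a recommendation):

```python
# Sketch: flag any action whose p95 response time in the current run is more
# than 20% slower than the median of its last ~12 runs. The data format and
# tolerance are illustrative assumptions only.
from statistics import median

TOLERANCE = 1.20  # flag anything more than 20% slower than its baseline


def degraded_actions(history: dict[str, list[float]],
                     current: dict[str, float]) -> dict[str, tuple[float, float]]:
    """history maps action -> p95 values from recent runs; current maps
    action -> p95 from this run. Returns {action: (baseline, current_p95)}."""
    flagged = {}
    for action, p95 in current.items():
        past = history.get(action)
        if not past:
            continue  # new action, no baseline to compare against yet
        baseline = median(past)
        if p95 > baseline * TOLERANCE:
            flagged[action] = (baseline, p95)
    return flagged


# Example: 'search' has drifted from ~1.5 s to 2.0 s across builds.
history = {"login": [1.30, 1.40, 1.35], "search": [1.50, 1.50, 1.60]}
current = {"login": 1.38, "search": 2.00}
print(degraded_actions(history, current))  # {'search': (1.5, 2.0)}
```

Even with something like this in place, choosing the tolerance and deciding whether a flagged degradation should actually block a release is still a judgement call that needs a human who understands the business.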

In the meantime, performance testing results need to be analysed by someone who has the skills and experience to make sense of them and communicate them back to the business in a way they understand, especially when we need to diagnose performance issues and do exploratory analysis. If we are going to automate our performance testing we have to account for that manual effort and the need to have someone with those skills. Our automatic validation is only going to scratch the surface at best.

Test Environments

Performance testing results are only accurate if the environment we test in matches (or is) production. The further we deviate from this, the higher the risk that our results do not reflect the real world. Production-like does not just mean the same hardware configuration as production; it also means the same integrations to external components are in place, the software is configured the same way, and the database contains similar data.

So, what if the environment we are testing in is not production-like? What can we say from the results? The best we can do is draw comparisons between builds, and only about response time. On top of that, the response times we observe will not necessarily reflect production, and capacity and stability cannot be measured accurately. It is a good early indication of some performance issues, but that is all.

For any performance testing to provide maximum value we need to test in a production-like environment. For continuous performance testing to work, this means continual deployment into a production-like environment. In an ideal world we would spin up a fully production-like environment at will, but this is not the reality for most businesses. So what environments do we have available to us, and what can we realistically achieve in them?

Test Duration

Classic performance tests generally run for an hour or more, and much longer if you are testing stability. Running a twelve-hour test every time you deploy is hardly pragmatic if you are deploying multiple times a day. So how long should our 'automated' performance tests run for? What about five minutes? That severely limits the value of our testing:

  • We do not get enough sample points to draw meaningful conclusions about performance (see the sketch after this list)
  • We miss out on periodic patterns over time - e.g. a spike in response time every quarter of an hour will not be picked up
  • We are not running for long enough to assess stability (or even capacity; we need time to ramp up and let the system stabilise)
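
The first point is easy to demonstrate with synthetic data. The sketch below simulates many short and long test runs against the same made-up response time distribution and shows how much more the 95th percentile estimate jumps around when the sample is small; the sample counts and distribution are assumptions, not measurements:

```python
# Sketch: how sample size affects the stability of a p95 estimate.
# Response times are synthetic (lognormal), purely for illustration.
import numpy as np

rng = np.random.default_rng(42)


def p95_spread(n_samples: int, n_trials: int = 1000) -> tuple[float, float]:
    """Simulate n_trials test runs of n_samples requests each and return the
    5th and 95th percentile of the resulting p95 estimates (the spread)."""
    estimates = [
        np.percentile(rng.lognormal(mean=0.0, sigma=0.5, size=n_samples), 95)
        for _ in range(n_trials)
    ]
    return float(np.percentile(estimates, 5)), float(np.percentile(estimates, 95))


# Assume ~300 samples of an action in a five-minute run vs ~3600 in an hour.
print("p95 spread,  300 samples:", p95_spread(300))
print("p95 spread, 3600 samples:", p95_spread(3600))
```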

So there's a conflict here. Either we run very short tests which provide only low-accuracy feedback on response time, or we compromise the agility and speed of our deployment schedule.

Something has to give. The conclusion I keep coming back to is that there are some limited things we can test continually, but there is also a place for some 'big bang' performance testing at less frequent milestones.

Closing

My view is that we shouldn't be diving into the concept of 'continuous performance testing' without properly thinking about whether it actually provides value relative to the cost. More than ever we need performance specialists who understand the business risk but also have the technical depth to understand how a solution is performing.

It's not about doing what is possible. It's about doing what provides the business confidence in the performance of their software, efficiently. And that might mean that performance testing needs to sit somewhat apart from the development life-cycle.

What are your thoughts?

James Leatherman

Battle-tested SRE Leader, Incident Commander, and Observability Champion. I can bring balance to your technology team.

4y

Hey Stephen - I am interested to hear how your thoughts have evolved in the time since you originally wrote this article. I am about to bring my company down this road, and I am anticipating a hybrid Jenkins/k6 approach both in the build pipeline for basic validation, and a more production-like environment for the balance of performance activities.

However, I am torn. Actually, I have had doubts about the ROI of large-scale performance test environments for many years, ever since I started concentrating on implementing APMs. I have found that I can catch and triage performance issues faster with a good APM and even limited traffic than I ever could have dreamed of finding even with the most elaborate e2e framework. Cost is a fraction of the latter, too. In fact, at my last company, I did very few hours of performance testing, but working with the system architecture team, brought latency down 40% across the board just by using good monitoring techniques and good communication with development to get quick turnaround.

Another, more anecdotal observation - there just aren't many performance engineers around anymore. I tried to find a good one for my team a few years ago, and it was just script slingers coming in to interview. I'm not sure I could even build out a team at this point - not one that was worth the cost, anyway. It seems that most companies rely on other means to keep performance in line, but I am not sure if that means performance in the pipeline, better monitoring, adhoc testing, hiring performance-minded devs, or what. What are your thoughts these days?

Sharath Biderhalli

Senior Technical Program Manager PMP® Games24x7 | Ex-Jio Platforms/AJIO

4y

Totally agree... but most of the time, the business just wants numbers so it can move on. And lack of environments (a common issue) makes it really difficult to actually perf test anything in depth without affecting business-hours transactions. Endurance testing is out of the question. Ideally, a perf test should break every build to push the threshold in each iteration. It's time the business recognised perf testing as a critical activity of the overall SDLC.

Jothi Gouthaman

Director @ Accenture | Automation |Generative AI | CBDC | Commodity Trading

4y

Now, with increasing cloud adoption, rather than performance testing against an SLA it is better to focus on SLOs and SLIs, and leave the network and infrastructure to the cloud providers under the base contractual SLA.
