Introducing Fitness Box Score

When you use an evolutionary approach to improving your product, service or service delivery, you need a method to evaluate whether a change to the design actually provides an improvement. Whether or not a product or service is fit for its purpose and likely to survive and thrive is always seen through the eyes of the consumer. Traditional approaches to market research and customer satisfaction surveys are not well suited to evolutionary approaches that treat a market as an ecosystem. It is important that a survey doesn't second-guess how or why a consumer is using a product or service. Instead, it is better to use the survey to sense their purpose, needs and expectations.

Back in January I posted our first take on replacing Net Promoter Score for evaluating service delivery using our new Fit For Purpose survey method. Since that initial posting, we and others in our global distribution channel have been experimenting with the technique. During our summer Kanban Leadership Retreat series of consultants' camps we reviewed our results and revised the method. This post represents the official publication of version 2.0 of the Fit For Purpose survey and the new Fitness Box Score reporting method…

Motivation

Firstly, a reminder about the motivation behind Fit For Purpose scoring and why we felt the need to introduce a new method of sampling customer satisfaction:

Fit for Purpose Score was created for two reasons. The first was the need to evaluate whether a product or service that is being developed and improved in an evolutionary, experimental fashion is in fact improving in the eyes of the consumer. The second, related motivation came from client dissatisfaction with the use of Net Promoter Score (NPS), which was often described as lacking actionable guidance. Net Promoter Score would tell a business whether or not its goods or services were more popular than before, but often failed to provide any insight into what could be done to improve a bad score. We also desired to tie the results of customer surveys directly to actionable metrics, or Key Performance Indicators (KPIs). For example, if customers value delivery time, and their threshold of tolerance is 2 days, then the KPI for the service should be Lead Time with a service level expectation (SLE) of 2 days. When metrics are expressed this way and tied directly to customer expectations, we refer to them as Fitness Criteria Metrics and to the level of acceptable performance as the Fitness Criteria Threshold Level, or just "threshold value" for short. Meanwhile, we'd like to provide a single number as an executive report that indicates how "fit for purpose" a given product or service is perceived to be by its consumers.

Fit For Purpose Survey

Question 1. Tell us why you chose our [product or service]? Select up to 3 reasons or objectives you had when choosing our [product or service].

(a) ________________________________________

(b) ________________________________________

(c) ________________________________________

Question 2. For each reason or objective in Question 1, please indicate how fit for purpose you found our [product or service] in fulfilling your expectations. Please score each reason or objective separately using the following scale:

     5. My expectations were exceeded
     4. My expectations were fully met
     3. My expectations were mostly met but a few minor concerns remained
     2. Some significant needs were unaddressed
     1. I got some value but most of my expectations were unmet
     0. I found nothing useful. It was unfit for this purpose

Score (a) _______   (b) _______ (c) ________

Question 3. Tell us why you gave the score(s) in Question 2

(a) ________________________________________

(b) ________________________________________

(c) ________________________________________

What We've Learned From Using Fit For Purpose Surveys

Firstly, we learned that people often consume products or services with multiple purposes or objectives in mind. The survey has now been modified to allow for this. Previously, when we used paper forms, we found respondents would modify the form to allow them to describe multiple purposes. Based on the relatively small sample of surveys we have collected over the past 3 months, we believe that allowing up to 3 purposes is sufficient without becoming overwhelming.

Secondly, we decided to drop the use of Net Fitness Score because net scores hide vital actionable information. It has been replaced with Fitness Box Score [see below].

Thirdly, we have found the results from the surveys to provide actionable guidance. They have also provided valuable insights into why people consume products or services and this has enabled refined market segmentation and more focused marketing messages. We have found it easy even using manual effort to cluster responses to Question 1 to provide a set of purposes, and to cluster responses to Question 3, to provide actionable insights. We can also correlate answers to Question 2 against declared purposes to better understand consumer behavior and what we wish to amplify or dampen through our marketing.

Fourthly, to calculate overall scores, we are currently taking the best score indicated by an individual. We are considering experimenting with explicitly asking for an overall rating separately from specific reasons, purposes or objectives.
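The "best score" rule can be sketched in a few lines of Python. This is only an illustration under assumptions: the function name and the dictionary layout (purpose mapped to Question 2 score) are hypothetical and not part of the survey method itself.

```python
def overall_score(scores_by_purpose):
    """Return a respondent's overall score: the best score they gave
    across the (up to 3) purposes they declared in Question 1."""
    return max(scores_by_purpose.values())


# Hypothetical respondent who declared two purposes.
respondent = {"Networking": 5, "Learn Kanban": 3}
print(overall_score(respondent))  # 5
```

Taking the maximum means a respondent counts as satisfied overall if at least one of their purposes was well served, which is one reason an explicit overall rating may be worth asking for separately.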

We have also found that people aren't always honest about their reasons. Some purposes can be personally embarrassing and individuals are reluctant to admit them. For example, some people attending our conferences are really just having a paid vacation at their employer's expense and are uncomfortable admitting it.

Finally, a word of caution: this approach is still nascent and evolving as we learn from its usage. A recent survey conducted in Spain after a training class produced no result better than 3, with the somewhat philosophical response that "no purpose could ever be truly and completely fulfilled." Whether these Spanish philosophers are an outlier and an anomaly remains to be seen.

Also, we developed this survey for use with services. It isn't clear yet how it will play out with physical products such as lawnmowers or toothbrushes. What is certain is that we will keep learning from the use of Fit For Purpose Surveys, and we expect some further refinements in the method.

Fitness Box Score

Fitness Box Score replaces Net Fitness Score. Scores of 4 or 5 are considered "fit for purpose". Scores of 3 are considered "neutral": the consumer is likely to be sufficiently happy that they may provide repeat business or recommendations, and certainly won't speak badly of our product or service, but they are offering us potential ways to improve and refine the product or service, if we can adequately capture their responses to Question 3. Scores of 2 or below are considered "unfit for purpose."

Previously, Net Fitness Score was modeled after Net Promoter Score, where the net score is the percentage of responses indicating Fit For Purpose less the percentage indicating Unfit For Purpose. However, net scores mask valuable information. To illustrate this, consider the degenerate case where 50% indicate Fit For Purpose while the other 50% indicate Unfit For Purpose. The net score is 0%. This is the same result as 100% indicating a neutral 3 out of 5. These are not equivalent. 100% voting 3 indicates our product is mediocre but could be improved, while a 50-50 split of Fit-Unfit indicates that our product is already very adequate for some market niches and that we need to do better market segmentation. One result tells us we _must_ improve the product or service delivery, while the other tells us we ought to improve our marketing. As these are very different results, we feel net scores are highly misleading. Metrics designed for consumption by senior executives become dangerous when they provide misleading information and uncertain guidance. So Net Fitness Score is gone and it is replaced by Fitness Box Score, a system inspired by historical approaches to reporting baseball games.
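The degenerate case can be checked with a short Python sketch. The function name and the two sample lists are hypothetical, made up purely to reproduce the argument; the fit and unfit cut-offs follow the definitions given earlier (4 or 5 is fit, 2 or below is unfit).

```python
def net_fitness_score(scores):
    """Retired net metric: % scoring 4-5 minus % scoring 0-2."""
    total = len(scores)
    fit = sum(1 for s in scores if s >= 4)
    unfit = sum(1 for s in scores if s <= 2)
    return 100 * (fit - unfit) / total


# Two invented samples of 20 responses each.
polarized = [5] * 10 + [1] * 10  # half fit, half unfit
mediocre = [3] * 20              # everyone neutral

print(net_fitness_score(polarized))  # 0.0
print(net_fitness_score(mediocre))   # 0.0
```

Both samples collapse to the same net score even though they call for different actions, which is the information a three-part box score keeps visible.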

Fitness Box Score is written like this…

[% Fit : % Neutral : % Unfit] (# responses / total sample population)

For example, the overall Fitness Box Score for the Kanban Leadership Retreat APAC in Bali, Indonesia in June…

[60:30:10] (18/20)

There were 20 paying attendees. Of these, 18 responded to the survey; 2 people had to leave early. Of the 18 who responded, 60% thought the event was "fit for purpose", 30% were neutral but largely satisfied, and 10% had come to the wrong event and were unhappy. For them, the event was "unfit for purpose."

It is currently recommended that the percentages are rounded to the nearest 5%, although this creates the possibility that the numbers add up to 105 on some occasions.
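As a rough sketch, the notation and the rounding rule could be computed as follows in Python. The function names are hypothetical, and the raw counts (11 fit, 5 neutral, 2 unfit) are assumed values chosen only because they reproduce the published Bali score; the post reports the percentages, not the underlying counts. Exact halves are rounded up here, which is also an assumption.

```python
import math


def round_to_nearest_5(pct):
    """Round a percentage to the nearest multiple of 5 (halves round up)."""
    return 5 * math.floor(pct / 5 + 0.5)


def fitness_box_score(scores, population):
    """Format scores (0-5) as '[fit:neutral:unfit] (responses/population)'."""
    n = len(scores)
    fit = sum(1 for s in scores if s >= 4)      # 4 or 5: fit for purpose
    neutral = sum(1 for s in scores if s == 3)  # 3: neutral
    unfit = n - fit - neutral                   # 2 or below: unfit
    parts = [round_to_nearest_5(100 * c / n) for c in (fit, neutral, unfit)]
    return "[{}:{}:{}] ({}/{})".format(*parts, n, population)


# Assumed counts: 11 fit, 5 neutral, 2 unfit out of 18 respondents.
bali_scores = [5] * 11 + [3] * 5 + [1] * 2
print(fitness_box_score(bali_scores, 20))  # [60:30:10] (18/20)
```

Because each of the three percentages is rounded independently, the parts need not sum to exactly 100.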

When drilling into specific purposes, we report the total sample population as those who actually responded to the survey. Here is the full set of analyses for the Kanban Leadership Retreat event in Bali:

Purpose                             Score
Networking                          [85:15:0] (13/18)
Learn Kanban                        [45:45:15] (7/18)
Knowledge Sharing                   [50:50:0] (6/18)
Feedback & Validation               [75:25:0] (4/18)
Experience KLR event                [100:0:0] (2/18)
Kanban Marketing & Sales Strategy   [0:50:50] (2/18)
Meet David J. Anderson              [50:50:0] (2/18)
Improving Project Delivery          [0:0:100] (2/18)
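A per-purpose breakdown like the one above could be produced along these lines in Python. This is a sketch under assumptions: it presumes the Question 1 answers have already been clustered into named purposes, the function names are hypothetical, and the individual scores are invented so that two rows of the table come out as published.

```python
import math
from collections import defaultdict


def round_to_nearest_5(pct):
    """Round a percentage to the nearest multiple of 5 (halves round up)."""
    return 5 * math.floor(pct / 5 + 0.5)


def box_score(scores, respondents):
    """Box score for one purpose; the denominator is all survey respondents."""
    n = len(scores)
    fit = sum(1 for s in scores if s >= 4)
    neutral = sum(1 for s in scores if s == 3)
    unfit = n - fit - neutral
    parts = [round_to_nearest_5(100 * c / n) for c in (fit, neutral, unfit)]
    return "[{}:{}:{}] ({}/{})".format(*parts, n, respondents)


def report_by_purpose(responses, respondents):
    """Group (purpose, score) pairs by purpose and score each group."""
    grouped = defaultdict(list)
    for purpose, score in responses:
        grouped[purpose].append(score)
    return {p: box_score(s, respondents) for p, s in grouped.items()}


# Invented scores: 3 fit, 3 neutral, 1 unfit for "Learn Kanban".
responses = [("Learn Kanban", s) for s in (5, 4, 4, 3, 3, 3, 1)]
responses += [("Experience KLR event", s) for s in (5, 4)]

for purpose, score in report_by_purpose(responses, 18).items():
    print(purpose, score)
```

With 7 responses, 3 of 7 rounds to 45% and 1 of 7 rounds to 15%, which is how the "Learn Kanban" row ends up summing to 105.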

From these results, we can see that the majority of attendees come to Leadership Retreats for the reasons we hope: networking, knowledge sharing, feedback and validation. This being our first event in Asia Pacific, we also attracted people who might have been better served by a more general Lean Kanban Conference, namely those who came to learn Kanban or to improve project delivery.

With this particular survey we learned how we can improve our marketing message and we learned that we should consider offering a Lean Kanban Conference in South East Asia targeting a general Asia Pacific audience.

Conclusions

Fit for Purpose Surveys are proving useful. In general, we believe that a strong "fit for purpose" score will correlate to and be a leading indicator of a product or service that will thrive and survive. At this time, we would continue to use it in conjunction with other customer satisfaction survey methods. For example, Fit For Purpose Survey doesn't explicitly ask "How did you like the location of the conference?" or "How did you find the catering and service at the venue?" However, if these were unsatisfactory we would expect them to show up in the answers to Question 3. There are times, however, where you do wish to ask explicit questions and seek specific answers.

As with all customer surveys, the great flaw in a method like this is that it doesn't survey those who didn't choose your product or service. We don't have survey results for those who haven't selected Kanban training or Lean Kanban conferences. So we are unable to tell whether our marketing is truly "fit for purpose" or whether it is dissuading potential customers and sending the wrong signals.

So Fit For Purpose Survey isn't a silver bullet, but it is a useful new marketing tool that is highly compatible with evolutionary approaches to improving products or services such as Lean Startup, Kanban and Enterprise Services Planning. We believe that analyses from these surveys will dramatically improve decision making on when to pivot or stick with an existing concept or business model, and that the success of pivots will be dramatically improved through the insights on customer purpose and the resulting market segmentation derived from it.

Marco Bresciani

Senior software engineer

8y

I think I'll use my own Italian translation of the survey in a small (3 people only) upcoming internal presentation... let's see the results!

Rob Myers

Still happily teaching TDD, BDD, A-CSD, & Extreme Programming (XP) | Glacially-paced author: Essential Test-Driven Development (Q3 2025) | Certified Scrum Alliance Trainer (CSAT) | 40-year Zen Buddhist dilettante-monk

8y

NPS has always seemed a bit...off. I'd like to try your approach with my courses. Would you suggest providing the survey to the purchaser(s) only, the participants only, or both groups?

Horacio Gonzalez

Scalability in the Cloud, Operations, and Software Development

8y

Interesting. Have you seen something like this implemented as a post-service satisfaction survey in a Kanban system?

Alexei Zheglov

Expert in flow and improvement of intellectual enterprises

8y

Rodrigo Yoshima I've also noticed in private-class surveys that the more senior participants, executive sponsors, those who made the decisions to purchase training give higher scores than their subordinates who were ordered to attend the training.

Anubhav Sinha

Building Beyond Playbooks | Product Operating Model | 16+ years of exp in Building B2B Product, Business Aligned Approach and Transformation | Pricing, B2B SaaS

8y

I am gonna also use this scale for INVEST model while performing story estimation and independent ! Sanjay Kumar - thanks for sharing this article !
