Issue #12: Five 9's vs. Dart Throws
DALL-E 3 prompt: "Doc and Marty McFly as telephone operators and throwing darts"

Has the descent into the trough of disillusionment begun for generative AI? While stock prices certainly aren't reflecting it yet, some recent think pieces suggest it might be the case (e.g., 1, 2, 3). They center on generative AI's unreliability, caused by a propensity to hallucinate. They also cite AI's susceptibility to being gaslit, as it tends to go into an automatic "please the teacher" mode, with little pushback or logic-checking on submitted requests.

Q: How often does generative AI need to get it right in order for it to be useful?

A: It depends on the use case, a nuance that both generative AI naysayers and cheerleaders seem to be handwaving past in pursuit of a simpler headline and story.

The shorthand way that I like to distinguish AI use cases on required reliability: "Is it a Five 9's situation or a dart throw?"

  • Five 9's refers to being reliable 99.999% of the time. It's a common SLA (Service Level Agreement) target for IT systems and communications networks, supposedly originating from AT&T's commitment to telephone service uptime (over the course of a year, 99.999% availability corresponds to roughly five minutes of aggregate downtime). Nothing is 100% reliable, but this is about as close as you can get.
  • Dart throws, by contrast, leave a great deal to chance. Even the top dart throwers hit the treble (the inner scoring ring) only about 45% of the time. I prefer this analogy to a coin flip because (a) the outcome depends on skill rather than pure randomness, and (b) the odds of success can improve over time with practice and effort.
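The downtime math behind the "nines" shorthand is simple enough to sketch. The snippet below is illustrative arithmetic only (the function name and the 365-day year are my assumptions, not anything from an SLA standard):

```python
# Downtime budget implied by an availability SLA ("how many nines").
# Illustrative sketch; assumes a 365-day year (525,600 minutes).

MINUTES_PER_YEAR = 365 * 24 * 60

def downtime_minutes(availability_pct: float) -> float:
    """Minutes of allowed downtime per year at a given availability %."""
    return MINUTES_PER_YEAR * (1 - availability_pct / 100)

for label, pct in [("two 9's", 99.0), ("three 9's", 99.9),
                   ("four 9's", 99.99), ("five 9's", 99.999)]:
    print(f"{label:>10}: {downtime_minutes(pct):8.2f} min/year")
```

Five 9's works out to about 5.26 minutes of aggregate downtime per year, versus more than three and a half days at two 9's.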

While a bit simplistic, forcing yourself into the extremes of "this or that" categorization can help clarify what use cases generative AI can be effectively used for today, what bears further scrutiny and development, and where compromise and risk mitigation might be prudent to effectively unlock value sooner.

What work use cases are more of a dart throw?

  • Brainstorming and ideation
  • Initial drafts and alternates (e.g., make it shorter or longer, adjust it to be in the style of, combine or separate out)
  • Supplementary creative development (e.g., photos, videos, or music to augment a report)

These use cases are commonly "take what you want and leave the rest" situations, particularly given the speed and relatively low cost and upfront input required by generative AI. Even if the hit rate is limited, a couple of good ideas or a starting place can make the user's effort worthwhile. They also fall into the "augment" (vs. "automate") category of use cases.

By contrast, five 9's use cases are oriented toward automation and removing the human from the loop:

  • Fully automated, integrated workflows (e.g., processing, transcription, translation)
  • Automated decision-making (e.g., go/no go, talent decisions)

Taking the human completely out of the loop requires extremely high reliability. Inconsistency or inaccuracy means that you'll need to build back in human redundancy, which could erode the benefits of the automation or even make it counter-productive (extra checks and rework). It would also be a systemic issue, corrupting anything downstream dependent on this activity.

Additionally, we all know "garbage in = garbage out." With automated decision-making, you can run the risk of "good stuff in = garbage out" too if the process is unreliable or inaccurate. It also might not be possible to fully query an automated decision-making process in the same way that you would a human-driven one. And if you develop a dependency, it could take loads of time and pain to both spot any issue and correct it.

There are of course shades of gray. For example, natural language searching and summarizing can fall in between five 9's and a dart throw: the generated outputs don't need to be 100% complete and accurate to be useful. Being 60-80% right can provide some valuable points to leverage and jumpstart your work in an impactful way.

However, you also don't want misleading or wrong bits in these products, so there are meaningful requirements that must be met to reach usefulness. Where that bar gets set depends on how the output will be used, and on the criticality and sensitivity of the processes and decisions to which it will be connected. It also requires critical thinking on the part of end users, because despite safeguards and adjustments, all AI will lie at some point at this stage of its development.

Stepping back, as Ethan Mollick, Sam Altman, and others will remind you, generative AI will continue to get better, and the model you're using at present is the worst you will ever use. How much better it will get on hallucinations (are they a bug, or endemic to gen AI?) remains to be seen.

Right now, there are enough examples out in the world to substantiate two points:

  1. Generative AI can have clear utility and very positive ROI, despite hallucinations.
  2. For some use cases, relying completely on generative AI in its current state can have significant negative impacts due to hallucinations, so we always need to be conscious of and on the lookout for them.

As practitioners of generative AI (which all of us should be to some degree at this point), we must be thoughtful about our use cases: which are closer to "five 9's" and which to "dart throws," and how we should adjust our expectations and the application of gen AI output accordingly to get the best from it and avoid the worst.


Isaac Cheifetz

Connecting SAAS Companies to Operating Executives Who Can Help Them Create, Market, & Sell Revenue-Driving Products

8mo

Chris, can we find a time to talk? I have substantive and entertainment value, and lots of original human capital models :)

Courtney Veirs Bouloucon

Accomplished Sales Professional | 20+ Years Building Strong Client Relationships and Delivering Results

10mo

This is extremely informative, thank you for sharing!

Kayvon Touran

CEO & Cofounder Zal.ai

10mo

Chris Louie Thanks for sharing. I really like your framework. One thing I'll add to your "shades of grey" point - I think it's important to acknowledge use-cases that rely on precision as simply an indicator to add a human-in-the-loop, and not to totally dissuade someone from trying to come up with an innovative solution. I'm biased here, but think there's a huge potential for startups willing to take this initial risk, and rely on a human-in-the-loop to obtain Five 9 accuracy, and still with this approach create more value than what's currently available on market. Especially given the final point you make - where as this technology improves the human will do less, and therefore potential for full automation (and therefore value creation) increases.

Phil Kirschner

Employee Experience, Future of Work, Org Effectiveness, and Change Management Leader | ex. McKinsey, WeWork, JLL, Credit Suisse | Keynote Speaker | Guide of The Workline | LinkedIn Top Voice

11mo

Love this framework Chris. With the dart throw example, I can't help but think of Luke Littler who recently took the darts world by storm at age 16/17, which supports your idea that you can get better at it with practice. And also that "the kids these days" will pick up this new technology very quickly, so the rest of us have a lot of work to do to catch up and stay current.
