How to test the latest AI models

Every few days, a new AI model seems to drop from one of the big AI companies. This week it was Anthropic’s turn with Claude 3.7 Sonnet. Social media then explodes with people saying it’s either the best thing in the history of time or massively overhyped. But as a normal person, how do you work out whether a new AI model is actually better than what you were using before? 

Thankfully for most of us, you don’t need a brain the size of a planet or a PhD in machine learning to test AI. It will be put through the wringer with benchmarks like Humanity’s Last Exam (https://agi.safe.ai/) for you! But to find out for yourself as an individual, you just need a bit of curiosity, a decent amount of time and a structured approach.

Here’s how to test if an AI model is worth the hype... 

Test it on what you actually do 

If you’re a lawyer, get it to summarise a contract. If you’re in training, ask it to create a lesson plan. If you’re in marketing, see if it can write ad copy that sounds even remotely human. 

Have a list of your day-to-day tasks and use cases in your back pocket and use the same ones each time. 

Most new AI models perform well in general tasks, but can they do your specific tasks well? 

Quick test: Ask it to draft an email you’d actually send. If you still need to rewrite most of it, the model probably isn’t saving you time. 
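
If you want to make this repeatable, one option is to keep those standard prompts in a small script and run the same list against every new model. Below is a minimal Python sketch, assuming the official OpenAI Python SDK and an OPENAI_API_KEY environment variable; the model name and the prompts are placeholders you would swap for your own.

```python
# A minimal sketch of a personal benchmark: the same day-to-day prompts run
# against whichever model you want to evaluate.
# Assumes the official OpenAI Python SDK (pip install openai) and an
# OPENAI_API_KEY environment variable; model name and prompts are placeholders.
from openai import OpenAI

client = OpenAI()

# Your standard, real-world test prompts. Keep these fixed between models.
TEST_PROMPTS = [
    "Draft a short email rescheduling Thursday's project meeting to Friday.",
    "Summarise the key risks in this contract clause: <paste clause here>",
    "Write a 30-word ad for a local plumber that doesn't sound robotic.",
]

def run_personal_benchmark(model_name: str) -> None:
    """Send every saved prompt to one model and print the raw outputs."""
    for prompt in TEST_PROMPTS:
        response = client.chat.completions.create(
            model=model_name,
            messages=[{"role": "user", "content": prompt}],
        )
        print(f"--- {model_name} | {prompt[:40]}")
        print(response.choices[0].message.content)

run_personal_benchmark("gpt-4o")  # placeholder model name
```

The point is consistency: because the prompts never change, any difference in output quality is down to the model, not the question.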

Push it with edge cases 

Find out where the AI model’s boundaries are. What is its ‘jagged frontier’ (with thanks to Ethan Mollick for the phrase)? When does it start going haywire? 

  • Give it an ambiguous request (‘Write a summary of this article’ but don’t give it the article). Does it ask for clarification or just write a nonsense summary of something else? 

  • Ask it to handle nuance (‘Explain AI regulation to a 10-year-old and then to a CEO’). Can it adjust its tone and depth? 

If it falls apart on these, it’s probably not as advanced as it claims. 

Test its knowledge 

New models boast about their knowledge cut-off dates, but that doesn’t mean they correctly understand the information they have been trained on. 

  • Ask it about recent events (e.g. ‘What happened in UK politics last week?’ - actually don’t ask that, it’s too depressing). 

  • Get it to summarise a niche topic you know well. For me it’s instructions on how to make a Roorkhee chair. As niche as you can get. 

AI is confident even when it's wrong, so if it’s misrepresenting things you know, it’s probably unreliable elsewhere too. 

Measure how much effort and time it actually saves you 

The best AI tools don’t just generate text; they make your work and life easier. 

  • Ask it to help you plan your next holiday, including flights, hotel comparisons, and an itinerary. 

  • Get it to rewrite a bad piece of writing into something clear and professional. 

If you’re spending as much time fixing the AI’s output as you would have spent writing or researching from scratch, it’s not a game-changer. 

Compare models side by side 

The easiest way to see if a new AI model is better? Run the exact same prompt across different models. 

I have five different AI models open at any one time on a dedicated screen. Probably overkill, but it really highlights the differences and capabilities of each. 

  • For example, I would try ChatGPT (GPT-4), Claude, Gemini, Copilot and Le Chat with the same request. 

  • Look at response quality, depth, accuracy, and how much editing you need to do. 

This gives you an instant reality check on whether the new model is actually an upgrade or just marketing rubbish. 
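
If you’re comfortable with a little code, you can also run the comparison programmatically instead of juggling browser tabs. Here’s a minimal sketch, assuming the official OpenAI and Anthropic Python SDKs are installed and API keys are set; the model names are placeholders to replace with whatever you’re comparing.

```python
# A minimal side-by-side comparison sketch. Assumes the official OpenAI and
# Anthropic Python SDKs (pip install openai anthropic) and API keys set as
# OPENAI_API_KEY / ANTHROPIC_API_KEY. Model names below are placeholders.
from openai import OpenAI
import anthropic

openai_client = OpenAI()
anthropic_client = anthropic.Anthropic()

def ask_openai(prompt: str, model: str = "gpt-4o") -> str:
    response = openai_client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

def ask_anthropic(prompt: str, model: str = "claude-3-7-sonnet-latest") -> str:
    message = anthropic_client.messages.create(
        model=model,
        max_tokens=1024,
        messages=[{"role": "user", "content": prompt}],
    )
    return message.content[0].text

prompt = "Explain AI regulation to a 10-year-old, then to a CEO."
print("=== OpenAI ===\n" + ask_openai(prompt))
print("=== Anthropic ===\n" + ask_anthropic(prompt))
```

You still judge the outputs by eye; the script just guarantees every model sees exactly the same prompt.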

Use a software tool to support you

If you have the budget, there is a breed of software tools such as arthur.ai that can help you evaluate model performance, but these don’t necessarily cover the things you need to test for your role.

The best AI is the one that helps you do what you need to do 

For a moment, forget benchmarks and marketing claims. The best way to test AI is to see how well it fits into your life and work. If it works for you in your role, that’s a win – stick with it... until the next one 😵💫 

Final thought - don't forget good AI governance. Stay within the boundaries of your AI policy when testing new models.

Keep up to date with what’s happening in AI by signing up to our newsletter: https://iwantmore.ai/ai-newsletter 

 

 
