How to test the latest AI models
Every few days, a new AI model seems to drop from one of the big AI companies. This week it was Anthropic’s turn with Claude 3.7 Sonnet. Social media then explodes with people saying it’s either the best thing in the history of time or massively overhyped. But as a normal person, how do you work out whether a new AI model is actually better than what you were using before?
Thankfully, for most of us, you don’t need a brain the size of a planet or a PhD in machine learning to test AI. It will be put through the wringer with benchmarks like Humanity's Last Exam for you! But to find out for yourself, you just need a bit of curiosity, a decent amount of time and a structured approach.
Here’s how to test if an AI model is worth the hype...
Test it on what you actually do
If you’re a lawyer, get it to summarise a contract. If you’re in training, ask it to create a lesson plan. If you’re in marketing, see if it can write ad copy that sounds even remotely human.
Keep a list of your day-to-day tasks and use cases in your back pocket, and run the same ones each time.
Most new AI models perform well in general tasks, but can they do your specific tasks well?
Quick test: Ask it to draft an email you’d actually send. If you still need to rewrite most of it, the model probably isn’t saving you time.
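If you want to make this repeatable, one simple trick is to keep your standard test prompts in one place. Here’s a minimal sketch in Python (the prompts are illustrative placeholders; swap in your own real tasks) that prints each one so you can paste it into a new model’s chat window:

```python
# A reusable personal test suite: the same handful of real tasks,
# saved once and run against every new model. The prompts below are
# illustrative placeholders; replace them with your own day-to-day work.

MY_TEST_PROMPTS = [
    "Summarise this contract clause in plain English: <paste clause here>",
    "Create a one-hour lesson plan introducing spreadsheets to beginners.",
    "Write a 50-word ad for a local coffee shop that sounds human.",
    "Draft a polite email chasing an overdue invoice.",
]

# Print each prompt so it can be pasted into a new model's chat window.
for i, prompt in enumerate(MY_TEST_PROMPTS, start=1):
    print(f"Test {i}: {prompt}\n")
```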
Push it with edge cases
Find out where the AI model’s boundaries are. Where is its ‘jagged frontier’ (with thanks to Ethan Mollick for the phrase)? When does it start going haywire?
Try it on ambiguous instructions, contradictory requirements and deliberately awkward questions. If it falls apart on these, it’s probably not as advanced as it claims.
Test its knowledge
New models boast about their knowledge cut-off dates, but a recent cut-off doesn’t mean they have understood the information they were trained on correctly. Ask about a topic you know inside out and check the details.
AI is confident even when it's wrong, so if it’s misrepresenting things you know, it’s probably unreliable elsewhere too.
Measure how much effort and time it actually saves you
The best AI tools don’t just generate text; they make your work and life easier.
If you’re spending as much time fixing the AI’s output as you would have spent writing or researching from scratch, it’s not a game-changer.
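If you want more than gut feel, you can keep a rough log of how long each AI-assisted task takes and compare it against how long the same task takes by hand. Here’s a minimal, purely illustrative sketch; the file name ai_time_log.csv is just an example:

```python
# A rough stopwatch for AI-assisted tasks: press Enter when you're done
# (including any time spent fixing the output) and the duration is logged
# to a CSV you can compare against your usual by-hand timings.
# The file name "ai_time_log.csv" is just an example.

import csv
import time

def log_task(task_name: str) -> None:
    start = time.perf_counter()
    input(f"Working on '{task_name}' with AI help. Press Enter when finished... ")
    elapsed = time.perf_counter() - start
    with open("ai_time_log.csv", "a", newline="") as f:
        csv.writer(f).writerow([task_name, round(elapsed, 1)])
    print(f"Logged {task_name}: {elapsed:.0f} seconds")

log_task("Draft client email")
```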
Compare models side by side
The easiest way to see if a new AI model is better? Run the exact same prompt across different models.
I have five different AI models open at any one time on a dedicated screen. Probably overkill, but it makes the differences in capability between them very clear.
This gives you an instant reality check on whether the new model is actually an upgrade or just marketing rubbish.
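If you’re comfortable with a little code, you can automate the side-by-side check. Here’s a minimal sketch using the official OpenAI and Anthropic Python SDKs; it assumes you have API keys set as environment variables, and the model names are just examples – use whichever models you’re comparing:

```python
# A minimal side-by-side check: send the same prompt to two providers
# and eyeball the results. Assumes the `openai` and `anthropic` SDKs are
# installed and OPENAI_API_KEY / ANTHROPIC_API_KEY are set. The model
# names below are examples; swap in whichever models you're comparing.

from openai import OpenAI
import anthropic

PROMPT = "Draft a short, polite email declining a meeting invitation."

def ask_openai(prompt: str, model: str = "gpt-4o") -> str:
    client = OpenAI()
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

def ask_anthropic(prompt: str, model: str = "claude-3-7-sonnet-latest") -> str:
    client = anthropic.Anthropic()
    message = client.messages.create(
        model=model,
        max_tokens=500,
        messages=[{"role": "user", "content": prompt}],
    )
    return message.content[0].text

for name, ask in [("OpenAI", ask_openai), ("Anthropic", ask_anthropic)]:
    print(f"--- {name} ---")
    print(ask(PROMPT))
    print()
```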
Use a software tool to support you
If you have the budget, there’s a breed of software tools, such as arthur.ai, that can help you evaluate model performance, but these won’t necessarily cover the specific things you need to test for your role.
The best AI is the one that helps you do what you need to do
For a moment, forget benchmarks and marketing claims. The best way to test AI is to see how well it fits into your life and work. If it works for you in your role, that’s a win – stick with it... until the next one 😵💫
Final thought - don't forget good AI governance. Stay within the boundaries of your AI policy when testing new models.
Keep up to date with what’s happening in AI by signing up to our newsletter: https://iwantmore.ai/ai-newsletter