March Madness: LLM Predictions | Invisible Technologies

We put five leading large language models head-to-head in a March Madness bracket challenge, and the results revealed more about LLM usability, safety settings, and creativity than about basketball predictions.

Some models produced clear, coherent brackets, while others delivered inconsistent formatting and overwhelming data dumps. Models should be trained to produce user-friendly outputs, especially in real-world applications where clarity can make or break usability.

When it came to safety, not all models wanted to play ball. Mistral and Gemini refused the task (likely flagging it as gambling-adjacent), while GPT-4o partially complied, providing only Final Four predictions. We take this as a positive: these models are increasingly being trained to avoid generating content that could be unethical or harmful.

Gemini and Mistral scored bonus points for creativity. Instead of simply refusing, they offered detailed guides on building your own bracket. Gemini went one step further and created an entirely fictional bracket with made-up schools, details about various teams’ strengths, and strategies. While this may not be useful for your office pool, it did showcase the model’s capacity for imaginative generation, a capability that can prove useful for requests that require storytelling or simulation.

Who called it right?

6 Comments

Mike Leggett

Talent Partner scaling AI Startups from 0-1 and beyond through foundational TA system design and architecture.

Taylor Trepagnier you let me down. Should have had the Gators going all the way. Typical LSU fan 😂. Great read!

Sarah Rukonhi

Housekeeper at Homebased

Do you have job openings for Shona language?

Paul Heredia

The case for continued model training is being solidified! Imagine how big the gap might be as these models become specialized by industry, context, or task! Then take that a step further to multiple models that feed off each other in the enterprise! Welcome to why agentic AI transformation is here for the long haul!

Amar Srayer

Great analysis! When it comes to output usability, I’m curious: did any of the models evaluate the quality of their own responses? Do reasoning-focused models handle this differently, or do they all just generate and move on?
