On Unsupervised Learning, Cohere's Aidan Gomez shared his thoughts on model architectures, enterprise adoption, and what's breaking in the foundation model stack. Aidan was a co-author of the original Transformer paper, leads one of the most advanced model labs, and is now building for real-world enterprise deployments with Cohere's agent platform, North. Cohere serves thousands of customers across sectors like finance, telco, and healthcare, and has made a name for itself by staying cloud-agnostic, privacy-forward, and deeply international (with major bets in Japan and Korea).

Some key takeaways:

🔁 Will Transformers ever be replaced, and what might come next?
Aidan has been asking this question longer than most, and even he is surprised we're still here. While architectures like SSMs and discrete diffusion have drawn excitement, none has offered a fundamental reason to dethrone Transformers. The best ideas are being subsumed into the Transformer backbone rather than replacing it outright. That said, the industry is clearly in search mode, and the next dominant architecture will likely come from someone deeply frustrated by Transformers' limits.

🏢 What enterprise AI use cases are actually working today?
Despite all the buzz, most enterprises are still in pilot mode, but there are a few clear winners. Cohere is seeing sustained traction in customer support automation, where every vertical (from healthcare to financial services) is applying LLMs to high-volume, high-context queries. Another fast-emerging category is Deep Research tailored to specific industries.

🏷️ The type of data labeling that matters
We're well into the synthetic era of model training, but human-labeled data still plays a critical role. Aidan breaks it down: expert-labeled data is too expensive to scale (you can't hire 100,000 doctors), but it is invaluable for seeding high-quality synthetic data. Cohere might use 100 trusted examples to generate 10,000, and in domains like math or code that synthetic data can be filtered for correctness automatically (a rough sketch of this seed-and-filter pattern follows the episode links below). When it comes to evaluation, though, humans remain essential. Especially in messy, high-stakes domains, evals still need judgment that models (and synthetic proxies) can't replace.

📉 How scaling today means better data, not just more compute
Aidan admits he was once loyal to the "scale is all you need" hypothesis, but that era is over. Raw scaling is now delivering diminishing returns. Today's gains come from smarter data curation, better evaluations, and more targeted applications. What we call "scaling" now often means something more nuanced: expanding test-time compute (see the best-of-N sketch below), diversifying demonstrations, or building models optimized for specific tasks.

Full episode below:
YouTube: https://lnkd.in/gbVzfeAR
Spotify: https://bit.ly/42v4yUt
Apple: https://bit.ly/4lxS3Qv
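To make the seed-and-filter idea concrete, here is a minimal, self-contained sketch, not Cohere's actual pipeline: a handful of trusted seed examples drive a generator, and an automatic checker keeps only verifiably correct outputs. `sample_completion` and `is_correct` are hypothetical stand-ins for a real model call and a real verifier.

```python
import random

def sample_completion(seed_examples: list[dict]) -> dict:
    """Hypothetical LLM call: returns one synthetic arithmetic example,
    conditioned (in a real pipeline) on the trusted seed examples.
    Stubbed with random generation so the sketch runs standalone."""
    a, b = random.randint(1, 99), random.randint(1, 99)
    # A real generator sometimes produces wrong answers; simulate that here.
    answer = a + b if random.random() > 0.2 else a + b + 1
    return {"question": f"{a} + {b}", "claimed_answer": answer}

def is_correct(example: dict) -> bool:
    """Automatic filter: in math, correctness can be checked mechanically,
    with no human labeler in the loop."""
    left, _, right = example["question"].partition("+")
    return int(left) + int(right) == example["claimed_answer"]

seeds = [{"question": "2 + 2", "claimed_answer": 4}]  # stand-in for the ~100 trusted examples
synthetic = [sample_completion(seeds) for _ in range(10_000)]  # generate at scale
dataset = [ex for ex in synthetic if is_correct(ex)]  # keep only verified examples
print(f"kept {len(dataset)} of {len(synthetic)} synthetic examples")
```

The same shape applies to code: generate candidate programs, then filter by running tests, so expert effort is spent on the seeds rather than on labeling every example.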
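And one illustration of "expanding test-time compute": best-of-N sampling, a common pattern (not Cohere-specific) where you draw several candidate answers and keep the one a scorer ranks highest. `generate_candidate` and `score` are hypothetical stand-ins for a sampled model call and a verifier or reward model.

```python
import random

def generate_candidate(prompt: str) -> str:
    """Hypothetical LLM call returning one sampled answer."""
    return f"candidate-{random.randint(0, 9)}"

def score(prompt: str, candidate: str) -> float:
    """Hypothetical verifier or reward model scoring a candidate answer."""
    return random.random()

def best_of_n(prompt: str, n: int = 8) -> str:
    # Larger n means more compute spent at inference time, and a better
    # expected answer, without touching the model's weights.
    candidates = [generate_candidate(prompt) for _ in range(n)]
    return max(candidates, key=lambda c: score(prompt, c))

print(best_of_n("Summarize the quarterly filing."))
```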