Hugging Face’s Post

Hugging Face reposted this

View profile for Daniel Vila Suero

Building data tools @ Hugging Face 🤗

In times of hype, run your own experiments. How? Use Hugging Face Inference Providers With every new open model release, social media timelines are full of contradicting information, exaggerated claims, etc. That's why running quick (and cheap) experiments is becoming critical. - Have you heard that the latest Llama 4 models are bad? - Have you heard that Llama 4 models behave differently across providers? - Is QwQ-32B better than DeepSeek R1? Run these models through data you care about. With the Hub you can: - Get access to the latest models (from Day 0). - Test them even if you don't have GPUs. - Mix and match the fastest, most reliable inference providers. - Discuss and learn about these models with the largest AI community The prompt and results in the image attached are part of "vibench" a tiny benchmark I'm building with Inference Providers. It contains interesting and challenging prompts from Reddit, the Sparks of AGI Microsoft's paper, and other places. You can find the open dataset in the first comment, and feel free to suggest challenging prompts to add them to vibench. and misinformation, run your own experiments. How? Use Hugging Face Inference Providers With every new open model release, social media timelines are full of contradicting information, exaggerated claims, etc. That's why running quick (and cheap) experiments is becoming critical. - Have you heard that the latest Llama 4 models are bad? - Have you heard that Llama 4 models behave differently across providers? - Is QwQ-32B better than DeepSeek R1? Run these models through data you care about. With the Hub you can: - Get access to the latest models (from Day 0). - Test them even if you don't have GPUs. - Mix and match the fastest, most reliable inference providers. - Discuss and learn about these models with the largest community. The prompt and results in the image attached are part of "vibench" a tiny benchmark I'm building with Inference Providers. It contains interesting and challenging prompts from Reddit, the Sparks of AGI Microsoft's paper, and other places. You can find the open dataset in the first comment, and feel free to suggest challenging prompts to add them to vibench.

  • No alternative text description for this image
🎲 Aleksandr Kazantsev

Tech Writer, Copywriter, Board game Author

3w

Gemma3-12b

  • No alternative text description for this image
Fabian Weigend

Postdoc@Harvard University | Machine Learning, Human-Robot Interaction, Wearables

3w
🎲 Aleksandr Kazantsev

Tech Writer, Copywriter, Board game Author

3w

Mistral-small 3.1

  • No alternative text description for this image
🎲 Aleksandr Kazantsev

Tech Writer, Copywriter, Board game Author

3w

Phi4-14b

  • No alternative text description for this image
Dr. Nathan Kodiappan

Founder & Managing Director, MediStrat Sdn Bhd.

3w

Try this too: starvector/starvector-1b-im2svg

Chandana Nandi

Data Scientist | ML & AI Intern | Python, SQL, Azure, Power BI | Data Analysis, LLMs, MLOps | Open to Full-Time Roles | MS in Data Science

3w

This is such an important reminder, Daniel — there’s so much noise and hype around every new model drop. Love the idea of running your own experiments instead of relying on exaggerated claims. The SVG butterfly comparison made it fun and insightful — definitely checking out vibench and planning to test some prompts myself! Thanks for sharing this with the community. 🔍🧠✨

Based on the image you sent it does seem that Maverick and Scout produced one of the worse images. I think Deepseek crushed it for sure 👍

Like
Reply

Cutting through the noise and running your own tests on various AI models is a fantastic approach. 👏 This encourages not just relying on hearsay but conducting thorough experimentation to gauge the efficacy of different models. At qantum.one, we couldn't agree more as we leverage both human expertise and artificial intelligence to provide comprehensive QA automation services. Checking system reliability is crucial to us. 💻🔍 Keep going with your benchmark project, vibench, sounds like a fantastic initiative! #AI #Testing #ModelPerformance #BionicTesting #QaAutomation #qantumone

Like
Reply
See more comments

To view or add a comment, sign in

Explore topics