Hugging Face

About us

The AI community building the future.

Website
https://huggingface.co
Industry
Software Development
Company size
51-200 employees
Type
Privately Held
Founded
2016
Specialties
machine learning, natural language processing, and deep learning


Updates

  • Hugging Face reposted this

    View profile for Avijit Ghosh, PhD

    Applied Policy Researcher at Hugging Face 🤗

    🚨 New Article: Empowering Public Organizations: Preparing Your Data for the AI Era, with Yacine Jernite 🏛
    Public organizations are authoritative sources for critical information: monitoring environmental conditions, tracking educational outcomes, documenting workforce trends, preserving cultural heritage, and managing public infrastructure. However, much of this data exists in formats that AI systems can't easily use: stored in PDFs, scattered across Excel files with inconsistent structures, and often organized in specialized formats designed for human consumption rather than machine learning.
    📊 The public data commons that powers today's models, and especially the smaller developers who cannot afford millions of dollars in licensing fees, would benefit enormously if this data were made available in AI-consumable formats. Data quality is incredibly important for model performance and efficiency (we have written about this, too!), and this data is already public and free!
    🤝 There are clear benefits for organizations, too: they enable technology that better serves their communities, amplify the value of public data through collaboration, and maintain principled control over how their data is used. NASA - National Aeronautics and Space Administration, the National Library of Norway, the French Ministère de la Culture, The National Archives of Finland / Kansallisarkisto, and other public organizations are already on Hugging Face, releasing rich datasets and models that add to the public commons and enrich us all.
    📝 So we wrote a guide to help public organizations do the same! Using the Massachusetts Data Hub as a case study, we convert four datasets:
    - MCAS education data 📚 (Excel files with different formats)
    - Labor market reports 💼 (PowerPoint presentations)
    - Occupational safety stats 🦺 (PDF reports)
    - 2023 aerial imagery 🛰️ (JP2 image files)
    For each dataset, we show why it was not AI-ready, the steps we took to clean, standardize, and convert it into a Hugging Face dataset, and the release of both the code and the data on the Hub! This was a really fun exercise, and we have some important takeaways for other public organizations looking to release data on Hugging Face to add to public knowledge and power better AI:
    1️⃣ Identify Your Most Valuable Datasets: For which datasets is your organization the most authoritative source, and which would bring the most value to your mission if released in AI-ready formats?
    2️⃣ Determine Format Needs: Consider who will use the data and for what purpose, and how best to release it so that it is optimal for downstream use.
    3️⃣ Document Clearly: Rich documentation matters to downstream users and actually drives adoption: a study has shown that almost 90% of dataset downloads on the Hub go to fully documented datasets.
    We can't wait to see your datasets on 🤗! [Article linked in comments]
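    To make the conversion step concrete, here is a minimal, hedged sketch of turning a spreadsheet into a Hugging Face dataset; the file name, column handling, and repo id are hypothetical placeholders, not the code released with the article.

```python
import pandas as pd
from datasets import Dataset

# Hypothetical input file and columns, standing in for one of the Excel releases.
df = pd.read_excel("mcas_results_2023.xlsx", sheet_name=0)

# Normalize headers so downstream users get consistent, machine-friendly column names.
df.columns = [c.strip().lower().replace(" ", "_") for c in df.columns]

# Convert to a Hugging Face dataset and push it to the Hub as Parquet
# (requires `huggingface-cli login` and a repo you control).
ds = Dataset.from_pandas(df, preserve_index=False)
ds.push_to_hub("your-org/mcas-education-data")
```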

  • Hugging Face reposted this

    View profile for Sayak Paul

    ML @ Hugging Face 🤗

    We've got a new Diffusers release for you, and it ships a truckload of things 🚙 It brings a bunch of new image & video generation models, a wide suite of memory optimization techniques with caching, and torch.compile() support when hotswapping LoRAs. There's far more than I can cover here, so please check out the release notes 🔥 Release notes: https://lnkd.in/gDX3vT57
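    As a rough illustration (not code from the release notes), loading a Diffusers pipeline, enabling one of its memory optimizations, adding a LoRA, and compiling the denoiser might look like the sketch below; the base model and LoRA repo ids are placeholders.

```python
import torch
from diffusers import DiffusionPipeline

# Placeholder base model and LoRA repo ids, not taken from the release notes.
pipe = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16,
)

# One of several memory optimizations: offload idle submodules to the CPU.
pipe.enable_model_cpu_offload()

# Add a LoRA adapter on top of the base weights.
pipe.load_lora_weights("some-user/some-sdxl-lora")

# Optionally compile the UNet for faster repeated inference.
pipe.unet = torch.compile(pipe.unet)

image = pipe("a watercolor painting of a lighthouse at dusk").images[0]
image.save("lighthouse.png")
```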

  • Hugging Face reposted this

    View profile for Daniel Vila Suero

    Building data tools @ Hugging Face 🤗

    In times of hype and misinformation, run your own experiments. How? Use Hugging Face Inference Providers.
    With every new open model release, social media timelines are full of contradictory information, exaggerated claims, etc. That's why running quick (and cheap) experiments is becoming critical.
    - Have you heard that the latest Llama 4 models are bad?
    - Have you heard that Llama 4 models behave differently across providers?
    - Is QwQ-32B better than DeepSeek R1?
    Run these models over data you care about. With the Hub you can:
    - Get access to the latest models (from day 0).
    - Test them even if you don't have GPUs.
    - Mix and match the fastest, most reliable inference providers.
    - Discuss and learn about these models with the largest AI community.
    The prompt and results in the attached image are part of "vibench", a tiny benchmark I'm building with Inference Providers. It contains interesting and challenging prompts from Reddit, Microsoft's Sparks of AGI paper, and other places. You can find the open dataset in the first comment, and feel free to suggest challenging prompts to add to vibench.
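    For example, a quick experiment with Inference Providers from the huggingface_hub client could look like the sketch below; the provider, model id, and prompt are illustrative choices, not taken from vibench.

```python
from huggingface_hub import InferenceClient

# Pick an inference provider explicitly; provider, model, and prompt are illustrative.
client = InferenceClient(provider="together")

response = client.chat.completions.create(
    model="Qwen/QwQ-32B",
    messages=[{"role": "user", "content": "A farmer has 17 sheep. All but 9 run away. How many are left?"}],
    max_tokens=512,
)
print(response.choices[0].message.content)
```

    Swapping the provider argument (or the model id) lets you rerun the same prompt elsewhere and compare the answers side by side.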

  • Hugging Face reposted this

    Announcing 📢 Reasoning Datasets Competition 📢 in collaboration with Hugging Face & Together AI
    Since the launch of DeepSeek-R1 this January, we've seen an explosion of reasoning-focused datasets: OpenThoughts-114k, OpenCodeReasoning, codeforces-cot, and more. OpenThoughts-114k alone has helped train 230+ models.
    Most datasets focus on math, coding, or science, domains where answers are clear-cut and verifiable. But reasoning is starting to push into messier, more human areas: finance, medicine, and even multi-domain reasoning. The next big leap in LLMs could come from better datasets that mirror real-world ambiguity, complexity, and nuance.
    To help push the frontier, we're launching a Reasoning Dataset Competition. More details here: https://lnkd.in/g33aSYtC

  • Hugging Face reposted this

    View organization page for Gradio

    61,636 followers

    🚀 Breakthrough Alert: Gradio 5.24 is out! We've completely rebuilt our ImageEditor component based on developer feedback 🤯 Enjoy Photoshop-like features in your AI apps:
    > Professional-grade zooming and panning
    > Full transparency control
    > Advanced layer configuration, and many more...
    This update enables everyone to build sophisticated inpainting and sketching interfaces with just a few lines of Python 🔥 Upgrade today: pip install --upgrade gradio
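    As a hedged sketch (not an official Gradio example), wiring the rebuilt ImageEditor into a small Blocks app might look like this; the processing function is a placeholder.

```python
import gradio as gr

# Placeholder processing function: just return the flattened edit.
def show_composite(editor_value):
    # With type="numpy", the ImageEditor value is a dict holding the
    # background, the drawn layers, and the flattened composite.
    return editor_value["composite"]

with gr.Blocks() as demo:
    editor = gr.ImageEditor(label="Sketch or mask", type="numpy")
    output = gr.Image(label="Result")
    editor.change(show_composite, inputs=editor, outputs=output)

demo.launch()
```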

  • Hugging Face reposted this

    View profile for Andrés Marafioti

    AI Researcher @ Hugging Face | 9+ YOE in GenAI, MLOps, & Research | Pushing the Boundaries of Open-Source AI

    Today, we share the tech report for 𝗦𝗺𝗼𝗹𝗩𝗟𝗠: 𝗥𝗲𝗱𝗲𝗳𝗶𝗻𝗶𝗻𝗴 𝘀𝗺𝗮𝗹𝗹 𝗮𝗻𝗱 𝗲𝗳𝗳𝗶𝗰𝗶𝗲𝗻𝘁 𝗺𝘂𝗹𝘁𝗶𝗺𝗼𝗱𝗮𝗹 𝗺𝗼𝗱𝗲𝗹𝘀. 🔥 It explains how to create a 𝘁𝗶𝗻𝘆 𝟮𝟱𝟲𝗠 𝗩𝗟𝗠 that uses less than 1GB of RAM and outperforms our 80B models from 18 months ago!
    Here are the coolest insights from our experiments:
    ✨ Longer context = big wins: Increasing the context length from 2K to 16K gave our tiny VLMs a 60% performance boost!
    ✨ Smaller is smarter with SigLIP: Surprise! Smaller LLMs didn't benefit from the usual large SigLIP (400M). Instead, we use the 80M base SigLIP, which performs equally well at just 20% of the original size!
    ✨ Pixel shuffling magic: Aggressive pixel shuffling helped our compact VLMs "see" better, achieving the same performance with sequences 16x shorter!
    ✨ Learned positional tokens FTW: For compact models, learned positional tokens significantly outperform raw text tokens, enhancing efficiency and accuracy.
    ✨ System prompts and special tokens are key: Introducing system prompts and dedicated media intro/outro tokens significantly boosted our compact VLM's performance, especially for video tasks.
    ✨ Less CoT, more efficiency: It turns out that too much Chain-of-Thought (CoT) data actually hurts performance in small models.
    ✨ Longer videos, better results: Increasing video length during training enhanced performance on both video and image tasks.
    🌟 State-of-the-art performance: SmolVLM comes in three powerful yet compact sizes (256M, 500M, and 2.2B parameters), each setting new SOTA benchmarks for its hardware constraints in image and video understanding.
    📱 Real-world efficiency: We've built an app using SmolVLM on an iPhone 15 and got real-time inference directly from its camera!
    🌐 Browser-based inference? Yep! We get lightning-fast inference speeds of 40-80 tokens per second directly in a web browser. No tricks, just compact, efficient models!
    If you're into efficient multimodal models, you'll love this one.
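    As a rough sketch of running the smallest SmolVLM with transformers (the repo id, image, and generation settings are assumptions, not taken from the tech report):

```python
from transformers import AutoProcessor, AutoModelForVision2Seq
from PIL import Image

model_id = "HuggingFaceTB/SmolVLM-256M-Instruct"  # assumed repo id for the 256M checkpoint
processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForVision2Seq.from_pretrained(model_id)

image = Image.open("example.jpg")  # any local image
messages = [{
    "role": "user",
    "content": [
        {"type": "image"},
        {"type": "text", "text": "Describe this image briefly."},
    ],
}]
prompt = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(text=prompt, images=[image], return_tensors="pt")

generated = model.generate(**inputs, max_new_tokens=128)
print(processor.batch_decode(generated, skip_special_tokens=True)[0])
```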

  • Hugging Face reposted this

    View profile for Aymeric Roucher

    Building agents @ Hugging Face 🤗 | Polytechnique - Cambridge

    One overlooked aspect of the Llama-4 release on the Hugging Face Hub: it was another successful test of our new Xet backend, which allows near-instantaneous model updates.
    In short, Xet is a new storage backend that replaces the Git-based one. Its first advantage is better compression, meaning the initial model download is simply faster: roughly a 2x improvement for Llama-4. That's already cool ⚡️
    But the core feature is just wild: Xet uses content-defined chunking (CDC) to deduplicate at the level of bytes (~64KB chunks of data) instead of whole files. If you change one line in a huge Parquet file, Xet sees the diff at the chunk level rather than the file level, so the change becomes an almost-instant upload of just the modified chunks instead of hours spent re-uploading or re-downloading the whole file. ⚡️⚡️
    Congrats to the XetHub team, awesome work 👏
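    To make the content-defined chunking idea concrete, here is a toy sketch, not the actual Xet implementation: it cuts chunk boundaries based on the content of a small sliding window, so an edit in the middle of a large file leaves most chunks, and therefore most uploads, untouched.

```python
import hashlib
import random

def chunk(data: bytes, window: int = 16, mask: int = 0x3FF) -> list[bytes]:
    """Toy CDC: cut a boundary whenever a 16-byte window hashes to 0 modulo ~1K."""
    chunks, start = [], 0
    for i in range(window, len(data)):
        h = hashlib.blake2b(data[i - window:i], digest_size=4).digest()
        if int.from_bytes(h, "big") & mask == 0:
            chunks.append(data[start:i])
            start = i
    chunks.append(data[start:])
    return chunks

def reused_fraction(old: bytes, new: bytes) -> str:
    old_ids = {hashlib.sha256(c).hexdigest() for c in chunk(old)}
    new_chunks = chunk(new)
    reused = sum(hashlib.sha256(c).hexdigest() in old_ids for c in new_chunks)
    return f"{reused}/{len(new_chunks)} chunks already stored"

rng = random.Random(0)
head, tail = rng.randbytes(50_000), rng.randbytes(50_000)
old = head + b"the original line\n" + tail
new = head + b"a slightly longer edited line\n" + tail  # edit shifts every later byte offset

# Because boundaries depend on content, not position, almost every chunk re-aligns,
# so only the few chunks around the edit would need to be uploaded.
print(reused_fraction(old, new))
```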



Funding

Total rounds
8
Last round
Series unknown
Source: Crunchbase