LLMs vs SLMs
Nathan Bijnens
Sr Manager, Data & AI
Microsoft
Language Calculator
Prompt engineering: “the art of asking questions” + add your own data
Prompt
  User input: Will my sleeping bag work for my trip to Patagonia next month?
  Context: historical weather lookup
  Behavior, context data, output structure, profile data
LLM
Completion
  Output: Yes, your Elite Eco sleeping bag is rated to 21.6F, which is below the average low temperature in Patagonia in September
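The prompt on this slide is assembled from several parts before it reaches the model. A minimal sketch of that assembly, assuming a plain-text layout: the `PromptParts` class, the `build_prompt` function, and the section labels are hypothetical names chosen for illustration, and the weather value is left as a placeholder because the slide does not give one.

```python
from dataclasses import dataclass, field


@dataclass
class PromptParts:
    """Pieces of a prompt, named after the labels on the slide (hypothetical type)."""
    user_input: str
    behavior: str = ""
    output_structure: str = ""
    context_data: list[str] = field(default_factory=list)
    profile_data: list[str] = field(default_factory=list)


def build_prompt(parts: PromptParts) -> str:
    """Join the non-empty sections into one prompt string, user input last."""
    sections = []
    if parts.behavior:
        sections.append("Behavior:\n" + parts.behavior)
    if parts.output_structure:
        sections.append("Output structure:\n" + parts.output_structure)
    if parts.profile_data:
        sections.append("Profile data:\n" + "\n".join(f"- {p}" for p in parts.profile_data))
    if parts.context_data:
        sections.append("Context data:\n" + "\n".join(f"- {c}" for c in parts.context_data))
    sections.append("User input:\n" + parts.user_input)
    return "\n\n".join(sections)


example = PromptParts(
    user_input="Will my sleeping bag work for my trip to Patagonia next month?",
    behavior="Answer as an outdoor gear assistant.",
    output_structure="One short paragraph starting with Yes or No.",
    profile_data=["Owns an Elite Eco sleeping bag rated to 21.6F"],
    # Placeholder: a real system would insert the result of the weather lookup here.
    context_data=["Average low in Patagonia next month: <from historical weather lookup>"],
)
print(build_prompt(example))
```

The string returned by `build_prompt` is what the slide calls the prompt; the model's reply is the completion.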
SLMs
Artificial Intelligence, Machine Learning, Deep Learning (each a subset of the one before)
1956: Artificial Intelligence. The field of computer science that seeks to create intelligent machines that can replicate or exceed human intelligence.
1997: Machine Learning. A subset of AI that enables machines to learn from existing data and improve upon that data to make decisions or predictions.
2012: Deep Learning. A machine learning technique in which layers of neural networks are used to process data and make decisions.
2022: Large Language Models. For the first time, we are able to capture and model knowledge. Further, we observe emergent behaviors as we scale up.
2023: Advent of Phi SLMs. Tiny but mighty language models that challenge the status quo!
Large Language Models (LLMs), Small Language Models 🧨
What makes it small?
Feature | Large Language Models (LLMs) | Small Language Models (SLMs)
Number of parameters | Billions to trillions | Millions to billions
Use cases | Complex tasks like text generation, translation, question answering, and summarization | Specific tasks
Costs | High computational and operational costs due to extensive resource requirements | Lower costs, suitable for resource-constrained environments
Training time | Several weeks to months, depending on the model size and computational resources | Shorter training times, often a few days to weeks
Training dataset sizes | Massive datasets including books, articles, websites, and other forms of text | Smaller datasets, often task-specific or domain-specific
Inference speed | Slower | Faster
Deployment | Requires powerful hardware (GPUs/TPUs) and cloud infrastructure | Can run on edge devices, CPUs, and less powerful GPUs
Accuracy | High accuracy and performance on a … | Good performance on specific tasks, …
Phi Small Language Models
Groundbreaking performance for size, with frictionless availability
Models: Phi-3-mini (3.8B), Phi-3-vision (4.2B), Phi-3-MoE (6.6B), Phi-4 (14B)
Available on: Azure AI Model Catalog, Hugging Face, Ollama, NVIDIA NIM, ONNX Runtime
Instruction tuned, RAI safety aligned
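Parameter count largely determines whether a model fits on a given device. As a rough sketch, the memory needed for the weights alone is the parameter count times the bytes stored per weight. The function name `weight_memory_gb` and the chosen precisions (16-bit and 4-bit) are assumptions for illustration, and the estimate ignores the KV cache, activations, and runtime overhead.

```python
def weight_memory_gb(params_billions: float, bits_per_weight: int) -> float:
    """Approximate memory for the weights alone, in gigabytes (10^9 bytes)."""
    return params_billions * bits_per_weight / 8


# Parameter counts are taken from the slide; the precisions are illustrative.
MODELS = {"Phi-3-mini": 3.8, "Phi-3-vision": 4.2, "Phi-4": 14.0}

for name, size in MODELS.items():
    fp16 = weight_memory_gb(size, 16)
    int4 = weight_memory_gb(size, 4)
    print(f"{name}: ~{fp16:.1f} GB at 16-bit, ~{int4:.1f} GB at 4-bit")
```

Under these assumptions a 3.8B model needs roughly 7.6 GB for 16-bit weights and about a quarter of that when quantized to 4 bits.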
Models & availability across platforms
Model | Input | Context length | Azure AI (MaaS) | Azure ML (MaaP) | ONNX | Hugging Face | Ollama | Nvidia NIM
Phi-3-vision-128k-instruct | Text+Image | 128k | Playground & Deployment | Playground, Deployment & Finetuning | CUDA, CPU, DirectML | Download | -NA- | NIM APIs
Phi-3-mini-4k-instruct | Text | 4k | Playground & Deployment | Playground, Deployment & Finetuning | CUDA, Web | Playground & Download | GGUF | NIM APIs
Phi-3-mini-128k-instruct | Text | 128k | Playground & Deployment | Playground, Deployment & Finetuning | CUDA | Download | -NA- | NIM APIs
Phi-3-small-8k-instruct | Text | 8k | Playground & Deployment | Playground, Deployment & Finetuning | CUDA | Download | -NA- | NIM APIs
Phi-3-small-128k-instruct | Text | 128k | Playground & Deployment | Playground, Deployment & Finetuning | CUDA | Download | -NA- | NIM APIs
Phi-3-medium-4k-instruct | Text | 4k | Playground & Deployment | Playground, Deployment & Finetuning | CUDA, CPU, DirectML | Download | -NA- | NIM APIs
Phi-3-medium-128k-instruct | Text | 128k | Playground & Deployment | Playground, Deployment & Finetuning | CUDA, CPU, DirectML | Download | -NA- | -NA-
Phi-4 | Text | 16k | Playground & Deployment | Playground, Deployment & Finetuning | -NA- | Download | Download | -NA-
Phi-silica, which was announced at //build, is based on the Phi models and is optimized for Windows NPUs. Application developers can use Phi-silica through in-box Windows APIs. Phi-silica is not available on Azure and is therefore out of scope for this presentation.
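Ollama is one of the platforms listed above for running Phi locally. A minimal sketch of calling it over its local REST API, assuming an Ollama server is running on its default port (11434) and that a Phi-3 model has already been pulled under the tag `phi3`. The helper names `build_request` and `generate` are hypothetical.

```python
import json
import urllib.request

# Assumption: a local Ollama server listening on its default port.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(prompt: str, model: str = "phi3") -> urllib.request.Request:
    """Build a non-streaming generate request for the local Ollama server."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt: str, model: str = "phi3") -> str:
    """Send the prompt and return the completion text (needs a running server)."""
    with urllib.request.urlopen(build_request(prompt, model)) as response:
        return json.loads(response.read())["response"]
```

With the server running, `generate("Summarize this ticket in one sentence: ...")` returns the completion as a string. Nothing leaves the machine, which is the point of the on-device deployment options.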
Benchmark numbers
Category | Benchmark | phi-4 (14B) | phi-3 (14B) | Qwen 2.5 (14B instruct) | GPT-4o-mini | Llama-3.3 (70B instruct) | Qwen 2.5 (72B instruct) | GPT-4o
Popular Aggregated Benchmark | MMLU | 84.8 | 77.9 | 79.9 | 81.8 | 86.3 | 85.3 | 88.1
Science | GPQA | 56.1 | 31.2 | 42.9 | 40.9 | 49.1 | 49.0 | 50.6
Math | MGSM | 80.6 | 53.5 | 79.6 | 86.5 | 89.1 | 87.3 | 90.4
Math | MATH | 80.4 | 44.6 | 75.6 | 73.0 | 66.3 | 80.0 | 74.6
Code Generation | HumanEval | 82.6 | 67.8 | 72.1 | 86.2 | 78.9 | 80.4 | 90.6
Factual Knowledge | SimpleQA | 3.0 | 7.6 | 5.4 | 9.9 | 20.9 | 10.2 | 39.4
Reasoning | DROP | 75.5 | 68.3 | 85.5 | 79.3 | 90.2 | 76.7 | 80.9
[Chart: Benchmark numbers. Scores from 0 to 100 on MMLU, GPQA, MGSM, MATH, HumanEval, SimpleQA, and DROP for phi-4 (14B), phi-3 (14B), Qwen 2.5 (14B instruct), GPT-4o-mini, Llama-3.3 (70B instruct), Qwen 2.5 (72B instruct), and GPT-4o]
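One way to read the benchmark numbers is to ask where the 14B model outscores much larger ones. The sketch below copies three columns from the table and lists the benchmarks on which one model beats another. The `wins` helper is a hypothetical name used for illustration.

```python
BENCHMARKS = ["MMLU", "GPQA", "MGSM", "MATH", "HumanEval", "SimpleQA", "DROP"]

# Scores copied from the benchmark table, in the order of BENCHMARKS.
SCORES = {
    "phi-4 (14B)": [84.8, 56.1, 80.6, 80.4, 82.6, 3.0, 75.5],
    "Llama-3.3 (70B instruct)": [86.3, 49.1, 89.1, 66.3, 78.9, 20.9, 90.2],
    "GPT-4o": [88.1, 50.6, 90.4, 74.6, 90.6, 39.4, 80.9],
}


def wins(model: str, baseline: str) -> list[str]:
    """Benchmarks on which `model` scores strictly higher than `baseline`."""
    pairs = zip(BENCHMARKS, SCORES[model], SCORES[baseline])
    return [name for name, ours, theirs in pairs if ours > theirs]


print(wins("phi-4 (14B)", "Llama-3.3 (70B instruct)"))  # ['GPQA', 'MATH', 'HumanEval']
print(wins("phi-4 (14B)", "GPT-4o"))                    # ['GPQA', 'MATH']
```

On these numbers the small model leads on science and math reasoning but trails clearly on factual recall (SimpleQA), which fits the idea of pairing an SLM with your own data.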
Benefits of Small Language Models
Low compute footprint; can run on older GPUs
Ultra-low latency thanks to their small size
Easy on your wallet, and hence business viable
Can be deployed on-prem or on edge devices
Easier and more affordable to customize
The only model <5B that offers long context!
Some Use Cases for Small Language Models
Text prediction
Named entity recognition
Summarization
Domain-specific tasks
Mistral - Ministral
Ministral-3B (3.6B): 131k token context length, $0.04 / M tokens (input and output)
Ministral-8B (8B): 131k token context length, $0.04 / M tokens (input and output)
Use cases: Internet-less assistant, on-device translation, local analytics
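Per-million-token pricing turns into a budget with one multiplication. A minimal sketch, using the $0.04 / M tokens price from the slide. The function name `estimate_cost_usd` and the monthly volume of 50 million tokens are hypothetical, chosen only to illustrate the arithmetic.

```python
def estimate_cost_usd(tokens: int, price_per_million_usd: float) -> float:
    """Cost of processing `tokens` tokens at a flat price per million tokens."""
    return tokens / 1_000_000 * price_per_million_usd


# Hypothetical workload: 50 million tokens per month, input and output combined.
monthly_tokens = 50_000_000
cost = estimate_cost_usd(monthly_tokens, 0.04)  # price taken from the slide
print(f"Estimated monthly cost: ${cost:.2f}")
```

The same function works for any flat per-million price, so it can be reused to compare hosted models before committing to one.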
Meta - Llama
Llama-3.2-1B (1B): 128k token context length, $0.37 / M tokens (est.)
Llama-3.2-3B (3B): 128k token context length
Llama-3.2-11B-Vision (11B): 128k token context length
Use cases: Internet-less assistant, multilingual dialogue, image/text to text
Trade-off: fast / slower but smarter
DeepSeek
DeepSeek-R1 (671B): open-source model, advanced reasoning capabilities. Not an SLM!
Distilled models:
R1-Distill-Qwen-1.5B
R1-Distill-Qwen-7B
R1-Distill-Qwen-14B
R1-Distill-Qwen-32B
R1-Distill-Llama-8B
R1-Distill-Llama-70B
Important Considerations
Understand the Problem at hand
Identify the problem you are solving
Determine missing capabilities, skills, and behaviors
Evaluation and Benchmarks
Make sure you can measure what you are enabling
Use an LLM as a judge
Use BabelBench with 300+ tasks, track general capabilities
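The LLM-as-a-judge idea can be sketched as a rubric prompt plus a parser for the judge's reply. Everything here is an assumption for illustration: the template wording, the 1 to 5 scale, and the `judge_score` helper are hypothetical, and `ask_llm` stands for whatever function sends a prompt to your judge model and returns its text.

```python
import re
from typing import Callable

# Hypothetical rubric; adapt the wording and scale to the task being measured.
JUDGE_TEMPLATE = """You are grading an answer.
Question: {question}
Answer: {answer}
Rate the answer from 1 (poor) to 5 (excellent). Reply with the number only."""


def judge_score(question: str, answer: str, ask_llm: Callable[[str], str]) -> int:
    """Ask the judge model for a 1-5 rating and parse it from the reply."""
    reply = ask_llm(JUDGE_TEMPLATE.format(question=question, answer=answer))
    match = re.search(r"\b[1-5]\b", reply)
    if match is None:
        raise ValueError(f"No score found in judge reply: {reply!r}")
    return int(match.group())


# Stand-in judge so the sketch runs without a model; replace with a real call.
def fake_judge(prompt: str) -> str:
    return "4"


print(judge_score("What is an SLM?", "A small language model.", fake_judge))
```

Averaging such scores over a fixed question set, before and after a change, gives a number you can track.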
Invest in Better Data
Focus on higher quality, not quantity
Using less fine-tuning data helps preserve general capabilities
Leverage LLMs to generate data
Human annotations if available
No Free Lunch
Fine-tuning reduces general capability over time
Model forgets knowledge outside target domain
Loss of general "thinking" skills
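Because fine-tuning can erode general capability, it helps to compare benchmark scores before and after tuning and flag anything that dropped. A minimal sketch, assuming you already have per-benchmark scores for both models. The function name, the tolerance of one point, and all scores below are hypothetical.

```python
def capability_regressions(
    before: dict[str, float],
    after: dict[str, float],
    tolerance: float = 1.0,
) -> dict[str, float]:
    """Return the score change for benchmarks that dropped by more than `tolerance`."""
    return {
        name: round(after[name] - before[name], 2)
        for name in before
        if name in after and before[name] - after[name] > tolerance
    }


# Hypothetical scores, for illustration only.
base = {"target_task": 61.0, "general_knowledge": 78.0, "reasoning": 70.0}
tuned = {"target_task": 83.0, "general_knowledge": 77.5, "reasoning": 64.0}

print(capability_regressions(base, tuned))  # {'reasoning': -6.0}
```

Here the tuned model gains on the target task but loses six points of reasoning, the kind of trade-off this slide warns about.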
Thank you
Nathan Bijnens
Sr Manager, Data & AI
Microsoft
Nathan.Bijnens@microsoft.com

Large Language Models vs Small Language Models

  • 1. LLMs vs SLMs Nathan Bijnens Sr Manager, Data & AI Microsoft
  • 2. Language Calculator
      Prompt engineering: “the art of asking questions” + add your own data
      User input: Will my sleeping bag work for my trip to Patagonia next month?
      Context: Historical weather lookup, Behavior, Context data, Output structure, Profile data
      Prompt, LLM, Completion
      Output: Yes, your Elite Eco sleeping bag is rated to 21.6F, which is below the average low temperature in Patagonia in September
  • 3. SLMs
      Artificial Intelligence, Machine Learning, Deep Learning, Large Language Models (LLMs), Small Language Models 🧨
      1956: Artificial Intelligence. The field of computer science that seeks to create intelligent machines that can replicate or exceed human intelligence.
      1997: Machine Learning. Subset of AI that enables machines to learn from existing data and improve upon that data to make decisions or predictions.
      2012: Deep Learning. A machine learning technique in which layers of neural networks are used to process data and make decisions.
      2022: Large Language Models. For the first time we are able to capture and model knowledge. Further, we observe emergent behaviors as we scale up.
      2023: Advent of Phi SLMs. Tiny but mighty language models that challenge the status quo!
  • 4. What makes it small?
      Feature | Large Language Models (LLMs) | Small Language Models (SLMs)
      Amount of Parameters | Billions to Trillions | Millions to Billions
      Use Cases | Complex tasks like text generation, translation, question answering, and summarization | Specific tasks
      Costs | High computational and operational costs due to extensive resource requirements | Lower costs, suitable for resource-constrained environments
      Training Time | Several weeks to months, depending on the model size and computational resources | Shorter training times, often a few days to weeks
      Training Dataset Sizes | Massive datasets including books, articles, websites, and other forms of text | Smaller datasets, often task-specific or domain-specific
      Inference Speed | Slower | Faster
      Deployment | Requires powerful hardware (GPUs/TPUs) and cloud infrastructure | Can run on edge devices, CPUs, and less powerful GPUs
      Accuracy | High accuracy and performance on a | Good performance on specific tasks,
  • 5. Phi Small Language Models: groundbreaking performance for size, with frictionless availability
      Phi-3-mini (3.8B), Phi-3-vision (4.2B), Phi-3-MoE (6.6B), Phi-4 (14B)
      Available on Azure AI Model Catalog, Hugging Face, Ollama, NVIDIA NIM, ONNX Runtime
      Instruction Tuned, RAI Safety Aligned
  • 6. Models & availability across platforms
      Model | Input | Context Length | Azure AI (MaaS) | Azure ML (MaaP) | ONNX | Hugging Face | Ollama | Nvidia NIM
      Phi-3-vision-128k-instruct | Text+Image | 128k | Playground & Deployment | Playground, Deployment & Finetuning | CUDA, CPU, DirectML | Download | -NA- | NIM APIs
      Phi-3-mini-4k-instruct | Text | 4k | Playground & Deployment | Playground, Deployment & Finetuning | CUDA, Web | Playground & Download | GGUF | NIM APIs
      Phi-3-mini-128k-instruct | Text | 128k | Playground & Deployment | Playground, Deployment & Finetuning | CUDA | Download | -NA- | NIM APIs
      Phi-3-small-8k-instruct | Text | 8k | Playground & Deployment | Playground, Deployment & Finetuning | CUDA | Download | -NA- | NIM APIs
      Phi-3-small-128k-instruct | Text | 128k | Playground & Deployment | Playground, Deployment & Finetuning | CUDA | Download | -NA- | NIM APIs
      Phi-3-medium-4k-instruct | Text | 4k | Playground & Deployment | Playground, Deployment & Finetuning | CUDA, CPU, DirectML | Download | -NA- | NIM APIs
      Phi-3-medium-128k-instruct | Text | 128k | Playground & Deployment | Playground, Deployment & Finetuning | CUDA, CPU, DirectML | Download | -NA- | -NA-
      Phi-4 | Text | 16k | Playground & Deployment | Playground, Deployment & Finetuning | -NA- | Download | Download | -NA-
      Phi-silica, which was announced at //build, is based on Phi models and is optimized for Windows NPUs. Application developers can leverage Phi-silica via in-box Windows APIs. Phi-silica is not available on Azure and is therefore out of scope for this presentation.
  • 7. Benchmark numbers
      Category | Benchmark | phi-4 (14B) | phi-3 (14B) | Qwen 2.5 (14B instruct) | GPT-4o-mini | Llama-3.3 (70B instruct) | Qwen 2.5 (72B instruct) | GPT-4o
      Popular Aggregated Benchmark | MMLU | 84.8 | 77.9 | 79.9 | 81.8 | 86.3 | 85.3 | 88.1
      Science | GPQA | 56.1 | 31.2 | 42.9 | 40.9 | 49.1 | 49.0 | 50.6
      Math | MGSM | 80.6 | 53.5 | 79.6 | 86.5 | 89.1 | 87.3 | 90.4
      Math | MATH | 80.4 | 44.6 | 75.6 | 73.0 | 66.3 | 80.0 | 74.6
      Code Generation | HumanEval | 82.6 | 67.8 | 72.1 | 86.2 | 78.9 | 80.4 | 90.6
      Factual Knowledge | SimpleQA | 3.0 | 7.6 | 5.4 | 9.9 | 20.9 | 10.2 | 39.4
      Reasoning | DROP | 75.5 | 68.3 | 85.5 | 79.3 | 90.2 | 76.7 | 80.9
  • 8. Benchmark numbers (chart, axis 0 to 100): MMLU, GPQA, MGSM, MATH, HumanEval, SimpleQA and DROP scores for phi-4 (14B), phi-3 (14B), Qwen 2.5 (14B instruct), GPT-4o-mini, Llama-3.3 (70B instruct), Qwen 2.5 (72B instruct) and GPT-4o
  • 9. Benefits of Small Language Models
      Low compute footprint; can run on older GPUs
      Ultra-low latency thanks to the small model size
      Easy on your wallet, and hence business viable
      Can be deployed on-prem or on edge devices
      Easier and more affordable to customize
      The only model <5B that offers long context!
  • 10. Some Use Cases for Small Language Models
      Text Prediction
      Named Entity Recognition
      Summarization
      Domain Specific Tasks
      The only model <5B that offers long context!
  • 11. Mistral - Ministral
      Ministral-3B (3.6B): 131k token context length; $0.04 / M tokens (input and output)
      Internet-less assistant; On-Device translation; Local Analytics
      Ministral-8B (8B): 131k token context length; $0.04 / M tokens (input and output)
  • 12. Meta - Llama Llama-3.2-1B (1B) 128k token context length $0.37 / M tokens (est.) Internet-less assistant Multilingual dialogue Image/Text to Text Llama-3.2-3B (3B) Llama-3.2-11B-Vision (11B) 128k token context length 128k token context length / Fast Slower but smarter
  • 14. Important Considerations
      Understand the Problem at hand: Identify the problem you are solving. Determine missing capabilities, skills, and behaviors.
      Evaluation and Benchmarks: Make sure you can measure what you are enabling. Use LLMs as a judge. Use BabelBench with 300+ tasks, track general capabilities.
      Invest in Better Data: Focus on higher quality, not quantity. Less finetuning data is better to keep general capabilities. Leverage LLMs to generate data. Human annotations if available.
      No Free Lunch: Fine-tuning reduces general capability over time. Model forgets knowledge outside target domain. Loss of general "thinking" skills.
  • 15. Thank you Nathan Bijnens Sr Manager, Data & AI Microsoft Nathan.Bijnens@microsoft.com
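
The benchmark numbers on slide 7 are easier to compare once they are in a small data structure. The sketch below is only an illustration, not part of the original deck: the scores are transcribed from the slide, while the helper names (best_model, leads, gap_to) are hypothetical and chosen for this example.

```python
# Scores transcribed from slide 7; the column order follows the slide.
MODELS = [
    "phi-4 (14B)", "phi-3 (14B)", "Qwen 2.5 (14B instruct)", "GPT-4o-mini",
    "Llama-3.3 (70B instruct)", "Qwen 2.5 (72B instruct)", "GPT-4o",
]
SCORES = {
    "MMLU":      [84.8, 77.9, 79.9, 81.8, 86.3, 85.3, 88.1],
    "GPQA":      [56.1, 31.2, 42.9, 40.9, 49.1, 49.0, 50.6],
    "MGSM":      [80.6, 53.5, 79.6, 86.5, 89.1, 87.3, 90.4],
    "MATH":      [80.4, 44.6, 75.6, 73.0, 66.3, 80.0, 74.6],
    "HumanEval": [82.6, 67.8, 72.1, 86.2, 78.9, 80.4, 90.6],
    "SimpleQA":  [3.0, 7.6, 5.4, 9.9, 20.9, 10.2, 39.4],
    "DROP":      [75.5, 68.3, 85.5, 79.3, 90.2, 76.7, 80.9],
}


def best_model(benchmark):
    """Return the name of the highest-scoring model on one benchmark."""
    row = SCORES[benchmark]
    return MODELS[row.index(max(row))]


def leads(model):
    """Return the benchmarks (in slide order) on which `model` scores highest."""
    return [name for name in SCORES if best_model(name) == model]


def gap_to(model, reference):
    """Return per-benchmark score difference `model - reference`, in points."""
    mi, ri = MODELS.index(model), MODELS.index(reference)
    return {name: round(row[mi] - row[ri], 1) for name, row in SCORES.items()}


print("phi-4 leads on:", leads("phi-4 (14B)"))
print("phi-4 minus GPT-4o:", gap_to("phi-4 (14B)", "GPT-4o"))
```

On these numbers the 14B phi-4 has the top score on GPQA and MATH, while its largest deficit against GPT-4o is on SimpleQA (factual knowledge).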

Editor's Notes

  • #36: Now, let’s take a closer look at each of the latest Phi-3.5 models. Phi-3.5-mini: The 3.8B-parameter Phi-3.5-mini model supports over 20 languages and maintains coherence and context across its 128K context window. It excels in tasks including reasoning, mathematics, code generation, and summarizing lengthy documents or meeting transcripts. It has been instruction tuned and fully safety aligned with our Responsible AI principles.
  • #37: The Phi-3.5-vision model is a multi-modal model with 4.2B parameters that handles both text and vision inputs. It is suitable for tasks that require visual and textual analysis. The model also supports a 128K context length. It excels in complex reasoning, optical character recognition, and multi-frame summarization tasks. Like its mini sibling, the vision model has been instruction tuned and safety aligned.
  • #38: The Phi-3.5-MoE model is the only mixture-of-experts model in the Phi family. It has 16 experts and 42B parameters in total. During token processing, 2 experts are activated, so only 6.6B parameters are used per token, which makes the MoE model computationally efficient while outperforming dense models of similar size. This MoE model also supports more than 20 languages with 128K long-context support. It excels in real-world and academic benchmarks, surpassing several leading models in various tasks including reasoning, mathematics, and code generation.
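
The "2 of 16 experts per token" idea in note #38 can be sketched as generic top-k routing. This is a minimal illustration under stated assumptions, not Phi-3.5-MoE's actual router: the notes do not describe how experts are scored or how their outputs are combined, so the softmax over the selected logits, the toy experts, and the names route and moe_forward are all hypothetical.

```python
import math
import random

NUM_EXPERTS = 16  # from the note: 16 experts
TOP_K = 2         # from the note: 2 experts are activated per token


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def route(router_logits, k=TOP_K):
    """Pick the k highest-scoring experts and normalise their weights to sum to 1.

    Assumption: weights come from a softmax over the selected logits only,
    which is one common top-k gating variant.
    """
    order = sorted(range(len(router_logits)),
                   key=lambda i: router_logits[i], reverse=True)
    top = order[:k]
    weights = softmax([router_logits[i] for i in top])
    return list(zip(top, weights))


def moe_forward(x, experts, router_logits):
    """Weighted sum of the selected experts' outputs; the other experts never run."""
    return sum(weight * experts[i](x) for i, weight in route(router_logits))


# Toy experts: expert i multiplies its input by (i + 1).
experts = [lambda x, scale=i + 1: scale * x for i in range(NUM_EXPERTS)]

random.seed(0)
logits = [random.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]
chosen = route(logits)
print("experts used for this token:", [index for index, _ in chosen])
print("output:", moe_forward(1.0, experts, logits))
```

Only the two selected experts are evaluated for each token, which is where the compute saving comes from. Note that 2/16 of 42B is about 5.3B rather than 6.6B; the difference is presumably parameters shared by every token (such as attention and embedding layers), though the notes do not break this down.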