SlideShare a Scribd company logo
Running LLM in Kubernetes
Volodymyr Tsap
CTO @ SHALB
What is LLM?
A large language model (LLM) is a language model notable for its ability to achieve general-purpose language
generation and understanding.
LLMs acquire these abilities by learning statistical relationships from text documents during a computationally
intensive self-supervised and semi-supervised training process.
LLMs are artificial neural networks, the largest and most capable of which are built with a transformer-based
architecture.
Wikipedia
What is Transformers?
! Transformers are a type of deep learning model that have revolutionized the way natural
language processing tasks are approached.
! Transformers utilize a unique architecture that relies on self-attention mechanisms to weigh
the significance of different words in a sentence. This allows the model to capture the context of
each word more effectively than previous models, leading to better understanding and
generation of text.
Building LLM. Data Collection and Preparation.
! Collect a large and diverse dataset from various sources such as books, websites, and other
texts.
! Clean and preprocess the data to remove irrelevant content, normalize text (e.g., lowercasing,
removing special characters), and ensure data quality.
Building LLM. Tokenization and Vocabulary Building.
! Tokenize the text data into smaller units (tokens) such as words, subwords, or characters. This
step may involve choosing a specific tokenization algorithm (e.g., BPE, WordPiece).
! Create a vocabulary of unique tokens and possibly generate embeddings for them. This could
involve pre-training embeddings or using embeddings from an existing model.
Building LLM. Model Architecture Design.
! Choose a transformer architecture (e.g., GPT, BERT) that suits the goals of your LLM. This
involves deciding on the number of layers, attention heads, and other hyperparameters.
! Implement or adapt an existing transformer model framework using deep learning libraries such
as TensorFlow or PyTorch.
Building LLM. Model Architecture Design.
Building LLM. Training.
! Split the data into training, validation, and test sets.
! Pre-train the model on the collected data, which involves running it through the computation of
weights over multiple epochs. This step is computationally intensive and can take from hours to
weeks depending on the model size and hardware capabilities.
! Use techniques such as gradient clipping, learning rate scheduling, and regularization to
improve training efficiency and model performance.
Building LLM. Fine-Tuning (Optional).
! Fine-tune the pre-trained model on a smaller, task-specific dataset if the LLM will be used for
specific applications (e.g., question answering, sentiment analysis).
! Adjust hyperparameters and training settings to optimize performance for the target task.
Building LLM. Evaluation and Testing.
! Evaluate the model on a test set to measure its performance using appropriate metrics (e.g.,
accuracy, F1 score, perplexity).
! Perform error analysis and adjust the training process as necessary to improve model quality.
Building LLM. Saving and Deployment.
! Save the trained model weights and configuration to files.
! Deploy the model for inference, which can involve setting up a serving infrastructure capable of
handling requests in real-time or batch processing.
TLDR. Watch Andrej Karpathy Explanation.
Hugging Face - GitHub for LLM’s
LLM Files
LLM Files
How to run? Using Google Colab with T4 gpu
How to run? Using laptop and llama.cpp. Quantization.
Using Managed Cloud Services.
! Amazon SageMaker
! Google Cloud AI Platform & Vertex AI
! Microsoft Azure Machine Learning
! NVIDIA AI Enterprise
! Hugging Face Endpoints
! AnyScale Endpoints
Why to run them in Kubernetes?
1. We already know him :)
2. Scalability. Resource efficiency, HPA, auto-scaling, API Limits, etc..
3. Price. Managed service 20-40% overhead. Reserved instances.
4. GPU sharing.
5. ML ecosystem - pipelines, artifacts. (KubeFlow, Ray Framework).
6. No vendor lock. Transportable.
LLM Serving Frameworks
Options to run LLM on K8s.
1. KServe from Kubeflow.
2. Ray Serve from Ray Framework.
3. Flux AI controller.
4. Own Kubernetes wrapper on top of Frameworks.
We choose TGI and made it Kubernetes ready.
We have Docker, Lets adapt it to Kubernetes
Demo Time!
Let’s bootstrap infrastructure from cluster.dev template
Then add model configuration
Apply and check the model is running
Changing models and infrastructure
Enabling HF chat-ui
Deploy Monitoring and Metrics with DCGM Exporter
Thank you! Now Questions?
Ad

More Related Content

What's hot (20)

Agile Integration eBook from 2018
Agile Integration eBook from 2018Agile Integration eBook from 2018
Agile Integration eBook from 2018
Kim Clark
 
Optimizing S3 Write-heavy Spark workloads
Optimizing S3 Write-heavy Spark workloadsOptimizing S3 Write-heavy Spark workloads
Optimizing S3 Write-heavy Spark workloads
datamantra
 
루시드웍스 퓨전 소개서
루시드웍스 퓨전  소개서루시드웍스 퓨전  소개서
루시드웍스 퓨전 소개서
JunMyoung(준명) Youn(연)
 
Modular architecture today
Modular architecture todayModular architecture today
Modular architecture today
pragkirk
 
Clickstream Analysis with Spark—Understanding Visitors in Realtime by Josef A...
Clickstream Analysis with Spark—Understanding Visitors in Realtime by Josef A...Clickstream Analysis with Spark—Understanding Visitors in Realtime by Josef A...
Clickstream Analysis with Spark—Understanding Visitors in Realtime by Josef A...
Spark Summit
 
Resume akash storage admin
Resume akash storage adminResume akash storage admin
Resume akash storage admin
Akash Mishra
 
Improving the Life of Data Scientists: Automating ML Lifecycle through MLflow
Improving the Life of Data Scientists: Automating ML Lifecycle through MLflowImproving the Life of Data Scientists: Automating ML Lifecycle through MLflow
Improving the Life of Data Scientists: Automating ML Lifecycle through MLflow
Databricks
 
Jenkins for java world
Jenkins for java worldJenkins for java world
Jenkins for java world
Ashok Kumar
 
Flink 2.0: Navigating the Future of Unified Stream and Batch Processing
Flink 2.0: Navigating the Future of Unified Stream and Batch ProcessingFlink 2.0: Navigating the Future of Unified Stream and Batch Processing
Flink 2.0: Navigating the Future of Unified Stream and Batch Processing
HostedbyConfluent
 
Serverless Machine Learning Model Inference on Kubernetes with KServe.pdf
Serverless Machine Learning Model Inference on Kubernetes with KServe.pdfServerless Machine Learning Model Inference on Kubernetes with KServe.pdf
Serverless Machine Learning Model Inference on Kubernetes with KServe.pdf
Stavros Kontopoulos
 
Travis CI
Travis CITravis CI
Travis CI
Анете Аннемария
 
[Gridgain]인메모리컴퓨팅 및 국내레퍼런스 소개
[Gridgain]인메모리컴퓨팅 및 국내레퍼런스 소개 [Gridgain]인메모리컴퓨팅 및 국내레퍼런스 소개
[Gridgain]인메모리컴퓨팅 및 국내레퍼런스 소개
CJ Olivenetworks
 
Modern CI/CD Pipeline Using Azure DevOps
Modern CI/CD Pipeline Using Azure DevOpsModern CI/CD Pipeline Using Azure DevOps
Modern CI/CD Pipeline Using Azure DevOps
GlobalLogic Ukraine
 
Codifying the Build and Release Process with a Jenkins Pipeline Shared Library
Codifying the Build and Release Process with a Jenkins Pipeline Shared LibraryCodifying the Build and Release Process with a Jenkins Pipeline Shared Library
Codifying the Build and Release Process with a Jenkins Pipeline Shared Library
Alvin Huang
 
Java tricks for high-load server programming
Java tricks for high-load server programmingJava tricks for high-load server programming
Java tricks for high-load server programming
Andrei Pangin
 
Apache kafka
Apache kafkaApache kafka
Apache kafka
Ramakrishna kapa
 
Jenkins Introduction
Jenkins IntroductionJenkins Introduction
Jenkins Introduction
Pavan Gupta
 
Kubeflow Pipelines (with Tekton)
Kubeflow Pipelines (with Tekton)Kubeflow Pipelines (with Tekton)
Kubeflow Pipelines (with Tekton)
Animesh Singh
 
Kubeflow
KubeflowKubeflow
Kubeflow
Karane Vieira
 
DevOps Tutorial For Beginners | DevOps Tutorial | DevOps Tools | DevOps Train...
DevOps Tutorial For Beginners | DevOps Tutorial | DevOps Tools | DevOps Train...DevOps Tutorial For Beginners | DevOps Tutorial | DevOps Tools | DevOps Train...
DevOps Tutorial For Beginners | DevOps Tutorial | DevOps Tools | DevOps Train...
Simplilearn
 
Agile Integration eBook from 2018
Agile Integration eBook from 2018Agile Integration eBook from 2018
Agile Integration eBook from 2018
Kim Clark
 
Optimizing S3 Write-heavy Spark workloads
Optimizing S3 Write-heavy Spark workloadsOptimizing S3 Write-heavy Spark workloads
Optimizing S3 Write-heavy Spark workloads
datamantra
 
Modular architecture today
Modular architecture todayModular architecture today
Modular architecture today
pragkirk
 
Clickstream Analysis with Spark—Understanding Visitors in Realtime by Josef A...
Clickstream Analysis with Spark—Understanding Visitors in Realtime by Josef A...Clickstream Analysis with Spark—Understanding Visitors in Realtime by Josef A...
Clickstream Analysis with Spark—Understanding Visitors in Realtime by Josef A...
Spark Summit
 
Resume akash storage admin
Resume akash storage adminResume akash storage admin
Resume akash storage admin
Akash Mishra
 
Improving the Life of Data Scientists: Automating ML Lifecycle through MLflow
Improving the Life of Data Scientists: Automating ML Lifecycle through MLflowImproving the Life of Data Scientists: Automating ML Lifecycle through MLflow
Improving the Life of Data Scientists: Automating ML Lifecycle through MLflow
Databricks
 
Jenkins for java world
Jenkins for java worldJenkins for java world
Jenkins for java world
Ashok Kumar
 
Flink 2.0: Navigating the Future of Unified Stream and Batch Processing
Flink 2.0: Navigating the Future of Unified Stream and Batch ProcessingFlink 2.0: Navigating the Future of Unified Stream and Batch Processing
Flink 2.0: Navigating the Future of Unified Stream and Batch Processing
HostedbyConfluent
 
Serverless Machine Learning Model Inference on Kubernetes with KServe.pdf
Serverless Machine Learning Model Inference on Kubernetes with KServe.pdfServerless Machine Learning Model Inference on Kubernetes with KServe.pdf
Serverless Machine Learning Model Inference on Kubernetes with KServe.pdf
Stavros Kontopoulos
 
[Gridgain]인메모리컴퓨팅 및 국내레퍼런스 소개
[Gridgain]인메모리컴퓨팅 및 국내레퍼런스 소개 [Gridgain]인메모리컴퓨팅 및 국내레퍼런스 소개
[Gridgain]인메모리컴퓨팅 및 국내레퍼런스 소개
CJ Olivenetworks
 
Modern CI/CD Pipeline Using Azure DevOps
Modern CI/CD Pipeline Using Azure DevOpsModern CI/CD Pipeline Using Azure DevOps
Modern CI/CD Pipeline Using Azure DevOps
GlobalLogic Ukraine
 
Codifying the Build and Release Process with a Jenkins Pipeline Shared Library
Codifying the Build and Release Process with a Jenkins Pipeline Shared LibraryCodifying the Build and Release Process with a Jenkins Pipeline Shared Library
Codifying the Build and Release Process with a Jenkins Pipeline Shared Library
Alvin Huang
 
Java tricks for high-load server programming
Java tricks for high-load server programmingJava tricks for high-load server programming
Java tricks for high-load server programming
Andrei Pangin
 
Jenkins Introduction
Jenkins IntroductionJenkins Introduction
Jenkins Introduction
Pavan Gupta
 
Kubeflow Pipelines (with Tekton)
Kubeflow Pipelines (with Tekton)Kubeflow Pipelines (with Tekton)
Kubeflow Pipelines (with Tekton)
Animesh Singh
 
DevOps Tutorial For Beginners | DevOps Tutorial | DevOps Tools | DevOps Train...
DevOps Tutorial For Beginners | DevOps Tutorial | DevOps Tools | DevOps Train...DevOps Tutorial For Beginners | DevOps Tutorial | DevOps Tools | DevOps Train...
DevOps Tutorial For Beginners | DevOps Tutorial | DevOps Tools | DevOps Train...
Simplilearn
 

Similar to "Running Open-Source LLM models on Kubernetes", Volodymyr Tsap (20)

GPT, LLM, RAG, and RAG in Action: Understanding the Future of AI-Powered Info...
GPT, LLM, RAG, and RAG in Action: Understanding the Future of AI-Powered Info...GPT, LLM, RAG, and RAG in Action: Understanding the Future of AI-Powered Info...
GPT, LLM, RAG, and RAG in Action: Understanding the Future of AI-Powered Info...
Muralidharan Deenathayalan
 
AI presentation for dummies LLM Generative AI.pptx
AI presentation for dummies LLM Generative AI.pptxAI presentation for dummies LLM Generative AI.pptx
AI presentation for dummies LLM Generative AI.pptx
emceemouli
 
LangChain + Docugami Webinar
LangChain + Docugami WebinarLangChain + Docugami Webinar
LangChain + Docugami Webinar
Taqi Jaffri
 
Understanding Large Language Models (1).pptx
Understanding Large Language Models (1).pptxUnderstanding Large Language Models (1).pptx
Understanding Large Language Models (1).pptx
RabikaKhalid
 
TechDayPakistan-Slides RAG with Cosmos DB.pptx
TechDayPakistan-Slides RAG with Cosmos DB.pptxTechDayPakistan-Slides RAG with Cosmos DB.pptx
TechDayPakistan-Slides RAG with Cosmos DB.pptx
Usama Wahab Khan Cloud, Data and AI
 
Open, Secure & Transparent AI Pipelines
Open, Secure & Transparent AI PipelinesOpen, Secure & Transparent AI Pipelines
Open, Secure & Transparent AI Pipelines
Nick Pentreath
 
Retrieval Augmented Generation in Practice: Scalable GenAI platforms with k8s...
Retrieval Augmented Generation in Practice: Scalable GenAI platforms with k8s...Retrieval Augmented Generation in Practice: Scalable GenAI platforms with k8s...
Retrieval Augmented Generation in Practice: Scalable GenAI platforms with k8s...
Mihai Criveti
 
Empower with visual charts (1)and llms and generative ai.pptx
Empower with visual charts (1)and llms and generative ai.pptxEmpower with visual charts (1)and llms and generative ai.pptx
Empower with visual charts (1)and llms and generative ai.pptx
JOBANPREETSINGH62
 
MLflow: Infrastructure for a Complete Machine Learning Life Cycle with Mani ...
 MLflow: Infrastructure for a Complete Machine Learning Life Cycle with Mani ... MLflow: Infrastructure for a Complete Machine Learning Life Cycle with Mani ...
MLflow: Infrastructure for a Complete Machine Learning Life Cycle with Mani ...
Databricks
 
Large Language Models Bootcamp
Large Language Models BootcampLarge Language Models Bootcamp
Large Language Models Bootcamp
Data Science Dojo
 
[DSC Europe 23] Milos Grubjesic Empowering Business with Pepsico s Advanced M...
[DSC Europe 23] Milos Grubjesic Empowering Business with Pepsico s Advanced M...[DSC Europe 23] Milos Grubjesic Empowering Business with Pepsico s Advanced M...
[DSC Europe 23] Milos Grubjesic Empowering Business with Pepsico s Advanced M...
DataScienceConferenc1
 
Infrastructure Agnostic Machine Learning Workload Deployment
Infrastructure Agnostic Machine Learning Workload DeploymentInfrastructure Agnostic Machine Learning Workload Deployment
Infrastructure Agnostic Machine Learning Workload Deployment
Databricks
 
AI presentation Genrative LLM for users.pptx
AI presentation Genrative LLM for users.pptxAI presentation Genrative LLM for users.pptx
AI presentation Genrative LLM for users.pptx
emceemouli
 
Build with AI - 2025- AI is Future .pptx
Build with AI - 2025- AI is Future .pptxBuild with AI - 2025- AI is Future .pptx
Build with AI - 2025- AI is Future .pptx
QiratAzam
 
Navigating the Large Language Model choices_Ravi Daparthi
Navigating the Large Language Model choices_Ravi DaparthiNavigating the Large Language Model choices_Ravi Daparthi
Navigating the Large Language Model choices_Ravi Daparthi
RaviKumarDaparthi
 
DLT UNIT-3.docx
DLT  UNIT-3.docxDLT  UNIT-3.docx
DLT UNIT-3.docx
0567Padma
 
Vertex AI - Unified ML Platform for the entire AI workflow on Google Cloud
Vertex AI - Unified ML Platform for the entire AI workflow on Google CloudVertex AI - Unified ML Platform for the entire AI workflow on Google Cloud
Vertex AI - Unified ML Platform for the entire AI workflow on Google Cloud
Márton Kodok
 
Empower with visual charts__________.pptx
Empower with visual charts__________.pptxEmpower with visual charts__________.pptx
Empower with visual charts__________.pptx
JOBANPREETSINGH62
 
Norman Sasono - Incorporating AI/ML into Your Application Architecture
Norman Sasono - Incorporating AI/ML into Your Application ArchitectureNorman Sasono - Incorporating AI/ML into Your Application Architecture
Norman Sasono - Incorporating AI/ML into Your Application Architecture
Agile Impact Conference
 
Norman Sasono - Incorporating AI/ML into Your Application Architecture
Norman Sasono - Incorporating AI/ML into Your Application ArchitectureNorman Sasono - Incorporating AI/ML into Your Application Architecture
Norman Sasono - Incorporating AI/ML into Your Application Architecture
Agile Impact
 
GPT, LLM, RAG, and RAG in Action: Understanding the Future of AI-Powered Info...
GPT, LLM, RAG, and RAG in Action: Understanding the Future of AI-Powered Info...GPT, LLM, RAG, and RAG in Action: Understanding the Future of AI-Powered Info...
GPT, LLM, RAG, and RAG in Action: Understanding the Future of AI-Powered Info...
Muralidharan Deenathayalan
 
AI presentation for dummies LLM Generative AI.pptx
AI presentation for dummies LLM Generative AI.pptxAI presentation for dummies LLM Generative AI.pptx
AI presentation for dummies LLM Generative AI.pptx
emceemouli
 
LangChain + Docugami Webinar
LangChain + Docugami WebinarLangChain + Docugami Webinar
LangChain + Docugami Webinar
Taqi Jaffri
 
Understanding Large Language Models (1).pptx
Understanding Large Language Models (1).pptxUnderstanding Large Language Models (1).pptx
Understanding Large Language Models (1).pptx
RabikaKhalid
 
Open, Secure & Transparent AI Pipelines
Open, Secure & Transparent AI PipelinesOpen, Secure & Transparent AI Pipelines
Open, Secure & Transparent AI Pipelines
Nick Pentreath
 
Retrieval Augmented Generation in Practice: Scalable GenAI platforms with k8s...
Retrieval Augmented Generation in Practice: Scalable GenAI platforms with k8s...Retrieval Augmented Generation in Practice: Scalable GenAI platforms with k8s...
Retrieval Augmented Generation in Practice: Scalable GenAI platforms with k8s...
Mihai Criveti
 
Empower with visual charts (1)and llms and generative ai.pptx
Empower with visual charts (1)and llms and generative ai.pptxEmpower with visual charts (1)and llms and generative ai.pptx
Empower with visual charts (1)and llms and generative ai.pptx
JOBANPREETSINGH62
 
MLflow: Infrastructure for a Complete Machine Learning Life Cycle with Mani ...
 MLflow: Infrastructure for a Complete Machine Learning Life Cycle with Mani ... MLflow: Infrastructure for a Complete Machine Learning Life Cycle with Mani ...
MLflow: Infrastructure for a Complete Machine Learning Life Cycle with Mani ...
Databricks
 
Large Language Models Bootcamp
Large Language Models BootcampLarge Language Models Bootcamp
Large Language Models Bootcamp
Data Science Dojo
 
[DSC Europe 23] Milos Grubjesic Empowering Business with Pepsico s Advanced M...
[DSC Europe 23] Milos Grubjesic Empowering Business with Pepsico s Advanced M...[DSC Europe 23] Milos Grubjesic Empowering Business with Pepsico s Advanced M...
[DSC Europe 23] Milos Grubjesic Empowering Business with Pepsico s Advanced M...
DataScienceConferenc1
 
Infrastructure Agnostic Machine Learning Workload Deployment
Infrastructure Agnostic Machine Learning Workload DeploymentInfrastructure Agnostic Machine Learning Workload Deployment
Infrastructure Agnostic Machine Learning Workload Deployment
Databricks
 
AI presentation Genrative LLM for users.pptx
AI presentation Genrative LLM for users.pptxAI presentation Genrative LLM for users.pptx
AI presentation Genrative LLM for users.pptx
emceemouli
 
Build with AI - 2025- AI is Future .pptx
Build with AI - 2025- AI is Future .pptxBuild with AI - 2025- AI is Future .pptx
Build with AI - 2025- AI is Future .pptx
QiratAzam
 
Navigating the Large Language Model choices_Ravi Daparthi
Navigating the Large Language Model choices_Ravi DaparthiNavigating the Large Language Model choices_Ravi Daparthi
Navigating the Large Language Model choices_Ravi Daparthi
RaviKumarDaparthi
 
DLT UNIT-3.docx
DLT  UNIT-3.docxDLT  UNIT-3.docx
DLT UNIT-3.docx
0567Padma
 
Vertex AI - Unified ML Platform for the entire AI workflow on Google Cloud
Vertex AI - Unified ML Platform for the entire AI workflow on Google CloudVertex AI - Unified ML Platform for the entire AI workflow on Google Cloud
Vertex AI - Unified ML Platform for the entire AI workflow on Google Cloud
Márton Kodok
 
Empower with visual charts__________.pptx
Empower with visual charts__________.pptxEmpower with visual charts__________.pptx
Empower with visual charts__________.pptx
JOBANPREETSINGH62
 
Norman Sasono - Incorporating AI/ML into Your Application Architecture
Norman Sasono - Incorporating AI/ML into Your Application ArchitectureNorman Sasono - Incorporating AI/ML into Your Application Architecture
Norman Sasono - Incorporating AI/ML into Your Application Architecture
Agile Impact Conference
 
Norman Sasono - Incorporating AI/ML into Your Application Architecture
Norman Sasono - Incorporating AI/ML into Your Application ArchitectureNorman Sasono - Incorporating AI/ML into Your Application Architecture
Norman Sasono - Incorporating AI/ML into Your Application Architecture
Agile Impact
 
Ad

More from Fwdays (20)

Від KPI до OKR: як синхронізувати продажі, маркетинг і продукт, щоб бізнес ре...
Від KPI до OKR: як синхронізувати продажі, маркетинг і продукт, щоб бізнес ре...Від KPI до OKR: як синхронізувати продажі, маркетинг і продукт, щоб бізнес ре...
Від KPI до OKR: як синхронізувати продажі, маркетинг і продукт, щоб бізнес ре...
Fwdays
 
"Demand Generation: How a Founder’s Brand Turns Content into Leads", Alex Her...
"Demand Generation: How a Founder’s Brand Turns Content into Leads", Alex Her..."Demand Generation: How a Founder’s Brand Turns Content into Leads", Alex Her...
"Demand Generation: How a Founder’s Brand Turns Content into Leads", Alex Her...
Fwdays
 
"Rebranding for Growth", Anna Velykoivanenko
"Rebranding for Growth", Anna Velykoivanenko"Rebranding for Growth", Anna Velykoivanenko
"Rebranding for Growth", Anna Velykoivanenko
Fwdays
 
"Must-have AI-tools for cost-efficient marketing", Irina Smirnova
"Must-have AI-tools for cost-efficient marketing",  Irina Smirnova"Must-have AI-tools for cost-efficient marketing",  Irina Smirnova
"Must-have AI-tools for cost-efficient marketing", Irina Smirnova
Fwdays
 
"Client Partnership — the Path to Exponential Growth for Companies Sized 50-5...
"Client Partnership — the Path to Exponential Growth for Companies Sized 50-5..."Client Partnership — the Path to Exponential Growth for Companies Sized 50-5...
"Client Partnership — the Path to Exponential Growth for Companies Sized 50-5...
Fwdays
 
"Building a Product IT Team in a Defense-Tech Company", Arthur Seletskiy
"Building a Product IT Team in a Defense-Tech Company", Arthur Seletskiy"Building a Product IT Team in a Defense-Tech Company", Arthur Seletskiy
"Building a Product IT Team in a Defense-Tech Company", Arthur Seletskiy
Fwdays
 
"Scaling Smart: GTM Strategies that Fuel Growth for Service IT Companies", V...
"Scaling Smart: GTM Strategies that Fuel Growth for Service IT Companies",  V..."Scaling Smart: GTM Strategies that Fuel Growth for Service IT Companies",  V...
"Scaling Smart: GTM Strategies that Fuel Growth for Service IT Companies", V...
Fwdays
 
"Pushy Sales Don’t Work: How to Sell Without Driving People Crazy", Aliona Ka...
"Pushy Sales Don’t Work: How to Sell Without Driving People Crazy", Aliona Ka..."Pushy Sales Don’t Work: How to Sell Without Driving People Crazy", Aliona Ka...
"Pushy Sales Don’t Work: How to Sell Without Driving People Crazy", Aliona Ka...
Fwdays
 
Performance Marketing Research для запуску нового WorldWide продукту
Performance Marketing Research для запуску нового WorldWide продуктуPerformance Marketing Research для запуску нового WorldWide продукту
Performance Marketing Research для запуску нового WorldWide продукту
Fwdays
 
"Scaling Product Mindset: From Individual Ideas to Team Culture", Oksana Holu...
"Scaling Product Mindset: From Individual Ideas to Team Culture", Oksana Holu..."Scaling Product Mindset: From Individual Ideas to Team Culture", Oksana Holu...
"Scaling Product Mindset: From Individual Ideas to Team Culture", Oksana Holu...
Fwdays
 
"AI-Driven Automation for High-Performing Teams: Optimize Routine Tasks & Lea...
"AI-Driven Automation for High-Performing Teams: Optimize Routine Tasks & Lea..."AI-Driven Automation for High-Performing Teams: Optimize Routine Tasks & Lea...
"AI-Driven Automation for High-Performing Teams: Optimize Routine Tasks & Lea...
Fwdays
 
"Constructive Interaction During Emotional Burnout: With Local and Internatio...
"Constructive Interaction During Emotional Burnout: With Local and Internatio..."Constructive Interaction During Emotional Burnout: With Local and Internatio...
"Constructive Interaction During Emotional Burnout: With Local and Internatio...
Fwdays
 
"Perfectionisin: What Does the Medicine for Perfectionism Look Like?", Manoil...
"Perfectionisin: What Does the Medicine for Perfectionism Look Like?", Manoil..."Perfectionisin: What Does the Medicine for Perfectionism Look Like?", Manoil...
"Perfectionisin: What Does the Medicine for Perfectionism Look Like?", Manoil...
Fwdays
 
"39 offers for my mentees in a year. How to create a professional environment...
"39 offers for my mentees in a year. How to create a professional environment..."39 offers for my mentees in a year. How to create a professional environment...
"39 offers for my mentees in a year. How to create a professional environment...
Fwdays
 
"From “doing tasks” to leadership: how to adapt management style to the conte...
"From “doing tasks” to leadership: how to adapt management style to the conte..."From “doing tasks” to leadership: how to adapt management style to the conte...
"From “doing tasks” to leadership: how to adapt management style to the conte...
Fwdays
 
[QUICK TALK] "Why Some Teams Grow Better Under Pressure", Oleksandr Marchenko...
[QUICK TALK] "Why Some Teams Grow Better Under Pressure", Oleksandr Marchenko...[QUICK TALK] "Why Some Teams Grow Better Under Pressure", Oleksandr Marchenko...
[QUICK TALK] "Why Some Teams Grow Better Under Pressure", Oleksandr Marchenko...
Fwdays
 
[QUICK TALK] "How to study to acquire a skill, not a certificate?", Uliana Du...
[QUICK TALK] "How to study to acquire a skill, not a certificate?", Uliana Du...[QUICK TALK] "How to study to acquire a skill, not a certificate?", Uliana Du...
[QUICK TALK] "How to study to acquire a skill, not a certificate?", Uliana Du...
Fwdays
 
[QUICK TALK] "Coaching 101: How to Identify and Develop Your Leadership Quali...
[QUICK TALK] "Coaching 101: How to Identify and Develop Your Leadership Quali...[QUICK TALK] "Coaching 101: How to Identify and Develop Your Leadership Quali...
[QUICK TALK] "Coaching 101: How to Identify and Develop Your Leadership Quali...
Fwdays
 
"Dialogue about fakapas: how to pass an interview without unnecessary mistake...
"Dialogue about fakapas: how to pass an interview without unnecessary mistake..."Dialogue about fakapas: how to pass an interview without unnecessary mistake...
"Dialogue about fakapas: how to pass an interview without unnecessary mistake...
Fwdays
 
"Conflicts within a Team: Not an Enemy, But an Opportunity for Growth", Orest...
"Conflicts within a Team: Not an Enemy, But an Opportunity for Growth", Orest..."Conflicts within a Team: Not an Enemy, But an Opportunity for Growth", Orest...
"Conflicts within a Team: Not an Enemy, But an Opportunity for Growth", Orest...
Fwdays
 
Від KPI до OKR: як синхронізувати продажі, маркетинг і продукт, щоб бізнес ре...
Від KPI до OKR: як синхронізувати продажі, маркетинг і продукт, щоб бізнес ре...Від KPI до OKR: як синхронізувати продажі, маркетинг і продукт, щоб бізнес ре...
Від KPI до OKR: як синхронізувати продажі, маркетинг і продукт, щоб бізнес ре...
Fwdays
 
"Demand Generation: How a Founder’s Brand Turns Content into Leads", Alex Her...
"Demand Generation: How a Founder’s Brand Turns Content into Leads", Alex Her..."Demand Generation: How a Founder’s Brand Turns Content into Leads", Alex Her...
"Demand Generation: How a Founder’s Brand Turns Content into Leads", Alex Her...
Fwdays
 
"Rebranding for Growth", Anna Velykoivanenko
"Rebranding for Growth", Anna Velykoivanenko"Rebranding for Growth", Anna Velykoivanenko
"Rebranding for Growth", Anna Velykoivanenko
Fwdays
 
"Must-have AI-tools for cost-efficient marketing", Irina Smirnova
"Must-have AI-tools for cost-efficient marketing",  Irina Smirnova"Must-have AI-tools for cost-efficient marketing",  Irina Smirnova
"Must-have AI-tools for cost-efficient marketing", Irina Smirnova
Fwdays
 
"Client Partnership — the Path to Exponential Growth for Companies Sized 50-5...
"Client Partnership — the Path to Exponential Growth for Companies Sized 50-5..."Client Partnership — the Path to Exponential Growth for Companies Sized 50-5...
"Client Partnership — the Path to Exponential Growth for Companies Sized 50-5...
Fwdays
 
"Building a Product IT Team in a Defense-Tech Company", Arthur Seletskiy
"Building a Product IT Team in a Defense-Tech Company", Arthur Seletskiy"Building a Product IT Team in a Defense-Tech Company", Arthur Seletskiy
"Building a Product IT Team in a Defense-Tech Company", Arthur Seletskiy
Fwdays
 
"Scaling Smart: GTM Strategies that Fuel Growth for Service IT Companies", V...
"Scaling Smart: GTM Strategies that Fuel Growth for Service IT Companies",  V..."Scaling Smart: GTM Strategies that Fuel Growth for Service IT Companies",  V...
"Scaling Smart: GTM Strategies that Fuel Growth for Service IT Companies", V...
Fwdays
 
"Pushy Sales Don’t Work: How to Sell Without Driving People Crazy", Aliona Ka...
"Pushy Sales Don’t Work: How to Sell Without Driving People Crazy", Aliona Ka..."Pushy Sales Don’t Work: How to Sell Without Driving People Crazy", Aliona Ka...
"Pushy Sales Don’t Work: How to Sell Without Driving People Crazy", Aliona Ka...
Fwdays
 
Performance Marketing Research для запуску нового WorldWide продукту
Performance Marketing Research для запуску нового WorldWide продуктуPerformance Marketing Research для запуску нового WorldWide продукту
Performance Marketing Research для запуску нового WorldWide продукту
Fwdays
 
"Scaling Product Mindset: From Individual Ideas to Team Culture", Oksana Holu...
"Scaling Product Mindset: From Individual Ideas to Team Culture", Oksana Holu..."Scaling Product Mindset: From Individual Ideas to Team Culture", Oksana Holu...
"Scaling Product Mindset: From Individual Ideas to Team Culture", Oksana Holu...
Fwdays
 
"AI-Driven Automation for High-Performing Teams: Optimize Routine Tasks & Lea...
"AI-Driven Automation for High-Performing Teams: Optimize Routine Tasks & Lea..."AI-Driven Automation for High-Performing Teams: Optimize Routine Tasks & Lea...
"AI-Driven Automation for High-Performing Teams: Optimize Routine Tasks & Lea...
Fwdays
 
"Constructive Interaction During Emotional Burnout: With Local and Internatio...
"Constructive Interaction During Emotional Burnout: With Local and Internatio..."Constructive Interaction During Emotional Burnout: With Local and Internatio...
"Constructive Interaction During Emotional Burnout: With Local and Internatio...
Fwdays
 
"Perfectionisin: What Does the Medicine for Perfectionism Look Like?", Manoil...
"Perfectionisin: What Does the Medicine for Perfectionism Look Like?", Manoil..."Perfectionisin: What Does the Medicine for Perfectionism Look Like?", Manoil...
"Perfectionisin: What Does the Medicine for Perfectionism Look Like?", Manoil...
Fwdays
 
"39 offers for my mentees in a year. How to create a professional environment...
"39 offers for my mentees in a year. How to create a professional environment..."39 offers for my mentees in a year. How to create a professional environment...
"39 offers for my mentees in a year. How to create a professional environment...
Fwdays
 
"From “doing tasks” to leadership: how to adapt management style to the conte...
"From “doing tasks” to leadership: how to adapt management style to the conte..."From “doing tasks” to leadership: how to adapt management style to the conte...
"From “doing tasks” to leadership: how to adapt management style to the conte...
Fwdays
 
[QUICK TALK] "Why Some Teams Grow Better Under Pressure", Oleksandr Marchenko...
[QUICK TALK] "Why Some Teams Grow Better Under Pressure", Oleksandr Marchenko...[QUICK TALK] "Why Some Teams Grow Better Under Pressure", Oleksandr Marchenko...
[QUICK TALK] "Why Some Teams Grow Better Under Pressure", Oleksandr Marchenko...
Fwdays
 
[QUICK TALK] "How to study to acquire a skill, not a certificate?", Uliana Du...
[QUICK TALK] "How to study to acquire a skill, not a certificate?", Uliana Du...[QUICK TALK] "How to study to acquire a skill, not a certificate?", Uliana Du...
[QUICK TALK] "How to study to acquire a skill, not a certificate?", Uliana Du...
Fwdays
 
[QUICK TALK] "Coaching 101: How to Identify and Develop Your Leadership Quali...
[QUICK TALK] "Coaching 101: How to Identify and Develop Your Leadership Quali...[QUICK TALK] "Coaching 101: How to Identify and Develop Your Leadership Quali...
[QUICK TALK] "Coaching 101: How to Identify and Develop Your Leadership Quali...
Fwdays
 
"Dialogue about fakapas: how to pass an interview without unnecessary mistake...
"Dialogue about fakapas: how to pass an interview without unnecessary mistake..."Dialogue about fakapas: how to pass an interview without unnecessary mistake...
"Dialogue about fakapas: how to pass an interview without unnecessary mistake...
Fwdays
 
"Conflicts within a Team: Not an Enemy, But an Opportunity for Growth", Orest...
"Conflicts within a Team: Not an Enemy, But an Opportunity for Growth", Orest..."Conflicts within a Team: Not an Enemy, But an Opportunity for Growth", Orest...
"Conflicts within a Team: Not an Enemy, But an Opportunity for Growth", Orest...
Fwdays
 
Ad

Recently uploaded (20)

DevOpsDays SLC - Platform Engineers are Product Managers.pptx
DevOpsDays SLC - Platform Engineers are Product Managers.pptxDevOpsDays SLC - Platform Engineers are Product Managers.pptx
DevOpsDays SLC - Platform Engineers are Product Managers.pptx
Justin Reock
 
Bepents tech services - a premier cybersecurity consulting firm
Bepents tech services - a premier cybersecurity consulting firmBepents tech services - a premier cybersecurity consulting firm
Bepents tech services - a premier cybersecurity consulting firm
Benard76
 
Kit-Works Team Study_팀스터디_김한솔_nuqs_20250509.pdf
Kit-Works Team Study_팀스터디_김한솔_nuqs_20250509.pdfKit-Works Team Study_팀스터디_김한솔_nuqs_20250509.pdf
Kit-Works Team Study_팀스터디_김한솔_nuqs_20250509.pdf
Wonjun Hwang
 
AsyncAPI v3 : Streamlining Event-Driven API Design
AsyncAPI v3 : Streamlining Event-Driven API DesignAsyncAPI v3 : Streamlining Event-Driven API Design
AsyncAPI v3 : Streamlining Event-Driven API Design
leonid54
 
Mastering Testing in the Modern F&B Landscape
Mastering Testing in the Modern F&B LandscapeMastering Testing in the Modern F&B Landscape
Mastering Testing in the Modern F&B Landscape
marketing943205
 
Com fer un pla de gestió de dades amb l'eiNa DMP (en anglès)
Com fer un pla de gestió de dades amb l'eiNa DMP (en anglès)Com fer un pla de gestió de dades amb l'eiNa DMP (en anglès)
Com fer un pla de gestió de dades amb l'eiNa DMP (en anglès)
CSUC - Consorci de Serveis Universitaris de Catalunya
 
IT484 Cyber Forensics_Information Technology
IT484 Cyber Forensics_Information TechnologyIT484 Cyber Forensics_Information Technology
IT484 Cyber Forensics_Information Technology
SHEHABALYAMANI
 
machines-for-woodworking-shops-en-compressed.pdf
machines-for-woodworking-shops-en-compressed.pdfmachines-for-woodworking-shops-en-compressed.pdf
machines-for-woodworking-shops-en-compressed.pdf
AmirStern2
 
The No-Code Way to Build a Marketing Team with One AI Agent (Download the n8n...
The No-Code Way to Build a Marketing Team with One AI Agent (Download the n8n...The No-Code Way to Build a Marketing Team with One AI Agent (Download the n8n...
The No-Code Way to Build a Marketing Team with One AI Agent (Download the n8n...
SOFTTECHHUB
 
AI 3-in-1: Agents, RAG, and Local Models - Brent Laster
AI 3-in-1: Agents, RAG, and Local Models - Brent LasterAI 3-in-1: Agents, RAG, and Local Models - Brent Laster
AI 3-in-1: Agents, RAG, and Local Models - Brent Laster
All Things Open
 
Top 5 Benefits of Using Molybdenum Rods in Industrial Applications.pptx
Top 5 Benefits of Using Molybdenum Rods in Industrial Applications.pptxTop 5 Benefits of Using Molybdenum Rods in Industrial Applications.pptx
Top 5 Benefits of Using Molybdenum Rods in Industrial Applications.pptx
mkubeusa
 
Developing System Infrastructure Design Plan.pptx
Developing System Infrastructure Design Plan.pptxDeveloping System Infrastructure Design Plan.pptx
Developing System Infrastructure Design Plan.pptx
wondimagegndesta
 
Building the Customer Identity Community, Together.pdf
Building the Customer Identity Community, Together.pdfBuilding the Customer Identity Community, Together.pdf
Building the Customer Identity Community, Together.pdf
Cheryl Hung
 
Cybersecurity Threat Vectors and Mitigation
Cybersecurity Threat Vectors and MitigationCybersecurity Threat Vectors and Mitigation
Cybersecurity Threat Vectors and Mitigation
VICTOR MAESTRE RAMIREZ
 
IT488 Wireless Sensor Networks_Information Technology
IT488 Wireless Sensor Networks_Information TechnologyIT488 Wireless Sensor Networks_Information Technology
IT488 Wireless Sensor Networks_Information Technology
SHEHABALYAMANI
 
Reimagine How You and Your Team Work with Microsoft 365 Copilot.pptx
Reimagine How You and Your Team Work with Microsoft 365 Copilot.pptxReimagine How You and Your Team Work with Microsoft 365 Copilot.pptx
Reimagine How You and Your Team Work with Microsoft 365 Copilot.pptx
John Moore
 
RTP Over QUIC: An Interesting Opportunity Or Wasted Time?
RTP Over QUIC: An Interesting Opportunity Or Wasted Time?RTP Over QUIC: An Interesting Opportunity Or Wasted Time?
RTP Over QUIC: An Interesting Opportunity Or Wasted Time?
Lorenzo Miniero
 
Agentic Automation - Delhi UiPath Community Meetup
Agentic Automation - Delhi UiPath Community MeetupAgentic Automation - Delhi UiPath Community Meetup
Agentic Automation - Delhi UiPath Community Meetup
Manoj Batra (1600 + Connections)
 
Slack like a pro: strategies for 10x engineering teams
Slack like a pro: strategies for 10x engineering teamsSlack like a pro: strategies for 10x engineering teams
Slack like a pro: strategies for 10x engineering teams
Nacho Cougil
 
Optima Cyber - Maritime Cyber Security - MSSP Services - Manolis Sfakianakis ...
Optima Cyber - Maritime Cyber Security - MSSP Services - Manolis Sfakianakis ...Optima Cyber - Maritime Cyber Security - MSSP Services - Manolis Sfakianakis ...
Optima Cyber - Maritime Cyber Security - MSSP Services - Manolis Sfakianakis ...
Mike Mingos
 
DevOpsDays SLC - Platform Engineers are Product Managers.pptx
DevOpsDays SLC - Platform Engineers are Product Managers.pptxDevOpsDays SLC - Platform Engineers are Product Managers.pptx
DevOpsDays SLC - Platform Engineers are Product Managers.pptx
Justin Reock
 
Bepents tech services - a premier cybersecurity consulting firm
Bepents tech services - a premier cybersecurity consulting firmBepents tech services - a premier cybersecurity consulting firm
Bepents tech services - a premier cybersecurity consulting firm
Benard76
 
Kit-Works Team Study_팀스터디_김한솔_nuqs_20250509.pdf
Kit-Works Team Study_팀스터디_김한솔_nuqs_20250509.pdfKit-Works Team Study_팀스터디_김한솔_nuqs_20250509.pdf
Kit-Works Team Study_팀스터디_김한솔_nuqs_20250509.pdf
Wonjun Hwang
 
AsyncAPI v3 : Streamlining Event-Driven API Design
AsyncAPI v3 : Streamlining Event-Driven API DesignAsyncAPI v3 : Streamlining Event-Driven API Design
AsyncAPI v3 : Streamlining Event-Driven API Design
leonid54
 
Mastering Testing in the Modern F&B Landscape
Mastering Testing in the Modern F&B LandscapeMastering Testing in the Modern F&B Landscape
Mastering Testing in the Modern F&B Landscape
marketing943205
 
IT484 Cyber Forensics_Information Technology
IT484 Cyber Forensics_Information TechnologyIT484 Cyber Forensics_Information Technology
IT484 Cyber Forensics_Information Technology
SHEHABALYAMANI
 
machines-for-woodworking-shops-en-compressed.pdf
machines-for-woodworking-shops-en-compressed.pdfmachines-for-woodworking-shops-en-compressed.pdf
machines-for-woodworking-shops-en-compressed.pdf
AmirStern2
 
The No-Code Way to Build a Marketing Team with One AI Agent (Download the n8n...
The No-Code Way to Build a Marketing Team with One AI Agent (Download the n8n...The No-Code Way to Build a Marketing Team with One AI Agent (Download the n8n...
The No-Code Way to Build a Marketing Team with One AI Agent (Download the n8n...
SOFTTECHHUB
 
AI 3-in-1: Agents, RAG, and Local Models - Brent Laster
AI 3-in-1: Agents, RAG, and Local Models - Brent LasterAI 3-in-1: Agents, RAG, and Local Models - Brent Laster
AI 3-in-1: Agents, RAG, and Local Models - Brent Laster
All Things Open
 
Top 5 Benefits of Using Molybdenum Rods in Industrial Applications.pptx
Top 5 Benefits of Using Molybdenum Rods in Industrial Applications.pptxTop 5 Benefits of Using Molybdenum Rods in Industrial Applications.pptx
Top 5 Benefits of Using Molybdenum Rods in Industrial Applications.pptx
mkubeusa
 
Developing System Infrastructure Design Plan.pptx
Developing System Infrastructure Design Plan.pptxDeveloping System Infrastructure Design Plan.pptx
Developing System Infrastructure Design Plan.pptx
wondimagegndesta
 
Building the Customer Identity Community, Together.pdf
Building the Customer Identity Community, Together.pdfBuilding the Customer Identity Community, Together.pdf
Building the Customer Identity Community, Together.pdf
Cheryl Hung
 
Cybersecurity Threat Vectors and Mitigation
Cybersecurity Threat Vectors and MitigationCybersecurity Threat Vectors and Mitigation
Cybersecurity Threat Vectors and Mitigation
VICTOR MAESTRE RAMIREZ
 
IT488 Wireless Sensor Networks_Information Technology
IT488 Wireless Sensor Networks_Information TechnologyIT488 Wireless Sensor Networks_Information Technology
IT488 Wireless Sensor Networks_Information Technology
SHEHABALYAMANI
 
Reimagine How You and Your Team Work with Microsoft 365 Copilot.pptx
Reimagine How You and Your Team Work with Microsoft 365 Copilot.pptxReimagine How You and Your Team Work with Microsoft 365 Copilot.pptx
Reimagine How You and Your Team Work with Microsoft 365 Copilot.pptx
John Moore
 
RTP Over QUIC: An Interesting Opportunity Or Wasted Time?
RTP Over QUIC: An Interesting Opportunity Or Wasted Time?RTP Over QUIC: An Interesting Opportunity Or Wasted Time?
RTP Over QUIC: An Interesting Opportunity Or Wasted Time?
Lorenzo Miniero
 
Slack like a pro: strategies for 10x engineering teams
Slack like a pro: strategies for 10x engineering teamsSlack like a pro: strategies for 10x engineering teams
Slack like a pro: strategies for 10x engineering teams
Nacho Cougil
 
Optima Cyber - Maritime Cyber Security - MSSP Services - Manolis Sfakianakis ...
Optima Cyber - Maritime Cyber Security - MSSP Services - Manolis Sfakianakis ...Optima Cyber - Maritime Cyber Security - MSSP Services - Manolis Sfakianakis ...
Optima Cyber - Maritime Cyber Security - MSSP Services - Manolis Sfakianakis ...
Mike Mingos
 

"Running Open-Source LLM models on Kubernetes", Volodymyr Tsap

  • 1. Running LLM in Kubernetes Volodymyr Tsap CTO @ SHALB
  • 2. What is LLM? A large language model (LLM) is a language model notable for its ability to achieve general-purpose language generation and understanding. LLMs acquire these abilities by learning statistical relationships from text documents during a computationally intensive self-supervised and semi-supervised training process. LLMs are artificial neural networks, the largest and most capable of which are built with a transformer-based architecture. Wikipedia
  • 3. What is Transformers? ! Transformers are a type of deep learning model that have revolutionized the way natural language processing tasks are approached. ! Transformers utilize a unique architecture that relies on self-attention mechanisms to weigh the significance of different words in a sentence. This allows the model to capture the context of each word more effectively than previous models, leading to better understanding and generation of text.
  • 4. Building LLM. Data Collection and Preparation. ! Collect a large and diverse dataset from various sources such as books, websites, and other texts. ! Clean and preprocess the data to remove irrelevant content, normalize text (e.g., lowercasing, removing special characters), and ensure data quality.
  • 5. Building LLM. Tokenization and Vocabulary Building. ! Tokenize the text data into smaller units (tokens) such as words, subwords, or characters. This step may involve choosing a specific tokenization algorithm (e.g., BPE, WordPiece). ! Create a vocabulary of unique tokens and possibly generate embeddings for them. This could involve pre-training embeddings or using embeddings from an existing model.
  • 6. Building LLM. Model Architecture Design. ! Choose a transformer architecture (e.g., GPT, BERT) that suits the goals of your LLM. This involves deciding on the number of layers, attention heads, and other hyperparameters. ! Implement or adapt an existing transformer model framework using deep learning libraries such as TensorFlow or PyTorch.
  • 7. Building LLM. Model Architecture Design.
  • 8. Building LLM. Training. ! Split the data into training, validation, and test sets. ! Pre-train the model on the collected data, which involves running it through the computation of weights over multiple epochs. This step is computationally intensive and can take from hours to weeks depending on the model size and hardware capabilities. ! Use techniques such as gradient clipping, learning rate scheduling, and regularization to improve training efficiency and model performance.
  • 9. Building LLM. Fine-Tuning (Optional). ! Fine-tune the pre-trained model on a smaller, task-specific dataset if the LLM will be used for specific applications (e.g., question answering, sentiment analysis). ! Adjust hyperparameters and training settings to optimize performance for the target task.
  • 10. Building LLM. Evaluation and Testing. ! Evaluate the model on a test set to measure its performance using appropriate metrics (e.g., accuracy, F1 score, perplexity). ! Perform error analysis and adjust the training process as necessary to improve model quality.
  • 11. Building LLM. Saving and Deployment. ! Save the trained model weights and configuration to files. ! Deploy the model for inference, which can involve setting up a serving infrastructure capable of handling requests in real-time or batch processing.
  • 12. TLDR. Watch Andrej Karpathy Explanation.
  • 13. Hugging Face - GitHub for LLM’s
  • 16. How to run? Using Google Colab with T4 gpu
  • 17. How to run? Using laptop and llama.cpp. Quantization.
  • 18. Using Managed Cloud Services. ! Amazon SageMaker ! Google Cloud AI Platform & Vertex AI ! Microsoft Azure Machine Learning ! NVIDIA AI Enterprise ! Hugging Face Endpoints ! AnyScale Endpoints
  • 19. Why to run them in Kubernetes? 1. We already know him :) 2. Scalability. Resource efficiency, HPA, auto-scaling, API Limits, etc.. 3. Price. Managed service 20-40% overhead. Reserved instances. 4. GPU sharing. 5. ML ecosystem - pipelines, artifacts. (KubeFlow, Ray Framework). 6. No vendor lock. Transportable.
  • 21. Options to run LLM on K8s. 1. KServe from Kubeflow. 2. Ray Serve from Ray Framework. 3. Flux AI controller. 4. Own Kubernetes wrapper on top of Frameworks.
  • 22. We choose TGI and made it Kubernetes ready.
  • 23. We have Docker, Lets adapt it to Kubernetes
  • 25. Let’s bootstrap infrastructure from cluster.dev template
  • 26. Then add model configuration
  • 27. Apply and check the model is running
  • 28. Changing models and infrastructure
  • 30. Deploy Monitoring and Metrics with DCGM Exporter
  • 31. Thank you! Now Questions?
  翻译: