Solving the Hidden Costs of Kubernetes
Austin Parker, Principal Developer Advocate
Who Am I?
Austin Parker
Principal Developer Advocate
@austinlparker
austin@lightstep.com
1. Kubernetes
Why Kubernetes?
Kubernetes provides a convenient abstraction for
deploying, scaling, and running containerized services
in a distributed system.
However, this convenience has come at a price.
What Kubernetes Doesn’t Do
Kubernetes doesn’t…
● Make it easy to understand the changes in performance resulting
from a deployment
● Help you aggregate telemetry data from your services without
additional layers of tooling
● Solve traditional monitoring questions, like how to understand
unknown-unknowns
What you can control
What you are responsible for
Stress (n): responsibility without control
2. Hidden Costs
Monitoring Costs
Kubernetes adds complexity and cost to traditional
time-series metric and log aggregation monitoring
solutions.
As your application scales, tag count increases, which
leads to increased bills.
Building it yourself? We’ll get to that...
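The link between tag count and cost can be sketched with a rough cardinality estimate. This is an illustration, not any vendor's billing formula: it assumes each unique combination of tag values creates one time series and that every combination actually occurs. The tag names and counts are made up.

```python
from math import prod
from typing import Mapping


def series_count(tag_values: Mapping[str, int]) -> int:
    """Upper bound on time series for one metric: product of distinct values per tag."""
    return prod(tag_values.values())


# Hypothetical tag sets for a single metric, before and after scaling out.
small = {"service": 5, "pod": 10, "endpoint": 20}
large = {"service": 5, "pod": 100, "endpoint": 20, "version": 3}

print(series_count(small))  # 1000
print(series_count(large))  # 30000
```

Ten times the pods plus one new three-valued tag gives thirty times the series, which is why per-series pricing tends to grow faster than the application itself.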
Service Dependencies
Martin Fowler argued in 2015 that the “microservices
premium” is so high that you’ll want to build a
monolith first, even if the application will eventually
be migrated to a microservice architecture.
Kelsey Hightower is arguing for monoliths in 2020.
Time is a flat circle.
Bus Factor
Not only do we increase the surface area of things
that can go wrong, but those failure states are less
accessible on average without domain experts.
How resilient are you to…
● Managed service failure/deprecation
● Loss of institutional knowledge
● Unforeseen externalities
3. Observability
Lightstep root-causes performance problems
anywhere, from mobile all the way to the
bottom of our distributed stack. This is the
future of monitoring.
Observability is the process and
practice of understanding your
system.
What’s the difference?
There are too many unknowns in modern software to
just throw together some dashboards and call it a day.
Think about the things you might need or want to
know…
● Application logs and metrics
● Kubelet statistics (CPU, Memory)
○ Container, Pod, Namespace, Cluster,
Node…
● Resource Utilization
● Request tracing
How does observability help?
Observability provides a comprehensive and holistic approach to
understanding your entire system, not just individual pieces. It
accomplishes this by focusing on cultural and process changes,
empowered by a different way of thinking about data.
It’s not just a new tool or dashboard to add to your existing mix.
4. Quantifying Hidden Costs
What it really costs to monitor a system
There are two primary costs associated with traditional monitoring
systems:
● Capital costs required for collection and storage of telemetry
● Operational costs required for engineering time to investigate
telemetry
Calculating Capital Costs
Initial Factors
● Aggregating + Indexing Logs per Service:
○ Storage
○ Compute
○ Network
● Peak instance count
● Retention period
● Services involved in a request.
Initial Values
Assuming 50GB of log data a day, 14 day retention,
high availability (no cold storage)
1 Master (Large Compute Optimized Instance) @ $89
2 Data (XL Memory Optimized Instance) @ $426
3 SSDs (General Purpose) @ $201
$716 cloud spend per month @ 50GB of logs per day
~$3,386 total per month after setup and maintenance
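The monthly figures can be checked with a little arithmetic. The sketch below treats the three prices as monthly line totals, since they sum to $716. How the deck gets from $716 to ~$3,386 is not stated; the 30 hours of engineering time at $89/hour is an inference that happens to match the gap, not a figure from the slides.

```python
# Monthly line totals from the slide, in USD.
line_items = {
    "master (1x large compute optimized)": 89,
    "data (2x XL memory optimized)": 426,
    "ssd (3x general purpose)": 201,
}
cloud_spend = sum(line_items.values())
print(cloud_spend)  # 716

# Assumption: the gap to the ~$3,386 total is engineering time at $89/hour.
HOURLY_RATE = 89
monthly_total = 3386
implied_hours = (monthly_total - cloud_spend) / HOURLY_RATE
print(implied_hours)  # 30.0
```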
Calculating Operational Costs
Initial factors
● How much does an engineer cost?
● How much time do they work in a year?
● How many disruptions are investigated per
year?
● How long does an investigation take?
● How many developers are involved in
investigating disruptions?
● How many deployments are being made, per
developer?
● How many minutes are spent validating those
deployments?
Initial values
We calculate an average annual cost of $202,500 for
an engineer.
Each engineer works 2,268 hours a year, for an hourly
cost of $89.
We calculate 10 disruptions per year, with 4 hours
average per disruption in investigation time.
We calculate 50 deployments per developer per year
(normalized), with 30 minutes of validation.
$3,571 cost per disruption
$133,929 annual cost of validating deployments
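A rough reconstruction of these two figures, using the values on the slide. The head counts are assumptions: the deck does not say how many engineers join an investigation or how many developers deploy, and 10 and 60 are simply the values that reproduce the totals.

```python
ANNUAL_COST = 202_500    # per engineer, from the slide
HOURS_PER_YEAR = 2_268   # from the slide
hourly = ANNUAL_COST / HOURS_PER_YEAR  # about 89.29

# Assumptions (not stated on the slides): head counts chosen to match the totals.
ENGINEERS_PER_DISRUPTION = 10
DEVELOPERS = 60

# 4 hours of investigation per engineer per disruption.
cost_per_disruption = ENGINEERS_PER_DISRUPTION * 4 * hourly
# 50 deployments per developer per year, 30 minutes of validation each.
annual_validation = DEVELOPERS * 50 * 0.5 * hourly

print(round(cost_per_disruption))  # 3571
print(round(annual_validation))    # 133929
```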
Efficiency is key!
Effective use of observability can reduce time spent
on validating deployments and investigating
disruptions by a factor of 5 or more!
What does that look like?
Before: $3,571 per disruption, $133,929 per year validating deployments
After: $712 per disruption, $26,700 per year validating deployments
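The reduced figures are consistent with dividing the hours by 5 and pricing them at the rounded $89 hourly rate. The hour totals below are inferred from the earlier costs, not stated in the deck, so treat this as a sanity check rather than the author's worksheet.

```python
HOURLY = 89  # rounded hourly cost of an engineer, from the deck
FACTOR = 5   # "a factor of 5 or more"

# Assumption: 40 engineer-hours per disruption and 1,500 validation hours per
# year, inferred from the $3,571 and $133,929 totals.
disruption_hours = 40
validation_hours = 1_500

after_disruption = disruption_hours / FACTOR * HOURLY
after_validation = validation_hours / FACTOR * HOURLY
print(after_disruption, after_validation)  # 712.0 26700.0
```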
5. Holistic Observability
Tackling Hidden Costs
Holistic approaches to observability utilize context as a mechanism to
control costs, both capital and operational.
This context is generated in two ways.
Communication Context
Observability gives us a shared language to discuss
performance through Service Level Objectives and
Service Level Indicators.
Observability helps avoid alarm fatigue by focusing
on the data that’s important during a disruption.
This shared context helps your team save time and
set goals more efficiently.
Data Context
Observability is built by combining multiple data
sources (logs, metrics, traces) into insights rather than
through separate dashboards and workflows.
Request traces provide context that reduces the
search space for relevant metrics and logs.
These efficiencies not only help save time, but they
reduce the storage and processing cost of sifting
through telemetry data.
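One way to picture how a trace narrows the search space: if log lines carry the ID of the request that produced them, an investigation can start from a single slow trace instead of every log line. The record shape, field names, and sample data below are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class LogLine:
    trace_id: str
    service: str
    message: str


# Hypothetical log lines from several services.
logs = [
    LogLine("a1", "checkout", "payment authorized"),
    LogLine("b2", "search", "query ok"),
    LogLine("a1", "inventory", "timeout calling warehouse"),
    LogLine("c3", "checkout", "payment authorized"),
]


def logs_for_trace(lines, trace_id):
    """Keep only the log lines emitted while handling one traced request."""
    return [line for line in lines if line.trace_id == trace_id]


slow_request = logs_for_trace(logs, "a1")
print(len(logs), "->", len(slow_request))  # 4 -> 2
```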
6. Implementing Observability
Keys to Implementing Observability
Make sure your services are observable.
Focus on key SLIs, and make sure SLOs focus on business goals.
Put people at the center of your strategy.
Observable Services
Use open standards like OpenTelemetry for
instrumenting your service code. Think about service
meshes, like Istio, as a way to quickly bootstrap
request tracing.
OpenTelemetry provides a single set of APIs, SDKs,
and tools for generating distributed traces and
metrics from your services.
Create Effective SLOs
Standardize around the ‘golden signals’: latency,
throughput, errors.
Prefer SLOs that are aligned with business outcomes:
less thinking about “five nines” and more about “when
people try to play a song, it plays”.
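A business-aligned SLO like "when people try to play a song, it plays" can be expressed as a ratio of good events to total events, with an error budget derived from the target. This is a minimal sketch; the function names, the 99.9% target, and the event counts are hypothetical.

```python
def sli(good_events: int, total_events: int) -> float:
    """Fraction of events that met the user-facing goal."""
    return good_events / total_events


def error_budget_remaining(good_events: int, total_events: int, slo_target: float) -> float:
    """Share of allowed failures still unspent (1.0 = untouched, below 0 = exceeded)."""
    allowed_failures = (1 - slo_target) * total_events
    actual_failures = total_events - good_events
    return 1 - actual_failures / allowed_failures


# Hypothetical month: 1,000,000 play attempts, 999,200 succeeded, SLO of 99.9%.
print(sli(999_200, 1_000_000))  # 0.9992
print(round(error_budget_remaining(999_200, 1_000_000, 0.999), 3))  # 0.2
```

Here 800 of the 1,000 allowed failures are spent, so 20% of the budget is left. That gives the team a shared number to discuss instead of a count of nines.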
Focus on People
Use observability insights and data as a component of your
retrospectives and sprint planning.
Kubernetes increases complexity; observability can make it more
accessible to people who aren’t “experts”.
Tools (like Kubernetes!) are supposed to make hard things easier --
observability lets you see the impact of this, and not get surprised or
burned out by changes.
7. Q&A
Get Started Today
go.lightstep.com/trial
Austin Parker, Principal Developer Advocate
Thank you.
Ad

More Related Content

What's hot (20)

Juraci Paixão Kröhling - All you need to know about OpenTelemetry
Juraci Paixão Kröhling - All you need to know about OpenTelemetryJuraci Paixão Kröhling - All you need to know about OpenTelemetry
Juraci Paixão Kröhling - All you need to know about OpenTelemetry
Juliano Costa
 
WJAX 2019 - Taking Distributed Tracing to the next level
WJAX 2019 - Taking Distributed Tracing to the next levelWJAX 2019 - Taking Distributed Tracing to the next level
WJAX 2019 - Taking Distributed Tracing to the next level
Frank Pfleger
 
OpenTelemetry For Operators
OpenTelemetry For OperatorsOpenTelemetry For Operators
OpenTelemetry For Operators
Kevin Brockhoff
 
Software cracking and patching
Software cracking and patchingSoftware cracking and patching
Software cracking and patching
Mayank Gavri
 
Introduction to Distributed Tracing
Introduction to Distributed TracingIntroduction to Distributed Tracing
Introduction to Distributed Tracing
petabridge
 
Observability and more architecture next 2020
Observability and more   architecture next 2020Observability and more   architecture next 2020
Observability and more architecture next 2020
Alon Fliess
 
Distributed tracing 101
Distributed tracing 101Distributed tracing 101
Distributed tracing 101
Itiel Shwartz
 
THE STATE OF OPENTELEMETRY, DOTAN HOROVITS, Logz.io
THE STATE OF OPENTELEMETRY, DOTAN HOROVITS, Logz.ioTHE STATE OF OPENTELEMETRY, DOTAN HOROVITS, Logz.io
THE STATE OF OPENTELEMETRY, DOTAN HOROVITS, Logz.io
DevOpsDays Tel Aviv
 
[WSO2Con EU 2018] Tooling for Observability
[WSO2Con EU 2018] Tooling for Observability[WSO2Con EU 2018] Tooling for Observability
[WSO2Con EU 2018] Tooling for Observability
WSO2
 
Tracing Micro Services with OpenTracing
Tracing Micro Services with OpenTracingTracing Micro Services with OpenTracing
Tracing Micro Services with OpenTracing
Hemant Kumar
 
Distributed tracing with OpenTracing and Jaeger @ getstream.io
Distributed tracing with OpenTracing and Jaeger @ getstream.ioDistributed tracing with OpenTracing and Jaeger @ getstream.io
Distributed tracing with OpenTracing and Jaeger @ getstream.io
Max Klyga
 
Introduction to Open Telemetry as Observability Library
Introduction to Open  Telemetry as Observability LibraryIntroduction to Open  Telemetry as Observability Library
Introduction to Open Telemetry as Observability Library
Tonny Adhi Sabastian
 
Opentracing jaeger
Opentracing jaegerOpentracing jaeger
Opentracing jaeger
Oracle Korea
 
Distributed Tracing at UBER Scale: Creating a treasure map for your monitori...
Distributed Tracing at UBER Scale: Creating a treasure map for your monitori...Distributed Tracing at UBER Scale: Creating a treasure map for your monitori...
Distributed Tracing at UBER Scale: Creating a treasure map for your monitori...
Yuri Shkuro
 
Distributed tracing - get a grasp on your production
Distributed tracing - get a grasp on your productionDistributed tracing - get a grasp on your production
Distributed tracing - get a grasp on your production
nklmish
 
Introduction to Opendaylight
Introduction to OpendaylightIntroduction to Opendaylight
Introduction to Opendaylight
Beny Raja
 
How to Streamline Incident Response with InfluxDB, PagerDuty and Rundeck
How to Streamline Incident Response with InfluxDB, PagerDuty and RundeckHow to Streamline Incident Response with InfluxDB, PagerDuty and Rundeck
How to Streamline Incident Response with InfluxDB, PagerDuty and Rundeck
InfluxData
 
Openstack workshop @ Kalasalingam
Openstack workshop @ KalasalingamOpenstack workshop @ Kalasalingam
Openstack workshop @ Kalasalingam
Beny Raja
 
Jaeger and OpenTracing Cloud Native Computing (CNCF) meetup Zurich
Jaeger and OpenTracing Cloud Native Computing (CNCF) meetup ZurichJaeger and OpenTracing Cloud Native Computing (CNCF) meetup Zurich
Jaeger and OpenTracing Cloud Native Computing (CNCF) meetup Zurich
⛑ Pavol Loffay
 
2017 Microservices Practitioner Virtual Summit: Ancestry's Journey towards Mi...
2017 Microservices Practitioner Virtual Summit: Ancestry's Journey towards Mi...2017 Microservices Practitioner Virtual Summit: Ancestry's Journey towards Mi...
2017 Microservices Practitioner Virtual Summit: Ancestry's Journey towards Mi...
Ambassador Labs
 
Juraci Paixão Kröhling - All you need to know about OpenTelemetry
Juraci Paixão Kröhling - All you need to know about OpenTelemetryJuraci Paixão Kröhling - All you need to know about OpenTelemetry
Juraci Paixão Kröhling - All you need to know about OpenTelemetry
Juliano Costa
 
WJAX 2019 - Taking Distributed Tracing to the next level
WJAX 2019 - Taking Distributed Tracing to the next levelWJAX 2019 - Taking Distributed Tracing to the next level
WJAX 2019 - Taking Distributed Tracing to the next level
Frank Pfleger
 
OpenTelemetry For Operators
OpenTelemetry For OperatorsOpenTelemetry For Operators
OpenTelemetry For Operators
Kevin Brockhoff
 
Software cracking and patching
Software cracking and patchingSoftware cracking and patching
Software cracking and patching
Mayank Gavri
 
Introduction to Distributed Tracing
Introduction to Distributed TracingIntroduction to Distributed Tracing
Introduction to Distributed Tracing
petabridge
 
Observability and more architecture next 2020
Observability and more   architecture next 2020Observability and more   architecture next 2020
Observability and more architecture next 2020
Alon Fliess
 
Distributed tracing 101
Distributed tracing 101Distributed tracing 101
Distributed tracing 101
Itiel Shwartz
 
THE STATE OF OPENTELEMETRY, DOTAN HOROVITS, Logz.io
THE STATE OF OPENTELEMETRY, DOTAN HOROVITS, Logz.ioTHE STATE OF OPENTELEMETRY, DOTAN HOROVITS, Logz.io
THE STATE OF OPENTELEMETRY, DOTAN HOROVITS, Logz.io
DevOpsDays Tel Aviv
 
[WSO2Con EU 2018] Tooling for Observability
[WSO2Con EU 2018] Tooling for Observability[WSO2Con EU 2018] Tooling for Observability
[WSO2Con EU 2018] Tooling for Observability
WSO2
 
Tracing Micro Services with OpenTracing
Tracing Micro Services with OpenTracingTracing Micro Services with OpenTracing
Tracing Micro Services with OpenTracing
Hemant Kumar
 
Distributed tracing with OpenTracing and Jaeger @ getstream.io
Distributed tracing with OpenTracing and Jaeger @ getstream.ioDistributed tracing with OpenTracing and Jaeger @ getstream.io
Distributed tracing with OpenTracing and Jaeger @ getstream.io
Max Klyga
 
Introduction to Open Telemetry as Observability Library
Introduction to Open  Telemetry as Observability LibraryIntroduction to Open  Telemetry as Observability Library
Introduction to Open Telemetry as Observability Library
Tonny Adhi Sabastian
 
Opentracing jaeger
Opentracing jaegerOpentracing jaeger
Opentracing jaeger
Oracle Korea
 
Distributed Tracing at UBER Scale: Creating a treasure map for your monitori...
Distributed Tracing at UBER Scale: Creating a treasure map for your monitori...Distributed Tracing at UBER Scale: Creating a treasure map for your monitori...
Distributed Tracing at UBER Scale: Creating a treasure map for your monitori...
Yuri Shkuro
 
Distributed tracing - get a grasp on your production
Distributed tracing - get a grasp on your productionDistributed tracing - get a grasp on your production
Distributed tracing - get a grasp on your production
nklmish
 
Introduction to Opendaylight
Introduction to OpendaylightIntroduction to Opendaylight
Introduction to Opendaylight
Beny Raja
 
How to Streamline Incident Response with InfluxDB, PagerDuty and Rundeck
How to Streamline Incident Response with InfluxDB, PagerDuty and RundeckHow to Streamline Incident Response with InfluxDB, PagerDuty and Rundeck
How to Streamline Incident Response with InfluxDB, PagerDuty and Rundeck
InfluxData
 
Openstack workshop @ Kalasalingam
Openstack workshop @ KalasalingamOpenstack workshop @ Kalasalingam
Openstack workshop @ Kalasalingam
Beny Raja
 
Jaeger and OpenTracing Cloud Native Computing (CNCF) meetup Zurich
Jaeger and OpenTracing Cloud Native Computing (CNCF) meetup ZurichJaeger and OpenTracing Cloud Native Computing (CNCF) meetup Zurich
Jaeger and OpenTracing Cloud Native Computing (CNCF) meetup Zurich
⛑ Pavol Loffay
 
2017 Microservices Practitioner Virtual Summit: Ancestry's Journey towards Mi...
2017 Microservices Practitioner Virtual Summit: Ancestry's Journey towards Mi...2017 Microservices Practitioner Virtual Summit: Ancestry's Journey towards Mi...
2017 Microservices Practitioner Virtual Summit: Ancestry's Journey towards Mi...
Ambassador Labs
 

Similar to Solving the Hidden Costs of Kubernetes with Observability (20)

Monitoring Containerized Micro-Services In Azure
Monitoring Containerized Micro-Services In AzureMonitoring Containerized Micro-Services In Azure
Monitoring Containerized Micro-Services In Azure
Alex Bulankou
 
5 things we learned not to ignore while scaling kubernetes webinar dev ops.co...
5 things we learned not to ignore while scaling kubernetes webinar dev ops.co...5 things we learned not to ignore while scaling kubernetes webinar dev ops.co...
5 things we learned not to ignore while scaling kubernetes webinar dev ops.co...
Komodor
 
SplunkLive! Zurich 2018: Integrating Metrics and Logs
SplunkLive! Zurich 2018: Integrating Metrics and LogsSplunkLive! Zurich 2018: Integrating Metrics and Logs
SplunkLive! Zurich 2018: Integrating Metrics and Logs
Splunk
 
IBM Monitoring and Event Management Solutions
IBM Monitoring and Event Management SolutionsIBM Monitoring and Event Management Solutions
IBM Monitoring and Event Management Solutions
IBM Danmark
 
SRE & Kubernetes
SRE & KubernetesSRE & Kubernetes
SRE & Kubernetes
Afkham Azeez
 
Agile Gurugram 2023 | Observability for Modern Applications. How does it help...
Agile Gurugram 2023 | Observability for Modern Applications. How does it help...Agile Gurugram 2023 | Observability for Modern Applications. How does it help...
Agile Gurugram 2023 | Observability for Modern Applications. How does it help...
AgileNetwork
 
SplunkLive! Frankfurt 2018 - Integrating Metrics & Logs
SplunkLive! Frankfurt 2018 - Integrating Metrics & LogsSplunkLive! Frankfurt 2018 - Integrating Metrics & Logs
SplunkLive! Frankfurt 2018 - Integrating Metrics & Logs
Splunk
 
Observability - Stockholm Splunk UG Jan 19 2023.pptx
Observability - Stockholm Splunk UG Jan 19 2023.pptxObservability - Stockholm Splunk UG Jan 19 2023.pptx
Observability - Stockholm Splunk UG Jan 19 2023.pptx
Magnus Johansson
 
what-full-stack-observability-requires-today.pptx
what-full-stack-observability-requires-today.pptxwhat-full-stack-observability-requires-today.pptx
what-full-stack-observability-requires-today.pptx
Ed Hossam
 
Keynote : évolution et vision d'Elastic Observability
Keynote : évolution et vision d'Elastic ObservabilityKeynote : évolution et vision d'Elastic Observability
Keynote : évolution et vision d'Elastic Observability
Elasticsearch
 
A Survey on Heuristic Based Techniques in Cloud Computing
A Survey on Heuristic Based Techniques in Cloud ComputingA Survey on Heuristic Based Techniques in Cloud Computing
A Survey on Heuristic Based Techniques in Cloud Computing
IRJET Journal
 
SplunkLive! Munich 2018: Integrating Metrics and Logs
SplunkLive! Munich 2018: Integrating Metrics and LogsSplunkLive! Munich 2018: Integrating Metrics and Logs
SplunkLive! Munich 2018: Integrating Metrics and Logs
Splunk
 
Microservices at Scale: How to Reduce Overhead and Increase Developer Product...
Microservices at Scale: How to Reduce Overhead and Increase Developer Product...Microservices at Scale: How to Reduce Overhead and Increase Developer Product...
Microservices at Scale: How to Reduce Overhead and Increase Developer Product...
DevOps.com
 
Replace that cracked junk! Gotta do them.
Replace that cracked junk! Gotta do them.Replace that cracked junk! Gotta do them.
Replace that cracked junk! Gotta do them.
tobiasaldini37
 
Data Engineer's Lunch #68: DevOps Fundamentals
Data Engineer's Lunch #68: DevOps FundamentalsData Engineer's Lunch #68: DevOps Fundamentals
Data Engineer's Lunch #68: DevOps Fundamentals
Anant Corporation
 
Ab04405158161
Ab04405158161Ab04405158161
Ab04405158161
IJERA Editor
 
Print report
Print reportPrint report
Print report
Ved Prakash
 
The Frugal Architecture in Practice.pptx
The Frugal Architecture in Practice.pptxThe Frugal Architecture in Practice.pptx
The Frugal Architecture in Practice.pptx
Fwdays
 
Achieve New Heights with Modern Analytics
Achieve New Heights with Modern AnalyticsAchieve New Heights with Modern Analytics
Achieve New Heights with Modern Analytics
Sense Corp
 
White Paper Reduce Infrastructure Cost With Microsoft System Center
White Paper  Reduce Infrastructure Cost With Microsoft System CenterWhite Paper  Reduce Infrastructure Cost With Microsoft System Center
White Paper Reduce Infrastructure Cost With Microsoft System Center
rajeshchoudhary23281
 
Monitoring Containerized Micro-Services In Azure
Monitoring Containerized Micro-Services In AzureMonitoring Containerized Micro-Services In Azure
Monitoring Containerized Micro-Services In Azure
Alex Bulankou
 
5 things we learned not to ignore while scaling kubernetes webinar dev ops.co...
5 things we learned not to ignore while scaling kubernetes webinar dev ops.co...5 things we learned not to ignore while scaling kubernetes webinar dev ops.co...
5 things we learned not to ignore while scaling kubernetes webinar dev ops.co...
Komodor
 
SplunkLive! Zurich 2018: Integrating Metrics and Logs
SplunkLive! Zurich 2018: Integrating Metrics and LogsSplunkLive! Zurich 2018: Integrating Metrics and Logs
SplunkLive! Zurich 2018: Integrating Metrics and Logs
Splunk
 
IBM Monitoring and Event Management Solutions
IBM Monitoring and Event Management SolutionsIBM Monitoring and Event Management Solutions
IBM Monitoring and Event Management Solutions
IBM Danmark
 
Agile Gurugram 2023 | Observability for Modern Applications. How does it help...
Agile Gurugram 2023 | Observability for Modern Applications. How does it help...Agile Gurugram 2023 | Observability for Modern Applications. How does it help...
Agile Gurugram 2023 | Observability for Modern Applications. How does it help...
AgileNetwork
 
SplunkLive! Frankfurt 2018 - Integrating Metrics & Logs
SplunkLive! Frankfurt 2018 - Integrating Metrics & LogsSplunkLive! Frankfurt 2018 - Integrating Metrics & Logs
SplunkLive! Frankfurt 2018 - Integrating Metrics & Logs
Splunk
 
Observability - Stockholm Splunk UG Jan 19 2023.pptx
Observability - Stockholm Splunk UG Jan 19 2023.pptxObservability - Stockholm Splunk UG Jan 19 2023.pptx
Observability - Stockholm Splunk UG Jan 19 2023.pptx
Magnus Johansson
 
what-full-stack-observability-requires-today.pptx
what-full-stack-observability-requires-today.pptxwhat-full-stack-observability-requires-today.pptx
what-full-stack-observability-requires-today.pptx
Ed Hossam
 
Keynote : évolution et vision d'Elastic Observability
Keynote : évolution et vision d'Elastic ObservabilityKeynote : évolution et vision d'Elastic Observability
Keynote : évolution et vision d'Elastic Observability
Elasticsearch
 
A Survey on Heuristic Based Techniques in Cloud Computing
A Survey on Heuristic Based Techniques in Cloud ComputingA Survey on Heuristic Based Techniques in Cloud Computing
A Survey on Heuristic Based Techniques in Cloud Computing
IRJET Journal
 
SplunkLive! Munich 2018: Integrating Metrics and Logs
SplunkLive! Munich 2018: Integrating Metrics and LogsSplunkLive! Munich 2018: Integrating Metrics and Logs
SplunkLive! Munich 2018: Integrating Metrics and Logs
Splunk
 
Microservices at Scale: How to Reduce Overhead and Increase Developer Product...
Microservices at Scale: How to Reduce Overhead and Increase Developer Product...Microservices at Scale: How to Reduce Overhead and Increase Developer Product...
Microservices at Scale: How to Reduce Overhead and Increase Developer Product...
DevOps.com
 
Replace that cracked junk! Gotta do them.
Replace that cracked junk! Gotta do them.Replace that cracked junk! Gotta do them.
Replace that cracked junk! Gotta do them.
tobiasaldini37
 
Data Engineer's Lunch #68: DevOps Fundamentals
Data Engineer's Lunch #68: DevOps FundamentalsData Engineer's Lunch #68: DevOps Fundamentals
Data Engineer's Lunch #68: DevOps Fundamentals
Anant Corporation
 
The Frugal Architecture in Practice.pptx
The Frugal Architecture in Practice.pptxThe Frugal Architecture in Practice.pptx
The Frugal Architecture in Practice.pptx
Fwdays
 
Achieve New Heights with Modern Analytics
Achieve New Heights with Modern AnalyticsAchieve New Heights with Modern Analytics
Achieve New Heights with Modern Analytics
Sense Corp
 
White Paper Reduce Infrastructure Cost With Microsoft System Center
White Paper  Reduce Infrastructure Cost With Microsoft System CenterWhite Paper  Reduce Infrastructure Cost With Microsoft System Center
White Paper Reduce Infrastructure Cost With Microsoft System Center
rajeshchoudhary23281
 
Ad

More from DevOps.com (20)

Modernizing on IBM Z Made Easier With Open Source Software
Modernizing on IBM Z Made Easier With Open Source SoftwareModernizing on IBM Z Made Easier With Open Source Software
Modernizing on IBM Z Made Easier With Open Source Software
DevOps.com
 
Comparing Microsoft SQL Server 2019 Performance Across Various Kubernetes Pla...
Comparing Microsoft SQL Server 2019 Performance Across Various Kubernetes Pla...Comparing Microsoft SQL Server 2019 Performance Across Various Kubernetes Pla...
Comparing Microsoft SQL Server 2019 Performance Across Various Kubernetes Pla...
DevOps.com
 
Comparing Microsoft SQL Server 2019 Performance Across Various Kubernetes Pla...
Comparing Microsoft SQL Server 2019 Performance Across Various Kubernetes Pla...Comparing Microsoft SQL Server 2019 Performance Across Various Kubernetes Pla...
Comparing Microsoft SQL Server 2019 Performance Across Various Kubernetes Pla...
DevOps.com
 
Next Generation Vulnerability Assessment Using Datadog and Snyk
Next Generation Vulnerability Assessment Using Datadog and SnykNext Generation Vulnerability Assessment Using Datadog and Snyk
Next Generation Vulnerability Assessment Using Datadog and Snyk
DevOps.com
 
Vulnerability Discovery in the Cloud
Vulnerability Discovery in the CloudVulnerability Discovery in the Cloud
Vulnerability Discovery in the Cloud
DevOps.com
 
2021 Open Source Governance: Top Ten Trends and Predictions
2021 Open Source Governance: Top Ten Trends and Predictions2021 Open Source Governance: Top Ten Trends and Predictions
2021 Open Source Governance: Top Ten Trends and Predictions
DevOps.com
 
A New Year’s Ransomware Resolution
A New Year’s Ransomware ResolutionA New Year’s Ransomware Resolution
A New Year’s Ransomware Resolution
DevOps.com
 
Getting Started with Runtime Security on Azure Kubernetes Service (AKS)
Getting Started with Runtime Security on Azure Kubernetes Service (AKS)Getting Started with Runtime Security on Azure Kubernetes Service (AKS)
Getting Started with Runtime Security on Azure Kubernetes Service (AKS)
DevOps.com
 
Don't Panic! Effective Incident Response
Don't Panic! Effective Incident ResponseDon't Panic! Effective Incident Response
Don't Panic! Effective Incident Response
DevOps.com
 
Creating a Culture of Chaos: Chaos Engineering Is Not Just Tools, It's Culture
Creating a Culture of Chaos: Chaos Engineering Is Not Just Tools, It's CultureCreating a Culture of Chaos: Chaos Engineering Is Not Just Tools, It's Culture
Creating a Culture of Chaos: Chaos Engineering Is Not Just Tools, It's Culture
DevOps.com
 
Role Based Access Controls (RBAC) for SSH and Kubernetes Access with Teleport
Role Based Access Controls (RBAC) for SSH and Kubernetes Access with TeleportRole Based Access Controls (RBAC) for SSH and Kubernetes Access with Teleport
Role Based Access Controls (RBAC) for SSH and Kubernetes Access with Teleport
DevOps.com
 
Monitoring Serverless Applications with Datadog
Monitoring Serverless Applications with DatadogMonitoring Serverless Applications with Datadog
Monitoring Serverless Applications with Datadog
DevOps.com
 
Deliver your App Anywhere … Publicly or Privately
Deliver your App Anywhere … Publicly or PrivatelyDeliver your App Anywhere … Publicly or Privately
Deliver your App Anywhere … Publicly or Privately
DevOps.com
 
Securing medical apps in the age of covid final
Securing medical apps in the age of covid finalSecuring medical apps in the age of covid final
Securing medical apps in the age of covid final
DevOps.com
 
How to Build a Healthy On-Call Culture
How to Build a Healthy On-Call CultureHow to Build a Healthy On-Call Culture
How to Build a Healthy On-Call Culture
DevOps.com
 
The Evolving Role of the Developer in 2021
The Evolving Role of the Developer in 2021The Evolving Role of the Developer in 2021
The Evolving Role of the Developer in 2021
DevOps.com
 
Service Mesh: Two Big Words But Do You Need It?
Service Mesh: Two Big Words But Do You Need It?Service Mesh: Two Big Words But Do You Need It?
Service Mesh: Two Big Words But Do You Need It?
DevOps.com
 
Secure Data Sharing in OpenShift Environments
Secure Data Sharing in OpenShift EnvironmentsSecure Data Sharing in OpenShift Environments
Secure Data Sharing in OpenShift Environments
DevOps.com
 
How to Govern Identities and Access in Cloud Infrastructure: AppsFlyer Case S...
How to Govern Identities and Access in Cloud Infrastructure: AppsFlyer Case S...How to Govern Identities and Access in Cloud Infrastructure: AppsFlyer Case S...
How to Govern Identities and Access in Cloud Infrastructure: AppsFlyer Case S...
DevOps.com
 
Elevate Your Enterprise Python and R AI, ML Software Strategy with Anaconda T...
Elevate Your Enterprise Python and R AI, ML Software Strategy with Anaconda T...Elevate Your Enterprise Python and R AI, ML Software Strategy with Anaconda T...
Elevate Your Enterprise Python and R AI, ML Software Strategy with Anaconda T...
DevOps.com
 
Modernizing on IBM Z Made Easier With Open Source Software
Modernizing on IBM Z Made Easier With Open Source SoftwareModernizing on IBM Z Made Easier With Open Source Software
Modernizing on IBM Z Made Easier With Open Source Software
DevOps.com
 
Comparing Microsoft SQL Server 2019 Performance Across Various Kubernetes Pla...
Comparing Microsoft SQL Server 2019 Performance Across Various Kubernetes Pla...Comparing Microsoft SQL Server 2019 Performance Across Various Kubernetes Pla...
Comparing Microsoft SQL Server 2019 Performance Across Various Kubernetes Pla...
DevOps.com
 
Comparing Microsoft SQL Server 2019 Performance Across Various Kubernetes Pla...
Comparing Microsoft SQL Server 2019 Performance Across Various Kubernetes Pla...Comparing Microsoft SQL Server 2019 Performance Across Various Kubernetes Pla...
Comparing Microsoft SQL Server 2019 Performance Across Various Kubernetes Pla...
DevOps.com
 
Next Generation Vulnerability Assessment Using Datadog and Snyk
Next Generation Vulnerability Assessment Using Datadog and SnykNext Generation Vulnerability Assessment Using Datadog and Snyk
Next Generation Vulnerability Assessment Using Datadog and Snyk
DevOps.com
 
Vulnerability Discovery in the Cloud
Solving the Hidden Costs of Kubernetes with Observability

  • 1. Solving the Hidden Costs of Kubernetes. Austin Parker, Principal Developer Advocate
  • 2. Who Am I? Austin Parker, Principal Developer Advocate. @austinlparker, austin@lightstep.com
  • 4. Why Kubernetes? Kubernetes provides a convenient abstraction for deploying, scaling, and running containerized services in a distributed system. However, this convenience has come at a price.
  • 5. What Kubernetes Doesn’t Do
    Kubernetes doesn’t…
    ● Make it easy to understand the changes in performance resulting from a deployment
    ● Help you aggregate telemetry data from your services without additional layers of tooling
    ● Solve traditional monitoring questions, like how to understand unknown-unknowns
  • 6. Stress (n): responsibility without control. (Diagram labels: “What you can control”, “What you are responsible for”.)
  • 8. Monitoring Costs Kubernetes adds complexity and cost to traditional time-series metric and log aggregation monitoring solutions. As your application scales, tag count increases, which leads to increased bills. Building it yourself? We’ll get to that...
  • 9. Service Dependencies Martin Fowler (in 2015) argues that the “microservices premium” incurs such a high price that you’ll want to build a monolith first, even if the application will eventually be migrated to a microservice architecture. Kelsey Hightower is arguing for monoliths in 2020. Time is a flat circle.
  • 10. Bus Factor Not only do we increase the surface area of things that can go wrong, but those failure states are less accessible on average without domain experts.
    How resilient are you to…
    ● Managed service failure/deprecation
    ● Loss of institutional knowledge
    ● Unforeseen externalities
  • 12. Lightstep root-causes performance problems anywhere, from mobile all the way to the bottom of our distributed stack. This is the future of monitoring. Observability is the process and practice of understanding your system.
  • 13. What’s the difference? There are too many unknowns in modern software to just throw together some dashboards and call it a day. Think about the things you might need or want to know…
    ● Application logs and metrics
    ● Kubelet statistics (CPU, Memory)
      ○ Container, Pod, Namespace, Cluster, Node…
    ● Resource Utilization
    ● Request tracing
  • 14. How does observability help? Observability provides a comprehensive and holistic approach to understanding your entire system, not just individual pieces. It accomplishes this by focusing on cultural and process changes, empowered by a different way of thinking about data. It’s not just a new tool or dashboard to add to your existing mix.
  • 16. What it really costs to monitor a system There are two primary costs associated with traditional monitoring systems:
    ● Capital costs required for collection and storage of telemetry
    ● Operational costs required for engineering time to investigate telemetry
  • 17. Calculating Capital Costs
    Initial Factors
    ● Aggregating + Indexing Logs per Service:
      ○ Storage
      ○ Compute
      ○ Network
    ● Peak instance count
    ● Retention period
    ● Services involved in a request
    Initial Values
    Assuming 50GB of log data a day, 14 day retention, high availability (no cold storage):
    ● 1 Master (Large Compute Optimized Instance) @ $89
    ● 2 Data (XL Memory Optimized Instance) @ $426
    ● 3 SSDs (General Purpose) @ $201
  • 18. $716: cloud spend per month @ 50GB of logs per day
  • 19. ~$3,386: total after setup and maintenance (monthly)
  • 20. Calculating Operational Costs
    Initial factors
    ● How much does an engineer cost?
    ● How much time do they work in a year?
    ● How many disruptions are investigated per year?
    ● How long does an investigation take?
    ● How many developers are involved in investigating disruptions?
    ● How many deployments are being made, per developer?
    ● How many minutes are spent validating those deployments?
    Initial values
    We calculate an average annual cost of $202,500 for an engineer. Each engineer works 2,268 hours a year, for an hourly cost of $89. We calculate 10 disruptions per year, with 4 hours average per disruption in investigation time. We calculate 50 deployments per developer per year (normalized), with 30 minutes of validation.
  • 22. $133,929: annual cost of validating deployments
  • 23. Efficiency is key! Effective use of observability can reduce time spent on validating deployments and investigating disruptions by a factor of 5 or more! What does that look like? $3,571 and $133,929
  • 25. Efficiency is key! Effective use of observability can reduce time spent on validating deployments and investigating disruptions by a factor of 5 or more! What does that look like? $712 and $26,700
  • 27. Tackling Hidden Costs Holistic approaches to observability utilize context as a mechanism to control costs, both capital and operational. This context is generated in two ways.
  • 28. Communication Context Observability gives us a shared language to discuss performance through Service Level Objectives and Service Level Indicators. Observability helps avoid alarm fatigue by focusing on the data that’s important during a disruption. This shared context helps your team save time and set goals more efficiently.
  • 29. Data Context Observability is built by combining multiple data sources (logs, metrics, traces) into insights rather than through separate dashboards and workflows. Request traces provide context that reduces the search space for relevant metrics and logs. These efficiencies not only help save time, but they reduce the storage and processing cost of sifting through telemetry data.
  • 31. Keys to Implementing Observability Make sure your services are observable. Focus on key SLIs, and make sure SLOs focus on business goals. Put people at the center of your strategy.
  • 32. Observable Services Use open standards like OpenTelemetry for instrumenting your service code. Think about service meshes, like Istio, as a way to quickly bootstrap request tracing. OpenTelemetry provides a single set of APIs, SDKs, and tools for generating distributed traces and metrics from your services.
  • 33. Create Effective SLOs Standardize around the ‘golden signals’: latency, throughput, errors. Prefer SLOs that are aligned with business outcomes; less thinking about “five nines” and more about “when people try to play a song, it plays”.
  • 34. Focus on People Use observability insights and data as a component of your retrospectives and sprint planning. Kubernetes increases complexity; observability can make it more accessible to people who aren’t “experts”. Tools (like Kubernetes!) are supposed to make hard things easier -- observability lets you see the impact of this, and not get surprised or burned out by changes.
  • 37. Thank you. Austin Parker, Principal Developer Advocate
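
The advice on slide 33 (prefer SLOs tied to a business outcome such as "when people try to play a song, it plays") can be sketched in a few lines. This is an illustration, not something shown in the talk: the `Slo` class, the helper functions, the 99% target, and the request counts are all hypothetical. The SLI here is simply the fraction of play attempts that succeeded, and the error budget is the share of allowed failures not yet used.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Slo:
    """A hypothetical service level objective: a name and a target success ratio."""
    name: str
    target: float  # e.g. 0.99 means 99% of attempts should succeed


def sli(good_events: int, total_events: int) -> float:
    """Service level indicator: fraction of events that were good."""
    if total_events == 0:
        return 1.0  # no traffic, so nothing has failed
    return good_events / total_events


def error_budget_remaining(slo: Slo, good_events: int, total_events: int) -> float:
    """Fraction of the error budget still unspent (negative once the SLO is blown)."""
    allowed_failures = (1.0 - slo.target) * total_events
    actual_failures = total_events - good_events
    if allowed_failures == 0:
        return 1.0 if actual_failures == 0 else float("-inf")
    return 1.0 - actual_failures / allowed_failures


if __name__ == "__main__":
    # Invented numbers: 10,000 play attempts, 9,950 of which started playback.
    plays = Slo(name="song starts playing", target=0.99)
    print(f"SLI: {sli(9_950, 10_000):.3%}")
    print(f"Error budget remaining: {error_budget_remaining(plays, 9_950, 10_000):.0%}")
```

Framing the objective around play attempts rather than host uptime keeps the number meaningful to people outside the operations team, which is the point the slide makes about "five nines".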

Editor's Notes

  • #22: Approx 10 devs x 4 dev-hours per incident
  • #23: 2250 per dev, so for a team of 60...
  • #37: Questions?
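
The cost figures on slides 17 to 25 can be reproduced with a short calculation. The sketch below uses only numbers stated in the slides and the editor's notes (including the team of 60 from the notes); the function and constant names are illustrative. Two points are a reading of the deck rather than something it states outright: the $3,571 figure is treated as 10 disruptions at 4 hours each, and the reduced figures on slide 25 appear to use the rounded $89 hourly rate.

```python
# Capital cost line items from slide 17 (monthly, in dollars).
CAPITAL_LINE_ITEMS = {
    "1 master (large compute optimized)": 89,
    "2 data nodes (XL memory optimized)": 426,
    "3 general purpose SSDs": 201,
}

# Operational inputs from slide 20 and the editor's notes.
ANNUAL_ENGINEER_COST = 202_500
HOURS_PER_YEAR = 2_268
DISRUPTIONS_PER_YEAR = 10
HOURS_PER_DISRUPTION = 4
DEPLOYS_PER_DEV_PER_YEAR = 50
VALIDATION_HOURS_PER_DEPLOY = 0.5  # 30 minutes
TEAM_SIZE = 60                     # "for a team of 60" in the editor's notes
IMPROVEMENT_FACTOR = 5             # "a factor of 5 or more" on slides 23 to 25


def monthly_cloud_spend(line_items: dict) -> int:
    """Sum the monthly infrastructure line items."""
    return sum(line_items.values())


def hourly_rate() -> float:
    """Unrounded hourly engineer cost (the slides round this to $89)."""
    return ANNUAL_ENGINEER_COST / HOURS_PER_YEAR


def investigation_cost(rate: float) -> float:
    """Annual cost of investigating disruptions at the given hourly rate."""
    return DISRUPTIONS_PER_YEAR * HOURS_PER_DISRUPTION * rate


def validation_cost(rate: float) -> float:
    """Annual cost of validating deployments across the whole team."""
    return TEAM_SIZE * DEPLOYS_PER_DEV_PER_YEAR * VALIDATION_HOURS_PER_DEPLOY * rate


if __name__ == "__main__":
    rate = hourly_rate()
    print(f"Monthly cloud spend:      ${monthly_cloud_spend(CAPITAL_LINE_ITEMS):,}")
    print(f"Hourly engineer cost:     ${rate:,.2f}")
    print(f"Investigating (annual):   ${investigation_cost(rate):,.0f}")
    print(f"Validating (annual):      ${validation_cost(rate):,.0f}")
    # Reduced figures, using the rounded $89 rate.
    print(f"Investigating, 5x better: ${investigation_cost(89) / IMPROVEMENT_FACTOR:,.0f}")
    print(f"Validating, 5x better:    ${validation_cost(89) / IMPROVEMENT_FACTOR:,.0f}")
```

With the unrounded rate the sketch gives $716, $3,571, and $133,929; with the rounded $89 rate and the factor of 5 it gives $712 and $26,700, matching slide 25. Validation dominates because it scales with both team size and deployment frequency, while investigation scales only with the number of disruptions.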