Building and Scaling a WebSockets Pubsub System
Kapil Reddy @ Helpshift
About me - Kapil
Staff Engineer @ Helpshift
Clojure
Distributed Systems
Games
Music
Books/Comics
Football
Helpshift is a mobile CRM SaaS product. We help app developers connect with their customers, since everything is now on mobile.
Scale
• ~2 TB data broadcast / day
• Outgoing - 75 k msg/sec
• Incoming - 1.5 k msg/sec
• Concurrency - 3.5k
Here are some scale numbers for the platform we have built.
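As a rough sanity check on these numbers, the daily volume can be turned into an average rate. This is a back-of-envelope sketch, not a figure from the talk: it assumes decimal terabytes and, less realistically, that the outgoing rate of 75 k msg/sec is sustained all day, so the per-message size is only an order-of-magnitude estimate.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def average_bytes_per_second(bytes_per_day: float) -> float:
    """Average outgoing byte rate if the daily volume were spread evenly."""
    return bytes_per_day / SECONDS_PER_DAY


def average_message_size(bytes_per_day: float, msgs_per_second: float) -> float:
    """Average bytes per message, assuming the message rate is constant all day."""
    return average_bytes_per_second(bytes_per_day) / msgs_per_second


if __name__ == "__main__":
    daily_bytes = 2e12          # ~2 TB broadcast per day (decimal TB assumed)
    outgoing_rate = 75_000      # outgoing msg/sec, assumed sustained
    rate = average_bytes_per_second(daily_bytes)
    print(f"average rate: {rate / 1e6:.1f} MB/s ({rate * 8 / 1e6:.0f} Mbit/s)")
    print(f"average size: {average_message_size(daily_bytes, outgoing_rate):.0f} bytes/msg")
```

If the 75 k msg/sec figure is a peak rather than an average, the real average message is larger than this estimate.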
PubSub Platform
We built a generic publish-and-subscribe platform. Subscribers are JavaScript clients listening on WebSocket connections, and publishers are any backend servers that use ZMQ to publish messages.
This is a simplified view of the platform’s architecture. Again, browsers (subscribers) connect to Dirigent using WebSockets, and backend servers (publishers) connect to Dirigent using ZMQ.
Zooming in a bit on the architecture, we see two different types of services. They talk to each other internally using ZMQ as well. Zookeeper is used for coordination between Dirigent services.
We also have multiple clusters, and they can talk to each other. Each cluster has its own set of subscribers, and publishers can come from another cluster.
Evolution
In v1 of the platform we used different transport mechanisms: HTTP streaming to deliver messages to browsers, and HTTP to deliver messages to Dirigent servers. The HTTP mechanism was a problem because it coupled Dirigent to the backend servers. Whenever the Dirigent platform went down under load, the HTTP connections timed out and caused a cascading failure in the backend servers. We switched to ZMQ there.
Problems with HTTP streaming
A browser client needs only a subset of the data, but unsubscribing and subscribing to new topics was not possible over HTTP streaming, since it is a unidirectional channel. The only option was to push everything to all clients for a specific subdomain. Initially it sounded like a good idea, but once we hit scale we were running out of network bandwidth per machine. We switched to WebSockets, where a client can ask for specific information based on UI actions.
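A minimal sketch of the idea: because a WebSocket is bidirectional, the client can send subscribe and unsubscribe requests, and the server only forwards a message to clients subscribed to its topic. The JSON message shape (`action`, `topic`), the class name and the client ids are assumptions made for illustration, not Dirigent's actual protocol.

```python
import json
from collections import defaultdict


class TopicRegistry:
    """Tracks which clients are subscribed to which topics (hypothetical helper)."""

    def __init__(self):
        self._subscribers = defaultdict(set)  # topic -> set of client ids

    def handle_client_message(self, client_id, raw):
        """Apply a subscribe/unsubscribe request sent by a client over its socket."""
        msg = json.loads(raw)
        action, topic = msg["action"], msg["topic"]
        if action == "subscribe":
            self._subscribers[topic].add(client_id)
        elif action == "unsubscribe":
            self._subscribers[topic].discard(client_id)
        else:
            raise ValueError(f"unknown action: {action!r}")

    def drop_client(self, client_id):
        """Forget a client entirely, e.g. when its connection closes."""
        for subscribers in self._subscribers.values():
            subscribers.discard(client_id)

    def recipients(self, topic):
        """Clients that should receive a message published on this topic."""
        return set(self._subscribers.get(topic, ()))
```

With this in place, a published message is written only to the sockets returned by `recipients(topic)` instead of to every connected client for the subdomain.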
Under the hood
• Clojure (JVM)
• http-kit (NIO-based WebSocket server)
• ZMQ
• Zookeeper
Monitoring
All the messages we publish are important data and need to be rendered in time. This data is ephemeral by nature: we don’t store it anywhere, so auditing is hard. That made monitoring crucial for us.
Under the hood
• StatsD protocol
• Graphite - Storage
• Grafana - Frontend
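For readers unfamiliar with the StatsD protocol, a metric is a short text line of the form `name:value|type`, usually sent over UDP. The sketch below formats a counter and a timer and sends them; the metric names, host and port are placeholders, not the names used at Helpshift.

```python
import socket


def statsd_counter(name, value=1):
    """Format a StatsD counter line, e.g. 'pubsub.received:1|c'."""
    return f"{name}:{value}|c"


def statsd_timer(name, millis):
    """Format a StatsD timer line, e.g. 'pubsub.publish_time:12|ms'."""
    return f"{name}:{millis}|ms"


def send_metric(line, host="127.0.0.1", port=8125):
    """Fire-and-forget a metric line to a StatsD daemon over UDP."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(line.encode("ascii"), (host, port))


if __name__ == "__main__":
    # Hypothetical metric names for two stages of a pubsub pipeline.
    send_metric(statsd_counter("pubsub.received"))
    send_metric(statsd_timer("pubsub.publish_time", 12))
```

UDP keeps instrumentation cheap: a slow or missing metrics daemon does not block the code path being measured.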
*Example of monitoring: comparison of different stages*
Since auditing this kind of data is hard, we compare metrics of the data at different stages of the platform. But since the numbers are big, it is hard to spot an anomaly in the raw counts. What we are looking for is variance.
Message variance is easy to parse visually. If the variance is low, some stage of the platform is dropping data. In fact, we have also set up alerts on this same query.
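The talk does not show the exact query, so the following is only one possible reading of the stage comparison: for each pair of adjacent stages, compute the ratio of messages leaving to messages entering, and flag the pair when the ratio dips below a threshold. The stage names and the 0.99 threshold are assumptions, and the sketch only applies to stages that are expected to carry the same number of messages (no fan-out between them).

```python
def stage_ratios(counts):
    """counts: ordered list of (stage_name, message_count) for one time window.

    Returns (from_stage, to_stage, ratio) for each adjacent pair of stages.
    """
    ratios = []
    for (prev_name, prev_count), (name, count) in zip(counts, counts[1:]):
        # With nothing entering a stage, nothing can have been dropped.
        ratio = count / prev_count if prev_count else 1.0
        ratios.append((prev_name, name, ratio))
    return ratios


def dropping_stages(counts, min_ratio=0.99):
    """Adjacent stage pairs whose ratio fell below the alert threshold."""
    return [(a, b) for a, b, ratio in stage_ratios(counts) if ratio < min_ratio]


if __name__ == "__main__":
    window = [("received", 1000), ("routed", 1000), ("delivered", 900)]
    print(dropping_stages(window))
```

A ratio stays readable at any traffic volume, which is why it is easier to eyeball and alert on than two large raw counters.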
Another important metric is the time taken to publish a message to a WebSocket connection. Since the near-real-time SLA is so important, we look at p99s for anomalies. We have set up alerts on these as well.
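To make the p99 concrete, here is a small sketch that computes a percentile with the nearest-rank method over a window of publish latencies. In practice the metrics backend would do this aggregation; the sample values are made up for illustration.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample with at least pct% of values at or below it."""
    if not samples:
        raise ValueError("no samples")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[rank - 1]


if __name__ == "__main__":
    # Hypothetical window: most publishes take 10 ms, a few take 500 ms.
    latencies_ms = [10] * 98 + [500] * 2
    mean = sum(latencies_ms) / len(latencies_ms)
    print(f"mean={mean:.1f} ms, p99={percentile(latencies_ms, 99)} ms")
```

The mean of such a window looks healthy while the p99 exposes the slow tail, which is the reason to alert on the percentile rather than the average.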
Cost saving
Costs are always a concern for us! Two important factors add up to the cost: outgoing bandwidth usage and the number of machines.
Compression
First, we started using gzip compression for WebSockets. It is a standard compression mechanism supported by browsers, but as always with browsers, there are quirks here.
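A quick way to estimate what compression might save is to compress a representative payload offline. The sketch below uses Python's `zlib` (DEFLATE, the algorithm family behind gzip and the WebSocket permessage-deflate extension); the sample payload is invented and real savings depend entirely on the actual messages.

```python
import json
import zlib


def compression_ratio(payload: bytes, level: int = 6) -> float:
    """Compressed size divided by original size (lower is better)."""
    return len(zlib.compress(payload, level)) / len(payload)


# Hypothetical payload: a batch of similar JSON updates.
sample = json.dumps(
    [{"topic": "issue-updates", "issue_id": i, "status": "open"} for i in range(200)]
).encode("utf-8")

if __name__ == "__main__":
    print(f"{len(sample)} bytes raw, ratio {compression_ratio(sample):.2f}")
```

Repetitive JSON like this compresses well; very small messages generally benefit much less, so it is worth measuring with real traffic before relying on the savings.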
Re-visiting features
The biggest change you can make to save costs is to revisit the features/business logic itself and try to optimise there. This reduced the bandwidth usage by a significant amount.
Auto scaling
To save on the number of machines used, we started investigating how to do auto scaling. Auto scaling was not straightforward, since all the connections are long running and can stay alive for as long as 8 hours.
HAProxy with least conn
We went with the obvious choice: HAProxy doing the load balancing with the least-connection algorithm.
Least connection works. Sometimes
The problem with least connection is the assumption that the number of connections a server is handling is directly proportional to the amount of work it is doing. This was a wrong assumption, and it led to uneven distribution, server crashes, and sleepless nights.
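A toy illustration of why connection count is a poor proxy for work: each connection is modelled by the message rate it receives, and two selection rules are compared. The server names and rates are invented for the example.

```python
def least_connections(servers):
    """Pick the server with the fewest open connections."""
    return min(servers, key=lambda name: len(servers[name]))


def least_work(servers):
    """Pick the server with the lowest total message rate across its connections."""
    return min(servers, key=lambda name: sum(servers[name]))


# Each list holds the msg/sec delivered to every connection on that server.
servers = {
    "a": [50.0] * 100,  # fewer connections, but all on busy topics
    "b": [1.0] * 120,   # more connections, mostly idle
}

if __name__ == "__main__":
    print("least connections picks:", least_connections(servers))
    print("least work picks:", least_work(servers))
```

Here the least-connection rule keeps sending new clients to the server that is already doing far more work.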
Feedback load balancing
Feedback load balancing is something we started doing with Herald, an internal tool we built at Helpshift. It helps HAProxy decide which server to choose when routing a new connection. All the servers expose the current load they are under to Herald, which in turn tells HAProxy which server to choose. If all servers are loaded, we scale out. If all servers are underloaded, we scale in.
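The decision logic described here can be sketched in a few lines. This is not Herald's implementation, and how Herald talks to HAProxy is not described in the talk; the load scale (0.0 to 1.0) and the `high`/`low` thresholds are assumptions chosen for illustration.

```python
def choose_server(loads):
    """Route the next new connection to the server reporting the lowest load."""
    if not loads:
        raise ValueError("no servers reporting load")
    return min(loads, key=loads.get)


def scaling_decision(loads, high=0.8, low=0.3):
    """Return 'scale_out', 'scale_in' or 'hold' from the reported loads."""
    if not loads:
        raise ValueError("no servers reporting load")
    if all(load >= high for load in loads.values()):
        return "scale_out"  # every server is loaded
    if all(load <= low for load in loads.values()):
        return "scale_in"   # every server is underloaded
    return "hold"


if __name__ == "__main__":
    reported = {"a": 0.9, "b": 0.2}  # load reported by each server
    print(choose_server(reported), scaling_decision(reported))
```

Because connections can live for hours, scaling in presumably also needs a way to drain a server before removing it, which this sketch leaves out.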
Summary
• Building a WebSockets infrastructure on EC2 is possible, but it has quirks
• Use feedback load balancing for WebSockets / long-running connection traffic
• ZMQ and the JVM are solid building blocks for a realtime pubsub platform
• Instrumentation at multiple stages of the platform is a good way to keep track of a real-time system