HIGH SCALABILITY AND RELIABILITY IN THE CLOUD
GREG THOMPSON
HEAD OF ARCHITECTURE, APPS ENABLEMENT
ALCATEL-LUCENT

@gmthomp   greg.thompson@alcatel-lucent.com
About This Session
   Target audience is backend application
    developers deploying infrastructure into a
    cloud environment
   Will cover concepts for scalability and
    reliability with the goal of helping application
    developers understand some key
    considerations when designing and building
    the backend.
Design Time Decisions
   When first building your application backend,
    consider a few important questions
     How fast should the application be recovered if a
      failure occurs?
     What kind of down time is acceptable?
     Is the application maintaining stateful data?
     What kind of information needs to be shared across
      multiple instances?
Scalability
What is Scalability?
   Scalability describes
    how well the application
    handles increasing
    traffic volume
Scalability – Factors to Consider
   Horizontal vs. Vertical
   Stateless vs. Stateful
   Understanding Limitations
   Connection Management
   Segmentation of traffic
   Segmentation of responsibility (distributed arch)
   Clustering
   Messaging
What Type of Scalability?
Vertical vs. Horizontal
Vertical
   Scaling up a single node
     Physical limitations – instances are very powerful but still have finite limits
     Resources such as number of sockets can only go so high
Horizontal
   Scaling out across multiple nodes
     Ability to distribute traffic over a number of nodes
     Allows for more flexibility over time
Will the App Maintain State?
Stateless Applications
   Application does not
    persist information
    about transactions
   Each transaction is
    independent and
    atomic
   [Diagram: Request → Application → Response]
Will the App Maintain State?
Stateful Applications
   Application needs to
    maintain data about
    transactions in
    progress
   Requires storage
   Persistence may also
    be required
    depending on the …
   [Diagram: First Request and Subsequent Request → Application ↔ DB]
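The contrast between the two models can be sketched in a few lines. This is a minimal illustration, not code from the talk: `handle_stateless`, `SessionStore`, and `handle_stateful` are hypothetical names, and the in-memory dictionary only stands in for the DB shown on the slide.

```python
from dataclasses import dataclass, field


def handle_stateless(request: dict) -> dict:
    """Everything needed to answer is carried in the request itself."""
    return {"sum": request["a"] + request["b"]}


@dataclass
class SessionStore:
    """Stand-in for shared storage; a real backend would use a database or cache."""
    _data: dict = field(default_factory=dict)

    def load(self, session_id: str) -> dict:
        return self._data.get(session_id, {"total": 0})

    def save(self, session_id: str, state: dict) -> None:
        self._data[session_id] = state


def handle_stateful(store: SessionStore, session_id: str, amount: int) -> dict:
    """A subsequent request depends on what earlier requests in the session did."""
    state = store.load(session_id)
    state = {"total": state["total"] + amount}
    store.save(session_id, state)
    return state


store = SessionStore()
first = handle_stateful(store, "s1", 5)    # first request
second = handle_stateful(store, "s1", 7)   # subsequent request sees earlier state
```

Because the stateful handler keeps its data in the store rather than in the process, any instance that can reach the store could serve the subsequent request.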
Understanding Limitations
   Thorough testing is
    key to understanding
    bottlenecks
   Test real-world
    scenarios, including
    latency
   Push the system to
    the max to
    understand how it
Connection Management
Mobile Device Connections
   Mobile devices don’t always
    behave like you expect
       Connectivity is often very
        dynamic
       Devices move between 4G/3G/2G/no
        G/Wi-Fi
       Not all TCP events will get
        reported and sockets can remain
        open
   If not handled correctly, these
    factors can be a time bomb no
    matter how far you scale a
    component vertically
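One common way to keep half-open sockets from piling up is an application-level idle timeout. The sketch below assumes a hypothetical `ConnectionTracker` that records the last activity per connection and reaps the silent ones; the class, its method names, and the 30-second timeout are illustrative, not something the talk prescribes.

```python
import time


class ConnectionTracker:
    """Tracks last-activity time per connection and reaps silent ones."""

    def __init__(self, idle_timeout_s: float, clock=time.monotonic):
        self.idle_timeout_s = idle_timeout_s
        self._clock = clock          # injectable so the logic can be tested
        self._last_seen = {}

    def touch(self, conn_id: str) -> None:
        """Record activity (data or a heartbeat) on a connection."""
        self._last_seen[conn_id] = self._clock()

    def reap(self) -> list:
        """Drop and return connections idle for longer than the timeout."""
        now = self._clock()
        stale = [c for c, seen in self._last_seen.items()
                 if now - seen > self.idle_timeout_s]
        for conn_id in stale:
            del self._last_seen[conn_id]  # a real server would also close the socket
        return stale

    def open_count(self) -> int:
        return len(self._last_seen)


# Usage with a fake clock: phone-a goes silent, phone-b keeps sending.
fake_now = [0.0]
tracker = ConnectionTracker(idle_timeout_s=30.0, clock=lambda: fake_now[0])
tracker.touch("phone-a")
tracker.touch("phone-b")
fake_now[0] = 20.0
tracker.touch("phone-b")
fake_now[0] = 40.0
reaped = tracker.reap()
```

Relying on the tracker rather than on TCP close events means a device that silently lost coverage still frees its slot once the timeout passes.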
Segmenting Traffic
   Once the application is
    able to be scaled out,
    traffic can be
    segmented in different
    ways
       Location (e.g. east coast
        vs. west coast)
       Pre-assigned criteria -
        User ID, IP, or other
        dynamic criteria
       Load Balanced
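The three segmentation approaches above can be sketched as small routing functions. All names here (`route_by_location`, `route_by_user_id`, `round_robin`, and the site and node labels) are hypothetical, and a production deployment would normally delegate this to a load balancer or DNS rather than application code.

```python
import hashlib
from itertools import cycle


def route_by_location(region: str, sites: dict) -> str:
    """Location-based: map the caller's region to a site."""
    return sites[region]


def route_by_user_id(user_id: str, nodes: list) -> str:
    """Pre-assigned criteria: a stable hash keeps a user on the same node."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(nodes)
    return nodes[index]


def round_robin(nodes: list):
    """Load balanced: hand out nodes in turn."""
    return cycle(nodes)


sites = {"east": "site-1", "west": "site-2"}
nodes = ["node-a", "node-b", "node-c"]
rr = round_robin(nodes)
picks = [next(rr) for _ in range(4)]
```

Note that simple modulo hashing reassigns many users whenever the node count changes, which matters if nodes are added or removed often.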
Segmenting Responsibility
   Segmenting
    responsibility allows for
    a distributed
    architecture
       Each component can be
        scaled independently
       Allows for more flexibility
        in scaling
       Adds more complexity
        and potential messaging
        overhead
Clustering
   Clustering is the
    concept of having a
    group of nodes working
    together to provide the
    same capability
       Nodes typically co-located
       Common data shared
        as needed across the
        cluster
       Communication may be
        needed between nodes
   [Diagram: four App Nodes connected to Shared Data]
Messaging
   Once a clustered
    and/or distributed
    architecture is used,
    messaging will be
    needed between
    various components
    and/or nodes
Types of Messaging
   JMS
   Open Source MQ packages
   Custom Designed
   Use of APIs
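To show the shape of queue-based messaging between two components, here is a minimal in-process sketch. It uses Python's standard `queue.Queue` and a thread purely as a stand-in for a real broker (JMS or an open source MQ package); the `worker` function and the message format are assumptions made for illustration.

```python
import queue
import threading


def worker(inbox: queue.Queue, results: list) -> None:
    """Consumer component: processes messages until it sees the sentinel."""
    while True:
        message = inbox.get()
        if message is None:      # sentinel: shut down cleanly
            break
        results.append(message["payload"].upper())


inbox = queue.Queue()
results = []
consumer = threading.Thread(target=worker, args=(inbox, results))
consumer.start()

# Producer component: publishes without knowing who consumes.
for text in ("order created", "order paid"):
    inbox.put({"payload": text})
inbox.put(None)
consumer.join()
```

The producer and consumer only share the queue, which is what lets each side be scaled or replaced independently.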
Example of Scaled Architecture
   [Diagram: two sites, Site 1 and Site 2. Each site has a Load Balancer in front of a Web Server, Component 1, and Component 2, backed by a Database.]
Reliability/Availability
What is Reliability/Availability?
   Availability is typically
    measured by the amount of
    downtime your application
    has in a given year
       Unplanned downtime and
        planned downtime are both
        considered
   Reliability is described by the
    likelihood of failure based on
    actual measurements
   We’ll focus more on
    Availability
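Since availability is measured as downtime per year, it helps to convert between a percentage target and the hours it allows. The helper names below are hypothetical, and the sketch assumes a 365-day year.

```python
HOURS_PER_YEAR = 365 * 24  # 8760, ignoring leap years


def downtime_hours_per_year(availability_percent: float) -> float:
    """Hours of downtime (planned plus unplanned) a target allows per year."""
    return HOURS_PER_YEAR * (1 - availability_percent / 100)


def availability_percent(downtime_hours: float) -> float:
    """Availability achieved given the total downtime observed in a year."""
    return 100 * (1 - downtime_hours / HOURS_PER_YEAR)


three_nines = downtime_hours_per_year(99.9)
four_nines_minutes = downtime_hours_per_year(99.99) * 60
```

Under these assumptions, 99.9% allows about 8.76 hours of downtime per year, while 99.99% allows only about 52.6 minutes.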
Reliability/Availability
Factors to Consider
   Cost vs. Need
   Problem detection
   Automation for recovery
   Active/standby, active/active, hot standby vs. cold
    standby
   Local and Geo-redundancy
   Multi-zone, multi-cloud
   Test Until You Break the System
Reliability Requirements
Cost Considerations
   Number of instances
   Bandwidth requirements between sites
   Complexity of software
   Monitoring
Need
   User Experience
   Customer requirements
   Negative Publicity
Problem Detection
   Effective monitoring of
    the application is key to
    minimizing downtime
       Event reporting in the
        software
       External monitoring –
        test for successful
        behavior
       Auto detection and
        alerting to minimize cost
        of operations personnel
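External monitoring with automatic alerting can be sketched as a probe that is run repeatedly, with an alert raised after several consecutive failures. The `Monitor` class, the threshold of three, and the callback signatures are assumptions for illustration; the probe would normally be an HTTP request or a test transaction against the running application.

```python
class Monitor:
    """Runs a probe and alerts once after N consecutive failed checks."""

    def __init__(self, probe, alert, failures_before_alert=3):
        self.probe = probe                  # callable returning True when healthy
        self.alert = alert                  # callable taking a message string
        self.failures_before_alert = failures_before_alert
        self.consecutive_failures = 0
        self.alerting = False

    def tick(self) -> None:
        """Run one check; a probe that raises counts as a failure."""
        try:
            healthy = bool(self.probe())
        except Exception:
            healthy = False
        if healthy:
            self.consecutive_failures = 0
            self.alerting = False
            return
        self.consecutive_failures += 1
        if (self.consecutive_failures >= self.failures_before_alert
                and not self.alerting):
            self.alerting = True            # avoid repeating the same alert
            self.alert(f"{self.consecutive_failures} consecutive failed checks")


# Usage with scripted probe results instead of a real endpoint.
outcomes = iter([True, False, False, False, False, True])
alerts = []
monitor = Monitor(lambda: next(outcomes), alerts.append, failures_before_alert=3)
for _ in range(6):
    monitor.tick()
```

Requiring several consecutive failures trades a little detection time for fewer false alarms from a single dropped check.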
Automation for Recovery
   The faster a failed
    component recovers,
    the higher the availability
     Automatic detection
      and automatic
      recovery
     Automated installation
      key for minimizing
      setup time during
      recovery
Availability Models
   N = number of nodes
    required for normal
    processing
   N+1 = one additional
    node to provide
    redundancy in case of
    failure
   N+K = K nodes provide
    additional redundancy
   [Diagram: N shown as N N; N+1 as N N +1; N+K as N N K K]
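A small worked example makes the N, N+1, and N+K models concrete. The load figures and function names below are hypothetical; the point is only how N is derived from capacity and how many failures each model tolerates.

```python
import math


def nodes_required(peak_load: float, per_node_capacity: float) -> int:
    """N: the number of nodes needed for normal processing at peak load."""
    return math.ceil(peak_load / per_node_capacity)


def survives(total_nodes: int, n_required: int, failed: int) -> bool:
    """True if enough nodes remain after `failed` nodes are lost."""
    return total_nodes - failed >= n_required


# Assumed figures: 900 requests/s at peak, 250 requests/s per node.
n = nodes_required(peak_load=900, per_node_capacity=250)
n_plus_1 = n + 1
n_plus_k = n + 2   # K = 2
```

With these assumed figures N is 4, so an N+1 deployment of 5 nodes absorbs one failure and an N+K deployment with K = 2 absorbs two.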
Redundancy Models
   Active/Cold Standby
       Backup site is booted
        up when needed
   Active/Hot Standby
       Backup site is running
        and ready to take over
   Active/Active
       Both sites active and
        processing traffic
   [Diagram: Active with Cold Standby; Active with Active Standby; Active with Active]
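The practical difference between cold and hot standby shows up at failover time: a cold site has to be booted first. The sketch below models that with a hypothetical `Site` record and `fail_over` function; the names and the priority-order policy are assumptions, not a description of any particular platform.

```python
from dataclasses import dataclass


@dataclass
class Site:
    name: str
    healthy: bool
    running: bool   # False models a cold standby that must be booted first


def fail_over(sites: list) -> tuple:
    """Pick the first healthy site in priority order.

    Returns (site name, whether the site had to be booted before taking traffic).
    """
    for site in sites:
        if site.healthy:
            needed_boot = not site.running
            site.running = True
            return site.name, needed_boot
    raise RuntimeError("no healthy site available")


# Primary has failed in both scenarios; only the standby type differs.
cold = fail_over([Site("site-1", healthy=False, running=True),
                  Site("site-2", healthy=True, running=False)])
hot = fail_over([Site("site-1", healthy=False, running=True),
                 Site("site-2", healthy=True, running=True)])
```

In the cold case the boot step adds to recovery time, which is the cost being traded against running a second site continuously.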
Local and Geo-Redundancy
Local
   Backup instances
    are available within
    the same location
   Use of availability
    zones within a
    region is very similar
Geographic
   Backup instances
    are available in
    another geographic
    location
   Typically in a
    separate region to
    account for events
    such as natural
    disasters
Availability to the Max
Multi-Zone/Multi-Region
   Multi-zone typically
    provides instances
    running in different
    physical locations, but
    in the same region
   Multi-region provides
    different geographic
    regions of availability
Multi-Cloud
   If your application
    requires the
    maximum possible
    availability
   Run in different
    cloud providers in
    different regions
Test Until You Break the System
   Push the system to
    the max and observe
    the breaking points
   Fix the problem,
    repeat
   The best way to find
    problems and prevent
    unplanned downtime
    is to test thoroughly
    with a mindset to
    break the system
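The push-until-it-breaks loop can be sketched as a load ramp that stops at the first failing level. `find_breaking_point` and the `send_load` callback are hypothetical; in practice `send_load` would drive a load-testing tool against the real system and report whether error rates and latency stayed acceptable.

```python
def find_breaking_point(send_load, start: int, step: int, limit: int) -> tuple:
    """Ramp offered load until send_load reports failure.

    Returns (last load that passed, first load that failed); the second
    value is None if the system held up to `limit`.
    """
    last_good = None
    load = start
    while load <= limit:
        if not send_load(load):
            return last_good, load
        last_good = load
        load += step
    return last_good, None


# Stand-in for a real system that starts failing above 450 requests/s.
def fake_system(load: int) -> bool:
    return load <= 450


result = find_breaking_point(fake_system, start=100, step=100, limit=1000)
```

After fixing whatever broke at the reported level, the same ramp is run again to find the next limit.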
Q&A
THANK YOU!
Greg Thompson
@gmthomps
greg.thompson@alcatel-lucent.com
Cheryl Hung
 
GDG Cloud Southlake #42: Suresh Mathew: Autonomous Resource Optimization: How...
GDG Cloud Southlake #42: Suresh Mathew: Autonomous Resource Optimization: How...GDG Cloud Southlake #42: Suresh Mathew: Autonomous Resource Optimization: How...
GDG Cloud Southlake #42: Suresh Mathew: Autonomous Resource Optimization: How...
James Anderson
 
Top 5 Benefits of Using Molybdenum Rods in Industrial Applications.pptx
Top 5 Benefits of Using Molybdenum Rods in Industrial Applications.pptxTop 5 Benefits of Using Molybdenum Rods in Industrial Applications.pptx
Top 5 Benefits of Using Molybdenum Rods in Industrial Applications.pptx
mkubeusa
 
IT488 Wireless Sensor Networks_Information Technology
IT488 Wireless Sensor Networks_Information TechnologyIT488 Wireless Sensor Networks_Information Technology
IT488 Wireless Sensor Networks_Information Technology
SHEHABALYAMANI
 
Zilliz Cloud Monthly Technical Review: May 2025
Zilliz Cloud Monthly Technical Review: May 2025Zilliz Cloud Monthly Technical Review: May 2025
Zilliz Cloud Monthly Technical Review: May 2025
Zilliz
 
Limecraft Webinar - 2025.3 release, featuring Content Delivery, Graphic Conte...
Limecraft Webinar - 2025.3 release, featuring Content Delivery, Graphic Conte...Limecraft Webinar - 2025.3 release, featuring Content Delivery, Graphic Conte...
Limecraft Webinar - 2025.3 release, featuring Content Delivery, Graphic Conte...
Maarten Verwaest
 
Developing System Infrastructure Design Plan.pptx
Developing System Infrastructure Design Plan.pptxDeveloping System Infrastructure Design Plan.pptx
Developing System Infrastructure Design Plan.pptx
wondimagegndesta
 
AI Agents at Work: UiPath, Maestro & the Future of Documents
AI Agents at Work: UiPath, Maestro & the Future of DocumentsAI Agents at Work: UiPath, Maestro & the Future of Documents
AI Agents at Work: UiPath, Maestro & the Future of Documents
UiPathCommunity
 
Design pattern talk by Kaya Weers - 2025 (v2)
Design pattern talk by Kaya Weers - 2025 (v2)Design pattern talk by Kaya Weers - 2025 (v2)
Design pattern talk by Kaya Weers - 2025 (v2)
Kaya Weers
 
Q1 2025 Dropbox Earnings and Investor Presentation
Q1 2025 Dropbox Earnings and Investor PresentationQ1 2025 Dropbox Earnings and Investor Presentation
Q1 2025 Dropbox Earnings and Investor Presentation
Dropbox
 
Reimagine How You and Your Team Work with Microsoft 365 Copilot.pptx
Reimagine How You and Your Team Work with Microsoft 365 Copilot.pptxReimagine How You and Your Team Work with Microsoft 365 Copilot.pptx
Reimagine How You and Your Team Work with Microsoft 365 Copilot.pptx
John Moore
 
AI 3-in-1: Agents, RAG, and Local Models - Brent Laster
AI 3-in-1: Agents, RAG, and Local Models - Brent LasterAI 3-in-1: Agents, RAG, and Local Models - Brent Laster
AI 3-in-1: Agents, RAG, and Local Models - Brent Laster
All Things Open
 
Viam product demo_ Deploying and scaling AI with hardware.pdf
Viam product demo_ Deploying and scaling AI with hardware.pdfViam product demo_ Deploying and scaling AI with hardware.pdf
Viam product demo_ Deploying and scaling AI with hardware.pdf
camilalamoratta
 
UiPath Automation Suite – Cas d'usage d'une NGO internationale basée à Genève
UiPath Automation Suite – Cas d'usage d'une NGO internationale basée à GenèveUiPath Automation Suite – Cas d'usage d'une NGO internationale basée à Genève
UiPath Automation Suite – Cas d'usage d'une NGO internationale basée à Genève
UiPathCommunity
 
AI-proof your career by Olivier Vroom and David WIlliamson
AI-proof your career by Olivier Vroom and David WIlliamsonAI-proof your career by Olivier Vroom and David WIlliamson
AI-proof your career by Olivier Vroom and David WIlliamson
UXPA Boston
 
Unlocking Generative AI in your Web Apps
Unlocking Generative AI in your Web AppsUnlocking Generative AI in your Web Apps
Unlocking Generative AI in your Web Apps
Maximiliano Firtman
 

Scalability and Reliability in the Cloud

  • 1. HIGH SCALABILITY AND RELIABILITY IN THE CLOUD
      Greg Thompson, Head of Architecture, Apps Enablement, Alcatel-Lucent
      @gmthomp, greg.thompson@alcatel-lucent.com
  • 2. About This Session
      - Target audience is backend application developers deploying infrastructure into a cloud environment
      - Will cover concepts for scalability and reliability, with the goal of helping application developers understand some key considerations when designing and building the backend
  • 3. Design Time Decisions
      - When first building your application backend, consider a few important questions:
          - How quickly must the application recover if a failure occurs?
          - How much downtime is acceptable?
          - Does the application maintain stateful data?
          - What information needs to be shared across multiple instances?
  • 5. What is Scalability?
      - Scalability describes how the application will handle increased traffic volume
  • 6. Scalability: Factors to Consider
      - Horizontal vs. vertical
      - Stateless vs. stateful
      - Understanding limitations
      - Connection management
      - Segmentation of traffic
      - Segmentation of responsibility (distributed architecture)
      - Clustering
      - Messaging
  • 7. What Type of Scalability? Vertical vs. Horizontal
      - Vertical: scaling up a single node
          - Physical limitations: instances are very powerful but still have finite limits
          - Resources such as number of sockets can only go so high
      - Horizontal: scaling out across multiple nodes
          - Ability to distribute traffic over a number of nodes
          - Allows for more flexibility over time
  • 8. Will the App Maintain State? Stateless Applications
      - Application does not persist information about transactions
      - Each transaction is independent and atomic
      (Diagram: a request goes to the application and a response comes back.)
  • 9. Will the App Maintain State? Stateful Applications
      - Application needs to maintain data about transactions in progress
      - Requires storage
      - Persistence may also be required, depending on the ...
      (Diagram: a first request and a subsequent request reach the application, which is backed by a DB.)
  • 10. Understanding Limitations
      - Thorough testing is key to understanding bottlenecks
      - Test real-world scenarios, including latency
      - Push the system to the max to understand how it ...
  • 11. Connection Management: Mobile Device Connections
      - Mobile devices don't always behave as you expect
      - Connectivity is often very dynamic
      - Devices move between 4G, 3G, 2G, no signal, and Wi-Fi
      - Not all TCP events get reported, and sockets can remain open
      - If not handled correctly, these factors can be a time bomb no matter how far you scale a component vertically
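One common way to cope with sockets that stay open after a device silently disappears is to track the last time each connection was heard from and drop the ones that go quiet. The sketch below is only an illustration of that idea, not something taken from the talk: `ConnectionRegistry`, `touch`, `reap`, and the 60-second timeout are all hypothetical names and values.

```python
import time
from dataclasses import dataclass, field


@dataclass
class ConnectionRegistry:
    """Tracks when each connection was last heard from (hypothetical helper)."""

    idle_timeout_s: float = 60.0  # assumed value; tune it from real traffic
    last_seen: dict = field(default_factory=dict)

    def touch(self, conn_id, now=None):
        """Record activity (payload or heartbeat) on a connection."""
        self.last_seen[conn_id] = time.monotonic() if now is None else now

    def reap(self, now=None):
        """Forget connections silent for longer than the timeout; return their ids."""
        now = time.monotonic() if now is None else now
        stale = [c for c, t in self.last_seen.items()
                 if now - t > self.idle_timeout_s]
        for conn_id in stale:
            # A real server would also close the underlying socket here.
            del self.last_seen[conn_id]
        return stale
```

In a server loop, `touch` would be called whenever data or a heartbeat arrives, and `reap` on a timer, so that half-open sockets do not accumulate until the node runs out of resources.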
  • 12. Segmenting Traffic
      - Once the application can be scaled out, traffic can be segmented in different ways:
          - Location (e.g. east coast vs. west coast)
          - Pre-assigned criteria: user ID, IP, or other dynamic criteria
          - Load balanced
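Segmenting by a pre-assigned criterion such as user ID can be sketched as a stable hash of the ID mapped onto a list of segments. This is a minimal illustration under assumptions of my own: the function name `segment_for` and the segment names are hypothetical, and the talk does not prescribe any particular hashing scheme.

```python
import hashlib


def segment_for(user_id: str, segments: list[str]) -> str:
    """Map a user ID to one segment, the same way on every node."""
    # A cryptographic digest is used only because it is stable across
    # processes and machines, unlike Python's built-in hash() for str.
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(segments)
    return segments[index]


if __name__ == "__main__":
    sites = ["east-coast", "west-coast"]  # hypothetical segment names
    for uid in ("alice", "bob", "carol"):
        print(uid, "is served by", segment_for(uid, sites))
```

With plain modulo hashing, changing the number of segments remaps most users; consistent hashing is a common alternative when segments are added or removed often.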
  • 13. Segmenting Responsibility
      - Segmenting responsibility allows for a distributed architecture
      - Each component can be scaled independently
      - Allows for more flexibility in scaling
      - Adds more complexity and potential messaging overhead
  • 14. Clustering
      - Clustering is the concept of having a group of nodes working together to provide the same capability
      - Nodes are typically co-located
      - Common data is shared as needed across the cluster
      - Communication may be needed between nodes
      (Diagram: four app nodes on top of shared data.)
  • 15. Messaging
      - Once a clustered and/or distributed architecture is used, messaging will be needed between various components and/or nodes
      - Types of messaging:
          - JMS
          - Open source MQ packages
          - Custom designed
          - Use of APIs
  • 16. Example of Scaled Architecture
      (Diagram: Site 1 and Site 2, each showing load balancers, web servers, Component 1 and Component 2 instances, and a database.)
  • 18. What is Reliability/Availability?
      - Availability is typically measured by the amount of downtime your application has in a given year
      - Unplanned downtime and planned downtime are both considered
      - Reliability is described by the likelihood of failure based on actual measurements
      - We'll focus more on availability
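Since availability is measured as downtime per year, it can help to convert an availability percentage into a yearly downtime budget. The snippet below is a small arithmetic sketch; the function name and the percentages shown are illustrative choices, and a 365-day year is assumed.

```python
HOURS_PER_YEAR = 365 * 24  # 8760, assuming a non-leap year


def downtime_hours_per_year(availability_percent: float) -> float:
    """Yearly downtime (planned plus unplanned) allowed at a given availability."""
    return (1 - availability_percent / 100) * HOURS_PER_YEAR


if __name__ == "__main__":
    for pct in (99.0, 99.9, 99.99):
        hours = downtime_hours_per_year(pct)
        print(f"{pct}% available: {hours:.2f} hours of downtime per year")
    # 99.0  -> 87.60 hours
    # 99.9  ->  8.76 hours
    # 99.99 ->  0.88 hours (about 53 minutes)
```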
  • 19. Reliability/Availability Factors to Consider
      - Cost vs. need
      - Problem detection
      - Automation for recovery
      - Active/standby, active/active, hot standby vs. cold standby
      - Local and geo-redundancy
      - Multi-zone, multi-cloud
      - Test until you break the system
  • 20. Reliability Requirements
      - Cost considerations:
          - Number of instances
          - Bandwidth requirements between sites
          - Complexity of software
          - Monitoring
      - Need:
          - User experience
          - Customer requirements
          - Negative publicity
  • 21. Problem Detection
      - Effective monitoring of the application is key to minimizing downtime
      - Event reporting in the software
      - External monitoring: test for successful behavior
      - Auto detection and alerting to minimize cost of operations personnel
  • 22. Automation for Recovery
      - The faster a failed component recovers, the higher the reliability
      - Automatic detection and automatic recovery
      - Automated installation is key to minimizing setup time during recovery
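Automatic detection and automatic recovery can be sketched as a watchdog that probes a component and triggers a recovery action after several consecutive failed probes. This is a hedged illustration only: the `Watchdog` class, the `probe` and `recover` callables, and the threshold of three failures are assumptions, not part of the talk.

```python
from typing import Callable


class Watchdog:
    """Calls recover() after max_failures consecutive failed probes (hypothetical)."""

    def __init__(self, probe: Callable[[], bool], recover: Callable[[], None],
                 max_failures: int = 3):  # threshold is an assumed default
        self.probe = probe
        self.recover = recover
        self.max_failures = max_failures
        self.failures = 0

    def tick(self) -> bool:
        """Run one probe; return True if recovery was triggered on this tick."""
        try:
            healthy = self.probe()
        except Exception:
            healthy = False  # a probe that raises counts as a failed check
        if healthy:
            self.failures = 0
            return False
        self.failures += 1
        if self.failures >= self.max_failures:
            self.recover()  # e.g. restart the component or reinstall the node
            self.failures = 0
            return True
        return False
```

Requiring several consecutive failures before acting is a design choice that trades slightly slower detection for fewer restarts caused by a single dropped probe. In practice `tick` would run on a timer, with `probe` testing for successful behavior from outside the component.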
  • 23. Availability Models
      - N = number of nodes required for normal processing
      - N+1 = one additional node to provide redundancy in case of failure
      - N+K = K additional nodes provide additional redundancy
      (Diagram: N nodes alone, N nodes plus one spare, and N nodes plus K spares.)
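The effect of the K spare nodes can be estimated with a simple binomial model: the cluster is up whenever at least N of its N+K nodes are up. The sketch below assumes every node has the same availability and that nodes fail independently, which real deployments often violate (shared power, network, or software faults); the function name and the numbers are illustrative.

```python
from math import comb


def cluster_availability(n: int, k: int, node_availability: float) -> float:
    """Probability that at least n of n + k nodes are up.

    Assumes identical nodes that fail independently of each other.
    """
    total = n + k
    p = node_availability
    return sum(
        comb(total, up) * p ** up * (1 - p) ** (total - up)
        for up in range(n, total + 1)
    )


if __name__ == "__main__":
    # With 90% per-node availability and N = 2:
    print(cluster_availability(2, 0, 0.9))  # N only
    print(cluster_availability(2, 1, 0.9))  # N+1
    print(cluster_availability(2, 2, 0.9))  # N+2
```

Under these assumptions, each added spare raises cluster availability, with diminishing returns, which is the cost vs. need trade-off in numeric form.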
  • 24. Redundancy Models
      - Active/cold standby: backup site is booted up when needed
      - Active/hot standby: backup site is running and ready to take over
      - Active/active: both sites are active and processing traffic
  • 25. Local and Geo-Redundancy
      - Local:
          - Backup instances are available within the same location
          - Using availability zones within a region is very similar
      - Geographic:
          - Backup instances are available in another geographic location
          - Typically in a separate region to account for events such as natural disasters
  • 26. Availability to the Max
      - Multi-zone/multi-region:
          - Multi-zone typically provides instances running in different physical locations, but in the same region
          - Multi-region provides availability in different geographic regions
      - Multi-cloud:
          - For applications that require the maximum possible availability
          - Run with different cloud providers in different regions
  • 27. Test Until You Break the System
      - Push the system to the max and observe the breaking points
      - Fix the problem, repeat
      - The best way to find the problems that cause unplanned downtime is to test thoroughly with a mindset to break the system
  • 28. Q&A