Coursera: AWS Case Study

Hello everyone!

Let's see how the Coursera platform uses Amazon Web Services.

Coursera

Coursera is an American massive open online course (MOOC) provider founded in 2012 by Stanford University computer science professors Andrew Ng and Daphne Koller. It offers courses, Specializations, degrees, and professional and MasterTrack certificates.

It is an educational technology company with a mission to provide universal access to the world’s best curricula. The company partners with top universities and organizations worldwide to offer online courses for anyone to take, for free. Coursera has more than 13 million users enrolled from 190 countries and offers more than 1,000 courses from 119 institutions, covering everything from programming in Python to songwriting.

Changing the Way the World Learns

To host its website and support its rapidly expanding business, Coursera relies heavily on Amazon Web Services (AWS). Until recently, the company had focused on setting up its backend services and AWS infrastructure; now it needed to streamline its front-end processes as well.

“We wanted to improve the front-end developer experience and improve our website’s reliability and performance,” says Bryan Kane, senior engineer at Coursera.


The Challenge

  • Coursera had a large monolithic application for processing batch jobs that was difficult to run, deploy, and scale.
  • A new thread was created whenever a new job needed to be completed, and each job took up different amounts of memory and CPU, continually creating inefficiencies.
  • A lack of resource isolation allowed memory-limit errors to bring down the entire application.
  • The infrastructure engineering team attempted to move to a microservices architecture using Docker containers, but they ran into problems as they tried to use Apache Mesos to manage the cluster and containers—Mesos was complicated to set up and Coursera didn’t have the expertise or time required to manage a Mesos cluster.

Why Amazon Web Services

  • Docker containers on Amazon EC2 Container Service (ECS) enabled Coursera to easily move to a microservices-based architecture.
  • Each job is created as a container and Amazon ECS schedules the container across the Amazon EC2 instance cluster.
  • Amazon ECS handles all the cluster management and container orchestration, and containers provide the necessary resource isolation (a rough sketch of this flow follows the list).
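
Here is a minimal boto3 sketch of that flow: register a job as a task definition with hard CPU and memory limits, then hand scheduling to ECS. The cluster name, image, and limits are illustrative assumptions, not Coursera's actual configuration:

    import boto3

    ecs = boto3.client("ecs")  # assumes AWS credentials and region are configured

    # Each batch job runs as its own container with hard resource limits,
    # so one job's memory blow-up cannot take down the others.
    ecs.register_task_definition(
        family="batch-job",  # hypothetical job family
        containerDefinitions=[{
            "name": "worker",
            "image": "123456789012.dkr.ecr.us-east-1.amazonaws.com/batch-worker:latest",  # illustrative
            "cpu": 256,      # CPU units reserved for this container
            "memory": 512,   # hard memory limit in MiB
            "essential": True,
            "command": ["python", "run_job.py"],
        }],
    )

    # ECS finds an EC2 instance in the cluster with spare capacity and
    # schedules the container there; no manual placement is needed.
    ecs.run_task(cluster="batch-cluster", taskDefinition="batch-job")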

The Benefits

  • Ease of use: Because Amazon ECS setup is straightforward and it manages all of the details of the cluster, the team had a prototype up and running in under two months.
  • Speed and agility: Time to deploy software changes went from hours to minutes, and each team can now develop and update its respective applications independently because the applications are resource isolated with no cross-dependencies.
  • Scalable capacity: Auto Scaling groups allow the compute capacity to scale up to handle dynamic job loads.
  • Operational efficiency: No extra infrastructure engineering time is spent installing software and maintaining a cluster—Amazon ECS handles everything from cluster management to container orchestration.

Amazon Web Services Used

  • AWS CodeBuild
  • Amazon ECS
  • Amazon S3
  • Amazon CloudFront
  • Amazon ECR
  • Amazon EC2

Challenged by Limited Build Capacity

Coursera’s website, a single-page application written in JavaScript, is hosted on Amazon Simple Storage Service (Amazon S3) and served using Amazon CloudFront.
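
In practice, serving a single-page application this way comes down to pushing built assets to an S3 bucket and invalidating the CDN cache so CloudFront serves the new entry point. A minimal sketch using boto3; the bucket name, distribution ID, and cache settings are hypothetical:

    import time
    import boto3

    s3 = boto3.client("s3")
    cloudfront = boto3.client("cloudfront")

    # Upload the built entry point. Fingerprinted assets can be cached
    # aggressively, but index.html should always be fetched fresh.
    s3.upload_file(
        "build/index.html",
        "example-spa-bucket",  # hypothetical bucket
        "index.html",
        ExtraArgs={"ContentType": "text/html", "CacheControl": "no-cache"},
    )

    # Drop CloudFront's cached copy so users get the new build immediately.
    cloudfront.create_invalidation(
        DistributionId="E1234EXAMPLE",  # hypothetical distribution
        InvalidationBatch={
            "Paths": {"Quantity": 1, "Items": ["/index.html"]},
            "CallerReference": str(time.time()),  # must be unique per request
        },
    )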

To build and deploy its web application, the company used eight Jenkins machines running on Amazon Elastic Compute Cloud (Amazon EC2) instances. The Jenkins instances were spun up each morning and sat idle until there was a job. When a change was committed to GitHub, it would trigger Jenkins to build the JavaScript and upload it to Amazon S3. “This process worked well for a while,” says Kane. “But as our application code grew, builds began to take a long time.”

In addition, its web applications were deployed as a single monolithic build. A problem with one application would stop deployment of the entire build.

To improve build safety, Coursera broke up its monolithic web application into 50 individual applications containing different parts of the website that could be built and deployed separately. It also developed a new system to work with the 50 applications—however, this process also had issues.

“Any time we needed a build of the entire web application, we had to start 50 different Jenkins jobs to build all the individual applications,” says Kane. “However, we didn’t have sufficient capacity to run 50 jobs at once, so they would queue up and it would take a long time to go through them.”

The Jenkins instances were also used to run tests, backend builds, and other scheduled jobs. These jobs were delayed during the application build, causing frustration for backend developers. “We were looking for a solution that would allow us to run jobs in parallel—so we wouldn’t need to wait for a job to finish to run the next one,” says Kane.

How Coursera Manages Large-Scale ETL using AWS Data Pipeline and Dataduct

Coursera uses Amazon Redshift as its primary data warehouse because it provides a standard SQL interface and fast, reliable performance. The company uses AWS Data Pipeline to extract, transform, and load (ETL) data into the warehouse. Data Pipeline provides fault tolerance, scheduling, resource management, and an easy-to-extend API for ETL.

Dataduct is a Python-based framework built on top of Data Pipeline that lets users create custom, reusable components and patterns to share across multiple pipelines, which boosts developer productivity and simplifies ETL management. Coursera runs over 150 pipelines that pull data from 15 sources such as Amazon RDS, Cassandra, log streams, and third-party APIs, loading over 300 tables into Amazon Redshift every day and processing several terabytes of data. Subsequent pipelines push data back into Cassandra to power recommendations, search, and other data products.
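
To give a sense of what Dataduct drives under the hood, here is a minimal boto3 sketch that creates and activates a bare Data Pipeline; real ETL pipelines layer activities and data nodes on top of this. The names and schedule are illustrative, not Coursera's setup:

    import boto3

    dp = boto3.client("datapipeline")

    # Create an empty pipeline shell; uniqueId makes the call idempotent.
    created = dp.create_pipeline(name="etl-demo", uniqueId="etl-demo-v1")
    pipeline_id = created["pipelineId"]

    # Push a definition. Real pipelines would add activities (CopyActivity,
    # SqlActivity, ...) and data nodes; this sets only the default object.
    dp.put_pipeline_definition(
        pipelineId=pipeline_id,
        pipelineObjects=[{
            "id": "Default",
            "name": "Default",
            "fields": [
                {"key": "scheduleType", "stringValue": "ondemand"},
                {"key": "failureAndRerunMode", "stringValue": "CASCADE"},
            ],
        }],
    )

    # Activation hands scheduling, retries, and resource management
    # over to the service.
    dp.activate_pipeline(pipelineId=pipeline_id)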

[Image: the data flow at Coursera]

Using AWS CodeBuild to Build and Deploy JavaScript

Coursera chose to use AWS CodeBuild to build its JavaScript applications because it wanted a managed build service that could scale automatically and process multiple builds concurrently. “It took us less than two weeks to set up our containers to run on AWS CodeBuild,” says Kane. “Now we can run 50 or 60 jobs in parallel, and the build time is only the time it takes for the longest application to build.”
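
The fan-out Kane describes maps directly onto the CodeBuild API: start_build is asynchronous, so kicking off one build per application runs them all concurrently. A minimal sketch, assuming one CodeBuild project per application (the project names are hypothetical):

    import boto3

    codebuild = boto3.client("codebuild")

    # Hypothetical: one CodeBuild project per front-end application.
    apps = ["app-home", "app-catalog", "app-learn"]  # ... up to ~50 apps

    # start_build returns immediately, so the builds run in parallel and
    # total wall time is roughly that of the slowest application.
    build_ids = [codebuild.start_build(projectName=app)["build"]["id"] for app in apps]

    # Check on them later; batch_get_builds accepts up to 100 ids.
    for build in codebuild.batch_get_builds(ids=build_ids)["builds"]:
        print(build["id"], build["buildStatus"])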

The company also started using Amazon EC2 Container Service (Amazon ECS) to deploy its JavaScript—and then used AWS CodeBuild to automate this step as well. Along with building the JavaScript assets and uploading them to Amazon S3, AWS CodeBuild also creates a Docker container that includes the assets and uploads it to Amazon EC2 Container Registry (Amazon ECR). “Now, whenever we want to deploy an application update, we use a tool that works with Amazon ECS to spin up a new service with the container running the new code,” says Kane.
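
The article doesn't name the internal tool, but the deploy step it describes corresponds roughly to two standard ECS calls: register a task definition revision pointing at the new image in Amazon ECR, then point the service at that revision. A hedged sketch; the cluster, service, and image names are assumptions:

    import boto3

    ecs = boto3.client("ecs")

    # Register a new task definition revision referencing the image that
    # AWS CodeBuild just pushed to Amazon ECR.
    revision = ecs.register_task_definition(
        family="web-frontend",  # hypothetical family
        containerDefinitions=[{
            "name": "web",
            "image": "123456789012.dkr.ecr.us-east-1.amazonaws.com/web:release-123",  # new build
            "memory": 512,
            "essential": True,
        }],
    )

    # Roll the service over to the new revision; ECS replaces the running
    # containers without any manual host management.
    ecs.update_service(
        cluster="frontend-cluster",
        service="web-frontend",
        taskDefinition=revision["taskDefinition"]["taskDefinitionArn"],
    )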

As a further optimization, Coursera uses a custom build environment in its projects—and uses AWS CodeBuild for that also. Instead of using the standard Node.js container for its builds, it uses AWS CodeBuild to create a separate Docker container that includes the JavaScript dependencies as a cache. The container is pushed to Amazon ECR and referenced in the project configuration.

“To build an application, all we need to do is retrieve the Docker image from the container registry, run the scripts, and upload to Amazon S3,” says Kane. “We don’t need to wait for the JavaScript dependencies to download because they are already pre-warmed in the container cache.”


“Amazon ECS enabled Coursera to focus on releasing new software rather than spending time managing clusters.”
Frank Chen
Software Engineer, Coursera


Hope you liked this article!

Thanks for reading :)
