The Better (and Cheaper) Way to Analyze Video Streams in the Cloud.
TLDR: In this post, you will see how Chrysalis Cloud enables development of scalable, real-time computer vision products with a few lines of code.
You will learn how the Chrysalis Cloud SDK simplifies the process of building a simple real-time face-detection system running on a remote RTMP video stream. Normally, this is time- and cost-intensive because we have to manage the RTMP stream ourselves (latency, disconnections, threading).
To solve this, we’re going to use the Chrysalis Cloud SDK to manage the RTMP stream, which greatly simplifies the process and can save up to 70% compared to cloud services like Kinesis and Lambda.
Starting with the Basics
As a computer vision engineer, I frequently need the following process to happen:
I have an Internet-connected camera set up remotely, and I need to do some compute / augmentation of that stream, either in the cloud or on my computer. To simplify the discussion, let's talk about a simple use case such as face detection to register when a person's face is visible on the camera.
In theory, this should be a super easy process. Video. Face detection. Box. Done.
However - and this is the bizarre part - there is very little publicly available information on how to smoothly execute this process. If we Google “python read from webcam”, the first thing that you’ll find in the search results is a link to the OpenCV site where they do something like:
The above code does exactly what I want it to do as a computer vision practitioner. I call a method, and I get an image from the camera. I do some processing, and I display the image. Presto.
However, here’s the catch: this code only works if the webcam happens to be attached (via USB) to the computer I’m running the Python script on.
Things get a lot more complicated if the webcam I want to stream from is not attached to my computer. OpenCV provides an alternative way to call the same VideoCapture() method such that it captures from a remote RTMP stream. For example:
This actually works. I have a demo video set up on an RTMP server (more on this later) and if I run the above code, I get:
This certainly looks acceptable, but it’s incredibly fragile and impractical from a product-development perspective. Right now, the inner loop of the Python code shows the image as soon as read() returns, sleeps for only 1 ms (waitKey), and then goes right back to trying to get another frame.
This breaks if I stop aggressively servicing the video stream. For example, let’s say I need to do some minimal compute (500ms) per frame on the incoming data. We can simulate that by passing 500 to waitKey (now the function will wait 500ms after displaying each frame before requesting the next image):
And as you can see, things quickly fall apart.
So what happened?
Well, when you’re working with network video streams, you absolutely, positively, 100% have to keep up with that stream. That’s because most IP camera streams are transmitted as H264-encoded packet streams. In H264, each packet represents an image/frame, and each packet is highly dependent on the ordered packets that came before it. Miss a few frames and the H264 decoder freaks out and starts outputting garbage like the image above.
Basically, when I’m dealing with a remote H264 stream like this, I have to consume incoming frames at least as fast as they are generated by my remote IP camera, or I’m toast.
If I’m doing any sort of substantial video processing, I will not be able to perform my computations at the same rate the camera can produce frames. For example, a 20 frames-per-second (FPS) stream source means that I need to finish my per-frame computation in less than 50 milliseconds, or I risk the OpenCV decoder getting behind and experiencing errors like the ones above.
To solve this, I could spawn a new thread to keep decoding frames in the background while my main thread handles the computer vision, but that is both complicated and limiting. In the real world, I have to handle cases where the webcam stream disconnects, or speeds up, or slows down because of latency. It’s simply not practical.
Computer vision is difficult enough without having to deal with the complexities of stream management. If there is a problem in production, I don’t want to worry about “is it the CV or is it the stream management?”.
What I really want, and what is needed, is a method where I don’t even think about the stream, and where the system immediately returns the latest image to me in a way that makes sense to build into my product development process.
We built the Chrysalis Cloud platform to make that possible.
Chrysalis is a streaming and compute framework that enables you to deploy nodes (relays) which can accept and manage incoming video streams (i.e., from remote cameras). In addition to video, Chrysalis can handle any type of streaming data, including audio and IoT data.
The Chrysalis node manages the maintenance of the stream, the reconnection logic, and even maintains a buffer of recently received data which is always immediately available to my Python program with a single method call.
To show how simple this is, I’m going to walk through setting up a Chrysalis node and using it to stream data and do the type of face detection we talked about earlier.
First, I need to pick a computer to use as my relay. Generally, I create a virtual machine with my favorite cloud provider (like DigitalOcean). Let’s say I’ve done this and have a VM with IP address 101.102.103.104.
Step 1 of 3: Log onto VM and install Chrysalis service
That’s it! Chrysalis has lots of ways to “ingest” streaming video. The easiest is to use RTMP and to push video from the camera to the Chrysalis node. There is a pre-existing RTMP ingestion point that is installed with Chrysalis:
rtmp://101.102.103.104/live/mychryskey
For now, I’ll use this. I can configure OBS to stream a video on repeat to this endpoint as follows:
I can go into Settings in OBS and configure the program to stream to my Chrysalis endpoint. I’ll enter the server address from my cloud provider (here it’s 101.102.103.104). The default RTMP point that is set up for me when I install Chrysalis is rtmp://IPaddress/live, and the stream key is set to mychryskey. So I’ll update the streaming settings as follows:
Once I do this I can start streaming and I’m good to go!
Step 2 of 3: Install the Chrysalis SDK:
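Assuming the SDK is distributed on PyPI (the exact package name below is an assumption for illustration; check the Chrysalis docs for the real one):

```shell
# Package name assumed for illustration; consult the Chrysalis docs.
pip install chrysalis
```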
And that’s it. Next I’ll write a simple program to get images from Chrysalis just like it was a webcam attached to my computer.
Step 3 of 3: Fetch image data from Chrysalis
That’s it! I don’t have to care about the dynamics of the stream, or disconnects, or latencies. Chrysalis manages all of this for me in the background. All I need to do is call one method, VideoLatestImage(), to get the most recent image from the stream … whenever it is convenient for me to do so.
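Conceptually, the fetch loop reduces to polling a single call. A minimal sketch: only the VideoLatestImage() method is named here; the client object and helper below are illustrative assumptions, not the actual SDK API:

```python
# Hypothetical sketch: only the VideoLatestImage() method is documented
# here; the client object and this helper are illustrative assumptions.

def process_stream(client, handle_frame, max_frames=None):
    """Lazily poll a Chrysalis client for the most recent frame.

    The relay keeps up with the camera for us, so we can poll as slowly
    as our per-frame compute requires without corrupting the decoder.
    """
    count = 0
    while max_frames is None or count < max_frames:
        img = client.VideoLatestImage()  # latest buffered frame, or None
        if img is None:
            continue
        handle_frame(img)  # our computer-vision work happens here
        count += 1
    return count
```

In a real loop, handle_frame would run the face detector and display the result; because the node buffers frames on our behalf, a slow handler no longer breaks the stream.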
Note that this is with the 500ms delay between read frames (the same delay that was giving us the glitchy H264 problems before). The image looks clear now because the Chrysalis node is now keeping up with the camera stream for me. I can lazily fetch the latest image from Chrysalis whenever my computational schedule allows.
Finally, if I want to get fancy, I can do actual compute on this stream. For example, I can use dlib to recognize faces in the stream and draw boxes around them.
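A sketch of that face-box pass: dlib’s get_frontal_face_detector() is the real dlib API, while the helper and function names below are mine, chosen for illustration:

```python
def rects_to_boxes(rects):
    """Convert dlib rectangles (objects with left/top/right/bottom
    methods) into (x1, y1, x2, y2) tuples for cv2.rectangle."""
    return [(r.left(), r.top(), r.right(), r.bottom()) for r in rects]

def draw_faces(frame):
    """Run dlib's frontal-face detector on a BGR frame and draw green
    boxes around any detected faces."""
    # Imports kept local so the pure-Python helper above has no
    # third-party dependencies.
    import cv2
    import dlib

    detector = dlib.get_frontal_face_detector()
    rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)  # dlib expects RGB
    for (x1, y1, x2, y2) in rects_to_boxes(detector(rgb, 1)):
        cv2.rectangle(frame, (x1, y1), (x2, y2), (0, 255, 0), 2)
    return frame
```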
Key Lessons
In summary, here are all the streaming and stream management related activities that the Chrysalis Cloud automatically manages for you in a few lines of code:
- Easy streaming: Easily and securely stream video from a webcam or network-connected camera to a cloud relay server.
- Stream management: Maintain a reliable connection (and reconnection) to the video stream in the event of network issues and internet outages.
- Stream decoding: Decode images from video streams in the cloud (i.e., you don’t have to keep up with the video stream).
- Stream buffering: When you detect an event in an image, you can request access to the complete history of frames before and after that event.
- Local stream access: Securely access remote network-connected video streams on a local computer for computer vision development.
- Scaling: Instantly deploy your computer vision code to any number of video endpoints, without worrying about server provisioning or maintenance.
Can Chrysalis help you?
If it seems too good to be true, we’d love to tell you more. On top of how easy it is to use and deploy, Chrysalis is also substantially less expensive than a common setup like Amazon Kinesis plus Lambda: up to 70% cheaper, in fact. Click here to send us a note.