Fully Managed Lustre File Storage in the Cloud
Imagine 25,000 conversations directed at you in parallel, all of them waiting for a response. This is what storage systems face when customers train and deploy large language models (LLMs) today. Add multimodal training with pictures, video, audio, and other rich content to the mix, and the load gets 10x larger. Artificial intelligence (AI) and machine learning (ML) workloads routinely operate at this scale. GPUs process tens of petabytes (PB) of data in parallel at tens of terabits per second (Tbps) of throughput to enable the most complex models in the world. They need an extremely fast storage system that can access hundreds of thousands of files in parallel and feed them to hundreds of thousands of GPUs at high speed.
Today we are introducing Oracle Cloud Infrastructure (OCI) File Storage with Lustre to meet the performance demands of these workloads. Lustre is designed to deliver parallel I/O performance at scale and is widely used in large-scale LLM training and supercomputing projects.
OCI File Storage with Lustre is a fully managed service based on Lustre. It gives you the performance and scale benefits of Lustre, including metadata latency measured in milliseconds, capacity that scales to petabytes, and throughput of terabytes per second, while eliminating the complexity of managing it yourself. As a fully managed service, OCI automates file system deployment, scaling, and maintenance. Further, because the service is built on OCI’s leading Block Storage Service, you can expect the same enterprise-class availability and durability as an enterprise application running on the Block Storage Service.
A Lustre file system can be accessed in parallel by thousands of clients. OCI File Storage with Lustre is integrated with Oracle Kubernetes Engine (OKE) and can be deployed in GPU host, bare metal, or virtualized environments.