Oracle Cloud Infrastructure Is Selected by NVIDIA for AI Services

Coleda Bureau
March 23, 2023

OCI Supercluster™ now scales up to 4,096 bare metal compute instances with 32,768 GPUs, a configuration described as larger than the world’s biggest supercomputer.

Oracle has expanded its partnership with NVIDIA to run key NVIDIA AI applications on the new Oracle Cloud Infrastructure (OCI) Supercluster™. NVIDIA has chosen OCI as the first hyperscale cloud provider to deliver NVIDIA DGX Cloud™, an AI supercomputing service, at very large scale. NVIDIA is also hosting NVIDIA AI Foundations, its new generative AI cloud services, on OCI. These services are accessible through DGX Cloud.

“OCI is the first platform to offer an AI supercomputer at scale to thousands of customers across every industry. This is a critical capability as more and more organizations require computing resources for their unique AI use cases. To support this demand, we continue to expand our work with NVIDIA,” said Clay Magouyrk, executive vice president, Oracle Cloud Infrastructure.

“The limitless opportunities for AI-driven innovation are helping transform virtually every business. NVIDIA’s collaboration with Oracle Cloud Infrastructure puts the extraordinary supercomputing performance of NVIDIA’s accelerated computing platform within reach of every enterprise,” said Manuvir Das, vice president of enterprise computing, NVIDIA.

OCI’s New Supercluster

OCI’s Supercluster comprises OCI Compute Bare Metal, an ultra-low latency RoCE cluster based on NVIDIA networking, and a choice of HPC storage. NVIDIA has deployed and tested OCI Compute Bare Metal instances that can efficiently handle massively parallel workloads. OCI Supercluster networking now scales up to 4,096 OCI Compute Bare Metal instances with a total of 32,768 A100 GPUs. A limited number of OCI Compute bare metal instances with NVIDIA H100 GPUs are also currently available.
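
As a quick check on the figures above, the two totals imply eight GPUs per bare metal instance. The short Python sketch below does that arithmetic. The constant and function names are illustrative only and are not part of any OCI or NVIDIA API.

```python
# Figures quoted in the article; all names here are illustrative only.
MAX_INSTANCES = 4_096   # OCI Compute Bare Metal instances per Supercluster
MAX_GPUS = 32_768       # A100 GPUs across those instances


def gpus_per_instance(total_gpus: int, instances: int) -> int:
    """Return the GPU count per instance, requiring an even split."""
    per_instance, remainder = divmod(total_gpus, instances)
    if remainder:
        raise ValueError("GPUs do not divide evenly across instances")
    return per_instance


if __name__ == "__main__":
    print(gpus_per_instance(MAX_GPUS, MAX_INSTANCES))  # prints 8
```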

NVIDIA also said that Oracle will upgrade its networking stack with NVIDIA BlueField-3 DPUs.

Generative AI Services for Building Custom Enterprise Models

NVIDIA AI Foundations offers modeling services for language, images, video, 3D, and biology. Using the NVIDIA NeMo™ language service and the NVIDIA Picasso image, video, and 3D services, businesses can build specialized, domain-specific generative AI applications for intelligent conversation and customer support, professional content creation, digital simulation, and more. The NVIDIA BioNeMo™ cloud service provides tools to quickly configure and deploy generative AI applications for biology AI model training and inference.

On OCI, custom models built with NVIDIA AI Foundations, as well as model families like GPT-3, benefit from the OCI Supercluster. It includes purpose-built RDMA networking that delivers near-line-rate performance with microsecond latency and eliminates blocking issues for RDMA-dependent workloads.