Serverless GPUs

Run machine learning inference at scale

Priced by the hour, billed by the minute — so you only pay for what you use.



HOW IT WORKS

Launch ready
Always available, pre-configured NVIDIA GPU clusters and ML frameworks
Safe & secure
Complete isolation via a separate control plane to keep your data private
Easily automated
Turnkey autoscaling, fully managed and load balanced
Cost saving
Priced by the hour, billed by the minute, so you pay only for what you use

WHY SERVERLESS KUBERNETES

Pre-configured by experts to streamline ambitious builds

SPEED
5s or less to start up and go
SCALE
1000+ GPUs to build with

Discover our latest serverless clusters

NVIDIA H100 SXM
80GB VRAM | 3.3TB/s bandwidth
For top-tier performance
NVIDIA H100 PCIe
80GB VRAM | 2TB/s bandwidth
Perfect for large-scale AI and HPC workloads
NVIDIA L40S
48GB VRAM | 864GB/s bandwidth
For cost-effective, performant inference
NVIDIA L4
24GB VRAM
For cost-efficient AI compute

Why developers love Ori

Vinay Maniam
Founding Engineer, nCompass

Serverless Kubernetes

GPU resources

RESOURCE	PRICE ($ PER MINUTE)	PRICE ($ PER HOUR)
NVIDIA H200 141GB	0.07875	4.73
NVIDIA H100 80GB SXM	0.06525	3.92
NVIDIA H100 80GB PCIe	0.06525	3.92
NVIDIA L40S 48GB	0.03488	2.09
NVIDIA L4 24GB	0.019044	1.14
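As a rough illustration of how per-minute billing compares with the hourly list price, the sketch below prices a GPU run from the per-minute rates in the GPU table. It assumes that any partial minute is rounded up to a whole minute. The page does not state its rounding rule, so that behaviour is hypothetical, and so are the helper names `billed_minutes`, `gpu_cost` and `hourly_rate`. `Decimal` keeps sub-cent rates free of floating-point error.

```python
import math
from decimal import Decimal, ROUND_HALF_UP

# Per-minute GPU rates (USD), copied from the GPU resources table.
GPU_RATE_PER_MIN = {
    "H200 141GB": Decimal("0.07875"),
    "H100 80GB SXM": Decimal("0.06525"),
    "H100 80GB PCIe": Decimal("0.06525"),
    "L40S 48GB": Decimal("0.03488"),
    "L4 24GB": Decimal("0.019044"),
}


def billed_minutes(runtime_seconds: float) -> int:
    """Whole minutes charged for a run.

    Assumption (not stated on the page): a partial minute is rounded up.
    """
    return math.ceil(runtime_seconds / 60)


def gpu_cost(gpu: str, runtime_seconds: float, count: int = 1) -> Decimal:
    """USD charge for `count` GPUs of type `gpu` running for `runtime_seconds`."""
    return GPU_RATE_PER_MIN[gpu] * billed_minutes(runtime_seconds) * count


def hourly_rate(gpu: str) -> Decimal:
    """Hourly price: 60 x the per-minute rate, rounded half-up to the cent."""
    return (GPU_RATE_PER_MIN[gpu] * 60).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


if __name__ == "__main__":
    # A 10 min 30 s job on one H100 SXM is billed as 11 minutes.
    print(billed_minutes(630), gpu_cost("H100 80GB SXM", 630))
```

Under that assumption, an H100 SXM job lasting 10 minutes 30 seconds is billed as 11 minutes, or $0.71775, rather than a full hour at $3.92. The hourly column matches the per-minute rates multiplied by 60 and rounded half-up to the cent (0.07875 × 60 = 4.725, shown as 4.73), which is why `hourly_rate` uses `ROUND_HALF_UP` rather than the `decimal` module's default banker's rounding.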

Other resources

RESOURCE	PRICE ($ PER MINUTE)	PRICE ($ PER HOUR)
Memory (per MB)	0.00000008	0.000005
CPU (per 1/100 CPU)	0.00000167	0.0001
Load Balancer	0.0003015	0.01809
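Because GPU and non-GPU resources are priced separately, a workload's cost is presumably the sum of the resources it requests. The sketch below adds them up per minute for a hypothetical deployment. The `Deployment` class and `cost_per_minute` function are illustrative only, and two readings of the table are assumptions: that the CPU rate is charged per hundredth of a CPU, and that the memory rate applies per MB allocated.

```python
from dataclasses import dataclass
from decimal import Decimal

# Per-minute rates (USD), copied from the two pricing tables.
GPU_RATE_PER_MIN = {
    "H200 141GB": Decimal("0.07875"),
    "H100 80GB SXM": Decimal("0.06525"),
    "H100 80GB PCIe": Decimal("0.06525"),
    "L40S 48GB": Decimal("0.03488"),
    "L4 24GB": Decimal("0.019044"),
}
CPU_RATE_PER_MIN = Decimal("0.00000167")       # assumed: per 1/100 of a CPU
MEMORY_RATE_PER_MIN = Decimal("0.00000008")    # assumed: per MB allocated
LOAD_BALANCER_RATE_PER_MIN = Decimal("0.0003015")


@dataclass
class Deployment:
    """Hypothetical resource request for one serverless workload."""
    gpu: str
    gpu_count: int
    cpu_cores: Decimal
    memory_mb: int
    load_balancer: bool = True


def cost_per_minute(d: Deployment) -> Decimal:
    """Sum the per-minute charge of every resource the deployment requests."""
    cpu_units = d.cpu_cores * 100  # assumption: 1 billing unit = 1/100 of a CPU
    total = GPU_RATE_PER_MIN[d.gpu] * d.gpu_count
    total += CPU_RATE_PER_MIN * cpu_units
    total += MEMORY_RATE_PER_MIN * d.memory_mb
    if d.load_balancer:
        total += LOAD_BALANCER_RATE_PER_MIN
    return total


if __name__ == "__main__":
    d = Deployment(gpu="L40S 48GB", gpu_count=1, cpu_cores=Decimal("8"), memory_mb=32768)
    per_min = cost_per_minute(d)
    print(per_min, per_min * 60)
```

For one L40S with 8 CPUs, 32,768 MB of memory and a load balancer, this comes to about $0.0391 per minute (roughly $2.35 per hour), with the GPU accounting for nearly 90% of the total.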


CUSTOM PRICING

Need a custom cloud, or pricing for a large-scale project? Let's talk.


Chart your own AI reality