Ori GPU Instances: Virtual Machines Designed for the Realities of AI

AI workloads have outgrown the assumptions that shaped traditional virtual machines (VMs). As workloads become bursty and cost-sensitive, AI builders need higher utilisation, not static over-provisioning. Yet most hyperscale cloud GPU VMs still rely on hour-based billing, slow provisioning, and coarse allocation models that were never designed for AI at scale.
Ori GPU Instances take a different approach. They are GPU-based virtual machines purpose-built for AI and HPC workloads, engineered to deliver near bare-metal performance while introducing the flexibility, speed, and cost efficiency modern AI teams expect.
Virtual Machines, Re-Engineered for AI
Ori Virtual Machines are not a repackaged general-purpose VM service. They are designed from the ground up to align with GPU-centric workloads and the way AI teams actually work.
Each VM supports:
- Fractional GPU allocation, including technologies such as MIG, enabling precise right-sizing (see the sketch after this list)
- Minute-based usage and billing, rather than coarse hourly blocks
- One-click suspend and resume, allowing teams to pause workloads without tearing environments down
- An accelerated lifecycle, with provisioning and de-provisioning typically under 2 minutes
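To see what fractional allocation looks like from inside a VM: a MIG slice is exposed to frameworks such as PyTorch as an ordinary CUDA device, just with a slice-sized memory and compute budget. A minimal sketch, assuming PyTorch is installed and a full or fractional GPU is attached:

```python
import torch

# A MIG slice (e.g. 1/8 of an H100) appears to frameworks as a normal
# CUDA device; only its memory and compute budget are smaller.
if torch.cuda.is_available():
    props = torch.cuda.get_device_properties(0)
    print(f"Device: {props.name}")
    print(f"Memory: {props.total_memory / 1024**3:.1f} GiB")
    print(f"Multiprocessors: {props.multi_processor_count}")
else:
    print("No CUDA device visible to this VM.")
```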
Crucially, these capabilities are delivered without sacrificing performance. Ori VMs are tuned specifically for AI workloads, achieving hardware-aligned performance with minimal degradation from bare metal.
Near Bare-Metal Performance, With VM Flexibility
Virtualisation has long been seen as a trade-off: flexibility at the cost of performance. Ori challenges that assumption.
Internal benchmarks from Ori show:
- VM creation and termination consistently under 2 minutes
- Less than 5% performance difference from bare metal on key AI training workloads
This is achieved by tightly aligning VM abstractions with the underlying GPU hardware and avoiding unnecessary layers that introduce latency or overhead. The result is a VM that behaves predictably for both training and inference, while still delivering the operational benefits of virtual machines.
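The sub-5% figure comes from Ori's internal benchmarks, but the methodology is straightforward to reproduce: time an identical GPU workload on a VM and on bare metal, then compare. A minimal PyTorch timing sketch; the matrix size and iteration count here are illustrative, not Ori's benchmark suite:

```python
import torch

def time_matmul(size: int = 8192, iters: int = 50) -> float:
    """Return average milliseconds per size x size matmul on the GPU."""
    a = torch.randn(size, size, device="cuda", dtype=torch.float16)
    b = torch.randn(size, size, device="cuda", dtype=torch.float16)
    # Warm up so context creation and kernel selection are excluded.
    for _ in range(5):
        a @ b
    torch.cuda.synchronize()
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    start.record()
    for _ in range(iters):
        a @ b
    end.record()
    torch.cuda.synchronize()
    return start.elapsed_time(end) / iters

if __name__ == "__main__":
    # Run the same script on bare metal and on the VM; the ratio of the
    # two averages approximates the virtualisation overhead.
    print(f"Average matmul time: {time_matmul():.2f} ms")
```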
Built for Cost-Efficient AI Workloads
AI infrastructure costs are rarely driven by peak usage alone. Idle GPUs, oversized instances, and long billing increments quietly erode efficiency.
Ori Virtual Machines address this directly:
- Per-minute billing ensures teams only pay for what they use (see the cost sketch below)
- Fractional GPU support prevents over-provisioning for small or exploratory workloads
- Suspend and resume allows environments to be paused without losing state
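As a back-of-the-envelope illustration of the billing point above, consider a hypothetical 75-minute fine-tuning run at Ori's $3.50/GPU/hr H200 rate (cited later in this post), where an hourly-billed provider rounds the job up to whole hours:

```python
import math

HOURLY_RATE = 3.50   # $/GPU/hr, Ori's on-demand H200 rate (see footnote later)
JOB_MINUTES = 75     # a hypothetical fine-tuning run

# Per-minute billing: pay for exactly the minutes used.
per_minute_cost = (HOURLY_RATE / 60) * JOB_MINUTES

# Hourly billing: the same 75-minute job is rounded up to 2 full hours.
per_hour_cost = HOURLY_RATE * math.ceil(JOB_MINUTES / 60)

print(f"Per-minute billing: ${per_minute_cost:.2f}")  # $4.38
print(f"Hourly billing:     ${per_hour_cost:.2f}")    # $7.00
```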
Suspend, Resume, and Move at the Speed of Development
GPU Instances on Ori are designed for fast iteration. With one-click suspend and resume, teams can pause workloads when they are not actively computing and resume them instantly when needed, without rebuilding environments or paying for idle GPUs.
Provisioning and de-provisioning are equally fast. VM creation and termination consistently complete in under two minutes, enabling developers to scale from a single experiment to a full-node training job without friction.
This accelerated lifecycle dramatically improves developer experience while enabling more aggressive, cost-efficient experimentation.
Intelligent Scheduling at Cluster Scale
What truly differentiates Ori Virtual Machines is not just what happens inside a single VM, but how VMs operate as part of a larger AI-native platform.
Ori VMs are orchestrated by the Ori Cluster OS, which applies intelligent, cluster-wide scheduling to:
- Ensure optimal placement of GPU workloads
- Maximise utilisation across fractional and full-GPU instances
- Balance competing workloads without manual intervention
Where hyperscalers often rely on fragmented scheduling layers and static allocation models, Ori treats GPU VMs as cluster-aware entities, optimising utilisation across the entire environment, not just individual instances.
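The scheduler itself is internal to the Cluster OS, but a toy first-fit packing sketch conveys why cluster-wide placement of mixed fractional and full-GPU requests raises utilisation. This is purely illustrative and not Ori's algorithm:

```python
def first_fit(requests: list[float]) -> list[list[float]]:
    """Pack fractional GPU requests (0.25 = a quarter slice) onto
    whole GPUs with a first-fit-decreasing heuristic."""
    gpus: list[list[float]] = []
    for req in sorted(requests, reverse=True):
        for gpu in gpus:
            if sum(gpu) + req <= 1.0:
                gpu.append(req)
                break
        else:
            gpus.append([req])  # no GPU has room; open a new one
    return gpus

# A mixed workload: two full-GPU jobs plus six small experiments.
demand = [1.0, 1.0, 0.5, 0.25, 0.25, 0.125, 0.125, 0.125]
print(f"GPUs with fractional packing: {len(first_fit(demand))}")  # 4
print(f"GPUs if each job gets a whole GPU: {len(demand)}")        # 8
```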
Who Ori Virtual Machines Are For
Ori Virtual Machines are designed for teams that need performance, flexibility, and efficiency—without compromise:
- AI researchers and data scientists experimenting with new models
- Startups iterating rapidly on training and inference pipelines
- Enterprises running short-lived training or inference jobs in production
- Sovereign partners running proof-of-concept (PoC) private or sovereign AI clouds
Whether you’re running early experiments or scaling production workloads, Ori VMs adapt to your needs.
A Clear Alternative to Legacy GPU VMs
Ori GPU Instances combine AI-optimised performance, minute-level billing, fractional GPU allocation, one-click suspension, and sub-2-minute provisioning: a level of flexibility that major clouds typically lack or only offer in narrow configurations.
Here is a snapshot of the benefits you can expect by switching to Ori:
| Capability | Ori GPU Instances | Legacy Cloud Providers |
|---|---|---|
| Billing granularity | Per-minute | Per-hour |
| Provisioning times | Under 2 minutes | 10-15 minutes |
| Fractional GPUs | Yes | No |
| Suspend and resume | Users can suspend and resume any time | Provider-dependent and often requires Kubernetes |
| Pricing | 60% more cost-efficient than traditional hyperscalers* | Expensive instances with high on-demand pricing |
* NVIDIA H200 instances cost $3.50/GPU/hr on Ori. In comparison, AWS EC2 instances with H200 GPUs (p5en.48xlarge) cost $8.70/GPU/hr as of January 6, 2026.
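That works out to (8.7 - 3.5) / 8.7 ≈ 60% lower cost per GPU-hour, which is the basis for the cost-efficiency row in the table above.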
As AI infrastructure continues to evolve, Ori Virtual Machines represent a shift away from legacy VM assumptions toward a modern, AI-native compute primitive: one built for speed, efficiency, and performance at scale.
How to run GPU Instances on Ori
Step 1: Head to the Ori cloud console and click on Virtual Machines.
Step 2: Pick a GPU of your choice. Ori offers a wide range of GPUs, including the NVIDIA A16, A40, A100, V100, V100S, L4, L40S, H100, H200, and more. GPU Instances are available in configurations of 1, 2, 4, or 8 GPUs, with fractional instances (1/24, 1/16, 1/8, 1/4, etc.) available for many GPU models.

Step 3: Choose the number of CPU cores, system memory and system storage for the virtual machine.
Step 4: Deploy your virtual machine in a location of your choice.
Step 5: Choose Debian or Ubuntu as the OS for the VM image. Use the init script to pre-install NVIDIA CUDA drivers, frameworks such as PyTorch or TensorFlow, and Jupyter notebooks.
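Once the init script has run, a quick sanity check from inside the VM confirms the stack is ready. This assumes the script installed PyTorch; use the equivalent TensorFlow checks otherwise:

```python
import torch

print(f"PyTorch version: {torch.__version__}")
print(f"CUDA available:  {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"CUDA version:    {torch.version.cuda}")
    print(f"GPU:             {torch.cuda.get_device_name(0)}")
```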

Step 6: For enhanced network security you can add the virtual machine to a Virtual Private Cloud (VPC).
Step 7: Add your public SSH key, give the VM an appropriate name and hit launch.
The Takeaway
GPU Instances on Ori redefine what virtual machines can be for AI:
- Near bare-metal performance for all types of AI workloads
- Fractional GPUs and minute-based billing for efficient economics
- Sub-2-minute provisioning for rapid iteration
- Intelligent scheduling for cluster-wide efficiency
For teams building, scaling, or operating modern AI workloads, Ori GPU Instances offer a rare combination: the flexibility of virtualisation without the usual performance penalty.
