Ori GPU Instances: Virtual Machines Designed for the Realities of AI

AI workloads have outgrown the assumptions that shaped traditional virtual machines (VMs). As workloads become bursty and cost-sensitive, AI builders need higher utilisation, not static over-provisioning. Yet most hyperscale cloud GPU VMs still rely on hour-based billing, slow provisioning, and coarse allocation models that were never designed for AI at scale.
Ori GPU Instances take a different approach. They are GPU-based virtual machines purpose-built for AI and HPC workloads, engineered to deliver near bare-metal performance while introducing the flexibility, speed, and cost efficiency modern AI teams expect.
Virtual Machines, Re-Engineered for AI
Ori Virtual Machines are not a repackaged general-purpose VM service. They are designed from the ground up to align with GPU-centric workloads and the way AI teams actually work.
Each VM supports:
- Fractional GPU allocation, including technologies such as MIG, enabling precise right-sizing (see the sketch after this list)
- Minute-based usage and billing, rather than coarse hourly blocks
- One-click suspend and resume, allowing teams to pause workloads without tearing environments down
- An accelerated lifecycle, with provisioning and de-provisioning typically under 2 minutes
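To see what fractional allocation looks like from inside a VM: a MIG slice is exposed to frameworks such as PyTorch as an ordinary CUDA device, just with a slice-sized memory and compute budget. A minimal sketch, assuming PyTorch is installed and a full or fractional GPU is attached:

```python
import torch

# A MIG slice (e.g. 1/8 of an H100) appears to frameworks as a normal
# CUDA device; only its memory and compute budget are smaller.
if torch.cuda.is_available():
    props = torch.cuda.get_device_properties(0)
    print(f"Device: {props.name}")
    print(f"Memory: {props.total_memory / 1024**3:.1f} GiB")
    print(f"Multiprocessors: {props.multi_processor_count}")
else:
    print("No CUDA device visible to this VM.")
```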
Crucially, these capabilities are delivered without sacrificing performance. Ori VMs are tuned specifically for AI workloads, achieving hardware-aligned performance with minimal degradation from bare metal.
Near Bare-Metal Performance, With VM Flexibility
Virtualisation has long been seen as a trade-off: flexibility at the cost of performance. Ori challenges that assumption.
Internal benchmarks from Ori show:
- VM creation and termination consistently under 2 minutes
- Less than 5% performance difference from bare metal on key AI training workloads
This is achieved by tightly aligning VM abstractions with the underlying GPU hardware and avoiding unnecessary layers that introduce latency or overhead. The result is a VM that behaves predictably for both training and inference, while still delivering the operational benefits of virtual machines.
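The sub-5% figure comes from Ori's internal benchmarks, but the methodology is straightforward to reproduce: time an identical GPU workload on a VM and on bare metal, then compare. A minimal PyTorch timing sketch; the matrix size and iteration count here are illustrative, not Ori's benchmark suite:

```python
import torch

def time_matmul(size: int = 8192, iters: int = 50) -> float:
    """Return average milliseconds per size x size matmul on the GPU."""
    a = torch.randn(size, size, device="cuda", dtype=torch.float16)
    b = torch.randn(size, size, device="cuda", dtype=torch.float16)
    # Warm up so context creation and kernel selection are excluded.
    for _ in range(5):
        a @ b
    torch.cuda.synchronize()
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    start.record()
    for _ in range(iters):
        a @ b
    end.record()
    torch.cuda.synchronize()
    return start.elapsed_time(end) / iters

if __name__ == "__main__":
    # Run the same script on bare metal and on the VM; the ratio of the
    # two averages approximates the virtualisation overhead.
    print(f"Average matmul time: {time_matmul():.2f} ms")
```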
Built for Cost-Efficient AI Workloads
AI infrastructure costs are rarely driven by peak usage alone. Idle GPUs, oversized instances, and long billing increments quietly erode efficiency.
Ori Virtual Machines address this directly:
- Per-minute billing ensures teams only pay for what they use (see the cost sketch below)
- Fractional GPU support prevents over-provisioning for small or exploratory workloads
- Suspend and resume allows environments to be paused without losing state
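As a back-of-the-envelope illustration of the billing point above, consider a hypothetical 75-minute fine-tuning run at Ori's $3.50/GPU/hr H200 rate (cited later in this post), where an hourly-billed provider rounds the job up to whole hours:

```python
import math

HOURLY_RATE = 3.50   # $/GPU/hr, Ori's on-demand H200 rate (see footnote later)
JOB_MINUTES = 75     # a hypothetical fine-tuning run

# Per-minute billing: pay for exactly the minutes used.
per_minute_cost = (HOURLY_RATE / 60) * JOB_MINUTES

# Hourly billing: the same 75-minute job is rounded up to 2 full hours.
per_hour_cost = HOURLY_RATE * math.ceil(JOB_MINUTES / 60)

print(f"Per-minute billing: ${per_minute_cost:.2f}")  # $4.38
print(f"Hourly billing:     ${per_hour_cost:.2f}")    # $7.00
```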
Suspend, Resume, and Move at the Speed of Development
GPU Instances on Ori are designed for fast iteration. With one-click suspend and resume, teams can pause workloads when they are not actively computing and resume them instantly when needed, without rebuilding environments or paying for idle GPUs.
Provisioning and de-provisioning are equally fast. VM creation and termination consistently complete in under two minutes, enabling developers to scale from a single experiment to a full-node training job without friction.
This accelerated lifecycle dramatically improves developer experience while enabling more aggressive, cost-efficient experimentation.
Intelligent Scheduling at Cluster Scale
What truly differentiates Ori Virtual Machines is not just what happens inside a single VM, but how VMs operate as part of a larger AI-native platform.
Ori VMs are orchestrated by the Ori Cluster OS, which applies intelligent, cluster-wide scheduling to:
- Ensure optimal placement of GPU workloads
- Maximise utilisation across fractional and full-GPU instances
- Balance competing workloads without manual intervention
Where hyperscalers often rely on fragmented scheduling layers and static allocation models, Ori treats GPU VMs as cluster-aware entities, optimising utilisation across the entire environment, not just individual instances.
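The scheduler itself is internal to the Cluster OS, but a toy first-fit packing sketch conveys why cluster-wide placement of mixed fractional and full-GPU requests raises utilisation. This is purely illustrative and not Ori's algorithm:

```python
def first_fit(requests: list[float]) -> list[list[float]]:
    """Pack fractional GPU requests (0.25 = a quarter slice) onto
    whole GPUs with a first-fit-decreasing heuristic."""
    gpus: list[list[float]] = []
    for req in sorted(requests, reverse=True):
        for gpu in gpus:
            if sum(gpu) + req <= 1.0:
                gpu.append(req)
                break
        else:
            gpus.append([req])  # no GPU has room; open a new one
    return gpus

# A mixed workload: two full-GPU jobs plus six small experiments.
demand = [1.0, 1.0, 0.5, 0.25, 0.25, 0.125, 0.125, 0.125]
print(f"GPUs with fractional packing: {len(first_fit(demand))}")  # 4
print(f"GPUs if each job gets a whole GPU: {len(demand)}")        # 8
```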
Who Ori Virtual Machines Are For
Ori Virtual Machines are designed for teams that need performance, flexibility, and efficiency—without compromise:
- AI researchers and data scientists experimenting with new models
- Startups iterating rapidly on training and inference pipelines
- Enterprises running short-lived training or inference jobs in production
- Sovereign partners running proof-of-concept (PoC) private or sovereign AI clouds
Whether you’re running early experiments or scaling production workloads, Ori VMs adapt to your needs.
A Clear Alternative to Legacy GPU VMs
Ori GPU Instances combine AI-optimised performance, minute-level billing, fractional GPU allocation, one-click suspension, and sub-2-minute provisioning: a level of flexibility that major clouds typically lack or only offer in narrow configurations.
Here is a snapshot of the benefits you can expect by switching to Ori:
| Capability | Ori GPU Instances | Legacy Cloud Providers |
|---|---|---|
| Billing granularity | Per-minute | Per-hour |
| Provisioning times | Under 2 minutes | 10-15 minutes |
| Fractional GPUs | Yes | No |
| Suspend and resume | Users can suspend and resume any time | Provider-dependent and often requires Kubernetes |
| Pricing | 60% more cost-efficient than traditional hyperscalers* | Expensive instances with high on-demand pricing |
* NVIDIA H200 instances cost $3.50/GPU/hr on Ori. In comparison, AWS EC2 instances with H200 GPUs (p5en.48xlarge) cost $8.70/GPU/hr as of January 6, 2026.
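That works out to (8.7 - 3.5) / 8.7 ≈ 60% lower cost per GPU-hour, which is the basis for the cost-efficiency row in the table above.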
As AI infrastructure continues to evolve, Ori Virtual Machines represent a shift away from legacy VM assumptions toward a modern, AI-native compute primitive: one built for speed, efficiency, and performance at scale.
How to run GPU Instances on Ori
Step 1: Head to the Ori cloud console and click on Virtual Machines.
Step 2: Pick a GPU of your choice. Ori offers a wide range of GPUs, including the NVIDIA A16, A40, A100, V100, V100S, L4, L40S, H100, H200, and more. GPU Instances are available in configurations of 1, 2, 4, or 8 GPUs, with fractional instances (1/24, 1/16, 1/8, 1/4, etc.) available for many GPU models.

Step 3: Choose the number of CPU cores, system memory and system storage for the virtual machine.
Step 4: Deploy your virtual machine in a location of your choice.
Step 5: Choose Debian or Ubuntu as the OS for the VM image. Use the init script to pre-install NVIDIA CUDA drivers, frameworks such as PyTorch or TensorFlow, and Jupyter notebooks.
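Once the init script has run, a quick sanity check from inside the VM confirms the stack is ready. This assumes the script installed PyTorch; use the equivalent TensorFlow checks otherwise:

```python
import torch

print(f"PyTorch version: {torch.__version__}")
print(f"CUDA available:  {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"CUDA version:    {torch.version.cuda}")
    print(f"GPU:             {torch.cuda.get_device_name(0)}")
```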

Step 6: For enhanced network security you can add the virtual machine to a Virtual Private Cloud (VPC).
Step 7: Add your public SSH key, give the VM an appropriate name and hit launch.
The Takeaway
GPU Instances on Ori redefine what virtual machines can be for AI:
- Near bare-metal performance for all types of AI workloads
- Fractional GPUs and minute-based billing for efficient economics
- Sub-2-minute provisioning for rapid iteration
- Intelligent scheduling for cluster-wide efficiency
For teams building, scaling, or operating modern AI workloads, Ori GPU Instances offer a rare combination: the flexibility of virtualisation without the usual performance penalty.
