Tutorials

How to run Magistral Small on a cloud GPU

Deepak Manoor
Posted: June 27, 2025

    Mistral AI has launched Magistral, its first series of reasoning models, available in two versions: Magistral Small (open-source) and Magistral Medium (enterprise-grade, access via API and Mistral’s Le Chat). These models are based on a transformer architecture fine-tuned through Mistral’s proprietary Reinforcement Learning from Verifiable Rewards (RLVR) framework, which replaces external critics with a generator–verifier setup. This approach yields transparent, step-by-step “chain‑of‑thought” reasoning at scale.

    Here’s a brief overview of Magistral Small’s specifications:

    Magistral Small
    Architecture: Transformer fine-tuned with Reinforcement Learning from Verifiable Rewards (RLVR), using Group Relative Policy Optimization (GRPO) as the RL algorithm
    Parameters: 24 billion
    Context window: 128k tokens maximum, 40.9k tokens recommended
    Licensing: Apache 2.0 (commercial and research use)

    Magistral Small’s benchmarks demonstrate strong overall performance, exceeding Llama 4 but trailing DeepSeek R1 and Qwen 3 series of models.

    Model                 | AIME24 | AIME25 | GPQA Diamond | LiveCodeBench
    Magistral Small       | 70.68  | 62.76  | 68.18        | 55.84
    Qwen 3 32B (Dense)    | 81.4   | 72.9   | N/A          | 65.7
    Qwen 3 30B A3B (MoE)  | 80.4   | 70.9   | 65.8         | 62.6
    DeepSeek R1           | 79.8   | 70     | 71.5         | 65.9
    DeepSeek V3           | 39.2   | 28.8   | 59.1         | 36.2
    Llama 4 Maverick      | N/A    | N/A    | 69.8         | 43.4
    Llama 4 Scout         | N/A    | N/A    | 57.2         | 32.8

    Source: Llama 4, Qwen 3, Magistral & DeepSeek

    How to run Magistral Small with Ollama

    Pre-requisites

    Create a GPU virtual machine (VM) on Ori Global Cloud. We chose a setup with an NVIDIA L40S GPU and Ubuntu 22.04 as the OS, since we ran the Q8_0 quantized version of the model; if you choose the FP16 version, you will likely need an H100 GPU.
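A back-of-envelope estimate explains the GPU choice above. The byte-per-parameter figures are approximations (Q8_0 stores roughly one byte per weight plus a small scaling overhead; FP16 stores two), and the estimate covers weights only, so activations and KV cache need extra headroom on top:

```python
# Rough VRAM needed just to hold the model weights.
# Activations and the KV cache require additional headroom.

def weight_memory_gb(params_billions: float, bytes_per_param: float) -> float:
    """Approximate weight memory in GiB for a given parameter count."""
    return params_billions * 1e9 * bytes_per_param / 1024**3

q8 = weight_memory_gb(24, 1.0)    # ~22 GB -> fits a 48 GB L40S
fp16 = weight_memory_gb(24, 2.0)  # ~45 GB -> calls for an 80 GB H100
print(f"Q8_0: ~{q8:.0f} GB, FP16: ~{fp16:.0f} GB")
```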

    Quick Tip

    Use the init script when creating the VM so NVIDIA CUDA drivers, frameworks such as PyTorch and TensorFlow, and Jupyter notebooks are preinstalled for you.

    Step 1: SSH into your VM, install Python and create a virtual environment

    apt install python3.11-venv
    python3.11 -m venv mistral-env

    Step 2: Activate the virtual environment

    source mistral-env/bin/activate

    Step 3: Install Ollama

    curl -fsSL https://ollama.com/install.sh | sh

    Step 4: Run Magistral 24B Small (Quantized Q8_0)

    ollama run magistral:24b-small-2506-q8_0
    /set verbose

    Step 5: Install Open WebUI on the VM in another terminal window and run it

    pip install open-webui
    open-webui serve

    Step 6: Access Open WebUI in your browser on the default port 8080.

    http://<VM-IP>:8080/

    Click “Get Started” to create an Open WebUI account if you haven’t set one up on this virtual machine before.

    Step 7: Choose magistral:24b-small-2506-q8_0 from the Models drop-down and chat away!
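If you'd rather query the model programmatically than through the UI, Ollama also exposes a local REST API (port 11434 by default). A minimal sketch, assuming the `ollama serve` process from Step 4 is running on the VM; the network call is commented out so the payload construction can be shown on its own:

```python
import json
import urllib.request

# Build a non-streaming generate request for the model pulled in Step 4.
def build_request(prompt: str, model: str = "magistral:24b-small-2506-q8_0") -> dict:
    return {"model": model, "prompt": prompt, "stream": False}

payload = build_request('How many \'r\'s in "strawberry"?')

# Uncomment on the VM (requires the Ollama server to be running):
# req = urllib.request.Request(
#     "http://localhost:11434/api/generate",
#     data=json.dumps(payload).encode(),
#     headers={"Content-Type": "application/json"},
# )
# with urllib.request.urlopen(req) as resp:
#     print(json.loads(resp.read())["response"])
```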

    Is Magistral Small better than Mistral Small 3?

    We tried out the Mistral Small 3 model a few months ago, so we tested Magistral Small with the prompts that Small 3 didn’t handle well.

    Prompt: How many ‘r’s in “strawberry” ?

    Mistral Small 3: The word "strawberry" contains 2 letter “r”s

    Magistral Small: 3

    Prompt: How many ‘l’s in “strawberry” ?

    Mistral Small 3: The word "strawberry" contains 2 letter “l”s

    Magistral Small: 0

    [Screenshot: How many r's in strawberry]

    Prompt: Compute the area of the region enclosed by the graphs of the given equations “y=x, y=2x, and y=6-x”. Use vertical cross-sections

    Mistral Small 3: 7

    Magistral Small: 3 (i.e., 3 square units), which is correct.
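The answer of 3 is easy to verify with vertical cross-sections, as the prompt asks. The three lines intersect pairwise at (0,0), (2,4) and (3,3), so the region splits at x = 2; exact rational arithmetic confirms the area:

```python
from fractions import Fraction as F

# Vertical cross-sections, split at x = 2:
#   0 <= x <= 2: height = 2x - x = x       -> integral of x      is x^2/2
#   2 <= x <= 3: height = (6-x) - x = 6-2x -> integral of 6 - 2x is 6x - x^2
area_left = F(2)**2 / 2 - F(0)**2 / 2                  # = 2
area_right = (6*F(3) - F(3)**2) - (6*F(2) - F(2)**2)   # = 1
area = area_left + area_right
print(area)  # 3
```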

    [Screenshot: Magistral Math]

    Overall, Magistral Small shows a significant leap over Mistral Small 3. The benefits of a reasoning model are quite evident in the improved accuracy, indicating that reasoning models are the way forward for stronger performance.

    Our take on Magistral Small

    Speed

    Magistral is comparable in speed with frontier open-source models such as Qwen 3, generating more than 26 tokens per second. Both models answered the question below correctly, but Magistral took only 1 minute and 0.4 seconds whereas Qwen 3 took 1 minute and 38 seconds.

    Prompt: What is larger: 134.59 or 134.6?

    Magistral:

    [Screenshot: Magistral Small Performance]

    Qwen 3:

    [Screenshot: Magistral vs Qwen]

    Accuracy

    In our observation, Magistral Small is nearly as good as Qwen 3 with some exceptions.

    Prompt: Exactly how many days ago did the French Revolution start? Today is June 11th, 2025.

    Magistral got this question completely wrong, responding with 460 days. The response also took 17 minutes.
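For scale: taking the conventional start date of the French Revolution, the storming of the Bastille on July 14, 1789, the true answer is on the order of 86,000 days, so 460 is off by two orders of magnitude. Python's date arithmetic gives the exact figure:

```python
from datetime import date

# Days between the storming of the Bastille and the date in the prompt.
elapsed = date(2025, 6, 11) - date(1789, 7, 14)
print(elapsed.days)  # 86164 -- versus Magistral's answer of 460
```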

    Magistral Small failed to generate working code for a Tetris game, whereas Qwen 3 got it right in one shot.

    Both models failed to generate code that satisfied this prompt:

    Prompt: "write a Python program that shows a ball bouncing inside a spinning hexagon. The ball should be affected by gravity and friction, and it must bounce off the rotating walls realistically"

    [Screenshot: Magistral's response]

    Flexibility

    The absence of a non-reasoning mode makes Magistral Small less flexible than Qwen 3. Magistral can fall into very long reasoning loops lasting several minutes, which is a problem for many use cases, especially when the resulting answers are incorrect. Overall, Magistral is an impressive reasoning model from Mistral and a preview of the stronger reasoning models set to emerge from leading AI labs, but the lack of a non-reasoning mode limits its flexibility, particularly for simple prompts.

    Build your enterprise AI on Ori

    Ori Global Cloud provides flexible infrastructure for any team, model, and scale. Backed by top-tier GPUs, performant storage, and AI-ready networking, Ori enables AI teams and businesses to deploy their AI models and applications in a variety of ways:
