Tutorials

How to run Mistral Small 3 on a cloud GPU with vLLM

Deepak Manoor
Posted: February 5, 2025
    Mistral Small 3

    Mistral AI has established itself as a key player in the generative AI space by developing open, portable, and highly customizable models. A defining strength of their model portfolio is the ability to deliver exceptional performance with relatively compact models, such as the Pixtral 12B and the Ministral series. With the launch of Mistral’s Small 3 model, they have taken it a step further by delivering a compact yet performant model for a variety of language, coding and math tasks.

    Mistral Small 3 supports multilingual inputs and output in 11 languages: English, French, German, Spanish, Italian, Chinese, Japanese, Korean, Portuguese, Dutch, and Polish. Mistral is looking to target a range of business use cases with this model: fraud detection, virtual customer service, sentiment analysis, feedback systems, customer triaging, and more.

    Here’s a quick rundown of Mistral Small 3 24B specifications:

    Mistral Small 3 24B
    Architecture: designed for low latency; Tekken tokenizer with a 131k vocabulary size
    Variants: pretrained base model and instruction-tuned
    Parameters: 24 billion
    Context window: 32k tokens
    Licensing: Apache 2.0 (commercial and non-commercial)

    Benchmark results shared by Mistral indicate that Small 3 outperforms GPT-4o mini and Gemma 2 27B, while coming close to Llama 3.3 70B and Qwen 2.5 32B.

    Mistral Small 3 Performance


    How to use Mistral Small 3 with vLLM on an Ori virtual machine

    Prerequisites

    Create a GPU virtual machine (VM) on Ori Global Cloud. We chose the NVIDIA H100 SXM GPU because the vLLM server for Mistral Small 3 Instruct in BF16 format needs about 71 GB of memory, covering model weights, PyTorch activations, and the KV cache. If you’re using smaller, quantized versions of the model, we recommend the NVIDIA L40S GPU for its strong inference performance.
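
    As a back-of-envelope check (our own arithmetic, not an official sizing guide), the BF16 weights alone account for roughly 48 GB of that 71 GB, with the remainder going to activations and KV cache:

    ```python
    # Rough GPU memory estimate for serving a 24B-parameter model in BF16.
    # This is a sketch; actual usage depends on vLLM's gpu_memory_utilization
    # setting, batch size, and context length.
    params = 24e9            # 24 billion parameters
    bytes_per_param = 2      # BF16 uses 2 bytes per parameter

    weights_gb = params * bytes_per_param / 1e9
    print(f"Model weights: ~{weights_gb:.0f} GB")   # ~48 GB

    # The ~71 GB observed in practice leaves the rest for activations and KV cache.
    overhead_gb = 71 - weights_gb
    print(f"Activations + KV cache headroom: ~{overhead_gb:.0f} GB")
    ```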

    Quick tip

    Use the init script when creating the VM so that NVIDIA CUDA drivers, frameworks such as PyTorch and TensorFlow, and Jupyter notebooks are preinstalled for you.

    Step 1: SSH into your VM, install Python and create a virtual environment

    ```bash
    apt install python3.11-venv
    python3.11 -m venv mistral-env
    ```

    Activate the virtual environment

    ```bash
    source mistral-env/bin/activate
    ```

    Step 2: Install the Hugging Face CLI if you haven’t already, and log in. Make sure to have your Hugging Face token ready and to have accepted the model’s terms and conditions.

    ```bash
    pip install -U "huggingface_hub[cli]"
    huggingface-cli login
    ```

    Step 3: Install vLLM and update the tokenizer library

    ```bash
    pip install vllm
    pip install --upgrade mistral_common
    ```

    Step 4: Load the model and run the vLLM server. We chose the Mistral Small 24B Instruct variant for its better performance. 

    ```bash
    vllm serve "mistralai/Mistral-Small-24B-Instruct-2501"
    ```

    The default port for vLLM is 8000; you can use the --port option to change it.

    Note: it’ll take a while to load all the model shards the first time.

    Step 5: Query the model in another terminal window and use the JSON utility jq to format model responses

    ```bash
    sudo apt install jq
    curl -s "http://localhost:8000/v1/chat/completions" \
         -H "Content-Type: application/json" \
         -d '{
            "model": "mistralai/Mistral-Small-24B-Instruct-2501",
            "temperature": 0.15,
            "messages": [
               {"role": "user", "content": "How many times does the letter r appear in the word strawberry?"}
            ]
        }' | jq -r '.choices[0].message.content'
    ```
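
    If you’d rather query the server from Python, the same request can be sent with the standard library (the URL assumes the default port from Step 4, and the payload mirrors the curl call above):

    ```python
    import json
    from urllib.request import Request, urlopen

    # vLLM exposes an OpenAI-compatible chat completions endpoint.
    URL = "http://localhost:8000/v1/chat/completions"

    payload = {
        "model": "mistralai/Mistral-Small-24B-Instruct-2501",
        "temperature": 0.15,
        "messages": [
            {"role": "user",
             "content": "How many times does the letter r appear in the word strawberry?"}
        ],
    }

    def query(url: str = URL) -> str:
        """POST the chat payload and return the assistant's reply text."""
        req = Request(url, data=json.dumps(payload).encode(),
                      headers={"Content-Type": "application/json"})
        with urlopen(req, timeout=120) as resp:
            return json.load(resp)["choices"][0]["message"]["content"]

    # Requires the vLLM server from Step 4 to be running:
    # print(query())
    ```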

    How good is Mistral Small 3?

    We threw a few verbal, math, and coding reasoning challenges at Mistral Small 3 24B. The model did quite well on verbal intelligence, but at times it struggled with our coding and math questions.

    Prompt: What is the best French cheese?

    Mistral Small 3 handled this subjective prompt quite well: it laid out some of the options and their strengths, and let the person asking decide based on those attributes.

    Mistral Small Verbal

    Prompt: Translate this sentence to French, German and Italian: “Visit our website to learn more about end-to-end AI infrastructure. Go to www.ori.co.”

    Mistral language

    The model understood the question correctly and translated the statement with reasonable accuracy.

    Prompt: How many ‘r’s in “strawberry”?

    The model got this question wrong: the word strawberry has 3 ‘r’s.
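
    For reference, a one-line check confirms the count the model missed:

    ```python
    # Count occurrences of the letter 'r' in "strawberry".
    word = "strawberry"
    r_count = word.count("r")
    print(r_count)  # 3
    ```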

    Mistral Strawberry

    Prompt: How many ‘l’s in “strawberry”?

    Mistral Small Strawberry

    Mistral Small 3 answered this prompt incorrectly too.

    Prompt: Write the game Snake in Python

    Creating a simple version of the classic Snake game in Python can be a fun and educational project. We'll use the `tkinter` library for the graphical interface and `random` for generating food positions.

    Below is a basic implementation of Snake using `tkinter`

    ```python
    import tkinter as tk
    import random

    class SnakeGame:
        def __init__(self, master):
            self.master = master
            self.master.title("Snake Game")
            self.master.resizable(False, False)
            self.canvas = tk.Canvas(self.master, width=400, height=400, bg="black")
            self.canvas.pack()
            self.snake = [(100, 50), (90, 50), (80, 50)]
            self.food = self.spawn_food()
            self.direction = "Right"
            self.canvas.bind_all("<KeyPress>", self.change_direction)
            self.game_loop()

        def change_direction(self, event):
            if event.keysym == "Up" and self.direction != "Down":
                self.direction = "Up"
            elif event.keysym == "Down" and self.direction != "Up":
                self.direction = "Down"
            elif event.keysym == "Left" and self.direction != "Right":
                self.direction = "Left"
            elif event.keysym == "Right" and self.direction != "Left":
                self.direction = "Right"

        def game_loop(self):
            head = self.snake[0]
            if self.direction == "Up":
                new_head = (head[0], head[1] - 10)
            elif self.direction == "Down":
                new_head = (head[0], head[1] + 10)
            elif self.direction == "Left":
                new_head = (head[0] - 10, head[1])
            elif self.direction == "Right":
                new_head = (head[0] + 10, head[1])

            self.snake.insert(0, new_head)

            # Check for collisions
            if (
                new_head in self.snake[1:]  # Self collision
                or new_head[0] < 0
                or new_head[0] >= 400
                or new_head[1] < 0
                or new_head[1] >= 400
            ):
                self.game_over()
                return

            if new_head == self.food:
                self.food = self.spawn_food()
            else:
                self.snake.pop()

            self.canvas.delete("all")
            self.draw_snake()
            self.canvas.create_oval(self.food[0], self.food[1], self.food[0] + 10, self.food[1] + 10, fill="red")

            self.master.after(100, self.game_loop)

        def draw_snake(self):
            for segment in self.snake:
                self.canvas.create_rectangle(
                    segment[0], segment[1], segment[0] + 10, segment[1] + 10, fill="white"
                )

        def spawn_food(self):
            while True:
                food = (random.randint(0, 39) * 10, random.randint(0, 39) * 10)
                if food not in self.snake:
                    return food

        def game_over(self):
            self.canvas.create_text(200, 200, text="Game Over", font=("Helvetica", 24), fill="white")

    if __name__ == "__main__":
        root = tk.Tk()
        game = SnakeGame(root)
        root.mainloop()
    ```

    Prompt: What is Agentic AI? Explain like I’m five.

    Mistral Agentic AI

    Prompt: Write the Tetris game in Python.

    The code generated by Mistral failed to run on the first pass.

    Prompt: Give me a concise summary of the Collatz conjecture.

    Mistral Collatz Conjecture
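
    For context on what the prompt asks about: the conjecture says that repeatedly halving even numbers and mapping an odd n to 3n + 1 always reaches 1 eventually. A small sketch (our own illustration, not the model’s output):

    ```python
    def collatz_steps(n: int) -> int:
        """Number of steps for n to reach 1 under the Collatz map."""
        steps = 0
        while n != 1:
            n = n // 2 if n % 2 == 0 else 3 * n + 1
            steps += 1
        return steps

    print(collatz_steps(6))   # 8 steps: 6 -> 3 -> 10 -> 5 -> 16 -> 8 -> 4 -> 2 -> 1
    print(collatz_steps(27))  # a famously long trajectory: 111 steps
    ```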

    Prompt: Compute the area of the region enclosed by the graphs of the given equations “y=x, y=2x, and y=6-x”. Use vertical cross-sections.

    Problem source: UC Davis

    Mistral Small Math1

    Mistral Small Math2

    The answer provided by Mistral Small 3 is incorrect. The correct answer is 3 (or 3 square units).
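
    The correct value is easy to verify with vertical cross-sections: on [0, 2] the region lies between y = x and y = 2x, and on [2, 3] between y = x and y = 6 − x. A quick numeric check (our own verification, not the model’s work):

    ```python
    def riemann(f, a, b, n=10_000):
        """Midpoint-rule approximation of the integral of f over [a, b]."""
        h = (b - a) / n
        return sum(f(a + (i + 0.5) * h) for i in range(n)) * h

    # On [0, 2] the top curve is y = 2x and the bottom is y = x.
    area1 = riemann(lambda x: 2 * x - x, 0, 2)      # exact value: 2
    # On [2, 3] the top curve is y = 6 - x and the bottom is y = x.
    area2 = riemann(lambda x: (6 - x) - x, 2, 3)    # exact value: 1
    print(round(area1 + area2, 6))  # 3.0
    ```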

    Prompt: Quelles sont les sept merveilles du monde? (“What are the seven wonders of the world?”)

    Mistral Translation

    Overall, our perception of Mistral Small 3 was that it excelled at verbal analysis and translation, but delivered mixed results for coding and math prompts.

    Chart your own AI reality with Ori

    Ori Global Cloud provides flexible infrastructure for any team, model, and scale. Backed by top-tier GPUs, performant storage, and AI-ready networking, Ori enables growing AI businesses and enterprises to deploy their AI models and applications in a variety of ways:

    • GPU instances, on-demand virtual machines backed by top-tier GPUs to run AI workloads.
    • Inference Endpoints to run and scale your favorite open source models with just one click.
    • GPU Clusters to train and serve your most ambitious AI models.
    • Serverless Kubernetes to run inference at scale without having to manage infrastructure.
    • Private Cloud provides a single-tenant environment, top-notch performance and secure infrastructure for enterprise AI builds.
