Tutorials

How to build a multi‑agent research assistant for life sciences

Learn how to build a multi-agent crew for life sciences research using CrewAI and Ori Inference Endpoints.
Adrian Matei
Posted: August 22, 2025

    In many life sciences workflows, time often goes to tasks such as crafting search strings, triaging papers, reformatting notes, and stitching results from siloed systems. These manual steps slow decisions, introduce variability, and make reproducibility and audit trails harder than they need to be.

    Agentic AI offers a practical solution by turning intent into a repeatable workflow that plans, retrieves, and synthesizes evidence. An agent converts a plain-language question into precise queries, pulls from authoritative sources and internal repositories, invokes the right tools at the right moment, and returns a structured, citation-ready draft for human review. Every step is logged for provenance, scope stays controlled, and the system fits your existing data, security, and governance.

    Built on a simple loop of plan → retrieve → synthesize, these agents give teams both speed and rigor. CrewAI provides a lightweight, open-source way to operationalize this loop—composing task-specific agents, tools, and shared memory into dependable, auditable multi-step workflows. The result is more time for hypothesis and validation, less time on mechanics, and a measurable uptick in consistency across literature reviews, protocol comparisons, and decision briefs.

    This tutorial demonstrates a research assistant for life‑sciences questions that plans PubMed searches and synthesizes results into a structured scientific report. It’s built with CrewAI’s multi-agent orchestration framework and a Streamlit front end, powered by an Ori‑hosted open-source LLM (such as GPT‑OSS‑120B) deployed on an Ori Inference Endpoint.

    GitHub repository

    Check out our GitHub repository for the source code and documentation for this assistant.

    What the agent crew does

    At its core, the project runs a two‑agent crew sequentially:

    • A planning agent generates a precise PubMed search strategy tailored to the user’s question (life‑sciences only).
    • A synthesis agent turns retrieved findings into a structured scientific report.

    The system is intentionally strict in scope: non‑life‑sciences prompts are rejected with a clear “Out‑of‑scope” message, so it behaves predictably in research contexts.

    Under the hood, PubMed retrieval is handled with NCBI E‑utilities (esearch + efetch) implemented in src/tools/pubmed.py. The Streamlit UI performs a hidden PubMed pre‑fetch from the user prompt and passes the summary downstream as pubmed_seed_results, so the agents begin with relevant signal rather than cold‑starting every time.
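    To make the retrieval step concrete, here is a minimal sketch of the esearch → efetch round trip, assuming the requests library. The documented NCBI E‑utilities endpoints are real; the function signature and summary formatting are illustrative, not the exact code in src/tools/pubmed.py.

    Python
    import requests

    EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"

    def pubmed_search_and_summarise(question: str, retmax: int = 5) -> str:
        """Illustrative esearch -> efetch round trip (the real tool lives in src/tools/pubmed.py)."""
        # esearch: turn the query into a list of PubMed IDs (PMIDs)
        ids = requests.get(
            f"{EUTILS}/esearch.fcgi",
            params={"db": "pubmed", "term": question, "retmode": "json", "retmax": retmax},
            timeout=30,
        ).json()["esearchresult"]["idlist"]
        if not ids:
            return "No PubMed results found."
        # efetch: pull plain-text abstracts for those PMIDs
        abstracts = requests.get(
            f"{EUTILS}/efetch.fcgi",
            params={"db": "pubmed", "id": ",".join(ids), "rettype": "abstract", "retmode": "text"},
            timeout=30,
        ).text
        # Summarise as markdown the downstream agents can consume
        return f"### PubMed seed results ({len(ids)} records)\n\n{abstracts}"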

    How it works (architecture at a glance)

    • User interface (app.py): loads environment variables from .env, caches a crew instance, executes a hidden PubMed search via pubmed_search_and_summarise(), and then calls crew.kickoff(inputs={"input": <user_prompt>, "pubmed_seed_results": <markdown>}). This keeps the chat experience fast and consistent.
    • Crew orchestration (src/main.py): two Tasks run with Process.sequential (planning → synthesis). Both tasks enforce a life‑sciences scope, ensuring queries stay within domain (see the sketch after this list).
    • Agents (src/agents/*.py): each agent uses crewai.LLM pointed at your Ori endpoint, with thinking disabled and a flat strategy to keep outputs concise and auditable. Because the endpoint is OpenAI‑compatible, the setup is straightforward.
    • Model & provider: the system targets an Ori‑hosted, OpenAI‑compatible LLM, gpt-oss-120b.
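    For orientation, here is a condensed sketch of the wiring, assuming a recent CrewAI release (where LLM is exported from crewai and routes through its LiteLLM-based provider layer via the openai/ prefix). The role, goal, and description strings are abbreviated placeholders, not the project’s actual prompts.

    Python
    import os
    from crewai import Agent, Crew, LLM, Process, Task

    llm = LLM(
        model=f"openai/{os.environ['ORI_MODEL_ID']}",  # e.g. "openai/model"
        base_url=os.environ["ORI_API_BASE"],
        api_key=os.environ["ORI_API_KEY"],
    )

    planner = Agent(
        role="PubMed search strategist",
        goal="Design a precise PubMed search strategy for: {input}",
        backstory="Life-sciences information specialist.",
        llm=llm,
    )
    synthesiser = Agent(
        role="Scientific writer",
        goal="Synthesise findings into a structured, citation-ready report.",
        backstory="Evidence-synthesis specialist.",
        llm=llm,
    )

    plan_task = Task(
        description="Plan PubMed searches for {input}. Reject non-life-sciences questions as Out-of-scope.",
        expected_output="A PubMed search strategy.",
        agent=planner,
    )
    synth_task = Task(
        description="Write a structured report from {pubmed_seed_results}.",
        expected_output="A citation-ready scientific report.",
        agent=synthesiser,
    )

    # Sequential process: planning runs first, then synthesis
    crew = Crew(agents=[planner, synthesiser], tasks=[plan_task, synth_task], process=Process.sequential)
    result = crew.kickoff(inputs={"input": "...", "pubmed_seed_results": "..."})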

    Step-by-step Guide

    1) Create and activate a virtual environment

    Bash/Shell
    python3 -m venv .venv
    source .venv/bin/activate

    2) Install dependencies

    Bash/Shell
    pip install -r requirements.txt

    3) Configure environment

    Create a file named .env in the project root (same folder as app.py):

    Dotenv
    ORI_API_KEY="<your_ori_access_token>"
    ORI_API_BASE="<your_ori_openai_compatible_base>"  # e.g., https://gpt-120b-agent.lon3.inference.ogc.ori.co/v1
    ORI_MODEL_ID="model"                              # Ori endpoints commonly expect "model"

    Note: do not commit .env. OPENAI_API_KEY is mirrored automatically from ORI_API_KEY for provider compatibility.
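    The mirroring mentioned above can be as simple as the following; this is a minimal sketch of the idea in src/config.py, not its exact contents, assuming the python-dotenv package:

    Python
    import os
    from dotenv import load_dotenv

    load_dotenv()  # read .env from the project root

    # Mirror the Ori token so OpenAI-compatible clients pick it up automatically
    if os.getenv("ORI_API_KEY") and not os.getenv("OPENAI_API_KEY"):
        os.environ["OPENAI_API_KEY"] = os.environ["ORI_API_KEY"]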

    4) Run the Streamlit app

    Bash/Shell
    streamlit run app.py

    Then open the Local URL (typically http://localhost:8501). Optionally, run in the foreground with logs:

    Bash/Shell
    pkill -f streamlit || true && ./.venv/bin/streamlit run app.py | cat
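    Under the hood, the chat loop in app.py follows roughly this shape. It is a simplified sketch (widget layout and error handling omitted) that reuses the build_crew() and pubmed_search_and_summarise() helpers described in this guide:

    Python
    import streamlit as st
    from src.main import build_crew
    from src.tools.pubmed import pubmed_search_and_summarise

    @st.cache_resource
    def get_crew():
        # Build the two-agent crew once and reuse it across Streamlit reruns
        return build_crew()

    st.title("Life-sciences research assistant")

    if prompt := st.chat_input("Ask a life-sciences question"):
        seed = pubmed_search_and_summarise(prompt)  # hidden PubMed pre-fetch
        result = get_crew().kickoff(inputs={"input": prompt, "pubmed_seed_results": seed})
        with st.chat_message("assistant"):
            st.markdown(str(result))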

    Project structure

    research_assistant_agent/
    ├── requirements.txt       # Python dependencies
    ├── README.md              # This guide
    ├── app.py                 # Streamlit UI (hidden PubMed prefetch + chat)
    └── src/
        ├── __init__.py
        ├── config.py          # Env/model configuration (mirrors ORI_API_KEY to OPENAI_API_KEY)
        ├── model_provider.py  # OpenAI client setup for Ori (if needed by future code)
        ├── tools/
        │   └── pubmed.py      # PubMed E-utilities search/fetch and markdown summariser
        ├── agents/
        │   ├── planning_agent.py
        │   └── synthesis_agent.py
        └── main.py            # build_crew() + CLI entrypoint

    CLI usage (note) & sharing

    • CLI caveat: the CLI in src/main.py invokes the crew with only {input}. The synthesis task expects pubmed_seed_results, which the Streamlit UI provides. For now, prefer running via Streamlit; if you need CLI support, add a small wrapper (see the sketch after this list).
    • Sharing the app
      • Same network: streamlit run app.py --server.address 0.0.0.0 --server.port 8501 and share the Network URL.
      • Tunnels: ngrok http 8501 or npx localtunnel --port 8501 to share a public URL.
      • Streamlit Community Cloud: push to GitHub, deploy, and set secrets (ORI_API_KEY, ORI_API_BASE, ORI_MODEL_ID).
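    If you do want a CLI today, a small wrapper along these lines would replicate the Streamlit pre-fetch. This is a sketch, assuming the build_crew() and pubmed_search_and_summarise() helpers shown in the project structure above:

    Python
    import argparse
    from src.main import build_crew
    from src.tools.pubmed import pubmed_search_and_summarise

    def main() -> None:
        parser = argparse.ArgumentParser(description="Run the research crew from the command line")
        parser.add_argument("question", help="A life-sciences research question")
        args = parser.parse_args()

        seed = pubmed_search_and_summarise(args.question)  # same hidden pre-fetch the UI does
        result = build_crew().kickoff(
            inputs={"input": args.question, "pubmed_seed_results": seed}
        )
        print(result)

    if __name__ == "__main__":
        main()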

    See the multi-agent assistant in action


    FAQs

    How can I scale this demo for larger use-cases?

    Ori Inference Endpoints scale inference instances automatically: you set a maximum number of instances to absorb bursts of requests, and the endpoint scales all the way down to zero when demand is low.

    Can I connect internal data or other tools besides PubMed?

    Yes. Add tools under src/tools/ and wire them into the crew (e.g., enterprise search, ELN/LIMS, vector DBs, protocol libraries). Because the model endpoint is OpenAI‑compatible, you can reuse existing SDKs while keeping your data secure on Ori.
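    As one hedged example, a custom tool for an internal document store might look like this. The InternalSearchTool class and search_internal_docs() helper are hypothetical placeholders, and the BaseTool import path assumes a recent CrewAI release:

    Python
    from crewai.tools import BaseTool

    class InternalSearchTool(BaseTool):
        name: str = "internal_search"
        description: str = "Search the organisation's internal document store."

        def _run(self, query: str) -> str:
            # Hypothetical helper: replace with your ELN/LIMS or vector-DB client
            from my_org.search import search_internal_docs
            hits = search_internal_docs(query, limit=5)
            return "\n".join(f"- {h.title}: {h.snippet}" for h in hits)

    Attach the tool to an agent via tools=[InternalSearchTool()] when constructing it in src/agents/.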

    What LLMs can be used other than gpt-oss?

    Inference Endpoints support a host of other models, such as Llama 4, Llama 3.2, DeepSeek R1, Mistral, Qwen 2.5, and many more. You can also bring your own model with Model Registry.

    Build your life sciences AI on Ori

    Power your AI with a cloud platform that helps you navigate the unique needs of life sciences.
