Whenever we seek guidance on health matters online, we instinctively turn to trusted national health agencies, such as the NHS in the UK, for reliable information. However, scouring their vast libraries of health content takes time and effort. What if we could build a ChatGPT-like chatbot that answers your health queries? Many AI systems generate impressive results, but they are prone to hallucinations that lead to inaccurate responses, which is risky in healthcare. By grounding an agentic Retrieval-Augmented Generation (RAG) pipeline in trusted NHS data, we aim for answers that are more accurate and reliable.
This blog walks you through the entire process: scraping NHS Health A–Z data, storing it on Ori’s Object Storage, and building an agentic RAG system with two retrieval methods (BM25 and a vector database) using Hugging Face’s smolagents library. At the end, we compare the responses from both methods to help you decide which one best fits your use case.

Background

Retrieval-Augmented Generation (RAG) combines a retrieval module with a generative language model. The retrieval component fetches the most relevant documents or passages from a corpus, and the generative model uses them to produce informed, contextually accurate answers. Agentic RAG goes a step further: an agent decides how to query the retriever and then uses the retrieved data to guide generation in a more informed and interactive manner. This approach aims to bridge the gap between trusted, authoritative content and the dynamic, conversational abilities of modern generative models. Let’s explore how to build an agentic RAG system designed specifically for NHS health conditions data.
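
To make the retrieve-then-generate flow concrete, here is a minimal sketch using only the Python standard library. The corpus text, the function names, and the word-overlap scoring are illustrative assumptions, not the pipeline built later in this post, and no language model is called: the sketch stops at the prompt that would be sent to one.

```python
import re

# Toy corpus standing in for trusted source passages (hypothetical text).
CORPUS = [
    "Drink plenty of fluids and rest if you have a sore throat.",
    "A migraine is a moderate or severe headache, often on one side.",
    "Hay fever symptoms include sneezing and itchy eyes.",
]


def tokenize(text: str) -> set[str]:
    """Lowercase the text and return its set of word tokens."""
    return set(re.findall(r"[a-z]+", text.lower()))


def retrieve(query: str, corpus: list[str], k: int = 1) -> list[str]:
    """Rank passages by how many query words they share, best first."""
    q = tokenize(query)
    ranked = sorted(corpus, key=lambda p: len(q & tokenize(p)), reverse=True)
    return ranked[:k]


def build_prompt(query: str, passages: list[str]) -> str:
    """Assemble the grounded prompt that would be sent to the language model."""
    context = "\n".join(f"- {p}" for p in passages)
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"


question = "What helps a sore throat?"
passages = retrieve(question, CORPUS)
prompt = build_prompt(question, passages)
print(prompt)
```

In an agentic setup, the agent rather than fixed code would decide when to call `retrieve` and with which query, and it could call it again if the first results look insufficient.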

Scraping the NHS Health A–Z Website

We start by extracting the links for each health condition listed on the NHS Health A–Z website. Below is a simplified example of how you might accomplish this in Python:

import csv
from urllib.parse import urljoin

import requests
from bs4 import BeautifulSoup

BASE_URL = "https://www.nhs.uk/conditions/"

# Download the Health A–Z index page
response = requests.get(BASE_URL, timeout=10)
response.raise_for_status()

soup = BeautifulSoup(response.text, "html.parser")

# Find all anchor tags
links = soup.find_all("a")

# Open a CSV file to write the results
with open("nhs_conditions_links.csv", mode="w", newline="", encoding="utf-8") as csv_file:
    writer = csv.writer(csv_file)
    # Write header row
    writer.writerow(["Link Text", "URL"])

    for link in links:
        link_text = link.get_text(strip=True)
        link_href = link.get("href")

        # Filter out empty or "#" links
        if link_href and link_href not in ["#", ""]:
            # Resolve relative links against the base URL
            full_url = urljoin(BASE_URL, link_href)
            writer.writerow([link_text, full_url])
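
The script above keeps every anchor on the page, which likely includes navigation and footer links. If you want to keep only condition pages, one possible filter is sketched below. It assumes that condition pages live under the `/conditions/` path of the same site; `is_condition_link` is a hypothetical helper and the sample `hrefs` are made up for illustration.

```python
from urllib.parse import urljoin, urlparse

BASE_URL = "https://www.nhs.uk/conditions/"


def is_condition_link(href: str, base_url: str = BASE_URL) -> bool:
    """Return True when href resolves to a page below the base URL's path."""
    base = urlparse(base_url)
    target = urlparse(urljoin(base_url, href))
    same_site = target.netloc == base.netloc
    # Exclude the index page itself and anything outside /conditions/.
    below_base = target.path.startswith(base.path) and target.path != base.path
    return same_site and below_base


# Sample hrefs as they might appear on the index page (illustrative values).
hrefs = ["/conditions/acne/", "#", "/contact-us/", "https://www.nhs.uk/conditions/"]
kept = [urljoin(BASE_URL, h) for h in hrefs if is_condition_link(h)]
print(kept)
```

Calling `is_condition_link(link_href)` inside the loop before `writer.writerow` would apply the same filter while scraping.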

Next, we fetch each of the extracted URLs, convert the page to Markdown, and save the results in a new CSV file.

import re

import pandas as pd
import requests
from markdownify import markdownify
from requests.exceptions import RequestException


def markdown_from_urls(nhs_conditions_links, nhs_conditions_dataset):
    # Read the input CSV into a pandas DataFrame
    df = pd.read_csv(nhs_conditions_links)

    # Collect one Markdown string per row
    md_contents = []

    # Iterate through each row
    for idx, row in df.iterrows():
        url = row["URL"]

        print(f"[{idx + 1}/{len(df)}] Fetching {url}")

        try:
            response = requests.get(url, timeout=10)
            response.raise_for_status()

            # Convert HTML to Markdown
            markdown_content = markdownify(response.text)

            # Collapse runs of blank lines
            markdown_content = re.sub(r"\n{3,}", "\n\n", markdown_content)

        except RequestException as e:
            # Log the failure and keep going so one bad URL does not stop the run
            print(f"Error fetching the webpage {url}: {e}")
            markdown_content = ""

        except Exception as e:
            print(f"An unexpected error occurred for {url}: {e}")
            markdown_content = ""

        # Add the Markdown content (empty on failure) to our list
        md_contents.append(markdown_content)

    # Create a new DataFrame column with the Markdown content
    df["Markdown"] = md_contents

    # Save to a new CSV
    df.to_csv(nhs_conditions_dataset, index=False, encoding="utf-8")


def main():
    markdown_from_urls("nhs_conditions_links.csv", "nhs_conditions_dataset.csv")


if __name__ == "__main__":
    main()
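
The blank-line cleanup in the script is easy to overlook, so here is a small standalone sketch of what that regular expression does. The sample Markdown string is made up for illustration and `collapse_blank_lines` is a hypothetical helper name.

```python
import re


def collapse_blank_lines(markdown: str) -> str:
    """Replace runs of three or more newlines with a single blank line."""
    return re.sub(r"\n{3,}", "\n\n", markdown)


# HTML-to-Markdown conversion often leaves several empty lines in a row.
raw = "# Sore throat\n\n\n\nSore throats are very common.\n\n\n## Symptoms\n- pain when swallowing"
clean = collapse_blank_lines(raw)
print(clean)
```

Keeping paragraph breaks as exactly one blank line matters later, because the text splitters in this post use `"\n\n"` as their first separator.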

Storing Data on S3

Once you have your CSV file, the next step is to store it on OGC Object Storage (S3) or any other S3-compatible storage you prefer. Refer to our docs to get started with OGC S3. Once you have created and set up your bucket, use the following command to copy the file to it.

aws s3 cp /path/to/file/filename s3://bucket_name --endpoint-url=https://s3.<bucket_region>.oriobjects.cloud

The S3-hosted CSV serves as the data source for the retrieval modules. Before we dive into the implementation, note that we use the s3fs library, an S3-compatible tool that simplifies reading and managing objects stored in your bucket. Alternatively, you could use boto3 to interact with your S3 storage. Note: for best compatibility, make sure botocore version 1.35.99 is installed, so that the checksum header is not enabled by default.
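
One detail worth checking when reading the dataset back: the Markdown column contains line breaks inside quoted CSV fields. The sketch below, using only the standard library, shows one way to parse downloaded bytes so that those line breaks survive. The two-row sample is made up and `parse_dataset` is a hypothetical helper; only the column names follow the files created earlier.

```python
import csv
import io

# Hypothetical two-row sample shaped like nhs_conditions_dataset.csv.
sample_bytes = (
    b'Link Text,URL,Markdown\r\n'
    b'Acne,https://www.nhs.uk/conditions/acne/,"# Acne\n\nAcne is common."\r\n'
    b'Flu,https://www.nhs.uk/conditions/flu/,"# Flu\n\nRest and sleep."\r\n'
)


def parse_dataset(raw: bytes) -> list[dict[str, str]]:
    """Decode CSV bytes and return one dict per row, keeping newlines in fields."""
    text = raw.decode("utf-8")
    # newline="" stops StringIO from translating line endings inside quoted fields.
    return list(csv.DictReader(io.StringIO(text, newline="")))


rows = parse_dataset(sample_bytes)
print(rows[0]["Markdown"])
```

Feeding the `csv` module a list produced by `str.splitlines()` would strip the line endings first, so the paragraph breaks inside each Markdown field would be lost.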

Implementing the Retrieval Methods

We use two approaches to retrieve relevant responses from the NHS data:

BM25 Retriever
Vector Database

To power the agent, we need an LLM inference API. We use the OGC Inference Endpoints API; alternatively, you could use Hugging Face’s default HfApiModel.

Approach #1: BM25 Ranking Function

BM25 is a classical ranking function used in information retrieval. It scores each document against a query based on how often the query terms occur in the document, how rare those terms are across the corpus, and how long the document is.
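
For intuition, here is a compact pure-Python sketch of BM25 scoring. It is not the implementation used by the retriever later in this post; the IDF smoothing shown is one common variant, `k1=1.5` and `b=0.75` are typical default values chosen here as assumptions, and the three documents are made up.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def bm25_scores(query: str, docs: list[str], k1: float = 1.5, b: float = 0.75) -> list[float]:
    """Score every document against the query with the BM25 formula."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(t) for t in tokenized) / n_docs
    # Document frequency: in how many documents each term appears.
    df = Counter(term for toks in tokenized for term in set(toks))
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            # Smoothed IDF (the "+ 1" variant keeps weights non-negative).
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1)
            # Term frequency saturates via k1 and is normalised by document length via b.
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


docs = [
    "Gargle with warm salty water to soothe a sore throat.",
    "Migraine symptoms include a severe headache.",
    "Drink plenty of fluids when you have flu.",
]
scores = bm25_scores("sore throat", docs)
best = docs[scores.index(max(scores))]
print(best)
```

Only documents that contain the literal query words receive a non-zero score, which is why BM25 tends to favour exact wording.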

Below is our agentic RAG system built with the BM25 retrieval method. It wraps the retriever in a custom RetrieverTool that subclasses the Tool class from the smolagents library.

Start by installing the required dependencies:

pip install smolagents pandas langchain langchain-community sentence-transformers datasets python-dotenv rank_bm25 s3fs --upgrade -q

import csv
import io
import os

import s3fs
from langchain.docstore.document import Document
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.retrievers import BM25Retriever
from smolagents import CodeAgent, OpenAIServerModel, Tool

key = os.environ["ACCESS_KEY_ID"]
secret = os.environ["SECRET_ACCESS_KEY"]
endpoint_url = os.environ["ENDPOINT_URL"]

fs = s3fs.S3FileSystem(
    key=key,
    secret=secret,
    endpoint_url=endpoint_url,
    config_kwargs={
        "region_name": "eu-central-003",
        "signature_version": "s3v4",
    },
)

# Read the dataset from the bucket
with fs.open("nhs-dataset/nhs_conditions_dataset.csv", "rb") as f:
    decoded_content = f.read().decode("utf-8")

# Parse via StringIO so line breaks inside quoted Markdown fields are preserved
reader = csv.DictReader(io.StringIO(decoded_content, newline=""))
docs = list(reader)

# Use the condition slug (last path segment of the URL) as metadata
source_docs = [
    Document(page_content=doc["Markdown"], metadata={"URL": doc["URL"].split("/")[-2]})
    for doc in docs
]

# Split each page into overlapping chunks of roughly 500 characters
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=500,
    chunk_overlap=50,
    add_start_index=True,
    strip_whitespace=True,
    separators=["\n\n", "\n", ".", " ", ""],
)
docs_processed = text_splitter.split_documents(source_docs)


class RetrieverTool(Tool):
    name = "retriever"
    description = "Uses BM25 keyword search to retrieve the parts of the NHS health conditions data that could be most relevant to answer your query."
    inputs = {
        "query": {
            "type": "string",
            "description": "The query to perform. This should be semantically close to your target documents. Use the affirmative form rather than a question.",
        }
    }
    output_type = "string"

    def __init__(self, docs, **kwargs):
        super().__init__(**kwargs)
        # Return the 10 best-matching chunks for each query
        self.retriever = BM25Retriever.from_documents(docs, k=10)

    def forward(self, query: str) -> str:
        assert isinstance(query, str), "Your search query must be a string"

        docs = self.retriever.invoke(query)
        return "\nRetrieved documents:\n" + "".join(
            [
                f"\n\n===== Document {str(i)} =====\n" + doc.page_content
                for i, doc in enumerate(docs)
            ]
        )


retriever_tool = RetrieverTool(docs_processed)

# Using an Ori endpoint.
# MODEL_ID is a placeholder: set it to the model your endpoint serves.
MODEL_ID = "<your-model-id>"
model = OpenAIServerModel(
    model_id=MODEL_ID,
    api_base="https://for-smol-agent.inference.ogc.ori.co/openai/v1/",
    api_key=os.environ["ACCESS_TOKEN"],
)

# If using an HfApiModel instead:
# from smolagents import HfApiModel
# model = HfApiModel()

agent = CodeAgent(
    tools=[retriever_tool],
    model=model,
    max_steps=4,
    verbosity_level=2,
)

agent_output = agent.run("What to do if your throat is paining?")

print("Final output:")
print(agent_output)

On running with the question “What to do if your throat is paining?”, we get the following:

Final output:
Okay, so I'm trying to figure out what to do if my throat is hurting. The user wants a concise answer, so I need to be efficient but still helpful.

First, I recall that when your throat gets bad, it's important to assess the situation. Maybe there's an infection or something else causing it. But without symptoms like fever or a runny nose, it's hard to say for sure.

Hydration seems like a good first step. Staying hydrated can help ease discomfort and prevent dehydration. I'll mention drinking fluids, maybe water or herbal teas.

If the pain persists, I should advise consulting a healthcare professional. They can give a proper diagnosis and suggest treatments like antibiotics or over-the-counter remedies if necessary.

Using a saline solution might help clear the throat, so that's another step to include. Maybe just a quick gargle with some saltwater.

Keeping the environment clean is also important. Wiping down surfaces could reduce any irritation or allergens that might be causing the pain.

I should keep each step brief and clear, without being too technical. The user wants a straightforward guide, so each tip should stand alone. I'll structure it in a numbered list for easy reading.

I need to make sure not to repeat any previous errors, like using functions that aren't available, so I'll stick to what's possible with the tools at hand.

Putting it all together: assess the situation, stay hydrated, consult a professional, gargle with saline, and keep the area clean. That should cover the main actions needed.
</think>

To address the throbbing pain in your throat and provide a structured response based on typical health advice, here are the key steps to take:

1. **Assess the Situation**: Evaluate if the pain is due to an infection, allergies, or something else. Consider any additional symptoms like fever or runny nose.

2. **Hydration**: Drink fluids to stay hydrated. Simple options include water or herbal teas to alleviate discomfort.

3. **Consult a Healthcare Professional**: If the pain persists or worsens, seek medical advice. They may perform a proper diagnosis and recommend appropriate treatments, such as antibiotics or over-the-counter remedies.

4. **Remove Irritants**: To help alleviate the throat pain, you can gargle with a saline solution or mouthwash. Avoid irritants like alcohol, smoking, and certain foods or drinks.

5. **Maintain Hygiene**: Clean frequently touched surfaces to reduce irritation and avoid potential irritants that might be contributing to the symptom.

By following these steps, you can better manage the throbbing pain and address the underlying cause if necessary.
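
Before moving on, it may help to see what the `chunk_size` and `chunk_overlap` settings of the text splitter control. The sketch below is a simplified fixed-window splitter written for illustration only. It is not the algorithm used by RecursiveCharacterTextSplitter, which also tries to break on the listed separators, and `chunk_text` is a hypothetical helper.

```python
def chunk_text(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Split text into fixed-size windows that share chunk_overlap characters."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    if not text:
        return []
    step = chunk_size - chunk_overlap
    # Stop early enough that the last window is never just a repeat of the overlap.
    last_start = max(len(text) - chunk_overlap, 1)
    return [text[i:i + chunk_size] for i in range(0, last_start, step)]


# 1200 characters of dummy text, split with the same settings as the BM25 example.
text = "".join(str(i % 10) for i in range(1200))
chunks = chunk_text(text, chunk_size=500, chunk_overlap=50)
print([len(c) for c in chunks])
```

The overlap means a sentence that straddles a chunk boundary still appears intact in at least one chunk, at the cost of indexing some text twice.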

Approach #2: Using a Vector Database

For a more semantic approach, we create embeddings with all-MiniLM-L6-v2, a sentence-transformer model from Hugging Face, and store them in a Chroma vector database for the RAG pipeline. This method captures contextual meaning better than simple keyword matching.

Note: In production, consider more advanced vector databases such as Pinecone or Weaviate for scalability and additional features.
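
To illustrate what a vector search does under the hood, here is a small sketch that ranks stored texts by cosine similarity to a query vector. The three-dimensional vectors are hand-made stand-ins for real embeddings, the texts are made up, and cosine similarity is used purely for illustration: the metric a given vector store applies depends on how it is configured.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Return the cosine of the angle between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec: list[float], store: list[tuple[str, list[float]]], k: int = 2) -> list[str]:
    """Return the k stored texts whose vectors are most similar to the query."""
    ranked = sorted(store, key=lambda item: cosine_similarity(query_vec, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


# Hand-made vectors standing in for real embeddings (hypothetical values).
store = [
    ("Gargle with warm salty water", [0.9, 0.1, 0.0]),
    ("Take paracetamol for pain", [0.7, 0.3, 0.1]),
    ("Hay fever causes itchy eyes", [0.0, 0.2, 0.9]),
]
query_vec = [1.0, 0.2, 0.0]
results = top_k(query_vec, store, k=2)
print(results)
```

Unlike keyword matching, nothing here requires the query and the stored text to share words: closeness is decided entirely by the vectors.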

import csv
import io
import os

import s3fs
from langchain.docstore.document import Document
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_chroma import Chroma
from langchain_huggingface import HuggingFaceEmbeddings
from smolagents import CodeAgent, OpenAIServerModel, Tool
from tqdm import tqdm
from transformers import AutoTokenizer

key = os.environ["ACCESS_KEY_ID"]
secret = os.environ["SECRET_ACCESS_KEY"]
endpoint_url = os.environ["ENDPOINT_URL"]

fs = s3fs.S3FileSystem(
    key=key,
    secret=secret,
    endpoint_url=endpoint_url,
    config_kwargs={
        "region_name": "eu-central-003",
        "signature_version": "s3v4",
    },
)

# Read the dataset from the bucket
with fs.open("nhs-dataset/nhs_conditions_dataset.csv", "rb") as f:
    decoded_content = f.read().decode("utf-8")

# Parse via StringIO so line breaks inside quoted Markdown fields are preserved
reader = csv.DictReader(io.StringIO(decoded_content, newline=""))
docs = list(reader)

source_docs = [
    Document(page_content=doc["Markdown"], metadata={"URL": doc["URL"].split("/")[-2]})
    for doc in docs
]

# The tokenizer is only used to measure chunk length in tokens
text_splitter = RecursiveCharacterTextSplitter.from_huggingface_tokenizer(
    AutoTokenizer.from_pretrained("thenlper/gte-small"),
    chunk_size=200,
    chunk_overlap=20,
    add_start_index=True,
    strip_whitespace=True,
    separators=["\n\n", "\n", ".", " ", ""],
)

# Split docs and keep only unique chunks
print("Splitting documents...")
docs_processed = []
unique_texts = set()
for doc in tqdm(source_docs):
    new_docs = text_splitter.split_documents([doc])
    for new_doc in new_docs:
        if new_doc.page_content not in unique_texts:
            unique_texts.add(new_doc.page_content)
            docs_processed.append(new_doc)

print("Embedding documents...")

# Initialize embeddings and the Chroma vector store
embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")

vector_store = Chroma.from_documents(docs_processed, embeddings, persist_directory="./chroma_db")


class RetrieverTool(Tool):
    name = "retriever"
    description = (
        "Uses semantic search to retrieve the parts of the NHS health conditions data that could be most relevant to answer your query."
    )
    inputs = {
        "query": {
            "type": "string",
            "description": "The query to perform. This should be semantically close to your target documents. Use the affirmative form rather than a question.",
        }
    }
    output_type = "string"

    def __init__(self, vector_store, **kwargs):
        super().__init__(**kwargs)
        self.vector_store = vector_store

    def forward(self, query: str) -> str:
        assert isinstance(query, str), "Your search query must be a string"
        # Return the 10 chunks closest to the query in embedding space
        docs = self.vector_store.similarity_search(query, k=10)
        return "\nRetrieved documents:\n" + "".join(
            [f"\n\n===== Document {str(i)} =====\n" + doc.page_content for i, doc in enumerate(docs)]
        )


retriever_tool = RetrieverTool(vector_store)

# MODEL_ID is a placeholder: set it to the model your endpoint serves.
MODEL_ID = "<your-model-id>"
model = OpenAIServerModel(
    model_id=MODEL_ID,
    api_base="https://for-smol-agent.inference.ogc.ori.co/openai/v1/",
    api_key=os.environ["ACCESS_TOKEN"],
)

# If using an HfApiModel instead:
# from smolagents import HfApiModel
# model = HfApiModel(model_id="Qwen/Qwen2.5-Coder-32B-Instruct")

agent = CodeAgent(
    tools=[retriever_tool],
    model=model,
    max_steps=4,
    verbosity_level=2,
)

agent_output = agent.run("What to do if your throat is paining?")

print("Final output:")
print(agent_output)

This generates the following output:

Final output:
['Get plenty of rest', 
'Drink plenty of fluids', 
'Take painkillers like paracetamol or ibuprofen', 
'Try adding honey to a warm drink to soothe your throat', 
'Gargle with warm salty water (not for children)',
'Cover your mouth and nose when coughing or sneezing', 
'Wash your hands regularly']

Comparing the responses

BM25-based retrieval: The output generated with the BM25 retriever is more closely aligned with the original NHS content. This method surfaces text that matches the official NHS wording and structure, including formal “red flag” advice, because BM25 retrieves exact or near-exact text matches from the source documents.
Vector database: Provides broader coverage of symptomatic care. The embedding-based approach captures a wider spread of advice across multiple documents (e.g., mentions of humidifiers and more pediatric nuances) because vector similarity groups semantically related content. However, it may occasionally miss important keywords or phrases if they are not “close” in the embedding space.

How can we enhance model output further?

Improve Prompt Engineering
- For the LLM part of your RAG pipeline, refine your prompt to specifically request both self-care advice (hydration, rest, etc.) and guidance on when to seek professional help (urgent symptoms, red flags).
- Provide instructions for the model to include disclaimers or relevant official guidance.
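
As a rough illustration of these prompt suggestions, the sketch below builds a prompt that asks for both kinds of advice and for a pointer to official guidance. The template wording, `PROMPT_TEMPLATE`, and `build_prompt` are hypothetical and would need tuning for your model; the sample passage is made up.

```python
# Hypothetical template: the wording is an assumption, not a tested prompt.
PROMPT_TEMPLATE = """You are a health information assistant. Answer using only the context below.

Context:
{context}

Question: {question}

Structure the answer in three parts:
1. Self-care advice (for example hydration and rest).
2. When to seek professional help (urgent symptoms and red flags).
3. A short note pointing the reader to official guidance.
"""


def build_prompt(question: str, passages: list[str]) -> str:
    """Fill the template with the user's question and the retrieved passages."""
    context = "\n\n".join(passages)
    return PROMPT_TEMPLATE.format(context=context, question=question)


prompt = build_prompt(
    "What to do if your throat is paining?",
    ["Sore throats usually get better by themselves within a week."],
)
print(prompt)
```

With an agent, similar instructions could instead be included in the task string passed to `agent.run`.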

Use More Domain-Specific Models
- Embeddings from domain-specific models (e.g., BioClinicalBERT, PubMedBERT) might capture medical nuances better.
- Similarly, a more capable LLM with domain fine-tuning might produce more coherent, comprehensive answers.

In this blog, we’ve built an agentic RAG system that scrapes NHS Health A–Z data, stores it securely on S3, and uses two retrieval methods, BM25 for keyword matching and a vector database for semantic search, integrated with Hugging Face’s smolagents library. This framework combines retrieval and generation to build domain-specific intelligent agents and can be adapted to other applications, with possible enhancements including fine-tuning, user feedback integration, and more scalable vector storage.

Chart your own AI reality with Ori

Ori Global Cloud provides flexible infrastructure for any team, model, and scale. Backed by top-tier GPUs, performant storage, and AI-ready networking, Ori enables growing AI businesses and enterprises to deploy their AI models and applications in a variety of ways:

Deploy Private Clouds to build secure Enterprise AI, faster.
Operate Inference Endpoints effortlessly at any scale.
Leverage GPU Instances as on-demand virtual machines.
Scale GPU Clusters for training and inference.
Manage AI workloads on Serverless Kubernetes without infrastructure overhead.

