How to build an AI Agent for Health Advice

Posted : February, 14, 2025
    AI agents healthcare

    Whenever we seek guidance on health matters online, we instinctively turn to trusted national health agencies, such as the NHS in the UK, for reliable information. However, scouring their vast libraries of health content takes time and effort. What if we could build a ChatGPT-like chatbot that answers your health queries? Many AI systems generate impressive results, but they are prone to hallucinations that can lead to inaccurate responses, which is risky in healthcare. By grounding our agentic Retrieval-Augmented Generation (RAG) pipeline in trusted NHS data, we aim for more accurate and reliable answers.
    This blog walks you through the entire process: scraping NHS Health A–Z data, storing it on Ori’s Object Storage, and finally building an agentic RAG system with two retrieval methods (BM25 and a vector database) using Hugging Face’s smolagents library. At the end, we’ll compare the responses from both methods to help you decide which one best fits your use case.

    Background

    Retrieval-Augmented Generation (RAG) combines a retrieval module with a generative language model. The retrieval component fetches the most relevant documents or passages from a corpus, and the generative model uses these documents to produce informed, contextually accurate answers. Agentic RAG goes a step further: an agent not only retrieves relevant information but also uses that data to guide the generation process in a more informed and interactive manner. This approach aims to bridge the gap between trusted, authoritative content and the dynamic, conversational abilities of modern generative models. Let’s explore how to build an intelligent, agentic RAG system designed specifically for NHS health conditions data.
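Before the full build, the retrieve-then-generate loop described above can be sketched in a few lines. This is only a toy illustration: the word-overlap retriever, the `generate` callable, and the example passages are placeholders made up for this sketch, not NHS text or part of any library.

```python
import re
from typing import Callable, List


def tokenize(text: str) -> set:
    """Lowercase the text and return its set of words."""
    return set(re.findall(r"\w+", text.lower()))


def retrieve(query: str, corpus: List[str], k: int = 2) -> List[str]:
    """Toy retriever: rank passages by the number of words shared with the query."""
    query_words = tokenize(query)
    scored = [(len(query_words & tokenize(passage)), passage) for passage in corpus]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    # Drop passages that share no words with the query at all
    return [passage for score, passage in scored[:k] if score > 0]


def build_prompt(query: str, passages: List[str]) -> str:
    """Place the retrieved passages in front of the question."""
    context = "\n".join(f"- {p}" for p in passages)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}"
    )


def answer(query: str, corpus: List[str], generate: Callable[[str], str]) -> str:
    """Retrieve first, then hand the grounded prompt to the language model."""
    return generate(build_prompt(query, retrieve(query, corpus)))
```

Here `generate` stands in for any LLM call. In the agentic version built later in this post, the agent itself decides what query to send to the retriever instead of following this fixed sequence.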

    Scraping the NHS Health A–Z Website

    We start by extracting the links for each health condition listed on the NHS Health A–Z website. Below is a simplified example of how you might accomplish this in Python:

    Python
    import csv
    from urllib.parse import urljoin

    import requests
    from bs4 import BeautifulSoup

    BASE_URL = "https://www.nhs.uk/conditions/"

    # Download the Health A–Z index page
    response = requests.get(BASE_URL, timeout=10)
    response.raise_for_status()

    soup = BeautifulSoup(response.text, "html.parser")

    # Find all anchor tags
    links = soup.find_all("a")

    # Open a CSV file to write the results
    with open("nhs_conditions_links.csv", mode="w", newline="", encoding="utf-8") as csv_file:
        writer = csv.writer(csv_file)
        # Write header row
        writer.writerow(["Link Text", "URL"])

        for link in links:
            link_text = link.get_text(strip=True)
            link_href = link.get("href")

            # Skip empty or "#" links
            if link_href and link_href not in ["#", ""]:
                # Resolve relative links against the base URL
                full_url = urljoin(BASE_URL, link_href)
                writer.writerow([link_text, full_url])

    We’ll then scrape each of the extracted URLs, convert every page to Markdown, and save the results to a CSV file.

    Python
    import re

    import pandas as pd
    import requests
    from markdownify import markdownify
    from requests.exceptions import RequestException


    def markdown_from_urls(nhs_conditions_links, nhs_conditions_dataset):
        # Read the input CSV into a pandas DataFrame
        df = pd.read_csv(nhs_conditions_links)

        # Collect one Markdown string per row
        md_contents = []

        # Iterate through each row
        for idx, row in df.iterrows():
            url = row["URL"]

            print(f"[{idx + 1}/{len(df)}] Fetching {url}")

            try:
                response = requests.get(url, timeout=10)
                response.raise_for_status()

                # Convert HTML to Markdown
                markdown_content = markdownify(response.text)

                # Collapse runs of three or more line breaks
                markdown_content = re.sub(r"\n{3,}", "\n\n", markdown_content)

            except RequestException as e:
                # Log the failure and store an empty entry so rows stay aligned
                print(f"Error fetching the webpage {url}: {e}")
                markdown_content = ""

            except Exception as e:
                print(f"An unexpected error occurred for {url}: {e}")
                markdown_content = ""

            # Add the Markdown content to our list
            md_contents.append(markdown_content)

        # Create a new DataFrame column with the Markdown content
        df["Markdown"] = md_contents

        # Save to a new CSV
        df.to_csv(nhs_conditions_dataset, index=False, encoding="utf-8")


    def main():
        markdown_from_urls("nhs_conditions_links.csv", "nhs_conditions_dataset.csv")


    if __name__ == "__main__":
        main()

    Storing Data on S3

    Once you have your CSV file, the next step is to store it on OGC Object Storage or any other S3-compatible storage of your choice. Refer to our docs to get started with OGC Object Storage. Once you have created and set up your bucket, you can use the following command to copy the file to it.

    Bash/Shell
    aws s3 cp /path/to/file/filename --endpoint-url=https://s3.<bucket_region>.oriobjects.cloud s3://bucket_name

    The S3-hosted CSV will serve as our data source for the retrieval modules. Before we dive into the implementation, note that we use the s3fs library, a Python file-system interface for S3-compatible storage that simplifies reading and managing objects in your bucket. Alternatively, you could use boto3 to interact with your S3 storage. Note: For optimal compatibility, install botocore version 1.35.99 so that the checksum header is not enabled by default.
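For readers who prefer boto3, the upload could look roughly like the sketch below. The helper names `make_s3_client` and `upload_dataset` are made up for this example, and the bucket and object names simply mirror the ones used later in this post; adjust them to your own setup.

```python
def make_s3_client(endpoint_url: str, key: str, secret: str):
    """Create a boto3 S3 client pointed at an S3-compatible endpoint."""
    # Imported here so upload_dataset stays usable without boto3 installed
    import boto3

    return boto3.client(
        "s3",
        endpoint_url=endpoint_url,
        aws_access_key_id=key,
        aws_secret_access_key=secret,
    )


def upload_dataset(client, local_path: str, bucket: str, object_key: str) -> str:
    """Upload a local file and return the resulting S3 URI."""
    client.upload_file(local_path, bucket, object_key)
    return f"s3://{bucket}/{object_key}"


# Example usage (assumes the same environment variables as the rest of this post):
# import os
# client = make_s3_client(
#     os.environ["ENDPOINT_URL"],
#     os.environ["ACCESS_KEY_ID"],
#     os.environ["SECRET_ACCESS_KEY"],
# )
# upload_dataset(client, "nhs_conditions_dataset.csv", "nhs-dataset", "nhs_conditions_dataset.csv")
```

Passing the client in as an argument keeps the upload step easy to test with a stand-in object, which is the only reason the two helpers are split.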

    Implementing the Retrieval Methods

    We use two approaches to retrieve relevant responses from the NHS data:

    1. BM25 Retriever

    2. Vector Database

    To power the agent, we need an LLM inference API. We use the OGC Inference Endpoints API; alternatively, you could use Hugging Face’s default HfApiModel.

    Approach #1: BM25 Ranking Function

    BM25 is a classical ranking function used in information retrieval. It scores each document against the query terms using term frequency, inverse document frequency, and document-length normalization.
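To make the scoring concrete, here is a minimal pure-Python sketch of the Okapi BM25 formula. It is an illustration only: the tokenizer, the parameter values `k1=1.5` and `b=0.75`, and the smoothed IDF variant are assumptions of this sketch, and library implementations such as rank_bm25 may differ in those details.

```python
import math
import re
from collections import Counter
from typing import List


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into words."""
    return re.findall(r"\w+", text.lower())


def bm25_scores(query: str, documents: List[str], k1: float = 1.5, b: float = 0.75) -> List[float]:
    """Return one BM25 score per document for the given query."""
    if not documents:
        return []

    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)
    avg_len = sum(len(tokens) for tokens in tokenized) / n_docs

    # Document frequency: in how many documents does each term occur?
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    scores = []
    for tokens in tokenized:
        term_freq = Counter(tokens)
        score = 0.0
        for term in tokenize(query):
            if term not in term_freq:
                continue
            # Rare terms get a higher weight (smoothed, non-negative IDF)
            idf = math.log((n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5) + 1)
            tf = term_freq[term]
            # Longer-than-average documents are penalised through b
            denominator = tf + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf * (k1 + 1) / denominator
        scores.append(score)
    return scores
```

Documents that contain none of the query terms score zero, which is why BM25 works best when the query uses the same vocabulary as the source text.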

    Below is our agentic RAG system built with the BM25 retrieval method, using a custom RetrieverTool built on the Tool class from the smolagents library.

    Start by installing the required dependencies:

    Bash/Shell
    pip install smolagents pandas langchain langchain-community sentence-transformers datasets python-dotenv rank_bm25 s3fs --upgrade -q

    Python
    import csv
    import io
    import os

    import s3fs
    from langchain.docstore.document import Document
    from langchain.text_splitter import RecursiveCharacterTextSplitter
    from langchain_community.retrievers import BM25Retriever
    from smolagents import CodeAgent, OpenAIServerModel, Tool

    # S3 credentials and endpoint are read from environment variables
    key = os.environ["ACCESS_KEY_ID"]
    secret = os.environ["SECRET_ACCESS_KEY"]
    endpoint_url = os.environ["ENDPOINT_URL"]

    fs = s3fs.S3FileSystem(
        key=key,
        secret=secret,
        endpoint_url=endpoint_url,
        config_kwargs={
            "region_name": "eu-central-003",
            "signature_version": "s3v4",
        },
    )

    # Load the scraped dataset from the bucket
    with fs.open("nhs-dataset/nhs_conditions_dataset.csv", "rb") as f:
        decoded_content = f.read().decode("utf-8")

    # Parse from a text stream so line breaks inside quoted Markdown fields survive
    reader = csv.DictReader(io.StringIO(decoded_content))
    docs = list(reader)

    # One Document per page; the metadata keeps the condition slug from the URL
    source_docs = [
        Document(page_content=doc["Markdown"], metadata={"URL": doc["URL"].split("/")[-2]})
        for doc in docs
    ]

    # Split pages into overlapping chunks of about 500 characters
    text_splitter = RecursiveCharacterTextSplitter(
        chunk_size=500,
        chunk_overlap=50,
        add_start_index=True,
        strip_whitespace=True,
        separators=["\n\n", "\n", ".", " ", ""],
    )
    docs_processed = text_splitter.split_documents(source_docs)


    class RetrieverTool(Tool):
        name = "retriever"
        description = "Uses BM25 keyword search to retrieve the parts of the NHS health conditions content that could be most relevant to answer your query."
        inputs = {
            "query": {
                "type": "string",
                "description": "The query to perform. This should be semantically close to your target documents. Use the affirmative form rather than a question.",
            }
        }
        output_type = "string"

        def __init__(self, docs, **kwargs):
            super().__init__(**kwargs)
            # Build the BM25 index and return the top 10 chunks per query
            self.retriever = BM25Retriever.from_documents(docs, k=10)

        def forward(self, query: str) -> str:
            assert isinstance(query, str), "Your search query must be a string"

            docs = self.retriever.invoke(query)
            return "\nRetrieved documents:\n" + "".join(
                f"\n\n===== Document {i} =====\n" + doc.page_content
                for i, doc in enumerate(docs)
            )


    retriever_tool = RetrieverTool(docs_processed)

    # Using an Ori inference endpoint
    model = OpenAIServerModel(
        model_id="<model-id>",  # placeholder: replace with the model served by your endpoint
        api_base="https://for-smol-agent.inference.ogc.ori.co/openai/v1/",
        api_key=os.environ["ACCESS_TOKEN"],
    )

    # If using an HfApiModel instead:
    # from smolagents import HfApiModel
    # model = HfApiModel()

    agent = CodeAgent(
        tools=[retriever_tool],
        model=model,
        max_steps=4,
        verbosity_level=2,
    )

    agent_output = agent.run("What to do if your throat is paining?")

    print("Final output:")
    print(agent_output)

    On running with the question “What to do if your throat is paining?”, we get the following:

    Output
    Final output:
    Okay, so I'm trying to figure out what to do if my throat is hurting. The user wants a concise answer, so I need to be efficient but still helpful.

    First, I recall that when your throat gets bad, it's important to assess the situation. Maybe there's an infection or something else causing it. But without symptoms like fever or a runny nose, it's hard to say for sure.

    Hydration seems like a good first step. Staying hydrated can help ease discomfort and prevent dehydration. I'll mention drinking fluids, maybe water or herbal teas.

    If the pain persists, I should advise consulting a healthcare professional. They can give a proper diagnosis and suggest treatments like antibiotics or over-the-counter remedies if necessary.

    Using a saline solution might help clear the throat, so that's another step to include. Maybe just a quick gargle with some saltwater.

    Keeping the environment clean is also important. Wiping down surfaces could reduce any irritation or allergens that might be causing the pain.

    I should keep each step brief and clear, without being too technical. The user wants a straightforward guide, so each tip should stand alone. I'll structure it in a numbered list for easy reading.

    I need to make sure not to repeat any previous errors, like using functions that aren't available, so I'll stick to what's possible with the tools at hand.

    Putting it all together: assess the situation, stay hydrated, consult a professional, gargle with saline, and keep the area clean. That should cover the main actions needed.
    </think>

    To address the throbbing pain in your throat and provide a structured response based on typical health advice, here are the key steps to take:

    1. **Assess the Situation**: Evaluate if the pain is due to an infection, allergies, or something else. Consider any additional symptoms like fever or runny nose.

    2. **Hydration**: Drink fluids to stay hydrated. Simple options include water or herbal teas to alleviate discomfort.

    3. **Consult a Healthcare Professional**: If the pain persists or worsens, seek medical advice. They may perform a proper diagnosis and recommend appropriate treatments, such as antibiotics or over-the-counter remedies.

    4. **Remove Irritants**: To help alleviate the throat pain, you can gargle with a saline solution or mouthwash. Avoid irritants like alcohol, smoking, and certain foods or drinks.

    5. **Maintain Hygiene**: Clean frequently touched surfaces to reduce irritation and avoid potential irritants that might be contributing to the symptom.

    By following these steps, you can better manage the throbbing pain and address the underlying cause if necessary.

    Approach #2: Using a Vector Database

    For a more semantic approach, we’ll create embeddings using all-MiniLM-L6-v2, a sentence-transformers model from Hugging Face, and store them in a Chroma vector database for the RAG pipeline. This method captures contextual meaning better than simple keyword matching.
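The idea behind vector retrieval can be sketched without any model: each chunk becomes a vector, and the chunks whose vectors point in the most similar direction to the query vector are returned. The three-dimensional vectors below are hand-made toy values rather than real all-MiniLM-L6-v2 embeddings, and `similarity_search` here is a stand-in written for this sketch, not the Chroma method of the same name.

```python
import math
from typing import Dict, List, Tuple


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def similarity_search(
    query_vector: List[float], index: Dict[str, List[float]], k: int = 2
) -> List[Tuple[str, float]]:
    """Return the k texts whose vectors are most similar to the query vector."""
    ranked = sorted(
        ((text, cosine_similarity(query_vector, vector)) for text, vector in index.items()),
        key=lambda pair: pair[1],
        reverse=True,
    )
    return ranked[:k]


# Toy index: in the real pipeline these vectors come from the embedding model
toy_index = {
    "gargle with warm salty water": [0.9, 0.1, 0.0],
    "drink plenty of fluids": [0.8, 0.3, 0.1],
    "treating acne scars": [0.0, 0.2, 0.9],
}
toy_query_vector = [1.0, 0.2, 0.0]  # stands in for an embedded throat-pain query
top_matches = similarity_search(toy_query_vector, toy_index)
```

Because the match is made on vector direction rather than shared words, a chunk can rank highly even when it has no terms in common with the query.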

    Note: In production, consider using more advanced vector databases such as Pinecone or Weaviate for scalability and additional features.

    Python
    # Additional dependencies for this approach:
    # pip install langchain-chroma langchain-huggingface transformers tqdm
    import csv
    import io
    import os

    import s3fs
    from langchain.docstore.document import Document
    from langchain.text_splitter import RecursiveCharacterTextSplitter
    from langchain_chroma import Chroma
    from langchain_huggingface import HuggingFaceEmbeddings
    from smolagents import CodeAgent, OpenAIServerModel, Tool
    from tqdm import tqdm
    from transformers import AutoTokenizer

    EMBEDDING_MODEL = "sentence-transformers/all-MiniLM-L6-v2"

    # S3 credentials and endpoint are read from environment variables
    key = os.environ["ACCESS_KEY_ID"]
    secret = os.environ["SECRET_ACCESS_KEY"]
    endpoint_url = os.environ["ENDPOINT_URL"]

    fs = s3fs.S3FileSystem(
        key=key,
        secret=secret,
        endpoint_url=endpoint_url,
        config_kwargs={
            "region_name": "eu-central-003",
            "signature_version": "s3v4",
        },
    )

    # Load the scraped dataset from the bucket
    with fs.open("nhs-dataset/nhs_conditions_dataset.csv", "rb") as f:
        decoded_content = f.read().decode("utf-8")

    # Parse from a text stream so line breaks inside quoted Markdown fields survive
    reader = csv.DictReader(io.StringIO(decoded_content))
    docs = list(reader)

    source_docs = [
        Document(page_content=doc["Markdown"], metadata={"URL": doc["URL"].split("/")[-2]})
        for doc in docs
    ]

    # Measure chunk sizes in tokens, using the embedding model's own tokenizer
    text_splitter = RecursiveCharacterTextSplitter.from_huggingface_tokenizer(
        AutoTokenizer.from_pretrained(EMBEDDING_MODEL),
        chunk_size=200,
        chunk_overlap=20,
        add_start_index=True,
        strip_whitespace=True,
        separators=["\n\n", "\n", ".", " ", ""],
    )

    # Split docs and keep only unique chunks
    print("Splitting documents...")
    docs_processed = []
    unique_texts = set()
    for doc in tqdm(source_docs):
        for new_doc in text_splitter.split_documents([doc]):
            if new_doc.page_content not in unique_texts:
                unique_texts.add(new_doc.page_content)
                docs_processed.append(new_doc)

    print("Embedding documents...")

    # Initialize embeddings and the Chroma vector store
    embeddings = HuggingFaceEmbeddings(model_name=EMBEDDING_MODEL)
    vector_store = Chroma.from_documents(docs_processed, embeddings, persist_directory="./chroma_db")


    class RetrieverTool(Tool):
        name = "retriever"
        description = (
            "Uses semantic search to retrieve the parts of the NHS health conditions content that could be most relevant to answer your query."
        )
        inputs = {
            "query": {
                "type": "string",
                "description": "The query to perform. This should be semantically close to your target documents. Use the affirmative form rather than a question.",
            }
        }
        output_type = "string"

        def __init__(self, vector_store, **kwargs):
            super().__init__(**kwargs)
            self.vector_store = vector_store

        def forward(self, query: str) -> str:
            assert isinstance(query, str), "Your search query must be a string"
            # Return the 10 chunks closest to the query in embedding space
            docs = self.vector_store.similarity_search(query, k=10)
            return "\nRetrieved documents:\n" + "".join(
                f"\n\n===== Document {i} =====\n" + doc.page_content for i, doc in enumerate(docs)
            )


    retriever_tool = RetrieverTool(vector_store)

    # Using an Ori inference endpoint
    model = OpenAIServerModel(
        model_id="<model-id>",  # placeholder: replace with the model served by your endpoint
        api_base="https://for-smol-agent.inference.ogc.ori.co/openai/v1/",
        api_key=os.environ["ACCESS_TOKEN"],
    )

    # If using an HfApiModel instead:
    # from smolagents import HfApiModel
    # model = HfApiModel(model_id="Qwen/Qwen2.5-Coder-32B-Instruct")

    agent = CodeAgent(
        tools=[retriever_tool],
        model=model,
        max_steps=4,
        verbosity_level=2,
    )

    agent_output = agent.run("What to do if your throat is paining?")

    print("Final output:")
    print(agent_output)

    This generates the following output:

    Output
    Final output:
    ['Get plenty of rest', 
    'Drink plenty of fluids', 
    'Take painkillers like paracetamol or ibuprofen', 
    'Try adding honey to a warm drink to soothe your throat', 
    'Gargle with warm salty water (not for children)',
    'Cover your mouth and nose when coughing or sneezing', 
    'Wash your hands regularly']

    Comparing the responses

    1. BM25-based retrieval: The output generated with the BM25 retriever is more closely aligned with the original NHS content. Because BM25 retrieves exact or near-exact text matches from the source documents, it surfaces text that closely follows the official NHS wording and structure, including formal “red flag” advice.
    2. Vector database: Provides broader coverage of symptomatic care. The embeddings-based approach captures a wider spread of advice across multiple documents (e.g., mentions of humidifiers and more pediatric nuances) because vector-based similarity groups semantically related content. However, it may occasionally miss important keywords or phrases if they are not “close” in the embedding space.

    How can we enhance model output further?

    1. Improve Prompt Engineering: For the LLM part of your RAG pipeline, refine your prompt to:
      • Specifically request both self-care advice (hydration, rest, etc.) and guidance on when to seek professional help (urgent symptoms, red flags).
      • Instruct the model to include disclaimers or relevant official guidance.

    2. Use More Domain-Specific Models
      • Embeddings from domain-specific models (e.g., BioClinicalBERT, PubMedBERT) might capture medical nuances better.
      • Similarly, a more capable LLM with domain fine-tuning might produce more coherent, comprehensive answers.
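As one way to act on the prompt-engineering suggestion, the question can be wrapped in explicit instructions before it is passed to the agent. The wording below and the helper name `build_health_query` are assumptions made for this sketch; tune the instructions to your own use case.

```python
def build_health_query(question: str) -> str:
    """Wrap a user question with instructions on what the answer should cover."""
    return (
        f"{question}\n\n"
        "Using only the retrieved NHS content, structure the answer as:\n"
        "1. Self-care advice (for example hydration and rest).\n"
        "2. When to seek professional help (urgent symptoms and red flags).\n"
        "3. Any relevant official guidance or disclaimers.\n"
    )


# Example usage with the agent built earlier in this post:
# agent_output = agent.run(build_health_query("What to do if your throat is paining?"))
```

Keeping the instructions in one helper makes it easy to run the same question through both retrieval approaches and compare the answers on equal terms.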

    In this blog, we’ve built an agentic RAG system that scrapes NHS Health A–Z data, stores it on S3, and uses two retrieval methods, BM25 for keyword matching and a vector database for semantic search, integrated with Hugging Face’s smolagents library. This framework combines retrieval and generation to build domain-specific intelligent agents and can be adapted for various applications. Possible enhancements include fine-tuning, user feedback integration, and more scalable vector storage.
