Retrieval Augmented Generation Chatbot

Overview

A complete RAG (Retrieval Augmented Generation) chatbot that answers cooking questions using a recipe collection from GitHub. The system ingests markdown recipe files, splits them into chunks, generates embeddings, and stores them in a vector database; it then provides conversational search with context-aware responses, using memory to maintain conversation history.

Architecture

flowchart TD
    subgraph APP ["📱 recipe_rag_chatbot"]
        direction TB

    subgraph FLOW_0 ["🔄 recipe_chat"]
        direction LR
        FLOW_0_START@{shape: circle, label: "▶️ Start"}
        FLOW_0_S0@{shape: rect, label: "⚙️ extract_question"}
        FLOW_0_S1@{shape: cyl, label: "🔎 search_recipes"}
        FLOW_0_S2@{shape: doc, label: "📄 build_context_prompt"}
        FLOW_0_S3@{shape: rounded, label: "✨ generate_response"}
        FLOW_0_START -->|user_message| FLOW_0_S0
        FLOW_0_S0 -->|user_question| FLOW_0_S1
        FLOW_0_S1 -->|search_results| FLOW_0_S2
        FLOW_0_S0 -->|user_question| FLOW_0_S2
        FLOW_0_S2 -->|context_prompt| FLOW_0_S3
    end

    subgraph FLOW_1 ["🔄 recipe_ingestion"]
        direction TB
        FLOW_1_S0@{shape: rect, label: "⚙️ load_recipes"}
        FLOW_1_S1@{shape: rect, label: "⚙️ split_recipes"}
        FLOW_1_S2@{shape: rect, label: "⚙️ embed_chunks"}
        FLOW_1_S3@{shape: rect, label: "💾 index_recipes"}
        FLOW_1_S0 -->|recipe_document| FLOW_1_S1
        FLOW_1_S1 -->|recipe_chunk| FLOW_1_S2
        FLOW_1_S2 -->|embedded_chunk| FLOW_1_S3
    end

    subgraph RESOURCES ["🔧 Shared Resources"]
        direction LR
        AUTH_AWS_AUTH@{shape: hex, label: "🔐 aws_auth (AWS)"}
        MODEL_CLAUDE_SONNET@{shape: rounded, label: "✨ claude_sonnet (aws-bedrock)" }
        MODEL_CLAUDE_SONNET -.->|uses| AUTH_AWS_AUTH
        MODEL_TITAN_EMBED@{shape: rounded, label: "✨ titan_embed (aws-bedrock)" }
        MODEL_TITAN_EMBED -.->|uses| AUTH_AWS_AUTH
        INDEX_RECIPE_INDEX@{shape: cyl, label: "🗃️ recipe_index"}
        EMB_TITAN_EMBED@{shape: rounded, label: "🎯 titan_embed"}
        INDEX_RECIPE_INDEX -.->|embeds| EMB_TITAN_EMBED
        MEM_RECIPE_CHAT_MEMORY@{shape: win-pane, label: "🧠 recipe_chat_memory (10KT)"}
    end

    end

    FLOW_0_S1 -.-> INDEX_RECIPE_INDEX
    FLOW_0_S3 -.->|uses| MODEL_CLAUDE_SONNET
    FLOW_0_S3 -.->|stores| MEM_RECIPE_CHAT_MEMORY
    FLOW_1_S3 -.->|writes| INDEX_RECIPE_INDEX

    %% Styling
    classDef appBox fill:none,stroke:#495057,stroke-width:3px
    classDef flowBox fill:#e1f5fe,stroke:#0277bd,stroke-width:2px
    classDef llmNode fill:#f3e5f5,stroke:#7b1fa2,stroke-width:2px
    classDef modelNode fill:#e8f5e8,stroke:#2e7d32,stroke-width:2px
    classDef authNode fill:#fff3e0,stroke:#ef6c00,stroke-width:2px
    classDef telemetryNode fill:#fce4ec,stroke:#c2185b,stroke-width:2px
    classDef resourceBox fill:#f5f5f5,stroke:#616161,stroke-width:1px

    class APP appBox
    class FLOW_0,FLOW_1 flowBox
    class RESOURCES resourceBox

Complete Code

id: recipe_rag_chatbot
description: |
  RAG chatbot for the Chowdown recipe collection from GitHub.

  This application provides two flows:

  1. recipe_chat: Conversational chatbot that answers questions about recipes
     - Uses RAG to find relevant recipes based on user questions
     - Maintains conversation history with memory
     - Provides cooking advice, recipe recommendations, and ingredient information

  2. recipe_ingestion: Ingests recipe markdown files into vector database
     - Clones/fetches recipes from GitHub (clarklab/chowdown)
     - Splits recipe documents into searchable chunks
     - Generates embeddings using AWS Bedrock Titan
     - Stores in Qdrant vector database for fast similarity search

  Prerequisites:
  - AWS credentials configured (AWS_PROFILE environment variable)
  - Qdrant running locally on port 6333 (or update args for Qdrant Cloud)
  - Clone the recipe repo: git clone https://github.com/clarklab/chowdown.git

  To ingest recipes:
    qtype run recipe_chatbot.qtype.yaml --flow recipe_ingestion

  To start chatbot:
    qtype serve recipe_chatbot.qtype.yaml --flow recipe_chat

# AWS Authentication for Bedrock
auths:
  - type: aws
    id: aws_auth
    profile_name: ${AWS_PROFILE}

# Models
models:
  # Embedding model for vectorizing recipes and queries
  - type: EmbeddingModel
    id: titan_embed
    provider: aws-bedrock
    model_id: amazon.titan-embed-text-v2:0
    dimensions: 1024
    auth: aws_auth

  # Chat model for conversational responses
  - type: Model
    id: claude_sonnet
    provider: aws-bedrock
    model_id: us.anthropic.claude-3-5-sonnet-20241022-v2:0
    inference_params:
      temperature: 0.7
      max_tokens: 4096
    auth: aws_auth

# Vector index for recipe embeddings
indexes:
  - type: VectorIndex
    module: llama_index.vector_stores.qdrant.QdrantVectorStore
    id: recipe_index
    name: chowdown_recipes
    embedding_model: titan_embed
    args:
      collection_name: chowdown_recipes
      url: http://localhost:6333
      api_key: ""  # Empty for local Qdrant

# Memory for maintaining conversation context
memories:
  - id: recipe_chat_memory
    token_limit: 10000
    chat_history_token_ratio: 0.7

# Flows
flows:
  # Conversational chatbot flow
  - type: Flow
    id: recipe_chat
    description: Chat with the recipe collection using RAG

    interface:
      type: Conversational

    variables:
      - id: user_message
        type: ChatMessage
      - id: user_question
        type: text
      - id: search_results
        type: list[RAGSearchResult]
      - id: context_prompt
        type: text
      - id: assistant_response
        type: ChatMessage

    inputs:
      - user_message

    outputs:
      - assistant_response

    steps:
      # Extract text from user's chat message
      - id: extract_question
        type: FieldExtractor
        json_path: "$.blocks[?(@.type == 'text')].content"
        inputs:
          - user_message
        outputs:
          - user_question

      # Search recipe vector index for relevant recipes
      - id: search_recipes
        type: VectorSearch
        index: recipe_index
        default_top_k: 5
        inputs:
          - user_question
        outputs:
          - search_results

      # Build prompt with recipe context
      - id: build_context_prompt
        type: PromptTemplate
        template: |
          You are a helpful cooking assistant with access to a collection of recipes from Chowdown.

          Here are the most relevant recipes based on the user's question:

          {search_results}

          User question: {user_question}

          Please provide a helpful answer based on the recipes above. If you're suggesting a recipe, 
          include key ingredients and brief cooking instructions. If the recipes don't contain 
          relevant information, politely say so and offer general cooking advice if appropriate.
        inputs:
          - search_results
          - user_question
        outputs:
          - context_prompt

      # Generate conversational response using LLM with memory
      - id: generate_response
        type: LLMInference
        model: claude_sonnet
        memory: recipe_chat_memory
        system_message: |
          You are a friendly and knowledgeable cooking assistant. You help users find recipes, 
          answer questions about ingredients, suggest substitutions, and provide cooking tips.
          Base your answers on the provided recipe context, but feel free to add general 
          cooking knowledge when helpful. Be conversational and enthusiastic about food!
        inputs:
          - context_prompt
        outputs:
          - assistant_response

  # Recipe ingestion flow
  - type: Flow
    id: recipe_ingestion
    description: Load recipes from local GitHub clone, chunk, embed, and index

    variables:
      - id: recipe_document
        type: RAGDocument
      - id: recipe_chunk
        type: RAGChunk
      - id: embedded_chunk
        type: RAGChunk

    outputs:
      - embedded_chunk

    steps:
      # Load recipe markdown files from local clone
      - id: load_recipes
        type: DocumentSource
        reader_module: llama_index.core.SimpleDirectoryReader
        args:
          input_dir: "./chowdown/_recipes"
          recursive: false
          required_exts: [".md"]
        outputs:
          - recipe_document

      # Split recipes into chunks for better retrieval
      - id: split_recipes
        type: DocumentSplitter
        splitter_name: "SentenceSplitter"
        chunk_size: 512
        chunk_overlap: 50
        inputs:
          - recipe_document
        outputs:
          - recipe_chunk

      # Generate embeddings for each chunk
      - id: embed_chunks
        type: DocumentEmbedder
        model: titan_embed
        concurrency_config:
          num_workers: 5
        inputs:
          - recipe_chunk
        outputs:
          - embedded_chunk

      # Store embedded chunks in Qdrant
      - id: index_recipes
        type: IndexUpsert
        index: recipe_index
        batch_config:
          batch_size: 25
        inputs:
          - embedded_chunk
        outputs:
          - embedded_chunk

Running the Example

Prerequisites

Start Qdrant vector database locally:

docker run -p 6333:6333 qdrant/qdrant
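
To persist the collection across container restarts, you can also mount a local volume (the host directory name here is arbitrary):

docker run -p 6333:6333 -v "$(pwd)/qdrant_storage:/qdrant/storage" qdrant/qdrant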

Clone the recipe repository:

git clone https://github.com/clarklab/chowdown.git

Ingest Recipe Documents

Run the ingestion flow to populate the vector index:

AWS_PROFILE=my_profile qtype run examples/rag/recipe_chatbot.qtype.yaml --flow recipe_ingestion --progress

This will:

1. Load all markdown files from chowdown/_recipes/
2. Split them into 512-token chunks with 50-token overlap
3. Generate embeddings using AWS Bedrock Titan
4. Store vectors in the Qdrant collection chowdown_recipes
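
For reference, the first two steps correspond roughly to the following LlamaIndex calls. This is a sketch, assuming llama-index-core is installed; qtype wires the equivalent up from the YAML above:

from llama_index.core import SimpleDirectoryReader
from llama_index.core.node_parser import SentenceSplitter

# Load the recipe markdown files, mirroring the load_recipes step.
documents = SimpleDirectoryReader(
    input_dir="./chowdown/_recipes",
    recursive=False,
    required_exts=[".md"],
).load_data()

# Split into 512-token chunks with 50 tokens of overlap, as in split_recipes.
splitter = SentenceSplitter(chunk_size=512, chunk_overlap=50)
nodes = splitter.get_nodes_from_documents(documents)
print(f"{len(documents)} documents -> {len(nodes)} chunks")

The run below produced 34 documents and 39 chunks.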

You should see output similar to:

2026-02-04 06:38:06,222 - qtype.commands.run - INFO - Running flow from recipe_chatbot.qtype.yaml
2026-02-04 06:38:06,315 - qtype.commands.run - INFO - Executing flow recipe_ingestion from recipe_chatbot.qtype.yaml
/Users/lou.kratz/repos/qtype-cicd-fix/.venv/lib/python3.13/site-packages/llama_index/vector_stores/qdrant/base.py:238: UserWarning: Api key is used with an insecure connection.
  self._client = qdrant_client.QdrantClient(
/Users/lou.kratz/repos/qtype-cicd-fix/.venv/lib/python3.13/site-packages/llama_index/vector_stores/qdrant/base.py:241: UserWarning: Api key is used with an insecure connection.
  self._aclient = qdrant_client.AsyncQdrantClient(
╭─────────────────────────────────────────────── Flow Progress ────────────────────────────────────────────────╮
│                                                                                                              │
│  Step load_recipes   12.9 msg/s ▁▁▂▄▄▅▅▅▅▄▆▆▆▇▇█▇▇… ✔ 34 succeeded ✖ 0 errors ⟳ - hits ✗ - misses 0:00:02    │
│  Step split_recipes  14.9 msg/s ▁▁▁▃▂▅▅▅▆▅▆▆▇▇▇█▇▇… ✔ 39 succeeded ✖ 0 errors ⟳ - hits ✗ - misses 0:00:02    │
│  Step embed_chunks   18.7 msg/s ██▃▃▁▂▂▁▂▁▁▁▁▁▁▁▁▁… ✔ 39 succeeded ✖ 0 errors ⟳ - hits ✗ - misses 0:00:02    │
│  Step index_recipes  47.0 msg/s ████████▁           ✔ 39 succeeded ✖ 0 errors ⟳ - hits ✗ - misses 0:00:00    │
│                                                                                                              │
╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
2026-02-04 06:38:11,141 - qtype.commands.run - INFO - ✅ Flow execution completed successfully
2026-02-04 06:38:11,141 - qtype.commands.run - INFO - Processed 39 rows
2026-02-04 06:38:11,141 - qtype.commands.run - INFO - 
Results summary: 39 rows, 1 columns: ['embedded_chunk']
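
To confirm the vectors landed, you can inspect the collection directly. A quick check, assuming qdrant-client is installed:

from qdrant_client import QdrantClient

client = QdrantClient(url="http://localhost:6333")
info = client.get_collection("chowdown_recipes")
print(info.points_count)  # should match the 39 rows processed above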

Start the Chatbot

Launch the conversational UI:

AWS_PROFILE=my_profile qtype serve examples/rag/recipe_chatbot.qtype.yaml --flow recipe_chat

Then open http://localhost:8000 and ask questions like:

  • "What dessert recipes do you have?"
  • "What can I make with chicken?"
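
Under the hood, each question follows the same embed-then-search path used at ingestion: the VectorSearch step embeds the question with titan_embed and queries the chowdown_recipes collection. The sketch below reproduces that query path by hand, assuming boto3 and qdrant-client are installed; the payload field name is an assumption based on how the LlamaIndex Qdrant integration stores document metadata:

import json

import boto3
from qdrant_client import QdrantClient

# Embed the question with the same Titan model used at ingestion time.
# Credentials and region come from AWS_PROFILE, as in the commands above.
bedrock = boto3.client("bedrock-runtime")
response = bedrock.invoke_model(
    modelId="amazon.titan-embed-text-v2:0",
    body=json.dumps({
        "inputText": "What dessert recipes do you have?",
        "dimensions": 1024,
    }),
)
vector = json.loads(response["body"].read())["embedding"]

# Query the collection the ingestion flow populated (top 5, like default_top_k).
client = QdrantClient(url="http://localhost:6333")
hits = client.search(
    collection_name="chowdown_recipes",
    query_vector=vector,
    limit=5,
)
for hit in hits:
    # file_name is assumed to carry over from SimpleDirectoryReader metadata.
    print(hit.score, hit.payload.get("file_name"))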

A screenshot of the UI showing a user asking for a healthy recipe and the AI responding with a bean sprout stir fry.

Key Features

  • Conversational Interface: Flow interface type that accumulates messages in conversation_history for stateful multi-turn chat
  • Memory: Conversation buffer with token_limit (10,000) and chat_history_token_ratio (0.7) that auto-flushes oldest messages when limit exceeded
  • DocumentSource: Loads markdown files via LlamaIndex SimpleDirectoryReader with required_exts file filter
  • DocumentSplitter: Splits documents with SentenceSplitter using chunk_size (512) and chunk_overlap (50) parameters
  • DocumentEmbedder: Generates embeddings with AWS Bedrock Titan, processes chunks concurrently via num_workers (5)
  • VectorIndex: Qdrant vector store with embedding_model reference and dimensions (1024)
  • IndexUpsert: Writes to vector index in batches via batch_size (25)
  • VectorSearch: Semantic search with default_top_k (5) returns chunks by embedding distance
  • FieldExtractor: Extracts text from ChatMessage using JSONPath $.blocks[?(@.type == 'text')].content (see the sketch after this list)
  • PromptTemplate: Injects search results and query into template string for LLM context
  • LLMInference: Calls model with system_message and memory reference for conversation history
  • RAGDocument: Domain type with content, file_id, file_name, metadata fields
  • RAGChunk: Domain type with content, chunk_id, document_id, vector fields
  • RAGSearchResult: Domain type with content (RAGChunk), doc_id, score fields
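
As a concrete example of the FieldExtractor above: its JSONPath filter is equivalent to a plain-Python comprehension. The message shape below is illustrative, not the exact ChatMessage schema:

# Illustrative payload; the real ChatMessage schema may carry more fields.
message = {
    "role": "user",
    "blocks": [
        {"type": "text", "content": "What can I make with chicken?"},
        {"type": "image", "url": "https://example.com/pan.jpg"},
    ],
}

# Equivalent of $.blocks[?(@.type == 'text')].content
texts = [b["content"] for b in message["blocks"] if b.get("type") == "text"]
print(texts)  # ['What can I make with chicken?']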

Learn More