Retrieval Augmented Generation Chatbot¶
Overview¶
A complete RAG (Retrieval Augmented Generation) chatbot that answers cooking questions using a recipe collection from GitHub. The system ingests markdown recipe files, splits them into chunks, generates embeddings, and stores them in a vector database; it then answers questions conversationally, retrieving relevant recipes as context and using memory to maintain conversation history.
Architecture¶
```mermaid
flowchart TD
    subgraph APP ["📱 recipe_rag_chatbot"]
        direction TB
        subgraph FLOW_0 ["🔄 recipe_chat"]
            direction LR
            FLOW_0_START@{shape: circle, label: "▶️ Start"}
            FLOW_0_S0@{shape: rect, label: "⚙️ extract_question"}
            FLOW_0_S1@{shape: cyl, label: "🔎 search_recipes"}
            FLOW_0_S2@{shape: doc, label: "📄 build_context_prompt"}
            FLOW_0_S3@{shape: rounded, label: "✨ generate_response"}
            FLOW_0_START -->|user_message| FLOW_0_S0
            FLOW_0_S0 -->|user_question| FLOW_0_S1
            FLOW_0_S1 -->|search_results| FLOW_0_S2
            FLOW_0_S0 -->|user_question| FLOW_0_S2
            FLOW_0_S2 -->|context_prompt| FLOW_0_S3
        end
        subgraph FLOW_1 ["🔄 recipe_ingestion"]
            direction TB
            FLOW_1_S0@{shape: rect, label: "⚙️ load_recipes"}
            FLOW_1_S1@{shape: rect, label: "⚙️ split_recipes"}
            FLOW_1_S2@{shape: rect, label: "⚙️ embed_chunks"}
            FLOW_1_S3@{shape: rect, label: "💾 index_recipes"}
            FLOW_1_S0 -->|recipe_document| FLOW_1_S1
            FLOW_1_S1 -->|recipe_chunk| FLOW_1_S2
            FLOW_1_S2 -->|embedded_chunk| FLOW_1_S3
        end
        subgraph RESOURCES ["🔧 Shared Resources"]
            direction LR
            AUTH_AWS_AUTH@{shape: hex, label: "🔐 aws_auth (AWS)"}
            MODEL_CLAUDE_SONNET@{shape: rounded, label: "✨ claude_sonnet (aws-bedrock)"}
            MODEL_CLAUDE_SONNET -.->|uses| AUTH_AWS_AUTH
            MODEL_TITAN_EMBED@{shape: rounded, label: "✨ titan_embed (aws-bedrock)"}
            MODEL_TITAN_EMBED -.->|uses| AUTH_AWS_AUTH
            INDEX_RECIPE_INDEX@{shape: cyl, label: "🗃️ recipe_index"}
            EMB_TITAN_EMBED@{shape: rounded, label: "🎯 titan_embed"}
            INDEX_RECIPE_INDEX -.->|embeds| EMB_TITAN_EMBED
            MEM_RECIPE_CHAT_MEMORY@{shape: win-pane, label: "🧠 recipe_chat_memory (10KT)"}
        end
    end
    FLOW_0_S1 -.-> INDEX_RECIPE_INDEX
    FLOW_0_S3 -.->|uses| MODEL_CLAUDE_SONNET
    FLOW_0_S3 -.->|stores| MEM_RECIPE_CHAT_MEMORY
    FLOW_1_S3 -.->|writes| INDEX_RECIPE_INDEX

    %% Styling
    classDef appBox fill:none,stroke:#495057,stroke-width:3px
    classDef flowBox fill:#e1f5fe,stroke:#0277bd,stroke-width:2px
    classDef llmNode fill:#f3e5f5,stroke:#7b1fa2,stroke-width:2px
    classDef modelNode fill:#e8f5e8,stroke:#2e7d32,stroke-width:2px
    classDef authNode fill:#fff3e0,stroke:#ef6c00,stroke-width:2px
    classDef telemetryNode fill:#fce4ec,stroke:#c2185b,stroke-width:2px
    classDef resourceBox fill:#f5f5f5,stroke:#616161,stroke-width:1px
    class APP appBox
    class FLOW_0,FLOW_1 flowBox
    class RESOURCES resourceBox
```
Complete Code¶
```yaml
id: recipe_rag_chatbot
description: |
  RAG chatbot for the Chowdown recipe collection from GitHub.

  This application provides two flows:

  1. recipe_chat: Conversational chatbot that answers questions about recipes
     - Uses RAG to find relevant recipes based on user questions
     - Maintains conversation history with memory
     - Provides cooking advice, recipe recommendations, and ingredient information

  2. recipe_ingestion: Ingests recipe markdown files into vector database
     - Clones/fetches recipes from GitHub (clarklab/chowdown)
     - Splits recipe documents into searchable chunks
     - Generates embeddings using AWS Bedrock Titan
     - Stores in Qdrant vector database for fast similarity search

  Prerequisites:
  - AWS credentials configured (AWS_PROFILE environment variable)
  - Qdrant running locally on port 6333 (or update args for Qdrant Cloud)
  - Clone the recipe repo: git clone https://github.com/clarklab/chowdown.git

  To ingest recipes:
    qtype run recipe_chatbot.qtype.yaml --flow recipe_ingestion

  To start chatbot:
    qtype serve recipe_chatbot.qtype.yaml --flow recipe_chat

# AWS Authentication for Bedrock
auths:
  - type: aws
    id: aws_auth
    profile_name: ${AWS_PROFILE}

# Models
models:
  # Embedding model for vectorizing recipes and queries
  - type: EmbeddingModel
    id: titan_embed
    provider: aws-bedrock
    model_id: amazon.titan-embed-text-v2:0
    dimensions: 1024
    auth: aws_auth

  # Chat model for conversational responses
  - type: Model
    id: claude_sonnet
    provider: aws-bedrock
    model_id: us.anthropic.claude-3-5-sonnet-20241022-v2:0
    inference_params:
      temperature: 0.7
      max_tokens: 4096
    auth: aws_auth

# Vector index for recipe embeddings
indexes:
  - type: VectorIndex
    module: llama_index.vector_stores.qdrant.QdrantVectorStore
    id: recipe_index
    name: chowdown_recipes
    embedding_model: titan_embed
    args:
      collection_name: chowdown_recipes
      url: http://localhost:6333
      api_key: "" # Empty for local Qdrant

# Memory for maintaining conversation context
memories:
  - id: recipe_chat_memory
    token_limit: 10000
    chat_history_token_ratio: 0.7

# Flows
flows:
  # Conversational chatbot flow
  - type: Flow
    id: recipe_chat
    description: Chat with the recipe collection using RAG
    interface:
      type: Conversational
    variables:
      - id: user_message
        type: ChatMessage
      - id: user_question
        type: text
      - id: search_results
        type: list[RAGSearchResult]
      - id: context_prompt
        type: text
      - id: assistant_response
        type: ChatMessage
    inputs:
      - user_message
    outputs:
      - assistant_response
    steps:
      # Extract text from user's chat message
      - id: extract_question
        type: FieldExtractor
        json_path: "$.blocks[?(@.type == 'text')].content"
        inputs:
          - user_message
        outputs:
          - user_question

      # Search recipe vector index for relevant recipes
      - id: search_recipes
        type: VectorSearch
        index: recipe_index
        default_top_k: 5
        inputs:
          - user_question
        outputs:
          - search_results

      # Build prompt with recipe context
      - id: build_context_prompt
        type: PromptTemplate
        template: |
          You are a helpful cooking assistant with access to a collection of recipes from Chowdown.

          Here are the most relevant recipes based on the user's question:

          {search_results}

          User question: {user_question}

          Please provide a helpful answer based on the recipes above. If you're suggesting a recipe,
          include key ingredients and brief cooking instructions. If the recipes don't contain
          relevant information, politely say so and offer general cooking advice if appropriate.
        inputs:
          - search_results
          - user_question
        outputs:
          - context_prompt

      # Generate conversational response using LLM with memory
      - id: generate_response
        type: LLMInference
        model: claude_sonnet
        memory: recipe_chat_memory
        system_message: |
          You are a friendly and knowledgeable cooking assistant. You help users find recipes,
          answer questions about ingredients, suggest substitutions, and provide cooking tips.
          Base your answers on the provided recipe context, but feel free to add general
          cooking knowledge when helpful. Be conversational and enthusiastic about food!
        inputs:
          - context_prompt
        outputs:
          - assistant_response

  # Recipe ingestion flow
  - type: Flow
    id: recipe_ingestion
    description: Load recipes from local GitHub clone, chunk, embed, and index
    variables:
      - id: recipe_document
        type: RAGDocument
      - id: recipe_chunk
        type: RAGChunk
      - id: embedded_chunk
        type: RAGChunk
    outputs:
      - embedded_chunk
    steps:
      # Load recipe markdown files from local clone
      - id: load_recipes
        type: DocumentSource
        reader_module: llama_index.core.SimpleDirectoryReader
        args:
          input_dir: "./chowdown/_recipes"
          recursive: false
          required_exts: [".md"]
        outputs:
          - recipe_document

      # Split recipes into chunks for better retrieval
      - id: split_recipes
        type: DocumentSplitter
        splitter_name: "SentenceSplitter"
        chunk_size: 512
        chunk_overlap: 50
        inputs:
          - recipe_document
        outputs:
          - recipe_chunk

      # Generate embeddings for each chunk
      - id: embed_chunks
        type: DocumentEmbedder
        model: titan_embed
        concurrency_config:
          num_workers: 5
        inputs:
          - recipe_chunk
        outputs:
          - embedded_chunk

      # Store embedded chunks in Qdrant
      - id: index_recipes
        type: IndexUpsert
        index: recipe_index
        batch_config:
          batch_size: 25
        inputs:
          - embedded_chunk
        outputs:
          - embedded_chunk
```
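The `args` block above targets a local Qdrant instance. As the description notes, pointing the index at Qdrant Cloud is a matter of updating those args; a sketch with placeholder values (the cluster URL and `QDRANT_API_KEY` environment variable are hypothetical):

```yaml
args:
  collection_name: chowdown_recipes
  url: https://YOUR-CLUSTER.cloud.qdrant.io:6333
  api_key: ${QDRANT_API_KEY}
```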
Running the Example¶
Prerequisites¶
Start Qdrant vector database locally:
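For example, with the official Docker image (port 6333 matches the `url` in the index config above):

```bash
docker run -p 6333:6333 qdrant/qdrant
```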
Clone the recipe repository:
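```bash
git clone https://github.com/clarklab/chowdown.git
```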
Ingest Recipe Documents¶
Run the ingestion flow to populate the vector index:
```bash
AWS_PROFILE=my_profile qtype run examples/rag/recipe_chatbot.qtype.yaml --flow recipe_ingestion --progress
```
This will:

1. Load all markdown files from `chowdown/_recipes/`
2. Split them into 512-token chunks with 50-token overlap
3. Generate embeddings using AWS Bedrock Titan
4. Store vectors in the Qdrant collection `chowdown_recipes`
You should see output similar to:

```
2026-02-04 06:38:06,222 - qtype.commands.run - INFO - Running flow from recipe_chatbot.qtype.yaml
2026-02-04 06:38:06,315 - qtype.commands.run - INFO - Executing flow recipe_ingestion from recipe_chatbot.qtype.yaml
/Users/lou.kratz/repos/qtype-cicd-fix/.venv/lib/python3.13/site-packages/llama_index/vector_stores/qdrant/base.py:238: UserWarning: Api key is used with an insecure connection.
  self._client = qdrant_client.QdrantClient(
/Users/lou.kratz/repos/qtype-cicd-fix/.venv/lib/python3.13/site-packages/llama_index/vector_stores/qdrant/base.py:241: UserWarning: Api key is used with an insecure connection.
  self._aclient = qdrant_client.AsyncQdrantClient(
╭─────────────────────────────────────────────── Flow Progress ────────────────────────────────────────────────╮
│                                                                                                               │
│ Step load_recipes  12.9 msg/s ▁▁▂▄▄▅▅▅▅▄▆▆▆▇▇█▇▇… ✔ 34 succeeded ✖ 0 errors ⟳ - hits ✗ - misses 0:00:02       │
│ Step split_recipes 14.9 msg/s ▁▁▁▃▂▅▅▅▆▅▆▆▇▇▇█▇▇… ✔ 39 succeeded ✖ 0 errors ⟳ - hits ✗ - misses 0:00:02       │
│ Step embed_chunks  18.7 msg/s ██▃▃▁▂▂▁▂▁▁▁▁▁▁▁▁▁… ✔ 39 succeeded ✖ 0 errors ⟳ - hits ✗ - misses 0:00:02       │
│ Step index_recipes 47.0 msg/s ████████▁           ✔ 39 succeeded ✖ 0 errors ⟳ - hits ✗ - misses 0:00:00       │
│                                                                                                               │
╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
2026-02-04 06:38:11,141 - qtype.commands.run - INFO - ✅ Flow execution completed successfully
2026-02-04 06:38:11,141 - qtype.commands.run - INFO - Processed 39 rows
2026-02-04 06:38:11,141 - qtype.commands.run - INFO -
Results summary: 39 rows, 1 columns: ['embedded_chunk']
```
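To sanity-check the ingestion, you can ask Qdrant for the collection's stats over its REST API (a quick local check; the response includes a points count, though the exact shape varies by Qdrant version):

```bash
curl http://localhost:6333/collections/chowdown_recipes
```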
Start the Chatbot¶
Launch the conversational UI:
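```bash
AWS_PROFILE=my_profile qtype serve examples/rag/recipe_chatbot.qtype.yaml --flow recipe_chat
```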
Then open http://localhost:8000 and ask questions like:

- "What dessert recipes do you have?"
- "What can I make with chicken?"

Key Features¶
- Conversational Interface: Flow interface type that accumulates messages in `conversation_history` for stateful multi-turn chat
- Memory: Conversation buffer with `token_limit` (10,000) and `chat_history_token_ratio` (0.7) that auto-flushes the oldest messages when the limit is exceeded
- DocumentSource: Loads markdown files via LlamaIndex `SimpleDirectoryReader` with a `required_exts` file filter
- DocumentSplitter: Splits documents with `SentenceSplitter` using the `chunk_size` (512) and `chunk_overlap` (50) parameters
- DocumentEmbedder: Generates embeddings with AWS Bedrock Titan, processing chunks concurrently via `num_workers` (5)
- VectorIndex: Qdrant vector store with an `embedding_model` reference and `dimensions` (1024)
- IndexUpsert: Writes to the vector index in batches via `batch_size` (25)
- VectorSearch: Semantic search with `default_top_k` (5) that returns chunks ranked by embedding distance
- FieldExtractor: Extracts text from a ChatMessage using the JSONPath `$.blocks[?(@.type == 'text')].content` (see the example payload after this list)
- PromptTemplate: Injects the search results and user query into a template string for LLM context
- LLMInference: Calls the model with a `system_message` and a `memory` reference for conversation history
- RAGDocument: Domain type with `content`, `file_id`, `file_name`, and `metadata` fields
- RAGChunk: Domain type with `content`, `chunk_id`, `document_id`, and `vector` fields
- RAGSearchResult: Domain type with `content` (a RAGChunk), `doc_id`, and `score` fields
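For instance, the FieldExtractor's JSONPath selects the text blocks of an incoming ChatMessage. A hypothetical serialized message (this shape is inferred from the JSONPath itself, not taken from the qtype reference):

```json
{
  "blocks": [
    { "type": "text", "content": "What can I make with chicken?" }
  ]
}
```

Against this payload, `$.blocks[?(@.type == 'text')].content` yields "What can I make with chicken?", which flows downstream as `user_question`.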
Learn More¶
- Tutorial: Building a Stateful Chatbot
- How-To: Use Environment Variables
- How-To: Configure AWS Authentication