Build a Conversational Chatbot¶
Time: 20 minutes
Prerequisites: Tutorial 1: Build Your First QType Application
Example: hello_world_chat.qtype.yaml
What you'll learn: Add memory to your QType application and create a chatbot that remembers previous messages in the conversation.
What you'll build: A stateful chatbot that maintains conversation history and provides contextual responses.
Background: Stateless vs. Stateful Applications (3 minutes)¶
In Build Your First QType Application, you built a stateless application - it processed each question independently, with no memory of previous interactions:
You: What is 2+2?
AI: 4.
You: What about that times 3?
AI: I'm not sure what "that" refers to. ❌
Today you'll build a stateful chatbot that remembers the conversation:
You: What is 2+2?
AI: 4.
You: What about that times 3?
AI: 12. I multiplied the previous answer (4) by 3. ✅
This requires two new concepts: Memory and Conversational Interface.
Flow Interfaces: Complete vs Conversational¶
QType flows have two interface types that control how they process requests:
Complete Interface (from previous tutorial)¶
- Default behavior - You don't need to specify it
- Processes one request → one response
- No memory between requests
- Each request is independent
- Like a REST API call or function call
Example use cases:
- Simple Q&A
- Data transformation
- Single-step calculations
In YAML (optional to specify):
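Mirroring the Conversational example below, a Complete flow can be written like this (the flow id `qa_flow` is just an example; since Complete is the default, the whole `interface` block can be omitted):

```yaml
flows:
  - type: Flow
    id: qa_flow
    interface:
      type: Complete  # Optional - this is the default
```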
Conversational Interface (This Tutorial)¶
- Explicit configuration - You must specify it
- Maintains conversation history
- Tracks message roles (user/assistant)
- Perfect for back-and-forth interaction
Example use cases:
- Chatbots
- Virtual assistants
- Multi-turn dialogues
In YAML (required):
```yaml
flows:
  - type: Flow
    id: chat_flow
    interface:
      type: Conversational  # Required for conversation memory
```
A full comparison of the two interfaces appears in the table at the end of this tutorial.
Key Rule: Memory only works with Conversational interface. If your flow uses memory, it must declare interface.type: Conversational.
Part 1: Add Memory to Your Application (5 minutes)¶
Create Your Chatbot File¶
Create a new file called my_chatbot.qtype.yaml. Start by copying your application structure from the previous tutorial:
```yaml
id: my_chatbot
description: A conversational chatbot with memory

models:
  - type: Model
    id: gpt-4
    provider: openai
    model_id: gpt-4-turbo
    inference_params:
      temperature: 0.7
```
What's different: We changed the id and description to reflect that this is a chatbot.
Add Memory Configuration¶
Now add a memory configuration before the flows: section:
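Based on the field descriptions below, the memory block looks like this (the `type: Memory` line is an assumption, mirroring the `type: Model` and `type: Flow` entries elsewhere in this file):

```yaml
memories:
  - type: Memory  # Assumed type name, by analogy with Model/Flow entries
    id: chat_memory
    token_limit: 50000
    chat_history_token_ratio: 0.7
```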
What this means:
- `memories:` - Section for memory configurations (new concept!)
- `id: chat_memory` - A nickname you'll use to reference this memory
- `token_limit: 50000` - Maximum total tokens (includes conversation + system messages)
- `chat_history_token_ratio: 0.7` - Reserve 70% of tokens for conversation history
Why tokens matter:
LLMs have a maximum context window (how much text they can "see" at once). GPT-4-turbo has a 128k token limit, but we're using 50k here for cost efficiency. The chat_history_token_ratio ensures the AI always has room to see enough conversation history while leaving space for its response.
Check your work:
- Save the file
- Validate: `qtype validate my_chatbot.qtype.yaml` - Should pass ✅ (even though we haven't added flows yet)
Part 2: Create a Conversational Flow (7 minutes)¶
Set Up the Conversational Flow¶
Add this flow definition:
```yaml
flows:
  - type: Flow
    id: chat_flow
    description: Main chat flow with conversation memory
    interface:
      type: Conversational
    variables:
      - id: user_message
        type: ChatMessage
      - id: response_message
        type: ChatMessage
    inputs:
      - user_message
    outputs:
      - response_message
```
New concepts explained:
interface.type: Conversational - This is the key difference from the previous Complete interface!
- Tells QType this flow maintains conversation state
- Automatically manages message history
- Required when using memory in LLMInference steps
ChatMessage type - A special domain type for chat applications
- Represents a single message in a conversation
- Contains structured blocks (text, images, files, etc.) and metadata
- Different from the simple `text` type used in stateless applications
ChatMessage Structure:
```yaml
ChatMessage:
  blocks:
    - type: text
      content: "Hello, how can I help?"
    - type: image
      url: "https://example.com/image.jpg"
  role: assistant  # or 'user', 'system'
  metadata:
    timestamp: "2025-11-08T10:30:00Z"
```
The blocks list allows multimodal messages (text + images + files), while role indicates who sent the message. QType automatically handles this structure when managing conversation history.
Why two variables?
- `user_message` - What the user types
- `response_message` - What the AI responds with
- QType tracks both in memory for context
Check your work:
- Validate: `qtype validate my_chatbot.qtype.yaml` - Should still pass ✅
Add the Chat Step¶
Add the LLM inference step that connects to your memory:
```yaml
    steps:
      - type: LLMInference
        id: chat_step
        model: gpt-4
        memory: chat_memory
        system_message: "You are a helpful assistant. Be friendly and conversational."
        inputs:
          - user_message
        outputs:
          - response_message
```
What's new:
memory: chat_memory - Links this step to the memory configuration
- Automatically sends conversation history with each request
- Updates memory after each exchange
- This line is what enables "remembering" previous messages
system_message with personality - Unlike the previous generic message, this shapes the AI's behavior for conversation
Check your work:
- Validate: `qtype validate my_chatbot.qtype.yaml` - Should pass ✅
Part 3: Set Up and Test (8 minutes)¶
Configure Authentication¶
Create .env in the same folder (or update your existing one):
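For OpenAI, the file just needs your API key (`OPENAI_API_KEY` is the standard variable name for the OpenAI provider; the value shown is a placeholder):

```
OPENAI_API_KEY=your-api-key-here
```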
Already using AWS Bedrock? Replace the model configuration with:
```yaml
models:
  - type: Model
    id: claude
    provider: aws-bedrock
    model_id: amazon.nova-lite-v1:0
    inference_params:
      temperature: 0.7
```
And update the step to use model: claude.
Start the Chat Interface¶
Unlike the previous tutorial where you used qtype run for one-off questions, conversational applications work better with the web interface:
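Start the server with `qtype serve` (the same command-line pattern as `qtype validate`; exact flags may vary by version):

```
qtype serve my_chatbot.qtype.yaml
```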
What you'll see:
Visit: http://localhost:8000/ui
You should see a chat interface with your application name at the top.
Test Conversation Memory¶
Try this conversation to see memory in action:
You: My name is Alex and I love pizza.
AI: Nice to meet you, Alex! Pizza is delicious...
You: What's my name?
AI: Your name is Alex! ✅
You: What food do I like?
AI: You mentioned you love pizza! ✅
Experiment:
- Refresh the page - memory resets (new session)
- Try a multi-step math problem:
    - "Remember the number 42"
    - "Now multiply that by 2"
    - Does it remember 42?
Part 4: Understanding What's Happening (Bonus)¶
The Memory Lifecycle¶
Here's what happens when you send a message:
User: "What's my name?"
↓
QType: Get conversation history from memory
↓
Memory: Returns previous messages (including "My name is Alex")
↓
QType: Combines system message + history + new question
↓
LLM: Processes full context → "Your name is Alex!"
↓
QType: Saves new exchange to memory
↓
User: Sees response
Key insight: The LLM itself has no memory - QType handles this by:
- Storing all previous messages
- Sending relevant history with each new question
- Managing token limits automatically
Why Token Management Matters¶
Your chat_history_token_ratio: 0.7 setting means:
- 70% of tokens → Conversation history (up to 35,000 tokens with our 50k limit)
- 30% of tokens → System message + AI response (15,000 tokens)
If the conversation gets too long, QType automatically:
- Keeps recent messages
- Drops older messages
- Ensures the AI always has enough tokens to respond
Try it: Have a very long conversation (50+ exchanges). Notice how the AI forgets early messages but remembers recent context.
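To watch this truncation happen much sooner, you can temporarily shrink the memory's budget. This sketch reuses the same fields from earlier in the tutorial (`type: Memory` is an assumption, mirroring the `type: Model` entries):

```yaml
memories:
  - type: Memory  # Assumed type name
    id: chat_memory
    token_limit: 2000               # Small limit so old messages drop quickly
    chat_history_token_ratio: 0.7   # Still reserve 70% for history
```

With a 2,000-token limit, only about 1,400 tokens of history survive, so the AI should start forgetting early messages within a handful of exchanges.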
What You've Learned¶
Congratulations! You've mastered:
✅ Memory configuration - Storing conversation state
✅ Conversational flows - Multi-turn interactions
✅ ChatMessage type - Domain-specific data types
✅ Token management - Controlling context window usage
✅ Web interface - Using qtype serve for chat applications
Compare: Complete vs Conversational Interfaces¶
| Feature | Complete Interface | Conversational Interface |
|---|---|---|
| Interface | `Complete` (default) | `Conversational` (explicit) |
| Memory | None | `chat_memory` configuration |
| Variable Types | `text` (primitive) | `ChatMessage` (domain type) |
| Testing | `qtype run` (command line) | `qtype serve` (web UI) |
| Use Case | One-off questions | Multi-turn conversations |
Next Steps¶
Reference the complete example:
- `hello_world_chat.qtype.yaml` - Full working example
Learn more:
- Memory Concept - Advanced memory strategies
- ChatMessage Reference - Full type specification
- Flow Interfaces - Complete vs Conversational
Common Questions¶
Q: Why do I need ChatMessage instead of text?
A: ChatMessage includes metadata (role, attachments) that QType uses to properly format conversation history for the LLM. The text type is for simple strings without this context.
Q: Can I have multiple memory configurations?
A: Yes! You can define multiple memories in the memories: section and reference different ones in different flows or steps.
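For example, two memory configurations with different token budgets, each of which an `LLMInference` step could reference through its `memory:` field (the ids and the `type: Memory` line are illustrative assumptions):

```yaml
memories:
  - type: Memory
    id: chat_memory          # General-purpose chat memory
    token_limit: 50000
    chat_history_token_ratio: 0.7
  - type: Memory
    id: support_memory       # Hypothetical second memory with a tighter budget
    token_limit: 20000
    chat_history_token_ratio: 0.5
```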
Q: Can I use memory with the Complete interface?
A: No - memory only works with Conversational interface. Complete flows are stateless by design. If you need to remember information between requests, you must use the Conversational interface.
Q: When should I use Complete vs Conversational?
A: Use Complete for independent requests (data transformation, single questions, API-like behavior). Use Conversational when you need context from previous interactions (chatbots, assistants, multi-step conversations).
Q: How do I clear memory during a conversation?
A: Currently, you need to start a new session (refresh the page in the UI). Programmatic memory clearing is planned for a future release.