Build a Conversational Chatbot

Time: 20 minutes
Prerequisites: Tutorial 1: Build Your First QType Application
Example: hello_world_chat.qtype.yaml

What you'll learn: Add memory to your QType application and create a chatbot that remembers previous messages in the conversation.

What you'll build: A stateful chatbot that maintains conversation history and provides contextual responses.


Part 1: Stateless vs. Stateful Applications (3 minutes)

In Build Your First QType Application, you built a stateless application - it processed each question independently with no memory of previous interactions:

You: What is 2+2?
AI: 4.

You: What about that times 3?
AI: I don't know what "that" refers to.  ❌

Today you'll build a stateful chatbot that remembers the conversation:

You: What is 2+2?
AI: 4.

You: What about that times 3?  
AI: 12. I multiplied the previous answer (4) by 3.  ✅

This requires two new concepts: Memory and Conversational Interface.


Flow Interfaces: Complete vs Conversational

QType flows have two interface types that control how they process requests:

Complete Interface (Previous Tutorial)

  • Default behavior - You don't need to specify it
  • Processes one request → one response
  • No memory between requests
  • Each request is independent
  • Like a REST API call or function call

Example use cases:

  • Simple Q&A
  • Data transformation
  • Single-step calculations

In YAML (optional to specify):

flows:
  - type: Flow
    id: simple_flow
    interface:
      type: Complete  # Optional - this is the default

Conversational Interface (This Tutorial)

  • Explicit configuration - You must specify it
  • Maintains conversation history
  • Tracks message roles (user/assistant)
  • Perfect for back-and-forth interaction

Example use cases:

  • Chatbots
  • Virtual assistants
  • Multi-turn dialogues

In YAML (required):

flows:
  - type: Flow
    id: chat_flow
    interface:
      type: Conversational  # Required for conversation memory

Let's compare the interfaces:

| Feature | Complete Interface | Conversational Interface |
|---------|--------------------|---------------------------|
| Declaration | Default (optional) | Explicit (required) |
| Memory | None | Maintains conversation history |
| Message roles | Not tracked | Tracked (user/assistant) |
| Typical use | One-off requests | Multi-turn dialogue |

Key Rule: Memory only works with the Conversational interface. If your flow uses memory, it must declare interface.type: Conversational.
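
For example, a flow whose step references memory must declare the interface explicitly. Here's a minimal sketch of the rule using the ids from this tutorial (other required fields omitted for brevity):

flows:
  - type: Flow
    id: chat_flow
    interface:
      type: Conversational   # required because a step below references memory
    steps:
      - type: LLMInference
        id: chat_step
        memory: chat_memory  # this reference is why Conversational is required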


Part 2: Add Memory to Your Application (5 minutes)

Create Your Chatbot File

Create a new file called my_chatbot.qtype.yaml. Start by copying your application structure from the previous tutorial:

id: my_chatbot
description: A conversational chatbot with memory

models:
  - type: Model
    id: gpt-4
    provider: openai
    model_id: gpt-4-turbo
    inference_params:
      temperature: 0.7

What's different: We changed the id and description to reflect that this is a chatbot.


Add Memory Configuration

Now add a memory configuration below the models: section:

memories:
  - id: chat_memory
    token_limit: 50000
    chat_history_token_ratio: 0.7

What this means:

  • memories: - Section for memory configurations (new concept!)
  • id: chat_memory - A nickname you'll use to reference this memory
  • token_limit: 50000 - Maximum total tokens (includes conversation + system messages)
  • chat_history_token_ratio: 0.7 - Reserve 70% of tokens for conversation history

Why tokens matter:
LLMs have a maximum context window (how much text they can "see" at once). GPT-4-turbo has a 128k token limit, but we're using 50k here for cost efficiency. The chat_history_token_ratio ensures the AI always has room to see enough conversation history while leaving space for its response.
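
Concretely, the budget splits like this (the numbers are just the arithmetic implied by the settings above):

memories:
  - id: chat_memory
    token_limit: 50000              # total context budget per request
    chat_history_token_ratio: 0.7   # 0.7 × 50,000 = 35,000 tokens for history;
                                    # the remaining 15,000 cover the system message and response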

Check your work:

  1. Save the file
  2. Validate: qtype validate my_chatbot.qtype.yaml
  3. Should pass ✅ (even though we haven't added flows yet)

Part 3: Create a Conversational Flow (7 minutes)

Set Up the Conversational Flow

Add this flow definition:

flows:
  - type: Flow
    id: chat_flow
    description: Main chat flow with conversation memory
    interface:
      type: Conversational
    variables:
      - id: user_message
        type: ChatMessage
      - id: response_message
        type: ChatMessage
    inputs:
      - user_message
    outputs:
      - response_message

New concepts explained:

interface.type: Conversational - This is the key difference from the previous Complete interface!

  • Tells QType this flow maintains conversation state
  • Automatically manages message history
  • Required when using memory in LLMInference steps

ChatMessage type - A special domain type for chat applications

  • Represents a single message in a conversation
  • Contains structured blocks (text, images, files, etc.) and metadata
  • Different from the simple text type used in stateless applications

ChatMessage Structure:

ChatMessage:
  blocks:
    - type: text
      content: "Hello, how can I help?"
    - type: image
      url: "https://example.com/image.jpg"
  role: assistant  # or 'user', 'system'
  metadata:
    timestamp: "2025-11-08T10:30:00Z"

The blocks list allows multimodal messages (text + images + files), while role indicates who sent the message. QType automatically handles this structure when managing conversation history.
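
For comparison, a minimal text-only user message in the same structure might look like this (a sketch; QType builds these for you when it manages conversation history):

ChatMessage:
  blocks:
    - type: text
      content: "What is 2+2?"
  role: user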

Why two variables?

  • user_message - What the user types
  • response_message - What the AI responds
  • QType tracks both in memory for context

Check your work:

  1. Validate: qtype validate my_chatbot.qtype.yaml
  2. Should still pass ✅

Add the Chat Step

Add the LLM inference step that connects to your memory:

    steps:
      - type: LLMInference
        id: chat_step
        model: gpt-4
        memory: chat_memory
        system_message: "You are a helpful assistant. Be friendly and conversational."
        inputs:
          - user_message
        outputs:
          - response_message

What's new:

memory: chat_memory - Links this step to the memory configuration:

  • Automatically sends conversation history with each request
  • Updates memory after each exchange
  • This line is what enables "remembering" previous messages

system_message with personality - Unlike the previous generic message, this shapes the AI's behavior for conversation

Check your work:

  1. Validate: qtype validate my_chatbot.qtype.yaml
  2. Should pass ✅

Part 4: Set Up and Test (8 minutes)

Configure Authentication

Create .env in the same folder (or update your existing one):

OPENAI_API_KEY=sk-your-key-here

Already using AWS Bedrock? Replace the model configuration with:

models:
  - type: Model
    id: claude
    provider: aws-bedrock
    model_id: amazon.nova-lite-v1:0
    inference_params:
      temperature: 0.7

And update the step to use model: claude.
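
The updated step would then look like this (the same step as above, with only the model reference changed):

    steps:
      - type: LLMInference
        id: chat_step
        model: claude        # was gpt-4; now points at the Bedrock model defined above
        memory: chat_memory
        system_message: "You are a helpful assistant. Be friendly and conversational."
        inputs:
          - user_message
        outputs:
          - response_message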


Start the Chat Interface

Unlike the previous tutorial where you used qtype run for one-off questions, conversational applications work better with the web interface:

qtype serve my_chatbot.qtype.yaml

What you'll see:

INFO:     Started server process
INFO:     Uvicorn running on http://127.0.0.1:8000

Visit: http://localhost:8000/ui

You should see a chat interface with your application name at the top.


Test Conversation Memory

Try this conversation to see memory in action:

You: My name is Alex and I love pizza.
AI: Nice to meet you, Alex! Pizza is delicious...

You: What's my name?
AI: Your name is Alex!  ✅

You: What food do I like?
AI: You mentioned you love pizza!  ✅

Experiment:

  1. Refresh the page - memory resets (new session)
  2. Try a multi-step math problem:
     • "Remember the number 42"
     • "Now multiply that by 2"
  3. Does it remember 42?

Part 5: Understanding What's Happening (Bonus)

The Memory Lifecycle

Here's what happens when you send a message:

User: "What's my name?"
QType: Gets conversation history from memory
Memory: Returns previous messages (including "My name is Alex")
QType: Combines system message + history + new question
LLM: Processes full context → "Your name is Alex!"
QType: Saves new exchange to memory
User: Sees response

Key insight: The LLM itself has no memory - QType handles this by:

  1. Storing all previous messages
  2. Sending relevant history with each new question
  3. Managing token limits automatically

Why Token Management Matters

Your chat_history_token_ratio: 0.7 setting means:

  • 70% of tokens → Conversation history (up to 35,000 tokens with our 50k limit)
  • 30% of tokens → System message + AI response (15,000 tokens)

If the conversation gets too long, QType automatically:

  1. Keeps recent messages
  2. Drops older messages
  3. Ensures the AI always has enough tokens to respond

Try it: Have a very long conversation (50+ exchanges). Notice how the AI forgets early messages but remembers recent context.


What You've Learned

Congratulations! You've mastered:

  • Memory configuration - Storing conversation state
  • Conversational flows - Multi-turn interactions
  • ChatMessage type - Domain-specific data types
  • Token management - Controlling context window usage
  • Web interface - Using qtype serve for chat applications


Compare: Complete vs Conversational Interfaces

| Feature | Complete Interface | Conversational Interface |
|---------|--------------------|---------------------------|
| Interface | Complete (default) | Conversational (explicit) |
| Memory | None | chat_memory configuration |
| Variable Types | text (primitive) | ChatMessage (domain type) |
| Testing | qtype run (command line) | qtype serve (web UI) |
| Use Case | One-off questions | Multi-turn conversations |

Next Steps

Reference the complete example: hello_world_chat.qtype.yaml



Common Questions

Q: Why do I need ChatMessage instead of text?
A: ChatMessage includes metadata (role, attachments) that QType uses to properly format conversation history for the LLM. The text type is for simple strings without this context.
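
Side by side, the two variable declarations might look like this (a sketch; the question id is illustrative, standing in for a stateless variable like the one from the previous tutorial):

variables:
  - id: question
    type: text          # plain string - no role, no history tracking
  - id: user_message
    type: ChatMessage   # structured message with role, blocks, and metadata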

Q: Can I have multiple memory configurations?
A: Yes! You can define multiple memories in the memories: section and reference different ones in different flows or steps.
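
For instance, you might define a second memory with a smaller budget (support_memory and its numbers are illustrative, not from this tutorial):

memories:
  - id: chat_memory
    token_limit: 50000
    chat_history_token_ratio: 0.7
  - id: support_memory            # hypothetical second memory for another flow
    token_limit: 20000
    chat_history_token_ratio: 0.5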

Q: Can I use memory with the Complete interface?
A: No - memory only works with Conversational interface. Complete flows are stateless by design. If you need to remember information between requests, you must use the Conversational interface.

Q: When should I use Complete vs Conversational?
A: Use Complete for independent requests (data transformation, single questions, API-like behavior). Use Conversational when you need context from previous interactions (chatbots, assistants, multi-step conversations).

Q: How do I clear memory during a conversation?
A: Currently, you need to start a new session (refresh the page in the UI). Programmatic memory clearing is planned for a future release.