Memory
Memory in QType provides persistent storage for conversation history and contextual state across steps and conversation turns, enabling chatbots, agents, and multi-turn workflows to maintain coherent, context-aware interactions.
Memory is particularly useful for maintaining chat history, storing previous outputs, and preserving important context that should persist across different steps in a flow or across multiple invocations of an application.
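As a minimal sketch, a memory block might be declared like this in a QType YAML definition. Only the field names come from this page; the top-level `memories` key and its placement are assumptions for illustration.

```yaml
# Hypothetical QType application snippet.
# The `memories` key and its nesting are assumed for illustration.
memories:
  - id: chat_memory   # unique within the application; other fields fall back to defaults
```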
Rules and Behaviors
- Unique IDs: Each memory block must have a unique `id` within the application. Duplicate memory IDs will result in a validation error.
- Token Management: Memory automatically manages token limits to prevent exceeding model context windows. When the token limit is reached, older content is flushed based on the `token_flush_size`.
- Chat History Ratio: The `chat_history_token_ratio` determines what portion of the total memory is reserved for chat history versus other contextual data.
- Default Values: Memory has sensible defaults: a 100,000 token limit, a 70% chat history ratio, and a 3,000 token flush size.
- Reference by Steps: Memory can be referenced by `LLMInference` and `Agent` steps to maintain context across interactions (see the sketch after this list).
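The following sketch shows how a step might reference the memory block above by its `id`. The `flows`/`steps` layout and the `memory` reference field are assumptions; only the step type names come from this page.

```yaml
# Hypothetical flow wiring: the step keeps conversational context by
# referencing the memory block declared earlier via its id.
flows:
  - id: chat_flow
    steps:
      - id: respond
        type: LLMInference   # step type named on this page
        memory: chat_memory  # assumed reference-by-id field
```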
Memory
Session or persistent memory used to store relevant conversation or state data across steps or turns.
- `id` (`str`): Unique ID of the memory block.
- `token_limit` (`int`): Maximum number of tokens to store in memory.
- `chat_history_token_ratio` (`float`): Ratio of chat history tokens to total memory tokens.
- `token_flush_size` (`int`): Number of tokens to flush when memory exceeds the token limit.
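Putting the four fields together, a fully specified block might look like the following sketch, with the documented default values written out explicitly (again assuming the YAML layout shown earlier):

```yaml
memories:
  - id: chat_memory
    token_limit: 100000            # default: total tokens memory may hold
    chat_history_token_ratio: 0.7  # default: 70% reserved for chat history
    token_flush_size: 3000         # default: tokens flushed when the limit is hit
```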
Related Concepts
Memory is primarily used by LLM-based steps like `LLMInference` and `Agent` to maintain conversational context.