Memory

Memory in QType provides persistent storage for conversation history and contextual state data across multiple steps or conversation turns. It enables applications to maintain context between interactions, allowing for more coherent and context-aware conversations in chatbots, agents, and multi-turn workflows.

Memory configurations are defined at the application level and referenced by steps that need to maintain state.

Key Principles

Centralized Definition

Memory objects are defined once at the application level and can be shared across multiple steps:

id: my_app
memories:
  - id: chat_memory
    token_limit: 50000
    chat_history_token_ratio: 0.7

flows:
  - type: Flow
    id: chat_flow
    steps:
      - type: LLMInference
        model: gpt4
        memory: chat_memory  # References by ID

Reference by ID

Steps reference memory configurations by their ID (as a string), not by embedding the memory object inline.
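
As a sketch of the distinction (reusing the chat_memory definition above; the inline form is shown only as the pattern to avoid):

# Correct: reference the application-level memory by its ID string
- type: LLMInference
  model: gpt4
  memory: chat_memory

# Incorrect: embedding the memory object inline
- type: LLMInference
  model: gpt4
  memory:
    id: chat_memory
    token_limit: 50000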

Rules and Behaviors

  • Unique IDs: Each memory block must have a unique id within the application. Duplicate memory IDs will result in a validation error.
  • Token Management: Memory automatically manages token limits to prevent exceeding model context windows. When the token limit is reached, older content is flushed based on token_flush_size.
  • Chat History Ratio: The chat_history_token_ratio determines what portion of total memory is reserved for chat history versus other contextual data (see the sketch following this list).
  • Default Values: Memory has sensible defaults: a token_limit of 100,000, a chat_history_token_ratio of 0.7, and a token_flush_size of 3,000.
  • Shared Memory: Multiple steps can reference the same memory ID to share conversational context.
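
To make the ratio and sharing rules concrete, here is a minimal sketch that combines them. It assumes a flow with two LLMInference steps; the ids and model name are illustrative, and the 35,000-token figure is simply 50,000 × 0.7:

memories:
  - id: shared_memory
    token_limit: 50000            # hard cap on tokens stored in this memory
    chat_history_token_ratio: 0.7 # ~35,000 of the 50,000 tokens go to chat history
    token_flush_size: 3000        # older content is flushed based on this size

flows:
  - type: Flow
    id: support_flow
    steps:
      - type: LLMInference
        model: gpt4
        memory: shared_memory  # first step writes conversation turns
      - type: LLMInference
        model: gpt4
        memory: shared_memory  # second step sees the same shared context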

Memory

Session or persistent memory used to store relevant conversation or state data across steps or turns.

  • id (str): Unique ID of the memory block.
  • token_limit (int): Maximum number of tokens to store in memory.
  • chat_history_token_ratio (float): Ratio of chat history tokens to total memory tokens.
  • token_flush_size (int): Number of tokens to flush when memory exceeds the token limit.
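
Putting the fields together, the following sketch shows a memory block with every field written out explicitly; the values are the documented defaults, so this block should behave the same as one that sets only id:

memories:
  - id: default_memory            # illustrative ID
    token_limit: 100000           # default
    chat_history_token_ratio: 0.7 # default
    token_flush_size: 3000        # default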

Memory is primarily used by LLM-based steps like LLMInference and Agent to maintain conversational context.

Example Usage