LLMInference

LLMInference is a step that performs direct language model inference, sending prompts to AI models and capturing their responses. It provides the core interface for integrating large language models into QType workflows, supporting both simple text generation and complex conversational interactions.

LLMInference steps can maintain conversation context through memory, apply system prompts for role-setting, and process inputs/outputs concurrently when configured.

Key Principles

Explicit Variable Declaration

All inputs and outputs must be declared in the flow's variables section and referenced by ID:

flows:
  - type: Flow
    id: my_flow
    variables:
      - id: user_prompt
        type: text
      - id: ai_response
        type: text
    steps:
      - type: LLMInference
        id: llm_step
        model: gpt4
        inputs:
          - user_prompt  # References declared variable
        outputs:
          - ai_response

Model Reference by ID

The model field references a model by its ID (as a string):

models:
  - type: Model
    id: gpt4
    provider: openai

flows:
  - steps:
      - type: LLMInference
        model: gpt4  # String reference to model ID

Rules and Behaviors

  • Required Model: The model field is mandatory and must reference a model ID defined in the application.
  • Required Variables: All inputs and outputs must be declared in the flow's variables section.
  • Memory Integration: Can optionally reference a Memory object by ID to maintain conversation history and context.
  • System Message: Optional system_message field sets the AI's role and behavior context (both options are shown in the sketch after this list).
  • Concurrency Support: Accepts a concurrency_config for processing multiple inputs in parallel.
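A minimal sketch of a step using memory and a system message together. It assumes a Memory object with ID chat_memory is declared elsewhere in the application; the step and variable IDs are illustrative:

steps:
  - type: LLMInference
    id: chat_step
    model: gpt4
    memory: chat_memory  # String reference to a Memory ID
    system_message: You are a helpful assistant.
    inputs:
      - user_prompt
    outputs:
      - ai_response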

LLMInference

Defines a step that performs inference using a language model. It can take input variables and produce output variables based on the model's response.

  • type (Literal): The step type discriminator; set to LLMInference, as in the examples above.
  • memory (Reference[Memory] | str | None): A reference to a Memory object to retain context across interactions.
  • model (Reference[Model] | str): The model to use for inference.
  • system_message (str | None): Optional system message to set the context for the model.

LLMInference steps require Model configurations, may use Memory for context retention, often consume output from PromptTemplate steps, and are extended by Agent steps for tool-enabled interactions.
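The handoff between steps happens entirely through flow variables. A hypothetical sketch of chaining, assuming a PromptTemplate step with a template field (its exact schema is not documented here), whose output variable feeds the LLMInference input:

steps:
  - type: PromptTemplate
    id: build_prompt
    template: "Summarize the following text: {raw_text}"  # Field name assumed for illustration
    inputs:
      - raw_text
    outputs:
      - user_prompt
  - type: LLMInference
    id: summarize_step
    model: gpt4
    inputs:
      - user_prompt  # Consumes the rendered prompt
    outputs:
      - ai_response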

Example Usage
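
A minimal end-to-end sketch assembled from the pieces above; the identifiers (qa_flow, answer_step) are illustrative:

models:
  - type: Model
    id: gpt4
    provider: openai

flows:
  - type: Flow
    id: qa_flow
    variables:
      - id: user_prompt
        type: text
      - id: ai_response
        type: text
    steps:
      - type: LLMInference
        id: answer_step
        model: gpt4  # String reference to the model ID declared above
        system_message: You are a concise technical assistant.
        inputs:
          - user_prompt
        outputs:
          - ai_response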