Read Data from Files

Load structured data from files using FileSource, which supports CSV, JSON, JSONL, and Parquet formats with automatic format detection based on file extension.

QType YAML

steps:
  - id: read_data
    type: FileSource
    path: batch_inputs.csv
    outputs:
      - query
      - topic

Explanation

  • FileSource: Step that reads structured data from files using fsspec-compatible URIs
  • path: File path, either absolute or relative to the YAML file; supports local files and cloud storage (s3://, gs://, etc.)
  • outputs: Names of the columns to extract from the file as variables; each must match a column name in the file exactly
  • Format detection: Automatically determined by file extension (.csv, .json, .jsonl, .parquet)
  • Streaming: Emits one FlowMessage per row, enabling downstream steps to process data in parallel
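
The behavior described above can be approximated in plain Python to make it concrete. This is a minimal sketch, not QType's implementation: the names `SUPPORTED`, `detect_format`, and `iter_rows` are hypothetical, it handles only local files with the standard library, it assumes a `.json` file holds a top-level array of objects, and it leaves Parquet out because reading that format needs a third-party library.

```python
import csv
import json
from pathlib import Path
from typing import Dict, Iterator, List

# Hypothetical extension-to-format table, used only for this sketch.
SUPPORTED = {".csv": "csv", ".json": "json", ".jsonl": "jsonl", ".parquet": "parquet"}


def detect_format(path: str) -> str:
    """Pick the format from the file extension, ignoring case."""
    suffix = Path(path).suffix.lower()
    if suffix not in SUPPORTED:
        raise ValueError(f"Unsupported file extension: {suffix!r}")
    return SUPPORTED[suffix]


def iter_rows(path: str, outputs: List[str]) -> Iterator[Dict[str, object]]:
    """Yield one dict per row, keeping only the requested columns."""
    fmt = detect_format(path)
    if fmt == "csv":
        with open(path, newline="", encoding="utf-8") as f:
            records = list(csv.DictReader(f))
    elif fmt == "json":
        with open(path, encoding="utf-8") as f:
            records = json.load(f)  # assumption: a top-level array of objects
    elif fmt == "jsonl":
        with open(path, encoding="utf-8") as f:
            records = [json.loads(line) for line in f if line.strip()]
    else:
        raise NotImplementedError("Parquet needs a third-party reader")

    for record in records:
        # Requested names must match the file's column names exactly.
        missing = [name for name in outputs if name not in record]
        if missing:
            raise KeyError(f"Columns not found in file: {missing}")
        yield {name: record[name] for name in outputs}
```

Calling `list(iter_rows("batch_inputs.csv", ["query", "topic"]))` would return one dictionary per data row, each holding only the `query` and `topic` values. In QType each row instead becomes a FlowMessage.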

Complete Example

id: read_file_example
description: Read data from a CSV file

models:
  - type: Model
    id: nova
    provider: aws-bedrock
    model_id: amazon.nova-lite-v1:0

flows:
  - type: Flow
    id: process_file_data
    description: Read and process data from a CSV file

    variables:
      - id: query
        type: text
      - id: topic
        type: text
      - id: prompt
        type: text
      - id: answer
        type: text

    inputs: []

    outputs:
      - query
      - topic
      - answer

    steps:
      - id: read_data
        type: FileSource
        path:
          uri: examples/data_processing/batch_inputs.csv
        outputs:
          - query
          - topic

      - id: create_prompt
        type: PromptTemplate
        template: |
          Topic: {topic}
          Question: {query}

          Provide a concise answer:
        inputs:
          - query
          - topic
        outputs:
          - prompt

      - id: generate_answer
        type: LLMInference
        model: nova
        inputs:
          - prompt
        outputs:
          - answer
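
To show how each row moves through the flow above, the following sketch renders the prompt that `create_prompt` would pass to `generate_answer` for every row. It is an illustration only: it assumes the `{topic}` and `{query}` placeholders are filled the way Python's `str.format` fills them, and the sample rows are invented stand-ins for the contents of batch_inputs.csv.

```python
# Same text as the `template` field of the create_prompt step.
TEMPLATE = (
    "Topic: {topic}\n"
    "Question: {query}\n"
    "\n"
    "Provide a concise answer:\n"
)


def render_prompt(row: dict) -> str:
    """Fill the template from one row's variables (assumed str.format semantics)."""
    return TEMPLATE.format(**row)


# Hypothetical rows, standing in for what the FileSource step would emit.
rows = [
    {"query": "What is a vector index?", "topic": "Search"},
    {"query": "What is object storage?", "topic": "Cloud"},
]

# One prompt per row; each would go to the model independently.
prompts = [render_prompt(row) for row in rows]
```

Because every row yields its own prompt, the LLM calls do not depend on one another, which is what allows them to be processed in parallel.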

See Also