Read Data from Files
Load structured data from files with the FileSource step. It supports CSV, JSON, JSONL, and Parquet, and detects the format automatically from the file extension.
QType YAML
Explanation
- FileSource: Step that reads structured data from files using fsspec-compatible URIs
- path: Path to the file, either relative to the YAML file or absolute. Local files and cloud storage URIs (s3://, gs://, etc.) are supported.
- outputs: Columns to extract from the file as variables. Each name must exactly match a column name in the file.
- Format detection: Automatically determined by file extension (.csv, .json, .jsonl, .parquet)
- Streaming: Emits one FlowMessage per row, so downstream steps can process rows in parallel.
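The fragment below sketches how the path and format-detection rules combine, using a Parquet file in S3. The bucket, key, and column names (review_text, rating) are hypothetical. The fragment also assumes those names are declared under the flow's variables, in the same way query and topic are declared in the complete example. Reading s3:// URIs through fsspec generally requires the s3fs package to be installed as well.

```yaml
# Hypothetical FileSource step: bucket, key, and column names are placeholders.
- id: read_reviews
  type: FileSource
  path:
    uri: s3://my-bucket/exports/reviews.parquet  # .parquet extension selects the Parquet reader
  outputs:
    - review_text  # must be an actual column in the Parquet file
    - rating       # must also be declared as a flow variable
```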
Complete Example
id: read_file_example
description: Read data from a CSV file
models:
  - type: Model
    id: nova
    provider: aws-bedrock
    model_id: amazon.nova-lite-v1:0
flows:
  - type: Flow
    id: process_file_data
    description: Read and process data from a CSV file
    variables:
      - id: query
        type: text
      - id: topic
        type: text
      - id: prompt
        type: text
      - id: answer
        type: text
    inputs: []
    outputs:
      - query
      - topic
      - answer
    steps:
      - id: read_data
        type: FileSource
        path:
          uri: examples/data_processing/batch_inputs.csv
        outputs:
          - query
          - topic
      - id: create_prompt
        type: PromptTemplate
        template: |
          Topic: {topic}
          Question: {query}
          Provide a concise answer:
        inputs:
          - query
          - topic
        outputs:
          - prompt
      - id: generate_answer
        type: LLMInference
        model: nova
        inputs:
          - prompt
        outputs:
          - answer
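FileSource selects the reader from the file extension. Switching this example to JSON Lines should therefore only require changing the URI, provided each record carries the same keys (query and topic). The following is a hedged sketch of that variant of the read_data step. The file batch_inputs.jsonl is an assumed name and is not one the example ships with.

```yaml
# Hypothetical variant of read_data: batch_inputs.jsonl is an assumed file.
# Each line would be one JSON object whose keys match the declared outputs, e.g.
#   {"query": "...", "topic": "..."}
- id: read_data
  type: FileSource
  path:
    uri: examples/data_processing/batch_inputs.jsonl  # .jsonl selects JSON Lines parsing
  outputs:
    - query
    - topic
```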