Load Documents¶
Load documents from files, directories, or external systems using LlamaIndex readers with DocumentSource.
Note: DocumentSource is a source step that generates data independently, so flows using it typically require no inputs.
QType YAML¶
steps:
- type: DocumentSource
id: load_docs
reader_module: llama_index.core.SimpleDirectoryReader
args:
input_dir: ./data
required_exts: [".md", ".txt"]
recursive: true
loader_args:
num_workers: 4
outputs:
- document
Explanation¶
- reader_module: Python module path to a class that inherits from
llama_index.core.readers.base.BaseReader(most common:llama_index.core.SimpleDirectoryReader) - args: Arguments passed to the reader class constructor (e.g.,
input_dir,required_exts,recursive,file_extractor) - loader_args: Arguments passed to the reader's
load_data()method (e.g.,num_workersfor parallel processing) - outputs: Variable to store loaded documents (type:
RAGDocument) - DocumentSource fans out, emitting one message per document - Critical distinction: Constructor args configure the reader instance;
load_dataargs control how documents are loaded
Common Reader Modules¶
SimpleDirectoryReader (llama_index.core.SimpleDirectoryReader):
- Constructor args: input_dir, input_files, required_exts, exclude, recursive, file_extractor, file_metadata, encoding
- Loader args: num_workers (parallel processing)
- Supports 15+ file types including PDF, DOCX, CSV, Markdown, images, audio/video
- Full documentation
JSONReader (llama_index.readers.json.JSONReader):
- Constructor args: levels_back, collapse_length, ensure_ascii, is_jsonl, clean_json
- Loader args: input_file, extra_info
- Supports both JSON and JSONL (JSON Lines) formats
- Full documentation
Dynamic Arguments¶
You can pass flow variables as constructor arguments by including them in args. At runtime, QType merges message variables with the configured args:
variables:
- id: data_path
type: text
steps:
- type: DocumentSource
id: load_docs
reader_module: llama_index.core.SimpleDirectoryReader
args:
input_dir: data_path # References variable from message
inputs: [data_path]
Complete Example¶
# Load all markdown files from docs directory using DocumentSource
#
# This example demonstrates using DocumentSource with SimpleDirectoryReader
# to load documents from a local directory with file filtering.
id: load_documents_example
description: Load markdown files from docs directory
flows:
- type: Flow
id: load_md_files
description: Load all markdown files from docs directory
variables:
- id: document
type: RAGDocument
inputs: []
outputs:
- document
steps:
- type: DocumentSource
id: md_docs
reader_module: llama_index.core.SimpleDirectoryReader
args:
input_dir: docs
required_exts: [".md"]
recursive: true
outputs:
- document