Working with Types and Structured Data¶
Time: 25 minutes
Prerequisites: Tutorial 1: Your First QType Application
Example: 03_structured_data.qtype.yaml
What you'll learn:
- Define custom types for your domain
- Parse LLM JSON responses into structured data
- Build typed objects from extracted fields
- Work with list types
What you'll build: An application that analyzes product reviews and extracts structured sentiment data including ratings, confidence scores, and key points.
Background: Why Custom Types?¶
So far you've worked with simple text types. But real applications need structure:
- Domain models - Product reviews, user profiles, search results
- Validation - Ensure data has required fields
- Type safety - Catch errors before runtime
- Composability - Build complex types from simpler ones
QType lets you define CustomTypes to model your domain, just like you'd define classes in Python.
Part 1: Define Your Custom Type (5 minutes)¶
Create Your Application File¶
Create 03_structured_data.qtype.yaml:
id: review_sentiment_analyzer
description: |
Analyzes a product review to extract structured sentiment insights.
Demonstrates custom types and structured data extraction.
Define a CustomType¶
Before the models: section, add a types: section:
types:
- id: ReviewSentiment
description: Structured sentiment analysis of a review
properties:
sentiment: text
confidence: float
key_points: list[text]
rating: int
What this means:
types: section - Where you define custom data structures
properties: - The fields your type contains:
- sentiment: text - A simple text field
- confidence: float - A decimal number (0.0-1.0)
- key_points: list[text] - A list of strings (new!)
- rating: int - An integer (1-5 star rating)
List types use the syntax list[element_type]. Examples:
- list[text] - List of strings
- list[int] - List of integers
- list[ReviewSentiment] - List of custom types
Add Model Configuration¶
models:
- type: Model
id: analyzer_model
provider: aws-bedrock
model_id: amazon.nova-lite-v1:0
inference_params:
temperature: 0.7
max_tokens: 512
Check your work:
Should pass ✅
Part 2: Build the Analysis Flow (10 minutes)¶
Define Flow Variables¶
Add the flow structure:
flows:
- id: analyze_review
description: Analyzes a single review and extracts structured sentiment
inputs:
- review_text
outputs:
- result
variables:
- id: review_text
type: text
- id: raw_llm_response
type: text
- id: llm_response
type: text
- id: sentiment
type: text
- id: confidence
type: float
- id: key_points
type: list[text]
- id: rating
type: int
- id: result
type: ReviewSentiment
What's new:
Multiple variable types - Notice we have:
- Simple types (text, float, int)
- List type (list[text])
- Custom type (ReviewSentiment)
Why so many variables? Each step transforms data from one form to another. This explicit data flow makes debugging easier and documents how information moves through your application.
Step 1: Create the Analysis Prompt¶
Add the first step under steps::
steps:
# Step 1: Create analysis prompt
- id: analysis_prompt
type: PromptTemplate
template: |
Analyze this product review and extract structured information.
Review: {{review_text}}
Respond with ONLY valid JSON, no other text or markdown. Use this exact structure:
{{{{
"sentiment": "positive|negative|neutral|mixed",
"confidence": 0.95,
"key_points": ["point 1", "point 2"],
"rating": 4
}}}}
Where:
- sentiment: overall sentiment (positive/negative/neutral/mixed)
- confidence: your confidence score (0.0-1.0)
- key_points: 2-3 main points from the review
- rating: estimated star rating 1-5 based on the tone
Return ONLY the JSON object, nothing else.
inputs:
- review_text
outputs:
- raw_llm_response
Key technique - Escaping braces:
Notice {{ and }} in the template? QType uses Python's .format() method where {variable} is a placeholder. To include literal curly braces in the output, you must double them:
- {{ → outputs {
- }} → outputs }
So to output the JSON structure, we use {{ ... }} which renders as { ... } in the actual prompt.
Why "ONLY valid JSON"? LLMs often add explanatory text or wrap JSON in markdown code fences. Being explicit reduces these issues.
Step 2: Run LLM Inference¶
# Step 2: Run LLM inference
- id: analyze
type: LLMInference
model: analyzer_model
inputs:
- raw_llm_response
outputs:
- llm_response
LLMInference step sends the prompt to your model and returns the response as text. Simple and familiar from Tutorial 1.
Step 3: Parse JSON with Decoder¶
Here's the new step type:
# Step 3: Parse the JSON response and build the ReviewSentiment object
# Decoder converts the JSON string into structured data
- id: parse_and_build
type: Decoder
format: json
inputs:
- llm_response
outputs:
- sentiment
- confidence
- key_points
- rating
What Decoder does:
format: json - Tells QType to parse as JSON (also supports xml)
Multiple outputs - Each output name must match a field in the JSON:
{
"sentiment": "positive", ← goes to sentiment variable
"confidence": 0.95, ← goes to confidence variable
"key_points": [...], ← goes to key_points variable
"rating": 4 ← goes to rating variable
}
Smart parsing:
- Automatically strips markdown code fences (```json)
- Validates JSON syntax
- Maps JSON types to QType types (string→text, number→float/int, array→list)
- Raises clear errors if fields are missing or malformed
Check your work:
Step 4: Construct the Typed Object¶
Final step - convert individual fields into your custom type:
# Step 4: Construct a ReviewSentiment object
# Construct builds typed objects from the decoded fields
- id: build_result
type: Construct
output_type: ReviewSentiment
field_bindings:
sentiment: sentiment
confidence: confidence
key_points: key_points
rating: rating
inputs:
- sentiment
- confidence
- key_points
- rating
outputs:
- result
What Construct does:
output_type: ReviewSentiment - Specifies which custom type to build
field_bindings: - Maps type field names to input variables:
In this case, names match (sentiment: sentiment), but you could use different names:
field_bindings:
sentiment: analyzed_sentiment # Maps analyzed_sentiment variable to sentiment property
Why Construct? It validates that:
- All required properties are provided
- Types match (float for confidence, int for rating, etc.)
- The result is a valid ReviewSentiment instance
This catches errors early rather than failing later in your application.
Final validation:
Should pass ✅
Part 3: Test Your Application (5 minutes)¶
Run It!¶
qtype run -i '{"review_text":"These headphones are amazing! Great sound quality and super comfortable. Battery lasts all day."}' 03_structured_data.qtype.yaml
Expected output:
{
"result": {
"sentiment": "positive",
"confidence": 0.95,
"key_points": [
"Great sound quality",
"Super comfortable",
"Long battery life"
],
"rating": 5
}
}
Try Different Reviews¶
# Negative review
qtype run -i '{"review_text":"Terrible product. Broke after one week and customer service was unhelpful."}' 03_structured_data.qtype.yaml
# Mixed review
qtype run -i '{"review_text":"Good sound but uncomfortable after an hour. Battery is okay but not great."}' 03_structured_data.qtype.yaml
Notice how the LLM adapts its analysis while maintaining the structured format!
Part 4: Understanding the Data Flow (5 minutes)¶
The Complete Pipeline¶
Here's what happens when you run the application:
1. User Input (text)
"These headphones are amazing!"
↓
2. PromptTemplate
Creates prompt with JSON format instructions
↓
3. LLMInference
Sends to model → Returns JSON string
↓
4. Decoder
Parses JSON string → Extracts individual fields
{sentiment: "positive", confidence: 0.95, ...}
↓
5. Construct
Builds ReviewSentiment object from fields
ReviewSentiment(sentiment="positive", confidence=0.95, ...)
↓
6. Output (ReviewSentiment)
Validated, typed data ready for downstream use
Key insight: Each step has a single, focused responsibility: - PromptTemplate - Format instructions - LLMInference - Get AI response - Decoder - Parse structured data - Construct - Validate and type
This separation makes each step testable and reusable.
Note: You could simplify this by having the LLM return {"result": {...}} and using Decoder to output directly to a ReviewSentiment variable, skipping the Construct step. However, this tutorial demonstrates both steps separately so you understand when to use each:
- Decoder - When you need to parse text and extract individual fields
- Construct - When you need to build typed objects from already-extracted data
In practice, use the approach that best fits your use case.
Error Handling¶
What happens if the LLM returns invalid JSON?
Decoder will fail with a clear error:
What if a field is missing?
What if a type is wrong?
These explicit errors help you debug issues quickly. In production, you might add retry logic or fallback values.
What You've Learned¶
Congratulations! You've mastered:
✅ CustomType definition - Modeling your domain with structured types
✅ List types - Working with list[text] and other collections
✅ Decoder step - Parsing JSON into individual typed fields
✅ Construct step - Building validated custom type instances
✅ Field Bindings - Connecting variables to type properties
✅ Type safety - Catching errors early with validation
Next Steps¶
Reference the complete example:
03_structured_data.qtype.yaml- Full working example
Learn more:
- CustomType Reference - Complete type system
- Decoder Step - Advanced parsing options
- Construct Step - Field mapping patterns
- Type System - Primitives, domain types, and custom types
Common Questions¶
Q: Can I nest custom types?
A: Yes! A CustomType property can be another CustomType:
types:
- id: Address
properties:
street: text
city: text
- id: User
properties:
name: text
address: Address # Nested custom type
Q: What if the LLM returns extra fields not in my type?
A: Extra fields are ignored. Decoder only extracts the fields you've specified in outputs:.
Q: Can Decoder output the entire JSON as one variable?
A: Not directly. Decoder maps JSON fields to individual outputs. If you need the whole JSON, use type: any in your variable and skip Decoder.
Q: When should I use Decoder vs FieldExtractor?
A: Use Decoder when you have a JSON/XML string to parse. Use FieldExtractor when you already have structured data and need to extract specific fields using JSONPath (covered in advanced tutorials).
Q: Can I make properties optional?
A: Currently all properties are required. For optional fields, you can define them in your flow logic but not include them in the Construct step.