Load Data from Athena¶
Query AWS Athena databases using standard SQL with the SQLSource step, which supports Athena through SQLAlchemy connection strings and AWS authentication.
QType YAML¶
flows:
- id: query-athena
steps:
- type: SQLSource
id: load_sales
connection: "awsathena+rest://:@athena.us-east-1.amazonaws.com:443/sales_db?s3_staging_dir=s3://my-results-bucket/athena-results/&work_group=primary&catalog_name=some_catalog"
query: |
SELECT
product_id,
product_name,
total_sales
FROM product_sales
WHERE total_sales >= :min_sales
ORDER BY total_sales DESC
inputs:
- min_sales
outputs:
- product_id
- product_name
- total_sales
Explanation¶
- awsathena+rest: PyAthena SQLAlchemy dialect for accessing Athena via REST API
- Connection string format:
awsathena+rest://:@athena.{REGION}.amazonaws.com:443/{DATABASE}?s3_staging_dir={S3_PATH}&work_group={WORKGROUP}&catalog_name={CATALOG}" - s3_staging_dir: S3 location where Athena writes query results (required by Athena)
- work_group: Athena workgroup name (e.g.,
primary) - auth: Reference to AWSAuthProvider for AWS credentials
- query: Standard SQL query with parameter substitution using
:parameter_namesyntax
Complete Example¶
id: athena-query-example
description: Query AWS Athena database and process results
auths:
- type: aws
id: aws_auth
region: us-east-1
profile_name: default
flows:
- type: Flow
id: query-athena
variables:
- id: min_sales
type: int
- id: product_id
type: text
- id: product_name
type: text
- id: total_sales
type: int
- id: region
type: text
inputs:
- min_sales
outputs:
- product_id
- product_name
- total_sales
- region
steps:
- type: SQLSource
id: load_sales
connection: "awsathena+rest://:@athena.us-east-1.amazonaws.com:443/sales_db?s3_staging_dir=s3://my-results-bucket/athena-results/&work_group=primary&catalog_name=some_catalog""
auth: aws_auth
query: |
SELECT
product_id,
product_name,
total_sales,
region
FROM product_sales
WHERE total_sales >= :min_sales
ORDER BY total_sales DESC
LIMIT 100
inputs:
- min_sales
outputs:
- product_id
- product_name
- total_sales
- region