AWS S3 Pipeline Template
A simple pipeline that syncs documents from AWS S3 to the Swiss AI Hub data lake.
Setup
1. Create AWS Access Keys
- Go to AWS IAM Console → Users → select your user → Security credentials
- Create an access key
- Note the Access Key ID and Secret Access Key (the secret is shown only once, at creation)
2. Configure Environment
Copy the variables from .env.template into your .env file and fill in the values:
bash
RCLONE_S3_NAME=s3
RCLONE_S3_TYPE=s3
RCLONE_S3_ACCESS_KEY_ID=AKIA...
RCLONE_S3_SECRET_ACCESS_KEY=your-secret-key
RCLONE_S3_REGION=us-east-1
Advanced Options
Custom endpoint (MinIO, SeaweedFS, etc.):
bash
RCLONE_S3_ENDPOINT=https://minio.example.com:9000
Access Control:
bash
RCLONE_S3_ACL=private
Storage Class:
bash
RCLONE_S3_STORAGE_CLASS=STANDARD_IA
3. Update Pipeline
Edit pipeline.py to point to your bucket:
python
source_remote=f"{s3.name}:my-bucket-name/path/to/folder"
4. Run Pipeline
bash
uv run dagster dev -f pipeline.py
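
Optional pre-flight check
Before starting Dagster, it can help to confirm that the variables from step 2 are set and to see what the source path from step 3 will look like. The following is a minimal sketch, not part of the template: the helper names `missing_s3_settings` and `build_source_remote` are hypothetical, and it assumes the variables have already been loaded into the process environment (for example by your shell or a .env loader). It also assumes that `s3.name` in pipeline.py corresponds to `RCLONE_S3_NAME`, which the template does not state explicitly.

```python
import os

# Variables listed in step 2. The optional settings under
# "Advanced Options" are deliberately not checked here.
REQUIRED_VARS = (
    "RCLONE_S3_NAME",
    "RCLONE_S3_TYPE",
    "RCLONE_S3_ACCESS_KEY_ID",
    "RCLONE_S3_SECRET_ACCESS_KEY",
    "RCLONE_S3_REGION",
)


def missing_s3_settings(env=None):
    """Return the names of required variables that are unset or blank."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name, "").strip()]


def build_source_remote(remote_name, bucket, prefix=""):
    """Compose a path of the form remote:bucket/path, as used in step 3."""
    # Drop empty parts and stray slashes so the result has no "//" or trailing "/".
    parts = [part.strip("/") for part in (bucket, prefix) if part.strip("/")]
    return f"{remote_name}:{'/'.join(parts)}"
```

A possible use: call `missing_s3_settings()` and stop if the returned list is non-empty, then compare `build_source_remote(os.environ["RCLONE_S3_NAME"], "my-bucket-name", "path/to/folder")` with the `source_remote` value in pipeline.py.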