SFTP Pipeline Template
A simple pipeline that syncs documents from an SFTP server to the Swiss AI Hub data lake.
Setup
1. Get SFTP Credentials
You need:
- Host: SFTP server address
- Username: Your account username
- Password: Your account password (or an SSH key file)
- Port: Usually 22
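The checklist above can be sketched as a small pre-flight check that fails early when a setting is missing. This is only an illustration: `SftpCredentials` and `load_sftp_credentials` are hypothetical names that are not part of the template, and the sketch assumes the values are exposed through the `RCLONE_SFTP_*` variables described in this README.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class SftpCredentials:
    """Hypothetical container for the settings listed above."""
    host: str
    user: str
    port: int = 22
    password: Optional[str] = None
    key_file: Optional[str] = None


def load_sftp_credentials(env: Mapping[str, str] = os.environ) -> SftpCredentials:
    """Read RCLONE_SFTP_* settings and raise ValueError if they are incomplete."""
    host = env.get("RCLONE_SFTP_HOST", "").strip()
    user = env.get("RCLONE_SFTP_USER", "").strip()
    password = env.get("RCLONE_SFTP_PASS") or None
    key_file = env.get("RCLONE_SFTP_KEY_FILE") or None

    # Host and username are always required.
    required = (("RCLONE_SFTP_HOST", host), ("RCLONE_SFTP_USER", user))
    missing = [name for name, value in required if not value]
    if missing:
        raise ValueError(f"Missing required settings: {', '.join(missing)}")

    # Either a password or an SSH key file must be provided.
    if not (password or key_file):
        raise ValueError("Set RCLONE_SFTP_PASS or RCLONE_SFTP_KEY_FILE")

    # The port falls back to 22 when it is not set.
    port = int(env.get("RCLONE_SFTP_PORT", "22"))
    if not 1 <= port <= 65535:
        raise ValueError(f"Port out of range: {port}")

    return SftpCredentials(host, user, port, password, key_file)
```

Calling `load_sftp_credentials()` with no arguments reads from the current process environment. Passing a plain dictionary instead makes the check easy to exercise in tests.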
2. Configure Environment
Copy variables from .env.template to your .env and fill in:
bash
RCLONE_SFTP_NAME=sftp
RCLONE_SFTP_TYPE=sftp
RCLONE_SFTP_HOST=sftp.example.com
RCLONE_SFTP_USER=username
RCLONE_SFTP_PASS=password
RCLONE_SFTP_PORT=22
3. Update Pipeline
Edit pipeline.py to point to your folder:
python
source_remote=f"{sftp.name}:/path/to/documents"
4. Run Pipeline
bash
uv run dagster dev -f pipeline.py
Using SSH Key
Instead of a password, use an SSH key file:
bash
# .env
RCLONE_SFTP_KEY_FILE=/secrets/ssh_key
# Remove RCLONE_SFTP_PASS
Mount the key file in infra/docker-compose.dev.yml:
yaml
rclone:
  volumes:
    - ./ssh_key:/secrets/ssh_key:ro
Advanced Options
Known hosts file:
bash
RCLONE_SFTP_KNOWN_HOSTS_FILE=/secrets/known_hosts
Skip symbolic links during transfer:
bash
RCLONE_SFTP_SKIP_LINKS=true
Note on host key verification: By default, rclone uses the system's known_hosts file. To skip host key verification in development (not recommended for production), set RCLONE_SFTP_KNOWN_HOSTS_FILE to an empty or non-existent file.
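
Because an empty or missing known_hosts file is described above as skipping verification, it may be worth guarding against that happening by accident outside development. The following is a minimal sketch of such a guard. `host_key_verification_enabled` is a hypothetical helper that is not part of the template, and it only inspects the file on disk. It does not query rclone itself.

```python
from pathlib import Path


def host_key_verification_enabled(known_hosts_file: str) -> bool:
    """Return True only if the path points at a non-empty regular file.

    Assumption: an unset, missing, or empty known_hosts file means that
    host key verification is effectively skipped, as the note describes.
    """
    if not known_hosts_file:
        return False
    path = Path(known_hosts_file)
    return path.is_file() and path.stat().st_size > 0
```

A deployment script could call this with the value of `RCLONE_SFTP_KNOWN_HOSTS_FILE` and refuse to start the pipeline in production when it returns `False`.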
