Data pipelines

Pipelines are automated workflows that transform documents into searchable knowledge bases for AI agents. They monitor file storage locations, process documents when changes occur, and maintain vector databases that agents query for information.

Document processing workflow

Raw documents cannot be directly queried by agents. PDFs and Word files must be converted to text, split into manageable chunks, and transformed into vector embeddings that enable semantic search. Pipelines handle this transformation automatically.
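A minimal sketch of the chunking step, assuming the text has already been extracted from the PDF or Word file and that chunk size is measured in words. The function name `chunk_text` and its default sizes are illustrative assumptions, not the platform's actual settings.

```python
from typing import List


def chunk_text(text: str, chunk_size: int = 200, overlap: int = 40) -> List[str]:
    """Split text into overlapping windows of at most `chunk_size` words.

    Hypothetical helper: real pipelines often chunk by tokens or by
    document structure (headings, paragraphs) instead of raw words.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")

    words = text.split()
    step = chunk_size - overlap  # how far each window advances
    chunks: List[str] = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last window already reaches the end of the text
    return chunks
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk, which helps retrieval at the cost of storing some text twice.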

The diagram shows the complete flow from document ingestion to agent queries: text extraction, chunking, embedding, storage in the vector database, and retrieval. Each stage transforms the data to make it searchable and retrievable.
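The storage and query end of that flow can be sketched in a few lines. This is a toy under loud assumptions: `embed` is a hashed bag-of-words stand-in for a real embedding model, and `InMemoryVectorStore` is a hypothetical class, not the platform's vector database. It does show the core idea of semantic search: chunks and questions become vectors, and the closest vectors win.

```python
import hashlib
import math
import re
from typing import Dict, List, Tuple

DIM = 256  # hypothetical embedding size for the toy embedder


def embed(text: str) -> List[float]:
    """Toy embedding: hashed bag of words, L2-normalised (not a real model)."""
    vec = [0.0] * DIM
    for token in re.findall(r"\w+", text.lower()):
        bucket = int(hashlib.sha256(token.encode()).hexdigest(), 16) % DIM
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


class InMemoryVectorStore:
    """Hypothetical stand-in for a vector database."""

    def __init__(self) -> None:
        self._rows: Dict[str, Tuple[List[float], str]] = {}

    def upsert(self, chunk_id: str, text: str) -> None:
        # Insert or overwrite the embedding for one chunk.
        self._rows[chunk_id] = (embed(text), text)

    def delete(self, chunk_id: str) -> None:
        self._rows.pop(chunk_id, None)

    def query(self, question: str, top_k: int = 3) -> List[Tuple[str, float]]:
        # Vectors are normalised, so the dot product equals cosine similarity.
        q = embed(question)
        scored = [
            (chunk_id, sum(a * b for a, b in zip(q, vec)))
            for chunk_id, (vec, _text) in self._rows.items()
        ]
        scored.sort(key=lambda item: item[1], reverse=True)
        return scored[:top_k]
```

A real deployment would replace `embed` with a model call and the dictionary with an indexed store, but the query contract (question in, ranked chunk IDs out) stays the same.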

Automatic synchronization

Pipelines monitor data sources for changes. When a document is added, modified, or deleted, the pipeline processes the change and updates the knowledge base. This keeps agent responses current without manual intervention.
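One common way to detect additions, modifications, and deletions is to compare content hashes between two snapshots of a storage location. The sketch below assumes each snapshot is a `{path: sha256 hex digest}` mapping; `fingerprint`, `ChangeSet`, and `diff_snapshots` are hypothetical names, and the platform may detect changes differently (for example via storage events).

```python
import hashlib
from dataclasses import dataclass, field
from typing import Dict, List


def fingerprint(content: bytes) -> str:
    """Content hash used to tell whether a document actually changed."""
    return hashlib.sha256(content).hexdigest()


@dataclass
class ChangeSet:
    added: List[str] = field(default_factory=list)
    modified: List[str] = field(default_factory=list)
    deleted: List[str] = field(default_factory=list)


def diff_snapshots(previous: Dict[str, str], current: Dict[str, str]) -> ChangeSet:
    """Compare two {path: content hash} snapshots of a storage location."""
    changes = ChangeSet()
    for path, digest in current.items():
        if path not in previous:
            changes.added.append(path)
        elif previous[path] != digest:
            changes.modified.append(path)
    changes.deleted = [path for path in previous if path not in current]
    # Sort so documents are processed in a deterministic order.
    changes.added.sort()
    changes.modified.sort()
    changes.deleted.sort()
    return changes
```

Added and modified paths would then be re-extracted, re-chunked, and re-embedded (replacing any old chunks for that document), while deleted paths would have their chunks removed from the vector database.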

Orchestration with Dagster

Dagster orchestrates pipeline execution, handling scheduling, retries, and logging. Each processing step is tracked, creating an audit trail from document ingestion through to storage. You can review pipeline runs to troubleshoot issues, verify document processing, and monitor data quality.
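To make "retries" and "audit trail" concrete, the standard-library sketch below runs a step, retries it with exponential backoff on failure, and records every attempt. It is an illustration of the idea only, not Dagster's API: in Dagster itself, retries are configured declaratively (for example with `RetryPolicy`) and run logs are captured for you. `run_step` and `StepRecord` are hypothetical names.

```python
import logging
import time
from dataclasses import dataclass
from typing import Callable, List, TypeVar

T = TypeVar("T")
logger = logging.getLogger("pipeline")


@dataclass
class StepRecord:
    """One audit-trail entry: which step ran, which attempt, and the outcome."""
    step: str
    attempt: int
    status: str  # "success" or "failure"
    detail: str = ""


def run_step(name: str, fn: Callable[[], T], audit: List[StepRecord],
             max_retries: int = 3, delay_seconds: float = 1.0) -> T:
    """Run fn, retrying on exceptions and recording every attempt in audit."""
    for attempt in range(1, max_retries + 2):  # first try + max_retries retries
        try:
            result = fn()
        except Exception as exc:
            audit.append(StepRecord(name, attempt, "failure", repr(exc)))
            logger.warning("step %s attempt %d failed: %r", name, attempt, exc)
            if attempt > max_retries:
                raise  # retries exhausted: surface the original error
            time.sleep(delay_seconds * 2 ** (attempt - 1))  # exponential backoff
        else:
            audit.append(StepRecord(name, attempt, "success"))
            logger.info("step %s succeeded on attempt %d", name, attempt)
            return result
    raise RuntimeError("unreachable")
```

Keeping a record for failed attempts as well as successful ones is what makes the trail useful for troubleshooting: it shows not only that a document was processed, but how many tries it took and why earlier tries failed.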
