Language models

The Swiss AI Hub integrates with language model providers through LiteLLM, a unified gateway that handles routing, cost tracking, and security. Agents access models through this proxy layer without needing provider-specific code.

Supported models

LiteLLM supports 100+ LLM providers. The platform can integrate with any provider LiteLLM supports.

The platform uses a dual-mode inference model:

Non-GPU deployments: Swiss LLM Cloud (Swiss-hosted provider) for text generation, embedding, reranking, transcription, and OCR
GPU deployments: Local vLLM on an NVIDIA RTX 6000 Pro (96 GB VRAM) for fully air-gapped operation
Any additional OpenAI-compatible API endpoint can be added via LiteLLM configuration

Models are configured in LiteLLM with metadata about capabilities (chat, embedding, vision, function calling), token limits, and costs. Agents specify which model to use in their configuration. Adding new providers requires updating the LiteLLM configuration file.

Architecture

The platform uses three layers:

LLM proxy layer: Provides a unified gateway to language model providers. See Proxy server for routing, cost tracking, and retry handling.

Agent layer: Agents implement workflows using LLMs through the proxy. See Guards for input and output validation.

User layer: Users interact with agents through chat interfaces.

How the layers work together

When a user asks a question:

The question reaches the agent
Agent input guards (optional) validate the question is appropriate
Presidio (if enabled) scans for PII in the question at the proxy layer
The proxy routes the request to the configured LLM provider
The LLM generates a response
Agent output guards (optional) check response quality and redact PII from retrieved documents
The response reaches the user

This layered approach provides defense-in-depth for both functionality (guards ensure quality) and security (Presidio protects user input, output guards protect retrieved data).

Components

Proxy server: LiteLLM configuration, routing, and cost tracking
Data anonymization: Presidio integration for PII protection in user input
Guards: Agent-level input and output validation for quality and security

Introduction: The Swiss AI Hub Vision

Why Swiss AI Hub

Quick Start: Your First 30 Minutes

Platform Architecture Deep Dive

Deployment Guide

Monitoring & Alerting

Agents

Data pipelines

Knowledge management

Chat Interface

Access Management

Auditing & Observability

Language models

Memory

Multi-tenancy

Slack & Teams Integrations

API

Security

Compliance and regulations

Quick Start

Building Agents

Building Pipelines

Building Processes

Advanced SDK Topics

Features

Contributing

Using AI to Contribute

Certification

API Reference

Troubleshooting

Glossary

Pipeline

Sources

Language models

Supported models

Architecture

How the layers work together

Components

Monitoring & Alerting

Sources

Language models ​

Supported models ​

Architecture ​

How the layers work together ​

Components ​

Language models

Supported models

Architecture

How the layers work together

Components