LLM proxy
The LLM proxy (LiteLLM) provides a centralized gateway to language model providers. It abstracts vendor-specific APIs behind an OpenAI-compatible interface, allowing the platform to work with multiple AI providers without changing code.
Configuration
Models are configured in the LiteLLM configuration file. Each model entry specifies the provider, API endpoint, authentication, and capabilities.
Example model configuration:
model_list:
# Cloud model (Swiss LLM Cloud)
- model_name: text-generation/gemma-4-31B-it
litellm_params:
model: openai/google/gemma-4-31B-it
api_base: os.environ/SWISS_LLM_CLOUD_API_BASE_URL
api_key: os.environ/SWISS_LLM_CLOUD_API_KEY
drop_params: true
model_info:
mode: chat
supports_function_calling: true
input_cost_per_token: 0.0000002
output_cost_per_token: 0.0000008
# Local GPU model (vLLM)
- model_name: text-generation/Qwen3-VL-30B-A3B-Instruct-FP8
litellm_params:
model: openai/qwen3-vl-30b
api_base: http://vllm:8000/v1
api_key: os.environ/LOCAL_LLM_TOKEN
drop_params: true
model_info:
mode: chat
supports_function_calling: true
supports_vision: true
input_cost_per_token: 0
output_cost_per_token: 0The model_name identifies the model in agent configurations using the real canonical model name. The litellm_params section contains provider-specific connection details. The model_info section specifies capabilities and per-token pricing for cost tracking through Langfuse.
Core functions
Unified interface: LiteLLM provides an OpenAI-compatible API that works with Swiss LLM Cloud, locally hosted vLLM models, and other providers. Platform code uses the same interface regardless of which model handles the request.
Request routing: The proxy routes requests based on configured strategy. Current configuration uses "usage-based-routing-v2" which distributes load across available models.
Cost tracking: Usage tracking captures token consumption per request. Cost per token is configured for each model, allowing the platform to calculate and display costs per conversation. See Cost control for details on cost tracking and optimization.
PII protection: Presidio integration (when enabled) scans requests for personally identifiable information before sending them to external providers. See Data Anonymization for details.
Retry policies: The configuration specifies retry counts for timeout errors, rate limit errors, and internal server errors.
