Testing and debugging
Testing and debugging agents needs a different approach than traditional applications because of their event-driven, asynchronous nature.
Unit testing: direct step invocation
The simplest way to test individual steps is to instantiate the agent and call step methods directly with mocked dependencies:
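```python
from unittest.mock import AsyncMock, Mock

# MyAgent, UserMessageEvent, RetrieveUserMemoryEvent, AgentMemory,
# MemorySearchResult and fake_user come from your agent package and the
# framework; import them accordingly and fill in the elided arguments.
async def test_retrieve_step():
    agent = MyAgent()
    event = UserMessageEvent(messages=[...], user=fake_user(), locale="en")

    # Mock the injected dependency; spec= restricts it to AgentMemory's interface
    memory = Mock(spec=AgentMemory)
    memory.search_user_memory = AsyncMock(return_value=MemorySearchResult(...))

    # Call the step method directly, bypassing the dispatcher
    result = await agent.retrieve_step(event, memory)

    assert isinstance(result, RetrieveUserMemoryEvent)
    memory.search_user_memory.assert_called_once()
```

This approach tests step logic in isolation, without the dispatcher, NATS, or any other infrastructure. Mock the injected dependencies (AgentMemory, EventDisplayer, RunContext) and assert on the returned event.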
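The same pattern can be tried without the framework. In the self-contained sketch below, `ToyAgent`, the two dataclass events and the `AgentMemory` stub are hypothetical stand-ins for the real classes; only the `unittest.mock` calls carry over as-is. `assert_awaited_once_with` is a stricter variant of `assert_called_once` that also checks the mock was awaited with the expected argument.

```python
import asyncio
from dataclasses import dataclass, field
from unittest.mock import AsyncMock, Mock


# Hypothetical stand-ins for framework types, so the sketch runs on its own.
@dataclass
class UserMessageEvent:
    messages: list[str]


@dataclass
class RetrieveUserMemoryEvent:
    memories: list[str] = field(default_factory=list)


class AgentMemory:
    async def search_user_memory(self, query: str) -> list[str]:
        raise NotImplementedError  # a real implementation would query a store


class ToyAgent:
    # A step is an async method: it takes an event plus injected
    # dependencies and returns the next event.
    async def retrieve_step(
        self, event: UserMessageEvent, memory: AgentMemory
    ) -> RetrieveUserMemoryEvent:
        query = event.messages[-1]
        memories = await memory.search_user_memory(query)
        return RetrieveUserMemoryEvent(memories=memories)


async def test_retrieve_step() -> None:
    agent = ToyAgent()
    event = UserMessageEvent(messages=["What is my favourite colour?"])

    # spec= makes the mock reject attributes that AgentMemory does not define
    memory = Mock(spec=AgentMemory)
    memory.search_user_memory = AsyncMock(return_value=["likes blue"])

    result = await agent.retrieve_step(event, memory)

    assert isinstance(result, RetrieveUserMemoryEvent)
    assert result.memories == ["likes blue"]
    memory.search_user_memory.assert_awaited_once_with("What is my favourite colour?")


if __name__ == "__main__":
    asyncio.run(test_retrieve_step())
    print("ok")
```

pytest does not run `async def` tests natively; a plugin such as pytest-asyncio or anyio is needed, which is why the sketch can also be driven directly with `asyncio.run`.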
Integration testing: pytest-bdd + AgentTestRunner
Use Behavior-Driven Development (BDD) with pytest-bdd for testing full agent workflows.
Basic test structure
- Feature file - Describe behavior in natural language

```gherkin
# tests/features/iterative_agent.feature
Feature: Iterative Processing Agent
  An agent that performs iterative processing with configurable limits

  Scenario: Agent processes data with iteration limit
    Given an iterative processing agent with maximum 2 iterations
    When I ask the agent to process some data
    Then the agent should complete all iterations
    And the agent should stop after reaching the limit
    And the processing should be successful
```

- Test implementation - Connect Gherkin to code
```python
from pytest_bdd import given, parsers, scenarios, then, when

from swiss_ai_hub.agent.runners.agent_test_runner import AgentTestRunner
from swiss_ai_hub.core.testing.asyncio_utils.bdd import async_test

# BoundedLoopAgent, BoundedLoopAgentConfig, UserMessageEvent, ChatMessage,
# MessageRole, BeginEvent and fake_user must also be imported from your project.

scenarios("./features/iterative_agent.feature")


# target_fixture exposes the returned runner to later steps as `agent_runner`
@given(
    parsers.parse("an iterative processing agent with maximum {max_iterations:d} iterations"),
    target_fixture="agent_runner",
)
def _(max_iterations: int):
    return AgentTestRunner(
        agent_type=BoundedLoopAgent,
        agent_config=BoundedLoopAgentConfig(
            agent_id="iterative_agent",
            loop_max=max_iterations,
        ),
    )


@when("I ask the agent to process some data")
@async_test
async def _(agent_runner: AgentTestRunner):
    async with agent_runner.test_run() as topic:
        await agent_runner.send_event_from_topic(
            topic=topic,
            start_event=UserMessageEvent(
                messages=[ChatMessage(content="Process this data", role=MessageRole.USER)],
                user=fake_user(),
            ),
        )


@then("the agent should complete all iterations")
def _(agent_runner: AgentTestRunner):
    iteration_events = agent_runner.get_events_of_class(BeginEvent)
    assert len(iteration_events) == 3, f"Expected 3 iterations, got {len(iteration_events)}"


# Definitions for the remaining "Then"/"And" steps are omitted here;
# pytest-bdd fails a scenario if any of its steps has no definition.
```

AgentTestRunner: core testing tool
AgentTestRunner provides a sandboxed environment for testing agents.
Basic usage
```python
async def test_simple_agent():
    runner = AgentTestRunner(
        agent_type=MyAgent,
        agent_config=MyAgentConfig(agent_id="test_agent"),
    )

    async with runner.test_run() as topic:
        await runner.send_event_from_topic(
            topic=topic,
            start_event=UserMessageEvent(...),
        )

    # Assertions
    assert runner.has_stop_event
    stop_event = runner.get_stop_event()
    assert "expected content" in stop_event.final_message
```

Event inspection methods
Available methods
```python
# Check for specific events
assert runner.has_start_event
assert runner.has_stop_event

# Get specific events
stop_event = runner.get_stop_event()
start_event = runner.get_start_event()

# Get events by type
all_events = runner.get_events_of_class(MyCustomEvent)
single_event = runner.get_event_of_class(MyCustomEvent)

# Count events
event_count = len(runner.get_events_of_class(ProcessingEvent))
```

Debugging strategy: trace-driven development
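Breakpoint debugging works poorly for event-driven agents: steps run asynchronously as their input events arrive, so pausing inside one step shows little of the flow that led there. Use trace-driven debugging instead.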
Your debugging toolkit: Langfuse tracing (primary), comprehensive logging, trigger scripts, and event flow inspection.
Essential tool: trigger.py scripts
Create trigger.py scripts to test specific scenarios:
```python
# my_agent/trigger.py
import asyncio

from swiss_ai_hub.agent.runners.agent_test_runner import AgentTestRunner
from swiss_ai_hub.core.infrastructure.logging.logger import enable_logging

# Also import MyAgent, MyAgentConfig, UserMessageEvent, ChatMessage,
# MessageRole and fake_user from your project.

# ALWAYS enable logging for debugging
enable_logging()


async def main():
    runner = AgentTestRunner(
        agent_type=MyAgent,
        agent_config=MyAgentConfig(agent_id="debug_agent"),
    )
    async with runner.test_run() as topic:
        await runner.send_event_from_topic(
            topic=topic,
            start_event=UserMessageEvent(
                messages=[ChatMessage(content="test input", role=MessageRole.USER)],
                user=fake_user(),
            ),
        )


if __name__ == "__main__":
    asyncio.run(main())
```

Interactive testing: run.py scripts
For agents that need to run continuously:
```python
# my_agent/run.py
import asyncio

from swiss_ai_hub.agent.runners.agent_test_runner import AgentTestRunner
from swiss_ai_hub.core.infrastructure.logging.logger import enable_logging

enable_logging()


async def main():
    runner = AgentTestRunner(
        agent_type=MyAgent,
        agent_config=MyAgentConfig(agent_id="interactive_agent"),
    )
    # Keeps the agent running for interactive testing
    await runner.run_forever()


if __name__ == "__main__":
    asyncio.run(main())
```

Langfuse tracing: visual debugging
Langfuse provides step-by-step visualization of agent execution at http://localhost:6006.
Key features:
- Trace view - See complete workflow execution
- Step details - Click steps to inspect inputs/outputs
- Timing analysis - Identify performance bottlenecks
- Error tracking - Pinpoint where failures occur
Debugging workflow:
- Run your `trigger.py` script
- Open the Langfuse UI at `localhost:6006`
- Find your agent's execution trace
- Click through steps to inspect event flow
- Identify where things go wrong
Running tests
```bash
# Run all tests
uv run pytest

# Run a specific test file
uv run pytest tests/test_my_agent.py

# Run with verbose output
uv run pytest -v tests/

# Run with coverage (requires the pytest-cov plugin)
uv run pytest --cov=swiss_ai_hub tests/
```

Implementation checklist
Use this checklist when building or reviewing agents:
Before implementation
- [ ] Understand the execution model — steps are a dependency graph, not a sequence
- [ ] Review the memory lifecycle if your agent uses memory
- [ ] Study production agents: `packages/agent/swiss_ai_hub/agent/agents/rag_agent/` and `expert_rag_agent/`
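One way to picture "a dependency graph, not a sequence" is to derive the wiring from step signatures. The sketch assumes, for illustration only, that a step consumes the event types of its parameters and produces its return type; the event classes and step functions are hypothetical, and the framework's real wiring rules may differ. It uses `typing.get_type_hints` and the standard-library `graphlib` (Python 3.9+).

```python
import graphlib
from typing import Callable, get_type_hints


# Hypothetical event types.
class UserMessageEvent: ...
class RetrieveEvent: ...
class ToolEvent: ...
class StopEvent: ...


# Hypothetical steps: parameters are consumed events, the return type is produced.
def retrieve(event: UserMessageEvent) -> RetrieveEvent: ...
def call_tool(event: UserMessageEvent) -> ToolEvent: ...
def answer(retrieved: RetrieveEvent, tool: ToolEvent) -> StopEvent: ...


def build_graph(steps: list[Callable]) -> dict[str, set[str]]:
    """Map each step name to the names of the steps it depends on."""
    producers = {get_type_hints(s)["return"]: s.__name__ for s in steps}
    graph: dict[str, set[str]] = {}
    for step in steps:
        hints = get_type_hints(step)
        consumed = [t for name, t in hints.items() if name != "return"]
        # Events with no producing step (such as the start event) add no edge.
        graph[step.__name__] = {producers[t] for t in consumed if t in producers}
    return graph


graph = build_graph([retrieve, call_tool, answer])
order = list(graphlib.TopologicalSorter(graph).static_order())
# The graph only forces `answer` after both of its inputs; it does not
# order `retrieve` relative to `call_tool`.
print(order)
```

Reading the graph this way makes it clear why asserting on a fixed step sequence in tests is fragile: only the edges are guaranteed.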
For each step
- [ ] Optional parameters (`T | None = None`) have preconditions checking both config AND event presence
- [ ] Precondition parameter types are a subset of the step's injectable types
- [ ] Return type correctly indicates terminal (`StopEvent`) vs. non-terminal
- [ ] No dependency on `StopEvent` or its subclasses as input parameters
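The optional-parameter item is easiest to see in a deliberately simplified model. It assumes, hypothetically, a dispatcher that re-evaluates a step each time one of its input events arrives and runs it as soon as its required inputs are present; `Step`, `dispatch`, `Config.use_memory` and the event names are illustrative, not the framework's API. Under that assumption an optional input makes the step fire twice, and a precondition that checks only event presence (ignoring the config flag) would never fire when memory is disabled.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Config:
    use_memory: bool  # hypothetical config flag


@dataclass
class Step:
    name: str
    required: set[str]                               # inputs the step cannot run without
    optional: set[str] = field(default_factory=set)  # models `T | None = None` params
    precondition: Callable[[set[str]], bool] = lambda seen: True


def dispatch(steps: list[Step], arrivals: list[str]) -> list[str]:
    """Toy dispatcher: when an event arrives, re-check every step that consumes it.

    A step runs if all its required inputs have been seen and its precondition
    passes. Returns the names of the steps that ran, in order.
    """
    seen: set[str] = set()
    ran: list[str] = []
    for event in arrivals:
        seen.add(event)
        for step in steps:
            consumes = step.required | step.optional
            if event in consumes and step.required <= seen and step.precondition(seen):
                ran.append(step.name)
    return ran


def answer_step(config: Config, guarded: bool) -> Step:
    def wait_for_memory(seen: set[str]) -> bool:
        # Check config AND event presence: wait for the memory event
        # only when the config says one will arrive.
        return not config.use_memory or "MemoryEvent" in seen

    return Step(
        name="answer",
        required={"UserMessageEvent"},
        optional={"MemoryEvent"},
        precondition=wait_for_memory if guarded else (lambda seen: True),
    )


events = ["UserMessageEvent", "MemoryEvent"]
print(dispatch([answer_step(Config(use_memory=True), guarded=False)], events))
# ['answer', 'answer']  <- the trap: the step fires once per arriving input
print(dispatch([answer_step(Config(use_memory=True), guarded=True)], events))
# ['answer']
print(dispatch([answer_step(Config(use_memory=False), guarded=True)], ["UserMessageEvent"]))
# ['answer']
```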
For memory integration
- [ ] LLM step uses `as_stop_step=False` (returns `LLMEvent`, not `LLMStopEvent`)
- [ ] Storage step depends on `LLMEvent`
- [ ] Final step has a precondition waiting for storage completion
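The ordering these memory items protect can be reproduced with plain `asyncio`. This is a hypothetical model, not the framework's scheduler: the storage step runs as a separate task triggered by the LLM result, and `run_turn` and the string markers are illustrative names. Without the wait, the stop marker is recorded before the memory write, which mirrors the "No events after StopEvent" problem listed under After implementation.

```python
import asyncio


async def run_turn(wait_for_storage: bool) -> list[str]:
    """Toy model of an LLM step -> storage step -> final step chain."""
    log: list[str] = []
    stored = asyncio.Event()

    async def storage_step() -> None:
        await asyncio.sleep(0.01)  # simulate a slow memory write
        log.append("memory_stored")
        stored.set()

    # LLM step: non-terminal, so the storage step can still react to its output
    log.append("llm_event")
    storage = asyncio.create_task(storage_step())

    # Final step: its precondition is "storage has completed"
    if wait_for_storage:
        await stored.wait()
    log.append("stop_event")

    await storage  # let the task finish so the log shows what happened after stop
    return log


print(asyncio.run(run_turn(wait_for_storage=False)))
# ['llm_event', 'stop_event', 'memory_stored']  <- memory write lands after stop
print(asyncio.run(run_turn(wait_for_storage=True)))
# ['llm_event', 'memory_stored', 'stop_event']
```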
After implementation
- [ ] Langfuse/Phoenix trace shows expected execution order
- [ ] No duplicate step executions (check for the optional parameter trap)
- [ ] No events after `StopEvent`
- [ ] Tests cover all config flag combinations
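Two of these checks can also be automated once you have the emitted events in order. The sketch below assumes, hypothetically, that each event records the name of the step that produced it; `Event`, `StopEvent` and `check_event_flow` are illustrative stand-ins rather than framework classes, and obtaining the ordered sequence from `AgentTestRunner` is not covered here. Steps that repeat by design, such as a bounded loop, go in `allowed_repeats`.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Event:
    step: str  # hypothetical: name of the step that emitted the event


class StopEvent(Event):
    pass


def check_event_flow(
    events: list[Event], allowed_repeats: frozenset[str] = frozenset()
) -> list[str]:
    """Return the problems found in an ordered event sequence (empty if none)."""
    problems: list[str] = []

    stop_positions = [i for i, e in enumerate(events) if isinstance(e, StopEvent)]
    if not stop_positions:
        problems.append("no StopEvent emitted")
    elif stop_positions[0] != len(events) - 1:
        late = [type(e).__name__ for e in events[stop_positions[0] + 1:]]
        problems.append(f"events after StopEvent: {late}")

    # A step emitting more than once usually means it was executed more than once
    counts = Counter(e.step for e in events)
    repeated = sorted(s for s, n in counts.items() if n > 1 and s not in allowed_repeats)
    if repeated:
        problems.append(f"steps ran more than once: {repeated}")
    return problems


events = [Event("retrieve"), Event("llm"), Event("llm"), StopEvent("final"), Event("store")]
print(check_event_flow(events))
# ["events after StopEvent: ['Event']", "steps ran more than once: ['llm']"]
```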
