Performance

Agency Latency

The end-to-end time between a user's high-level goal statement and the completion of the resulting autonomous workflow.

Agency latency measures the total elapsed time from the moment a user submits a goal to an agentic system to the moment the system delivers a completed outcome. It is one of the primary performance metrics for agentic websites, analogous to page load time in traditional web performance.

Agency latency is composed of several stages:
- Intent parsing latency: Time for the LLM to interpret and structure the user's goal
- Planning latency: Time to decompose the goal into a task graph
- Tool invocation latency: Cumulative time spent waiting for external API responses
- Synthesis latency: Time to compile results and generate the final output
- Human-in-the-loop (HITL) wait time: Time spent waiting for human confirmations (often excluded from automated benchmarks)
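
The breakdown above can be read as a simple sum of stage timings. The sketch below is a minimal illustration, not a standard API: the class name `AgencyLatency`, its field names, and the `include_hitl` flag are all hypothetical, and it assumes the stages run one after another so their times add up.

```python
from dataclasses import dataclass


@dataclass
class AgencyLatency:
    """Per-stage timings for one workflow run, in seconds (hypothetical schema)."""

    intent_parsing_s: float
    planning_s: float
    tool_invocation_s: float
    synthesis_s: float
    hitl_wait_s: float = 0.0

    def total(self, include_hitl: bool = True) -> float:
        # Assumes stages are sequential, so the total is a plain sum.
        automated = (
            self.intent_parsing_s
            + self.planning_s
            + self.tool_invocation_s
            + self.synthesis_s
        )
        # Automated benchmarks often leave out time spent waiting on a human.
        return automated + self.hitl_wait_s if include_hitl else automated
```

Calling `total(include_hitl=False)` gives the figure an automated benchmark would typically report, while `total()` reflects what the user actually waited.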

Reducing agency latency is a multi-dimensional engineering challenge. Key optimization techniques include parallel tool execution (running independent tool calls simultaneously), LLM response caching for repeated sub-tasks, pre-fetching likely-needed data based on early intent signals, and streaming UI updates to provide perceived responsiveness during long-running workflows.
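
As a rough illustration of parallel tool execution, the sketch below uses Python's `asyncio.gather` to run independent tool calls concurrently. The `call_tool` function is a hypothetical stand-in that simulates an external API with `asyncio.sleep`; a real agent would issue network requests there instead.

```python
import asyncio
import time


async def call_tool(name: str, delay_s: float) -> str:
    # Hypothetical stand-in for an external API call; the sleep simulates network wait.
    await asyncio.sleep(delay_s)
    return f"{name}: done"


async def run_parallel(calls: dict[str, float]) -> tuple[list[str], float]:
    """Run all tool calls concurrently and return (results, elapsed seconds)."""
    start = time.perf_counter()
    # gather schedules every call at once and returns results in input order.
    results = await asyncio.gather(
        *(call_tool(name, delay) for name, delay in calls.items())
    )
    return list(results), time.perf_counter() - start
```

With independent calls, the elapsed time approaches the slowest single call rather than the sum of all calls. This only applies when the calls do not depend on each other's results; dependent steps in the task graph still have to run in order.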

For user experience, perceived latency is often more important than actual latency. Agentic sites that provide real-time progress indicators, showing users what the agent is doing at each step, tend to achieve higher satisfaction scores even when total completion time is unchanged.
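
One way to surface step-by-step progress is to have the workflow emit events as it runs, so the UI can render them immediately. The sketch below is a minimal example using an async generator; the function name `run_with_progress`, the `Step` type, and the message format are assumptions made for illustration.

```python
import asyncio
from typing import AsyncIterator, Awaitable, Callable

# A step is a human-readable label plus an async action (hypothetical shape).
Step = tuple[str, Callable[[], Awaitable[None]]]


async def run_with_progress(steps: list[Step]) -> AsyncIterator[str]:
    """Run steps in order, yielding a progress message before and after each."""
    total = len(steps)
    for i, (label, action) in enumerate(steps, start=1):
        yield f"[{i}/{total}] started: {label}"
        await action()
        yield f"[{i}/{total}] finished: {label}"
```

A caller would consume this with `async for message in run_with_progress(steps)` and forward each message to the client, for example over a streaming HTTP response. Total completion time is unchanged; only the user's view of it improves.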