Performance

Agency Latency

The end-to-end time between a user's high-level goal statement and the completion of the resulting autonomous workflow.

Agency latency measures the total elapsed time from the moment a user submits a goal to an agentic system to the moment the system delivers a completed outcome. It is one of the primary performance metrics for agentic websites, analogous to page load time in traditional web performance.

Agency latency is composed of several stages:
- Intent parsing latency: Time for the LLM to interpret and structure the user's goal
- Planning latency: Time to decompose the goal into a task graph
- Tool invocation latency: Cumulative time spent waiting for external API responses
- Synthesis latency: Time to compile results and generate the final output
- HITL (human-in-the-loop) wait time: Time spent waiting for human confirmations (often excluded from automated benchmarks)
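
The breakdown above can be sketched as a simple sum. This is a minimal illustration, not a standard API: the class and field names are hypothetical, and it assumes the stages run one after another without overlapping, so their durations add up to the wall-clock total.

```python
from dataclasses import dataclass


@dataclass
class StageTimings:
    """Per-stage timings, in seconds, for one workflow run (hypothetical names)."""
    intent_parsing: float
    planning: float
    tool_invocation: float
    synthesis: float
    hitl_wait: float = 0.0  # time spent waiting for human confirmation

    def agency_latency(self, include_hitl: bool = True) -> float:
        """Sum the stages; assumes they are sequential and do not overlap."""
        automated = (
            self.intent_parsing
            + self.planning
            + self.tool_invocation
            + self.synthesis
        )
        return automated + self.hitl_wait if include_hitl else automated


run = StageTimings(intent_parsing=0.5, planning=1.0, tool_invocation=4.0,
                   synthesis=1.5, hitl_wait=30.0)
print(run.agency_latency())                    # full end-to-end time
print(run.agency_latency(include_hitl=False))  # benchmark-style figure
```

Reporting both figures keeps automated benchmarks comparable while still showing what the user actually experienced.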

Reducing agency latency means optimizing several stages at once. Key techniques include parallel tool execution (running independent tool calls simultaneously), LLM response caching for repeated sub-tasks, pre-fetching data that is likely to be needed based on early intent signals, and streaming UI updates that improve perceived responsiveness during long-running workflows.
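
As a rough illustration of parallel tool execution, the sketch below compares awaiting independent tool calls one at a time with running them concurrently through `asyncio.gather`. The `call_tool` function is a hypothetical stand-in that sleeps instead of calling a real API, and the tool names and delays are made up.

```python
import asyncio
import time


async def call_tool(name: str, delay: float) -> str:
    """Hypothetical tool call; sleeps to simulate waiting on an external API."""
    await asyncio.sleep(delay)
    return f"{name}:done"


async def run_sequential(calls):
    # Each call waits for the previous one, so the delays add up.
    return [await call_tool(name, delay) for name, delay in calls]


async def run_parallel(calls):
    # Independent calls run concurrently; total time is roughly the slowest call.
    return list(await asyncio.gather(*(call_tool(n, d) for n, d in calls)))


def timed(runner, calls):
    """Run one strategy and return its results with the elapsed wall-clock time."""
    start = time.perf_counter()
    results = asyncio.run(runner(calls))
    return results, time.perf_counter() - start


if __name__ == "__main__":
    calls = [("search", 0.2), ("weather", 0.2), ("calendar", 0.2)]
    _, sequential_time = timed(run_sequential, calls)
    _, parallel_time = timed(run_parallel, calls)
    print(f"sequential: {sequential_time:.2f}s, parallel: {parallel_time:.2f}s")
```

This only helps when the calls are truly independent. A call that needs another call's output still has to wait for it.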

For user experience, perceived latency often matters more than actual latency. Agentic sites that show real-time progress indicators, telling users what the agent is doing at each step, tend to earn higher satisfaction scores even when total completion time is unchanged.
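
One way to surface step-by-step progress is to have the workflow yield an event before and after each step, so a UI can render it while the work continues. The sketch below is a hypothetical illustration: the event fields and step labels are assumptions, not part of any particular framework.

```python
from typing import Callable, Dict, Iterator, List, Tuple

Step = Tuple[str, Callable[[], object]]  # (label shown to the user, work to run)


def run_with_progress(steps: List[Step]) -> Iterator[Dict[str, object]]:
    """Yield a 'started' and a 'finished' event around each step."""
    total = len(steps)
    for index, (label, action) in enumerate(steps, start=1):
        yield {"status": "started", "step": label, "index": index, "total": total}
        result = action()
        yield {"status": "finished", "step": label, "index": index,
               "total": total, "result": result}


if __name__ == "__main__":
    workflow = [
        ("Parsing goal", lambda: "book a flight"),
        ("Calling tools", lambda: 3),
    ]
    for event in run_with_progress(workflow):
        print(f"[{event['index']}/{event['total']}] {event['step']}: {event['status']}")
```

Total completion time is the same as running the steps silently. The only difference is that the user sees each step as it happens.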
