Automated Failure Attribution in LLM Multi-Agent Systems: Pinpointing the Responsible Agent


Large Language Model (LLM) multi-agent systems have become a popular paradigm for tackling complex tasks through collaborative effort. Yet even the most dynamic teams of AI agents can fail—and when they do, developers face a daunting question: which agent caused the breakdown, and at what point in the workflow? Sifting through extensive interaction logs to locate the root cause is akin to finding a needle in a haystack—time-consuming and exhausting. This frustration is all too familiar as multi-agent systems grow in complexity, making failures both common and notoriously difficult to diagnose due to autonomous collaboration and long information chains.

Now, researchers from Penn State University and Duke University, in collaboration with Google DeepMind, the University of Washington, Meta, Nanyang Technological University, and Oregon State University, have introduced a novel problem called Automated Failure Attribution. They present the first benchmark dataset for this task—Who&When—and develop several automated attribution methods. Their work highlights the inherent complexity of failure analysis and offers a new path toward more reliable LLM multi-agent systems. The paper has been accepted as a Spotlight presentation at the top-tier machine learning conference ICML 2025.

The Challenge of Diagnosing Failures in Multi-Agent LLM Systems

Multi-agent systems built on LLMs have shown great promise in domains ranging from software engineering to creative writing. However, these systems are fragile. A single agent’s error, a misunderstanding between agents, or a misstep in information relay can cascade into complete task failure.


Why Failures Occur

Failures can stem from multiple sources: an individual agent's reasoning error, a misunderstanding between agents, or a misstep in relaying information along the chain.

These issues are compounded by the autonomous nature of the agents, which makes it hard to trace the exact moment and cause of an error.

Current Debugging Inefficiencies

When a multi-agent system fails, developers typically resort to slow, manual debugging: reading through lengthy interaction logs step by step to locate where things went wrong.

This process is not only tedious but also error-prone, delaying iteration and improvement.

Introducing Automated Failure Attribution

To overcome these bottlenecks, the research team formalized the problem of automated failure attribution: given a record of a failed multi-agent task, automatically identify which agent was responsible and at which step the failure originated. They built the first dedicated benchmark, Who&When, and tested multiple attribution methods.
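The formalization above can be sketched as a simple typed interface. The names and the naive baseline below are illustrative assumptions, not the authors' implementation; the point is only that an attributor maps a failed task's log to a (who, when) pair.

```python
from dataclasses import dataclass

@dataclass
class Step:
    index: int   # position in the interaction log
    agent: str   # name of the agent that acted
    content: str # the agent's message or action

@dataclass
class Attribution:
    agent: str   # who caused the failure
    step: int    # when it happened (index into the log)

def attribute_failure(task: str, log: list) -> Attribution:
    """Placeholder attributor: naively blames the last agent to act.

    A real method (heuristic or LLM-based) would analyze the full log."""
    last = log[-1]
    return Attribution(agent=last.agent, step=last.index)

log = [
    Step(0, "planner", "Break the task into subtasks."),
    Step(1, "coder", "def solve(): return 41  # off-by-one bug"),
    Step(2, "verifier", "Output 41 does not match expected 42."),
]
print(attribute_failure("compute the answer", log))
```

Even this toy baseline shows why the task is hard: the last agent to act (here, the verifier) is often not the one at fault.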

The Who&When Benchmark Dataset

The Who&When dataset consists of carefully constructed failure scenarios in multi-agent LLM systems. Each scenario includes the full interaction log, a description of the task, and ground-truth labels specifying the responsible agent and the failure moment. This dataset enables systematic evaluation of attribution techniques. The researchers have made the dataset publicly available on Hugging Face.
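A single scenario might look like the record below. The field names here are assumptions chosen for illustration; consult the dataset card on Hugging Face for the actual schema.

```python
# Illustrative record shape for a Who&When-style failure scenario.
example = {
    "question": "What is the capital of the country hosting the 2024 Olympics?",
    "history": [
        {"role": "orchestrator", "content": "Assign the lookup to the web agent."},
        {"role": "web_agent", "content": "The 2024 Olympics are in Los Angeles."},  # factual error
        {"role": "answer_agent", "content": "The capital is Washington, D.C."},
    ],
    "mistake_agent": "web_agent",  # ground truth: who is responsible
    "mistake_step": 1,             # ground truth: when the failure originated
}

def ground_truth(record):
    """Extract the (agent, step) label used to score an attribution method."""
    return record["mistake_agent"], record["mistake_step"]

print(ground_truth(example))  # -> ('web_agent', 1)
```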

Attribution Methods Developed

The team explored several automated approaches, ranging from simple heuristic rules to more sophisticated LLM-based reasoning.

Their experiments revealed that while simple heuristics offer a starting point, LLM-based reasoning methods significantly improve attribution accuracy. However, the task remains challenging, especially in long, complex workflows with many agents.
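One natural family of attribution strategies is to walk the log in order and flag the first step a judge deems erroneous. The sketch below illustrates this idea under stated assumptions: in a real system the judge would be an LLM prompted with the task and the log so far, whereas here it is mocked with a keyword check so the example is runnable.

```python
from typing import Callable, Optional, Tuple

def step_by_step_attribution(
    log: list,
    is_erroneous: Callable[[dict], bool],
) -> Optional[Tuple[str, int]]:
    """Return (agent, step) for the first step the judge flags, else None."""
    for i, step in enumerate(log):
        if is_erroneous(step):
            return step["agent"], i
    return None

# Mock judge standing in for an LLM call (assumption for illustration).
mock_judge = lambda step: "wrong_api" in step["content"]

log = [
    {"agent": "planner", "content": "Search for the 2023 revenue figure."},
    {"agent": "coder", "content": "calling wrong_api endpoint for revenue"},
    {"agent": "reporter", "content": "Revenue was $0 (request failed)."},
]
print(step_by_step_attribution(log, mock_judge))  # -> ('coder', 1)
```

The accuracy of such a method hinges entirely on the judge: with a fallible LLM judge, early false positives or missed errors propagate directly into the attribution, which is one reason long, many-agent workflows remain difficult.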

Significance and Applications

This research lays the groundwork for automated debugging tools that can dramatically speed up development cycles. By pinpointing the exact agent and timing of a failure, developers can focus on fixing specific components rather than re-reading entire logs.

The work also underscores the need for more transparent and interpretable multi-agent architectures, as attribution becomes easier when agents can explain their actions.

Availability and Future Work

The paper “Which Agent Causes Task Failures and When?” is available on arXiv. The code and dataset are fully open-source. The researchers hope that Who&When will become a standard benchmark, encouraging further innovation in failure attribution and ultimately making LLM multi-agent systems more reliable and developer-friendly.
