Automated Failure Attribution in LLM Multi-Agent Systems: Pinpointing the Responsible Agent


Large Language Model (LLM) multi-agent systems have become a popular paradigm for tackling complex tasks through collaborative effort. Yet even the most dynamic teams of AI agents can fail—and when they do, developers face a daunting question: which agent caused the breakdown, and at what point in the workflow? Sifting through extensive interaction logs to locate the root cause is akin to finding a needle in a haystack—time-consuming and exhausting. This frustration is all too familiar as multi-agent systems grow in complexity, making failures both common and notoriously difficult to diagnose due to autonomous collaboration and long information chains.

Now, researchers from Penn State University and Duke University, in collaboration with Google DeepMind, the University of Washington, Meta, Nanyang Technological University, and Oregon State University, have introduced a novel problem called Automated Failure Attribution. They present the first benchmark dataset for this task—Who&When—and develop several automated attribution methods. Their work highlights the inherent complexity of failure analysis and offers a new path toward more reliable LLM multi-agent systems. The paper has been accepted as a Spotlight presentation at the top-tier machine learning conference ICML 2025.

The Challenge of Diagnosing Failures in Multi-Agent LLM Systems

Multi-agent systems built on LLMs have shown great promise in domains ranging from software engineering to creative writing. However, these systems are fragile. A single agent’s error, a misunderstanding between agents, or a misstep in information relay can cascade into complete task failure.


Why Failures Occur

Failures can stem from multiple sources: an individual agent's reasoning error, a misunderstanding between agents, or a misstep in relaying information along the chain.

These issues are compounded by the autonomous nature of the agents, which makes it hard to trace the exact moment and cause of an error.

Current Debugging Inefficiencies

When a multi-agent system fails, developers typically resort to slow, manual debugging: reading through lengthy interaction logs step by step to locate where things went wrong.

This process is not only tedious but also error-prone, delaying iteration and improvement.

Introducing Automated Failure Attribution

To overcome these bottlenecks, the research team formalized the problem of automated failure attribution: given a record of a failed multi-agent task, automatically identify which agent was responsible and at which step the failure originated. They built the first dedicated benchmark, Who&When, and tested multiple attribution methods.
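The formalization above can be sketched as a simple typed interface. The names and the naive baseline below are illustrative assumptions, not the authors' implementation; the point is only that an attributor maps a failed task's log to a (who, when) pair.

```python
from dataclasses import dataclass

@dataclass
class Step:
    index: int   # position in the interaction log
    agent: str   # name of the agent that acted
    content: str # the agent's message or action

@dataclass
class Attribution:
    agent: str   # who caused the failure
    step: int    # when it happened (index into the log)

def attribute_failure(task: str, log: list) -> Attribution:
    """Placeholder attributor: naively blames the last agent to act.

    A real method (heuristic or LLM-based) would analyze the full log."""
    last = log[-1]
    return Attribution(agent=last.agent, step=last.index)

log = [
    Step(0, "planner", "Break the task into subtasks."),
    Step(1, "coder", "def solve(): return 41  # off-by-one bug"),
    Step(2, "verifier", "Output 41 does not match expected 42."),
]
print(attribute_failure("compute the answer", log))
```

Even this toy baseline shows why the task is hard: the last agent to act (here, the verifier) is often not the one at fault.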

The Who&When Benchmark Dataset

The Who&When dataset consists of carefully constructed failure scenarios in multi-agent LLM systems. Each scenario includes the full interaction log, a description of the task, and ground-truth labels specifying the responsible agent and the failure moment. This dataset enables systematic evaluation of attribution techniques. The researchers have made the dataset publicly available on Hugging Face.
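A single scenario might look like the record below. The field names here are assumptions chosen for illustration; consult the dataset card on Hugging Face for the actual schema.

```python
# Illustrative record shape for a Who&When-style failure scenario.
example = {
    "question": "What is the capital of the country hosting the 2024 Olympics?",
    "history": [
        {"role": "orchestrator", "content": "Assign the lookup to the web agent."},
        {"role": "web_agent", "content": "The 2024 Olympics are in Los Angeles."},  # factual error
        {"role": "answer_agent", "content": "The capital is Washington, D.C."},
    ],
    "mistake_agent": "web_agent",  # ground truth: who is responsible
    "mistake_step": 1,             # ground truth: when the failure originated
}

def ground_truth(record):
    """Extract the (agent, step) label used to score an attribution method."""
    return record["mistake_agent"], record["mistake_step"]

print(ground_truth(example))  # -> ('web_agent', 1)
```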

Attribution Methods Developed

The team explored several automated approaches, ranging from simple heuristic rules to more sophisticated LLM-based reasoning.

Their experiments revealed that while simple heuristics offer a starting point, LLM-based reasoning methods significantly improve attribution accuracy. However, the task remains challenging, especially in long, complex workflows with many agents.
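One natural family of attribution strategies is to walk the log in order and flag the first step a judge deems erroneous. The sketch below illustrates this idea under stated assumptions: in a real system the judge would be an LLM prompted with the task and the log so far, whereas here it is mocked with a keyword check so the example is runnable.

```python
from typing import Callable, Optional, Tuple

def step_by_step_attribution(
    log: list,
    is_erroneous: Callable[[dict], bool],
) -> Optional[Tuple[str, int]]:
    """Return (agent, step) for the first step the judge flags, else None."""
    for i, step in enumerate(log):
        if is_erroneous(step):
            return step["agent"], i
    return None

# Mock judge standing in for an LLM call (assumption for illustration).
mock_judge = lambda step: "wrong_api" in step["content"]

log = [
    {"agent": "planner", "content": "Search for the 2023 revenue figure."},
    {"agent": "coder", "content": "calling wrong_api endpoint for revenue"},
    {"agent": "reporter", "content": "Revenue was $0 (request failed)."},
]
print(step_by_step_attribution(log, mock_judge))  # -> ('coder', 1)
```

The accuracy of such a method hinges entirely on the judge: with a fallible LLM judge, early false positives or missed errors propagate directly into the attribution, which is one reason long, many-agent workflows remain difficult.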

Significance and Applications

This research lays the groundwork for automated debugging tools that can dramatically speed up development cycles. By pinpointing the exact agent and timing of a failure, developers can focus on fixing specific components rather than re-reading entire logs.

The work also underscores the need for more transparent and interpretable multi-agent architectures, as attribution becomes easier when agents can explain their actions.

Availability and Future Work

The paper “Which Agent Causes Task Failures and When?” is available on arXiv. The code and dataset are fully open-source. The researchers hope that Who&When will become a standard benchmark, encouraging further innovation in failure attribution and ultimately making LLM multi-agent systems more reliable and developer-friendly.
