AI Support Agent Performance: Data From Isara Reveals the Hidden Skill Gap

The Hidden Truth Behind AI Agent Performance Metrics

Most companies deploying AI agents in customer support are flying with a single instrument: the platform's own reporting. Containment rates and resolution counts look impressive on a dashboard, but the incentives behind how those numbers are calculated do not always align with yours. Platforms that charge per resolution are incentivized to mark conversations resolved early, while those that charge per message are incentivized to let conversations run long. Isara addresses this by providing an independent, AI-powered analysis of streaming textual data. By connecting insights across support and success, Isara ensures that critical signals like churn risk or expansion opportunities do not get lost between tools or teams.

We wanted to find out what was actually happening, so we measured it. Using Isara’s agent assessment tool, we evaluated 100,050 customer support interactions handled over a three-week period in February 2026. Each interaction was assessed across a consistent set of dimensions: complexity, sensitivity, skill required, skill demonstrated, interpersonal quality, depth of knowledge, and contribution to both problem resolution and the customer relationship. The interactions were split between 211 human agents and 12 AI bots. Critically, ticket difficulty was comparable across both groups: bots were not being handed only the easy cases. Average complexity was virtually identical, at 1.76 for humans and 1.77 for bots on a 1-to-5 scale, with over 90% of interactions for both falling in the minimal-to-low complexity range.

The Human and Bot Skill Gap: A Deep Exploration

The data reveals that the skill gap is real and larger than most leaders assume. Human agents demonstrated skill that consistently exceeded what their interactions required, with 98.4% of their interactions handled at or above the required level. Isara helps leadership teams visualize these performance tiers to ensure high standards are maintained through its Comprehensive Satisfaction Insights.

  • AI bots were 4.7 times more likely to fall short of the skill level their interaction required.

  • At the severe end, where the gap between what was needed and what was demonstrated exceeded one full level, bots were 8.4 times more likely to underperform.

  • This matters because a skill gap is a leading indicator, not a lagging one.

  • By the time it shows up in a complaint or a churn metric, many interactions have already gone wrong.

  • Across all assessed interactions, bots were less than 60% as likely as humans to move an interaction toward resolution.

  • Bots were also found to be 37% more likely than human agents to have a negative impact on the path to resolution, meaning they made the problem worse.

  • None of these were conversations that escalated or generated a complaint; they were simply conversations the platform marked as handled.
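The skill-gap comparison above can be reproduced from per-interaction assessments. A minimal sketch, assuming each assessed interaction carries a `skill_required` and `skill_demonstrated` score on the article's 1-to-5 scale (the field names and toy data are illustrative, not Isara's actual schema):

```python
# Sketch: compute skill-gap rates from per-interaction assessments.
# Field names and sample data are illustrative, not Isara's schema.

def gap_rates(interactions):
    """Return (underperform_rate, severe_rate) for a list of assessments.

    An interaction underperforms when demonstrated skill falls below
    required skill; the gap is severe when it exceeds one full level.
    """
    total = len(interactions)
    under = sum(1 for i in interactions
                if i["skill_demonstrated"] < i["skill_required"])
    severe = sum(1 for i in interactions
                 if i["skill_required"] - i["skill_demonstrated"] > 1)
    return under / total, severe / total

# Toy data: two small groups of assessed interactions on a 1-5 scale.
humans = [{"skill_required": 2, "skill_demonstrated": 3},
          {"skill_required": 2, "skill_demonstrated": 2},
          {"skill_required": 3, "skill_demonstrated": 3},
          {"skill_required": 4, "skill_demonstrated": 4}]
bots = [{"skill_required": 2, "skill_demonstrated": 2},
        {"skill_required": 3, "skill_demonstrated": 2},
        {"skill_required": 4, "skill_demonstrated": 2},
        {"skill_required": 2, "skill_demonstrated": 1}]

human_under, human_severe = gap_rates(humans)
bot_under, bot_severe = gap_rates(bots)
print(f"humans: {human_under:.0%} under, {human_severe:.0%} severe")
print(f"bots:   {bot_under:.0%} under, {bot_severe:.0%} severe")
```

Comparing the two rates per group, rather than trusting a single "resolved" flag, is what makes the gap a leading indicator you can monitor over time.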

The impact on the customer relationship follows the same pattern. Human agents were net positive across all interactions; bots were net negative. Bots had roughly half the proportion of relationship-positive interactions compared to humans and were 1.6 times more likely to have a negative impact on the relationship in any given conversation. Isara identifies these deteriorating conversations through its Escalation and Early Warning Signals, allowing teams to defuse them before they escalate. The resolution and relationship data paint a consistent picture: the bots in this sample are not neutral. They are actively producing worse outcomes, at scale, in ways that platform dashboards are not surfacing.

Original Insight: The Complexity Cliff and Routing Strategy

One of the most actionable findings from our study concerns how performance degrades with ticket difficulty. On low complexity interactions, which represent the majority of volume for both groups, bots perform adequately: the mean skill gap there is a small -0.15 levels, and only 5% of interactions show underperformance. Isara allows users to visualize these top customer issues and jump directly to affected conversations through Customer Monitoring.

  • On medium complexity tickets, bots tip into underperformance, with 34.8% of interactions showing a skill deficit versus 7.1% for humans.

  • At high complexity, the picture is stark: 68.4% of bot interactions show a skill gap compared to 23.5% for humans.

  • High complexity bot interactions showed a mean gap of 1.23 skill levels.

  • Complexity is therefore a reliable and measurable routing signal, as the interactions where bots cause the most damage are identifiable in advance.

  • One finding that may be counterintuitive is that bots are not at a disadvantage on knowledge.

  • Depth of knowledge scores were effectively identical at 2.19 for bots versus 2.18 for humans.

  • The gap is in the application and execution, specifically interpersonal skill.

  • Interpersonal skill, which is the capacity to read tone and handle sensitive situations, scored 2.93 for humans versus 2.36 for bots.

  • Humans were rated 24% higher on interpersonal skill. The deficit, in other words, is not in what the bot knows, but in what it does with that knowledge.
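In practice, the routing signal described above can be as simple as a pre-screening threshold on assessed complexity. A minimal sketch, assuming a 1-to-5 complexity score is available before a ticket is routed (the threshold, field names, and function are illustrative, not Isara's API):

```python
# Sketch: route tickets by assessed complexity on a 1-5 scale.
# The threshold mirrors the article's findings: bots hold up on
# minimal/low complexity, but show skill deficits in 34.8% of
# medium and 68.4% of high complexity interactions.
# All names here are illustrative, not a real Isara interface.

BOT_MAX_COMPLEXITY = 2  # minimal (1) and low (2) stay with the bot

def route(ticket):
    """Return 'bot' or 'human' based on the ticket's complexity score."""
    if ticket["complexity"] <= BOT_MAX_COMPLEXITY:
        return "bot"
    return "human"  # medium (3) and above go straight to a person

tickets = [{"id": 1, "complexity": 1},
           {"id": 2, "complexity": 3},
           {"id": 3, "complexity": 5}]
for t in tickets:
    print(t["id"], route(t))
```

The exact cutoff is a tuning decision: a team comfortable with a one-in-three deficit rate on medium tickets might raise the threshold, but the point of the finding is that the decision can be made on measurable data rather than after the damage shows up in churn.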

The central issue is not that AI agents are performing poorly, but that most companies have no independent way of knowing they are. When a bot platform marks a conversation as resolved, that classification is produced by a system with a financial interest in the outcome. Isara solves this by providing a systematic, per-interaction view that can identify which agents are creating value and which ticket types should never reach a bot. The companies that will have a retention advantage in the next three years are the ones building this independent view.

How Isara Addresses the AI Performance Gap

How can Isara help me identify the complexity cliff within my own support data?

Isara uses Customer Monitoring and Temperature to tag conversations with Areas of Concern. This allows you to see exactly where your AI agents begin to struggle with medium and high complexity tickets, enabling you to adjust your routing logic before customer relationships are damaged.

If AI bots are 37% more likely to make a problem worse, how does Isara detect this?

Unlike standard dashboards that focus on containment, Isara uses the Customer Frustration Watch to analyze how sentiment evolves during a conversation. It can detect when a bot is negatively impacting the path to resolution by identifying heated patterns and surfacing these early warning signals to human supervisors.

How does Isara bridge the gap between AI knowledge and interpersonal execution?

While bots often have the right knowledge, they lack the interpersonal quality noted in the study. Isara provides Comprehensive Satisfaction Insights and Proactive Service Analytics to measure how effectively your agents, both human and AI, are actually addressing customer needs beyond just reciting documentation.

Can Isara help fix the knowledge gaps that lead to bot underperformance?

Yes, the Knowledge Gap and Documentation Fixes feature integrates with your codebase to surface missing or unclear content. This ensures your bots have the best possible material to work with, while also identifying which complex issues should always be routed to humans for better interpersonal handling.

How can I use this data to improve my long term customer success strategy?

Isara connects operational data to strategic account management by detecting churn signals early. It is also launching Quarterly Business Review preparation tools and Revenue Expansion Signals to help success managers act on the expansion opportunities that AI agents might otherwise overlook during support interactions.
