AI agent tool call errors: how OpenClaw recovers malformed tool arguments

Most AI agent tool call errors are not reasoning failures. The model picked the right tool and knew what it wanted to do. The arguments it emitted were just slightly malformed: a curly quote where a straight one belonged, an over-escaped string, a JSON array that almost parsed. The agent threw the whole call away, and the user saw a turn that stalled or looped for no visible reason.

OpenClaw v2026.5.28 ships a fix for one of the nastier versions of this: smart-quoted argument repair for edit arrays, hardened so a recovered call no longer corrupts valid input (#86611). Paired with the runtime’s tool-schema quarantine behavior, it changes how a self-hosted agent handles bad tool output. Instead of failing the turn or, worse, silently rewriting good arguments into broken ones, the runtime repairs what it safely can and isolates what it cannot. This post explains where tool call errors actually come from, what the fix changes, and how to reason about the trade-off between repairing and rejecting a call.

Why AI agent tool call errors are mostly an argument problem

When an agent calls a tool, two separate things have to go right. The model has to choose the correct tool, and it has to produce arguments that match that tool’s schema. The first is a reasoning task. The second is a formatting task, and formatting is where most failures live.

A 2026 PromptLayer write-up on debugging LLM apps puts it bluntly: vague tool descriptions and silent JSON errors cause more production incidents than bad model decisions. A Medium analysis of agent flakiness reaches the same conclusion, calling tool argument rot the leading cause of agents that work in a demo and fall over in production. The pattern shows up across frameworks: n8n’s community forum and r/n8n are full of “tool calling completely broken” threads that trace back to the model generating malformed tool call JSON, not to the model choosing the wrong action.

The reasons are mechanical:

  • Smart quotes. Models trained on prose emit typographic quotes where code needs straight ASCII quotes. A JSON parser rejects the result.
  • Over-escaping. A model that has seen a lot of escaped strings will double-escape, turning a valid \n into a literal backslash-n inside an already-quoted field.
  • Array edge cases. Multi-edit tools take an array of edits. One stray character anywhere in that array can fail the parse for every edit in the batch.
  • Schema drift. A plugin advertises a tool whose schema the runtime cannot fully represent, so any call against it is born invalid.

The common thread: the model’s intent was fine. The bytes were not.

What “repair” and “quarantine” mean

OpenClaw handles bad tool output two ways, and the distinction matters.

Repair means the runtime fixes the bytes and runs the call. This is appropriate when the corruption is unambiguous and reversible. A smart quote that should obviously be a straight quote can be repaired without guessing at intent.

Quarantine means the runtime refuses to expose or run a tool whose schema it cannot trust, and reports that it did so. The v2026.5.27 and v2026.5.28 notes describe the runtime quarantining unsupported tool schemas and reporting quarantined dynamic tools rather than letting a malformed tool definition poison the catalog.

ApproachWhen it appliesRisk it avoids
RepairCorruption is unambiguous (smart quotes, known escape patterns)Throwing away a call the model got right in substance
QuarantineThe tool schema itself is unsupported or malformedRunning a call against a definition the runtime cannot validate
RejectRepair would require guessing at intentCorrupting valid input by “fixing” the wrong thing

The third row is the one the v2026.5.28 fix is really about. Naive repair is dangerous. If you over-eagerly rewrite arguments, you can turn a call the model got right into one that is subtly wrong, and that failure is far harder to debug than an honest parse error.

What the v2026.5.28 fix actually changes

The release note is one sentence: harden smart-quoted argument repair for edit arrays and exact escaped arguments so model-produced tool calls recover without corrupting valid input (#86611).

Read it carefully, because the load-bearing clause is the second half. The repair path already existed. The problem was that under some inputs, repairing a smart-quoted edit array could mangle an argument that was actually fine, especially when a string contained an exact escaped sequence the repair logic treated as a quote to normalize. The fix tightens the repair so it recovers the genuinely broken case without touching the genuinely valid one.

For an edit array specifically, this is high-stakes. A multi-edit call applies a batch of find-and-replace operations to a file. If the repair logic corrupts one edit’s search or replace string while “fixing” quotes, the agent can write the wrong content to disk and report success. That is the worst class of bug: a silent, confident, wrong write. Hardening the repair so valid input survives untouched is what makes the recovery safe to rely on.

This sits alongside a broader v2026.5.28 push to reject malformed values at the boundary rather than late: browser tool timeouts, viewport and tab indices, schema array refs, and channel progress callbacks now reject bad input where it enters instead of letting it propagate (#82887, #87883). The philosophy is consistent: catch corruption early, repair only what is safe, and never let a malformed value travel deeper wearing a valid disguise.

How to reason about tool call recovery in your own setup

You do not configure the repair logic directly, but understanding it changes how you debug and design around tool calls.

  1. Stop blaming the model first. When a tool call fails, check whether the arguments were malformed before you assume the model reasoned badly. The fix is usually at the formatting layer, and prompt-tuning the reasoning will not touch it.
  2. Prefer fewer, simpler tool arguments. Every nested array and deeply escaped string is a surface for corruption. A tool that takes three flat string arguments survives messy model output far better than one that takes a single deeply nested JSON blob.
  3. Watch for quarantine reports. If a dynamic or plugin tool goes missing from the catalog, the runtime may have quarantined it for an unsupported schema. That is a signal to fix the plugin’s tool definition, not a bug in the agent.
  4. Keep the runtime current. Tool-argument recovery is exactly the kind of fix that accrues quietly across releases. Running an old build means eating parse failures that a newer one repairs cleanly.

For self-hosted operators choosing models, this also feeds into cost decisions. A cheaper model that emits slightly messier JSON is more viable when the runtime repairs the easy cases for you. See best cheap models for OpenClaw for how that trade-off plays out, and the LLM idle watchdog post for the adjacent class of failure where a stream stalls instead of malforming.

How tool call recovery fits the rest of the stack

Tool argument repair is one layer of a larger reliability story. The model proposes, the runtime validates and repairs, the tool executes, and the channel delivers the result. A failure at any layer reads to the user as “the agent broke.” OpenClaw’s approach is to harden each boundary independently so a problem at one layer does not silently corrupt the next. If you are evaluating self-hosted agent platforms on reliability, the what is OpenClaw overview covers how the tool, runtime, and channel layers are separated, which is what makes per-boundary hardening like this possible.

The takeaway: AI agent tool call errors are mostly malformed arguments, not bad decisions, and the right response is surgical. Repair what is unambiguous, quarantine what cannot be trusted, reject what would require a guess, and never corrupt valid input in the name of fixing broken input.

FAQ

What causes most AI agent tool call errors? Malformed arguments, not bad reasoning. The model usually picks the right tool but emits arguments with smart quotes, over-escaped strings, or array edge cases that fail JSON parsing. Industry write-ups consistently rank tool argument formatting above model decision quality as the cause of agent flakiness.

What does OpenClaw v2026.5.28 change about tool calls? It hardens smart-quoted argument repair for edit arrays and exact escaped arguments so a recovered tool call no longer corrupts valid input (#86611). The repair existed before; the fix stops it from mangling arguments that were already correct.

What is the difference between repairing and quarantining a tool call? Repair fixes unambiguous corruption (like a curly quote that should be straight) and runs the call. Quarantine refuses to expose a tool whose schema the runtime cannot trust and reports that it did so. Repair acts on bad arguments; quarantine acts on bad tool definitions.

Can argument repair make things worse? It can if done naively. Over-eager repair can rewrite a valid argument into a broken one, producing a silent wrong result that is harder to debug than an honest parse error. The v2026.5.28 fix specifically guards against this by leaving valid input untouched.

How do I reduce tool call errors in my own agent? Prefer tools with fewer, flatter arguments, keep the runtime current so you get the latest repair logic, and check whether failures are malformed arguments before assuming the model reasoned badly. Watch for quarantine reports as a sign a plugin’s tool schema needs fixing.

Sources: