Claude for Chrome Just Dropped Tool-Calls for Browser Control. Here's What Replaced Them.

Technical Dive
Jul 5
by Idan Raman
Beyond the Tool-Call Loop

Anthropic's Claude for Chrome extension quietly shipped something worth studying: a "Quick Mode" that throws out the standard computer-use tool-calling loop entirely. No tool definitions in the request, no tool_use/tool_result JSON round-tripping—just a handful of single-letter commands the model writes as plain text, parsed straight off the wire. Anthropic clocks it at roughly 3x faster than the standard loop. The interesting part isn't the speedup—it's what the change says about where a browser agent's latency actually comes from.

The tool-call tax nobody budgets for

In a normal computer-use loop, every step pays the same tax. The model returns a tool_use block—type, id, name, a nested input dict with an action and coordinates. You execute it, wrap the result in a tool_result block with a screenshot, and resend the whole conversation. None of that scaffolding is the decision; all of it is envelope. Multiply by a few hundred steps in a long-running agent and the JSON wrapping costs more tokens—and more round-trip time—than the clicks themselves.

What Quick Mode sends instead

Quick Mode sends tools: [] and a stop sequence, and lets the model answer in a compact grammar instead of a schema: short lines like a click coordinate, a string to type, a key to press. The extension parses the text directly and executes it over the Chrome DevTools Protocol, then feeds the next screenshot back as a plain message instead of a structured tool result. The difference per action is stark:

# a single click as a standard tool call
{
  "type": "tool_use",
  "id": "toolu_01A3B9F2",
  "name": "computer",
  "input": {"action": "left_click", "coordinate": [412, 208]}
}
# ~90 characters of envelope for one click

# the same click as a compact command
c 412,208
# 8 characters

Building the same loop yourself

You don't need Anthropic's extension to get this—you need raw mouse, keyboard, and screenshot access to a browser, plus a model willing to follow a short grammar you define in the prompt. Anchor's Python SDK hands you a live Playwright Browser for a cloud session, which is all the executor needs:

import os
from anchorbrowser import Anchorbrowser

anchor_client = Anchorbrowser(api_key=os.environ["ANCHOR_API_KEY"])
session = anchor_client.sessions.create()

COMMANDS = {
    "c": lambda page, args: page.mouse.click(*map(float, args.split(","))),
    "t": lambda page, args: page.keyboard.type(args),
    "k": lambda page, args: page.keyboard.press(args),
    "s": lambda page, args: page.mouse.wheel(*map(float, args.split(","))),
}

def run_commands(page, text: str) -> None:
    for line in text.strip().splitlines():
        code, _, args = line.strip().partition(" ")
        if code in COMMANDS:
            COMMANDS[code](page, args)

with anchor_client.browser.connect(session.data.id) as browser:
    page = browser.contexts[0].pages[0]
    page.goto("https://example.com")

    model_output = "c 412,208\nt browser agents\nk Enter"
    run_commands(page, model_output)
    page.screenshot(path="after.png")

The prompt still has to teach the model the grammar, and you still send a screenshot every turn—that overhead doesn't go away. What disappears is the JSON envelope around every single action, for the entire life of the task.

Where this stops paying off

Compact protocols trade the model's native tool-use training—which is heavily tuned toward valid, schema-checked JSON—for a bespoke grammar it only knows from a few lines of your prompt. Free text is more fragile to parse than a validated schema: one malformed line silently no-ops instead of raising a clear error. That's likely why Quick Mode ships behind an effort parameter (high/medium/low/none) rather than replacing tool-calling outright—treat it as a mode for high-frequency, low-ambiguity steps, with real tool-calling as the fallback for anything that needs a schema's guardrails.

Try it on your own sessions

Anchor has written before about why token usage explodes in browser agents; Quick Mode is a concrete example of the fix being protocol design rather than a bigger context window. Every Anchor session already gives you the raw Playwright surface this pattern needs—spin one up, drop your own command grammar in front of the model, and see how much of your token bill was JSON scaffolding all along.

Stay ahead in browser automation

We respect your inbox. Privacy policy

Welcome aboard! Thanks for signing up
Oops! Something went wrong while submitting the form.