Claude 4 + Anchor: Building a Production CUA Agent in Python

Hands On
Jun 7
by Idan Raman
Claude 4 + Anchor: Building a Production CUA Agent in Python

Anthropic's Claude 4 family—Opus 4.8 and Sonnet 4.6—ships with significantly improved computer use capabilities. The model can look at a screenshot, decide what to click, type, or scroll, and keep looping until the task is done. It sounds simple. But running CUA reliably in production requires something most local setups can't provide: a stable, persistent, observable browser you don't have to manage.

That's where Anchor comes in.

Why Local Chrome Isn't Enough

Running computer use against a local Chromium instance works fine in a demo. In production, you hit the same four walls fast:

  • Session state evaporates — cookies, local storage, and open tabs vanish between invocations
  • IP reputation — residential and datacenter IPs behave differently on every site
  • Parallelism — spinning up 50 Chromium processes on one machine is a bad time
  • Observability — when the agent does something wrong, you have no replay

Anchor gives you cloud-hosted browser sessions with persistent profiles, stable IPs, live view, and downloadable recordings. Your code calls an API, gets a WebSocket endpoint, and the agent starts working.

Setup

pip install anthropic playwright requests
playwright install chromium
import os, base64, time
import anthropic
import requests
from playwright.sync_api import sync_playwright

ANCHOR_KEY = os.environ["ANCHOR_API_KEY"]
client = anthropic.Anthropic()

Creating a Session

def create_session(profile_id: str | None = None) -> dict:
    payload = {"os_type": "linux", "screen": {"width": 1280, "height": 800}}
    if profile_id:
        payload["profile_id"] = profile_id  # restores cookies from last run

    resp = requests.post(
        "https://api.anchorbrowser.io/v1/sessions",
        headers={"anchor-api-key": ANCHOR_KEY},
        json=payload,
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()  # contains id, cdp_url, live_view_url

Pass a profile_id and Anchor restores the browser to its exact state from the last session—cookies, local storage, and all. No re-authenticating on every run.

The CUA Action Loop

def run_cua_agent(task: str, session: dict) -> str:
    session_id = session["id"]
    cdp_url = session["cdp_url"]

    def screenshot() -> str:
        r = requests.post(
            f"https://api.anchorbrowser.io/v1/sessions/{session_id}/os-control/screenshot",
            headers={"anchor-api-key": ANCHOR_KEY},
        )
        r.raise_for_status()
        return r.json()["screenshot"]  # base64 PNG

    with sync_playwright() as p:
        browser = p.chromium.connect_over_cdp(cdp_url)
        page = browser.contexts[0].pages[0]
        messages = [{"role": "user", "content": task}]

        while True:
            response = client.messages.create(
                model="claude-opus-4-8",
                max_tokens=4096,
                tools=[{
                    "type": "computer_20250124",
                    "name": "computer",
                    "display_width_px": 1280,
                    "display_height_px": 800,
                }],
                messages=messages,
            )

            if response.stop_reason == "end_turn":
                return next(b.text for b in response.content if hasattr(b, "text"))

            tool_uses = [b for b in response.content if b.type == "tool_use"]
            tool_results = []

            for tu in tool_uses:
                act = tu.input
                action_type = act["action"]

                if action_type == "screenshot":
                    result = [{"type": "image", "source": {
                        "type": "base64", "media_type": "image/png", "data": screenshot(),
                    }}]
                elif action_type == "left_click":
                    x, y = act["coordinate"]
                    page.mouse.click(x, y)
                    result = [{"type": "text", "text": "Clicked."}]
                elif action_type == "type":
                    page.keyboard.type(act["text"])
                    result = [{"type": "text", "text": "Typed."}]
                elif action_type == "key":
                    page.keyboard.press(act["key"])
                    result = [{"type": "text", "text": "Key pressed."}]
                elif action_type == "scroll":
                    x, y = act["coordinate"]
                    page.mouse.wheel(
                        x, y,
                        delta_x=act.get("delta_x", 0),
                        delta_y=act.get("delta_y", 0),
                    )
                    result = [{"type": "text", "text": "Scrolled."}]
                else:
                    result = [{"type": "text", "text": f"{action_type} done."}]

                tool_results.append({
                    "type": "tool_result",
                    "tool_use_id": tu.id,
                    "content": result,
                })

            messages.append({"role": "assistant", "content": response.content})
            messages.append({"role": "user", "content": tool_results})

Retries and Cleanup

def run_with_retry(task: str, profile_id: str | None = None, max_attempts: int = 3) -> str:
    for attempt in range(max_attempts):
        session = create_session(profile_id)
        try:
            return run_cua_agent(task, session)
        except Exception as e:
            if attempt == max_attempts - 1:
                raise
            wait = 2 ** attempt
            print(f"Attempt {attempt + 1} failed: {e}. Retrying in {wait}s…")
            time.sleep(wait)
        finally:
            requests.delete(
                f"https://api.anchorbrowser.io/v1/sessions/{session['id']}",
                headers={"anchor-api-key": ANCHOR_KEY},
            )

The finally block ensures sessions are cleaned up even on failure. Without it, idle sessions accumulate and inflate your bill.

Where to Go Next

This pattern handles single-task, single-agent workflows. A few natural extensions:

  • Session profiles — skip login flows on authenticated sites by reusing saved browser state
  • Anchor VPN — give your agent a consistent network identity for geo-sensitive tasks or rate-limited sites
  • Model selection by cost — use claude-sonnet-4-6 for high-throughput tasks and reserve claude-opus-4-8 for complex multi-step reasoning
  • Webhooks — get notified when long-running tasks complete instead of blocking on the response

You can watch the agent work in real time via the live_view_url returned by create_session. It's useful during development and invaluable when debugging unexpected behavior in production.

Ready to try it? Start with Anchor's free tier and have a working CUA loop running in under 20 minutes.

Stay ahead in browser automation

We respect your inbox. Privacy policy

Welcome aboard! Thanks for signing up
Oops! Something went wrong while submitting the form.