smolagents + Anchor: Building Production Browser Agents in Python

Hands On
Jun 15
by Idan Raman

Hugging Face's smolagents is a minimalist agent library that has quietly become one of the most-starred AI projects on GitHub. Its core idea is simple: agents that think in code. A CodeAgent writes Python snippets to perform actions, which means fewer LLM calls, less prompt engineering, and better composability than JSON-based tool calling, because a single snippet can loop, branch, and chain several tool calls together.

Combine that with Anchor's managed cloud browsers (session isolation, anti-bot resilience, and built-in fingerprint management) and you have a stack that handles real-world browser automation without running any browser infrastructure yourself.

Why smolagents?

  • Code agents reduce steps: writing actions as Python cuts the number of steps (and therefore LLM calls) by roughly 30% compared with JSON tool-calling, according to the benchmarks cited by the smolagents authors
  • Model-agnostic: swap between OpenAI, Anthropic, or any Hugging Face model in a single line
  • Minimal surface area: the core agent logic fits in roughly 1,000 lines, so it is easy to audit and easy to extend
  • Built-in tracing: every generated code step is logged, making debugging and replay straightforward

Setup

pip install smolagents anchor-browser playwright
playwright install chromium
export ANCHOR_API_KEY=your_anchor_key
export OPENAI_API_KEY=your_openai_key
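A missing key usually surfaces only after a cloud browser has already been provisioned. A small fail-fast check at the top of the script avoids that. This is a minimal sketch; the helper names `missing_env_vars` and `require_env` are hypothetical, and only the two variable names come from the export lines above.

```python
import os

# Variable names taken from the export lines in the setup step.
REQUIRED_VARS = ("ANCHOR_API_KEY", "OPENAI_API_KEY")


def missing_env_vars(env=None, required=REQUIRED_VARS):
    """Return the names of required variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name)]


def require_env(env=None, required=REQUIRED_VARS):
    """Raise a clear error before any session is created, instead of failing mid-run."""
    missing = missing_env_vars(env, required)
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
```

Call `require_env()` once before creating the Anchor session; if you later switch to a Hugging Face model, add that provider's token variable to the tuple.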

Creating an Anchor Session

Anchor provisions a fresh browser with its own fingerprint and residential proxy. The session exposes a WebSocket endpoint that Playwright connects to over the Chrome DevTools Protocol (CDP).

import os
from anchor_browser import AnchorClient
from playwright.sync_api import sync_playwright

anchor = AnchorClient(api_key=os.environ["ANCHOR_API_KEY"])
session = anchor.sessions.create(
    proxy_country="us",
    options={"adblock": True}
)

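# Start Playwright's sync driver and attach to the remote browser over CDP
playwright = sync_playwright().start()
browser = playwright.chromium.connect_over_cdp(session.ws_endpoint)

# Reuse the context and tab the remote browser already has open
page = browser.contexts[0].pages[0]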
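Indexing `contexts[0].pages[0]` assumes the remote browser already has a context with an open tab; if it starts empty, that line raises `IndexError`. A defensive variant, sketched below, falls back to creating them. It relies only on Playwright's `Browser.contexts`, `Browser.new_context()`, `BrowserContext.pages`, and `BrowserContext.new_page()`; the function name `get_or_create_page` is hypothetical.

```python
def get_or_create_page(browser):
    """Return an existing page from a CDP-connected browser, creating one if needed.

    `browser` is expected to behave like Playwright's sync Browser: it exposes
    `.contexts` and `.new_context()`, and each context exposes `.pages` and
    `.new_page()`.
    """
    # Reuse the first existing context, or open a new one if there is none.
    context = browser.contexts[0] if browser.contexts else browser.new_context()
    # Reuse the first open tab in that context, or open a fresh one.
    return context.pages[0] if context.pages else context.new_page()
```

Usage: `page = get_or_create_page(playwright.chromium.connect_over_cdp(session.ws_endpoint))`.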

Defining Browser Tools

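Wrap the Playwright page in typed tools that smolagents can call from generated code. smolagents builds each tool's description from its type hints and docstring, and the @tool decorator expects an Args: section describing every parameter: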

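import base64
from smolagents import tool

@tool
def navigate(url: str) -> str:
    """Navigate the browser to a URL and return the page title.

    Args:
        url: Absolute URL to open, e.g. "https://news.ycombinator.com".
    """
    page.goto(url)
    return page.title()

@tool
def get_text(selector: str) -> str:
    """Extract the visible text of every element matching a CSS selector.

    Args:
        selector: CSS selector to match on the current page.
    """
    # all_inner_texts() avoids Playwright's strict-mode error when the
    # selector matches more than one element (inner_text() requires exactly one)
    return "\n".join(page.locator(selector).all_inner_texts())

@tool
def screenshot() -> str:
    """Take a screenshot of the current page and return it as a base64-encoded PNG."""
    # The encoded string is large; avoid printing it into the agent's context
    return base64.b64encode(page.screenshot()).decode()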
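When generated code prints a tool's return value (a common pattern), that text becomes part of the model's context on the next step, so a long page dump can consume much of the token budget. One option is to cap tool output before returning it. This is a hypothetical helper, and the 4,000-character default is an arbitrary assumption, not a smolagents setting.

```python
def truncate_for_llm(text: str, max_chars: int = 4000) -> str:
    """Cap a tool's string output so a single observation cannot flood the context.

    Keeps the first `max_chars` characters and appends a note saying how many
    were dropped, so the model knows the text is incomplete.
    """
    if max_chars < 0:
        raise ValueError("max_chars must be non-negative")
    if len(text) <= max_chars:
        return text
    dropped = len(text) - max_chars
    return text[:max_chars] + f"\n...[truncated {dropped} characters]"
```

Inside `get_text`, wrap whatever it returns in `truncate_for_llm(...)`. Keeping the note about dropped characters lets the agent decide to narrow its selector rather than assume it saw the whole page.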

Building the Agent

from smolagents import CodeAgent, OpenAIServerModel

model = OpenAIServerModel(model_id="gpt-4o-mini")

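agent = CodeAgent(
    tools=[navigate, get_text, screenshot],
    model=model,
    max_steps=10,
    # json is not in smolagents' default allowed imports; the task below asks for JSON output
    additional_authorized_imports=["json"],
)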

Running a Real Task

result = agent.run(
    "Go to news.ycombinator.com, find the top 5 posts about AI agents, "
    "and return their titles and point counts as a JSON list."
)
print(result)
# Illustrative output (live titles and point counts will differ):
# [
#   {"title": "Show HN: smolagents 2.0 ...", "points": 412},
#   {"title": "Agent frameworks compared ...", "points": 298},
#   ...
# ]
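`agent.run` returns whatever value the generated code passed to its final answer, so depending on how the model finishes, `result` may be a Python list or a JSON string. Validating it before downstream use is cheap insurance. A minimal sketch; `parse_posts` is a hypothetical helper, and the `title`/`points` keys simply mirror the task prompt above.

```python
import json
from typing import Any


def parse_posts(result: Any) -> list[dict]:
    """Normalize an agent result into a list of {"title": str, "points": int} dicts.

    Accepts either a Python list or a JSON string; any entry that does not
    match the expected shape raises ValueError (json.JSONDecodeError is a
    subclass of ValueError, so malformed JSON is reported the same way).
    """
    data = json.loads(result) if isinstance(result, str) else result
    if not isinstance(data, list):
        raise ValueError(f"expected a list, got {type(data).__name__}")
    posts = []
    for item in data:
        if not isinstance(item, dict) or "title" not in item or "points" not in item:
            raise ValueError(f"malformed entry: {item!r}")
        posts.append({"title": str(item["title"]), "points": int(item["points"])})
    return posts
```

If validation fails, you can re-run the task or feed the error back to the agent instead of passing bad data along.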

Because the agent writes Python rather than following predefined steps, it can inspect the page structure at runtime and adjust its selectors when the DOM changes, instead of depending on hard-coded XPath selectors or recorded click coordinates.

Swapping the Model

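Switching to an open-weight model takes one line. The example below calls Qwen2.5-72B-Instruct through Hugging Face's hosted inference, which needs a Hugging Face token (for example in the HF_TOKEN environment variable). For fully local or air-gapped workloads, smolagents' TransformersModel runs the model on your own hardware instead.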

# This class was named HfApiModel in older smolagents releases
from smolagents import InferenceClientModel

model = InferenceClientModel(model_id="Qwen/Qwen2.5-72B-Instruct")
agent = CodeAgent(tools=[navigate, get_text, screenshot], model=model)

Production Tips

  • Reuse sessions: create the Anchor session once and share the page object across tools—don't open a new browser per step
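  • Structured output: have tools return JSON strings (or plain Python values) and let the agent parse them in generated code with json.loads; add "json" to additional_authorized_imports so that import is allowed
  • Tracing: smolagents prints every generated code step to the console and keeps each run's steps in agent.memory.steps; forward them to your observability stack (smolagents also supports OpenTelemetry instrumentation) for replay and cost accounting
  • Session cleanup: in a finally block, close the Playwright browser connection and driver, then call session.close() so the cloud browser is released promptly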
  • Parallelism: spawn N Anchor sessions simultaneously to run independent agent tasks concurrently—see our parallel agents guide for patterns
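The cleanup tip extends to everything the setup code acquires: the Anchor session, the Playwright driver, and the CDP browser connection. The sketch below uses `contextlib.ExitStack` so each resource is released in reverse order, even when a later step fails. `browser_session`, `create_session`, and `connect` are hypothetical names: you would supply callables wrapping, for example, `anchor.sessions.create(...)` and `sync_playwright().start()` plus `connect_over_cdp`. Only `session.close()` comes from the article; `browser.close()` and `playwright.stop()` are standard Playwright calls.

```python
from contextlib import ExitStack, contextmanager


@contextmanager
def browser_session(create_session, connect):
    """Yield (session, browser) and guarantee cleanup in reverse acquisition order.

    create_session() -> object with .close()            (e.g. the Anchor session)
    connect(session) -> (playwright, browser)           (started driver + Browser)
    """
    with ExitStack() as stack:
        session = create_session()
        stack.callback(session.close)        # runs last
        playwright, browser = connect(session)
        stack.callback(playwright.stop)      # runs second
        stack.callback(browser.close)        # runs first
        yield session, browser
```

If `connect` raises, the stack still closes the session that was already created, which is the case a hand-written `finally` most often misses.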

What's Next

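The smolagents CodeAgent is a natural fit for multi-session parallelism: each agent gets its own isolated Anchor browser, and results are aggregated in the orchestrating process. Because the tools above all share one module-level page, each parallel agent needs its own tool set bound to its own page, and its own Playwright instance, since Playwright's API is not thread-safe. Start with a single session to validate your tools, then scale out.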
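The fan-out and aggregation step can be sketched with the standard library alone. `run_in_parallel` and `run_task` are hypothetical: `run_task` stands in for a worker that creates its own Anchor session, Playwright instance, tools, and CodeAgent, runs one task, and cleans up. Failures are captured per task so one bad session does not abort the batch.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable


def run_in_parallel(tasks: list[str], run_task: Callable[[str], object],
                    max_workers: int = 4) -> list:
    """Run independent agent tasks concurrently; return results in input order.

    Each slot holds either the task's result or the exception it raised.
    """
    results: list = [None] * len(tasks)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Map each future back to the index of the task that produced it.
        futures = {pool.submit(run_task, task): i for i, task in enumerate(tasks)}
        for future in as_completed(futures):
            i = futures[future]
            try:
                results[i] = future.result()
            except Exception as exc:  # record the failure and keep going
                results[i] = exc
    return results
```

Threads fit here because each worker spends most of its time waiting on the LLM and the remote browser; keep `max_workers` within the number of concurrent sessions your Anchor plan allows.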

Ready to build? Get your Anchor API key
