We've covered building vision agents with Claude 4 and Gemini 2.5. Today we complete the trifecta: the OpenAI Agents SDK paired with Anchor Browser for production-grade web automation.
The Agents SDK ships with native tool support, built-in tracing, and a clean handoff model for multi-agent workflows. Anchor supplies the browser infrastructure—persistent sessions, residential proxies, and stable CDP endpoints. The two fit together well.
Why the Agents SDK?
Three things make it compelling for browser work:
- Tool definitions are first-class. Browser actions are typed Python functions; the model decides when and how to call them.
- Built-in tracing. Every tool call, model turn, and handoff is automatically recorded—critical for debugging flaky automation.
- Guardrails API. Input and output validators prevent the agent from taking destructive actions (form submissions, purchases) without an explicit confirmation step.
Setup
pip install openai openai-agents playwright python-dotenv
playwright install chromium
ANCHOR_API_KEY=your_anchor_key
OPENAI_API_KEY=your_openai_key
Defining Browser Tools
The Agents SDK discovers tools from decorated async functions. Anchor's CDP endpoint is the browser handle—one line to connect.
import os, asyncio, base64
from playwright.async_api import async_playwright
from agents import Agent, Runner, function_tool
CDP_URL = f"wss://connect.anchorbrowser.io?apiKey={os.environ['ANCHOR_API_KEY']}"
_page = None # shared across tool calls within a run
async def get_page():
global _page
if _page is None:
pw = await async_playwright().start()
browser = await pw.chromium.connect_over_cdp(CDP_URL)
ctx = browser.contexts[0]
_page = ctx.pages[0] if ctx.pages else await ctx.new_page()
return _page
@function_tool
async def navigate(url: str) -> str:
"""Navigate to a URL and return the page title."""
page = await get_page()
await page.goto(url, wait_until="domcontentloaded")
return await page.title()
@function_tool
async def screenshot() -> str:
"""Take a screenshot and return it as a base64-encoded PNG string."""
page = await get_page()
data = await page.screenshot(type="png")
return base64.b64encode(data).decode()
@function_tool
async def click(selector: str) -> str:
"""Click the first element matching the CSS selector."""
page = await get_page()
await page.click(selector)
return f"Clicked {selector}"
@function_tool
async def type_text(selector: str, text: str) -> str:
"""Type text into an input field matching the CSS selector."""
page = await get_page()
await page.fill(selector, text)
return f"Typed into {selector}"
@function_tool
async def get_text(selector: str) -> str:
"""Return the visible inner text of an element matching the CSS selector."""
page = await get_page()
return await page.inner_text(selector)
Building the Agent
browser_agent = Agent(
name="WebAgent",
instructions="""You are a reliable browser automation agent.
Always take a screenshot before interacting with a new page.
Confirm the current URL matches expectations before acting.
If an element is not found, describe what you see and ask for guidance.""",
tools=[navigate, screenshot, click, type_text, get_text],
model="gpt-4o",
)
Running a Task
async def run_task(task: str):
result = await Runner.run(browser_agent, input=task)
print(result.final_output)
asyncio.run(run_task(
"Go to news.ycombinator.com and return the titles of the top 5 stories."
))
Adding Guardrails
Stop the agent from submitting any form without human confirmation:
from agents import input_guardrail, GuardrailFunctionOutput
@input_guardrail
async def no_form_submit(ctx, agent, input_data):
risky = any(w in str(input_data).lower() for w in ("submit", "checkout", "purchase"))
return GuardrailFunctionOutput(
output_info={"flagged": risky},
tripwire_triggered=risky,
)
safe_agent = Agent(
name="WebAgent",
instructions="...",
tools=[navigate, screenshot, click, type_text, get_text],
input_guardrails=[no_form_submit],
model="gpt-4o",
)
Multi-Agent Handoffs
The Agents SDK's handoff model lets you split scraping, analysis, and reporting into dedicated agents that pass context cleanly—so you're not paying for expensive browser-capable models on tasks that don't need them.
from agents import handoff
analyst_agent = Agent(
name="Analyst",
instructions="Analyse the data gathered by the web agent and summarise key findings.",
model="gpt-4o-mini", # cheaper model for non-browser work
)
browser_agent = Agent(
name="WebAgent",
instructions="Gather data from the web. When done, hand off to the Analyst.",
tools=[navigate, screenshot, get_text],
handoffs=[handoff(analyst_agent)],
model="gpt-4o",
)
Production Tips
- Warm up sessions. Connect to Anchor and load the start page before your task loop to avoid cold-start latency on the first action.
- Use structured output. Set
output_typeto a Pydantic model so downstream code gets typed results rather than free text to parse. - Enable tracing in CI. Set
OPENAI_AGENTS_TRACE=1to dump full traces to stdout—invaluable in headless environments where you can't watch the browser. - Scope sessions to tasks. Create a fresh Anchor session per agent run in production; session isolation prevents state leaking between concurrent jobs.
Anchor handles session lifecycle, proxy routing, and browser fingerprinting. The Agents SDK handles model orchestration, tool dispatch, and tracing. Together they cover the full stack for production browser automation—without glue code.
Try Anchor free and run your first OpenAI Agents SDK workflow in minutes → anchorbrowser.io



