AG2 + Anchor: Multi-Agent Browser Research in Python

AG2 (the open-source continuation of Microsoft AutoGen) is a popular framework for orchestrating multi-agent workflows in which specialized agents collaborate, debate, and delegate work. Pairing AG2's conversation-driven model with Anchor's cloud browser infrastructure gives you a research team that can plan, browse the live web, and synthesize findings, all in Python and with no local Chromium setup.

Why AG2?

Many agent frameworks treat the LLM as a single actor. AG2 structures work as a conversation between agents: here, a Planner that decomposes goals into browsing steps and a Browser Operator that executes them. This mirrors how human research teams often work: strategy and execution stay in separate lanes.

AG2's ConversableAgent model also lets you run different models per role. The Planner can use GPT-4o for strategic reasoning; the Browser Operator can use a faster model for mechanical actions, cutting costs on high-volume workflows.
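
One way to express per-role models is a small config helper. The sketch below assumes the flat model/api_key config shape used later in this article; the helper name llm_config_for and the ROLE_MODELS mapping are illustrative, not part of AG2.

```python
import os

# Hypothetical role-to-model mapping; swap in whichever models fit your budget.
ROLE_MODELS = {
    "Planner": "gpt-4o",               # strategic reasoning
    "BrowserOperator": "gpt-4o-mini",  # high-volume mechanical actions
}


def llm_config_for(role: str, default_model: str = "gpt-4o") -> dict:
    """Build an llm_config dict for a role, falling back to default_model."""
    return {
        "model": ROLE_MODELS.get(role, default_model),
        "api_key": os.environ.get("OPENAI_API_KEY", ""),
    }


planner_config = llm_config_for("Planner")
operator_config = llm_config_for("BrowserOperator")
```

Each dict can then be passed as llm_config when the corresponding agent is constructed.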

Setup

pip install ag2 anchorpy openai

export OPENAI_API_KEY="sk-..."
export ANCHOR_API_KEY="your-anchor-api-key"
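
A missing key usually surfaces as an error deep inside the first agent turn, so it can help to check both variables at startup. This is a minimal sketch; the helper name missing_env_vars is ours.

```python
import os

REQUIRED_VARS = ("OPENAI_API_KEY", "ANCHOR_API_KEY")


def missing_env_vars(names=REQUIRED_VARS, env=None) -> list:
    """Return the names that are unset or empty in the given environment."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name)]


missing = missing_env_vars()
if missing:
    print("Missing environment variables: " + ", ".join(missing))
```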

Defining Browser Tools

AG2 agents call registered Python functions as tools. We define three browser primitives—navigate, extract, and screenshot—backed by a live Anchor session:

import os
import asyncio
import anchorpy
from autogen import ConversableAgent, register_function  # the ag2 package is imported as autogen

anchor = anchorpy.AnchorClient(api_key=os.environ["ANCHOR_API_KEY"])
_session = None

async def get_session():
    global _session
    if _session is None:
        _session = await anchor.sessions.create()
    return _session

async def browser_navigate(url: str) -> str:
    """Navigate the browser to a URL and return the page title."""
    session = await get_session()
    await session.goto(url)
    title = await session.title()
    return f"Navigated to {url}. Page title: {title}"

async def browser_extract(selector: str) -> str:
    """Extract visible text from a CSS selector on the current page."""
    session = await get_session()
    text = await session.inner_text(selector)
    return text[:4000]  # truncate to 4,000 characters to limit prompt size

async def browser_screenshot() -> str:
    """Take a screenshot of the current page."""
    session = await get_session()
    await session.screenshot(path="/tmp/screen.png")
    return "Screenshot saved to /tmp/screen.png"
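
Because the tools rely on only a handful of async session methods, their logic can be exercised offline before a live browser is involved. The following is a minimal sketch under that assumption: FakeSession is a hypothetical in-memory stand-in (not part of Anchor's SDK), and navigate and extract mirror browser_navigate and browser_extract with the session passed in explicitly.

```python
import asyncio


class FakeSession:
    """Hypothetical in-memory stand-in for a browser session (not an Anchor class)."""

    def __init__(self, pages: dict):
        self.pages = pages  # url -> {"title": str, "text": {selector: str}}
        self.current = None

    async def goto(self, url: str) -> None:
        if url not in self.pages:
            raise ValueError(f"unknown url: {url}")
        self.current = url

    async def title(self) -> str:
        return self.pages[self.current]["title"]

    async def inner_text(self, selector: str) -> str:
        return self.pages[self.current]["text"].get(selector, "")


async def navigate(session, url: str) -> str:
    """Same logic as browser_navigate, with the session passed in explicitly."""
    await session.goto(url)
    return f"Navigated to {url}. Page title: {await session.title()}"


async def extract(session, selector: str, limit: int = 4000) -> str:
    """Same logic as browser_extract, truncating to `limit` characters."""
    return (await session.inner_text(selector))[:limit]


async def demo() -> tuple:
    session = FakeSession({
        "https://example.com": {"title": "Example", "text": {"h1": "Hello " * 1000}},
    })
    nav = await navigate(session, "https://example.com")
    body = await extract(session, "h1")
    return nav, body


nav_result, body_result = asyncio.run(demo())
```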

Building the Research Team

Two agents with distinct system prompts define the division of labor:

llm_config = {"model": "gpt-4o", "api_key": os.environ["OPENAI_API_KEY"]}

planner = ConversableAgent(
    name="Planner",
    system_message=(
        "You are a research strategist. Break the user's goal into numbered "
        "browsing steps. After the BrowserOperator finishes each step, "
        "synthesize the findings and issue the next instruction. "
        "Reply TERMINATE when research is complete."
    ),
    llm_config=llm_config,
    human_input_mode="NEVER",
)

browser_operator = ConversableAgent(
    name="BrowserOperator",
    system_message=(
        "You are a web research agent with access to a live Chrome browser. "
        "Execute the Planner's instructions using your registered tools and "
        "report findings concisely after each action."
    ),
    llm_config=llm_config,
    human_input_mode="NEVER",
)

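# The BrowserOperator's LLM proposes tool calls; the agent on the other side
# of the chat (here, the Planner) executes them and returns the result.
for fn in (browser_navigate, browser_extract, browser_screenshot):
    register_function(
        fn,
        caller=browser_operator,
        executor=planner,
        description=fn.__doc__ or fn.__name__,
    )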

Running a Research Session

result = planner.initiate_chat(
    recipient=browser_operator,
    message=(
        "Research the top three cloud browser platforms for AI agents in 2026. "
        "For each: find the pricing page, list key features, and identify the "
        "target user. Compile a comparison table."
    ),
    max_turns=20,
)

print(result.summary)

The Planner decomposes the goal into steps, the Browser Operator fetches each page and reports structured findings, and the Planner synthesizes results until it emits TERMINATE. A 20-turn cap prevents runaway loops on ambiguous tasks.
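
Termination depends on how the receiving agent recognizes the TERMINATE marker. ConversableAgent accepts an is_termination_msg callable; the sketch below is one possible predicate (the name is_research_done is ours) that also matches when the marker trails a longer reply, such as a comparison table followed by TERMINATE.

```python
def is_research_done(message: dict) -> bool:
    """Return True when the message content ends with the TERMINATE marker."""
    content = message.get("content") or ""
    if not isinstance(content, str):
        return False  # e.g. multimodal content lists; treat as not finished
    # Ignore trailing whitespace and a trailing period after the marker.
    return content.rstrip().rstrip(".").endswith("TERMINATE")
```

To use it, pass is_termination_msg=is_research_done when constructing the agent that receives the Planner's messages, which is the BrowserOperator in the two-agent chat.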

Scaling to Three Agents with GroupChat

When your workflow needs more than two roles—say, a Planner, a Browser Operator, and a Report Writer that formats the final output—AG2's GroupChat routes messages automatically:

from autogen import GroupChat, GroupChatManager

report_writer = ConversableAgent(
    name="ReportWriter",
    system_message=(
        "You receive research findings and format them into a Markdown report "
        "with an executive summary and a comparison table. Do not browse the web."
    ),
    llm_config=llm_config,
    human_input_mode="NEVER",
)

groupchat = GroupChat(
    agents=[planner, browser_operator, report_writer],
    messages=[],
    max_round=30,
)
manager = GroupChatManager(groupchat=groupchat, llm_config=llm_config)

planner.initiate_chat(
    manager,
    message="Research developer-facing AI agent platforms and produce a report.",
)
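
By default the manager asks the LLM to pick the next speaker. If routing should be more predictable, GroupChat can be constrained with an allow-list of speaker transitions (the allowed_or_disallowed_speaker_transitions and speaker_transitions_type parameters in the AG2 versions we are aware of; check your installed version). The sketch below models such an allow-list with plain agent names so it runs without AG2. The hand-off graph itself is an assumption about how this team should pass work around.

```python
# Hypothetical hand-off graph, keyed by agent name.
ALLOWED_TRANSITIONS = {
    "Planner": ["BrowserOperator", "ReportWriter"],
    "BrowserOperator": ["Planner"],
    "ReportWriter": ["Planner"],
}


def can_hand_off(current: str, candidate: str) -> bool:
    """Return True if `candidate` may speak directly after `current`."""
    return candidate in ALLOWED_TRANSITIONS.get(current, [])


def unreachable_agents(start: str) -> set:
    """Return agents that can never speak when the chat starts at `start`."""
    seen, frontier = {start}, [start]
    while frontier:
        for nxt in ALLOWED_TRANSITIONS.get(frontier.pop(), []):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return set(ALLOWED_TRANSITIONS) - seen
```

When handing a graph like this to GroupChat, the keys and values would be the agent objects rather than their names. Running unreachable_agents first is a cheap way to catch a role that was accidentally cut out of the conversation.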

Production Tips

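Isolate sessions per conversation. The module-level _session is shared by every caller, so resetting it to None before each initiate_chat call only helps when conversations run one at a time. For parallel research threads, give each conversation its own session (for example, keyed by a conversation ID) so they don't share browser state.
Add a Critic agent. An additional ConversableAgent with a fact-checking prompt can review the Planner's synthesis before output, which can reduce hallucinated citations.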
Mix models by role. Run the Planner on gpt-4o and the Browser Operator on gpt-4o-mini to cut costs without sacrificing planning quality.
Use Anchor's session replay. Every session is recorded by default—replay recordings to debug agent navigation decisions without re-running the full conversation.
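
For the session-isolation tip, one option is a small registry that hands out one session per conversation ID. This is a sketch, not Anchor's API: SessionRegistry and the conversation IDs are hypothetical, and create_session stands for any async factory you supply (for instance, a wrapper around the session creation call used earlier).

```python
import asyncio


class SessionRegistry:
    """Keeps one browser session per conversation ID."""

    def __init__(self, create_session):
        self._create_session = create_session  # async factory supplied by the caller
        self._sessions = {}
        self._lock = asyncio.Lock()

    async def get(self, conversation_id: str):
        # The lock prevents two concurrent callers from creating duplicate sessions.
        async with self._lock:
            if conversation_id not in self._sessions:
                self._sessions[conversation_id] = await self._create_session()
            return self._sessions[conversation_id]

    def release(self, conversation_id: str):
        """Forget a finished conversation and return its session, if any."""
        return self._sessions.pop(conversation_id, None)


async def demo() -> tuple:
    counter = 0

    async def fake_create():
        # Stand-in factory so the example runs without a browser.
        nonlocal counter
        counter += 1
        return f"session-{counter}"

    registry = SessionRegistry(fake_create)  # created inside the running event loop
    return await asyncio.gather(
        registry.get("thread-a"),
        registry.get("thread-b"),
        registry.get("thread-a"),
    )


results = asyncio.run(demo())
```

In the demo, both requests for thread-a receive the same session while thread-b gets its own, which is the isolation property the tip is after.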
