Vercel AI SDK + Anchor: Browser Agents in TypeScript

Hands On
Jun 13
by Idan Raman

Most browser agent guides default to Python. The Vercel AI SDK changes that — it brings a provider-agnostic TypeScript interface to LLM tool calling, streaming, and agentic loops that works equally well in a Node.js script or a Next.js API route.

Pair it with Anchor's cloud browser infrastructure and you get a fully typed TypeScript agent that navigates the web, extracts structured data, and handles multi-step tasks without managing headless browser infrastructure yourself.

This guide builds a TypeScript browser agent from scratch using AI SDK v6. By the end you'll have a production-oriented pattern you can drop into any TypeScript project.

Why the Vercel AI SDK?

Three properties make it the right choice for browser agents over raw API calls:

  • Provider-agnostic tool definitions. Define your browser tools once and switch between Claude, GPT-4o, and Gemini by changing the provider import and the model ID, with no changes to the tools themselves.
  • Built-in agentic loops. generateText with stopWhen: stepCountIs(n) handles multi-step tool use automatically. No while loops, no manual step tracking.
  • First-class streaming. streamText surfaces token output, tool invocations, and step completions as an async iterator — ready to pipe into a Next.js route handler.
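
To make the second point concrete, here is roughly the loop you would otherwise write by hand. It is a dependency-free sketch, not the SDK's implementation: ModelTurn, Model, ToolSet, and runLoop are hypothetical names, and the model is stubbed so the control flow stays visible.

```typescript
// Conceptual sketch of a bounded tool loop. All names are hypothetical;
// the AI SDK's real internals differ.
type ToolCall = { toolName: string; input: unknown };
type ModelTurn =
  | { kind: 'tool-call'; call: ToolCall }
  | { kind: 'text'; text: string };

type Model = (history: string[]) => Promise<ModelTurn>;
type ToolSet = Record<string, (input: unknown) => Promise<string>>;

async function runLoop(
  model: Model,
  tools: ToolSet,
  prompt: string,
  maxSteps: number,
): Promise<{ text: string; steps: number }> {
  const history: string[] = [prompt];
  for (let step = 1; step <= maxSteps; step++) {
    const turn = await model(history);
    // The model answered in prose: the loop ends before the ceiling.
    if (turn.kind === 'text') return { text: turn.text, steps: step };
    const run = tools[turn.call.toolName];
    const result = run
      ? await run(turn.call.input)
      : `Unknown tool: ${turn.call.toolName}`;
    // Feed the tool result back so the next turn can use it.
    history.push(`${turn.call.toolName} -> ${result}`);
  }
  // Ceiling reached without a final answer.
  return { text: '', steps: maxSteps };
}

// Stub model: requests `navigate` once, then answers.
const stubModel: Model = async (history) =>
  history.length === 1
    ? { kind: 'tool-call', call: { toolName: 'navigate', input: { url: 'https://example.com' } } }
    : { kind: 'text', text: `Done after seeing: ${history[1]}` };

const stubTools: ToolSet = {
  navigate: async () => 'Example Domain',
};
```

Calling runLoop(stubModel, stubTools, prompt, 5) finishes in two steps, one tool call and one answer. This is the bookkeeping that stopWhen: stepCountIs(n) takes off your hands.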

Setup

npm install ai @ai-sdk/anthropic anchorbrowser playwright zod
npx playwright install chromium
export ANCHOR_API_KEY="ak-..."
export ANTHROPIC_API_KEY="sk-ant-..."

Defining Browser Tools

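The tool() helper takes a Zod schema for inputs and an async execute function. Each call creates a fresh, isolated Anchor session, so there is no shared state and no cookie leakage between tool invocations. The trade-off is that nothing carries over either: clickTool has to load the page again before it can click.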

// tools.ts
import Anchorbrowser from 'anchorbrowser';
import { chromium } from 'playwright';
import { tool } from 'ai';
import { z } from 'zod';

const anchor = new Anchorbrowser({ apiKey: process.env.ANCHOR_API_KEY! });

export const navigateTool = tool({
  description: 'Navigate to a URL and return the visible page text.',
  inputSchema: z.object({
    url: z.string().url().describe('The page URL to visit'),
  }),
  execute: async ({ url }) => {
    const session = await anchor.sessions.create({
      session: { maxDuration: 120, idleTimeout: 30 },
    });
    try {
      const browser = await chromium.connectOverCDP(session.data.cdpUrl);
      const page = browser.contexts()[0].pages()[0];
      await page.goto(url, { waitUntil: 'domcontentloaded' });
      const text = await page.innerText('body');
      await browser.close();
      return { url, content: text.slice(0, 8000) };
    } finally {
      await anchor.sessions.terminate(session.id);
    }
  },
});

export const screenshotTool = tool({
  description: 'Take a full-page screenshot and return it as a base64 PNG.',
  inputSchema: z.object({
    url: z.string().url(),
  }),
  execute: async ({ url }) => {
    const session = await anchor.sessions.create({
      session: { maxDuration: 60, idleTimeout: 20 },
    });
    try {
      const browser = await chromium.connectOverCDP(session.data.cdpUrl);
      const page = browser.contexts()[0].pages()[0];
      await page.goto(url, { waitUntil: 'networkidle' });
      const png = await page.screenshot({ fullPage: true, type: 'png' });
      await browser.close();
      return { screenshot: png.toString('base64') };
    } finally {
      await anchor.sessions.terminate(session.id);
    }
  },
});

export const clickTool = tool({
  description: 'Click an element on a page by CSS selector and return the resulting page text.',
  inputSchema: z.object({
    url:      z.string().url(),
    selector: z.string().describe('CSS selector of the element to click'),
  }),
  execute: async ({ url, selector }) => {
    const session = await anchor.sessions.create({
      session: { maxDuration: 60, idleTimeout: 20 },
    });
    try {
      const browser = await chromium.connectOverCDP(session.data.cdpUrl);
      const page = browser.contexts()[0].pages()[0];
      await page.goto(url, { waitUntil: 'domcontentloaded' });
      await page.click(selector);
      await page.waitForLoadState('networkidle');
      const text = await page.innerText('body');
      await browser.close();
      return { clicked: selector, content: text.slice(0, 8000) };
    } finally {
      await anchor.sessions.terminate(session.id);
    }
  },
});
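
The three tools above repeat the same create / try / finally lifecycle. One way to factor that out is sketched below. Session creation and teardown are passed in as plain functions, so the helper assumes nothing about Anchor's SDK; withSession, SessionLifecycle, and truncate are hypothetical names introduced for illustration.

```typescript
// Hypothetical helper: runs `work` against a session and always tears it down.
interface SessionLifecycle<S> {
  create: () => Promise<S>;
  terminate: (session: S) => Promise<void>;
}

async function withSession<S, T>(
  lifecycle: SessionLifecycle<S>,
  work: (session: S) => Promise<T>,
): Promise<T> {
  const session = await lifecycle.create();
  try {
    return await work(session);
  } finally {
    // Runs on success and on error, so a failed navigation cannot leak a session.
    await lifecycle.terminate(session);
  }
}

// Cap tool output so one large page cannot flood the model's context window.
function truncate(text: string, maxChars = 8000): string {
  return text.length <= maxChars ? text : text.slice(0, maxChars);
}
```

In tools.ts, create would wrap the anchor.sessions.create(...) call and terminate the anchor.sessions.terminate(...) call shown above, leaving each execute body with only its Playwright steps.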

Running a Browser Agent

generateText with stopWhen: stepCountIs(n) lets the model plan and execute tool calls until it has enough information to write a final response. The agent decides when it's done; you set the ceiling. If the ceiling is reached first, the run stops there and text may be empty or incomplete.

// agent.ts
import { generateText, stepCountIs } from 'ai';
import { anthropic } from '@ai-sdk/anthropic';
import { navigateTool, screenshotTool } from './tools';

const { text, steps } = await generateText({
  model: anthropic('claude-sonnet-4-6'),
  tools: {
    navigate:   navigateTool,
    screenshot: screenshotTool,
  },
  stopWhen: stepCountIs(10),
  prompt: `
    Research the pricing and main differentiators of three cloud browser
    automation APIs. Return a markdown table with columns: Provider,
    Starting Price, Key Differentiator, Best For.
  `,
});

console.log(text);
console.log(`Completed in ${steps.length} steps`);
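
The step count alone says little about what the agent did. The sketch below turns steps into a readable trace. It assumes each step exposes a toolCalls array whose entries carry toolName and input, which matches recent SDK versions as far as I know; StepLike and summarizeSteps are hypothetical names, so check the shape against your installed version.

```typescript
// Minimal structural types; assumed to be a subset of what generateText returns.
type ToolCallLike = { toolName: string; input: unknown };
type StepLike = { toolCalls: ToolCallLike[] };

// Produces one line per step, e.g. `step 1: navigate({"url":"https://example.com"})`.
function summarizeSteps(steps: StepLike[]): string[] {
  return steps.map((step, i) => {
    const calls = step.toolCalls.map(
      (call) => `${call.toolName}(${JSON.stringify(call.input)})`,
    );
    const detail = calls.length > 0 ? calls.join(', ') : '(no tool calls)';
    return `step ${i + 1}: ${detail}`;
  });
}
```

Logging summarizeSteps(steps).join('\n') after a run makes it easier to spot an agent that keeps revisiting the same URL until it hits the step ceiling.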

Streaming with Next.js

For real-time output in a Next.js route handler, swap generateText for streamText and return the response using toUIMessageStreamResponse(), which replaced toDataStreamResponse() as of AI SDK v5.

// app/api/browse/route.ts
import { streamText, stepCountIs } from 'ai';
import { anthropic } from '@ai-sdk/anthropic';
import { navigateTool } from '@/lib/tools';

export async function POST(req: Request) {
  const { prompt } = await req.json();

  const result = streamText({
    model: anthropic('claude-sonnet-4-6'),
    tools: { navigate: navigateTool },
    stopWhen: stepCountIs(8),
    prompt,
  });

  // Stream text, tool calls, and step boundaries to the client as UI message parts
  return result.toUIMessageStreamResponse();
}
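
The handler above trusts whatever JSON arrives. Since the prompt drives real browser sessions, it may be worth validating the body first. The sketch below is one way to do that; parseBrowseBody, ParsedBody, and the 4000-character limit are hypothetical choices made for illustration.

```typescript
// Hypothetical validator for the JSON body the route receives.
type ParsedBody =
  | { ok: true; prompt: string }
  | { ok: false; error: string };

const MAX_PROMPT_CHARS = 4000; // arbitrary limit chosen for illustration

function parseBrowseBody(body: unknown): ParsedBody {
  if (typeof body !== 'object' || body === null) {
    return { ok: false, error: 'Body must be a JSON object.' };
  }
  const prompt = (body as Record<string, unknown>).prompt;
  if (typeof prompt !== 'string' || prompt.trim() === '') {
    return { ok: false, error: 'prompt must be a non-empty string.' };
  }
  if (prompt.length > MAX_PROMPT_CHARS) {
    return { ok: false, error: `prompt exceeds ${MAX_PROMPT_CHARS} characters.` };
  }
  return { ok: true, prompt: prompt.trim() };
}
```

In the route, call parseBrowseBody(await req.json()) before streamText and return new Response(parsed.error, { status: 400 }) when parsed.ok is false.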

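On the client, hook up useChat from '@ai-sdk/react' (the old 'ai/react' entry point is gone in current SDK versions). It handles streaming state, partial tokens, and tool call display with a few lines of React. Note that useChat posts a messages array rather than a single prompt, so for a chat UI the route should read messages from the body and pass them through convertToModelMessages instead of reading prompt.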

Production Tips

Session warm-up. Anchor sessions take ~300ms to provision. For latency-sensitive paths, pre-warm a pool by calling anchor.sessions.create at server start rather than on the first request.
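
A minimal sketch of such a pool follows. The create callback stands in for whatever provisions a session (for example a wrapper around anchor.sessions.create); WarmPool itself is a hypothetical class and not part of either SDK.

```typescript
// Hypothetical warm pool. `create` stands in for whatever provisions a session.
class WarmPool<S> {
  private readonly ready: Promise<S>[] = [];
  private readonly create: () => Promise<S>;
  private readonly size: number;

  constructor(create: () => Promise<S>, size: number) {
    this.create = create;
    this.size = size;
    // Start provisioning immediately, e.g. at server start.
    for (let i = 0; i < size; i++) this.ready.push(this.spawn());
  }

  private spawn(): Promise<S> {
    const pending = this.create();
    // Avoid an unhandled rejection while the promise waits in the pool;
    // whoever acquires it still sees the original error.
    pending.catch(() => {});
    return pending;
  }

  // Hands out the oldest warm session and starts a replacement in the background.
  acquire(): Promise<S> {
    const next = this.ready.shift() ?? this.spawn();
    if (this.ready.length < this.size) this.ready.push(this.spawn());
    return next;
  }
}
```

Warm sessions can still expire while they wait (the tools above set an idleTimeout), so a real pool would also need to detect and discard stale entries. That part is left out here.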

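Structured output. When you need a typed JSON object back instead of prose, keep generateText and add an output specification built from a Zod schema (Output.object in AI SDK v6). generateObject also returns validated objects, but it does not execute tools and is deprecated in v6, so it is not a drop-in replacement inside an agent loop. Either way, the result is validated against the schema and directly consumable by your TypeScript types.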

Scoped sessions. Pass proxy and profile options to anchor.sessions.create to route requests through specific regions or reuse authenticated browser profiles — useful when agents need to operate inside logged-in SaaS tools.

Type safety end-to-end. The tool() helper infers input types from your Zod schema, so TypeScript catches mismatches between the schema and your execute signature at compile time. The model's actual tool call arguments only exist at runtime, where the same schema validates them before execute runs.

What's Next

This pattern composes cleanly with the rest of the AI SDK ecosystem. Add @ai-sdk/openai or @ai-sdk/google to swap providers without touching your tool definitions. Use structured output for data extraction. Chain multiple agents, for example by exposing one agent as a tool to another, for research → analysis → synthesis pipelines.

Try Anchor free and run your first Vercel AI SDK browser agent in minutes → anchorbrowser.io
