Getting started with OpenAI CUA

March 12, 2025

OpenAI’s Computer-Using Agent (CUA) and Anchor Browser together simplify web automation. CUA provides AI-driven browser control, and Anchor Browser provides the scalable, cloud-based browser infrastructure to run it on. This guide walks you through setting up a CUA agent and customizing it through its command-line flags.

Prerequisites

Before getting started, ensure you have the following:

OpenAI API key with Computer Use Agent access
Anchor Browser account and API key
Python 3.8+
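
As a quick sanity check for the Python prerequisite, a minimal sketch along these lines can confirm the interpreter version before you continue. The function name python_is_supported is illustrative and not part of the sample repository.

```python
import sys

# Minimum version stated in the prerequisites above.
MIN_PYTHON = (3, 8)


def python_is_supported(version_info=sys.version_info, minimum=MIN_PYTHON):
    """Return True when the (major, minor) version meets the minimum."""
    return tuple(version_info[:2]) >= tuple(minimum)


if __name__ == "__main__":
    if python_is_supported():
        print("Python version OK")
    else:
        print("Python 3.8 or newer is required")
```

Passing the version as a parameter keeps the check easy to test without changing interpreters.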

Basic Integration

This basic setup will get you up and running with a CUA agent using Anchor Browser as the underlying browser automation platform.

Steps to Integrate

1. Clone the Repository

To begin, clone the sample repository that contains the code and configuration files for this integration. It serves as the foundation for connecting OpenAI's CUA to Anchor Browser.

Once cloned, navigate into the project directory to proceed with the setup.

2. Install the Required Packages

The repository includes a requirements.txt file that lists all the dependencies needed for this integration. Install them with pip from the project directory, typically by running pip install -r requirements.txt.

This ensures that your Python environment has all the necessary libraries, including those required for interacting with OpenAI’s API and Anchor Browser.
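
If you prefer to script the installation step, the sketch below builds the standard pip invocation for the current interpreter and runs it. It assumes requirements.txt sits in the working directory; the helper names pip_install_command and install_requirements are hypothetical.

```python
import subprocess
import sys
from pathlib import Path


def pip_install_command(requirements="requirements.txt"):
    """Build the pip command for the interpreter that is running this script."""
    # Using "python -m pip" ensures packages land in the active environment.
    return [sys.executable, "-m", "pip", "install", "-r", str(requirements)]


def install_requirements(requirements="requirements.txt"):
    """Install dependencies, failing early if the file is missing."""
    path = Path(requirements)
    if not path.is_file():
        raise FileNotFoundError(f"{path} not found; run this from the project directory")
    # check=True raises CalledProcessError if pip exits with an error.
    subprocess.run(pip_install_command(path), check=True)
```

Calling install_requirements() from the project directory is then equivalent to running pip by hand, with the added benefit that it always targets the active Python environment.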

3. Set the Environment Variables

Before running the agent, you need to configure your API keys. Set your OpenAI and Anchor Browser credentials as environment variables, using the variable names the repository expects.
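
A small pre-flight check can catch missing credentials before the agent starts. In the sketch below the variable names are assumptions: OPENAI_API_KEY is the name the OpenAI Python SDK reads by default, while ANCHOR_API_KEY is a hypothetical placeholder, so substitute whatever names the repository actually uses.

```python
import os

# Assumed variable names. OPENAI_API_KEY is the OpenAI SDK default;
# ANCHOR_API_KEY is a hypothetical placeholder for the Anchor Browser key.
REQUIRED_VARS = ("OPENAI_API_KEY", "ANCHOR_API_KEY")


def missing_variables(env=None, required=REQUIRED_VARS):
    """Return the required variable names that are unset or blank."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name, "").strip()]


if __name__ == "__main__":
    missing = missing_variables()
    if missing:
        print("Missing environment variables: " + ", ".join(missing))
    else:
        print("All required environment variables are set")
```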

4. Run the Agent

With setup complete, run the CUA agent from the project directory through its command-line interface.

The run command starts the agent, tells it to use Anchor Browser as its browsing environment, and passes it the initial input. In the example, the input asks the agent to play Tic-Tac-Toe, which demonstrates how it interacts with a web-based application.

Customizing the CUA Agent

You can customize the CUA agent by modifying the CLI flags:

--input: The initial input to the agent (optional: the CLI will prompt you for input if not provided).
--debug: Enable debug mode.
--show: Display screenshots during execution.
--start-url: Start the browsing session with a specific URL (default: https://bing.com).
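
To make the behavior of these flags concrete, here is a minimal argparse sketch that mirrors the four flags listed above, including the documented default start URL and the prompt fallback for --input. It is an illustration only, not the repository's actual CLI code, and build_parser and resolve_input are hypothetical names.

```python
import argparse


def build_parser():
    """Build a parser that mirrors the flags described above (illustrative)."""
    parser = argparse.ArgumentParser(description="Illustrative CUA agent CLI sketch")
    parser.add_argument("--input", default=None,
                        help="Initial input to the agent; prompted for if omitted")
    parser.add_argument("--debug", action="store_true", help="Enable debug mode")
    parser.add_argument("--show", action="store_true",
                        help="Display screenshots during execution")
    parser.add_argument("--start-url", default="https://bing.com",
                        help="URL to open when the browsing session starts")
    return parser


def resolve_input(args, prompt=input):
    """Use --input when given, otherwise ask the user interactively."""
    return args.input if args.input is not None else prompt("> ")


if __name__ == "__main__":
    parsed = build_parser().parse_args()
    print(f"start_url={parsed.start_url} debug={parsed.debug} show={parsed.show}")
```

Note that argparse exposes --start-url as the attribute start_url, since hyphens in flag names become underscores.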

For the full set of capabilities, refer to the OpenAI API documentation.