/library / template-computer-use-agent
templateAI agents

Anthropic Computer Use agent (web automation in a sandbox)

A Claude-driven agent that controls a sandboxed browser to complete real-world web tasks. Docker isolation, screenshot + click + type loop, task templates, run history. Use for QA bots, scraping, form filling, or anything 'go to a website and do a thing for me'.

use whenYou need an agent that can actually USE websites the way a human would. Login, fill forms, click through flows, screenshot results. Not just parse HTML.

May 19, 20262,360 bytesanthropiccomputer-useagentautomationsandbox

[Agent Name] · Computer Use

A Claude-powered web agent. Takes a goal in English, drives a sandboxed Chromium to achieve it, returns a transcript + screenshots.

Source of truth

GitHub. The Docker image is what runs in production. Agent state per run is stored in Postgres (transcript, screenshots, final result). Sessions never share a browser.

Tech stack

Python 3.13 + Anthropic SDK (claude-sonnet-4-6 minimum; Opus for harder tasks). Playwright (Chromium) inside Docker. FastAPI for the run-orchestration API. PostgreSQL for run history + transcripts. S3-compatible storage for screenshots. The agent uses Anthropic's computer_use_20250124 tool spec for screenshot/click/type/key/scroll.

Deploy

Build the Docker image, push to a runner that supports nested containers (Fly Machines, ECS Fargate with privileged tasks, or a beefy EC2 with Docker-in-Docker). Each task spins up an isolated Chromium.

File map

  • agent/loop.py main agent loop: screenshot -> Claude -> action -> screenshot
  • agent/tools.py computer use tool implementations (Playwright wrappers)
  • agent/prompts/ system prompts per task category
  • api/main.py FastAPI: POST /runs to start, GET /runs/{id} for status
  • db/schema.sql runs, transcripts, screenshots tables
  • Dockerfile Python + Playwright + Chromium + Xvfb
  • eval/ task suite with expected outcomes

.env keys

  • ANTHROPIC_API_KEY
  • DATABASE_URL
  • S3_BUCKET, S3_ACCESS_KEY, S3_SECRET
  • MAX_AGENT_STEPS default 50
  • SCREENSHOT_INTERVAL_MS default 1500

Hard rules

  • Every run gets a fresh browser context. NEVER share cookies, localStorage, or auth state across runs unless the user explicitly provides a session cookie for that run.
  • The agent runs in a network-namespaced container with allowlisted domains. No outbound to arbitrary hosts unless declared in the task config.
  • Hard timeout per run: 5 minutes default. Hard step cap: 50 actions. Cost cap: $1.00 of Anthropic spend per run.
  • Capture a screenshot AND the DOM snapshot at every step. The screenshot is what Claude sees; the DOM is for debugging when it fails.
  • NEVER let the agent enter credentials, payment info, or other sensitive data unless the user pre-authorized it in the task config with explicit scope.
  • Log every action with reasoning. You'll need this when 1 in 50 runs goes off the rails.

Recent significant changes

  • 2026-05-19: Scaffolded. Locked: Playwright over Selenium (faster, modern), Docker isolation per run (no shared state), explicit step + cost caps (computer-use can spend fast).

Next session: start here

  1. Build the Docker image. Confirm Playwright + Chromium + Xvfb work in headless mode.
  2. Wire ANTHROPIC_API_KEY. Run python -m agent.loop --task 'go to example.com and tell me the H1'.
  3. Inspect screenshots after each step. The first few runs always reveal prompt gaps.
  4. Build the eval suite BEFORE adding more capabilities. Without evals you can't tell if a prompt change helps or hurts.
  5. Expose /runs API. Auth required (this thing can spend $).

Get the next CLAUDE.md in your inbox.

One new template every week, plus occasional case studies.