Anthropic Computer Use agent (web automation in a sandbox)

A Claude-driven agent that controls a sandboxed browser to complete real-world web tasks. Docker isolation, screenshot + click + type loop, task templates, run history. Use for QA bots, scraping, form filling, or anything 'go to a website and do a thing for me'.

Use when: You need an agent that can actually USE websites the way a human would. Log in, fill forms, click through flows, screenshot results; not just parse HTML.

May 19, 2026 · 2,360 bytes · anthropic, computer-use, agent, automation, sandbox


[Agent Name] · Computer Use

A Claude-powered web agent. Takes a goal in English, drives a sandboxed Chromium to achieve it, returns a transcript + screenshots.

Source of truth

GitHub. The Docker image is what runs in production. Agent state per run is stored in Postgres (transcript, screenshots, final result). Sessions never share a browser.

Tech stack

Python 3.13 + Anthropic SDK (claude-sonnet-4-6 minimum; Opus for harder tasks). Playwright (Chromium) inside Docker. FastAPI for the run-orchestration API. PostgreSQL for run history + transcripts. S3-compatible storage for screenshots. The agent uses Anthropic's computer-use tool (API tool type computer_20250124) for screenshot/click/type/key/scroll.
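A minimal sketch of the Playwright side of that tool (the agent/tools.py wrappers), assuming Playwright's sync API and the 20250124 action shapes as we understand them: left_click and scroll carry a coordinate pair, type and key carry text, and scroll adds scroll_direction and scroll_amount. The key-name table, the 100 px per scroll unit, and the default scroll amount are assumptions, not part of either API. Coordinates are used as-is, which only works if screenshots reach Claude at the page's native size.

```python
from typing import Any, Protocol


class Mouse(Protocol):
    def click(self, x: float, y: float) -> None: ...
    def move(self, x: float, y: float) -> None: ...
    def wheel(self, delta_x: float, delta_y: float) -> None: ...


class Keyboard(Protocol):
    def type(self, text: str) -> None: ...
    def press(self, key: str) -> None: ...


class Page(Protocol):
    """The subset of playwright.sync_api.Page these wrappers touch."""
    mouse: Mouse
    keyboard: Keyboard

    def screenshot(self) -> bytes: ...


# Assumption: Claude sends xdotool-style names ("Return", "ctrl+a"), while
# Playwright's keyboard.press expects "Enter", "Control+a". Extend as needed.
KEY_MAP = {
    "return": "Enter", "enter": "Enter", "escape": "Escape", "tab": "Tab",
    "backspace": "Backspace", "ctrl": "Control", "alt": "Alt",
    "shift": "Shift", "super": "Meta", "up": "ArrowUp", "down": "ArrowDown",
    "left": "ArrowLeft", "right": "ArrowRight",
    "page_up": "PageUp", "page_down": "PageDown",
}
SCROLL_PX_PER_UNIT = 100  # assumption: one scroll_amount unit ~ 100 px


def to_playwright_key(combo: str) -> str:
    """Translate e.g. 'ctrl+Return' into 'Control+Enter'."""
    return "+".join(KEY_MAP.get(part.lower(), part) for part in combo.split("+"))


def execute(page: Page, action: dict[str, Any]) -> bytes:
    """Run one computer-use action, then return a fresh PNG for Claude."""
    kind = action["action"]
    if kind == "left_click":
        x, y = action["coordinate"]
        page.mouse.click(x, y)
    elif kind == "type":
        page.keyboard.type(action["text"])
    elif kind == "key":
        page.keyboard.press(to_playwright_key(action["text"]))
    elif kind == "scroll":
        x, y = action["coordinate"]
        page.mouse.move(x, y)  # the wheel scrolls whatever is under the cursor
        px = action.get("scroll_amount", 3) * SCROLL_PX_PER_UNIT  # assumption: 3 if omitted
        dx, dy = {"up": (0, -px), "down": (0, px),
                  "left": (-px, 0), "right": (px, 0)}[action["scroll_direction"]]
        page.mouse.wheel(dx, dy)
    elif kind != "screenshot":
        raise ValueError(f"unsupported action: {kind}")
    return page.screenshot()
```

Unsupported actions raise instead of silently doing nothing, so a missing wrapper shows up in the run log rather than as a confused agent.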

Deploy

Build the Docker image and push it to a runner that supports nested containers: Fly Machines, ECS on EC2 with privileged tasks (Fargate does not allow privileged containers), or a beefy EC2 instance running Docker-in-Docker. Each task spins up an isolated Chromium.

File map

agent/loop.py main agent loop: screenshot -> Claude -> action -> screenshot
agent/tools.py computer use tool implementations (Playwright wrappers)
agent/prompts/ system prompts per task category
api/main.py FastAPI: POST /runs to start, GET /runs/{id} for status
db/schema.sql runs, transcripts, screenshots tables
Dockerfile Python + Playwright + Chromium + Xvfb
eval/ task suite with expected outcomes

.env keys

ANTHROPIC_API_KEY
DATABASE_URL
S3_BUCKET, S3_ACCESS_KEY, S3_SECRET
MAX_AGENT_STEPS default 50
SCREENSHOT_INTERVAL_MS default 1500
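A minimal sketch of reading these keys at startup, assuming something else (for example Docker's --env-file) has already placed them in the process environment. It fails fast on missing secrets, applies the documented defaults (50 steps, 1500 ms), and keeps secrets out of the repr so they do not leak into logs. Settings and load_settings are hypothetical names.

```python
import os
from collections.abc import Mapping
from dataclasses import dataclass, field

REQUIRED = ("ANTHROPIC_API_KEY", "DATABASE_URL", "S3_BUCKET", "S3_ACCESS_KEY", "S3_SECRET")


@dataclass(frozen=True)
class Settings:
    anthropic_api_key: str = field(repr=False)  # never print secrets
    database_url: str = field(repr=False)       # may embed a password
    s3_bucket: str
    s3_access_key: str
    s3_secret: str = field(repr=False)
    max_agent_steps: int = 50
    screenshot_interval_ms: int = 1500


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read the .env keys from env; raise if a required one is missing."""
    missing = [key for key in REQUIRED if not env.get(key)]
    if missing:
        raise RuntimeError(f"missing required env keys: {', '.join(missing)}")
    return Settings(
        anthropic_api_key=env["ANTHROPIC_API_KEY"],
        database_url=env["DATABASE_URL"],
        s3_bucket=env["S3_BUCKET"],
        s3_access_key=env["S3_ACCESS_KEY"],
        s3_secret=env["S3_SECRET"],
        max_agent_steps=int(env.get("MAX_AGENT_STEPS", "50")),
        screenshot_interval_ms=int(env.get("SCREENSHOT_INTERVAL_MS", "1500")),
    )
```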

Hard rules

Every run gets a fresh browser context. NEVER share cookies, localStorage, or auth state across runs unless the user explicitly provides a session cookie for that run.
The agent runs in a network-namespaced container with allowlisted domains. No outbound to arbitrary hosts unless declared in the task config.
Hard timeout per run: 5 minutes default. Hard step cap: 50 actions. Cost cap: $1.00 of Anthropic spend per run.
Capture a screenshot AND the DOM snapshot at every step. The screenshot is what Claude sees; the DOM is for debugging when it fails.
NEVER let the agent enter credentials, payment info, or other sensitive data unless the user pre-authorized it in the task config with explicit scope.
Log every action with reasoning. You'll need this when 1 in 50 runs goes off the rails.
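A minimal sketch of enforcing the numeric caps and the domain allowlist in-process, assuming the loop calls before_step() ahead of each Claude request and record_usage() with the token counts from each response. Token prices are constructor arguments on purpose; set them from the current pricing of whichever model the run uses. RunBudget, host_allowed, and treating subdomains as allowed are assumptions. The host check complements the container's network allowlist; it does not replace it.

```python
import time
from dataclasses import dataclass, field
from typing import Callable
from urllib.parse import urlsplit


class BudgetExceeded(RuntimeError):
    """Raised to stop a run that hit a hard cap."""


@dataclass
class RunBudget:
    input_usd_per_mtok: float           # set from the model's current price sheet
    output_usd_per_mtok: float
    max_seconds: float = 300.0          # 5-minute hard timeout
    max_steps: int = 50                 # hard step cap
    max_cost_usd: float = 1.00          # Anthropic spend per run
    clock: Callable[[], float] = time.monotonic
    steps: int = field(default=0, init=False)
    cost_usd: float = field(default=0.0, init=False)
    started: float = field(init=False)

    def __post_init__(self) -> None:
        self.started = self.clock()

    def before_step(self) -> None:
        """Call before each model request; raises once any cap is reached.

        Cost is checked before the next call, so the last call can
        overshoot the cap by up to one response's worth of tokens.
        """
        if self.clock() - self.started >= self.max_seconds:
            raise BudgetExceeded("timeout")
        if self.steps >= self.max_steps:
            raise BudgetExceeded("step cap")
        if self.cost_usd >= self.max_cost_usd:
            raise BudgetExceeded("cost cap")
        self.steps += 1

    def record_usage(self, input_tokens: int, output_tokens: int) -> None:
        """Add one response's token usage to the running cost."""
        self.cost_usd += (input_tokens * self.input_usd_per_mtok
                          + output_tokens * self.output_usd_per_mtok) / 1_000_000


def host_allowed(url: str, allowlist: set[str]) -> bool:
    """True if url's host is an allowlisted domain or one of its subdomains."""
    host = (urlsplit(url).hostname or "").lower()
    return any(host == d or host.endswith("." + d) for d in allowlist)
```

Matching on the parsed hostname rather than a substring is what keeps look-alikes such as example.com.evil.io out.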

Recent significant changes

2026-05-19: Scaffolded. Locked: Playwright over Selenium (faster, modern), Docker isolation per run (no shared state), explicit step + cost caps (computer-use can spend fast).

Next session: start here

Build the Docker image. Confirm Playwright can drive Chromium under Xvfb inside the container (Xvfb supplies a virtual display, so no physical screen is needed).
Wire ANTHROPIC_API_KEY. Run python -m agent.loop --task 'go to example.com and tell me the H1'.
Inspect screenshots after each step. The first few runs always reveal prompt gaps.
Build the eval suite BEFORE adding more capabilities. Without evals you can't tell if a prompt change helps or hurts.
Expose the /runs API. Require auth: every run spends real money.
