Discovery is the process BrowserWire uses to turn your browsing session into a typed API. You navigate a website normally; BrowserWire watches what’s on the page and what you interact with, then compiles that observation into a manifest — a machine-readable contract your agent can call.
Discovery runs locally. Your page snapshots and interaction data are sent only to your configured LLM provider and to the BrowserWire CLI running on your machine. Nothing is uploaded to external BrowserWire servers.

How a discovery session works

1. Start an exploration session

After installing the Chrome extension and starting the BrowserWire CLI, open any website and click Start Exploring in the BrowserWire sidepanel. From this point, BrowserWire begins watching.
2. The extension captures page snapshots

As you navigate and interact with the site, the Chrome extension captures snapshots of the live page — the DOM structure, visible text, element roles, and the interactions you perform (clicks, form fills, navigation). Each captured interaction is recorded as a trace event tied to the current session.
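A captured interaction might be recorded along these lines. This is an illustrative sketch only; the field names are assumptions, not BrowserWire's actual trace schema:

```python
# A hypothetical trace event, as the extension might record a click.
# All field names here are illustrative, not BrowserWire's real schema.
trace_event = {
    "sessionId": "sess_01",               # exploration session this trace belongs to
    "type": "click",                      # e.g. click, fill, or navigation
    "target": {"role": "button", "name": "Add to Cart"},
    "url": "https://shop.example.com/item/42",
    "capturedAt": "2025-01-01T12:00:00Z", # ISO 8601 timestamp
}

print(trace_event["type"], "on", trace_event["target"]["name"])
```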
3. A vision LLM perceives the page

BrowserWire sends page snapshots to your configured vision LLM (OpenAI, Anthropic, Gemini, or Ollama). The LLM identifies:
  • Entities — what domain objects are present (a login form, a product listing, a search bar)
  • Actions — what operations are available on each entity (fill a field, click a button, submit a form)
  • Views — what structured data can be extracted (a list of results, a price, a status indicator)
The LLM also assigns human-readable semantic names to each discovered item, so your agent gets descriptions like “Add to Cart button” rather than raw CSS selectors.
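Concretely, one discovered item might look something like the following. The shape is hypothetical, assembled from the concepts above for illustration; it is not BrowserWire's actual output format:

```python
# One discovered action, as the vision LLM might describe it.
# Field names are illustrative, not BrowserWire's actual manifest schema.
discovered_action = {
    "kind": "action",
    "entity": "product_listing",
    "name": "Add to Cart button",  # human-readable semantic name
    "operation": "click",
    "confidence": 0.92,
}

def describe(item):
    """Render the semantic name an agent sees instead of a raw CSS selector."""
    return f'{item["name"]} ({item["operation"]} on {item["entity"]})'

print(describe(discovered_action))
```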
4. Locators are synthesized

For each identified element, BrowserWire generates a set of locator strategies — multiple independent ways to target the same element. Strategies include ARIA role + name, CSS selector, XPath, data-testid attribute, and DOM path. Each strategy is assigned a confidence score. Having multiple strategies means the runtime can fall back gracefully if the page layout changes slightly between sessions.
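The fallback behaviour can be sketched like this. The strategy names follow the list above, but the data shape and resolution logic are assumptions for illustration:

```python
# Locator strategies for one element, each with a confidence score.
# The data shape and resolve() logic are illustrative assumptions.
locators = [
    {"strategy": "aria",        "value": 'role=button[name="Add to Cart"]', "confidence": 0.95},
    {"strategy": "data-testid", "value": '[data-testid="add-to-cart"]',     "confidence": 0.90},
    {"strategy": "css",         "value": "#cart-form > button.primary",     "confidence": 0.70},
    {"strategy": "xpath",       "value": "//form[@id='cart-form']/button",  "confidence": 0.60},
]

def resolve(locators, find):
    """Try each strategy from highest to lowest confidence until one matches."""
    for loc in sorted(locators, key=lambda l: l["confidence"], reverse=True):
        element = find(loc["strategy"], loc["value"])
        if element is not None:
            return element
    raise LookupError("no locator strategy matched the current page")

# Simulate a page where the CSS classes changed between sessions,
# but the ARIA role and accessible name survived.
page = {("aria", 'role=button[name="Add to Cart"]'): "<button>"}
element = resolve(locators, lambda strategy, value: page.get((strategy, value)))
```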
5. The manifest is compiled

BrowserWire assembles all discovered entities, actions, views, and locators into a versioned manifest. Each item carries a confidence score and a provenance record so you can inspect exactly how it was learned. Once compiled, the manifest is immediately available via the REST API and OpenAPI spec.
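Since the manifest is plain JSON, it can be consumed with any HTTP client. The sketch below parses a minimal, hypothetical payload; the field names are assumptions based on the concepts above, not the real schema:

```python
import json

# A minimal, hypothetical manifest payload, as might be returned by
# GET /api/sites/:slug/manifest. Field names are illustrative only.
payload = """
{
  "version": 3,
  "items": [
    {"kind": "entity", "name": "login form",         "confidence": 0.93},
    {"kind": "action", "name": "Add to Cart button", "confidence": 0.88},
    {"kind": "view",   "name": "search results",     "confidence": 0.61}
  ]
}
"""

manifest = json.loads(payload)
names = [item["name"] for item in manifest["items"]]
print(f'manifest v{manifest["version"]}: {len(names)} items')
```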

Confidence levels

Every discovered item — entity, action, or view — carries a confidence score reflecting how reliably BrowserWire identified it.
Level     Score range    What it means
high      0.85 – 1.0     The item was identified with strong signal. Safe to use in automated workflows.
medium    0.50 – 0.84    The item was identified but some signals were ambiguous. Review before relying on it in production.
low       0.0 – 0.49     The item was tentatively identified. Manual review is recommended.
You can inspect confidence scores by fetching the raw manifest at /api/sites/:slug/manifest. Low-confidence items may improve if you run another discovery session and interact more thoroughly with those parts of the page.
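The thresholds in the table above can be applied mechanically when triaging a fetched manifest. A minimal sketch (the item list is made up):

```python
def confidence_level(score):
    """Map a raw confidence score to the documented level buckets."""
    if score >= 0.85:
        return "high"
    if score >= 0.50:
        return "medium"
    return "low"

# Flag every item that deserves manual review before production use.
items = [("login form", 0.93), ("search results", 0.61), ("status badge", 0.30)]
needs_review = [name for name, score in items if confidence_level(score) != "high"]
```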

Provenance tracking

BrowserWire records where each manifest item came from through provenance metadata. Every entity, action, and view carries:
Field         Description
source        Who or what produced the definition: human, agent, or hybrid
sessionId     The exploration session that captured it
traceIds      The specific interaction traces from that session
capturedAt    When the item was captured (ISO 8601 timestamp)
The three provenance sources have distinct meanings:
  • human — The item was identified from an interaction you performed directly (e.g. you filled in a form field)
  • agent — The item was inferred entirely by the LLM from a page snapshot without direct user interaction
  • hybrid — The item combined human interaction data with LLM inference
Provenance lets you understand the quality and origin of each API definition without re-running the full discovery pipeline.
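For example, provenance metadata lets you separate purely inferred items from ones backed by a real interaction. The field names follow the table above; the item list itself is made up for illustration:

```python
from collections import defaultdict

# Hypothetical manifest items carrying the provenance fields described above.
items = [
    {"name": "login form",         "provenance": {"source": "human",  "sessionId": "sess_01"}},
    {"name": "Add to Cart button", "provenance": {"source": "hybrid", "sessionId": "sess_01"}},
    {"name": "related products",   "provenance": {"source": "agent",  "sessionId": "sess_01"}},
]

by_source = defaultdict(list)
for item in items:
    by_source[item["provenance"]["source"]].append(item["name"])

# Items inferred entirely by the LLM, with no direct user interaction behind them:
inferred_only = by_source["agent"]
```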

The role of the LLM provider

BrowserWire requires a vision-capable LLM to perceive page content. The LLM is used at two stages of the pipeline: identifying what’s on the page from screenshots, and assigning semantic names to discovered items. You configure your provider once, and all discovery sessions use it:
export BROWSERWIRE_LLM_PROVIDER=openai
export BROWSERWIRE_LLM_API_KEY=sk-...
Supported providers and their default models:
Provider     Default model
openai       gpt-4o
anthropic    claude-sonnet-4-20250514
gemini       gemini-2.5-flash
ollama       llama3
Ollama runs locally and requires no API key, but discovery quality depends on the vision capabilities of the model you have installed.
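The defaults above amount to a simple lookup keyed on BROWSERWIRE_LLM_PROVIDER. The model values are taken from the table; the lookup code itself is an illustrative sketch, not BrowserWire's implementation:

```python
import os

# Default models per provider, as listed in the table above.
DEFAULT_MODELS = {
    "openai": "gpt-4o",
    "anthropic": "claude-sonnet-4-20250514",
    "gemini": "gemini-2.5-flash",
    "ollama": "llama3",
}

provider = os.environ.get("BROWSERWIRE_LLM_PROVIDER", "openai")
model = DEFAULT_MODELS.get(provider, DEFAULT_MODELS["openai"])
```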
The quality and thoroughness of your exploration session directly influence the quality of the manifest. Visiting more pages, interacting with more elements, and triggering different states (logged in vs. logged out, empty vs. populated lists) all produce a richer, higher-confidence manifest.