Discovery is the process BrowserWire uses to turn your browsing session into a typed API. You navigate a website normally; BrowserWire watches what’s on the page and what you interact with, then compiles that observation into a manifest — a machine-readable contract your agent can call.
Discovery runs locally. Your page snapshots and interaction data are sent only to your configured LLM provider and to the BrowserWire CLI running on your machine. Nothing is uploaded to external BrowserWire servers.

How a discovery session works

1. Start an exploration session

After installing the Chrome extension and starting the BrowserWire CLI, open any website and click Start Exploring in the BrowserWire sidepanel. From this point, BrowserWire begins watching.
2. The extension captures page snapshots

As you navigate and interact with the site, the Chrome extension captures snapshots of the live page — the DOM structure, visible text, element roles, and the interactions you perform (clicks, form fills, navigation). Each captured interaction is recorded as a trace event tied to the current session.
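A captured interaction might be recorded along these lines. This is an illustrative sketch only; the field names are assumptions, not BrowserWire's actual trace schema:

```python
# A hypothetical trace event, as the extension might record a click.
# All field names here are illustrative, not BrowserWire's real schema.
trace_event = {
    "sessionId": "sess_01",               # exploration session this trace belongs to
    "type": "click",                      # e.g. click, fill, or navigation
    "target": {"role": "button", "name": "Add to Cart"},
    "url": "https://shop.example.com/item/42",
    "capturedAt": "2025-01-01T12:00:00Z", # ISO 8601 timestamp
}

print(trace_event["type"], "on", trace_event["target"]["name"])
```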
3. A vision LLM perceives the page

BrowserWire sends page snapshots to your configured vision LLM (OpenAI, Anthropic, Gemini, or Ollama). The LLM identifies:
  • Entities — what domain objects are present (a login form, a product listing, a search bar)
  • Actions — what operations are available on each entity (fill a field, click a button, submit a form)
  • Views — what structured data can be extracted (a list of results, a price, a status indicator)
The LLM also assigns human-readable semantic names to each discovered item, so your agent gets descriptions like “Add to Cart button” rather than raw CSS selectors.
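Concretely, one discovered item might look something like the following. The shape is hypothetical, assembled from the concepts above for illustration; it is not BrowserWire's actual output format:

```python
# One discovered action, as the vision LLM might describe it.
# Field names are illustrative, not BrowserWire's actual manifest schema.
discovered_action = {
    "kind": "action",
    "entity": "product_listing",
    "name": "Add to Cart button",  # human-readable semantic name
    "operation": "click",
    "confidence": 0.92,
}

def describe(item):
    """Render the semantic name an agent sees instead of a raw CSS selector."""
    return f'{item["name"]} ({item["operation"]} on {item["entity"]})'

print(describe(discovered_action))
```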
4. Locators are synthesized

For each identified element, BrowserWire generates a set of locator strategies — multiple independent ways to target the same element. Strategies include ARIA role + name, CSS selector, XPath, data-testid attribute, and DOM path. Each strategy is assigned a confidence score. Having multiple strategies means the runtime can fall back gracefully if the page layout changes slightly between sessions.
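The fallback behaviour can be sketched like this. The strategy names follow the list above, but the data shape and resolution logic are assumptions for illustration:

```python
# Locator strategies for one element, each with a confidence score.
# The data shape and resolve() logic are illustrative assumptions.
locators = [
    {"strategy": "aria",        "value": 'role=button[name="Add to Cart"]', "confidence": 0.95},
    {"strategy": "data-testid", "value": '[data-testid="add-to-cart"]',     "confidence": 0.90},
    {"strategy": "css",         "value": "#cart-form > button.primary",     "confidence": 0.70},
    {"strategy": "xpath",       "value": "//form[@id='cart-form']/button",  "confidence": 0.60},
]

def resolve(locators, find):
    """Try each strategy from highest to lowest confidence until one matches."""
    for loc in sorted(locators, key=lambda l: l["confidence"], reverse=True):
        element = find(loc["strategy"], loc["value"])
        if element is not None:
            return element
    raise LookupError("no locator strategy matched the current page")

# Simulate a page where the CSS classes changed between sessions,
# but the ARIA role and accessible name survived.
page = {("aria", 'role=button[name="Add to Cart"]'): "<button>"}
element = resolve(locators, lambda strategy, value: page.get((strategy, value)))
```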
5. The manifest is compiled

BrowserWire assembles all discovered entities, actions, views, and locators into a versioned manifest. Each item carries a confidence score and a provenance record so you can inspect exactly how it was learned. Once compiled, the manifest is immediately available via the REST API and OpenAPI spec.
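Since the manifest is plain JSON, it can be consumed with any HTTP client. The sketch below parses a minimal, hypothetical payload; the field names are assumptions based on the concepts above, not the real schema:

```python
import json

# A minimal, hypothetical manifest payload, as might be returned by
# GET /api/sites/:slug/manifest. Field names are illustrative only.
payload = """
{
  "version": 3,
  "items": [
    {"kind": "entity", "name": "login form",         "confidence": 0.93},
    {"kind": "action", "name": "Add to Cart button", "confidence": 0.88},
    {"kind": "view",   "name": "search results",     "confidence": 0.61}
  ]
}
"""

manifest = json.loads(payload)
names = [item["name"] for item in manifest["items"]]
print(f'manifest v{manifest["version"]}: {len(names)} items')
```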

Confidence levels

Every discovered item — entity, action, or view — carries a confidence score reflecting how reliably BrowserWire identified it.
Level     Score range    What it means
high      0.85 – 1.0     The item was identified with strong signal. Safe to use in automated workflows.
medium    0.50 – 0.84    The item was identified but some signals were ambiguous. Review before relying on it in production.
low       0.0 – 0.49     The item was tentatively identified. Manual review is recommended.
You can inspect confidence scores by fetching the raw manifest at /api/sites/:slug/manifest. Low-confidence items may improve if you run another discovery session and interact more thoroughly with those parts of the page.
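The thresholds in the table above can be applied mechanically when triaging a fetched manifest. A minimal sketch (the item list is made up):

```python
def confidence_level(score):
    """Map a raw confidence score to the documented level buckets."""
    if score >= 0.85:
        return "high"
    if score >= 0.50:
        return "medium"
    return "low"

# Flag every item that deserves manual review before production use.
items = [("login form", 0.93), ("search results", 0.61), ("status badge", 0.30)]
needs_review = [name for name, score in items if confidence_level(score) != "high"]
```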

Provenance tracking

BrowserWire records where each manifest item came from through provenance metadata. Every entity, action, and view carries:
Field         Description
source        Who or what produced the definition: human, agent, or hybrid
sessionId     The exploration session that captured it
traceIds      The specific interaction traces from that session
capturedAt    When the item was captured (ISO 8601 timestamp)
The three provenance sources have distinct meanings:
  • human — The item was identified from an interaction you performed directly (e.g. you filled in a form field)
  • agent — The item was inferred entirely by the LLM from a page snapshot without direct user interaction
  • hybrid — The item combined human interaction data with LLM inference
Provenance lets you understand the quality and origin of each API definition without re-running the full discovery pipeline.
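For example, provenance metadata lets you separate purely inferred items from ones backed by a real interaction. The field names follow the table above; the item list itself is made up for illustration:

```python
from collections import defaultdict

# Hypothetical manifest items carrying the provenance fields described above.
items = [
    {"name": "login form",         "provenance": {"source": "human",  "sessionId": "sess_01"}},
    {"name": "Add to Cart button", "provenance": {"source": "hybrid", "sessionId": "sess_01"}},
    {"name": "related products",   "provenance": {"source": "agent",  "sessionId": "sess_01"}},
]

by_source = defaultdict(list)
for item in items:
    by_source[item["provenance"]["source"]].append(item["name"])

# Items inferred entirely by the LLM, with no direct user interaction behind them:
inferred_only = by_source["agent"]
```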

The role of the LLM provider

BrowserWire requires a vision-capable LLM to perceive page content. The LLM is used at two stages of the pipeline: identifying what’s on the page from screenshots, and assigning semantic names to discovered items. You configure your provider once, and all discovery sessions use it:
export BROWSERWIRE_LLM_PROVIDER=openai
export BROWSERWIRE_LLM_API_KEY=sk-...
Supported providers and their default models:
Provider     Default model
openai       gpt-4o
anthropic    claude-sonnet-4-20250514
gemini       gemini-2.5-flash
ollama       llama3
Ollama runs locally and requires no API key, but discovery quality depends on the vision capabilities of the model you have installed.
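The defaults above amount to a simple lookup keyed on BROWSERWIRE_LLM_PROVIDER. The model values are taken from the table; the lookup code itself is an illustrative sketch, not BrowserWire's implementation:

```python
import os

# Default models per provider, as listed in the table above.
DEFAULT_MODELS = {
    "openai": "gpt-4o",
    "anthropic": "claude-sonnet-4-20250514",
    "gemini": "gemini-2.5-flash",
    "ollama": "llama3",
}

provider = os.environ.get("BROWSERWIRE_LLM_PROVIDER", "openai")
model = DEFAULT_MODELS.get(provider, DEFAULT_MODELS["openai"])
```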
The quality and thoroughness of your exploration session directly influence the quality of the manifest. Visiting more pages, interacting with more elements, and triggering different states (logged in vs. logged out, empty vs. populated lists) all produce a richer, higher-confidence manifest.