Skip to content

Architecture Overview

System Architecture

W3Pilot uses a dual-protocol architecture connecting to a single Chrome browser via both WebDriver BiDi and Chrome DevTools Protocol (CDP):

┌─────────────────────────────────────────────────────────────────────────┐
│                              User Layer                                  │
├─────────────────┬─────────────────┬─────────────────┬──────────────────┤
│    Go Client    │   MCP Server    │      CLI        │  Script Runner   │
│      SDK        │   (75+ tools)   │    (vibium)     │  (w3pilot run)  │
├─────────────────┴─────────────────┴─────────────────┴──────────────────┤
│                           w3pilot Core                                │
│  ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐     │
│  │  Pilot   │ │ Element  │ │ Keyboard │ │  Mouse   │ │  Touch   │     │
│  │ (page)   │ │ (DOM)    │ │ (input)  │ │ (input)  │ │ (input)  │     │
│  └──────────┘ └──────────┘ └──────────┘ └──────────┘ └──────────┘     │
│  ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐     │
│  │ Context  │ │  Clock   │ │ Tracing  │ │  Route   │ │   CDP    │     │
│  │(session) │ │ (time)   │ │(capture) │ │(network) │ │(profiling)│    │
│  └──────────┘ └──────────┘ └──────────┘ └──────────┘ └──────────┘     │
├─────────────────────────────────────────────────────────────────────────┤
│                      Dual Protocol Layer                                │
│  ┌─────────────────────────────────┐  ┌────────────────────────────┐   │
│  │        BiDi Client              │  │       CDP Client           │   │
│  │  (page automation, DOM, events) │  │ (profiling, emulation)     │   │
│  └─────────────────┬───────────────┘  └─────────────┬──────────────┘   │
│                    │                                │                   │
├────────────────────┼────────────────────────────────┼───────────────────┤
│                    ▼                                ▼                   │
│         VibiumDev Clicker                   Chrome DevTools             │
│     (vibium:* + WebDriver BiDi)            (CDP WebSocket)              │
├─────────────────────────────────────────────────────────────────────────┤
│                    Chrome / Chromium                                    │
│             (single browser instance)                                   │
└─────────────────────────────────────────────────────────────────────────┘

Protocol Responsibilities

Protocol Layer Features
WebDriver BiDi Via clicker Page automation, element interactions, screenshots, tracing, events
Chrome DevTools Protocol Direct Heap profiling, network response bodies, CPU/network emulation

Component Descriptions

Go Client SDK

The core programmatic API for browser automation:

  • Pilot: Page-level operations (navigation, screenshots, JS evaluation)
  • Element: DOM element interactions (click, type, fill, state queries)
  • Input Controllers: Low-level keyboard, mouse, touch control
  • Context: Isolated browser sessions with separate cookies/storage
  • Network: Request interception and modification
  • CDP: Direct Chrome DevTools Protocol access for profiling and emulation

CDP Client

Direct Chrome DevTools Protocol access for advanced features:

  • Heap Profiler: Capture V8 heap snapshots for memory analysis
  • Network Emulation: Simulate Slow 3G, Fast 3G, 4G, or custom conditions
  • CPU Emulation: Throttle CPU for performance testing (2x, 4x, 6x slowdown)
  • Direct Commands: Send any CDP command for advanced use cases

MCP Server

Model Context Protocol server for AI assistant integration:

  • 75+ browser automation tools
  • Session management with test reporting
  • Script recording capability
  • Structured error messages with suggestions

CLI

Command-line interface for scripted automation:

  • Subcommand structure (w3pilot launch, w3pilot click, etc.)
  • Session persistence between commands
  • YAML/JSON script execution

Script Runner

Deterministic test execution:

  • JSON/YAML script format with JSON Schema
  • Variable interpolation
  • Assertions and data extraction
  • Error handling with continueOnError

Data Flow

MCP Tool Call Flow

Claude                    MCP Server              Vibe              Browser
  │                           │                    │                   │
  │──── navigate ────────────▶│                    │                   │
  │                           │──── Go(url) ──────▶│                   │
  │                           │                    │── BiDi request ──▶│
  │                           │                    │◀── BiDi event ────│
  │                           │◀─── url, title ────│                   │
  │◀─── NavigateOutput ───────│                    │                   │

Session Recording Flow

Claude                    MCP Server              Recorder
  │                           │                      │
  │── start_recording ───────▶│                      │
  │                           │──── Start() ────────▶│
  │                           │                      │
  │──── navigate ────────────▶│                      │
  │                           │── RecordNavigate() ─▶│
  │                           │                      │
  │──── click ───────────────▶│                      │
  │                           │── RecordClick() ────▶│
  │                           │                      │
  │──── export_script ───────▶│                      │
  │                           │◀── ExportJSON() ────│
  │◀─── JSON script ──────────│                      │

Feature Origin

Component Origin Notes
BiDi client Upstream WebDriver BiDi protocol
Vibe API Upstream Parity with JS/Python
Element API Upstream Parity with JS/Python
Input controllers Upstream Parity with JS/Python
MCP server Go-specific AI assistant integration
CLI Go-specific Command-line automation
Script runner Go-specific Deterministic replay
Session recording Go-specific LLM action capture
JSON Schema Go-specific Script validation
Test reporting Go-specific Structured diagnostics

Key Design Decisions

Dual Protocol Architecture

W3Pilot uses both WebDriver BiDi and Chrome DevTools Protocol (CDP):

WebDriver BiDi (via VibiumDev clicker) for:

  • Standardization across browsers
  • Bidirectional events (no polling)
  • Future-proof design
  • Page automation (navigation, DOM, interactions)

Chrome DevTools Protocol (CDP) for:

  • Heap profiling (not available in BiDi)
  • Network response bodies (not exposed in BiDi)
  • CPU/network emulation presets
  • Any Chrome-specific DevTools feature

Both protocols connect to the same Chrome browser instance, discovered via the DevToolsActivePort file in Chrome's user data directory.

Protocol-Agnostic Methods

Some SDK methods are protocol-agnostic - they try BiDi first and automatically fall back to CDP when BiDi doesn't support the feature:

User calls SetOffline(true)
┌─────────────────────────────┐
│ Try BiDi command            │
│ vibium:network.setOffline   │
└─────────────┬───────────────┘
              ▼ (fails: "unknown command")
┌─────────────────────────────┐
│ Detect unsupported command  │
│ IsUnsupportedCommand(err)   │
└─────────────┬───────────────┘
┌─────────────────────────────┐
│ Fall back to CDP            │
│ EmulateNetwork(Offline)     │
└─────────────┬───────────────┘
         Success

Protocol-agnostic methods:

Method BiDi Command CDP Fallback
SetOffline() vibium:network.setOffline EmulateNetwork(NetworkOffline)
ConsoleMessages() vibium:console.messages ConsoleEntries()
ClearConsoleMessages() vibium:console.clear consoleDebugger.Clear()

This design ensures:

  • Stable API - Users don't need to change code when BiDi support is added
  • Best available backend - Automatically uses the most capable implementation
  • Graceful degradation - Clear errors when neither protocol supports the feature

Custom Commands

W3Pilot extends BiDi with vibium:* commands for:

  • High-level actions (fill, check, selectOption)
  • Actionability checks (wait for visible, enabled, stable)
  • Page-level operations (screenshot, PDF, evaluate)

MCP Architecture

The MCP server uses the Model Context Protocol for:

  • Standardized tool definitions
  • Structured input/output
  • Easy AI assistant integration

Session Recording

Recording captures tool calls (not raw BiDi) for:

  • Human-readable scripts
  • Portability (same format as CLI scripts)
  • Easy editing and customization