Skip to content

CI/CD Integration

Run deterministic test scripts in CI/CD pipelines without an LLM.

Clicker Binary Required

E2E tests require the W3Pilot clicker binary, which is not yet publicly distributed. The examples below assume you have access to the clicker binary. See Prerequisites for details.

Overview

┌─────────────────────────────────────────────────────────────────────────┐
│                         Development Workflow                             │
├─────────────────────────────────────────────────────────────────────────┤
│                                                                          │
│   Developer writes         LLM explores &         Script saved           │
│   Markdown test plan  ──▶  records actions   ──▶  to repo               │
│                            (with MCP)                                    │
│                                                                          │
├─────────────────────────────────────────────────────────────────────────┤
│                           CI/CD Pipeline                                 │
├─────────────────────────────────────────────────────────────────────────┤
│                                                                          │
│   git push  ──▶  CI runs  ──▶  w3pilot run test.json  ──▶  Pass/Fail    │
│                  headless         (no LLM needed)                        │
│                                                                          │
└─────────────────────────────────────────────────────────────────────────┘

Benefits

Benefit Description
No LLM costs Scripts run without API calls
Deterministic Same inputs → same outputs
Fast No LLM latency
Auditable Scripts are version-controlled
Parallelizable Run multiple scripts concurrently

GitHub Actions

Basic Workflow

name: E2E Tests

on:
  workflow_dispatch:
    inputs:
      clicker_url:
        description: 'URL to download clicker binary'
        required: true
        type: string

jobs:
  e2e:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Setup Go
        uses: actions/setup-go@v5
        with:
          go-version: '1.22'

      - name: Download Clicker
        run: |
          curl -L -o clicker "${{ github.event.inputs.clicker_url }}"
          chmod +x clicker
          echo "W3PILOT_CLICKER_PATH=$PWD/clicker" >> $GITHUB_ENV

      - name: Install W3Pilot CLI
        run: go install github.com/plexusone/w3pilot/cmd/w3pilot@latest

      - name: Run E2E Tests
        env:
          W3PILOT_HEADLESS: "1"
        run: |
          w3pilot run tests/login.json
          w3pilot run tests/checkout.json

Manual Trigger

Until the clicker is publicly distributed, E2E workflows should use workflow_dispatch with a clicker_url input rather than automatic triggers on push/PR.

Matrix Strategy

Run tests in parallel:

jobs:
  e2e:
    runs-on: ubuntu-latest
    strategy:
      fail-fast: false
      matrix:
        test: [smoke, auth, checkout, search]

    steps:
      - uses: actions/checkout@v4

      - name: Download Clicker
        run: |
          curl -L -o clicker "${{ github.event.inputs.clicker_url }}"
          chmod +x clicker
          echo "W3PILOT_CLICKER_PATH=$PWD/clicker" >> $GITHUB_ENV

      - name: Setup
        run: go install github.com/plexusone/w3pilot/cmd/w3pilot@latest

      - name: Run ${{ matrix.test }} tests
        env:
          W3PILOT_HEADLESS: "1"
        run: |
          for script in tests/${{ matrix.test }}/*.json; do
            w3pilot run "$script"
          done

Upload Artifacts on Failure

      - name: Run tests
        run: w3pilot run tests/e2e.json

      - name: Upload screenshots on failure
        if: failure()
        uses: actions/upload-artifact@v4
        with:
          name: screenshots
          path: screenshots/
          retention-days: 7

Scheduled Runs

on:
  schedule:
    - cron: '0 */6 * * *'  # Every 6 hours
  workflow_dispatch:        # Manual trigger

GitLab CI

e2e:
  image: golang:1.22

  variables:
    CLICKER_URL: "$CLICKER_URL"  # Set in CI/CD variables
    W3PILOT_HEADLESS: "1"

  before_script:
    - curl -L -o clicker "$CLICKER_URL"
    - chmod +x clicker
    - export W3PILOT_CLICKER_PATH=$PWD/clicker
    - go install github.com/plexusone/w3pilot/cmd/w3pilot@latest

  script:
    - w3pilot run tests/smoke.json
    - w3pilot run tests/auth.json

  artifacts:
    when: on_failure
    paths:
      - screenshots/
    expire_in: 1 week

  rules:
    - when: manual  # Manual trigger until clicker is public

CircleCI

version: 2.1

jobs:
  e2e:
    docker:
      - image: cimg/go:1.22
    steps:
      - checkout
      - run:
          name: Download Clicker
          command: |
            curl -L -o clicker "$CLICKER_URL"
            chmod +x clicker
            echo "export W3PILOT_CLICKER_PATH=$PWD/clicker" >> $BASH_ENV
      - run:
          name: Install W3Pilot CLI
          command: go install github.com/plexusone/w3pilot/cmd/w3pilot@latest
      - run:
          name: Run E2E Tests
          environment:
            W3PILOT_HEADLESS: "1"
          command: |
            w3pilot run tests/smoke.json
            w3pilot run tests/auth.json
      - store_artifacts:
          path: screenshots
          destination: screenshots

workflows:
  test:
    jobs:
      - e2e:
          # Manual approval until clicker is public
          type: approval

Jenkins

pipeline {
    agent any

    parameters {
        string(name: 'CLICKER_URL', description: 'URL to download clicker binary')
    }

    environment {
        W3PILOT_HEADLESS = '1'
    }

    stages {
        stage('Setup') {
            steps {
                sh '''
                    curl -L -o clicker "${CLICKER_URL}"
                    chmod +x clicker
                '''
                sh 'go install github.com/plexusone/w3pilot/cmd/w3pilot@latest'
            }
        }

        stage('E2E Tests') {
            environment {
                W3PILOT_CLICKER_PATH = "${WORKSPACE}/clicker"
            }
            steps {
                sh 'w3pilot run tests/smoke.json'
                sh 'w3pilot run tests/auth.json'
            }
        }
    }

    post {
        failure {
            archiveArtifacts artifacts: 'screenshots/**', fingerprint: true
        }
    }
}

Azure Pipelines

trigger: none  # Manual trigger until clicker is public

parameters:
  - name: clickerUrl
    displayName: 'Clicker Download URL'
    type: string

pool:
  vmImage: 'ubuntu-latest'

steps:
  - task: GoTool@0
    inputs:
      version: '1.22'

  - script: |
      curl -L -o clicker "${{ parameters.clickerUrl }}"
      chmod +x clicker
      echo "##vso[task.setvariable variable=W3PILOT_CLICKER_PATH]$(pwd)/clicker"
    displayName: 'Download Clicker'

  - script: go install github.com/plexusone/w3pilot/cmd/w3pilot@latest
    displayName: 'Install W3Pilot CLI'

  - script: |
      export W3PILOT_HEADLESS=1
      w3pilot run tests/smoke.json
      w3pilot run tests/auth.json
    displayName: 'Run E2E Tests'

  - task: PublishBuildArtifacts@1
    condition: failed()
    inputs:
      pathToPublish: 'screenshots'
      artifactName: 'screenshots'

Test Organization

Recommended structure:

tests/
├── e2e/
│   ├── smoke/
│   │   ├── homepage.json
│   │   └── navigation.json
│   ├── auth/
│   │   ├── login.json
│   │   ├── logout.json
│   │   └── password-reset.json
│   ├── checkout/
│   │   ├── add-to-cart.json
│   │   └── purchase.json
│   └── search/
│       └── basic-search.json
├── plans/
│   ├── smoke.md           # Markdown test plans
│   ├── auth.md
│   └── checkout.md
└── README.md

Environment Variables

Variable Description Default
W3PILOT_HEADLESS Run headless false
W3PILOT_DEBUG Enable debug logs false
W3PILOT_CLICKER_PATH Path to clicker Auto-detect
W3PILOT_TIMEOUT Default timeout 30s

Best Practices

1. Use Headless Mode

Always set W3PILOT_HEADLESS=1 in CI:

env:
  W3PILOT_HEADLESS: "1"

2. Set Appropriate Timeouts

CI environments may be slower:

{
  "name": "CI Test",
  "timeout": "60s",
  "steps": [...]
}

3. Capture Screenshots on Failure

Add screenshot steps for debugging:

{
  "steps": [
    {"action": "navigate", "url": "https://example.com"},
    {"action": "screenshot", "file": "screenshots/step1.png"},
    {"action": "click", "selector": "#submit"},
    {"action": "screenshot", "file": "screenshots/step2.png"}
  ]
}

4. Use continueOnError for Non-Critical Steps

{
  "action": "click",
  "selector": "#optional-banner-close",
  "continueOnError": true
}

5. Parallelize Independent Tests

Use matrix strategies to run tests concurrently.

6. Version Control Test Scripts

  • Store scripts in Git alongside code
  • Review script changes in PRs
  • Track test evolution over time

Debugging CI Failures

Enable Debug Logging

env:
  W3PILOT_DEBUG: "1"

Download Artifacts

Screenshots and logs uploaded as artifacts help debug failures.

Run Locally

Reproduce CI failures locally:

W3PILOT_HEADLESS=1 w3pilot run tests/failing-test.json

Accessibility Testing in CI/CD

For WCAG 2.2 accessibility testing in CI/CD, use agent-a11y:

name: Accessibility

on:
  workflow_dispatch:
    inputs:
      clicker_url:
        description: 'URL to download clicker binary'
        required: true

jobs:
  wcag:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Download Clicker
        run: |
          curl -L -o clicker "${{ github.event.inputs.clicker_url }}"
          chmod +x clicker
          echo "W3PILOT_CLICKER_PATH=$PWD/clicker" >> $GITHUB_ENV

      - name: Setup
        run: go install github.com/agentplexus/agent-a11y/cmd/agent-a11y@latest

      - name: Run WCAG 2.2 AA Evaluation
        env:
          W3PILOT_HEADLESS: "1"
        run: |
          agent-a11y vpat https://staging.example.com \
            --format json --output wcag-results.json

      - name: Upload WCAG Results
        uses: actions/upload-artifact@v4
        with:
          name: wcag-results
          path: wcag-results.json

agent-a11y combines:

  • Automated testing (~40% coverage) - axe-core rule-based checks
  • Specialized automation (~25% coverage) - keyboard, focus, reflow tests
  • LLM-as-a-Judge (~25% coverage) - semantic evaluation (optional)

See the agent-a11y documentation for details.

Example: Complete Workflow

See .github/workflows/e2e.yaml in this repository for a complete example.