playwright-testing

Automates frontend testing across unit, integration, E2E, visual, and accessibility layers using Playwright and other testing frameworks.
# Frontend Testing
Unlock reliable confidence fast: enable safe refactors by choosing the right test layer, making the app observable, and eliminating nondeterminism so failures are actionable.
## Philosophy: Confidence Per Minute
Frontend tests fail for two reasons: the product is broken, or the test is lying. Your job is to maximize signal and minimize "test is lying".
Before writing a test, ask:
- What user risk am I covering (money, progression, auth, data loss, crashes)?
- What's the narrowest layer that catches this bug class (pure logic vs UI vs full browser)?
- What nondeterminism exists (time, RNG, async loading, network, animations, fonts, GPU)?
- What "ready" signal can I wait on besides
setTimeout? - What should a failure print/screenshot so it's diagnosable in CI?
Core principles:
- Test the contract, not the implementation: assert stable user-meaningful outcomes and public seams.
- Prefer determinism over retries: make time/RNG/network controllable; remove flake at the source.
- Observe like a debugger: console errors, network failures, screenshots, and state dumps on failure.
- One critical flow first: a reliable smoke test beats 50 flaky tests.
## Test Layer Decision Tree
Pick the cheapest layer that provides needed confidence:
| Layer | Speed | Use For |
|-------|-------|---------|
| Unit | Fastest | Pure functions, reducers, validators, math, pathfinding, deterministic simulation |
| Component | Medium | UI behavior with mocked IO (React Testing Library, Vue Testing Library) |
| E2E | Slowest | Critical user flows across routing, storage, real bundling/runtime |
| Visual | Specialized | Layout/pixel regressions; for canvas/WebGL, only after locking determinism |
## Quick Start: First Smoke Test
- Define 1 critical flow: "page loads → user can start → one key action works"
- Add a test seam to the app (see below)
- Choose runner: Playwright MCP for E2E, unit tests for logic
- Fail loudly: treat console errors and failed requests as test failures
- Stabilize: seed RNG, freeze time, fix viewport, disable animations
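A minimal sketch of that smoke test in Playwright Test, assuming the app runs at localhost:3000, exposes the `window.__TEST__` seam described below, and has an accessible Start button:

```javascript
// smoke.spec.js — one critical flow; console errors fail the test.
const { test, expect } = require('@playwright/test');

test('app loads and primary action works', async ({ page }) => {
  const errors = [];
  page.on('console', msg => {
    if (msg.type() === 'error') errors.push(msg.text()); // fail loudly later
  });

  await page.goto('http://localhost:3000/?test=1&seed=42'); // deterministic mode (assumed convention)
  await page.waitForFunction(() => window.__TEST__?.ready); // explicit readiness, no setTimeout

  await page.getByRole('button', { name: 'Start' }).click();
  const state = await page.evaluate(() => window.__TEST__.state());
  expect(state.scene).toBeTruthy(); // assert a user-meaningful outcome

  expect(errors).toEqual([]); // any console error fails the test
});
```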
## Concrete MCP Workflow: Testing a Game
Step-by-step sequence for testing a Phaser/canvas game:
```
- mcp__playwright__browser_navigate
  → http://localhost:3000?test=1&seed=42
- mcp__playwright__browser_evaluate
  → () => new Promise(r => { const c = () => window.__TEST__?.ready ? r(true) : setTimeout(c, 100); c(); })
  (Wait for game ready)
- mcp__playwright__browser_console_messages
  → level: "error"
  (Fail if any errors)
- mcp__playwright__browser_snapshot
  → Get UI state and refs
- mcp__playwright__browser_click
  → element: "Start Button", ref: [from snapshot]
- mcp__playwright__browser_evaluate
  → () => window.__TEST__.state()
  (Assert game state is correct)
- mcp__playwright__browser_press_key
  → key: "ArrowRight" (or WASD for movement)
- mcp__playwright__browser_evaluate
  → () => window.__TEST__.state().player.x
  (Verify movement happened)
- mcp__playwright__browser_take_screenshot
  → filename: "gameplay-state.png"
  (Visual evidence after deterministic setup)
```
## Recommended Test Seams
Add to the app for testability (read-only, stable, minimal):
```javascript
window.__TEST__ = {
  ready: false,   // true after first interactive frame
  seed: null,     // current RNG seed
  sceneKey: null, // current scene/route
  state: () => ({ // JSON-serializable snapshot
    scene: window.__TEST__.sceneKey,
    player: { x: player.x, y: player.y, hp: player.hp },
    score: gameState.score,
    entities: entities.map(e => ({ id: e.id, type: e.type, x: e.x, y: e.y }))
  }),
  commands: {     // optional mutation commands
    reset: () => {},
    seed: (n) => {},
    skipIntro: () => {}
  }
};
```
Rule: Expose IDs + essential fields, not raw Phaser/engine objects.
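How `ready` gets flipped depends on the engine. A sketch for Phaser, assuming `window.__TEST__` was created at boot (see references/phaser-canvas-testing.md for the full setup):

```javascript
class MainScene extends Phaser.Scene {
  create() {
    window.__TEST__.sceneKey = this.scene.key;
    // Mark ready only after the first update, i.e. one interactive frame.
    this.events.once(Phaser.Scenes.Events.UPDATE, () => {
      window.__TEST__.ready = true;
    });
  }
}
```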
## Anti-Patterns to Avoid
❌ **Testing the wrong layer**: E2E tests for pure logic
- Why tempting: "Let's just test everything through the browser"
- Better: Unit tests for logic; reserve E2E for integration contracts

❌ **Testing implementation details**: Asserting DOM structure/classnames
- Why tempting: Easy to assert what you can see in DevTools
- Better: Assert user-meaningful outputs (text, score, HP changes)

❌ **Sleep-driven tests**: "wait 2s then click"
- Why tempting: Simple and "works on my machine"
- Better: Wait on explicit readiness (DOM marker, `window.__TEST__.ready`)

❌ **Uncontrolled randomness**: RNG/time in assertions
- Why tempting: "The game uses random, so the test should too"
- Better: Seed RNG (`?seed=42`), freeze time, assert stable invariants (see the sketch after this list)

❌ **Pixel snapshots without determinism**: Canvas screenshots that flake
- Why tempting: "I'll catch visual bugs automatically"
- Better: Deterministic mode first; then screenshot at known stable frames

❌ **Retries as a strategy**: "Just bump retries to 3"
- Why tempting: Quick fix that makes CI green
- Better: Fix the flake source; retries hide real problems
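For the seeded-RNG fix above, a minimal sketch; mulberry32 is one common tiny PRNG, and the `?seed=` query param is an assumed convention, not part of this skill's bundled code:

```javascript
// Deterministic PRNG: the same seed always yields the same sequence.
function mulberry32(a) {
  return function () {
    a |= 0; a = (a + 0x6D2B79F5) | 0;
    let t = Math.imul(a ^ (a >>> 15), a | 1);
    t = (t + Math.imul(t ^ (t >>> 7), t | 61)) ^ t;
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

const seed = Number(new URLSearchParams(location.search).get('seed') ?? Date.now());
const rng = mulberry32(seed);
window.__TEST__.seed = seed;
// Use rng() everywhere the game would otherwise call Math.random().
```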
## Debugging Failed Tests
When a test fails, gather evidence in this order:
- Console errors: `mcp__playwright__browser_console_messages({ level: "error" })`
- Network failures: `mcp__playwright__browser_network_requests()` → check for non-2xx
- Screenshot: `mcp__playwright__browser_take_screenshot()` → visual state at failure
- App state: `mcp__playwright__browser_evaluate({ function: "() => window.__TEST__.state()" })`
- Classify the flake (see references/flake-reduction.md):
  - Readiness? → add explicit wait
  - Timing? → control animation/physics
  - Environment? → lock viewport/DPR
  - Data? → isolate test data
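When running under Playwright Test (rather than MCP), this evidence gathering can be automated with a hook; a sketch, assuming the `window.__TEST__` seam exists:

```javascript
const { test } = require('@playwright/test');

// After every test, capture evidence only if the test did not pass as expected.
test.afterEach(async ({ page }, testInfo) => {
  if (testInfo.status !== testInfo.expectedStatus) {
    await page.screenshot({ path: testInfo.outputPath('failure.png') });
    const state = await page.evaluate(() => window.__TEST__?.state?.());
    console.log('App state at failure:', JSON.stringify(state, null, 2));
  }
});
```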
## Graduation Criteria: When Is Testing "Enough"?
Minimum viable test suite:
- [ ] 1 smoke test that proves the app loads and primary action works
- [ ] Test seam exists (`window.__TEST__` with ready flag and state)
- [ ] Deterministic mode for canvas/games (`?test=1` enables seeding)
- [ ] Console errors fail tests (no silent failures)
- [ ] CI runs tests on every push
Level up when:
- Critical paths (auth, payment, save/load) have dedicated E2E
- Unit tests cover complex logic (pathfinding, damage calc, state machines)
- Visual regression on key screens (menu, HUD) with locked determinism
## Visual Regression with imgdiff.py
For pixel comparison of screenshots:
```bash
# Compare baseline to current
python scripts/imgdiff.py baseline.png current.png --out diff.png
# Allow small tolerance (anti-aliasing differences)
python scripts/imgdiff.py baseline.png current.png --max-rms 2.0
```
Exit codes: 0 = match (within tolerance), 1 = different, 2 = error
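A sketch of wiring those exit codes into a Node-based check; the paths and tolerance are placeholders:

```javascript
const { spawnSync } = require('node:child_process');

const result = spawnSync('python', [
  'scripts/imgdiff.py', 'baseline.png', 'current.png',
  '--max-rms', '2.0', '--out', 'diff.png',
], { encoding: 'utf8' });

if (result.status === 2) throw new Error('imgdiff error: ' + result.stderr);
if (result.status === 1) throw new Error('Visual regression detected; see diff.png');
// status 0: images match within tolerance
```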
## UI Slicing Regressions (Nine-Slice / Ribbons / Bars)
Canvas UI issues (panel seams, segmented ribbons, invisible HUD fills) are best caught with a dedicated UI harness instead of the full gameplay flow.
- Build a simple `test.html`/scene that loads only the UI assets.
- Render raw slices next to assembled panels (multi-size), and include ribbons/bars with both "raw crop + scale" and "stitched multi-slice" views.
- Expose `window.__TEST__` with `.commands.showTest(n)` so Playwright can toggle each mode deterministically.
- Capture targeted screenshots (panels, ribbons, bars) and diff them in CI.

See references/phaser-canvas-testing.md for the deterministic setup + screenshot workflow.
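A sketch of driving such a harness from a Playwright script; the mode names and one-frame wait are assumptions, while `showTest(n)` is the seam described above:

```javascript
// Cycle through the harness's display modes and capture each for diffing.
const MODES = ['panels', 'ribbons', 'bars']; // hypothetical mode names

for (const [n, name] of MODES.entries()) {
  await page.evaluate(i => window.__TEST__.commands.showTest(i), n);
  await page.evaluate(() => new Promise(requestAnimationFrame)); // let one frame render
  await page.screenshot({ path: `ui-${name}.png` });
}
```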
## Variation Guidance
Adapt approach based on context:
- DOM app: Standard Playwright selectors, wait for text/elements
- Canvas game: Test seams mandatory, wait via `window.__TEST__.ready`
- Hybrid: DOM for menus, test seams for gameplay
- CI-only GPU: May need software rendering flags or skip visual tests
- UI slicing regressions: For nine-slice/ribbon/bar artifacts, prefer a small UI harness scene/page with deterministic modes and targeted screenshots (references/phaser-canvas-testing.md)
## Bundled Resources
Read these when needed:
- references/playwright-mcp-cheatsheet.md: Detailed MCP tool patterns
- references/phaser-canvas-testing.md: Deterministic mode for Phaser games
- references/flake-reduction.md: Flake classification and fixes
## Remember
You can make almost any frontend (including canvas/WebGL games) testable by adding a tiny, stable seam for readiness + state. One reliable smoke test is the foundation. Aim for tests that are boring to maintain: deterministic, explicit about readiness, and rich in failure evidence. The goal is confidence, not coverage numbers.