Building Agentic End-to-End Testing with Playwright and Claude Hooks

End-to-end tests are useful on their own, but they become much more powerful when they are part of an agentic coding loop.

Recently I added Playwright to RetroGo, an online retrospective tool that needs no sign-up or login. You can create a room, add cards, merge similar cards, vote, discuss the results, and then end the room.

This flow is important to me. If I change something in the room logic, voting logic, or frontend, I want to know that the main flow still works.

The first goal was to add tests for this flow. The second goal was to make those tests useful for AI coding.

Starting with Playwright

The first step was simple: add Playwright to the project.

Playwright is a good fit here because it tests the app from the browser. It opens the page, clicks buttons, fills inputs, and checks what the user can see.

That is what I need for RetroGo. The important behavior is not only inside one function or one API endpoint. I want to test the full flow:

Create a room
Add cards
Merge cards
Move to the voting stage
Cast votes
Move to the discussion stage
End the room

This type of test gives me more confidence because it checks how the parts work together.

Running the API and Frontend Together

RetroGo has both an API server and a frontend, so the Playwright tests need both running before the browser starts.

Playwright supports this with webServer. I added two web servers to the config. One starts the Go API and the other starts the frontend.

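// playwright.config.ts (excerpt)
import path from 'node:path';
import { defineConfig } from '@playwright/test';

export default defineConfig({
  webServer: [
    {
      // Go API: considered ready once the health endpoint responds
      command: 'go run .',
      cwd: path.join(__dirname, 'api'),
      url: 'http://localhost:8080/health',
      name: 'API',
      timeout: 30 * 1000,
      reuseExistingServer: !process.env.CI,
      // Allow WebSocket connections from the local frontend
      env: { ALLOWED_ORIGIN: 'http://localhost:4200' },
      stdout: 'pipe',
      stderr: 'pipe',
    },
    {
      // Frontend dev server
      command: 'npm start',
      cwd: path.join(__dirname, 'ui'),
      url: 'http://localhost:4200',
      name: 'Frontend',
      timeout: 90 * 1000,
      reuseExistingServer: !process.env.CI,
      stdout: 'pipe',
      stderr: 'pipe',
    },
  ],
});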

Getting this right took a few tries.

The 404 Problem

My first attempt used a room endpoint as the health check URL:

url: 'http://localhost:8080/api/rooms/404NoRoom',

The idea was simple. If the server was running, it would handle the request and return 404 because that room does not exist.

But Playwright does not treat 404 as a successful health check. For webServer.url, Playwright waits for a 2xx, 3xx, 400, 401, 402, or 403 response. A 404 means the server handled the request, but Playwright still treats it as not ready.

So the server was running, but Playwright kept waiting until the timeout:

[API] 2026/06/19 13:13:30 retro api listening on :8080
Error: Timed out waiting 30000ms from config.webServer.

The fix was to use a dedicated health check URL that returns 200. After that, Playwright detected the API as soon as it was ready.
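To make the status rule concrete, here is a small sketch of a readiness probe that applies the rule documented for webServer.url. It is not Playwright's own code: isReadyStatus and waitUntilReady are hypothetical names, and the polling interval is an assumption.

```typescript
// Status rule documented for webServer.url: 2xx, 3xx, 400, 401, 402, and 403 count as ready.
function isReadyStatus(status: number): boolean {
  return status >= 200 && status < 404;
}

// Hypothetical helper: polls a URL until it answers with a "ready" status or time runs out.
async function waitUntilReady(
  url: string,
  timeoutMs: number,
  intervalMs = 250, // assumed polling interval
): Promise<boolean> {
  const deadline = Date.now() + timeoutMs;
  while (Date.now() < deadline) {
    try {
      const response = await fetch(url);
      if (isReadyStatus(response.status)) return true;
    } catch {
      // Connection refused: the server is not listening yet, so keep polling.
    }
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  return false;
}

console.log(isReadyStatus(200)); // true: what the /health endpoint returns
console.log(isReadyStatus(404)); // false: what the missing-room endpoint returns
```

Under this rule the room URL from my first attempt can never pass, no matter how long the timeout is, which matches the behavior I saw.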

The WebSocket Origin Problem

The second issue was more subtle. The API server started fine, the frontend started fine, the tests started running, but the app never connected. The WebSocket handshake was being rejected.

The Go server checks the Origin header on WebSocket connections. In production the allowed origin is https://retrogo.online. That is the default if the ALLOWED_ORIGIN environment variable is not set.

When Playwright runs the tests, the frontend is at http://localhost:4200. That origin does not match retrogo.online, so every WebSocket connection was refused.

The fix was to pass the environment variable to the API server through the Playwright config:

env: { ALLOWED_ORIGIN: 'http://localhost:4200' },

This is easy to miss because the server starts without errors. The failure only appears later when the app tries to connect.
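The logic behind the rejection can be sketched in a few lines. The real check lives in the Go server, so this TypeScript version is only an illustration: isOriginAllowed is a hypothetical name, and exact string matching against a single origin is an assumption about how the comparison works.

```typescript
// Default used when ALLOWED_ORIGIN is not set (the production origin).
const DEFAULT_ALLOWED_ORIGIN = 'https://retrogo.online';

// Hypothetical rendering of the server-side check; assumes an exact string match.
function isOriginAllowed(
  origin: string | undefined,
  env: Record<string, string | undefined>,
): boolean {
  const allowed = env['ALLOWED_ORIGIN'] ?? DEFAULT_ALLOWED_ORIGIN;
  return origin === allowed;
}

const testOrigin = 'http://localhost:4200';

// Without the variable, the local frontend is compared against the production origin.
console.log(isOriginAllowed(testOrigin, {})); // false

// With the variable passed through the Playwright config, the handshake is accepted.
console.log(isOriginAllowed(testOrigin, { ALLOWED_ORIGIN: testOrigin })); // true
```

Nothing in this check runs at startup, which is why the server looks healthy until the first WebSocket connection arrives.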

Adding Simple Test Commands

I added two scripts to package.json:

{
  "scripts": {
    "test": "npx playwright test",
    "test:ui": "npx playwright test --ui"
  }
}

The test command is for the terminal. The test:ui command opens Playwright's UI mode, which is useful when I want to debug a test or watch the flow step by step.

Testing the Real RetroGo Flow

Once the environment was ready, I wrote tests for the complete RetroGo flow.

The test creates a room, adds cards, merges related cards, moves the room to voting, casts votes, moves to discussion, and ends the room.

This is the happy path, but it is a very important happy path. If this flow is broken, the product is broken.

I wrote the selectors using getByRole and getByLabel instead of CSS classes. This is more resilient because it tests what the user actually sees, not implementation details of the HTML.

// Scope every locator to the "What went well?" column
const wellSection = page.locator('section').filter({
  has: page.getByRole('heading', { name: 'What went well?' }),
});

// Add a card through the labeled input, then check that it rendered
await wellSection.getByLabel('What went well?').fill('Deploys were smooth');
await wellSection.getByRole('button', { name: /Add card/ }).click();
await expect(wellSection.getByRole('article')).toHaveCount(1);

If I rename a CSS class, the test still passes. If I remove the button or change its label, the test fails. That is the right behavior.

Making the Loop Agentic

After the tests were ready, I wanted to make them part of the AI coding workflow.

The idea is simple:

Agent edits code
Tests run automatically
Agent reads the failure
Agent fixes the issue
Tests run again

The key word is reads. This only works if the test output is specific enough that the agent knows what to fix without guessing.

When a Playwright test fails, the list reporter gives output like this:

  ok 1 - Room creator flow - Step 1: Create a room (1.2s)
  ok 2 - Room creator flow - Step 2: Join as facilitator (0.8s)
  x  3 - Room creator flow - Step 3: Add cards in the collecting phase (5.0s)

    Error: expect(received).toHaveCount(expected)

    Expected: 1
    Received: 0

    Locator: locator('section').filter(...).getByRole('article')

That is actionable. The agent knows which step failed, what the assertion was, and what the locator is. It can look at the code that was just changed and understand why the count is 0.

Compare that to a generic timeout message with no context. The agent has nothing to go on, so it starts guessing, and the guesses are usually wrong.

The list reporter is configured intentionally for this reason:

// list gives readable terminal output
reporter: [['list'], ['html', { open: 'never' }]],

The html report is still generated for humans to browse, but the list reporter is what goes into the terminal and into the hook output that the agent reads.

Piping Server Output to the Agent

There is one more part that matters for agent visibility: stdout: 'pipe' and stderr: 'pipe' on both web servers. By default Playwright pipes only stderr and ignores server stdout, so I set both explicitly.

When the API or frontend fails to start, the error shows up in the test output prefixed with the server name:

[API] api/server.go:15:2: undefined: MissingType
Error: Process from config.webServer exited early.

Without piped server output, the agent sees only the Playwright error. With it, the agent sees the actual cause: a Go compile error, a missing environment variable, or a port conflict. All of that appears directly in the output the agent is reading.

This matters more than it sounds. If the agent cannot see why the server failed, it will start changing test code or app code to work around the problem. With the real output, it knows exactly what to fix.
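The effect of the name option together with piped output can be pictured with a tiny helper. This is an illustration of the resulting format only, not how Playwright implements it; prefixLines is a hypothetical name.

```typescript
// Hypothetical helper: prefixes every non-empty line of a chunk with the server name.
function prefixLines(serverName: string, chunk: string): string[] {
  return chunk
    .split(/\r?\n/)
    .filter((line) => line.trim().length > 0)
    .map((line) => `[${serverName}] ${line}`);
}

// A chunk as the Go compiler might write it to stderr.
const stderrChunk = 'api/server.go:15:2: undefined: MissingType\n';

for (const line of prefixLines('API', stderrChunk)) {
  console.log(line);
}
// [API] api/server.go:15:2: undefined: MissingType
```

Because each line carries the server name, the agent can tell an API failure from a frontend failure without any extra instructions.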

Adding a Claude Hook

To connect tests to the coding loop, I added a hook that runs after the agent edits or writes a file:

{
  "hooks": {
    "PostToolUse": [
      {
        "matcher": "Edit|Write",
        "hooks": [
          {
            "type": "command",
            "command": "npm run test"
          }
        ]
      }
    ]
  }
}

The hook runs npm run test after every edit. The test output feeds back to the agent as context for the next step. Now the agent does not need to be told to run tests. It gets the result automatically and can use it to decide what to do next.
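One caveat is worth checking against the hooks documentation for your Claude Code version. As far as I understand it, a PostToolUse command's stderr is passed to the agent when the command exits with code 2, while other non-zero exit codes are treated as non-blocking errors. A failing npm run test normally exits with 1, so a thin wrapper may be needed. The sketch below assumes that behavior; the function names and the --run flag are hypothetical.

```typescript
import { spawnSync } from 'node:child_process';

// Assumption: exit code 2 is the one whose stderr Claude Code hands to the agent.
const FEEDBACK_EXIT_CODE = 2;

interface HookResult {
  exitCode: number;
  stderr: string;
}

// Maps a finished test run to what the hook process should report.
function toHookResult(testExitCode: number, output: string): HookResult {
  if (testExitCode === 0) {
    return { exitCode: 0, stderr: '' };
  }
  return { exitCode: FEEDBACK_EXIT_CODE, stderr: output };
}

// Runs `npm run test` and exits the way the hook is assumed to expect.
function runTestsForHook(): void {
  const run = spawnSync('npm run test', { shell: true, encoding: 'utf8' });
  const output = `${run.stdout ?? ''}${run.stderr ?? ''}`;
  // A null status means the process was killed; treat that as a failure.
  const result = toHookResult(run.status ?? 1, output);
  process.stderr.write(result.stderr);
  process.exitCode = result.exitCode;
}

// Hypothetical `--run` flag, so importing this file has no side effects.
if (process.argv.includes('--run')) {
  runTestsForHook();
}
```

The hook command would then point at this script with the --run flag instead of calling npm run test directly. Keeping toHookResult separate from the process handling makes the exit code mapping easy to check on its own.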

Adding Test Instructions for the Agent

The hook gives the agent access to test results. The instructions tell the agent how to use them.

## Testing

Use Playwright tests as the main feedback loop for user-facing changes.

When you change the API, frontend, room flow, voting flow, or anything that can affect the RetroGo user journey, run:

```bash
npm run test
```

The Playwright config starts both the API and frontend before running the tests.

If tests fail:

- Read the terminal output carefully before editing more code.
- Use the Playwright failure, server stdout, and server stderr as feedback.
- Fix the real issue instead of changing or bypassing the test.
- Do not remove assertions only to make the test pass.
- Do not skip tests unless the user explicitly asks for it.
- After a fix, run the tests again.

If the failure is caused by server startup:

- Check the API and frontend output from stdout and stderr.
- Look for build errors, port conflicts, or missing environment variables.
- Fix the startup problem first, then run the tests again.

The most important rule is the last one in each section: after a fix, run again. The loop is not useful if it only runs once.

Final Thoughts

For me, agentic testing means giving the coding agent a feedback loop with enough signal to act on.

First I added Playwright. Then I spent time getting the environment right: the health check endpoint and the WebSocket origin configuration. Then I wrote tests using role-based selectors. Finally I connected everything to Claude through a hook.

The setup cost was real. Getting two servers to start reliably, in the right order, with the right environment, took longer than writing the tests themselves. But the payoff is that npm run test is now a single command that works predictably, and every code edit automatically tells the agent whether the main flow still works.

That is closer to how I want to use AI for development. Not only asking it to write code, but giving it a system that can tell it when something is broken.