Snapback

Snapback is an online card game that a friend and I adapted from a game we have played for years. We had always talked about making it playable online, partly so we could play each other more easily and partly to see whether other people would enjoy it too.

The product direction is heavily inspired by Chess.com: accounts, profiles, replayable games, and a rating system. I chose Elo early on, then added a readable notation format so a game can be replayed by pasting the notation back into the engine. That became useful quickly, especially when debugging exact sequences of moves.
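As a rough illustration of the rating side, here is a minimal sketch of the standard two-player Elo update. The K-factor of 32, the function names, and the rounding are assumptions for the example, not necessarily the values Snapback uses.

```typescript
// Standard Elo update for a two-player game.
// K_FACTOR = 32 is an assumed value; the real constant is not stated in this write-up.
const K_FACTOR = 32;

/** Expected score (between 0 and 1) for player A against player B. */
function expectedScore(ratingA: number, ratingB: number): number {
  return 1 / (1 + Math.pow(10, (ratingB - ratingA) / 400));
}

/**
 * New rating for a player after one game.
 * score is from that player's point of view: 1 = win, 0.5 = draw, 0 = loss.
 */
function updateRating(
  rating: number,
  opponentRating: number,
  score: 0 | 0.5 | 1,
  k: number = K_FACTOR,
): number {
  const expected = expectedScore(rating, opponentRating);
  return Math.round(rating + k * (score - expected));
}
```

With these assumptions, two players rated 1500 end up at 1516 and 1484 after a decisive game, and a win against a lower-rated opponent earns fewer points than a win against an equal one.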

This is also my main experiment in agentic development. The game logic is complex enough that it gives the agents real work to do, rather than becoming a simple CRUD app with a game skin.

Problem/Context

I had tried to build Snapback more than once before and always hit the same wall: the rules are detailed, game state changes quickly, and a small mistake can break the whole experience. This version started with the rules. I confirmed them with the co-creator, then used them as the basis for the product requirement documents.

The work is split into MVPs, epics, and tasks. MVPs mark visible milestones, epics group functional areas, and tasks keep the agent work small enough to review.

The MVP plan is:

MVP 1: Make the game playable against a bot that understands legal moves, with account management included
MVP 2: Add player-versus-player games with the Elo system integrated
MVP 3: Support games with up to four players, plus spectators
MVP 4: Polish the UI for launch, including player guides and tutorials

I also have separate epics that can happen out of order, depending on time:

Add audio to the game
Add translations for common languages
Improve observability for game engine issues
Add Storybook and UI testing, and work towards WCAG 2.2
Add game customisation, including card faces, colours, and preferences

Role/Contribution

I defined the project guidelines, set the success criteria for each MVP, and broke the work into tasks. I gave the agents room to make implementation decisions, but I still pushed for specific choices where I cared about the outcome. I asked for TypeScript, Next.js, and Resend, among others. The agent chose the shape of the game engine. I originally expected Phaser to be useful, but the agent argued it was unnecessary, and I accepted that as part of the experiment.

I also pushed the agents towards CLI-based tooling where possible. Vercel's CLI worked particularly well. Vercel Blob and Vercel Analytics still needed some manual setup through the UI, but most of the deployment workflow could be driven from the terminal.

Through Cursor Skills, I built a workflow where the agent picks up the first incomplete task, checks whether the task still matches the codebase, and reports only critical issues before starting. Earlier versions of the Skill encouraged too many suggestions, so I tightened the instructions. After implementation, the Skill asks for manual testing notes across happy paths and edge cases, and gives me a commit message to use after I verify the work.
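The "first incomplete task" step can be pictured with a small sketch. The Mvp, Epic, and Task shapes and the status values below are hypothetical; the real task files are not shown here and may be structured differently.

```typescript
// Hypothetical shapes for the MVP > epic > task hierarchy.
type TaskStatus = "todo" | "in-progress" | "done";

interface Task {
  id: string;
  title: string;
  status: TaskStatus;
}

interface Epic {
  name: string;
  tasks: Task[];
}

interface Mvp {
  name: string;
  epics: Epic[];
}

/** Walks MVPs and epics in order and returns the first task that is not done. */
function firstIncompleteTask(mvps: Mvp[]): Task | undefined {
  for (const mvp of mvps) {
    for (const epic of mvp.epics) {
      const task = epic.tasks.find((t) => t.status !== "done");
      if (task) return task;
    }
  }
  return undefined; // everything is complete
}
```

Keeping the selection rule this simple is what makes the queue predictable: the agent never chooses what to work on, only how.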

I commit changes manually. That gives me a natural review point and stops the codebase filling up with changes I have not checked. I also tried not to inspect or edit the code directly during the build, because the point was to learn how much of the work I could hand over.

Agentic Development

Test-driven development was the rule from the start. I wanted the tests to catch things I might miss after reading a lot of generated code. They also became a proof point: I could understand what had been built, see it pass, and use that as a review aid.

I used different agents for different kinds of work. Opus 4.6 handled UI and design tasks. GPT-5.5 handled planning, housekeeping, content, and deeper functionality. Codex 5.3 worked through plans created by GPT-5.5. Composer 2 was useful for subagents because it was quick and the delegated work was usually small.

Prettier and ESLint were non-negotiable. They caught small issues, including dependency array problems in useEffect. I used Fallow before releases to find dead code and bloat, and commitlint kept commit scopes consistent.

CLI

Over time, I built a list of commands the agents could run without asking me. By the end, only a few categories stayed in my "Ask Every Time" list:

Anything Python-related, because I wanted to understand any Python scripts before they ran
Git push commands, because I wanted to control when GitHub Actions started
pnpm add commands, because I wanted to research packages first
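The three categories above can be expressed as a simple check. This is only an illustration of the policy: the patterns and the requiresApproval name are assumptions, and it is not the format Cursor uses for its own allowlist settings.

```typescript
// Hypothetical patterns mirroring the three "Ask Every Time" categories.
const ASK_EVERY_TIME: RegExp[] = [
  /^(python3?|pip3?)\b/, // anything Python-related
  /^git\s+push\b/, // pushes start GitHub Actions
  /^pnpm\s+add\b/, // new packages need research first
];

/** Returns true when a command should pause for manual approval. */
function requiresApproval(command: string): boolean {
  const trimmed = command.trim();
  return ASK_EVERY_TIME.some((pattern) => pattern.test(trimmed));
}
```

Under this sketch, `pnpm test` and `git status` run freely, while `pnpm add zod` and `git push origin main` wait for a decision.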

UI And Animation

There was no design system at the start. I gave the agents broad control of the UI, with Chess.com as the quality reference and orange as the core colour. I asked for Tailwind, shadcn/ui, and Radix so the app started from accessible primitives and prebuilt components.

Animation needed the most feedback. Agents often missed z-index issues, chose easing values without asking, or placed cards almost correctly but not quite where they needed to land. Framer animation debugging was difficult because the agent could not see the animation properly. Browser MCP helped less than expected, but Chrome DevTools profiling was useful. I recorded the interaction, triggered the animation, and shared screenshots from the performance trace so the agent had concrete visual context.

For server-dependent animations, such as loading spinners, I throttled my own connection while profiling. I also liked how often the agents reached for modern CSS tools such as clamp, oklch, and svh when they made sense.

CI/CD

Agents created pull requests through the GitHub CLI and wrote their own PR summaries. Cursor Bugbot ran on pull requests into main. I often had to run Bugbot many times before it was satisfied, so I used screenshots of its feedback as input for another agent with a Skill for triaging Bugbot comments.

That workflow made each comment deliberate. The agent first judged whether the feedback was valid. If it was, it moved into planning, I checked the plan, and then the fix was queued. I wanted separate commits for each piece of feedback so the history stayed easy to follow.

Vercel preview deployments were created automatically for manual QA. CI covered TypeScript, linting, formatting, dead code checks through Fallow, unit tests, E2E tests, and smoke tests. Most of that existed to catch errors I might miss while reviewing agent-generated work.

Architecture

I pushed for Next.js because the marketing and account parts benefit from server-side rendering, while the game itself can hydrate on the client. I also insisted on TypeScript. With agents writing a lot of code, I wanted types to catch incorrect calls early.

GPT-5.5 suggested Neon, PostgreSQL, and Drizzle when I asked for a serverless backend outside AWS. Neon looked cost-effective for the traffic I expect, PostgreSQL gives the project room to grow, and Drizzle keeps the data layer typed while staying close to SQL.

Because the app is built on Next.js, it leans into Vercel services. Vercel Blob stores player avatars, Vercel Analytics handles usage data, and Turborepo caching keeps the build pipeline quicker.

Testing

The agents used Vitest for unit and component tests, Playwright for browser coverage, and fast-check for property testing. I pushed for a Page Object Model in Playwright so the suite can grow without becoming hard to maintain.
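To show what the Page Object Model buys, here is a minimal sketch of a page object for a sign-in screen. The SignInPage class, the /sign-in route, and the data-testid selectors are hypothetical. PageLike is a cut-down stand-in for the goto, fill, and click methods on Playwright's Page, so the example stays self-contained.

```typescript
// Minimal slice of Playwright's Page API used by this sketch.
// A real Playwright Page satisfies this interface structurally.
interface PageLike {
  goto(url: string): Promise<unknown>;
  fill(selector: string, value: string): Promise<unknown>;
  click(selector: string): Promise<unknown>;
}

class SignInPage {
  private readonly page: PageLike;

  // Hypothetical selectors, kept in one place so tests never repeat them.
  private readonly emailInput = "[data-testid='sign-in-email']";
  private readonly passwordInput = "[data-testid='sign-in-password']";
  private readonly submitButton = "[data-testid='sign-in-submit']";

  constructor(page: PageLike) {
    this.page = page;
  }

  /** Navigates to the (assumed) sign-in route. */
  async open(): Promise<void> {
    await this.page.goto("/sign-in");
  }

  /** Opens the page, fills both fields, and submits the form. */
  async signIn(email: string, password: string): Promise<void> {
    await this.open();
    await this.page.fill(this.emailInput, email);
    await this.page.fill(this.passwordInput, password);
    await this.page.click(this.submitButton);
  }
}
```

A test then reads as `await new SignInPage(page).signIn(email, password)`, and a markup change only needs fixing in the page object rather than in every spec.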

Playwright runs against a separate Neon database, keeping test data away from production while staying close to the real user flow. The suite includes smoke tests and E2E tests. Smoke tests run in CI for dev merges, the E2E suite runs nightly, and main merges require the broader checks.

Unit and component tests are split across the game engine, server code, application components, libraries, and scripts. They run in CI for every branch.

Observability

Vercel Analytics tracks general usage, and Sentry captures errors. MVP 1 also includes a bug reporting system in the app. Players can report a bug from Settings or from inside a game.

In-game reports include game notation and metadata such as move history and time played. That means I can recreate the scenario quickly. I later added a button that exports the bug details into an agent-friendly prompt, so research and fixes can start from the report itself.
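The export step is essentially string assembly. A minimal sketch, assuming a hypothetical BugReport shape and prompt wording (the real fields and template may differ):

```typescript
// Hypothetical shape of an in-game bug report.
interface BugReport {
  description: string;
  notation: string; // game notation, replayable in the engine
  moveHistory: string[];
  timePlayedSeconds: number;
}

/** Turns a report into a plain-text prompt an agent can start from. */
function toAgentPrompt(report: BugReport): string {
  return [
    "A player reported a bug in a Snapback game.",
    `Description: ${report.description}`,
    `Time played: ${report.timePlayedSeconds}s`,
    `Moves played: ${report.moveHistory.length}`,
    "Game notation (paste into the engine to replay):",
    report.notation,
    "Replay the game from the notation and identify the failing rule before proposing a fix.",
  ].join("\n");
}
```

Because the notation travels with the report, the agent can reproduce the exact sequence of moves instead of guessing at the state that caused the problem.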

There is also an internal admin area for issues. It shows reported bugs, tracks progress, supports assignment, and can link records to GitHub Issues and Sentry issue IDs.

Security

Because Snapback includes accounts, I wanted to understand the security decisions clearly. Authentication uses NextAuth with JWT sessions. Passwords are stored as hashes rather than plain text, and Zod validates sign-in input on the server.

The app supports user roles so I can enrol people into testing programmes and control which UI certain users see. API endpoints are rate limited in memory, with extra limits configured in Vercel.
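For context, an in-memory limiter can be as small as a map of counters. This is a fixed-window sketch under assumed names and limits, not the app's actual implementation, which may use a different algorithm.

```typescript
interface RateWindow {
  count: number; // requests seen in the current window
  resetAt: number; // timestamp (ms) when the window expires
}

class InMemoryRateLimiter {
  private readonly windows = new Map<string, RateWindow>();
  private readonly limit: number;
  private readonly windowMs: number;
  private readonly now: () => number;

  // The clock is injectable so the limiter can be tested without waiting.
  constructor(limit: number, windowMs: number, now: () => number = Date.now) {
    this.limit = limit;
    this.windowMs = windowMs;
    this.now = now;
  }

  /** Returns true if the request identified by key (for example an IP) may proceed. */
  allow(key: string): boolean {
    const time = this.now();
    const current = this.windows.get(key);

    // No window yet, or the old one expired: start a fresh window.
    if (!current || time >= current.resetAt) {
      this.windows.set(key, { count: 1, resetAt: time + this.windowMs });
      return true;
    }

    if (current.count >= this.limit) return false;
    current.count += 1;
    return true;
  }
}
```

One caveat with any in-memory approach on serverless hosting is that each function instance keeps its own counters and loses them when it is recycled, so platform-level limits are a sensible second layer.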

Accessibility

Accessibility still needs more work, especially before the mobile and assistive-device push. Even so, the game is largely keyboard-controllable, and sr-only announcements provide context without cluttering the interface.

Performance

The site loads well and the game does not feel sluggish on mobile. React Compiler is enabled in the Next.js build, so a lot of memoisation is handled automatically.

I have started recording metrics to benchmark future improvements. Performance was not the main target for MVP 1, but it becomes more important as PvP work begins.

Outcome

The first MVP cost around £200 in tokens and took about two weeks. I deliberately used higher-end models because the point was to test where agentic development currently stands. Building the same application by hand would have taken months, so the time saved has been significant.

The game is currently in open beta. After enough game-engine testing, the next major step is moving from the bot experience to player-versus-player games. I also want to explore moving task management into a tool like Linear or Dex, so agents can interact with the work queue through an MCP server instead of relying on internal task files.
