A Backend Testing Framework: What to Test, How, and Where

Tags: testing, vitest, backend, engineering

Testing framework

What to test, how to test it, how to structure tests. For both the human developer and the AI agent writing tests.


0. Core principle: test behavior, not implementation

Every test answers one question: "Does this unit of behavior produce the correct output for a given input?"

A "unit of behavior" is not a function. It is a contract. The contract might be:

  • "Given a task in PENDING status, it transitions to IN_PROGRESS and the database reflects the new state."
  • "Given an unauthenticated request to a protected route, the response is 401 with the correct error shape."
  • "Given a Focus Session with three pause/resume cycles, the computed duration equals the sum of active intervals."

If a test breaks when you refactor internals but the behavior stays the same, the test is testing implementation. Delete it and rewrite it against the contract.
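
The third contract above can be written down without looking at any internals. A minimal sketch, assuming a hypothetical event shape (`type`, `at`) and event names (`STARTED`, `PAUSED`, `RESUMED`, `COMPLETED`) that are illustrative and not taken from the real codebase:

```typescript
// Hypothetical event shape; the real session event model may differ.
type SessionEventType = "STARTED" | "PAUSED" | "RESUMED" | "COMPLETED";

interface SessionEvent {
  type: SessionEventType;
  at: Date;
}

// Sums the intervals during which the session was active.
// Events are assumed to be sorted by time.
function computeActiveSeconds(events: SessionEvent[]): number {
  let activeSince: Date | null = null;
  let totalMs = 0;

  for (const event of events) {
    const opensInterval = event.type === "STARTED" || event.type === "RESUMED";
    if (opensInterval) {
      // Ignore a duplicate start/resume while already active.
      if (activeSince === null) activeSince = event.at;
    } else if (activeSince !== null) {
      // PAUSED or COMPLETED closes the current active interval.
      totalMs += event.at.getTime() - activeSince.getTime();
      activeSince = null;
    }
  }
  return totalMs / 1000;
}

const at = (time: string): Date => new Date(`2026-03-15T${time}Z`);

// Three pause/resume cycles: active for 10 + 5 + 5 + 10 minutes.
const events: SessionEvent[] = [
  { type: "STARTED", at: at("10:00:00") },
  { type: "PAUSED", at: at("10:10:00") },
  { type: "RESUMED", at: at("10:15:00") },
  { type: "PAUSED", at: at("10:20:00") },
  { type: "RESUMED", at: at("10:30:00") },
  { type: "PAUSED", at: at("10:35:00") },
  { type: "RESUMED", at: at("10:40:00") },
  { type: "COMPLETED", at: at("10:50:00") },
];

const activeSeconds = computeActiveSeconds(events); // 1800
```

A test written against this contract asserts only on the returned number for a given event log, so it keeps passing if the loop is later replaced by a reducer or a SQL aggregate.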


1. The risk matrix: what to test first

Before writing any test, classify the code by two axes: damage if it breaks silently and likelihood of breakage, which the matrix approximates by how often the code changes.

                        High Damage
                            │
         ┌──────────────────┼──────────────────┐
         │                  │                  │
         │   TIER 1         │   TIER 2         │
         │   Test first     │   Test second    │
         │                  │                  │
         │  One-way-door    │   Validation &   │
         │   transitions    │    guard logic   │
         │  Source-of-truth │   Data shaping   │
         │   writes         │    at boundaries │
         │  External system │   Conditional    │
         │   boundaries     │    access logic  │
         │  Money or access │                  │
         │   paths          │                  │
         │                  │                  │
  Low ───┼──────────────────┼──────────────────┤─── High
  Change │                  │                  │    Change
  Rate   │   TIER 4         │   TIER 3         │    Rate
         │   Skip or defer  │   Test when      │
         │                  │    it breaks     │
         │   Config files   │                  │
         │   Static helpers │   UI formatting  │
         │   Type defs      │   Log messages   │
         │   Constants      │   Copy/templates │
         │                  │                  │
         └──────────────────┼──────────────────┘
                            │
                        Low Damage

Tier 1: non-negotiable. Silent failure here corrupts data, loses money, or locks users out. Write these tests before shipping. Recognize Tier 1 by these characteristics:

  • One-way-door transitions: any status change that cannot be undone or that gates subsequent behavior. Examples: task status moving to COMPLETED (irreversible), a Focus Session receiving a terminal event (SESSION_CANCELLED), subscription state machine transitions.
  • Source-of-truth writes: the single place a value is computed or stored that downstream logic depends on. If this value is wrong, everything built on top is wrong. Examples: Focus Session duration computed from the event log, the daily rollover gate that determines which tasks carry forward.
  • External system boundaries: where your code meets a system whose data shape you do not control. The provider can change without warning; your code must still handle the old and new shape. Examples: Polar webhook payloads, Google OAuth token responses, email provider API contracts.
  • Money or access paths: any logic in the chain between a user action and gaining or losing access to a paid feature, an account, or data. Examples: checkout flow, subscription activation, password reset token validation, refresh token rotation.

Tier 2: write before real users. Failure here is visible (the user gets a 500, a wrong error, or is incorrectly blocked) but does not silently corrupt state. Write these before the first paying customer.

  • Validation and guard logic: input parsing, auth checks, ownership verification, request schema enforcement. Examples: Zod schema validation at route boundaries, userId ownership scoping on task queries.
  • Data shaping at boundaries: mappers, DTOs, and response formatting where a wrong or missing field is discoverable through normal usage. Examples: mapping Prisma rows to response DTOs, formatting Compass Entry responses.
  • Conditional access logic: feature gates, plan-based restrictions, role checks. Examples: plan-gated feature checks, auth middleware that distinguishes free vs. paid users.

Tier 3: write on demand. Code that changes often but where breakage is annoying, not catastrophic. Write a test after the first bug, not before.

Tier 4: skip. Code that almost never changes and has no meaningful failure mode. A test here costs more in maintenance than it will ever catch.


2. Test categories

2.1 Unit tests

A single function, class, or module tested in isolation. No database, no network, no filesystem.

Use for pure logic: calculations, mappings, validations, state machines, parsers, formatters, domain rules.

A good unit test:

  • Runs in under 50ms.
  • No setup beyond constructing the input.
  • Deterministic. Same input, same output, every time. No Date.now(), no Math.random() unless injected.
  • Tests one behavior per test case. If a test has two assertions testing two different behaviors, split it.

File naming:

{source-filename}.test.ts    # colocated in the feature's tests/ directory

Structure:

describe("{ModuleName}", () => {
  describe("{methodOrBehavior}", () => {
    it("{does specific thing when specific condition}", () => {
      // Arrange: build the input
      // Act: call the function
      // Assert: check the output
    });
  });
});

Good test names read like documentation:

✓ allows PENDING task to transition to IN_PROGRESS
✓ blocks COMPLETED task from reverting to PENDING
✓ computes session duration from three pause/resume intervals
✓ returns active Compass Entry for the requested scope
✓ rejects Focus Switch when another session is already RUNNING

Bad test names are vague or repeat the function name:

✗ should work
✗ test transition
✗ handles input correctly
✗ mapStatus works

2.2 Integration tests

Multiple modules working together through a shared resource, typically a database. The "unit" here is a use case, not a function.

Use for any flow where the interesting bugs live in the wiring between modules, not inside any single one: a service that reads, transforms, and persists across multiple tables; a multi-step operation that must leave the database in a consistent state.

A good integration test:

  • Uses a real test database (not mocks).
  • Seeds its own data, cleans up after itself.
  • Tests the public entry point of the flow (service method), not internal methods.
  • Does not depend on test execution order.

File naming:

{source-filename}.integration.test.ts    # colocated in the feature's tests/ directory

Database management between tests:

// Truncate all tables between tests, do NOT drop/recreate
// Truncation is fast. Schema migrations are slow.
beforeEach(async () => {
  await db.$executeRawUnsafe(`
    TRUNCATE TABLE tasks, focus_sessions, session_events, users CASCADE
  `);
});

// Re-seed reference data that every test needs
beforeEach(async () => {
  await seedReferenceData();
});

2.3 Contract tests (external provider boundaries)

Verify your code correctly handles the shape of data from an external provider. They never call the provider. They use captured real payloads as fixtures.

Use for any external system where the data shape is not under your control: payment provider webhooks, OAuth token responses, email API payloads.

How they work:

  1. Capture a real payload from the provider (sandbox or production logs).
  2. Save it as a JSON fixture in a shared fixtures directory.
  3. Write a contract test that passes the fixture through your mapper/parser.
  4. When the provider changes their payload shape, re-capture the payload and replace the fixture. The contract test then fails and tells you exactly which field broke.

Contract tests are technically unit tests (no database, no network). The distinction is conceptual: they test an external boundary using real captured payloads, not constructed inputs.

Example:

import tokenResponse from "@test-support/fixtures/google/oauth-token-response.json";

describe("GoogleOAuthMapper contract", () => {
  it("parses a real Google OAuth token response", () => {
    const result = mapper.toUserProfile(tokenResponse);

    expect(result.email).toBe(tokenResponse.email);
    expect(result.providerUserId).toBe(tokenResponse.sub);
    expect(result.name).toBeTruthy();
  });
});

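This test has zero mocking. It uses the exact payload the provider sent. It only knows about the payload you captured, so refresh the fixture periodically and whenever the provider announces a change. Once the fixture reflects a renamed field or a new token format, this test fails immediately and points at the field.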
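
For the failure to name the broken field, the mapper has to validate the shape instead of silently passing `undefined` through. A minimal sketch of such a mapper, assuming the `sub`, `email`, and `name` fields used in the example above; the function names and the inline payload are hypothetical stand-ins for the real mapper and fixture file:

```typescript
interface UserProfile {
  providerUserId: string;
  email: string;
  name: string;
}

// Reads a required string field or fails with the field name in the message,
// so a changed payload shape points directly at the broken field.
function requireString(payload: Record<string, unknown>, field: string): string {
  const value = payload[field];
  if (typeof value !== "string" || value.length === 0) {
    throw new Error(`Provider payload is missing string field "${field}"`);
  }
  return value;
}

// Hypothetical mapper; the real GoogleOAuthMapper may be shaped differently.
function toUserProfile(payload: unknown): UserProfile {
  if (typeof payload !== "object" || payload === null) {
    throw new Error("Provider payload is not an object");
  }
  const record = payload as Record<string, unknown>;
  return {
    providerUserId: requireString(record, "sub"),
    email: requireString(record, "email"),
    name: requireString(record, "name"),
  };
}

// Stand-in for a captured fixture file (values invented for illustration).
const capturedPayload: unknown = {
  sub: "1234567890",
  email: "ada@example.com",
  name: "Ada Example",
  picture: "https://example.com/ada.png", // extra fields are ignored
};

const profile = toUserProfile(capturedPayload);
```

Accepting `unknown` at the boundary forces every field to be checked before it reaches typed code, which is what turns a provider-side rename into a readable test failure.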

2.4 Route tests (HTTP level)

Send HTTP requests to your app and assert on the response status, body, and headers. They test the full request lifecycle: parsing, auth, validation, business logic, serialization.

Use for every API route that is publicly documented, has auth requirements, or has error cases that matter to the frontend.

A good route test:

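  • Uses the framework's in-process injection method, such as Fastify's inject(), so no real HTTP server or network socket is involved and the test stays fast. With Express, Supertest fills the same role, although it binds the app to an ephemeral local port behind the scenes.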
  • Tests the happy path and every documented error case.
  • Verifies response shape, not just status code.

Route tests without a database are unit-speed and run with unit tests. Route tests that hit a real database are integration tests and should use the .integration.test.ts suffix.

File naming:

{route-filename}.test.ts                  # no database needed
{route-filename}.integration.test.ts      # hits real database

Example:

describe("POST /api/tasks", () => {
  it("returns 201 with created task for valid request", async () => {
    const user = await seedUser();

    const res = await app.inject({
      method: "POST",
      url: "/api/tasks",
      headers: authHeaders(user),
      payload: { title: "Review pull request", plannedDurationSeconds: 1800 },
    });

    expect(res.statusCode).toBe(201);
    const body = JSON.parse(res.body);
    expect(body.data.id).toBeTruthy();
    expect(body.data.status).toBe("PENDING");
  });

  it("returns 400 for missing title", async () => {
    const user = await seedUser();

    const res = await app.inject({
      method: "POST",
      url: "/api/tasks",
      headers: authHeaders(user),
      payload: { plannedDurationSeconds: 1800 },
    });

    expect(res.statusCode).toBe(400);
  });

  it("returns 401 without auth header", async () => {
    const res = await app.inject({
      method: "POST",
      url: "/api/tasks",
      payload: { title: "Review pull request" },
    });

    expect(res.statusCode).toBe(401);
  });
});

2.5 Security boundary tests

These are a subset of route tests, called out separately because they protect against the failures that actually cause data breaches: insecure direct object references (IDOR), privilege escalation, and missing auth checks. These checks are easy to miss in normal feature development.

Write these for every endpoint that serves user-specific data or mutates user-owned resources.

Authentication: unauthenticated requests are rejected.

it("returns 401 without auth token", async () => {
  const res = await app.inject({
    method: "GET",
    url: "/api/tasks",
  });
  expect(res.statusCode).toBe(401);
});

it("returns 401 with expired token", async () => {
  const res = await app.inject({
    method: "GET",
    url: "/api/tasks",
    headers: authHeaders({ expired: true }),
  });
  expect(res.statusCode).toBe(401);
});

Authorization: users cannot access resources they do not own.

it("returns 404 when accessing another user's task", async () => {
  const owner = await seedUser();
  const attacker = await seedUser();
  const task = await seedTask({ userId: owner.id });

  const res = await app.inject({
    method: "GET",
    url: `/api/tasks/${task.uid}`,
    headers: authHeaders(attacker),
  });

  // 404, not 403. Do not reveal that the resource exists
  expect(res.statusCode).toBe(404);
});

Ownership chains: for nested resources, verify the full chain.

it("rejects session access when task belongs to another user", async () => {
  const owner = await seedUser();
  const attacker = await seedUser();
  const task = await seedTask({ userId: owner.id });
  const session = await seedFocusSession({ taskId: task.id });

  const res = await app.inject({
    method: "GET",
    url: `/api/tasks/${task.uid}/sessions/${session.uid}`,
    headers: authHeaders(attacker),
  });

  expect(res.statusCode).toBe(404);
});

Why 404 over 403: Returning 403 confirms the resource exists. An attacker who can enumerate valid UIDs and distinguish 403 from 404 can map which resources exist even if they cannot access them. Return 404 for resources the user does not own.
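
One way to get this behavior by construction is to scope the lookup by owner, so the handler never learns whether a foreign resource exists. A minimal in-memory sketch; the `Task` shape, the `TASK_NOT_FOUND` code, and `getTaskForUser` are hypothetical, and a real repository would put both conditions in the WHERE clause:

```typescript
interface Task {
  uid: string;
  userId: number;
  title: string;
}

type LookupResult =
  | { statusCode: 200; body: Task }
  | { statusCode: 404; body: { code: "TASK_NOT_FOUND" } };

// Scopes the lookup by owner, so "does not exist" and "exists but belongs to
// someone else" are indistinguishable to the caller.
function getTaskForUser(tasks: Task[], requesterId: number, taskUid: string): LookupResult {
  const task = tasks.find((t) => t.uid === taskUid && t.userId === requesterId);
  if (task === undefined) {
    return { statusCode: 404, body: { code: "TASK_NOT_FOUND" } };
  }
  return { statusCode: 200, body: task };
}

const tasks: Task[] = [{ uid: "task_a", userId: 1, title: "Review pull request" }];

const ownerResponse = getTaskForUser(tasks, 1, "task_a"); // 200
const attackerResponse = getTaskForUser(tasks, 2, "task_a"); // 404
const missingResponse = getTaskForUser(tasks, 2, "task_missing"); // 404, same body
```

Because the two 404 responses are byte-for-byte identical, a security test can assert on that equality directly instead of only checking the status code.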

Security tests are Tier 1. IDOR and privilege escalation bugs are how data breaches happen. Write these for every endpoint that serves user-scoped data, before shipping.

2.6 End-to-end tests

At small to medium scale, almost never automate these. E2E tests are slow, flaky, and expensive to maintain. Manual QA checklists work better until the team is large enough to justify the maintenance cost.

The one exception: If you have a critical path that breaks repeatedly and is hard to catch any other way, write a single E2E test for that path and run it in CI. Not more than 3–5 total for the whole project.


3. What to mock, what to keep real

Mock at the boundary of your system. Keep everything inside real.

Your system boundary is where your code talks to something you don't control: an external API, a third-party SDK, a transactional email service. Mock those.

Your database is NOT an external system for integration tests. It is part of your system. Use a real test database.

Boundary map

YOUR CODE (keep real in tests)          EXTERNAL (mock these)
──────────────────────────────          ─────────────────────
State machine logic                     Third-party provider SDKs
Payload mappers                         Email service API
Repository (SQL queries)                External HTTP calls
Route handlers                          OAuth provider endpoints
Service orchestration                   Clock (inject time, don't use Date.now())
Middleware                              Random generators (inject seed)

How to mock external services

Use dependency injection. Your service constructor takes the external client as a parameter. In tests, pass a mock.

// Production
const auth = new AuthService({
  oauthProvider: new GoogleOAuthClient({ clientId: process.env.GOOGLE_CLIENT_ID }),
  mailer: new EmailClient(process.env.EMAIL_API_KEY),
  repo: new AuthRepository(db),
});

// Test
const mockOAuthProvider = {
  verifyToken: vi.fn(),
  getUserProfile: vi.fn(),
};
const mockMailer = {
  emails: { send: vi.fn() },
};

const auth = new AuthService({
  oauthProvider: mockOAuthProvider as any,
  mailer: mockMailer as any,
  repo: new AuthRepository(testDb), // real DB, real queries
});

Never mock the repository. The most common bugs in backend code are wrong SQL queries, missing WHERE clauses, incorrect JOIN conditions, and missing indexes. If you mock the repository, you are testing that your code calls .findById() with the right arguments. You are NOT testing that .findById() returns the right data. Use a real database.

Testing external service failure modes

Contract tests verify that your code handles the shape of external data. You also need to verify that your code handles external failures: timeouts, 5xx responses, malformed bodies, and rate limiting.

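Mock the adapter and force the failure. Tests that only assert on the thrown error run at unit speed; a test that also checks database state, like the password reset example below, needs the real test database and belongs in an .integration.test.ts file: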

describe("when OAuth provider times out", () => {
  it("throws a retriable error", async () => {
    mockOAuthProvider.verifyToken.mockRejectedValue(new Error("Request timed out"));

    await expect(service.authenticateWithGoogle({ token })).rejects.toThrow(
      "OAUTH_PROVIDER_UNAVAILABLE"
    );
  });
});

describe("when email service returns 500", () => {
  it("does not create the password reset record", async () => {
    mockMailer.emails.send.mockRejectedValue(new ProviderError(500, "Internal Server Error"));

    await expect(service.requestPasswordReset({ email })).rejects.toThrow();

    const reset = await repo.findPendingReset({ email });
    expect(reset).toBeNull();
  });
});

describe("when OAuth provider returns unexpected body", () => {
  it("throws a descriptive parse error", async () => {
    mockOAuthProvider.getUserProfile.mockResolvedValue({
      // missing expected fields
      sub: null,
      email: undefined,
    });

    await expect(service.authenticateWithGoogle({ token })).rejects.toThrow(/email/);
  });
});

For every external service your code depends on, test at least: timeout, server error (5xx), and malformed/unexpected response body. These are Tier 1 when the service is in a money or access path.
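
Those failure tests are easier to write when the translation from adapter failure to domain error lives in one pure function. A minimal sketch, assuming a hypothetical `AdapterFailure` union reported by the adapter; only `OAUTH_PROVIDER_UNAVAILABLE` appears in the examples above, the other two codes are invented for illustration:

```typescript
// Hypothetical failure shapes an adapter might report; real SDKs differ.
type AdapterFailure =
  | { kind: "timeout" }
  | { kind: "http"; status: number }
  | { kind: "malformed"; detail: string };

interface DomainFailure {
  code:
    | "OAUTH_PROVIDER_UNAVAILABLE"
    | "OAUTH_PROVIDER_RATE_LIMITED"
    | "OAUTH_PROVIDER_BAD_RESPONSE";
  retriable: boolean;
}

// Translates low-level adapter failures into the errors the service exposes.
function classifyProviderFailure(failure: AdapterFailure): DomainFailure {
  switch (failure.kind) {
    case "timeout":
      return { code: "OAUTH_PROVIDER_UNAVAILABLE", retriable: true };
    case "http":
      if (failure.status === 429) {
        return { code: "OAUTH_PROVIDER_RATE_LIMITED", retriable: true };
      }
      if (failure.status >= 500) {
        return { code: "OAUTH_PROVIDER_UNAVAILABLE", retriable: true };
      }
      // Other statuses will not fix themselves on retry.
      return { code: "OAUTH_PROVIDER_BAD_RESPONSE", retriable: false };
    case "malformed":
      return { code: "OAUTH_PROVIDER_BAD_RESPONSE", retriable: false };
  }
}

const onTimeout = classifyProviderFailure({ kind: "timeout" });
const onRateLimit = classifyProviderFailure({ kind: "http", status: 429 });
const onServerError = classifyProviderFailure({ kind: "http", status: 503 });
const onMalformed = classifyProviderFailure({ kind: "malformed", detail: "email missing" });
```

With the mapping isolated like this, the service-level tests only need to confirm that each mocked failure surfaces the matching code, and the full matrix of statuses can be covered with a parameterized unit test.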


4. Test data management

4.1 Builder functions (for unit tests)

Builder functions create valid default objects that you override per test. This eliminates boilerplate and makes each test show only what is relevant.

// test-support/helpers/builders.ts

export function buildTaskInput(overrides: Partial<CreateTaskInput> = {}): CreateTaskInput {
  return {
    title: "Review pull request",
    description: null,
    plannedDurationSeconds: 1800,
    dueDateIso: null,
    ...overrides,
  };
}

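// Deterministic defaults: a counter and a fixed timestamp instead of
// Math.random() and new Date(), so unit tests stay reproducible.
let nextEventId = 1;
const DEFAULT_EVENT_TIME_ISO = "2026-03-15T10:00:00Z";

export function buildSessionEvent(
  type: SessionEventType,
  overrides: Partial<SessionEvent> = {}
): SessionEvent {
  return {
    id: BigInt(nextEventId++),
    sessionId: BigInt(1),
    eventType: type,
    clientTimestamp: new Date(DEFAULT_EVENT_TIME_ISO),
    serverTimestamp: new Date(DEFAULT_EVENT_TIME_ISO),
    pauseReason: null,
    cancelReason: null,
    ...overrides,
  };
}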

Key principle: The builder returns a valid, complete object. Each test overrides only the field that makes it special. A reader should be able to look at a test and immediately see what condition is being tested without scrolling to a shared setup block.

// Good: the override IS the test
it("rejects task with empty title", () => {
  const input = buildTaskInput({ title: "" });
  expect(() => validateTaskInput(input)).toThrow();
});

// Bad: the important detail is hidden in setup
const input = { ...baseInput }; // what makes this test different?
delete input.title;

4.2 Seed functions (for integration tests)

Seed functions insert rows into the test database and return the created entities. They handle foreign key dependencies automatically.

// test-support/helpers/seeds.ts

export async function seedUser(overrides: Partial<User> = {}): Promise<User> {
  return db.user.create({
    data: {
      uid: `test_uid_${randomId()}`,
      email: `test_${randomId()}@example.com`,
      name: "Test User",
      ...overrides,
    },
  });
}

export async function seedTask(overrides: Partial<Task> & { userId?: bigint } = {}): Promise<Task> {
  // Create an owner on demand so callers never have to think about the foreign key.
  const userId = overrides.userId ?? (await seedUser()).id;

  return db.task.create({
    data: {
      uid: `test_task_${randomId()}`,
      title: "Review pull request",
      status: "PENDING",
      plannedDurationSeconds: 1800,
      ...overrides,
      userId, // after the spread, so an explicit `userId: undefined` cannot clobber it
    },
  });
}

export async function seedUserWithTask(
  taskOverrides: Partial<Task> = {}
): Promise<{ user: User; task: Task }> {
  const user = await seedUser();
  const task = await seedTask({
    ...taskOverrides,
    userId: user.id, // after the spread, so the task always belongs to the returned user
  });
  return { user, task };
}

4.3 Reference data seeding

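Plans, roles, permission sets, and other reference data that every integration test needs should come from one shared seed function, not be rebuilt ad hoc in each test. If the reference tables survive truncation, call it once in global setup; if they are truncated, call it from beforeEach as in section 2.2. skipDuplicates keeps repeated calls harmless.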

// test-support/helpers/reference-data.ts

export async function seedReferenceData(): Promise<void> {
  // Seed all lookup/reference tables that integration tests assume exist.
  // Add new reference entities here as the schema grows.
  await db.plan.createMany({
    data: [
      { id: "free", name: "Free", priceCents: 0, isActive: true },
      { id: "plus", name: "Plus", priceCents: 600, isActive: true },
    ],
    skipDuplicates: true,
  });
}

5. Test organization

5.1 Placement: colocated by feature

Test files live inside a tests/ directory within each feature. This is the only rule for test placement. There are no centralized tests/unit/, tests/integration/, or tests/routes/ directories.

Tests live next to the code they cover. When you open a feature directory, the tests/ folder is right there, no parallel directory tree to keep in sync. When you delete a feature, the tests go with it.

A subfolder keeps things clean. A feature with 8 source files and 8 test files flat is 16 entries in one listing. The tests/ subfolder avoids that without moving tests out of the feature boundary.

The suffix is what matters, not the folder. Unit tests end in .test.ts, integration tests in .integration.test.ts. The runner uses the suffix to decide how to discover and run them.

5.2 File structure example

src/
  features/
    task/
      task.domain.ts
      task.mapper.ts
      task.service.ts
      task.repository.ts
      routes/
        create-task.route.ts
        complete-task.route.ts
      tests/
        task.domain.test.ts                          # unit (state machine, pure logic)
        task.mapper.test.ts                          # unit
        task.service.integration.test.ts             # integration (needs DB)
        create-task.route.test.ts                    # route (inject, no DB)
        create-task.route.integration.test.ts        # route (inject + real DB)
        complete-task.route.test.ts

test-support/                                        # shared test infrastructure
  helpers/
    builders.ts                                      # builder functions for unit tests
    seeds.ts                                         # seed functions for integration tests
    reference-data.ts                                # shared reference data seeding
    db.ts                                            # test DB client + query helpers
    app.ts                                           # build app instance for route tests
    auth.ts                                          # authHeaders helper
  fixtures/
    {provider}/                                      # real captured payloads from sandbox
      oauth-token-response.json
      webhook-subscription-created.json
      README.md                                      # when each fixture was captured

5.3 Placement rules

No ambiguity. Follow mechanically.

  • Test files go in src/features/{feature}/tests/. The filename matches the source file: state-machine.ts is tested by tests/state-machine.test.ts. If it hits a real database, use .integration.test.ts.
  • Shared test infrastructure goes in test-support/: builders, seeds, database helpers, app factory, auth helpers, fixtures. If two features need the same helper, it lives here.
  • Feature-scoped helpers can live in that feature's tests/ directory. Move to test-support/ when a second feature needs it.
  • Fixtures always go in test-support/fixtures/{provider}/. They are shared by nature since multiple features might parse the same provider's data.
  • Never put .test.ts files in test-support/. That directory is infrastructure, not tests.

6. Writing patterns

6.1 The AAA pattern (Arrange, Act, Assert)

Every test follows this structure. No exceptions.

it("transitions PENDING to IN_PROGRESS", () => {
  // Arrange: set up the preconditions
  const from: TaskStatus = "PENDING";
  const to: TaskStatus = "IN_PROGRESS";

  // Act: wrap the call so the assertion can observe whether it throws
  const act = () => assertValidTransition(from, to);

  // Assert: verify the outcome
  expect(act).not.toThrow();
});

If Arrange is more than 5 lines, extract it to a builder or seed function. If Act is more than 1–2 lines, your function might have too many responsibilities. If Assert is more than 3 lines, you might be testing multiple behaviors.

6.2 One behavior per test

// Bad: two behaviors in one test
it("handles task completion", async () => {
  const task = await seedTask({ status: "IN_PROGRESS" });
  await service.completeTask(task.uid);

  const updated = await getTask(task.uid);
  expect(updated.status).toBe("COMPLETED");

  const session = await getActiveSession(task.id);
  expect(session).toBeNull();
});

// Good: split into focused tests
it("transitions task status to COMPLETED", async () => {
  const task = await seedTask({ status: "IN_PROGRESS" });
  await service.completeTask(task.uid);

  const updated = await getTask(task.uid);
  expect(updated.status).toBe("COMPLETED");
});

it("cancels active Focus Session when task is completed", async () => {
  const task = await seedTask({ status: "IN_PROGRESS" });
  await seedFocusSession({ taskId: task.id, status: "RUNNING" });

  await service.completeTask(task.uid);

  const session = await getActiveSession(task.id);
  expect(session).toBeNull();
});

When a focused test fails, the name tells you exactly what broke. When a kitchen-sink test fails, you have to read the whole test to find the failure.

6.3 Test the edges, not just the happy path

For every behavior, think about these categories:

HAPPY PATH:    Valid input, expected output.
EDGE CASES:    Boundary values, empty inputs, null fields, maximum lengths.
ERROR CASES:   Invalid input, missing required fields, unauthorized access.
STATE CASES:   Different starting states that change the behavior.
CONCURRENCY:   Two requests at the same time (integration tests only).
IDEMPOTENCY:   Same request sent twice. Does it fail or no-op?

Example for the task routes (creation, plus status updates for the state and idempotency rows):

Happy:       Valid title, user authenticated              -> 201 + task
Edge:        Title at maximum length (255 chars)          -> 201 (accepted)
Edge:        User already has maximum daily tasks         -> 409 (daily limit)
Error:       Missing title                                -> 400
Error:       Empty title                                  -> 400
Error:       No auth header                               -> 401
State:       Task with IN_PROGRESS status                 -> allows COMPLETED
State:       Task with COMPLETED status                   -> blocks revert to PENDING
Idempotency: Completing an already-COMPLETED task         -> No-op, same response
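
The edge and error rows for the title translate directly into boundary checks on a validator. A minimal sketch, assuming the 255-character limit from the table; `checkTitle`, its result shape, and the choice to trim whitespace are hypothetical and not the project's actual validation:

```typescript
const MAX_TITLE_LENGTH = 255; // assumed limit, taken from the table above

type TitleCheck =
  | { ok: true; title: string }
  | { ok: false; reason: "MISSING" | "EMPTY" | "TOO_LONG" };

function checkTitle(input: unknown): TitleCheck {
  if (typeof input !== "string") return { ok: false, reason: "MISSING" };
  const title = input.trim(); // assumption: whitespace-only titles count as empty
  if (title.length === 0) return { ok: false, reason: "EMPTY" };
  if (title.length > MAX_TITLE_LENGTH) return { ok: false, reason: "TOO_LONG" };
  return { ok: true, title };
}

// Probe both sides of the boundary, not just a "long" value.
const atLimit = checkTitle("a".repeat(MAX_TITLE_LENGTH));
const overLimit = checkTitle("a".repeat(MAX_TITLE_LENGTH + 1));
const blank = checkTitle("   ");
const missing = checkTitle(undefined);
```

Testing exactly at the limit and one character past it is what catches an off-by-one between `>` and `>=`; a single "very long title" case would pass with either operator.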

6.4 Parameterized tests for exhaustive cases

When testing the same behavior across many inputs, use it.each:

describe("valid transitions", () => {
  it.each([
    ["PENDING", "IN_PROGRESS"],
    ["PENDING", "COMPLETED"],
    ["PENDING", "DELETED"],
    ["IN_PROGRESS", "PENDING"],
    ["IN_PROGRESS", "COMPLETED"],
    ["IN_PROGRESS", "DELETED"],
    ["COMPLETED", "DELETED"],
  ])("allows %s -> %s", (from, to) => {
    expect(() => assertValidTransition(from as TaskStatus, to as TaskStatus)).not.toThrow();
  });
});

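This covers every valid transition, stays readable, and names each case individually in the test output. Give the blocked transitions a matching table so the state machine is covered in both directions.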
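
The blocked transitions can be derived as the complement of the allowed table instead of being listed by hand. A minimal sketch, assuming the four statuses that appear in the table above and assuming that DELETED is terminal and self-transitions are invalid (the source does not say either way); `ALLOWED` and `isValidTransition` are hypothetical stand-ins for the real state machine:

```typescript
type TaskStatus = "PENDING" | "IN_PROGRESS" | "COMPLETED" | "DELETED";

const ALL_STATUSES: TaskStatus[] = ["PENDING", "IN_PROGRESS", "COMPLETED", "DELETED"];

// Allowed transitions, copied from the table above.
const ALLOWED: Record<TaskStatus, TaskStatus[]> = {
  PENDING: ["IN_PROGRESS", "COMPLETED", "DELETED"],
  IN_PROGRESS: ["PENDING", "COMPLETED", "DELETED"],
  COMPLETED: ["DELETED"],
  DELETED: [], // assumption: terminal state
};

function isValidTransition(from: TaskStatus, to: TaskStatus): boolean {
  return ALLOWED[from].includes(to);
}

// Every (from, to) pair that is not explicitly allowed, including self-transitions.
const invalidPairs: Array<[TaskStatus, TaskStatus]> = [];
for (const from of ALL_STATUSES) {
  for (const to of ALL_STATUSES) {
    if (!isValidTransition(from, to)) invalidPairs.push([from, to]);
  }
}
```

In a Vitest suite, `invalidPairs` could feed `it.each(invalidPairs)("blocks %s -> %s", ...)` with a `toThrow()` assertion, so adding a status to `ALL_STATUSES` extends the coverage without anyone remembering to add rows.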

6.5 Testing error shapes, not just status codes

The frontend depends on specific error response shapes. Test them.

it("returns structured error when task is already completed", async () => {
  const { user, task } = await seedUserWithTask({ status: "COMPLETED" });

  const res = await app.inject({
    method: "PATCH",
    url: `/api/tasks/${task.uid}`,
    headers: authHeaders(user),
    payload: { status: "PENDING" },
  });

  expect(res.statusCode).toBe(409);

  const body = JSON.parse(res.body);
  expect(body).toEqual({
    code: "INVALID_STATUS_TRANSITION",
    message: expect.any(String),
  });
});

6.6 Testing time-dependent behavior

Code that depends on "now" (expiration checks, grace periods, trial-end calculations, staleness detection) breaks in subtle ways when tested against the real clock. The fix: pass "now" as a parameter.

Pattern: inject now as a parameter.

// Production code
export function isSessionStale(
  session: FocusSession,
  gracePeriodSeconds: number,
  now: Date = new Date()
): boolean {
  const deadlineMs =
    session.lastEventAt.getTime() + (session.plannedDurationSeconds + gracePeriodSeconds) * 1000;
  return session.status === "RUNNING" && now.getTime() > deadlineMs;
}

// Test: full control over time
it("returns true when session exceeds planned duration plus grace period", () => {
  const session = buildFocusSession({
    status: "RUNNING",
    plannedDurationSeconds: 1800,
    lastEventAt: new Date("2026-03-15T10:00:00Z"),
  });

  const now = new Date("2026-03-15T10:40:00Z"); // 40 min > 30 min + 5 min grace
  expect(isSessionStale(session, 300, now)).toBe(true);
});

it("returns false when session is within planned duration", () => {
  const session = buildFocusSession({
    status: "RUNNING",
    plannedDurationSeconds: 1800,
    lastEventAt: new Date("2026-03-15T10:00:00Z"),
  });

  const now = new Date("2026-03-15T10:20:00Z"); // 20 min < 30 min
  expect(isSessionStale(session, 300, now)).toBe(false);
});

For services that need "now" in multiple places, inject a clock function via dependency injection:

type Clock = () => Date;

class FocusSessionService {
  constructor(private deps: { clock: Clock; repo: FocusSessionRepository }) {}

  async detectStale(sessionId: string): Promise<boolean> {
    const session = await this.deps.repo.findById(sessionId);
    if (!session) return false; // unknown session: nothing to flag
    // Reuse the pure function above; the injected clock supplies "now".
    return isSessionStale(session, GRACE_PERIOD_SECONDS, this.deps.clock());
  }
}

// Test
const fixedClock = () => new Date("2026-03-15T10:30:00Z");
const service = new FocusSessionService({ clock: fixedClock, repo });

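Never rely on "this test runs fast enough that Date.now() won't change." That assumption fails on slow CI runners and under load. Tests that build dates in local time also break across daylight saving transitions, so write fixed instants in UTC as in the examples above.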
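
When a test needs time to pass between two calls, a fixed clock is not enough. A minimal sketch of a controllable clock that only moves when the test advances it; `createTestClock` and `isStale` are hypothetical helpers written for this illustration, not part of any library:

```typescript
type Clock = () => Date;

// A controllable clock for tests: starts at a fixed instant and only moves
// when the test advances it.
function createTestClock(startIso: string): {
  now: Clock;
  advanceSeconds: (seconds: number) => void;
} {
  let currentMs = new Date(startIso).getTime();
  return {
    now: () => new Date(currentMs),
    advanceSeconds: (seconds: number) => {
      currentMs += seconds * 1000;
    },
  };
}

// Hypothetical staleness check that reads time only through the injected clock.
function isStale(lastEventAt: Date, limitSeconds: number, clock: Clock): boolean {
  return clock().getTime() - lastEventAt.getTime() > limitSeconds * 1000;
}

const clock = createTestClock("2026-03-15T10:00:00Z");
const lastEventAt = clock.now();
const limitSeconds = 1800 + 300; // planned duration plus grace period

const staleAtStart = isStale(lastEventAt, limitSeconds, clock.now); // false
clock.advanceSeconds(limitSeconds); // exactly at the limit: not yet stale
const staleAtLimit = isStale(lastEventAt, limitSeconds, clock.now); // false
clock.advanceSeconds(1); // one second past the limit
const staleAfterLimit = isStale(lastEventAt, limitSeconds, clock.now); // true
```

Stepping the clock to the exact boundary and then one second past it pins down the comparison operator, which a test against the real clock could never do reliably.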

6.7 Testing async and concurrent behavior

You cannot unit-test a race condition. You need two operations competing for the same row, with the database as the arbiter.

Pattern: Promise.allSettled with competing operations.

it("prevents duplicate active Focus Sessions via unique constraint", async () => {
  const { user, task } = await seedUserWithTask();

  const [result1, result2] = await Promise.allSettled([
    service.focusSwitch({ userId: user.id, taskId: task.id }),
    service.focusSwitch({ userId: user.id, taskId: task.id }),
  ]);

  const successes = [result1, result2].filter((r) => r.status === "fulfilled");
  const failures = [result1, result2].filter((r) => r.status === "rejected");

  expect(successes).toHaveLength(1);
  expect(failures).toHaveLength(1);
});

Pattern: idempotent state transitions.

it("produces the same end state when completing a task twice", async () => {
  const { user, task } = await seedUserWithTask({ status: "IN_PROGRESS" });

  await service.completeTask({ userId: user.id, taskUid: task.uid });
  await service.completeTask({ userId: user.id, taskUid: task.uid }); // replay

  const tasks = await repo.findByUserId({ userId: user.id });
  const completed = tasks.filter((t) => t.status === "COMPLETED");
  expect(completed).toHaveLength(1); // not duplicated
});

When to write concurrency tests: Any time your code relies on a unique constraint, a state machine guard, or an idempotency key to prevent duplicates. These mechanisms are your concurrency contract. Test that they hold.

When NOT to write concurrency tests: For operations that are naturally serialized (single-user workflows, sequential queue consumers). Testing concurrency where it cannot occur is noise.

6.8 Testing external event processing

These patterns apply whenever your code processes events from outside your system: webhooks, queue messages, polling results. Four behaviors consistently break in production and are cheap to test:

1. Happy path: Valid event → correct state change.

it("applies the correct state transition on a valid event", async () => {
  const user = await seedUser();
  await seedEntityInState({ userId: user.id, status: "INITIAL_STATE" });

  const event = buildExternalEvent("entity.updated", {
    status: "new_state",
    metadata: { user_id: user.uid },
  });

  await service.processAndFinalize(event);

  const entity = await repo.findByUserId({ userId: user.id });
  expect(entity!.status).toBe("NEW_STATE");
});

2. Idempotency: Same event processed twice → same end state, no duplicates.
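
The replay case can be sketched without a database. This is a minimal illustration, not the real service: `InMemoryEventProcessor`, its `appliedCount` counter, and the choice of `externalEventId` as the idempotency key are all assumptions made for the example.

```typescript
// Hypothetical in-memory stand-in for the event-processing service.
type ExternalEvent = { externalEventId: string; type: string; status: string };

class InMemoryEventProcessor {
  private readonly seen = new Set<string>();
  entityStatus = "INITIAL_STATE";
  appliedCount = 0;

  processAndFinalize(event: ExternalEvent): void {
    // Assumption: the external event ID acts as the idempotency key.
    if (this.seen.has(event.externalEventId)) return; // replay is a no-op
    this.seen.add(event.externalEventId);
    this.entityStatus = event.status.toUpperCase();
    this.appliedCount += 1;
  }

  storedEventCount(): number {
    return this.seen.size;
  }
}

// Delivers the same event twice and returns the processor for inspection.
function replayTwice(): InMemoryEventProcessor {
  const processor = new InMemoryEventProcessor();
  const delivered: ExternalEvent = {
    externalEventId: "evt_1",
    type: "entity.updated",
    status: "new_state",
  };
  processor.processAndFinalize(delivered);
  processor.processAndFinalize(delivered); // replay of the same delivery
  return processor;
}
```

In an integration test the same three assertions apply against the real service and database: the state change was applied once, one event row is stored, and the end state matches a single delivery.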

3. Stale event rejection: An event older than what the DB already knows is recorded but does not overwrite.

it("does not overwrite newer state with an older event", async () => {
  const user = await seedUser();
  await seedEntityInState({
    userId: user.id,
    status: "CURRENT_STATE",
    updatedAt: new Date("2026-03-15T12:00:00Z"),
  });

  const staleEvent = buildExternalEvent("entity.updated", {
    status: "outdated_state",
    created_at: "2026-03-10T12:00:00Z", // older than current state
    metadata: { user_id: user.uid }, // target the seeded entity so the test cannot pass by missing it
  });

  await service.processAndFinalize(staleEvent);

  const entity = await repo.findByUserId({ userId: user.id });
  expect(entity!.status).toBe("CURRENT_STATE"); // unchanged
});

4. Unknown/unhandled event types: The handler records the event but does not crash.

it("skips unknown event types without error", async () => {
  const event = buildExternalEvent("some.unknown.event");

  await expect(service.processAndFinalize(event)).resolves.not.toThrow();

  const stored = await eventRepo.findByExternalEventId(event.externalEventId);
  expect(stored!.isSkipped).toBe(true);
});

7. Anti-patterns

7.1 Testing framework code

// Bad: you are testing that your ORM works, not that your code works
it("creates a user in the database", async () => {
  await db.user.create({ data: { email: "test@example.com" } });
  const user = await db.user.findFirst({ where: { email: "test@example.com" } });
  expect(user).not.toBeNull();
});

Your ORM already has tests. Your test should verify YOUR logic using the ORM, not the ORM itself.

7.2 Snapshot tests for dynamic data

// Bad: breaks every time a date, ID, or order changes
it("returns correct response", async () => {
  const res = await getTasksByUser(userId);
  expect(res).toMatchSnapshot();
});

Snapshot tests are useful for static HTML templates. They are harmful for API responses with timestamps, IDs, and computed fields.
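
One alternative is to strip the volatile fields and assert on what remains. The sketch below assumes the volatile fields are named `id`, `createdAt`, and `updatedAt`; the helper name `stripVolatile` is hypothetical.

```typescript
// Hypothetical helper: removes fields that change on every run so the
// remaining data can be compared with a plain equality assertion.
const VOLATILE_KEYS: readonly string[] = ["id", "createdAt", "updatedAt"];

function stripVolatile(value: unknown): unknown {
  if (Array.isArray(value)) return value.map(stripVolatile);
  if (value !== null && typeof value === "object" && !(value instanceof Date)) {
    const out: Record<string, unknown> = {};
    for (const [key, inner] of Object.entries(value as Record<string, unknown>)) {
      if (VOLATILE_KEYS.includes(key)) continue; // drop run-specific fields
      out[key] = stripVolatile(inner); // recurse into nested objects
    }
    return out;
  }
  return value; // primitives and Date values pass through unchanged
}

// Example payload, made up for illustration.
const apiResponse = [
  { id: "t_93f1", title: "Write report", status: "PENDING", createdAt: new Date() },
];

const stableView = stripVolatile(apiResponse);
// stableView now contains only title and status for each task
```

A test can then compare `stableView` against a literal with `toEqual`, and assert separately that `id` and `createdAt` are present and well-formed.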

7.3 Over-mocking

// Bad: mocking everything means you're testing that your code calls functions,
// not that those functions do the right thing
const mockRepo = {
  findById: vi.fn().mockResolvedValue(fakeEntity),
  update: vi.fn().mockResolvedValue(undefined),
};

// This test passes even if your SQL query has a bug

If you find yourself mocking more than 2 dependencies in a single test, that test wants to be an integration test with a real database.

7.4 Test interdependence

// Bad: test 2 depends on state created by test 1
it("creates a task", async () => {
  /* creates task */
});
it("completes the task", async () => {
  /* assumes task exists from above */
});

Every test must set up its own preconditions. Tests must pass when run individually and in any order.

7.5 Testing private methods

If you feel the need to test a private method, it usually means one of two things:

  1. The method contains important logic that should be extracted into its own module with a public API. Extract and test the module.
  2. The behavior is already testable through the public API. Test it there.

8. Test isolation

Each test runs in a clean world. Truncating tables and seeding your own data is the obvious part. The less obvious leaks:

  • Module-level singletons: a cache or connection pool that accumulates state across tests. Reset or recreate it in beforeEach.
  • Mocked clocks: if you mock Date.now() or use fake timers, restore the real clock in afterEach. A leaked fake clock corrupts every subsequent test.
  • Environment variables: if a test modifies process.env, restore the original in afterEach. Better: inject config through the DI container.
  • Global event listeners: if a test registers a listener on a shared emitter, remove it in afterEach. Leaked listeners cause phantom failures that only show up in the full suite.
  • File system artifacts: if a test writes temp files, delete them in afterEach.
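
The environment-variable case is a good candidate for a small helper. The following is a sketch under assumptions: `snapshotEnv` is a hypothetical name, and it works on any string-keyed record so that it can be tried without a test runner.

```typescript
// Hypothetical helper: captures a config-like object and returns a function
// that puts it back exactly as it was, including removing keys a test added.
type Env = Record<string, string | undefined>;

function snapshotEnv(env: Env): () => void {
  const saved: Env = { ...env };
  return function restore(): void {
    for (const key of Object.keys(env)) {
      if (!(key in saved)) delete env[key]; // remove keys the test added
    }
    for (const key of Object.keys(saved)) {
      env[key] = saved[key]; // put back original values
    }
  };
}
```

In a suite, one way to use it is to call `snapshotEnv(process.env)` in beforeEach, keep the returned function, and call it in afterEach.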

The principle

If a test passes when run alone but fails (or causes failures) when run with the full suite, the cause is almost always leaked state. To diagnose it: (1) bisect to find which preceding test introduces the failure, (2) identify the shared resource, (3) add the cleanup.
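
The bisect step can be automated. This sketch assumes a single polluting test and a deterministic failure; `findPolluter` and the `victimFailsAfter` callback are hypothetical names. The callback stands for "run these tests in order, then the failing test, and report whether it failed".

```typescript
// Binary search over the tests that run before the failing ("victim") test.
// Assumption: exactly one preceding test leaks state, and the failure is
// deterministic.
function findPolluter(
  preceding: string[],
  victimFailsAfter: (tests: string[]) => boolean,
): string | null {
  if (!victimFailsAfter(preceding)) return null; // nothing to find

  let candidates = preceding;
  while (candidates.length > 1) {
    const mid = Math.floor(candidates.length / 2);
    const firstHalf = candidates.slice(0, mid);
    // If the first half alone reproduces the failure, the polluter is in it;
    // otherwise it must be in the second half.
    candidates = victimFailsAfter(firstHalf) ? firstHalf : candidates.slice(mid);
  }
  return candidates[0] ?? null;
}
```

With 64 preceding tests this takes about seven runs instead of 64. If two tests pollute only in combination, the single-polluter assumption breaks and the search result is not reliable.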


9. Flaky test policy

A flaky test (one that passes and fails without code changes) is worse than no test. Once the team learns to ignore a red CI badge, real bugs slip through.

Diagnosis

Most flaky tests fall into one of four categories:

  1. Timing-dependent. The test assumes an operation completes within a fixed time. Fails on slow CI runners or under load.
  2. Order-dependent. The test depends on state created by a previous test. Passes in full-suite order, fails when run alone or in a different order.
  3. Resource leak. A database connection, file handle, or mock is not cleaned up. The failure appears in a later, unrelated test.
  4. Non-deterministic input. The test uses Math.random(), Date.now(), or an auto-generated ID in a way that occasionally produces a collision or boundary value.

Policy

When a flaky test is detected:

  1. Quarantine immediately. Move the test to a skip or todo state with a comment: // FLAKY: [category] - [date quarantined]. A quarantined test does not block CI.
  2. Fix within 48 hours for Tier 1 tests (one-way-door transitions, source-of-truth writes, external boundaries). These are the tests that protect against the most expensive bugs. Their flakiness is itself a high-severity issue.
  3. Fix within one week for Tier 2–3 tests. If the fix is not obvious within that window, delete the test and open a ticket to rewrite it with better isolation.
  4. Never retry-to-green. Re-running CI until a flaky test passes trains the team to distrust the suite. If a test needs retries to pass, it is broken.
  5. Track flaky tests visibly. Keep a running count. If more than 5% of the suite is quarantined, stop feature work and fix the suite. The investment is paying negative returns.

Prevention

  • Inject time, randomness, and external dependencies. Never read them from the environment.
  • Use beforeEach cleanup that is aggressive enough to handle partial failures (a test that throws mid-execution still needs cleanup to run).
  • Run the suite in random order periodically to surface order-dependent flakes before they reach CI.
  • Set timeouts tight enough to catch hangs but loose enough for the slowest CI runner (typically 2–3x local execution time).
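
The first rule above, injecting time and randomness, might look like the following. All names here (`Clock`, `RandomSource`, `createSessionToken`) are hypothetical and exist only to show the shape of the injection.

```typescript
// Hypothetical interfaces: the code under test receives time and randomness
// as arguments instead of calling Date.now() or Math.random() itself.
interface Clock {
  now(): Date;
}
interface RandomSource {
  next(): number; // value in [0, 1)
}

function createSessionToken(
  clock: Clock,
  random: RandomSource,
): { code: string; expiresAt: Date } {
  const suffix = Math.floor(random.next() * 1000000).toString().padStart(6, "0");
  const expiresAt = new Date(clock.now().getTime() + 24 * 60 * 60 * 1000);
  return { code: `SES-${suffix}`, expiresAt };
}

// Production wiring reads the real clock and real randomness.
const systemClock: Clock = { now: () => new Date() };
const systemRandom: RandomSource = { next: () => Math.random() };

// Test wiring uses fixed values, so the result is the same on every run.
const fixedClock: Clock = { now: () => new Date("2026-03-15T12:00:00Z") };
const fixedRandom: RandomSource = { next: () => 0.5 };

const token = createSessionToken(fixedClock, fixedRandom);
// token.code is "SES-500000"; token.expiresAt is 24 hours after the fixed time
```

Because the test owns both inputs, there is no fake timer to restore and no chance of an occasional collision or boundary value.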

10. Test performance budget

When the suite takes too long, developers stop running it locally. Bugs get caught in CI instead of at the keyboard, and the suite quietly stops earning its keep.

Targets

| Suite | Target | Hard ceiling |
| --- | --- | --- |
| Unit tests | Under 15 seconds | 30 seconds |
| Integration tests | Under 1 minute | 3 minutes |
| Full suite (unit + integration) | Under 2 minutes | 5 minutes |

These targets are for the entire test suite, not individual tests. Individual test budgets: 50ms for unit tests, 500ms for integration tests.

When the suite gets slow

When a suite takes twice its target, that is the signal to investigate, not to bump the timeout. Common causes:

  1. Integration tests masquerading as unit tests. A test file uses .test.ts but imports a database client. Move it to .integration.test.ts so it runs with the integration suite, not the unit suite.
  2. Too many integration tests for pure logic. A state machine with 30 transitions does not need 30 database round-trips. Test the transition table as a unit test; write 2–3 integration tests for the full flow.
  3. Slow seeds. A seed function that creates 10 related entities for a test that only needs 1. Builder functions exist to prevent this.
  4. No parallelism. If your test runner supports parallel execution for tests that do not share a database, use it for unit tests.
  5. Schema migrations running per-file instead of per-suite. Migrations should run once in global setup, not before each test file.
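
Cause 2 is the easiest to show in code. The sketch below assumes a three-state task lifecycle and a hypothetical `allowedTransitions` table; the point is that every (from, to) pair can be checked in memory, with no database round-trip.

```typescript
type TaskStatus = "PENDING" | "IN_PROGRESS" | "COMPLETED";

// Hypothetical transition table; the real rules live in the domain module.
const allowedTransitions: Record<TaskStatus, readonly TaskStatus[]> = {
  PENDING: ["IN_PROGRESS"],
  IN_PROGRESS: ["COMPLETED"],
  COMPLETED: [],
};

function canTransition(from: TaskStatus, to: TaskStatus): boolean {
  return allowedTransitions[from].includes(to);
}

// Walks every (from, to) pair and returns the ones the table allows.
function listAllowedPairs(): string[] {
  const statuses = Object.keys(allowedTransitions) as TaskStatus[];
  const pairs: string[] = [];
  for (const from of statuses) {
    for (const to of statuses) {
      if (canTransition(from, to)) pairs.push(`${from}->${to}`);
    }
  }
  return pairs;
}
```

A unit test can compare `listAllowedPairs()` against the expected list, or feed all nine pairs into it.each. The handful of integration tests then only need to prove that one allowed and one rejected transition behave correctly end to end.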

The ratchet

Once the suite meets a target, do not let it regress. Add a CI check that fails if the suite exceeds the hard ceiling. Without this, you get the slow accumulation of 50ms per test that turns a 30-second suite into a 5-minute suite over six months, and nobody notices until it is too late.
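
A CI check of this kind can be very small. The sketch below hard-codes the ceilings from the targets table; the function name `findBudgetViolations` and the way durations are collected are assumptions.

```typescript
type SuiteName = "unit" | "integration" | "full";

// Hard ceilings from the targets table, in seconds.
const hardCeilingSeconds: Record<SuiteName, number> = {
  unit: 30,
  integration: 180,
  full: 300,
};

// Returns one message per suite that exceeded its hard ceiling.
function findBudgetViolations(
  measuredSeconds: Partial<Record<SuiteName, number>>,
): string[] {
  const violations: string[] = [];
  for (const suite of Object.keys(hardCeilingSeconds) as SuiteName[]) {
    const measured = measuredSeconds[suite];
    if (measured !== undefined && measured > hardCeilingSeconds[suite]) {
      violations.push(
        `${suite} suite took ${measured}s (ceiling ${hardCeilingSeconds[suite]}s)`,
      );
    }
  }
  return violations;
}
```

A CI step would pass in the measured durations, print any violations, and exit with a non-zero status if the list is not empty.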


11. Coverage strategy

What to measure

Line coverage tells you which lines were executed. It does not tell you if they were tested correctly. A line can be "covered" by a test that never asserts anything about it.

Branch coverage is more useful. It tells you which conditional paths were taken.

Targets

Do not set a global coverage target like "80% coverage." It incentivizes writing bad tests to hit a number.

Instead, set per-module targets based on the risk matrix:

Tier 1 (one-way-door transitions, source-of-truth writes, external boundaries):  > 90% branch coverage
Tier 2 (validation, guard logic, data shaping):                                   > 70% branch coverage
Tier 3 (UI helpers, formatting):                                                   No target
Tier 4 (config, constants):                                                        No target

Coverage as a ratchet

Once you reach a coverage level for a module, never let it decrease. Configure your test runner to enforce per-file thresholds:

src/task/domain/task.domain.ts:              branches >= 95%
src/focus-session/duration.ts:               branches >= 90%

These thresholds only go up, never down. When you add new branches to a Tier 1 file, you must also add the tests.
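
The ratchet rule itself is simple enough to sketch. Coverage tools such as Vitest can enforce thresholds through configuration; the code below only illustrates the "never decrease" logic, with a hypothetical `ratchet` function and coverage numbers supplied as plain objects.

```typescript
// File path -> branch coverage percentage.
type Thresholds = Record<string, number>;

// Compares measured coverage with the stored thresholds. Files below their
// threshold are reported; files above it get their threshold raised.
function ratchet(
  thresholds: Thresholds,
  measured: Thresholds,
): { failures: string[]; next: Thresholds } {
  const failures: string[] = [];
  const next: Thresholds = { ...thresholds };
  for (const [file, floor] of Object.entries(thresholds)) {
    const actual = measured[file] ?? 0; // a missing report counts as 0%
    if (actual < floor) {
      failures.push(`${file}: branches ${actual}% < ${floor}%`);
    } else {
      next[file] = Math.max(floor, Math.floor(actual)); // only moves up
    }
  }
  return { failures, next };
}
```

Committing the `next` thresholds after each green run is what makes the ratchet hold. A failing file keeps its old threshold until the tests catch up.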


12. Property-based testing

Example-based tests verify cases you thought of. Property-based tests generate random inputs and verify that invariants hold. They catch the edge cases you missed.

When to use

State machines. Generate random sequences of transitions and verify: (1) every valid transition succeeds, (2) every invalid transition is rejected, (3) terminal states have no outbound transitions, (4) the state after any valid sequence is a reachable state.

Parsers and mappers. Generate random valid payloads and verify: (1) parse(serialize(value)) === value (round-trip), (2) no valid input throws, (3) every output satisfies the schema.

Numeric calculations. Generate random inputs and verify: (1) totals are non-negative, (2) percentages are between 0 and 100, (3) arithmetic identities hold (e.g., subtotal + tax === total).
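
The round-trip property can be illustrated without a library. This is a dependency-free sketch: a small seeded generator stands in for what fast-check provides, and `serializeDuration` and `parseDuration` are hypothetical functions. A real suite would use fast-check, which also shrinks failing inputs.

```typescript
// Hypothetical pair of functions under test: seconds <-> "HH:MM:SS".
function serializeDuration(totalSeconds: number): string {
  const h = Math.floor(totalSeconds / 3600);
  const m = Math.floor((totalSeconds % 3600) / 60);
  const s = totalSeconds % 60;
  return [h, m, s].map((n) => String(n).padStart(2, "0")).join(":");
}

function parseDuration(text: string): number {
  const [h, m, s] = text.split(":").map(Number);
  return h * 3600 + m * 60 + s;
}

// Small seeded linear congruential generator so failures are reproducible.
function seededRandom(seed: number): () => number {
  let state = seed >>> 0;
  return () => {
    state = (state * 1664525 + 1013904223) >>> 0;
    return state / 4294967296; // value in [0, 1)
  };
}

// Returns the first input that breaks the round trip, or null if none found.
function findRoundTripCounterexample(runs: number, seed: number): number | null {
  const rand = seededRandom(seed);
  for (let i = 0; i < runs; i++) {
    const seconds = Math.floor(rand() * 360000); // 0 up to 99:59:59
    if (parseDuration(serializeDuration(seconds)) !== seconds) return seconds;
  }
  return null;
}
```

The property holds when `findRoundTripCounterexample` returns null. Logging the seed on failure lets the same inputs be replayed.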

When NOT to use

  • CRUD operations. There is no interesting invariant to verify beyond "it saves and retrieves."
  • Route tests. The input space is already constrained by Zod schemas. Parameterized tests with it.each are more readable.
  • Anything where reading the test is harder than reading the code. Property-based tests must justify their cognitive overhead.

Example

import fc from "fast-check";

describe("FocusSessionStateMachine properties", () => {
  const allStates: SessionState[] = ["IDLE", "RUNNING", "PAUSED", "COMPLETED", "CANCELLED"];
  const terminalStates: SessionState[] = ["COMPLETED", "CANCELLED"];
  const allEventTypes: SessionEventType[] = [
    "SESSION_STARTED",
    "SESSION_PAUSED",
    "SESSION_RESUMED",
    "SESSION_EXTENDED",
    "SESSION_COMPLETED",
    "SESSION_CANCELLED",
    "SESSION_STALE",
  ];

  it("terminal states have no valid outbound transitions", () => {
    fc.assert(
      fc.property(
        fc.constantFrom(...terminalStates),
        fc.constantFrom(...allEventTypes),
        (state, event) => {
          expect(() => transition(state, event)).toThrow();
        }
      )
    );
  });

  it("every valid transition produces a reachable state", () => {
    fc.assert(
      fc.property(
        fc.constantFrom(...allStates),
        fc.constantFrom(...allEventTypes),
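        (state, event) => {
          let next: SessionState;
          try {
            next = transition(state, event);
          } catch {
            return; // invalid transition, property does not apply
          }
          // Assert outside the try block so a failed expectation is not swallowed.
          expect(allStates).toContain(next);
        }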
      )
    );
  });
});

13. Testing database migrations

Migrations are one-way doors. A broken migration can corrupt production data, and rolling back a bad ALTER TABLE on a large table is not something you want to do at 2 AM. Migration correctness is Tier 1.

Schema migrations

The integration test global setup runs migrate deploy against an empty test database. This is your minimum migration test: it verifies that every migration file applies cleanly in sequence. If a migration has a syntax error, a missing dependency, or a wrong column type, it fails here before reaching production.

Data migrations (backfills and transforms)

Data migrations (scripts that modify existing rows) are harder to test because they depend on the shape of real data. Test them by:

  1. Seeding the "before" state. Create rows that represent the data as it exists before the migration.
  2. Running the migration logic (not the migration file itself; extract the transform into a testable function).
  3. Asserting the "after" state. Verify that the transformed data matches expectations.

describe("backfill: add default timezone to users", () => {
  it("sets UTC for users with no timezone", async () => {
    await db.user.create({ data: { email: "a@test.com", timezone: null } });

    await backfillUserTimezones(db);

    const user = await db.user.findFirst({ where: { email: "a@test.com" } });
    expect(user!.timezone).toBe("UTC");
  });

  it("preserves existing timezone", async () => {
    await db.user.create({ data: { email: "b@test.com", timezone: "Asia/Tokyo" } });

    await backfillUserTimezones(db);

    const user = await db.user.findFirst({ where: { email: "b@test.com" } });
    expect(user!.timezone).toBe("Asia/Tokyo");
  });
});

Rollback testing

If your project maintains rollback scripts (down.sql), test them the same way: apply the up migration, then the down migration, then verify the schema is back to its previous state. This is especially important for migrations that drop columns or change types.
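
The up-then-down check reduces to comparing two schema snapshots. The sketch below models a schema as a plain object so it can run anywhere; the `Schema` and `Migration` types and the `addTimezoneColumn` migration are hypothetical. A real test would apply the SQL files and compare introspected schemas from the test database.

```typescript
// Table name -> column name -> column type.
type Schema = Record<string, Record<string, string>>;

interface Migration {
  up(schema: Schema): Schema;
  down(schema: Schema): Schema;
}

// Hypothetical migration: adds and removes users.timezone.
const addTimezoneColumn: Migration = {
  up: (s) => ({ ...s, users: { ...s.users, timezone: "text" } }),
  down: (s) => {
    const users = { ...s.users };
    delete users["timezone"];
    return { ...s, users };
  },
};

// True when applying up and then down yields the original schema.
// Note: comparing JSON strings is order-sensitive, which is acceptable for
// this sketch because both migrations preserve key order.
function rollbackRestoresSchema(before: Schema, migration: Migration): boolean {
  const after = migration.down(migration.up(before));
  return JSON.stringify(after) === JSON.stringify(before);
}
```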


14. Testing observability

Do not test log output. Testing that logger.info('task_completed') was called couples your tests to log formatting. When you change a log message, no test should break.

The exception: when a log or event emission is a contract with an external system. An audit trail that compliance depends on, an analytics event that a dashboard consumes, a structured event that a downstream system processes. That is not testing "logging." That is testing a system integration that happens to use the event channel.

// Only when the event is a contract
it("emits audit event on password change", async () => {
  const spy = vi.spyOn(auditService, "log");

  await authService.changePassword({ userId, newPassword });

  expect(spy).toHaveBeenCalledWith(
    expect.objectContaining({
      eventType: "PASSWORD_CHANGED",
      userId,
    })
  );
});

15. When to add a test after the fact

Not every bug justifies a test. Add a regression test when:

  1. The bug was silent. It could have gone unnoticed for days. Example: a service that returns early without persisting its side effect.
  2. The bug affected money or access. Any bug in a money path, access path, or source-of-truth computation gets a regression test.
  3. The bug will recur. The code is in a hot path that changes often.
  4. The bug was in a boundary. An external provider changed their payload shape, a date format shifted, a null field appeared where it was previously always set.

Do NOT add a test for:

  • A typo in a log message.
  • A one-time data migration issue.
  • A bug in a dependency that was fixed by upgrading.

When adding a regression test, the test should:

  1. Reproduce the exact failure. Build the exact input that caused the bug.
  2. Fail before the fix. Run the test against the broken code first to confirm it catches the bug.
  3. Pass after the fix. Then apply the fix and verify.
  4. Name the test with context. it('handles missing email in OAuth response (bug: 2026-04-05)') so future developers know why this edge case is tested.

16. When to delete a test

Tests are code. They cost maintenance. Delete one when the cost exceeds the value.

Delete when:

  1. The feature it covers has been removed. Dead code tests are worse than no tests because they give the illusion of coverage for something that no longer exists.
  2. The test is permanently flaky and the cost of fixing exceeds the cost of rewriting. A flaky test quarantined for more than 30 days with no clear fix is a candidate for deletion and rewrite.
  3. The test is testing implementation, not behavior. If a valid refactor broke the test but not the feature, the test was wrong. Delete and replace with a behavior test.
  4. The test duplicates another test. Two tests asserting the same behavior with different setup are maintenance cost for zero additional coverage.
  5. The test covers Tier 4 code that has never caught a bug. A test on a static helper that has not changed in a year and has never failed is not earning its keep.

Do NOT delete when:

  • A test is slow but catches real bugs. Fix the speed problem instead.
  • A test is hard to understand but covers Tier 1 logic. Rewrite it for clarity instead.
  • You are refactoring and "the tests are in the way." If the tests are in the way of a valid refactor, they are testing implementation, so rewrite them to test behavior. If the tests are in the way of a behavior change, that is the tests doing their job.

17. Working with an AI agent

What to delegate

The AI agent is good at:

  • Generating exhaustive parameterized test cases from a spec or type definition.
  • Writing builder and seed functions from your schema.
  • Generating the boilerplate: describe blocks, beforeEach setup, import statements.
  • Writing route tests when given the route spec (method, path, request/response shapes, error cases).
  • Converting a captured JSON payload into a fixture file.
  • Filling in coverage gaps for Tier 2 and Tier 3 code.

What to keep human

The human should:

  • Decide WHAT to test (the risk matrix classification).
  • Review that tests actually test the right behavior, not just exercise code.
  • Write the first test for any new module to establish the pattern.
  • Review AI-generated tests for false positives (tests that pass for the wrong reason).
  • Decide when a test is testing implementation vs. behavior.
  • Name test cases clearly. The AI can generate it('should handle case 1'). The human renames it to it('blocks COMPLETED task from reverting to PENDING').

Workflow for AI-assisted test writing

Step 1. Human classifies the module in the risk matrix and decides which test category applies.

Step 2. Human writes 1–2 tests to establish the pattern, naming convention, and test structure for that module.

Step 3. Human provides the AI agent with:

  • The source file being tested.
  • The 1–2 example tests.
  • The spec or type definitions that describe the expected behavior.
  • This framework document as context.

Step 4. AI agent generates the remaining test cases, following the established pattern.

Step 5. Human reviews:

  • Are the test names descriptive? Can I understand what broke from the name alone?
  • Does each test have a single clear behavior?
  • Are edge cases and error cases covered?
  • Are there any false positives? (Tests that pass but don't actually verify the behavior.)
  • Are there any tests that would break on a valid refactor? (Testing implementation.)
  • Are there any security boundary tests for user-scoped endpoints?

Step 6. AI agent fixes issues from review.

Step 7. Human runs the test suite and verifies all tests pass against the real codebase.

Prompt template for AI agent

When handing a module to the AI agent for test generation, provide this context:

Here is the source file: [paste or reference file path]
Here is the spec it implements: [paste or reference spec section]
Here are 1-2 example tests I wrote: [paste]
Here is the testing framework we follow: [reference this document]

Write tests for this module following these rules:
1. Place the test file in src/features/{feature}/tests/
2. Use the builder/seed helpers from the shared test support directory
3. Follow the AAA pattern
4. Test one behavior per test case
5. Cover: happy path, edge cases, error cases, state cases
6. Use descriptive test names that read as documentation
7. For state machine transitions, use it.each for exhaustive coverage
8. Do not mock the repository or database
9. Mock only external services (third-party providers, email service)
10. Use .integration.test.ts suffix if the test hits the database
11. Include security boundary tests (auth, authz, ownership)
12. Test error response shapes, not just status codes

18. Checklist: applying this framework to a new module

When writing tests for any new module, work through this list:

[ ] 1. Classify the module in the risk matrix (Tier 1/2/3/4)
[ ] 2. Decide the test category (unit / integration / contract / route)
[ ] 3. Create tests/ directory in the feature if it doesn't exist
[ ] 4. Identify the behaviors to test:
      [ ] Happy path(s)
      [ ] Edge cases
      [ ] Error cases
      [ ] State-dependent cases
      [ ] Idempotency (if applicable)
      [ ] Security boundaries (auth, authz, ownership)
      [ ] Time-dependent behavior
      [ ] External service failure modes
[ ] 5. Write 1-2 tests to establish the pattern
[ ] 6. Delegate remaining cases to AI agent with context
[ ] 7. Review AI-generated tests for:
      [ ] Descriptive names
      [ ] One behavior per test
      [ ] No implementation testing
      [ ] No false positives
      [ ] No test interdependence
      [ ] Correct suffix (.test.ts vs .integration.test.ts)
      [ ] Security boundary coverage
[ ] 8. Run tests and verify all pass
[ ] 9. Check coverage for Tier 1/2 modules meets threshold
[ ] 10. Add module to coverage ratchet if Tier 1

Update this when you discover a new pattern that works, or when you find a test that was not worth writing.