Over the past few weeks, we've invested heavily in harness engineering around Claude Code. The agent went from "helpful autocomplete" to an AI development partner that can ship code, run tests, debug production issues, review its own work, and operate inside our team's real workflows.
Here's what that looked like:
Context
We wrote a comprehensive CLAUDE.md that teaches the agent our monorepo structure, coding conventions, workflow expectations, and recurring pitfalls. We also added sub-directory context files for module-specific patterns, so the agent does not have to relearn the same local rules every session. It's basically onboarding documentation for an AI teammate.
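To make this concrete, here is a sketch of what a file like that can look like. The sections and rules below are illustrative, not our actual content:

```markdown
# CLAUDE.md (illustrative excerpt)

## Repo layout
- `services/` — backend services, one package per service
- `web/` — frontend app; shared UI lives in `web/components`

## Conventions
- TypeScript strict mode everywhere; no `any` in new code.
- New endpoints need a unit test and an e2e test before PR.

## Recurring pitfalls
- Do not edit generated files under `gen/`; change the schema instead.
- Migrations must be backwards-compatible for one release.
```

Sub-directory context files follow the same pattern, scoped to one module's local rules.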
Skills
We built 25+ reusable, versioned skills for common workflows like /push, /test-with-mocks, /deploy-lambda, and /debug-serving. These encode multi-step processes so the agent can execute them consistently instead of improvising every time. For example, /push handles build verification, linting, unit tests, e2e tests, git push, and PR creation, while intelligently skipping unnecessary steps for low-risk changes like docs-only edits.
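A skill like /push is typically just a versioned markdown file describing the procedure. This is one plausible shape, assuming a slash-command style file at `.claude/commands/push.md`; the step wording and tool patterns are placeholders:

```markdown
---
description: Build, test, push the current branch, and open a PR
allowed-tools: Bash(npm run *), Bash(git *), Bash(gh pr *)
---
1. Run the build; stop immediately on errors.
2. Run lint and unit tests. Skip e2e tests if the diff is docs-only.
3. Push the branch and open a PR with a summary of the change.
```

Because the process lives in a checked-in file, everyone's agent runs the same steps in the same order, and improvements to the workflow ship like any other code change.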
Guardrails
We added a shared allowlist of permitted commands in .claude/settings.json, checked into version control, so the whole team gets the same permissions automatically. A PreToolUse hook blocks shell patterns that would trigger interactive permission prompts, keeping multi-step workflows from stalling. And CLAUDE.md instructions teach the agent to avoid destructive operations upfront. Together: the context doc sets expectations, the allowlist gates execution, and the hook catches what slips through.
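Here is roughly what that looks like in `.claude/settings.json`. The permission patterns and hook script are illustrative placeholders; the overall shape follows Claude Code's settings format:

```json
{
  "permissions": {
    "allow": [
      "Bash(npm run build:*)",
      "Bash(npm test:*)",
      "Bash(git push:*)"
    ],
    "deny": [
      "Bash(rm -rf:*)"
    ]
  },
  "hooks": {
    "PreToolUse": [
      {
        "matcher": "Bash",
        "hooks": [
          { "type": "command", "command": "./scripts/block-interactive.sh" }
        ]
      }
    ]
  }
}
```

Because the file is in version control, a new allow rule or hook rolls out to the whole team on the next pull.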
Parallel infrastructure
We built a worktree slot system so multiple developers, or multiple AI sessions, can run full e2e flows in parallel without colliding on ports, databases, or JWT state. The default case is zero-config, which matters a lot when you want parallelism to actually get used instead of becoming one more thing engineers have to manage.
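The core idea is that each worktree slot maps deterministically to its own ports, database, and secrets, so two slots can never collide. A minimal sketch, with hypothetical port ranges and naming conventions:

```python
# Hypothetical sketch: deterministic per-slot resource assignment.
BASE_PORT = 4000
PORTS_PER_SLOT = 10  # room for app server, db, auth stub, etc.

def slot_resources(slot: int) -> dict:
    """Map a worktree slot number to a non-colliding set of resources."""
    base = BASE_PORT + slot * PORTS_PER_SLOT
    return {
        "app_port": base,
        "db_port": base + 1,
        "db_name": f"app_e2e_slot{slot}",
        "jwt_secret_env": f"JWT_SECRET_SLOT{slot}",
    }

# Slot 0 is the zero-config default; other slots are claimed only
# when a second developer or AI session runs e2e flows in parallel.
print(slot_resources(0)["app_port"])  # 4000
print(slot_resources(2)["db_name"])   # app_e2e_slot2
```

Making slot 0 the default is what keeps the common case zero-config: a single session never has to think about slots at all.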
Feedback loops
We changed our dev environment so logs are written to structured files the agent can read directly. We also added automated plan review workflows that evaluate work through CEO, engineer, PM, and UX lenses before code is written. On top of that, the agent updates its own documentation when architectural changes happen, so context improves instead of drifting.
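"Structured files the agent can read" mostly means one JSON object per line, so the agent can grep and filter without parsing free-form text. A minimal sketch of such a logger; the function name and fields are illustrative:

```python
import json
import time

def log_event(path: str, level: str, msg: str, **fields) -> None:
    """Append one structured JSON line that an agent can read and filter."""
    entry = {"ts": time.time(), "level": level, "msg": msg, **fields}
    with open(path, "a") as f:
        f.write(json.dumps(entry) + "\n")

# Example: an error the agent can later find with a simple grep for
# '"level": "error"' instead of scrolling a terminal buffer.
log_event("dev.log.jsonl", "error", "request failed",
          route="/api/convert", status=500)
```

The payoff is that debugging becomes a read-the-file loop for the agent rather than a copy-paste-from-terminal loop for the human.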
The impact has been real
Execution flow got much smoother, with far less back-and-forth on routine decisions.
Once the harness had better defaults, better workflow encoding, and better guardrails, the agent no longer had to keep pausing on common judgment calls. That removed a surprising amount of friction and made longer end-to-end tasks feel much more natural.
Push-to-PR went from 8+ manual steps to a single command.
Instead of manually building, linting, testing, pushing, and opening a PR, the team can rely on a standardised workflow that executes the right checks in the right order. That compresses a lot of repetitive coordination into one repeatable path.
New engineers ramp faster because the harness carries onboarding knowledge.
A lot of tribal knowledge is now encoded into context docs, workflow skills, and shared defaults. That means new team members get leverage sooner, and the system is less dependent on senior engineers repeating the same guidance over and over.
The agent can do domain-specific work, not just generic code generation.
Skills like /investigate-conversions and /current-experiments let it operate with product and analytics context that used to require bouncing between multiple tools and mentally stitching things together. That opens up a very different level of usefulness.

