AI-Native Software Engineering, Part 2: Harness Engineering and Correctness

Why the most important part of AI-native software engineering may not be generation, but constraint.

This is Part 2 of the AI-Native Software Engineering series.

It continues from AI-Native Software Engineering, Part 1: Mental Models in Agentic Coding.

The previous question was:

If understanding no longer comes mainly from writing code,
how do humans build mental models in an agentic workflow?

The next question is:

If implementation is delegated,
where does correctness come from?

The conversation around AI-assisted software development often focuses on one thing:

How much more software can we generate?

Faster coding.

Longer context.

Autonomous agents.

Multi-file edits.

Self-healing workflows.

But after building projects with AI agents for a while, I have come to the opposite conclusion:

The most important engineering problem is not how to make AI more autonomous.

It is how to make AI more constrained.

Because unrestricted generation is rarely the bottleneck.

Correctness is.

In this article, constraints are not about creativity or search space yet.

They are about correctness:

How do we make generated implementation safe to trust?

The Classical Software Assumption

Traditional software development assumes something simple:

Human
-> Implementation
-> Verification

The person writing the system also builds an understanding of it.

Verification is often implicit.

You trust the implementation because you produced it.

This scales surprisingly well for small teams.

Until complexity increases.

AI Breaks This Assumption

Agentic coding changes the structure.

Now the loop becomes:

Human
-> Intent
AI
-> Implementation
Human
-> Review

At first glance this seems efficient.

But there is a hidden issue.

Review is much cheaper than implementation.

And because review is cheaper:

People approve changes they do not fully understand.

Especially when:

multiple files changed
abstractions evolved
framework conventions shifted
generated code appears reasonable

The danger is subtle.

The system may work.

Until it doesn’t.

Generation Is Cheap. Correctness Is Expensive.

Once implementation cost collapses, something else becomes dominant:

Verification.

This is where harness engineering becomes important.

A harness is not a test suite.

A harness is an executable definition of acceptable behavior.

Instead of saying:

Build me a notification system.

You define:

Given:
1000 requests

Expect:
p95 < 200ms

Failure:
Retry <= 3 times

Invariant:
No duplicate delivery
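To make "executable" concrete, here is a minimal Python sketch of that spec turned into a harness. The `deliver` interface (which returns the number of attempts it made), the delivery log, and `make_candidate` as a stand-in for agent-written code are all hypothetical; a real harness would drive the actual system. The point is that each of the four lines in the spec becomes a check that can fail.

```python
import math
import random
import time
from typing import Callable, List, Tuple

# Thresholds taken directly from the spec: Given / Expect / Failure / Invariant.
REQUESTS = 1000
P95_BUDGET_MS = 200.0
MAX_RETRIES = 3


def p95(samples: List[float]) -> float:
    """95th percentile using the nearest-rank method."""
    ordered = sorted(samples)
    return ordered[math.ceil(0.95 * len(ordered)) - 1]


def run_harness(deliver: Callable[[str], int], delivered: List[str]) -> List[str]:
    """Send REQUESTS notifications through `deliver` and return every contract violation.

    Hypothetical interface: deliver(message_id) returns how many attempts it made,
    and `delivered` is the log of message ids that actually went out.
    """
    violations: List[str] = []
    latencies_ms: List[float] = []
    for i in range(REQUESTS):
        message_id = f"msg-{i}"
        start = time.perf_counter()
        attempts = deliver(message_id)
        latencies_ms.append((time.perf_counter() - start) * 1000)
        if attempts - 1 > MAX_RETRIES:  # Failure: retry <= 3 times
            violations.append(f"{message_id}: {attempts - 1} retries")
    observed = p95(latencies_ms)
    if observed >= P95_BUDGET_MS:  # Expect: p95 < 200ms
        violations.append(f"p95 latency {observed:.1f}ms")
    if len(delivered) != len(set(delivered)):  # Invariant: no duplicate delivery
        violations.append("duplicate delivery")
    return violations


def make_candidate(
    seed: int,
    duplicate_bug: bool = False,
    retry_limit: int = 3,
    failure_rate: float = 0.1,
) -> Tuple[Callable[[str], int], List[str]]:
    """Hypothetical stand-in for generated code whose attempts fail transiently at `failure_rate`."""
    rng = random.Random(seed)
    log: List[str] = []

    def deliver(message_id: str) -> int:
        attempts = 0
        while attempts <= retry_limit:
            attempts += 1
            if rng.random() < failure_rate:  # simulated transient failure
                continue
            log.append(message_id)
            if duplicate_bug:
                log.append(message_id)  # bug: the message goes out twice
            return attempts
        return attempts

    return deliver, log


if __name__ == "__main__":
    print(run_harness(*make_candidate(seed=1)))                      # []
    print(run_harness(*make_candidate(seed=1, duplicate_bug=True)))  # ['duplicate delivery']
```

The harness never inspects how `deliver` is written. It only checks behavior, which is what lets it judge an implementation the human did not write.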

Now correctness is defined outside the implementation.

Harnesses Are Contracts

A good harness acts like a contract.

It defines:

Inputs
What conditions exist?

Outputs
What outcomes are acceptable?

Constraints
What cannot happen?

Failure Modes
What should happen under stress?

Recovery
How does the system return to a healthy state?

Example:

Feature:
Upload image

Accept:
<2s response

Reject:
Files >10MB

Guarantee:
Metadata consistency

AI may generate ten implementations.

The harness selects one.
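Here is the same idea applied to the upload example, as a minimal Python sketch: three hypothetical candidates stand in for agent output, and the harness keeps the first one that satisfies the contract. I read "10MB" as 10 MiB and "metadata consistency" as "the stored name and size match what was uploaded". Both are assumptions that a real contract would pin down explicitly.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

MAX_BYTES = 10 * 1024 * 1024  # Reject: files > 10MB (assumed to mean 10 MiB)
RESPONSE_BUDGET_S = 2.0       # Accept: < 2s response


@dataclass
class UploadResult:
    accepted: bool
    metadata: Dict[str, object] = field(default_factory=dict)


Upload = Callable[[str, bytes], UploadResult]


def contract_violations(upload: Upload) -> List[str]:
    """Check one candidate against the upload contract; an empty list means it passes."""
    violations: List[str] = []
    # A small file and a file exactly at the limit must both be accepted.
    for name, size in [("small.png", 1024), ("edge.png", MAX_BYTES)]:
        data = bytes(size)
        start = time.perf_counter()
        result = upload(name, data)
        elapsed = time.perf_counter() - start
        if not result.accepted:
            violations.append(f"{name}: valid file rejected")
        elif result.metadata != {"name": name, "size": size}:
            # Guarantee: stored metadata must describe the bytes actually received.
            violations.append(f"{name}: inconsistent metadata")
        if elapsed >= RESPONSE_BUDGET_S:
            violations.append(f"{name}: took {elapsed:.2f}s")
    # One byte over the limit must be rejected.
    if upload("huge.png", bytes(MAX_BYTES + 1)).accepted:
        violations.append("huge.png: oversized file accepted")
    return violations


# Hypothetical candidates standing in for three agent-generated implementations.
def candidate_off_by_one(name: str, data: bytes) -> UploadResult:
    if len(data) >= MAX_BYTES:  # bug: also rejects a file of exactly 10 MiB
        return UploadResult(False)
    return UploadResult(True, {"name": name, "size": len(data)})


def candidate_wrong_units(name: str, data: bytes) -> UploadResult:
    if len(data) > MAX_BYTES:
        return UploadResult(False)
    return UploadResult(True, {"name": name, "size": len(data) // 1024})  # bug: KiB, not bytes


def candidate_correct(name: str, data: bytes) -> UploadResult:
    if len(data) > MAX_BYTES:
        return UploadResult(False)
    return UploadResult(True, {"name": name, "size": len(data)})


def select(candidates: Dict[str, Upload]) -> Tuple[Optional[str], Dict[str, List[str]]]:
    """Return the first candidate that satisfies the contract, plus why earlier ones failed."""
    rejected: Dict[str, List[str]] = {}
    for label, upload in candidates.items():
        problems = contract_violations(upload)
        if not problems:
            return label, rejected
        rejected[label] = problems
    return None, rejected


if __name__ == "__main__":
    winner, rejected = select({
        "off_by_one": candidate_off_by_one,
        "wrong_units": candidate_wrong_units,
        "correct": candidate_correct,
    })
    print(winner)  # correct
    print(rejected)
```

Both rejected candidates look plausible on review. The off-by-one only shows up at the exact boundary, which is the kind of case a harness checks every time and a reviewer easily skims past.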

Constraints Create Search Space

This changed how I think about AI.

Most people imagine AI coding as:

More freedom
-> More capability

I increasingly think:

More constraints
-> Better capability

Because software is not creativity.

Software is controlled exploration.

Harnesses reduce the search space.

They prevent:

architectural drift
hidden complexity
accidental behavior
local optimization
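Some of these can be made executable directly. Here is a minimal Python sketch for architectural drift, assuming a hypothetical layering rule that code under `app.domain` must never import from `app.infrastructure` or `app.api`. It parses the generated source with the standard `ast` module instead of running it; relative imports are ignored to keep the sketch short.

```python
import ast
from typing import Dict, List, Tuple

# Hypothetical layering rule: code under the key must never import the listed packages.
FORBIDDEN: Dict[str, Tuple[str, ...]] = {
    "app.domain": ("app.infrastructure", "app.api"),
}


def _within(name: str, package: str) -> bool:
    """True if `name` is `package` itself or one of its submodules."""
    return name == package or name.startswith(package + ".")


def imported_modules(source: str) -> List[str]:
    """Absolute module names imported by `source` (relative imports are skipped)."""
    names: List[str] = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names.extend(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            names.append(node.module)
    return names


def layering_violations(modules: Dict[str, str]) -> List[str]:
    """Take a map of module name -> source code; return every forbidden import found."""
    violations: List[str] = []
    for module_name, source in modules.items():
        for layer, banned in FORBIDDEN.items():
            if not _within(module_name, layer):
                continue
            for imported in imported_modules(source):
                if any(_within(imported, b) for b in banned):
                    violations.append(f"{module_name} imports {imported}")
    return violations


if __name__ == "__main__":
    generated = {
        "app.domain.orders": "import json\nfrom app.infrastructure.db import session\n",
        "app.api.routes": "from app.domain.orders import Order\n",
    }
    print(layering_violations(generated))
    # ['app.domain.orders imports app.infrastructure.db']
```

A check like this does not care whether the shortcut made a test pass. It fails the change because the boundary moved.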

The harness becomes the boundary.

The Engineering Shift

I think software roles may slowly evolve.

Old model:

Engineer
=
Design
+
Implementation
+
Verification

Emerging model:

Engineer
=
Intent
+
Constraint
+
Evaluation

AI becomes implementation infrastructure.

Humans remain responsible for correctness.

My Current Workflow

For every feature:

SPEC
-> Acceptance
-> Harness
-> Agent
-> Evaluation
-> Merge

And every feature must answer:

Why does this exist?

What must never break?

How do we measure success?

How do we debug failure?

If these answers cannot be written down, the AI should not start coding.
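That rule can itself be a gate in the pipeline. A minimal Python sketch, assuming a hypothetical `FeatureSpec` record whose fields hold the answers to the four questions: the Agent stage refuses to start while any answer is blank.

```python
from dataclasses import dataclass, field
from typing import List

# The four questions, keyed by the (hypothetical) spec field that answers each one.
QUESTIONS = {
    "purpose": "Why does this exist?",
    "invariants": "What must never break?",
    "success_metrics": "How do we measure success?",
    "debug_plan": "How do we debug failure?",
}


@dataclass
class FeatureSpec:
    purpose: str = ""
    invariants: List[str] = field(default_factory=list)
    success_metrics: List[str] = field(default_factory=list)
    debug_plan: str = ""


def unanswered(spec: FeatureSpec) -> List[str]:
    """Return the questions this spec leaves blank, in a fixed order."""
    gaps: List[str] = []
    for field_name, question in QUESTIONS.items():
        value = getattr(spec, field_name)
        answers = [value] if isinstance(value, str) else value
        if not any(answer.strip() for answer in answers):
            gaps.append(question)
    return gaps


def agent_may_start(spec: FeatureSpec) -> bool:
    """Gate before the Agent stage: no coding until every question has an answer."""
    return not unanswered(spec)


if __name__ == "__main__":
    spec = FeatureSpec(
        purpose="Notify users when an order ships",  # hypothetical feature
        invariants=["No duplicate delivery"],
        success_metrics=["p95 < 200ms over 1000 requests"],
    )
    print(unanswered(spec))       # ['How do we debug failure?']
    print(agent_may_start(spec))  # False
```

The gate only checks that answers exist, not that they are good. Judging them stays a human responsibility.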

Closing Thoughts

AI does not remove engineering.

It removes implementation scarcity.

Which means the scarce resource becomes:

Correctness.

And correctness is rarely generated.

It is designed.

Harness engineering is not about empowering AI.

It is about making AI safe to trust.