Pair-programming with an LLM is not the same workflow as pair-programming with a human, and the developers who treat it like a junior pair get better outcomes than the ones who treat it like a senior. The junior framing is not about ability — the model is often capable in surprising directions — it is about where the judgment lives. The judgment stays with you.
The three positions the LLM can play
Human pairing has driver and navigator. AI pairing has three positions, and naming them is the first move toward a workflow you can run repeatably.
Driver — model types, you navigate
The model produces the code, you direct the high-level shape and accept or reject each chunk. This is the position where the velocity gain is highest and the failure rate is highest. The discipline is reading every line the model produces before accepting the next chunk. The most common mistake is letting the model run two or three chunks ahead of your reading.
Navigator — you type, model suggests
You write the code; the model offers completions, alternative phrasings, or warnings about edge cases. This is the lower-risk position, and it is where the model's contribution comes closest to the stereotype of a thoughtful pair: naming things you missed, pointing at the boundary case you forgot. Use this position for code you would not want to delegate but that would benefit from a second perspective.
Reviewer — after the fact
You finish the code, then ask the model to review it. "What could go wrong in this function?" "What edge cases am I missing?" "Read this and tell me which assumption is load-bearing." The reviewer position is the easiest to add to an existing workflow and the one with the most consistent payoff per minute spent.
The checkpoint loop
Whatever position the model is playing, the workflow needs checkpoints — small batches, review before continuing, named decision points. The shape of the loop is the same every time.
- State the next small unit. One function. One file. One refactor with a named scope. If you cannot say in a sentence what success looks like, the unit is too big.
- Run the prompt. With the constraints from the prompting patterns post — small scope, named conventions, explicit constraints.
- Read the output line by line. Not skim. Read. Every method call, every parameter, every assumption.
- Run the code. Not just the type checker — the code, with at least one input you chose, not one the model suggested.
- Decide: accept, revise, or discard. The decision goes in the commit message. "Accepted the model's implementation, added the leap-year case it missed" is a real artefact that helps the next review.
- State the next small unit. Loop.
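The run-the-code and decide steps can be sketched in a few lines of Python. Everything here is illustrative: `is_leap_year_as_generated` is a hypothetical stand-in for model output, and it assumes the model shipped only the divisible-by-four rule. The years 1900 and 2000 are the kind of input a reviewer picks on purpose, because they sit on the century boundary.

```python
def is_leap_year_as_generated(year: int) -> bool:
    """Hypothetical model output: handles the common case only."""
    return year % 4 == 0


def is_leap_year(year: int) -> bool:
    """Revised version, written after the reviewer's inputs exposed the gap.

    Gregorian rule: divisible by 4, except century years,
    which must also be divisible by 400.
    """
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)


# Inputs chosen by the reviewer, not suggested by the model.
reviewer_inputs = {1900: False, 2000: True, 2023: False, 2024: True}

for year, expected in reviewer_inputs.items():
    generated = is_leap_year_as_generated(year)
    revised = is_leap_year(year)
    status = "ok" if generated == expected else "MISMATCH"
    print(f"{year}: generated={generated} revised={revised} [{status}]")
```

Only 1900 is flagged for the generated version, which is exactly the case a model-suggested input such as 2024 would never have surfaced. The decision here is "revise", and that is what the commit message should record.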
The discipline of the loop is that every step must complete before the next one starts. The failure mode is collapsing the loop — running three prompts in a row, accepting all three, then reviewing all three at once. By the time you reach the third review, fatigue has set in and the depth of attention drops.
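One way to make the "no collapsing" rule concrete is to write it down as a tiny state tracker. This is a sketch of the rule, not a tool recommendation: the `Step`, `Checkpoint`, and `Session` names are invented for the example, and the five steps simply mirror the list above.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class Step(Enum):
    STATE_UNIT = 1
    RUN_PROMPT = 2
    READ_OUTPUT = 3
    RUN_CODE = 4
    DECIDE = 5


@dataclass
class Checkpoint:
    """Tracks one unit of work through the five steps, strictly in order."""

    unit: str
    completed: List[Step] = field(default_factory=list)

    @property
    def done(self) -> bool:
        return len(self.completed) == len(Step)

    def complete(self, step: Step) -> None:
        if self.done:
            raise RuntimeError(f"{self.unit!r} is finished; state the next unit")
        # The only allowed step is the one right after the last completed one.
        expected = Step(len(self.completed) + 1)
        if step is not expected:
            raise RuntimeError(f"cannot do {step.name} before {expected.name}")
        self.completed.append(step)


class Session:
    """Refuses to open a new unit until the previous one has a decision."""

    def __init__(self) -> None:
        self.history: List[Checkpoint] = []

    def start(self, unit: str) -> Checkpoint:
        if self.history and not self.history[-1].done:
            raise RuntimeError(f"{self.history[-1].unit!r} has no decision yet")
        checkpoint = Checkpoint(unit)
        checkpoint.complete(Step.STATE_UNIT)
        self.history.append(checkpoint)
        return checkpoint


session = Session()
first = session.start("parse_date: ISO 8601 strings only")
first.complete(Step.RUN_PROMPT)

try:
    # A second prompt before the first output has been read, run, and decided.
    session.start("format_date")
except RuntimeError as error:
    print(f"blocked: {error}")
```

The point of the sketch is the shape of the constraint: a second unit cannot begin while the first is still waiting on its read, run, and decide steps. In practice the enforcement is a habit rather than a class.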
Where the workflow goes wrong
Three failure shapes account for most of the velocity loss observed when teams adopt AI pair-programming.
Accepting whole files without reading. The model generates a 200-line file, you scan the top, the bottom looks fine, you accept. The middle 150 lines contain assumptions you have not verified. This shows up as production incidents two weeks later that the team cannot trace back to a specific decision.
Asking for too-large units of work. "Build the user dashboard" is not a checkpoint-loop unit. "Add the avatar component to the user dashboard header with these props" is. The size of the unit is the single biggest lever on whether the workflow produces reviewable diffs.
Letting the model invent specs. When the model starts answering questions about what the code should do — "I will add validation on the email field too, assuming you want that" — pause. Decide whether that is actually wanted. The model treats every gap in the spec as permission to fill in the gap with whatever is statistically likely. Some of those guesses are right; some are not what you intended.
The diff discipline
The single sentence that holds the whole workflow together: every accepted change should be a diff you would have signed off in a code review. If you would have asked questions on the PR, ask them now. If you would have requested changes, request them now. If you would have rejected the approach, reject it now. The diff does not care that you generated it; the code will run in production whether the author was a human or a model.
The corollary: if you are accepting AI-generated code at a higher tempo than you would accept human PRs, something is wrong. Either you are reviewing more shallowly than you would accept from a colleague, or you are holding the code to a lower quality bar than you would hold a colleague's. Both are workflow bugs worth surfacing.
What stays human
Some parts of the work do not delegate well to the model, and recognising them is half the workflow.
- Picking the problem to solve. The model can help execute against a problem statement but cannot tell you which problem is worth solving this week. That decision is structural and stays with the human.
- Deciding when to stop. The model will happily keep generating refinements past the point of diminishing returns. Knowing when "good enough" is reached — when the next iteration costs more than it pays back — is a judgment call.
- Naming variables that future readers will read. The model can produce passable names but tends toward generic choices. A name like processedUserList is the kind of name that future readers will struggle with; activeUsersForBilling communicates intent. The naming review is a place where human attention pays back disproportionately.
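The naming point is easiest to see side by side. The sketch below is Python, so the two names from the last bullet are rendered in snake_case. The `User` fields and the billing rule (active and on a plan) are assumptions made up for the example, not something the article specifies.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class User:
    # Hypothetical fields, invented for this illustration.
    name: str
    is_active: bool
    plan: Optional[str]


def billable(users: List[User]) -> List[User]:
    """Assumed rule: bill users who are active and have a plan."""
    return [u for u in users if u.is_active and u.plan is not None]


users = [
    User("ada", True, "pro"),
    User("lin", False, "pro"),
    User("sam", True, None),
]

# Generic name: says a list was processed, not how or why.
processed_user_list = billable(users)

# Intent-revealing name: says which users, and for what purpose.
active_users_for_billing = billable(users)

print([u.name for u in active_users_for_billing])  # prints ['ada']
```

Both variables hold the same list. Only the second tells a reader, three files away from `billable`, what is in it and why it exists.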
The habit that compounds
AI pair-programming is the workflow where most developers' productivity gains evaporate if they skip the review step. The gains feel large at first because the typing is fast; they shrink as the rework from accepted-but-wrong diffs accumulates. The teams who hold the gains are not the ones with the cleverest tool integrations. They are the ones who never let the checkpoint loop collapse — small batches, real reviews, a diff discipline they would defend to a colleague. That is the habit that compounds.
Related reading
The pair-programming workflow leans heavily on the prompting and review skills covered in prompting patterns that produce reviewable code and the five ways AI-generated code goes wrong. The human-pairing patterns it inherits from are worth reading alongside it — best practices for remote pair programming covers ping-pong, strong-style, and mob patterns that transfer well. All of these sit inside the ai-assisted-development topic.
About the writers
Developer educator at ShareCode. Writes the tutorial track — Python, JavaScript debugging, coding-interview prep, and the everyday code-quality habits that hold up in real codebases.
More from Kajal
Founder of ShareCode. Writes the engineering deep-dives on this site — WebRTC, Firebase Auth, real-time sync, and the production patterns behind the editor itself.
More from Kishan