How I Review LLM-generated Code: trunk-based dev, opening files locally, and the delta pager
I’ve been tricked by LLM-generated code too many times. A diff looks reasonable: it has the sorts of changes that I associate with solving a problem, so I assume that it works. It doesn’t, and I don’t realize that until much later.
I still struggle to review LLM code. It’s all too tempting to skim and assume that the LLM got the details right, and I find focused code review exhausting. That said, I think rigorous code review is essential if you want to end up with code that does what you actually want.
I’ve made the following four adjustments to how I approach code review, and they’ve helped me find value in LLM-generated code a bit more often (although I’m still leery of using LLMs to generate code):
- Lean into trunk-based development
- Look at the files, not just the diff
- Use a good diff viewer, like delta, locally
- Have a different LLM conduct a review first
Trunk-based Development
Trunk-based development—avoiding long-lived branches by working in a way where it’s always safe to merge small changes into your main branch—isn’t a code-review technique per se, but I think it’s essential for good code reviews, no matter who’s writing the code. If a branch lives for several days, it’s easy for that branch to become almost impossible to review well.
If you’re used to longer-lived branches, it might feel like you can’t merge your changes into your trunk because the feature isn’t ready yet. But almost any change you can imagine can be structured in a way where you merge code into your main branch early:
- Check your environment: `if not is_production: ...`
- Check a feature switch or rollout (sketched after this list): `if is_feature_on("my_feature"): ...`
- Rename an existing method to `legacy_do_thing()` and then migrate chunks to use `do_thing()`.
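Combining the last two patterns, a merge-early change might look something like this minimal Python sketch, where `is_feature_on()` is a stand-in for whatever flag service you actually use:

```python
def is_feature_on(name: str) -> bool:
    """Stub flag check; a real version would query your rollout system."""
    return False

def legacy_do_thing(order):
    """The old implementation, still serving production traffic."""
    return {"order": order, "path": "legacy"}

def do_thing(order):
    """The new implementation: merged to trunk, but dark until the flag flips."""
    return {"order": order, "path": "new"}

def handle_order(order):
    # Safe to merge at any point: callers stay on the legacy path
    # until "my_feature" is enabled for a rollout cohort.
    if is_feature_on("my_feature"):
        return do_thing(order)
    return legacy_do_thing(order)
```

Once every call site goes through the flag, you can flip it gradually, delete `legacy_do_thing()`, and remove the check, all as small, independently mergeable changes.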
Look at the files, not just the diff
When reviewing human code, I trust the diff. Most of the time, I don’t open up my editor to look at the file context or types. I’m not reviewing human code with an eye towards catching mistakes—I’m reviewing it to share knowledge of the codebase and ensure more than one person knows the code in case something goes wrong. (In fact, I quite like post-merge code review: reviewing code after it’s already been deployed.)
I don’t trust LLM-generated diffs. A diff doesn’t show the places that the LLM failed to update; it doesn’t show the in-editor comments for the functions that the LLM is calling; and it doesn’t show the full file context around the changes.
I’ve started to check out branches locally so that it’s easier to do an active review of the changes. I want to be able to navigate around the file, trace through logic, hover over type definitions and method comments, and just generally engage with the solution more deeply.
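In practice, that looks something like the following (the branch name and PR number are placeholders, and `gh pr checkout` assumes the GitHub CLI):

```sh
git fetch origin
git switch llm/add-retries     # placeholder branch name
# or, for a GitHub PR: gh pr checkout 1234
git diff main...HEAD           # the full diff, rendered in your pager
$EDITOR retries.py             # then actively navigate the touched files
```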
Use a good diff viewer
Because I now review code locally, a good diff viewer is essential. delta is a syntax-aware diff viewer with a nice side-by-side view (`side-by-side = true`) that highlights the parts of a line that changed. Having delta open in one terminal and helix open in another makes actively reviewing code more ergonomic.
(If you use git as your SCM tool, How Core Git Developers Configure Git might be a useful read! It covers options like `diff.algorithm=histogram` and `merge.conflictstyle=zdiff3` that can make git diffs easier to understand.)
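Put together, a minimal `~/.gitconfig` sketch for this setup might look like the following (the delta options come from its README; adjust to taste):

```ini
[core]
    pager = delta
[interactive]
    diffFilter = delta --color-only
[delta]
    side-by-side = true    # two-column view with intra-line highlights
    navigate = true        # n/N jump between files in the diff
[diff]
    algorithm = histogram  # often clearer hunks than the default
[merge]
    conflictstyle = zdiff3 # include the merge base in conflict markers
```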
LLM Code Review
LLMs aren’t great at reviewing code. I’d estimate that 90% of the feedback I’ve gotten from LLM code review has been wrong or useless, but I’ve still found the remaining 10% valuable. When I use an LLM to generate code, I always include a step that runs a /review-fresh skill to review the code and address any problems before I look at it.
The one crucial trick to useful LLM code review is to have a different agent than the one that did the work conduct the review! If the agent that did the work also does the review, its context will tell it that it solved the problem in the right way. (You can even use multiple agents with different LLMs and then have another agent summarize the results, but I haven’t found much of a difference with that approach.)
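As a sketch of that fresh-reviewer step: pipe the diff to a second agent that has none of the implementer’s context. The prompt here is illustrative, not the actual /review-fresh skill, and `claude -p` (Claude Code’s non-interactive print mode) can be swapped for whatever agent CLI you prefer:

```sh
git diff main...HEAD | claude -p "Review this diff: look for bugs, \
call sites the change missed, and misuse of existing functions."
```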
In Review
I’m sure we aren’t headed to a world where we can hand an LLM a written description of a problem and never look at the code it generates. I’ve failed to communicate intent when working with other people enough times, and only realized the miscommunication when looking at the code! My co-workers and I are far better at understanding the context of a problem or project than an LLM is, and we can still get it wrong. As long as LLMs can’t read minds, code review is going to matter. I think that means the same best practices that people have always recommended for code review, like small focused diffs and running code locally, will remain crucial for software engineering.