It's 2026, and I'm still leery of using LLMs to generate code

LLMs can be fantastic at generating code.[1] They can occasionally generate perfect solutions so quickly it feels like magic. When an LLM works well, it can be awe-inspiring and terrifying.[2]

I'm still leery of using them to generate code.

None of my reasons for being cautious of LLM-generated code are unique, but I still want to write them out, because writing helps me think deeply and assess my own thoughts. That leads to the first reason I'm cautious about using LLMs to generate code: the process of writing code sparks ideas and forces me to grapple with the details of whatever I'm trying to do. What should the primary key actually be? What happens when a user deletes their account after making a post? Given my knowledge of the system, will we need to worry about performance? How will we verify that it's working correctly? Even adding a boolean field to a system is the sort of thing that can have surprising edge cases that someone needs to think through.
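To make the boolean-field point concrete, here's a hypothetical sketch (the column name and setup are invented for illustration): adding a nullable boolean to an existing table leaves old rows at NULL, and code that treats the field as two-valued can quietly disagree with a SQL filter.

```python
# Hypothetical example: an "is_archived" boolean added to an existing table.
# Pre-migration rows were never backfilled, so they hold None, not False.
rows = [
    {"id": 1, "is_archived": None},   # old row, never backfilled
    {"id": 2, "is_archived": False},  # new row with an explicit default
    {"id": 3, "is_archived": True},
]

# Python's truthiness lumps None in with False...
visible_in_app = [r["id"] for r in rows if not r["is_archived"]]

# ...but a SQL-style comparison (WHERE is_archived = FALSE) skips NULLs.
visible_in_sql = [r["id"] for r in rows if r["is_archived"] is False]

print(visible_in_app)  # [1, 2]
print(visible_in_sql)  # [2]
```

The two queries disagree about row 1, which is exactly the kind of detail that surfaces while writing the migration and is easy to gloss over while reviewing one.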

When I review LLM-generated code, I do have a chance to think through edge cases and details to make sure the LLM has made reasonable calls. This is helpful, but I think less deeply when reviewing code than I do when writing code. It's hard to review the bits of the solution that aren't there, both because it's more effort to actually open up the changed files to see the full context of the diff, and because I can't see the avenues that the LLM didn't pursue—I can only see the diff. It's easy for a diff to hide that some functions that should have been updated weren't or that there was a better way to accomplish a task or that a race condition exists. I've seen plenty of reasonable-looking LLM PR diffs that failed to hold up when I actually investigated a problem myself. Over time, I've gotten better at the skill of reviewing LLM-generated code, but I still struggle to think as deeply when reading as I do when writing.
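To illustrate how a diff can hide a race condition, here's a hypothetical sketch (not from any real PR): each line reads as reasonable in review, but the check-then-act sequence isn't atomic.

```python
import threading

usernames = set()
lock = threading.Lock()

def claim_unsafe(name):
    # Looks fine in a diff: check, then insert. But two concurrent
    # callers can both pass the check before either one writes,
    # so both "successfully" claim the same name.
    if name not in usernames:
        usernames.add(name)
        return True
    return False

def claim_safe(name):
    # The fix: make check-and-insert a single atomic step.
    with lock:
        if name not in usernames:
            usernames.add(name)
            return True
        return False
```

Nothing about `claim_unsafe` looks wrong in isolation; spotting the problem requires holding the whole system's concurrency model in your head, which is easier when you wrote the surrounding code.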

Attempting to write code is also how I learn a codebase or tool. I run into rough edges, discover the approaches that don't work to accomplish my goals, learn the tools that people use in the codebase, and build the background knowledge that's necessary to solve the problems that LLMs can't. Building that understanding of how things fit together is a first step to being able to actually review code. Once I have that understanding, I'm more confident using an LLM to solve problems in the codebase—I can write clear documentation for how things work, identify the edge cases ahead of time, and see the gaps in potential solutions. LLMs can make it easier to quickly understand a codebase—having a tool that can intelligently(ish) search the codebase and git history to find answers is amazing—but relying on them to generate code feels like it hampers my ability to understand how things fit together.

In general, I like writing code. When LLMs help me do that—acting as a complement to stackoverflow that has access to my codebase, or as a fast local code reviewer—they make me happier. I'm most productive when I'm able to find flow, and I struggle to find flow when I act as an orchestrator and code reviewer for multiple LLM agents. Even if that workflow is in theory more productive (and I don't believe it is), it's not more productive for me—I'm not an automaton who can find deep focus and flow on arbitrary tasks. A day where I'm able to find four hours of focused attention is going to be more productive than a day where I'm only able to find one hour, even if that one hour of focus uses "more powerful" tools. Finding flow and enjoying the work I do leads to long-term speed; I find it pretty hard to be productive when I'm depressed.

There are situations where I truly do think it makes sense to use an LLM to generate code: times when the problem is entirely understood and the solution is easy to verify. Depending on the domain you're working in, those situations might never come up, or they might come up all the time. LLM-generated code felt more useful in a codebase I had worked in for over a decade than in one I had only worked in for a week.

But there are also many situations where using an LLM to generate code doesn't actually help over the long-term. Doing so can inhibit thought, limit your ability to learn a system, and make it hard to find flow. Coding is thinking, not typing, so being able to quickly generate code isn't always helpful.


  1. Claude Opus 4.5 and Claude Sonnet 4.5 (but I've tried other tools / models too). ↩︎

  2. It feels "awful" in the original sense of the word: "full of awe; terrible and wonderful." ↩︎