AI Coding: Lessons Learned

[Image: a human and a robot shaking hands]

I've been developing a mobile app that will soon be available for you to try. I started it months ago as an exercise to better learn React Native and to try out Supabase as a back end. It was going along swimmingly, but in fits and starts as I had time. Fast forward to a couple of weeks ago, and I decided to figure out how to make LLMs* do more of the lifting for me, and do it more effectively. Up to this point I had certainly made plenty of use of Claude and ChatGPT to fill in some gaps in my knowledge and to explore concepts and libraries that were unfamiliar to me. But I wanted to see how they would perform doing the majority, if not all, of the code generation. So I started the project over from scratch and committed to taking it as far as I could and evaluating the outcomes as objectively as possible.

To be clear: This is not a novel tutorial on LLM-assisted coding. A lot of what you'll read is just my version of what lots of developers have already published. I do think the details of my approach may offer some twists that may be worth exploring and iterating on, but the goal of this post is more to reflect on the process.

Know Thyself

I'm a software developer by trade. When I have an idea, my first instinct is to write code to figure out how to bring the idea to life. In this exercise, that's likely the worst thing I could have done. My experience with LLMs is that they tend to write code that doesn't keep the big picture in mind. It's like giving coding tasks to Amelia Bedelia - very point-solution focused, and often using quasi-outdated information. A lot of folks have equated it to writing code with a junior developer, but I think that may be a little generous. A good junior developer learns at an exponential rate and can ask good questions. LLMs project the confidence that comes with knowing enough to be dangerous but not enough to be humble. They often make the same mistakes over and over and rarely write any code that isn't the path of least resistance. What I mean by that last sentence is that they don't spot opportunities to refactor and create abstractions unless very specifically directed. So, to make the most of the LLM's capabilities while accounting for its shortcomings, I decided to act way more like a product manager and chief architect at the outset of the project. I spent a significant amount of time focused on planning.

Plan, Plan, Plan

The first thing I did was create a press release that would get people excited about the functionality I wanted to bring to life. I gave ChatGPT the name of my app and a bulleted list of functionality and had it write up a quick blurb. I then took that blurb and fed it to an AI UI generator that spat out a decent representative React app. I can't even remember the product I used. That's how commoditized the space is already. Now that I had a press release and a mocked-up React app, I started to gather artifacts into a project directory on my laptop. I created a directory in the project directory called planning and dropped in what I had so far.

Next up, I created user stories that I would use to direct my robot overlords. I know from experience that good user stories make a development team run smoothly and create the best products. It turns out that communication is the hard part of almost anything we do. I created all the user stories to describe the functionality of my app. It was a tedious exercise describing everything that could possibly happen in the app in words, and I didn't nail it 100% on the first try. But that's OK. The main idea was to get the stories laid out in the order they needed to be executed and I could add or modify later stories. I created a YAML file for my user stories that looked something like this:

user stories:
  sign in:
    story: As a user I should be able to login with my phone number. When I do, I should receive a text with a one-time use code that will log me into the app.
    acceptance criteria:
      - I should be able to login with my phone number
      - I should receive a text with a one-time use code that will log me into the app
      - I should be able to login with the code in the text
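
If you want to sanity-check a stories file before handing it to an agent, a small typed model can help. What follows is only a minimal sketch of the shape shown above, not code from the project. The names `UserStory`, `UserStories`, and `storiesMissingCriteria` are illustrative, the `acceptanceCriteria` key is a camel-cased stand-in for the YAML's `acceptance criteria`, and it assumes the YAML has already been parsed into a plain object by whatever parser you prefer.

```typescript
// Hypothetical types mirroring the YAML layout above.
interface UserStory {
  story: string;
  // Stand-in for the YAML key "acceptance criteria".
  acceptanceCriteria: string[];
}

// Keyed by story label (e.g. "sign in"). String keys keep insertion order,
// so the order of entries can double as the order of execution.
type UserStories = { [label: string]: UserStory };

// Returns the labels of stories that have no acceptance criteria yet.
function storiesMissingCriteria(stories: UserStories): string[] {
  return Object.keys(stories).filter(
    (label) => stories[label].acceptanceCriteria.length === 0
  );
}

const stories: UserStories = {
  "sign in": {
    story:
      "As a user I should be able to login with my phone number. When I do, I should receive a text with a one-time use code that will log me into the app.",
    acceptanceCriteria: [
      "I should be able to login with my phone number",
      "I should receive a text with a one-time use code that will log me into the app",
      "I should be able to login with the code in the text",
    ],
  },
  // Made-up entry, included only to show what the check catches.
  "placeholder story": { story: "To be written", acceptanceCriteria: [] },
};

const incomplete = storiesMissingCriteria(stories);
```

A check like this is cheap to run before each prompt, and a story without acceptance criteria is usually a sign that the description isn't ready to hand off yet.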

The final bit of planning that I did was create another YAML file called architecture_decisions.yml. This is where I put the guardrails on the project as the engineer in charge. I believe this is what separates the output of this whole exercise from that of vibe-coders without a software development background. By giving the LLM this very specific technical direction as context for tasks - particularly early on - I was able to establish a much more manageable code base - certainly not as clean as I would produce, but WAY less of a rat's nest than I've had in previous attempts. Here's a quick flavor of what the file looked like:

technical design:
  tech stack:
    database: supabase
    api: supabase
    authentication: supabase
    authorization: supabase
    sms provider: twilio
    frontend: react native
    state management: jotai
    swiping gestures: react-native-reanimated
    base ui component library: mui
  data model:
    ...
  authorization rules:
    - ...
  technical direction:
    - reusable components
    - centralized theming
    - fetching lists from an API should be paged in chunks of 10
    - we avoid re-fetching data when possible (in other words, update local state when possible, and push changes to the database when necessary)
    - API interactions should be organized by resource
    - API interactions should only be invoked from Jotai atom definitions
    - All database changes should be managed through Supabase migrations
    - All code should have 90% test coverage
    - All code should be written in TypeScript
    - All code should be written in a functional style
    - All code should be written in a way that is easy to understand and maintain
    - All code should be written in a way that is easy to test
    - All code should be written in a way that is easy to deploy
    - All code should be written in a way that is easy to scale
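
To make two of those directions concrete (paging in chunks of 10, and updating local state rather than re-fetching), here is a rough TypeScript sketch. It is not code from the app: `Item`, `pageRange`, and `applyLocalUpdate` are hypothetical names, and it assumes a zero-based, inclusive row range, which is the shape supabase-js's `.range(from, to)` takes as far as I know.

```typescript
// Page size taken from the "paged in chunks of 10" direction.
const PAGE_SIZE = 10;

// Zero-based, inclusive row range for a given page index.
// Assumption: this is passed to something like supabase-js's `.range(from, to)`.
function pageRange(
  page: number,
  pageSize: number = PAGE_SIZE
): { from: number; to: number } {
  const from = page * pageSize;
  return { from, to: from + pageSize - 1 };
}

// Hypothetical list item; the real data model is elided in the file above.
interface Item {
  id: string;
  title: string;
}

// Update local state instead of re-fetching: swap in the changed item by id
// and return a new array, leaving the original untouched.
function applyLocalUpdate(items: readonly Item[], updated: Item): Item[] {
  return items.map((item) => (item.id === updated.id ? updated : item));
}

// Usage: rows 20 through 29 make up the third page.
const thirdPage = pageRange(2);

const localItems: Item[] = [
  { id: "a", title: "First" },
  { id: "b", title: "Second" },
];
const afterEdit = applyLocalUpdate(localItems, { id: "b", title: "Second (edited)" });
```

In a setup like the one described, helpers of this kind would be called from inside the Jotai atom definitions, so components never talk to the API directly.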

The main idea of this planning phase is that I treated the situation as I would any time I'm leading a team: set it up so developers can make good decisions independently. If a developer understands the big picture, has specific UX direction, and has internalized the "why" behind technical decisions, they have everything they need to evaluate possible choices and choose one that is congruent with the goals of the project. Interestingly, the concepts of good leadership remain constant regardless of who or what is doing the work.

OK, Now Code

I chose to use Cursor for my IDE. The specific IDE isn't terribly important. I paid the $20 to upgrade my account with Cursor so I could make it through the project without running into limits. The main idea is that I used an IDE that would take prompts with context and write code, using web browsing as necessary. The flow started with a prompt for a basic set-up of the app:

Keeping in mind the technical direction in `planning/architecture_decisions.yml`, create a basic React Native app. Use node 22 by executing `nvm use 22` before any other commands.

I then took a minute to set up a Supabase project and Twilio for authentication, configuring the two as needed for my project, and setting up a .env file. This is where being a software developer with experience makes the process much less clumsy and frustrating than it would be otherwise. If this step sounds like gibberish, I'm sure an LLM can direct you. I'm also sure it isn't likely to nail it on the first try.

Now that the basics for the app were set up, the flow was simple:

1. `git checkout -b feature-name`
2. Prompt Cursor with the context of the planning directory: Execute the user story labeled "feature name" keeping in mind the look and feel of the mock react app and the architecture decisions listed.
3. Grab some coffee or do some push-ups while the robot writes code.
4. Run the app and validate the functionality like any good product manager does. Check for UX improvements or lapses like any good designer does.
5. If necessary, prompt Cursor with "Refinements: ..." or "I got this error...".
6. Repeat until it's good or until it's apparent that it won't be good.
7. `git add . && git commit -m "some message"`, and so on and so forth.
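
Because the branch name and the prompt both hang off the story label, the repetitive part of that loop is easy to template. The TypeScript below is only a sketch under that assumption. `branchName` and `storyPrompt` are hypothetical helpers, not something Cursor provides, and the slug rule (lowercase, hyphen-separated) is just one reasonable convention.

```typescript
// Turn a story label such as "Sign In" into a branch-friendly slug ("sign-in").
function branchName(label: string): string {
  return label
    .trim()
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-") // collapse anything non-alphanumeric into one hyphen
    .replace(/^-+|-+$/g, ""); // drop leading and trailing hyphens
}

// Build the per-story prompt, using the wording from the flow above.
function storyPrompt(label: string): string {
  return (
    'Execute the user story labeled "' +
    label +
    '" keeping in mind the look and feel of the mock react app and the architecture decisions listed.'
  );
}

// Usage: derive both pieces for one story.
const label = "sign in";
const checkoutCommand = "git checkout -b " + branchName(label);
const prompt = storyPrompt(label);
```

Keeping the prompt wording in one place also means that when a phrasing tweak turns out to work better, every later story picks it up.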

One thing that I found challenging was, of course, when things didn't end up how I wanted and weren't getting better with iteration. On a couple of occasions I had to ditch the branch I was on, go concoct some wireframes with draw.io (or similar), add those to the planning directory, and start a new version of the feature branch with the extra context. That usually worked. In those moments I was frustrated that the extra steps were required. But it also just shows how a picture really is worth a thousand words and how important UX designers are in the process of making software.

Evaluating the Output

There are a couple of ways to evaluate the output of this exercise:

The user experience and functionality of the app: Objectively, the functionality of the app is great. The planning and explicit instructions paid off. The UX is mostly good, and it is relatively easy to add functionality with prompts and pictures.

The quality of the code: I would give the quality of the code a C-. It passes only because it's functional. The LLM was able to adhere to most of the direction, but I'd hardly call it clean code. There's a lot of headroom for DRY-ing things up and making the app generally easier to maintain.

I would say that overall, how "good" the outputs of this exercise were depends on your objective. If the goal is simply to get something out the door somewhat quicker and with fewer keystrokes, this resulted in great output. If the goal is to be able to pivot on a dime later, that's going to be more challenging.

Evaluating the Experience

The experience was interesting and educational. I'm learning to be more effective with the tools. As a craftsman, the resulting code hurts my heart, and I can see so many potential pitfalls and frustrating situations awaiting whoever wants to iterate on this app. It reminds me a little of the early days of the web with WYSIWYG drag-n-drop HTML generators. Perhaps this is also part of my learning curve as a user. I would also say that it's just not as fun to write an app this way because nothing about it is my own creative problem solving outside of occasionally re-directing an agent with different instructions. The only reason this exercise was at all successful is that I have a bunch of experience working with great product managers and UX-ers, and I channeled those people alongside my software development experience. But I chose to be a software developer, not a product manager. I actually enjoy trusting others on my team to take on those responsibilities and being part of their feedback loop. And I like when there are hard problems to solve that require creativity and persistence. Finally, the speed of development was only marginally faster, and that was only because of the level of detail in the planning phase.

Where does that leave us?

I've said it before, and I'll say it again: Use the right tool for the job. Developing in this way isn't good or bad. It's just another tool. I don't think that it should be considered the right tool for every job, but it certainly has broad application. For simple tasks and proofs of concept, it's awesome. For more complex work, it becomes less so, but still not useless. For extending unfamiliar code bases, it could be a time-saver, but it could also confidently lead you to a production outage.

The irony, to me, is that using the tool effectively in this exercise depended a LOT on my experience developing so many apps with great teams over 20+ years. So if young developers today lean on LLM-assisted coding heavily, they will never gain the experience that comes from going deep and grinding through the frustrating parts of creative problem-solving. It's a bit of a catch-22 of not-so-blissful ignorance. Maybe you could get the robots to bridge the experience gap, but that depends on an awareness of what "good" looks like.

I'm curious to see how this pattern changes the way we think about and manage the software development lifecycle and how it changes the roles and responsibilities of product development teams. I'm also curious to see how it affects the UX expectations of customers. I'd say we're pretty squarely in the "storming" phase of teaming up with the robots - at least I am. We'll get to harmonious productivity by necessity. Every tool and technique has a learning curve. The good ones and the well-marketed ones have a shiny-object adoption phase. My hope is that this post will help folks navigate this stage of the game a little more thoughtfully and gracefully.

* I use the term "LLM" instead of "AI" very much on purpose. I find the term "AI" to be misleading and borderline dangerous. I used it in the title of this post simply because it's a recognizable term. The tool is most certainly not intelligent. It's a handy tool that does some cool things with producing useful text pretty reliably. But it's definitely not thinking. The term "AI" is useful shorthand and great for marketing, but it implies much more value than is actually there.
