A Structured Approach For Generic AI Codegen

#ai

Both professionally and personally, I’ve had to evaluate a number of different AI agents. Some have robust capabilities that require a deep understanding of all of their features to use them efficiently. Some are buried in buzzwords that demand layers of learning before you become effective. Most importantly, I’ve had contacts and coworkers who were generally underwhelmed by AI agents: they experienced misdelivery and a net-negative time savings through continuous correction and redirection. That experience is the one I aim to counteract here.

If you’re a developer who is still early in your Agentic AI journey and just want to know how to make your agent, any agent, get you off the ground and make your job more effective – today – this modest guide is for you.

A Simple Formula

Your AI agent can work and think faster than you, but on a long enough timeline it is fundamentally not capable of anything that a human is not.

In a way, your agent is a thinking thing. Like you, it works better when you can structure things. It thinks more clearly when you scope a problem properly. It works most effectively in stages. Don’t abandon the strategies you have for human planning and collaboration. Instead, use the AI to accelerate them.

This simple strategy has massively improved my effectiveness when working with AI agents, and it’s useful even with simplistic agentic interfaces that have limited ability to discover and “understand” a codebase.

  1. Write your intended system design in natural language, and save it as a Markdown file.
    • State the intent of the change, and how it fits into the current system.
    • Describe the technical requirements and constraints.
    • Describe the desired behavior of the system.
    • Include technical constraints if they exist, but avoid expressing them in a way that constrains the implementation unnecessarily. For example, prefer “Foo and Bar events should be emitted sequentially” over “Use a Mutex to serialize access to Foo and Bar”.
  2. Provide this file to your agent and ask it politely to write an implementation plan.
    • e.g. “Please review the design document @feature-foo-reqs.md and provide an implementation plan for this change applied against the current codebase, expressed as a Markdown file ‘./feature-foo-impl.md’. If any part of it is too unclear, leave TODOs in the doc and ask me follow-up questions.”
  3. Review that plan, and make any necessary adjustments.
    • For minor corrections, edit the file by yourself.
    • If major corrections are needed, make improvements to your requirements document to correct mistakes or add clarification or constraints. Then start a NEW chat and repeat the request.
    • For corrections to the plan beyond modest complexity, it is generally more effective to improve your source material and try again than to ask for adjustments. The longer your conversation gets, the less efficient it will be, and the more the agent will wrestle with the previous incorrect conclusions and context while trying to address your new corrected ones.
    • Repeat as needed until the plan is acceptable.
  4. Start a new discussion and ask the agent to implement the plan.
  5. Review the generated implementation. If the output looks good, great, you win! Otherwise, return to step 3 and continue refining.
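As a concrete illustration of step 1, a requirements file might look like the following. The feature, names, and numbers here are entirely hypothetical; the point is the shape: intent, behavioral requirements, and constraints that avoid dictating the implementation.

```markdown
# Feature: Retry Queue for Outbound Webhooks  <!-- hypothetical example -->

## Intent
Failed webhook deliveries are currently dropped. Add a retry queue so
transient receiver outages don't lose events.

## Requirements
- Failed deliveries must be retried with exponential backoff.
- Retries must preserve the original delivery order per endpoint.
- Permanently failed events (after N attempts) go to a dead-letter log.

## Constraints
- Must not block the main delivery loop.
- Retry state must survive a process restart.

## Desired behavior
A receiver that is down for five minutes receives every missed event,
in order, shortly after it recovers.
```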

If you try to skip from 1 to 4, you will almost always end up with an unsatisfactory result where you either end up arguing with the bot, or you have to iteratively ask for corrections, with progressively diminishing effectiveness as you flood the agent’s context window with more details and ideas that conflict with its previous conclusions.

Benefits

This approach gives you multiple benefits beyond pure conversational generation:

  1. You declare what matters most. It fills in the details to get there.
  2. Iterating on the plan is much easier and more reliable than iterating on code.
    • You can always throw away the implementation and try again if you missed something important.
    • Arguing with and correcting the AI will only result in progressively worse results as your conversation continues. You control the context here; keep things focused.
  3. Plan execution is a different phase with different context than plan creation, keeping context limited.
    • That means your agent can execute with a “clear mind”, focusing on how to achieve what was already agreed, rather than being distracted by everything you discussed, refined, and discarded.
    • (It’s like mapping everything out the night before, and executing in the morning when you’ve had some sleep.)
  4. The plan can be kept for future reference on what was built and why.
  5. The plan can be re-executed at a later date if needed.
  6. You can use different models or even different agents for planning and implementation phases, leveraging each to maximum strength.

The advantage of working from these intermediate files is that you can review and correct the agent’s interpretations. Turning your fuzzy design into a concrete plan happens rapidly, and once you’re used to the rhythm, the quality of the yield is consistently high.

But First: Seed your Context

Your agent will never be effective nor efficient if it has to rediscover your project with every conversation. Your project’s README.md is the starting point for humans. Your AI memory file is the starting point for agents.

Claude Code uses CLAUDE.md

If using Claude Code, run /init and let Claude discover everything it needs to create a CLAUDE.md. Claude will continue to maintain this file indefinitely as you work, keeping a memory and map of what it needs to know.

Other Agents use AGENTS.md

If using most other agents, ask yours politely to refer to https://agents.md and generate an initial AGENTS.md file. Past that point, you’ll have to maintain it by hand.

Whichever file your agent uses, it should lay out the basic technical information the agent needs to navigate, such as:

  • What kind of project this is
  • Project purpose and scope
  • Key folder structures and component locations
  • Component functional relationships
  • Code conventions and similar constraints
  • How to locate and run tests

Keep it brief. Keep it factual. This is your agent’s “memory”, but it’s one that you can review, modify, maintain, and refine. If mistakes are made, and necessary details are missing, update that memory so it has a smoother time next time. Set your agent up for success.
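To make that concrete, here is a minimal sketch of such a memory file for a hypothetical Python web service. Every name, path, and tool below is illustrative; the structure is what matters.

```markdown
# AGENTS.md  <!-- illustrative sketch; adapt to your project -->

Python 3.11 web service that ingests sensor readings and exposes a REST API.

## Layout
- `app/api/`: HTTP route handlers (FastAPI)
- `app/ingest/`: sensor polling and queueing
- `app/storage/`: all database access lives here
- `tests/`: pytest suite, mirrors the `app/` tree

## Conventions
- Type hints everywhere; `ruff` for lint and formatting.
- No direct DB access outside `app/storage/`.

## Tests
Run `pytest -q` from the repo root.
Integration tests require `docker compose up db` first.
```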

Q & A

Q: How does this differ from vibe coding?

A: Generally with vibe coding, you’re creating an implementation directly from ideation. That can work great for rapid prototyping, but you often end up going in circles without clear validation loops. By creating a document, you have a clear point of reference and something you can validate against and iterate upon.

Q: Aren’t you describing spec-driven development?

A: Pretty much, though much less formally. This is a practical approach I’ve been employing for some time now, before I had even encountered that term. I’m starting to explore github/spec-kit, but that’s a heavy formalization of what’s ultimately a very simple and generic concept, which I’m trying to capture here.

Example

This branch of gsprdev/pi-voice-keyboard was created specifically to illustrate this approach.

Here is a trimmed version of the corresponding conversation.

NOTE: I made no attempt to constrain or structure my language for the sake of sharing it here. This is a real, honest interaction, imperfections and all. The bi-directional nature of the discussion and negotiation is important. Where I miss little details in my requirements, it asks for clarification. I find small mistakes it made. This happens rapidly, with the effortlessness which comes from conversation with a technically-minded peer.

ME

The following is an informal design and requirements for a feature to be added. Please formalize this into an implementation plan as one or more markdown files. If any parts are unclear, specify [TODO: CLARIFY] in the document(s) as a placeholder, and ask for clarification.

Currently, the Push to Talk service only accepts a single transcription server in its configuration. This feature would add support for specifying multiple servers in order of preference. Let’s add functionality to accept multiple possible servers to communicate with, and use the health endpoint to check for their uptime. The list of servers should be ordered such that the user can specify them in the order of preference. The current inline health check at the start of recording is of limited value. We want to monitor proactively and select the right server before that event is reached. Access to and selection of a particular server should be gated by this new dedicated health monitor. The servers should be monitored for health at a particular interval, and that interval should also be configurable. The default interval should be 10 seconds with a timeout of 200 milliseconds. The monitor should run in parallel to the actual event handling. Should any given transcription service be unavailable for the time that transcription services are required, the next server in preference should be used. The idea of this feature is to allow for multiple possible services in the local network to be utilized when available, with slower fallbacks activated. On a longer timeline, this will help support the goal of falling back to local, albeit slower and lower quality, transcription services local to the device when all else fails.

CLAUDE

Open questions (TODOs requiring clarification)

  1. Backward compat for PTT_SERVICE_URL — rename or support both?
  2. Startup blocking — block until first healthy server found, or start immediately with _selected = None?
  3. No-server behavior in start_recording() — skip recording, or record and fail later?
  4. Transcription fallback granularity — try all remaining healthy servers, or just one next?

ME

Good questions.

  1. Don’t worry about backward compatibility on the configuration points. We control the default deployment environment.
  2. Don’t block startup, unavailability of the transcription service at service start should be functionally equivalent to later unavailability when a remote service goes down.
  3. Proceeding through the error condition into recording is actually a bug that you found. Let’s be sure to terminate in that case. Also, it’s clear that the error condition fires too frequently and should be spaced. While we’re in here making these adjustments, let’s add a gap of 100 milliseconds in between each blink/beep.
  4. I agree that greedy retry through all currently known-healthy servers would be best, to minimize the chance of losing a reporting event. In the case that all fallbacks fail, issue a two pulse error code as opposed to the triple that you get when a server is unavailable.

CLAUDE

(Produced a new version of the plan)

Ready to implement whenever you are.

ME

I’ve manually reviewed your plan and have some feedback.

Please reject startup initiation in the case of malformed URLs.

Why is active_server tracked during start_recording()? I understand the idea of a quick sanity check to make sure that there’s at least one active server before you start recording. But the correct server to use after recording finishes is a separate matter. Are we really getting value from external state across these events?

Also, while the design seems functional, I’m not thrilled with the idea of tracking servers’ statuses through multiple dictionaries. That seems like it would be difficult to maintain and extend. Can we consider putting them in a list and using an accessor to access the latest healthy one whenever needed?

Finally, it would be best if the server monitors didn’t run on a single thread, but each spawned their own. The number of possible endpoints, and therefore the number of monitor threads, is known at start time. Let’s run them independently so that if one of them is delayed or otherwise has complications, the other servers can still be recognized as healthy and recover. This is important to minimize the impact of an unhealthy server on the healthy ones.

CLAUDE

(Makes major improvements to the plan)

ME

Looks great. Please proceed with implementation.

CLAUDE

(Nails it)


See https://github.com/gsprdev/pi-voice-keyboard/compare/main...example/multi-service for the plan document and the resulting implementation generated by Claude within my existing project.
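For readers who want a feel for the threading model negotiated in the conversation above, here is a minimal sketch in Python of a per-server health monitor: one independent monitor thread per server, a preference-ordered list, and a single accessor. This is not the actual pi-voice-keyboard implementation (see the link above for that); the `/health` endpoint path and all names here are assumptions for illustration.

```python
import threading
from urllib.request import urlopen
from urllib.error import URLError


class HealthMonitor:
    """One monitor thread per server; servers are listed in preference order."""

    def __init__(self, servers, interval=10.0, timeout=0.2):
        # Defaults mirror the design discussion: 10 s interval, 200 ms timeout.
        self._servers = list(servers)            # preference order
        self._healthy = [False] * len(servers)   # status flags, parallel list
        self._interval = interval
        self._timeout = timeout
        self._lock = threading.Lock()
        self._stop = threading.Event()

    def _probe(self, url):
        # Assumed /health endpoint; any 2xx response counts as healthy.
        try:
            with urlopen(f"{url}/health", timeout=self._timeout) as resp:
                return 200 <= resp.status < 300
        except (URLError, OSError):
            return False

    def _monitor(self, index):
        # Independent loop per server: one slow or hung probe cannot
        # delay health updates for the other servers.
        while not self._stop.is_set():
            ok = self._probe(self._servers[index])
            with self._lock:
                self._healthy[index] = ok
            self._stop.wait(self._interval)

    def start(self):
        for i in range(len(self._servers)):
            threading.Thread(target=self._monitor, args=(i,), daemon=True).start()

    def stop(self):
        self._stop.set()

    def healthy(self):
        """Return the most-preferred currently-healthy server URL, or None."""
        with self._lock:
            for url, ok in zip(self._servers, self._healthy):
                if ok:
                    return url
            return None
```

Callers check `healthy()` at the moment a server is actually needed, which is the list-plus-accessor shape requested in the review, rather than multiple status dictionaries or state carried across events.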