AI Spec coded Python Flask Applications with and without prompt assistance


Spec coding

Like everyone else, I've been playing with LLMs as an app generation tool and, like most people, I have been disappointed. The code being spit out has not been very good. And like many others, I was interested in finding ways to improve code creation.

Gemini.md

I stumbled upon this Medium article and then took a look at the Gemini CLI GitHub repo. Much like Claude, Cursor and GitHub Copilot, Gemini can be given standing instructions. Just write up how you want your projects laid out, which language and libraries you want to use and how you want the LLM to prioritize things and, in theory, Gemini should build an app closer to your spec.

After using gemini-cli for a few weeks, I started to gather a few prompt sets from reading what others have done with Claude and Cursor. Laying them out in a markdown file was easy enough, but I was curious whether it had any real effect.

As a test, I decided to try the same prompt twice, once without a gemini.md file and once with one. This is nowhere near a full and exhaustive test. It's just one LLM, one app, one framework, one prompt, one style of app and a single gemini.md file.

The Prompt

The idea behind Spec coding is that if you can write user stories with functional and non-functional requirements you should be able to prompt an LLM to create an app for you.

Create a library tracking app using flask.

It should track three things, users who have a name, email address and phone number, books which have a title, publish date, author and isbn number and it should track checkouts of books by saving the date the book was checked out, who checked it out and when the book is due (10 days after checkout).

There should be CRUD forms for the users and books.

There should be a checkout form that lets your add a book id and user id to mark a book as checked out. it should not be possible to check out a book that is currently checked out.

There should be a return form to marking a book as returned. it should not be possible to return a book that is not checked out.

It should be possible to list all books that are overdue, what users have books and overdue books.

It should also be possible to list books that have the most and the fewest checkouts.

There should be a pre-loaded set of 10 books in the books table. Use some common books.  
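To make the rules buried in that prompt concrete, here is a minimal sketch of the checkout logic in plain Python, with no Flask or database involved. It is not the code Gemini produced. The names Library, Checkout and LOAN_DAYS are hypothetical, and treating a book as overdue only once the due date has passed is my own assumption, since the prompt doesn't say.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Dict, Iterable, List, Optional

LOAN_DAYS = 10  # from the prompt: a book is due 10 days after checkout


@dataclass
class Checkout:
    book_id: int
    user_id: int
    checked_out: date
    due: date
    returned: Optional[date] = None  # None means the book is still out


class Library:
    """Hypothetical in-memory stand-in for the checkouts table."""

    def __init__(self) -> None:
        self.checkouts: List[Checkout] = []

    def _open_checkout(self, book_id: int) -> Optional[Checkout]:
        # Find the checkout for this book that has not been returned yet.
        for checkout in self.checkouts:
            if checkout.book_id == book_id and checkout.returned is None:
                return checkout
        return None

    def check_out(self, book_id: int, user_id: int, today: date) -> Checkout:
        # Rule: a book that is currently checked out cannot be checked out.
        if self._open_checkout(book_id) is not None:
            raise ValueError(f"book {book_id} is already checked out")
        due = today + timedelta(days=LOAN_DAYS)
        checkout = Checkout(book_id, user_id, today, due)
        self.checkouts.append(checkout)
        return checkout

    def return_book(self, book_id: int, today: date) -> Checkout:
        # Rule: a book that is not checked out cannot be returned.
        checkout = self._open_checkout(book_id)
        if checkout is None:
            raise ValueError(f"book {book_id} is not checked out")
        checkout.returned = today
        return checkout

    def overdue(self, today: date) -> List[Checkout]:
        # Assumption: overdue means still out and strictly past the due date.
        return [c for c in self.checkouts
                if c.returned is None and c.due < today]

    def checkout_counts(self, book_ids: Iterable[int]) -> Dict[int, int]:
        # Include books with zero checkouts so "fewest" is meaningful.
        counts = Counter(c.book_id for c in self.checkouts)
        return {book_id: counts[book_id] for book_id in book_ids}


demo = Library()
first = demo.check_out(book_id=1, user_id=7, today=date(2025, 1, 1))
print(first.due)  # 2025-01-11
```

Passing today in explicitly, instead of calling date.today() inside the methods, keeps the date rules easy to test.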

Without prompt assistance

The version without a gemini.md did not work after generation and was a mess to deal with. Gemini was unable to repair the template issues. The app.py file included models, routes and utility functions. Everything was in the application root directory with the exception of the HTML templates. The database configuration is in the main file. After loading it into an IDE and running static analysis and style checks, I wasn't impressed at all. It is a Flask app, but it isn't a good Flask app. Having an import for the forms in the middle of the app.py file was particularly bad.

With prompt assistance

Gemini did not provide a working example for this test on the first shot. It took two additional prompts to fix problems.

the endpoint "/user_checkouts" has an error "jinja2.exceptions.UndefinedError: 'now' is undefined" please diagnose and fix

the endpoint "/book_popularity" has an error "TypeError: '<' not supported between instances of 'method-wrapper' and 'method-wrapper'" please diagnose and fix

With those two additional prompts, Gemini was able to diagnose and fix both errors. With that done, I had a working app: a working database-driven web app from a spec in less than ten minutes, with the agent fixing its own bugs.
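I didn't keep the broken version of the popularity code, so this is only a guess at the kind of bug behind the second error. One way to get that exact TypeError in plain Python is to sort on a method that was referenced but never called. The book data and function names below are placeholders, not what Gemini generated.

```python
# Placeholder data: each book carries a list of its checkout ids.
books = [
    {"title": "Book A", "checkouts": [1, 2, 3]},
    {"title": "Book B", "checkouts": [4]},
    {"title": "Book C", "checkouts": [5, 6]},
]


def by_popularity_buggy(items):
    # Bug: __len__ is referenced but not called, so every sort key is a
    # method-wrapper object and sorted() has nothing it can compare.
    return sorted(items, key=lambda b: b["checkouts"].__len__, reverse=True)


def by_popularity(items):
    # Fix: call len() so the sort key is an integer.
    return sorted(items, key=lambda b: len(b["checkouts"]), reverse=True)


try:
    by_popularity_buggy(books)
    error_message = ""
except TypeError as exc:
    error_message = str(exc)

print(error_message)
print([b["title"] for b in by_popularity(books)])
```

If the generated code had this shape, the fix is a one-line change, which is about the size of fix an agent tends to handle well once it is handed the traceback.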

What does the code look like? Better. It follows PEP 8. Models, routes and utilities are separated into different files. There is a config.py file. Docstrings are missing, but those are easy enough to fill in as you test. There are no unit tests, but I refuse to let AI write my tests anyway. If I were to work on this app, I would add pytest tests as I went along, manually refactoring and updating the app. Most importantly, it manages dependencies via uv and pyproject.toml, so I could easily add dev-time dependencies for testing and deployment image creation. It didn't use pydantic for validation and there are no type hints on functions but, overall, it wasn't bad, as it used Flask and db.Model features for type checking.
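For anyone who hasn't split configuration out of a Flask app before, a config.py along these lines is what I mean. This is a sketch rather than the generated file. It assumes the app uses Flask-SQLAlchemy, and the environment variable names, the SQLite path and the fallback secret are all placeholders of mine.

```python
import os


class Config:
    """Hypothetical settings object, kept separate from app.py."""

    # Read secrets from the environment; the fallback is for local use only.
    SECRET_KEY = os.environ.get("SECRET_KEY", "dev-only-change-me")

    # Flask-SQLAlchemy reads its connection string from this key.
    # The default SQLite file name here is a placeholder.
    SQLALCHEMY_DATABASE_URI = os.environ.get(
        "DATABASE_URL", "sqlite:///library.db"
    )

    # Turns off Flask-SQLAlchemy's modification tracking signals.
    SQLALCHEMY_TRACK_MODIFICATIONS = False
```

The app would then load it with app.config.from_object(Config), which keeps the database settings out of the main file.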

One little thing that impressed me was the seed.py file to populate the database with test data. That was a nice touch.

You can see the code here. A copy of the gemini.md file I used is in library2.

Final thoughts

While I did get it to work and spit out a good starting point for an app, it doesn't seem to be that much better an approach than Airtable, AppSheet or similar no-code tooling for database apps.


