
AI Spec coded Python Flask Applications with and without prompt assistance


Spec coding

Like everyone else, I've been playing with LLMs as an app generation tool and, like most people, I have been disappointed. The code being spit out has not been very good. Like many others, I wanted to find ways to improve it.

Gemini.md

I stumbled upon this Medium article and then took a look at the Gemini CLI GitHub repo. Much like Claude, Cursor and GitHub Copilot, Gemini supports standing instructions. Just write up how you want your projects laid out, which language and libraries you want to use and how you want the LLM to prioritize things and, in theory, Gemini should build an app closer to your spec.

After using gemini-cli for a few weeks, I started to gather a few prompt sets from reading what others have done with Claude and Cursor. Laying them out in a markdown file was easy enough, but I was curious whether it had any real effect.

As a test, I decided to try the same prompt twice, once without a gemini markdown and once with. This is nowhere near a full and exhaustive test. It's just one LLM, one app, one framework, one prompt, one style of app and a single gemini.md file. 

The Prompt

The idea behind Spec coding is that if you can write user stories with functional and non-functional requirements you should be able to prompt an LLM to create an app for you.

Create a library tracking app using flask.

It should track three things, users who have a name, email address and phone number, books which have a title, publish date, author and isbn number and it should track checkouts of books by saving the date the book was checked out, who checked it out and when the book is due (10 days after checkout).

There should be CRUD forms for the users and books.

There should be a checkout form that lets your add a book id and user id to mark a book as checked out. it should not be possible to check out a book that is currently checked out.

There should be a return form to marking a book as returned. it should not be possible to return a book that is not checked out.

It should be possible to list all books that are overdue, what users have books and overdue books.

It should also be possible to list books that have the most and the fewest checkouts.

There should be a pre-loaded set of 10 books in the books table. Use some common books.  
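The rules buried in that prompt (a 10-day loan period, no double checkouts, no returning a book that isn't out, overdue and popularity reports) are small enough to pin down in plain Python before any Flask code exists. The following is a minimal sketch of those rules only, not the code Gemini generated; the `Checkout` and `Library` classes and all of their method names are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass, field
from datetime import date, timedelta
from typing import Optional

LOAN_DAYS = 10  # from the prompt: a book is due 10 days after checkout


@dataclass
class Checkout:
    book_id: int
    user_id: int
    checked_out: date
    returned: Optional[date] = None  # None while the book is still out

    @property
    def due(self) -> date:
        return self.checked_out + timedelta(days=LOAN_DAYS)

    def is_overdue(self, today: date) -> bool:
        # Only books that are still out can be overdue.
        return self.returned is None and today > self.due


@dataclass
class Library:
    checkouts: list = field(default_factory=list)

    def _open_checkout(self, book_id: int) -> Optional[Checkout]:
        # Find the checkout record for a book that has not been returned yet.
        for c in self.checkouts:
            if c.book_id == book_id and c.returned is None:
                return c
        return None

    def check_out(self, book_id: int, user_id: int, today: date) -> Checkout:
        if self._open_checkout(book_id) is not None:
            raise ValueError(f"book {book_id} is already checked out")
        checkout = Checkout(book_id, user_id, today)
        self.checkouts.append(checkout)
        return checkout

    def return_book(self, book_id: int, today: date) -> None:
        checkout = self._open_checkout(book_id)
        if checkout is None:
            raise ValueError(f"book {book_id} is not checked out")
        checkout.returned = today

    def overdue(self, today: date) -> list:
        return [c for c in self.checkouts if c.is_overdue(today)]

    def checkout_counts(self) -> Counter:
        # Books that were never checked out do not appear in the counter.
        return Counter(c.book_id for c in self.checkouts)
```

Writing the rules down this way also gives you something to compare the generated app against: each `ValueError` branch corresponds to one "it should not be possible" sentence in the prompt.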

Without prompt assistance

The version without a gemini.md did not work after generation and was a mess to deal with. Gemini was unable to repair the template issues. The app.py included models, routes and utility functions. Everything was in the application root directory with the exception of the HTML templates. The configuration for the database is in the main file. After loading it into an IDE and running static analysis and style checks, I wasn't impressed at all. It is a Flask app, but it isn't a good Flask app. Having an import for the forms in the middle of the app.py file was particularly bad.

With prompt assistance

Gemini did not provide a working example for this test on the first shot. It took two additional prompts to fix problems.

the endpoint "/user_checkouts" has an error "jinja2.exceptions.UndefinedError: 'now' is undefined" please diagnose and fix
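This error generally means a template refers to `now` but nothing placed it in the template context. The fix Gemini applied is not reproduced here; one common approach, sketched below on the assumption that the template calls `now()`, is a Flask context processor that makes `now` available to every template. The route body and the inline template string are hypothetical stand-ins for the real ones.

```python
from datetime import datetime, timezone

from flask import Flask, render_template_string

app = Flask(__name__)


@app.context_processor
def inject_now():
    # Expose a callable so each render gets the current time,
    # not the time the app started.
    return {"now": lambda: datetime.now(timezone.utc)}


@app.route("/user_checkouts")
def user_checkouts():
    # Hypothetical inline template; the real app would render an HTML file.
    return render_template_string(
        "Report generated on {{ now().strftime('%Y-%m-%d') }}"
    )
```

Passing `now=...` directly to the render call in that one route would also work; the context processor just avoids repeating it in every view that compares due dates against the current time.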

the endpoint "/book_popularity" has an error "TypeError: '<' not supported between instances of 'method-wrapper' and 'method-wrapper'" please diagnose and fix
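A `TypeError` that mentions `method-wrapper` usually means a sort is comparing bound methods that were referenced but never called. The actual cause in the generated app is not shown here; the sketch below reproduces the same message in plain Python and shows the corrected sort. The `Book` class and sample data are hypothetical.

```python
class Book:
    def __init__(self, title, checkouts):
        self.title = title
        self.checkouts = checkouts  # list of checkout records for this book


books = [
    Book("Dune", ["c1", "c2", "c3"]),
    Book("Emma", ["c4"]),
    Book("Ulysses", []),
]


def broken_sort(items):
    # Bug: __len__ is referenced but not called, so every sort key is a
    # bound method-wrapper object, and those cannot be compared with "<".
    return sorted(items, key=lambda b: b.checkouts.__len__)


def by_popularity(items):
    # Fix: call len() so the sort key is an integer count.
    return sorted(items, key=lambda b: len(b.checkouts), reverse=True)
```

Calling `broken_sort(books)` raises the `TypeError` quoted in the prompt, while `by_popularity(books)` orders the books from most to fewest checkouts.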

With those two additional prompts, Gemini was able to diagnose and fix both errors. With that done, I had a working app: a database-driven web app built from a spec in less than ten minutes, with the agent fixing its own bugs.

What does the code look like? Better. It follows PEP 8. Models, routes and utilities are separated into different files. There is a config.py file. Docstrings are missing, but those are easy enough to fill in as you test. There are no unit tests, but I refuse to let AI write my tests anyway. If I were to work on this app, I would add pytest tests as I went along and manually refactored and updated the app. Most importantly, it manages dependencies via uv and pyproject.toml, so I could easily add dev-time dependencies for testing and deployment image creation. It didn't use pydantic for validation and there are no type hints for functions, but overall it wasn't bad, as it used Flask and db.Model features for type checking.
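To make the structural difference concrete, here is a minimal sketch of the kind of separation described above, collapsed into one block so it runs as-is. It is a generic Flask application-factory layout, not the generated code: the `Config` values, the `bp` blueprint, the `/health` route and `create_app` are all hypothetical names, and the comments mark which file each piece would typically live in.

```python
from flask import Blueprint, Flask, jsonify


# config.py: settings live in one place instead of inside app.py
class Config:
    SQLALCHEMY_DATABASE_URI = "sqlite:///library.db"  # hypothetical value
    LOAN_DAYS = 10


# routes.py: views are grouped on a blueprint, separate from app setup
bp = Blueprint("library", __name__)


@bp.route("/health")
def health():
    return jsonify(status="ok")


# app.py: the factory only wires configuration and blueprints together
def create_app(config_object=Config):
    app = Flask(__name__)
    app.config.from_object(config_object)
    app.register_blueprint(bp)
    return app
```

A factory like this is also what makes hand-written pytest tests straightforward later: a test can call `create_app()` with a test configuration and use `app.test_client()` without touching the real database settings.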

One little thing that impressed me was the seed.py file to populate the database with test data. That was a nice touch.

You can see the code here. A copy of the gemini.md file I used is in library2.

Final thoughts

While I did get it to work and spit out a good starting point for an app, it doesn't seem to be that much better an approach than Airtable, AppSheet or similar no-code tooling for database apps.


