English language as the next programming language?

2025-06-06

Why natural language fails as code and what we need instead

Nearly three years after ChatGPT's launch, I can't imagine coding without AI assistance. The productivity boost from intelligent auto-completion that predicts my next 3-5 actions is transformative. But while most engineers agree on the value of LLMs for coding, I'm interested in something deeper: how we think about using LLMs in software development, at scale, in a predictable and effective way.

I'm confident that most engineers actively writing code these days would agree that this technology is an all-around winner (ok, there are a few who probably won't agree, and that's ok too). But that's not what this blog post is about. The debate on using LLMs in programming will continue for ages to come, just like there are still people to this day who don't think JavaScript or Python are "real" programming languages, or who balk at IntelliSense, priding themselves on memorizing every method signature and reciting Kubernetes deployment configurations by heart.

Organized Abstraction

When I write a program, the code establishes certain laws. For example, I might have a function called fibonacci that accepts a single argument, which must be a non-negative number, and returns an array of the Fibonacci numbers starting from 1 up until that number. This is predictable and deterministic. And best of all, it's pretty readable for humans, especially if you're using a high-level language such as Python or JavaScript.
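As a rough sketch of those "laws" in code, a Python version of that function might look something like this (the exact signature and behavior are illustrative, not a spec):

```python
def fibonacci(n: int) -> list[int]:
    """Return the Fibonacci numbers starting from 1, up to and including n."""
    if n < 0:
        raise ValueError("n must be a non-negative number")
    sequence = []
    a, b = 1, 1
    while a <= n:
        sequence.append(a)
        a, b = b, a + b
    return sequence
```

Call `fibonacci(20)` and you get `[1, 1, 2, 3, 5, 8, 13]`, every time, on every machine that runs it.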

This is a powerful concept! We take a sequence of symbols that a human can understand with minimal effort, and that sequence, which represents some functionality, can be transferred to other humans. They can read it and understand it too! But what if you give this code as-is to a machine? Well, the machine can't really do much with it. That's where compilers and interpreters come in. These are mechanisms for taking this beautiful, human-readable code and putting it into a form that makes sense to computers: machine code! Now, this is obviously an oversimplification, and we won't go into all the nuanced steps involved in transforming Python or JavaScript code into machine code, but for our purposes it will do.
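If you want a peek at what that lower-level form looks like for Python specifically, the standard library's dis module will print the bytecode that the CPython interpreter actually executes:

```python
import dis

def add(a, b):
    return a + b

# Prints the bytecode instructions CPython runs for this function,
# e.g. LOAD_FAST for each argument, an add opcode, then RETURN_VALUE
# (the exact opcodes vary between Python versions).
dis.dis(add)
```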

So let's take a step back. We have a high-level language that serves as a means for humans to organize, share, and maintain their code, defining the "laws" of their program. We can even put several of these programs together and have them intermingle, using each other's functionality as an API, with high confidence that everything will operate as expected and as defined in the docs. This scales up into massive programs, at times millions of lines of code, operating with pretty high reliability. It truly is a beautiful concept.

So now that we have LLMs which can take human language and turn it into working code, we can use the same human language that we use to order a burrito at Chipotle as that higher level abstraction for our applications, right!?

Well, not really...

So why does this matter in relation to LLMs?

Writing code with an insanely smart LLM by your side can be great, but sometimes it can make you pull your hair out in frustration. What's the root of that frustration? Well, sometimes the LLM goes off on wild tangents, completing the task it was asked to do but in a completely incorrect way. For example, say you're writing a function that takes an integer and generates the Fibonacci sequence for that integer. More often than not, the LLM will literally re-implement the logic inside the current function instead of looking for an existing implementation in the code base and deciding whether it's appropriate to reuse it. Most of the time, you have to explicitly point it to where that implementation lives. Another example: you ask the LLM to troubleshoot why a unit test is failing, and sometimes it will go and change the actual implementation code just to make the test pass.
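To make that first failure mode concrete, here's a small sketch (the function names are made up for illustration) of the difference between what the LLM tends to do and what you actually wanted:

```python
# Pretend this helper already lives in a shared module somewhere in the code base
# (it's essentially the fibonacci function sketched earlier).
def fibonacci(n: int) -> list[int]:
    sequence, a, b = [], 1, 1
    while a <= n:
        sequence.append(a)
        a, b = b, a + b
    return sequence


# What the LLM often does: quietly re-implement the same logic inline.
def fibonacci_report_duplicated(n: int) -> str:
    sequence, a, b = [], 1, 1
    while a <= n:
        sequence.append(a)
        a, b = b, a + b
    return ", ".join(str(x) for x in sequence)


# What you actually wanted: reuse the implementation that already exists.
def fibonacci_report(n: int) -> str:
    return ", ".join(str(x) for x in fibonacci(n))
```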

This is why I strongly believe that English (or any other human language, for that matter) isn't a sufficient specification to guide an LLM. The thought that anyone with an idea and a ChatGPT subscription is going to be able to put together a full-stack application and turn it into a deliverable product is a fallacy. Human language is just too fuzzy, not made for the level of specificity that building software requires. This is also why I believe that talk of human replacement in the software development lifecycle is way overblown. Do I think that humans are superior to LLMs at writing software? No, absolutely not! Claude Opus 4 can write better code than most people I know, and it can compete with the best of the best in coding competitions. But I also don't believe that LLMs alone are the future. The reason is that what's better than just a human or just an LLM is a human who has honed the skill of working with an LLM: taming it, having it bend to the will of the engineer in the driver's seat, and having it carry out tasks with extreme precision, autonomously, while coming back to request feedback or confirmation before proceeding further. That's the future I see multiplying a single human's efforts by 10X.

But how do we achieve that perfect blend of peanut butter and chocolate, where human and digital labor intermingle to produce these unbelievable productivity gains? LLMs require a lot of guidance in order to be useful, and that's the root of a lot of the frustration. Imagine writing Python code and having that code converted to machine code in a non-deterministic manner. You would never be able to predict with high confidence that your code will execute the way you intended. Forget about sharing it with others, or even maintaining it. Scaling it is out of the question. So the question is: how do we effectively guide the LLM without having to type out our guidance every single time? How can we help the LLM understand the boundaries of the task, those "laws" I mentioned earlier that we want it to stick to? Tools like Cursor and Windsurf seem to do a better job than most at deriving your intent and fully grokking your code base, but they still fall short many times. Even the best tools on the market go off on tangents and need to be tamed and brought back on track, hopefully before too much damage has been caused.

Taming the LLMs

I've found that mechanisms such as the .cursorrules file can be a very effective way of providing guidance to the LLM. This is a file that is ingested by the LLM every time you initiate a session. It's helpful because it allows me to define certain "laws" which make the LLM's output more deterministic. I've been experimenting with this for a while now, and I've found that it works great for implementing coding heuristics, for example nudging the LLM towards only using functional React components, or conversely forcing it to use class components if your project has established that as a standard. I've also been experimenting with creating an outline of all the directories in my projects along with an LLM-friendly description of each, almost as a way of showing the LLM around the place before it starts working. For example, in a monorepo, I'll clarify that the backend package holds my backend code, which is written in Python using FastAPI, and I'll specify the entry point as well as the package manager I'm using. It's similar to how I might speak to a junior engineer onboarding onto a project.
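For illustration, a trimmed-down version of the kind of rules file I'm describing might look something like the sketch below (the directory names, entry point, and package managers are placeholders, not recommendations):

```markdown
# Project rules

## Coding conventions
- Use functional React components only; do not introduce class components.

## Repository layout
- `packages/backend`: Python backend built with FastAPI.
  - Entry point: `packages/backend/app/main.py`
  - Package manager: `uv`
- `packages/frontend`: React + TypeScript frontend.
  - Package manager: `pnpm`

## Workflow
- Before implementing new logic, check whether an existing helper already covers it.
- Ask for confirmation before changing files outside the scope of the task.
```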

Now, although this has proven to be effective, my concern is that, to my knowledge, it isn't being standardized in any way. Quite the opposite, actually: every LLM-powered tool seems to be implementing its own version, some recommending markdown, others going as far as recommending writing these "rules" in YAML (yikes). If LLMs are here to stay, and we want them to be as effective as possible, this seems like something that should be standardized as soon as possible. These critical files that guide the LLMs should be actively managed and monitored. They should be committed to the repo and subject to the same version control standards as the rest of the code base. I imagine it would also be helpful to monitor the performance of these rules files. How can we track how effective a change to the rules file was if we're not measuring the quality of the output that results from that change? And what would the metrics for that even be?

Going forward, these models are just going to keep being trained on more and more data, much of which will itself be LLM-generated. If we want the models of the future to be able to take something that represents a higher-level abstraction (our prompt) and effectively convert it into code, then we all need to converge on a standardized way of guiding these LLMs.

Now, I'm not recommending we write a whole new language or protocol to serve as a means of guiding these LLMs better. Far from it: I think that would be a horrible idea. Whatever that "new" thing was would be completely novel to the LLMs, leading to worse results, not better.

I'm not entirely sure what the solution is; that's part of what I'm trying to explore by writing this. However, I have developed some ideas from my experience trying to tame these LLMs over the past couple of years. One thing I've certainly noticed is that LLMs are spectacularly good at understanding markdown. They do a really good job with heading tags and with choosing the most appropriate type of element for the content of a specific section. Markdown seems to be deeply rooted in the LLMs' knowledge. My recommendation, and what I always do, is therefore to write my .cursorrules files in well-structured markdown. I've also seen a huge benefit in specifying the package manager being used for the project, or for the workspace if you're working in a monorepo with workspaces enabled. This is by no means an exhaustive list of all the scrappy little things I've found to work over the years; that will probably be a separate blog post in itself.

Conclusion

LLMs are here to stay, and my prediction is that they are only going to get more and more intertwined in our lives as time goes on, especially for knowledge workers, and I don't think that will be a bad thing. But I strongly feel that we need to find a way to effectively tame the LLMs and orient them to the task at hand, which will be unique to each person, team, and organization.

Whatever mechanism the community gravitates towards should be standardized, and ideally community-driven and open. This will ensure that we maximize the circulation of ideas and keep actively improving. It should also be seamlessly integrated into developer tooling, with minimal effort and a short learning curve, so that incorporating it into a workflow doesn't create a lot of friction for devs.

I'm an LLM optimist. I honestly think that the world will be a much more prosperous place with LLMs accessible to the masses, and I believe LLMs will be a net good for society. But that will only be the case if humans play the role of leading and influencing that future; the LLMs are just a tool.

At the end of the day, these LLMs are just another layer of abstraction on top of the electrons that run most of the world we see around us. The powerful thing about this abstraction is that it has the potential to make a tremendously powerful technology accessible to many more people in society... what a time to be alive!