For background, I am a programmer, but I have largely ignored everything having to do with AI (read: LLMs) for the past few years.
I just got to wondering, though: why are these LLMs generating high-level programming language code instead of skipping the middleman and spitting out raw 1s and 0s for x86 to execute?
Is it that they aren’t trained on this sort of thing? Is it for the human code reviewers to be able to make their own edits on top of the AI-generated code? Are there AIs doing this that I’m just not aware of?
I just feel like there might be some level of optimization that could be made by something that understands the code and the machine at this level.
You’re assuming AI writes usable code. I haven’t seen it.
Think of the AI more as a writing assistant, like autocomplete or Stack Overflow but more so. The IDE I use can autocomplete variables or function calls, but the AI can autocomplete entire lines of code or entire unit tests. AI might try to fit an online answer and related docs to a problem I’m seeing. AI might even create a class around a public API that is a great starting point for my code. AI can be a useful tool, but it can’t write usable code.
I think that’s the misconception: people think they are going to get a complete program if they just tell AI to do it.
Because they’re trained on open source code and stack overflow answers.
Neither of which are commonly written in assembly.
-
Machine code is less portable. As new CPU optimizations and instructions are released, it’s easier to update a compiler to integrate those into its optimizations than to regenerate and retest all of your code. Also, if you need to target different OSes, like Windows vs. macOS vs. Linux, it’s easier to write portable code in something higher-level like Python or Java.
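The portability point in miniature, using Python as a stand-in for any high-level language (a sketch; the function name is made up):

```python
import platform

# The portability argument in one function: this source is identical on
# Windows, macOS, Linux, x86, and ARM. The interpreter (or a compiler)
# is what gets ported and re-optimized per target, not your code.
def describe_host():
    return f"running on {platform.system()} / {platform.machine()}"

print(describe_host())
```

An LLM emitting raw machine code would instead have to re-emit (and you would have to re-test) a separate binary per CPU and OS combination.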
-
Static analysis to check for things like memory leaks or security vulnerabilities like SQL injection is likely easier to do on human-readable code than on assembly.
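A minimal sketch of why source-level analysis is easier: the telltale string interpolation below is trivial to spot in source (linters flag exactly this pattern) but invisible once compiled down. Uses an in-memory sqlite3 database; table and function names are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT)")
conn.execute("INSERT INTO users VALUES ('alice')")

def find_user_unsafe(name):
    # String interpolation into SQL: the pattern a static analyzer can
    # flag easily in source, but which disappears in compiled output.
    return conn.execute(f"SELECT name FROM users WHERE name = '{name}'").fetchall()

def find_user_safe(name):
    # Parameterized query: the driver handles escaping the value.
    return conn.execute("SELECT name FROM users WHERE name = ?", (name,)).fetchall()

print(find_user_unsafe("' OR '1'='1"))  # injection leaks every row
print(find_user_safe("' OR '1'='1"))    # safely matches nothing
```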
-
It’s easier for a human to go in and tweak code that is written in a human-readable language rather than assembly.
-
They would not be able to.
AIs only mix and match what they have copied from human-written material, and most of the code out there is in high-level languages, not machine code.
In other words, AIs don’t know what they are doing; they just maximize a probability to give you an answer, that’s all.
But really, the objective is to provide a human with more or less correct boilerplate code, and humans would not read machine code.
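A toy sketch of what “maximize a probability” means here, with bigram counts standing in for an actual neural language model (the corpus and names are made up):

```python
from collections import Counter

# Count which word follows which in a tiny "training set". A real LLM
# learns a far richer version of this with a neural network over tokens,
# but the objective is the same flavor: predict a likely next symbol.
corpus = "the cat sat on the mat the cat ate the fish".split()
bigrams = Counter(zip(corpus, corpus[1:]))

def most_likely_next(word):
    # Pick the continuation with the highest observed count.
    candidates = {b: c for (a, b), c in bigrams.items() if a == word}
    return max(candidates, key=candidates.get) if candidates else None

print(most_likely_next("the"))  # "cat" follows "the" most often here
```

Nothing in this loop understands cats or mats; it only reproduces patterns from its training text, which is the commenter’s point.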
The code still needs to be reviewable by humans, and you can’t realistically do that with machine code. Somebody needs to be able to look at the code and see what’s going on without just taking the AI’s word for it.
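One way to feel the gap, using CPython bytecode as a rough stand-in for machine code (a sketch; real machine code is even further from the source than this):

```python
import dis

def add_tax(price, rate):
    # Readable form: names and intent survive, so a reviewer can
    # check the logic and tweak it.
    return price * (1 + rate)

# The disassembled form strips comments, context, and intent; reviewing
# a whole program at this level is what "just trust the AI's binary"
# would demand of a human reviewer.
dis.dis(add_tax)
```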
No one is training LLMs on machine code. This is sorta silly, really.
I think on top of this, the question has an incorrect implicit assumption - that LLMs understand what they produce (this would be necessary for them to produce code in languages other than what they’re trained on).
LLMs don’t produce intelligent output. They produce plausible strings of symbols, based on what is common in a given context. That can look intelligent only insofar as the training dataset contains intelligently produced material.
You’re a programmer? Yes, integrating and debugging binary code would be absolutely ridiculous.