Code Generation

Finally, we have reached the last step in the compilation process. We have finished lexing, parsing, and static analysis, we have the IR, we did some optimization, and now there is only one thing left to do.

Code generation.

It is about converting our IR into a format that a machine can actually execute. First, we have to decide what kind of machine we are generating code for.

Real or virtual?

If we decide to generate real machine code, we get lightning-fast execution, because the OS can load our code into memory and the CPU executes it directly. The problem is that generating native machine code is extremely hard and takes a lot of work. It also means writing a separate code generation step for every supported architecture, which is, again, a huge amount of work.
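To get a feel for why every architecture needs its own backend, here is a minimal sketch in Go. The function name encodeReturn42 is made up for this illustration. It returns the raw machine code for a function that does nothing but return 42, and the bytes for x86-64 and ARM64 have nothing in common.

```go
package main

import "fmt"

// encodeReturn42 is a hypothetical helper for this illustration. It returns
// the raw machine code of a function that returns the constant 42 on the
// given architecture.
func encodeReturn42(arch string) ([]byte, error) {
	switch arch {
	case "amd64":
		// mov eax, 42 ; ret
		return []byte{0xB8, 0x2A, 0x00, 0x00, 0x00, 0xC3}, nil
	case "arm64":
		// mov w0, #42 ; ret (two 4-byte instructions, stored little-endian)
		return []byte{0x40, 0x05, 0x80, 0x52, 0xC0, 0x03, 0x5F, 0xD6}, nil
	default:
		// Every additional architecture needs its own encoder.
		return nil, fmt.Errorf("no backend for %q", arch)
	}
}

func main() {
	for _, arch := range []string{"amd64", "arm64", "riscv64"} {
		code, err := encodeReturn42(arch)
		if err != nil {
			fmt.Println(arch, "->", err)
			continue
		}
		fmt.Printf("%s -> % X\n", arch, code)
	}
}
```

Even for this trivial function the two encodings differ in length, layout and register names, and a real backend also has to deal with calling conventions, register allocation and object file formats.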

As a workaround, we can choose to generate code for a virtual machine. This is not a real machine, just a product of our imagination implemented in software. This way we can invent any kind of instruction set we desire.

If every opcode in the set is one byte long, we call it bytecode, and the virtual machine that executes it is a bytecode virtual machine. When we design our instruction set, we usually try to map the low-level instructions to the semantics of our language without tying them to any particular architecture, so one code generator and one virtual machine implementation are enough for all supported platforms.
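As a rough sketch of what generating bytecode can look like, here is a tiny example in Go. Everything in it is an assumption made for illustration: the four opcodes, the stack-based design, and the tree-shaped Node type standing in for the IR. It walks an arithmetic expression in post-order and emits one-byte opcodes, with the constant following OpPush as a one-byte operand.

```go
package main

import "fmt"

// Hypothetical one-byte opcodes for a small stack machine.
const (
	OpPush   byte = iota // followed by a one-byte constant operand
	OpAdd                // pop two values, push their sum
	OpMul                // pop two values, push their product
	OpReturn             // stop and return the top of the stack
)

// Node is a hypothetical tree-shaped IR for arithmetic expressions.
type Node struct {
	Op          byte // '+', '*', or 0 for a constant leaf
	Value       byte // used only when Op == 0
	Left, Right *Node
}

// emit appends the bytecode for n to code. Operands are emitted before
// their operator, so the virtual machine finds them on the stack.
func emit(n *Node, code []byte) []byte {
	if n.Op == 0 {
		return append(code, OpPush, n.Value)
	}
	code = emit(n.Left, code)
	code = emit(n.Right, code)
	switch n.Op {
	case '+':
		return append(code, OpAdd)
	case '*':
		return append(code, OpMul)
	}
	panic(fmt.Sprintf("unknown operator %q", n.Op))
}

// Generate compiles a whole expression and terminates it with OpReturn.
func Generate(root *Node) []byte {
	return append(emit(root, nil), OpReturn)
}

func main() {
	// IR for (1 + 2) * 3
	expr := &Node{
		Op:    '*',
		Left:  &Node{Op: '+', Left: &Node{Value: 1}, Right: &Node{Value: 2}},
		Right: &Node{Value: 3},
	}
	fmt.Println(Generate(expr)) // [0 1 0 2 1 0 3 2 3]
}
```

The nine bytes read as PUSH 1, PUSH 2, ADD, PUSH 3, MUL, RETURN. Nothing in them refers to registers or to a specific CPU, which is what makes the same output usable on every platform the virtual machine runs on.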

Next time I’ll talk about the virtual machine.

Stay tuned. I’ll be back.

Attila's Blog

About writing your own programming language in C, Go and Swift
