A basic lesson about compilers

Problem description

For a couple of months I have been trying to understand how compilers work. For now I don't care about syntax analysis, splitting the source into tokens, or creating an AST. I want to focus on the creation of executables. As far as I know, compilers split the code up and translate it into assembly instructions, which are then somehow transformed into executable code.

I want to create a basic compiler for a custom programming language. Is the only way to do this to take an assembler, transform my source code into assembly language, and then assemble it? Or is it possible to create executables without using third party tools? The only piece missing from my project is how exactly the EXE file is created from assembly instructions.

I know that this is a very complicated topic. That's why I was looking for techniques to create a cross platform compiler. I also wondered if I could take a compiler like g++, transform my code into C++ code, and compile it with g++. That is an alternative plan, but still not what I wanted to create. Do I have to write a compiler or an assembler for each individual processor architecture, and how would I basically do this? After a few months of research, I finally decided to ask people who have better knowledge of this topic.

I hope you can light up my mind. :)

Greetings, BraunBerry

Tags: compiler-construction

Solution


Your question is mostly off-topic. However, a good book about compilation is the Dragon Book. You could also read Scott's Programming Language Pragmatics and Queinnec's Lisp In Small Pieces.

"As far as I know, compilers split the code up and translate it into assembly instructions, which are then somehow transformed into executable code."

It is much more complex than that. In practice, compilers repeatedly transform some internal representation for optimization purposes. Optimization is an important but difficult topic, which is why there are few C compilers. For example, most of GCC's optimization passes (GCC has hundreds of them) transform Gimple to Gimple (e.g. for inlining, loop unrolling, etc.).
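To make "transforming an internal representation" concrete, here is a minimal sketch of a single optimization pass (constant folding) over a toy expression tree. The `Const`, `Var`, and `BinOp` classes are hypothetical and only illustrate the idea of a pass that takes IR in and gives IR back; they have nothing to do with how Gimple is actually defined in GCC.

```python
from dataclasses import dataclass
from typing import Union

# Hypothetical toy IR: constants, variables and binary operators.
@dataclass(frozen=True)
class Const:
    value: int

@dataclass(frozen=True)
class Var:
    name: str

@dataclass(frozen=True)
class BinOp:
    op: str            # "+" or "*"
    left: "Expr"
    right: "Expr"

Expr = Union[Const, Var, BinOp]

def fold_constants(e: Expr) -> Expr:
    """One IR-to-IR pass: evaluate operators whose operands are both constants."""
    if isinstance(e, BinOp):
        left = fold_constants(e.left)
        right = fold_constants(e.right)
        if isinstance(left, Const) and isinstance(right, Const):
            if e.op == "+":
                return Const(left.value + right.value)
            if e.op == "*":
                return Const(left.value * right.value)
        # Not foldable: rebuild the node from the (possibly simplified) children.
        return BinOp(e.op, left, right)
    return e

# x * (2 + 3) is rewritten to x * 5; the output is still the same kind of IR.
tree = BinOp("*", Var("x"), BinOp("+", Const(2), Const(3)))
folded = fold_constants(tree)
```

Because the pass consumes and produces the same representation, many such passes can be chained one after another, which is roughly the shape of an optimizing middle end.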

"That's why I was looking for techniques to create a cross platform compiler. I also wondered if I could take a compiler like g++, transform my code into C++ code and compile it with g++."

In general, many people use C, not C++, as a portable target programming language. This answer explains more. Actually, it might be difficult to generate genuinely idiomatic C++ code (e.g. code using C++ containers and smart pointers). Finally, your system C++ compiler might need a lot of time to compile your generated C++ code (in other words, C++ is slow to compile).
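As a rough illustration of using C as the target language, the sketch below translates a tiny arithmetic expression (written as nested tuples, a made-up input format) into a complete C program. The function names `emit_c` and `compile_to_c` are hypothetical; a real front end would emit C from its own AST.

```python
def emit_c(expr) -> str:
    """Translate a nested-tuple expression such as ("+", 1, 2) into a C expression."""
    if isinstance(expr, int):
        return str(expr)
    op, left, right = expr
    # Parenthesize everything so C operator precedence never matters.
    return f"({emit_c(left)} {op} {emit_c(right)})"

def compile_to_c(expr) -> str:
    """Wrap the translated expression in a C program that prints its value."""
    return (
        "#include <stdio.h>\n"
        "\n"
        "int main(void) {\n"
        f"    printf(\"%d\\n\", {emit_c(expr)});\n"
        "    return 0;\n"
        "}\n"
    )

# 1 + 2 * 3 in the toy source language.
c_source = compile_to_c(("+", 1, ("*", 2, 3)))
```

Writing `c_source` to a file and handing it to the system C compiler then gives an executable on any platform that has one, which is the whole appeal of this approach.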

"Or is it possible to create executables without using third party tools?"

It might be possible, but why do you want to avoid third party tools? Notice that many compilers rely at least on assemblers and linkers (and both qualify as "third party tools"). If you choose to generate C (probably a good choice), the C compiler that you would use is a third party tool (and quite a big one!).

If you want to generate executables directly by yourself (I don't recommend doing that, it is a lot of work), you need to understand precisely the file format of executables (which is operating system specific), such as ELF or PE. I recommend Levine's book Linkers and Loaders. You may also need to understand how to do system calls for your OS (so read Operating Systems: Three Easy Pieces), and you'll need to implement some standard library for your language. And dynamic linking complicates things.
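To give a feeling for what "understanding the file format" means, here is a minimal sketch that builds a tiny statically linked ELF64 image by hand, assuming x86-64 Linux. The machine code simply performs the `exit(42)` system call; the load address `0x400000` and the helper name `build_elf` are choices made for this illustration. A real compiler would also need sections, symbols, relocations, and a runtime library, none of which appear here.

```python
import struct

BASE = 0x400000                  # assumed load address for a non-PIE executable
EHDR_SIZE, PHDR_SIZE = 64, 56    # sizes of the ELF64 file header and one program header

def build_elf(code: bytes) -> bytes:
    """Return an ELF64 image: file header, one PT_LOAD program header, then the code."""
    entry = BASE + EHDR_SIZE + PHDR_SIZE      # code starts right after the headers
    size = EHDR_SIZE + PHDR_SIZE + len(code)
    # e_ident: magic, 64-bit class, little endian, version 1, System V ABI, padding.
    ident = b"\x7fELF" + bytes([2, 1, 1, 0]) + bytes(8)
    ehdr = struct.pack(
        "<16sHHIQQQIHHHHHH",
        ident,
        2,            # e_type: ET_EXEC
        0x3E,         # e_machine: EM_X86_64
        1,            # e_version
        entry,        # e_entry
        EHDR_SIZE,    # e_phoff: program headers follow the file header
        0,            # e_shoff: no section headers
        0,            # e_flags
        EHDR_SIZE,    # e_ehsize
        PHDR_SIZE,    # e_phentsize
        1,            # e_phnum
        0, 0, 0,      # e_shentsize, e_shnum, e_shstrndx
    )
    phdr = struct.pack(
        "<IIQQQQQQ",
        1,            # p_type: PT_LOAD
        5,            # p_flags: readable + executable
        0,            # p_offset: map the file from its start
        BASE, BASE,   # p_vaddr, p_paddr
        size, size,   # p_filesz, p_memsz
        0x1000,       # p_align
    )
    return ehdr + phdr + code

# x86-64 machine code for: mov eax, 60 (sys_exit); mov edi, 42; syscall
EXIT_42 = bytes.fromhex("b83c000000" "bf2a000000" "0f05")
image = build_elf(EXIT_42)
```

If `image` is written to a file and the file is marked executable, running it on x86-64 Linux should terminate with exit status 42. Everything a linker normally takes care of (multiple segments, data, dynamic linking) is missing, which is why doing this for a real language is a lot of work.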

You could also consider a JIT compilation library, such as libgccjit and others (mentioned here).

"Do I have to write a compiler / an assembler for each individual processor architecture and how can I do this basically?"

Most compilers deal with that issue by defining some target-neutral intermediate representation (e.g. Gimple for GCC). Most optimizations are done on (and using) that intermediate representation.
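The idea of a target-neutral intermediate representation can be sketched as follows: one small three-address IR, and two independent back ends that lower it to x86-64 (Intel syntax) and AArch64 assembly text. The IR format, the fixed virtual-to-physical register tables, and the function names are all hypothetical simplifications; real back ends perform register allocation and instruction selection.

```python
# Hypothetical three-address IR; each instruction is a tuple:
#   ("const", dst, value)   dst = value
#   ("add",   dst, a, b)    dst = a + b
IR = [
    ("const", "v0", 2),
    ("const", "v1", 3),
    ("add", "v2", "v0", "v1"),
]

# Naive fixed mapping from virtual registers to machine registers (no real allocation).
X86_REGS = {"v0": "rax", "v1": "rbx", "v2": "rcx"}
ARM_REGS = {"v0": "x0", "v1": "x1", "v2": "x2"}

def lower_x86_64(ir):
    """Lower the IR to x86-64 assembly lines (Intel syntax)."""
    out = []
    for ins in ir:
        if ins[0] == "const":
            out.append(f"mov {X86_REGS[ins[1]]}, {ins[2]}")
        elif ins[0] == "add":
            dst, a, b = (X86_REGS[v] for v in ins[1:])
            # x86 add takes two operands: copy a into dst, then add b.
            # This sketch assumes dst and b are different registers.
            out.append(f"mov {dst}, {a}")
            out.append(f"add {dst}, {b}")
    return out

def lower_aarch64(ir):
    """Lower the same IR to AArch64 assembly lines."""
    out = []
    for ins in ir:
        if ins[0] == "const":
            out.append(f"mov {ARM_REGS[ins[1]]}, #{ins[2]}")
        elif ins[0] == "add":
            dst, a, b = (ARM_REGS[v] for v in ins[1:])
            # AArch64 add takes three operands, so one instruction is enough.
            out.append(f"add {dst}, {a}, {b}")
    return out

x86_asm = lower_x86_64(IR)
arm_asm = lower_aarch64(IR)
```

The front end and every optimization that works on `IR` are written once; only the comparatively small lowering functions differ per processor architecture.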

PS. In your case, I strongly recommend building your compiler for and on top of Linux, since Linux is made of free software whose source code you can study. If you use Windows, which is proprietary software, some details that matter to you are not public, and you'll need a lot of time to reverse-engineer them.

