Mirror of https://github.com/samsonjs/compiler.git, synced 2026-04-27 14:57:45 +00:00
updated README
parent 7aa6fa7d15, commit 0298c9bbf3
1 changed file with 66 additions and 177 deletions
README.md
```diff
@@ -2,7 +2,7 @@ sjs<br>
 [sami.samhuri@gmail.com](mailto:sami.samhuri@gmail.com)<br>
 
 published : 2009-09-22<br>
-updated : 2009-09-24
+updated : 2010-01-19
 
 
 Overview
```
```diff
@@ -13,91 +13,81 @@ on the tutorial "Let's Build a Compiler" by Jack Crenshaw[1].
 
 [1]: http://compilers.iecc.com/crenshaw/
 
-The semantics are simple and familiar to all programmers. Eager
+The semantics are simple and familiar to most programmers. Eager
 evaluation, assignment with the equals sign, arithmetic using + - *
-and /, loops, if/else statement, etc. Integers are the only data type.
+and /, loops, if/else statement, etc. Integers are the only data
+type.
 
+While the parser still closely resembles Crenshaw's recursive descent
+parser, back-end generates x85 machine code using a homegrown
+assembler in ~1000 lines of Ruby (just 650 lines of real code).
 
 NOTE: OS X is the only platform that compiles working binaries right
-now. ELF support for Linux coming soon.
+now. ELF support for Linux coming ... eventually.
 
 
-Compiling
-========
+Pre-requisites
+==============
 
 OS X
 ----
 
-You need gcc, so install Xcode or use MacPorts to build gcc.
+You need Ruby and gcc. Ruby is standard on Macs but you'll need to
+install Xcode for gcc. You can also compile it yourself or use
+MacPorts, or [homebrew](http://github.com/mxcl/homebrew).
 
 Linux
 -----
 
-You need ruby and ld which lives in the binutils package.
+You need Ruby and ld - which lives in the binutils package.
 
 % sudo aptitude install ruby binutils
 
-That's it! The assembler is included in ~900 lines of Ruby (including
-comments).
+That's it!
 
 
-You should be fine letting the build script detect your platform. If
-not append 'elf' or 'macho' to the command.
+Compiling
+=========
 
+The build script should detect your platform. If not append 'elf' or
+'macho' to the command.
 
 % ./build.rb filename.code [elf | macho]
 
-The resulting native executable will be called 'filename' and you
-should be able to run it directly.
+The resulting native executable is called 'filename' and you should be
+able it run it directly.
 
-% ./filename <return>
 
+% ./filename
 
 
 Syntax in 2 minutes
 ===================
 
-The recursive descent parser starts by parsing a block of code. A
-block consists of zero or more statements. Whitespace is largely
-ignored beyond delimiting tokens so statements can be grouped on one
-line or spread out over multiple lines. With no explicit terminator
-this can look strange so we will see how it works out when the syntax
-evolves into something more complicated.
+The parser starts by parsing a block of code. A block consists of one
+or more statements. Whitespace is largely ignored beyond delimiting
+tokens, so statements can be grouped on one line or spread out over
+multiple lines. With no explicit terminator this can look strange so
+we will see how it works out when the syntax evolves into something
+more complicated.
 
-There are no functions or function calls, no closures, arrays, hashes,
-or anything else you can think of.
+There are variables and integers. That's honestly about it. There
+are no functions or function calls, no closures, arrays, hashes, or
+anything else.
 
 Supported statements are:
 
-* assignment
-e.g. foo = 4096
-
-* if/else
-e.g. if x < 0 a=0 else a=1 end
-
-* while
-e.g. while x > 0 x=x-1 end
-
-* until
-e.g. until x == 0 x=x-1 end
-
-* break
-e.g. break
-
-* repeat
-e.g. repeat x=x-1 if x == 0 break end end
-
-* for
-e.g. for i=1 to 5 x=x+1 end
-
-* do
-e.g. do 5 x=x+1 end
-
-* print
-e.g. a=1 print
-
+* assignment<br> e.g. foo = 4096
+* if/else<br> e.g. if x < 0 a=0 else a=1 end
+* while<br> e.g. while x > 0 x=x-1 end
+* until<br> e.g. until x == 0 x=x-1 end
+* break<br> e.g. break
+* repeat<br> e.g. repeat x=x-1 if x == 0 break end end
+* for<br> e.g. for i=1 to 5 x=x+1 end
+* do<br> e.g. do 5 x=x+1 end
+* print<br> e.g. a=1 print
 Print is strange, it prints the last value calculated in hex and that
-is all. Please don't look at the implementation. ;-)
+is all.
 
 Supported operations are the following, in increasing order of
 precedence:
```
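The "Syntax in 2 minutes" section describes a parser that still follows Crenshaw's recursive descent design, over a language whose only data type is the integer and where 0 is false. The Ruby sketch below illustrates that style for expressions only. It is not the project's compiler.rb: it evaluates values directly instead of generating x86, the `ToyExpr` class and its methods are hypothetical, and because the precedence table itself lies outside this diff, the grammar (comparisons below `+`/`-`, which sit below `*`/`/`) is an assumption.

```ruby
require "strscan"

# Sketch only, not the project's compiler.rb: a Crenshaw-style
# recursive-descent evaluator for integer expressions. It computes values
# directly instead of emitting x86, using this assumed precedence:
#   relation   := expression [("<" | ">" | "==") expression]
#   expression := term {("+" | "-") term}
#   term       := factor {("*" | "/") factor}
#   factor     := integer | variable | "(" relation ")"
class ToyExpr
  TOKEN = %r{\d+|[A-Za-z_]\w*|==|[-+*/<>()]}

  def initialize(src, vars = {})
    @tokens = tokenize(src)
    @pos = 0
    @vars = vars # variable name => Integer; integers are the only data type
  end

  # Parse and evaluate the whole input, rejecting any trailing tokens.
  def evaluate
    value = relation
    raise ArgumentError, "unexpected token #{peek.inspect}" unless peek.nil?
    value
  end

  # As the README says: 0 is false and every other integer is true.
  def self.truthy?(value)
    value != 0
  end

  private

  # Split the source into tokens; whitespace only separates them.
  def tokenize(src)
    ss = StringScanner.new(src)
    tokens = []
    until ss.eos?
      next if ss.skip(/\s+/)
      tok = ss.scan(TOKEN) or raise ArgumentError, "bad character at offset #{ss.pos}"
      tokens << tok
    end
    tokens
  end

  def peek
    @tokens[@pos]
  end

  def advance
    @pos += 1
    @tokens[@pos - 1]
  end

  # Comparisons yield 1 or 0 because there is no separate boolean type.
  def relation
    left = expression
    return left unless %w[< > ==].include?(peek)
    op = advance
    left.public_send(op, expression) ? 1 : 0
  end

  # Terms joined by + and -, evaluated left to right.
  def expression
    value = term
    while %w[+ -].include?(peek)
      value = advance == "+" ? value + term : value - term
    end
    value
  end

  # Factors joined by * and /, which bind tighter than + and -.
  def term
    value = factor
    while %w[* /].include?(peek)
      value = advance == "*" ? value * factor : truncating_div(value, factor)
    end
    value
  end

  # Ruby's Integer#/ floors, but x86 IDIV truncates toward zero; this
  # mimics IDIV (assumed, not confirmed, to match the compiled code).
  def truncating_div(a, b)
    raise ZeroDivisionError, "divided by 0" if b.zero?
    (a - a.remainder(b)) / b
  end

  # A literal, a variable, or a parenthesized sub-expression.
  def factor
    tok = advance
    case tok
    when "("
      value = relation
      raise ArgumentError, "expected ')'" unless advance == ")"
      value
    when /\A\d+\z/ then tok.to_i
    when /\A[A-Za-z_]\w*\z/ then @vars.fetch(tok) { raise ArgumentError, "undefined variable #{tok}" }
    else raise ArgumentError, "unexpected token #{tok.inspect}"
    end
  end
end
```

For example, `ToyExpr.new("x < 0", { "x" => -3 }).evaluate` returns 1, which `ToyExpr.truthy?` treats as true, and `ToyExpr.new("(2 + 3) * 4").evaluate.to_s(16)` returns `"14"`, a rough stand-in for printing the last value in hex (the exact output format of the real `print` is not shown here). Each grammar rule maps to one method that calls the next-tighter rule, which is what makes the descent recursive and gives `*` and `/` their higher precedence.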
```diff
@@ -114,140 +104,39 @@ precedence:
 
 Parentheses are used to force a specific order of evaluation.
 
-As far as booleans go 0 is false and everything else is true. Right
+As far as booleans go, 0 is false and everything else is true. Right
 now there are only integers so this makes sense.
 
 
 Internals
 =========
 
-I wasn't satisfied using an external assembler and outputing assembly
-text so I wrote an x86 assembler in Ruby. It assembles just the
-instructions I need for this compiler, so it is by no means complete.
-32-bit only and no prefixes are supported. It's basically just a
-handful of instructions and mod-rm encoding. I use the system's
-linker and have no intention of writing my own, don't worry!
+It wasn't much fun generating assembly text, so I wrote an x86
+assembler library in Ruby. It implements just the instructions needed
+for this compiler and is by no means complete. It only does 32-bit
+and no prefixes are supported. It's basically just a handful of
+instructions and mod-rm encoding. I use the system's linker and have
+no intention of writing my own, don't worry!
 
-The code currently consists of a recursive descent parser that outputs
-x86 code in ELF binaries on Linux and Mach-O binaries on Darwin.
-Most of the code for outputing executables is Ruby, but ELF support is
-still in C and not published in the repository. Classes to output
-Mach-O and ELF binaries are found in asm/(elf|macho)writer.rb, but ELF
-support is not implemented yet so binaries only compile and run on OS
-X right now. ELF should come soon as I now have lights in my den. :)
+ELF support is still in C and not published in the repository. The
+class to output Mach-O binaries is found in asm/machofile.rb.
 
-Some major refactoring is needed as the project grew organically and
-in order to keep up with the tutorials I have not yet made radical
-changes. The asm/ directory holds the assembler but also the MachO
-and ELF code, for now. The assembler is a from-scratch implementation
-in Ruby. This is my first assembler and first time working with the
-x86 ISA, so it probably isn't great. It outputs horribly inefficient
-code and there are no optimizations.
+The asm/ directory holds the assembler but also the Mach-O code, for
+now. This is my first assembler and first time working with the x86
+ISA, so it probably isn't great. It outputs horribly inefficient code
+and there are no optimizations.
 
-Hopefully I can reduce the number of lines by factoring more, but it's
-pretty slim at ~3000 LOC. About 2100 of those are actual code. I did
-not write this compiler with the intention of anyone else reading it
-but there are a reasonable amount of comments.
+I did not write this compiler with the intention of anyone else
+reading it but there are a reasonable amount of comments.
 
-<table>
-<tr>
-<th>Real Lines</th>
-<th>Total Lines</th>
-<th>Filename</th>
-</tr>
-
-<tr>
-<td>87</td>
-<td>112</td>
-<td>build.rb</td>
-</tr>
-<tr>
-<td>617</td>
-<td>891</td>
-<td>compiler.rb</td>
-</tr>
-<tr>
-<td>12</td>
-<td>29</td>
-<td>asm/asm.rb</td>
-</tr>
-<tr>
-<td>569</td>
-<td>843</td>
-<td>asm/binary.rb</td>
-</tr>
-<tr>
-<td>197</td>
-<td>319</td>
-<td>asm/cstruct.rb</td>
-</tr>
-<tr>
-<td>4</td>
-<td>6</td>
-<td>asm/elfsymtab.rb</td>
-</tr>
-<tr>
-<td>4</td>
-<td>8</td>
-<td>asm/elfwriter.rb</td>
-</tr>
-<tr>
-<td>170</td>
-<td>374</td>
-<td>asm/machofile.rb</td>
-</tr>
-<tr>
-<td>95</td>
-<td>163</td>
-<td>asm/macho.rb</td>
-</tr>
-<tr>
-<td>19</td>
-<td>28</td>
-<td>asm/machosym.rb</td>
-</tr>
-<tr>
-<td>48</td>
-<td>77</td>
-<td>asm/machosymtab.rb</td>
-</tr>
-<tr>
-<td>19</td>
-<td>25</td>
-<td>asm/machowriter.rb</td>
-</tr>
-<tr>
-<td>16</td>
-<td>25</td>
-<td>asm/objwriter.rb</td>
-</tr>
-<tr>
-<td>20</td>
-<td>31</td>
-<td>asm/registers.rb</td>
-</tr>
-<tr>
-<td>42</td>
-<td>66</td>
-<td>asm/regproxy.rb</td>
-</tr>
-<tr>
-<td>56</td>
-<td>89</td>
-<td>asm/symtab.rb</td>
-</tr>
-<tr>
-<td>131</td>
-<td>183</td>
-<td>asm/text.rb</td>
-</tr>
-<tr>
-<td>2097</td>
-<td>3269</td>
-<td><b>total</b></td>
-</tr>
-</table>
-
 
-Happy hacking!
+What next?
+==========
+
+Whatever interests me really, I don't know yet.. Right now I need to
+clean up some of the code, now that object files of any size can be
+generated and tests pass again.
+
+
+Happy hacking!<br>
 -sjs
```
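The Internals section says the homegrown assembler is "basically just a handful of instructions and mod-rm encoding." To make that concrete, here is a small Ruby sketch of packing a ModR/M byte for a few 32-bit instructions. It follows the standard x86 encoding but is not code from asm/; the `REG32` table and the method names are hypothetical.

```ruby
# Sketch only, not the asm/ library's API: how a ModR/M byte is packed
# and used for a few 32-bit instructions. Method names are hypothetical.

# Register numbers used in the reg and r/m fields for 32-bit operands.
REG32 = { eax: 0, ecx: 1, edx: 2, ebx: 3, esp: 4, ebp: 5, esi: 6, edi: 7 }.freeze

# ModR/M layout: mod (bits 7-6) | reg (bits 5-3) | r/m (bits 2-0).
def modrm(mod, reg, rm)
  raise ArgumentError, "mod must be 0..3" unless (0..3).cover?(mod)
  raise ArgumentError, "reg and rm must be 0..7" unless (0..7).cover?(reg) && (0..7).cover?(rm)
  (mod << 6) | (reg << 3) | rm
end

# MOV r/m32, r32 (opcode 0x89 /r). mod = 0b11 means r/m names a register;
# the reg field holds the source and r/m holds the destination.
def mov_reg_reg(dest, src)
  [0x89, modrm(0b11, REG32.fetch(src), REG32.fetch(dest))]
end

# ADD r/m32, r32 (opcode 0x01 /r), same operand encoding as MOV above.
def add_reg_reg(dest, src)
  [0x01, modrm(0b11, REG32.fetch(src), REG32.fetch(dest))]
end

# MOV r32, r/m32 (opcode 0x8B /r) loading from [ebp + disp8]. Here reg is
# the destination; mod = 0b01 adds a signed 8-bit displacement byte.
def load_from_frame(dest, disp)
  raise ArgumentError, "disp8 must be -128..127" unless (-128..127).cover?(disp)
  [0x8B, modrm(0b01, REG32.fetch(dest), REG32.fetch(:ebp)), disp & 0xFF]
end

# Render bytes the way a disassembler listing would show them.
def hex_bytes(bytes)
  bytes.map { |b| format("%02X", b) }.join(" ")
end
```

For example, `hex_bytes(mov_reg_reg(:ebp, :esp))` returns `"89 E5"`, the familiar encoding of `mov ebp, esp` in a 32-bit function prologue, and `load_from_frame(:eax, -4)` yields `[0x8B, 0x45, 0xFC]` for `mov eax, [ebp-4]`. The same three-field byte covers register-to-register and register-to-memory forms, which is why a small assembler can get far with one helper like `modrm`.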