updated README

Sami Samhuri 2010-01-20 00:00:07 -08:00
parent 7aa6fa7d15
commit 0298c9bbf3

README.md

@@ -2,7 +2,7 @@ sjs<br>
[sami.samhuri@gmail.com](mailto:sami.samhuri@gmail.com)<br> [sami.samhuri@gmail.com](mailto:sami.samhuri@gmail.com)<br>
published : 2009-09-22<br> published : 2009-09-22<br>
updated : 2009-09-24 updated : 2010-01-19
Overview Overview
@@ -13,91 +13,81 @@ on the tutorial "Let's Build a Compiler" by Jack Crenshaw[1].
[1]: http://compilers.iecc.com/crenshaw/ [1]: http://compilers.iecc.com/crenshaw/
The semantics are simple and familiar to all programmers. Eager The semantics are simple and familiar to most programmers. Eager
evaluation, assignment with the equals sign, arithmetic using + - * evaluation, assignment with the equals sign, arithmetic using + - *
and /, loops, if/else statement, etc. Integers are the only data type. and /, loops, if/else statement, etc. Integers are the only data
type.
While the parser still closely resembles Crenshaw's recursive descent
parser, the back-end generates x86 machine code using a homegrown
assembler in ~1000 lines of Ruby (just 650 lines of real code).
NOTE: OS X is the only platform that compiles working binaries right NOTE: OS X is the only platform that compiles working binaries right
now. ELF support for Linux coming soon. now. ELF support for Linux coming ... eventually.
Compiling Pre-requisites
======== ==============
OS X OS X
---- ----
You need gcc, so install Xcode or use MacPorts to build gcc. You need Ruby and gcc. Ruby is standard on Macs but you'll need to
install Xcode for gcc. You can also compile it yourself or use
MacPorts, or [homebrew](http://github.com/mxcl/homebrew).
Linux Linux
----- -----
You need ruby and ld which lives in the binutils package. You need Ruby and ld - which lives in the binutils package.
% sudo aptitude install ruby binutils % sudo aptitude install ruby binutils
That's it! The assembler is included in ~900 lines of Ruby (including That's it!
comments).
You should be fine letting the build script detect your platform. If Compiling
not append 'elf' or 'macho' to the command. =========
The build script should detect your platform. If not, append 'elf' or
'macho' to the command.
% ./build.rb filename.code [elf | macho] % ./build.rb filename.code [elf | macho]
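As a rough illustration of the platform detection described above, here is a minimal sketch. It is not the code in build.rb; the helper names `detect_binformat` and `binformat_for` are hypothetical, and it assumes the format is chosen from Ruby's `RbConfig::CONFIG['host_os']` unless a second argument overrides it.

```ruby
require 'rbconfig'

# Hypothetical helper: map the host OS name to a binary format.
# Assumption: Darwin means Mach-O and Linux means ELF, as in the README.
def detect_binformat(host_os = RbConfig::CONFIG['host_os'])
  case host_os
  when /darwin/i then 'macho'
  when /linux/i  then 'elf'
  else
    raise ArgumentError, "unsupported platform: #{host_os}"
  end
end

# Hypothetical helper: an explicit 'elf' or 'macho' after the filename
# wins over detection; anything else is rejected.
def binformat_for(argv, host_os = RbConfig::CONFIG['host_os'])
  explicit = argv[1]
  return detect_binformat(host_os) if explicit.nil?
  unless %w[elf macho].include?(explicit)
    raise ArgumentError, "unknown format: #{explicit}"
  end
  explicit
end
```

With this sketch, `binformat_for(['filename.code'])` falls back to detection, while `binformat_for(['filename.code', 'elf'])` forces ELF.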
The resulting native executable will be called 'filename' and you The resulting native executable is called 'filename' and you should be
should be able to run it directly. able to run it directly.
% ./filename <return>
% ./filename
Syntax in 2 minutes Syntax in 2 minutes
=================== ===================
The recursive descent parser starts by parsing a block of code. A The parser starts by parsing a block of code. A block consists of one
block consists of zero or more statements. Whitespace is largely or more statements. Whitespace is largely ignored beyond delimiting
ignored beyond delimiting tokens so statements can be grouped on one tokens, so statements can be grouped on one line or spread out over
line or spread out over multiple lines. With no explicit terminator multiple lines. With no explicit terminator this can look strange so
this can look strange so we will see how it works out when the syntax we will see how it works out when the syntax evolves into something
evolves into something more complicated. more complicated.
There are no functions or function calls, no closures, arrays, hashes, There are variables and integers. That's honestly about it. There
or anything else you can think of. are no functions or function calls, no closures, arrays, hashes, or
anything else.
Supported statements are: Supported statements are:
* assignment * assignment<br> e.g. foo = 4096
e.g. foo = 4096 * if/else<br> e.g. if x < 0 a=0 else a=1 end
* while<br> e.g. while x > 0 x=x-1 end
* if/else * until<br> e.g. until x == 0 x=x-1 end
e.g. if x < 0 a=0 else a=1 end * break<br> e.g. break
* repeat<br> e.g. repeat x=x-1 if x == 0 break end end
* while * for<br> e.g. for i=1 to 5 x=x+1 end
e.g. while x > 0 x=x-1 end * do<br> e.g. do 5 x=x+1 end
* print<br> e.g. a=1 print
* until
e.g. until x == 0 x=x-1 end
* break
e.g. break
* repeat
e.g. repeat x=x-1 if x == 0 break end end
* for
e.g. for i=1 to 5 x=x+1 end
* do
e.g. do 5 x=x+1 end
* print
e.g. a=1 print
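For readers who know Ruby better than this toy language, the loop examples above can be read roughly as follows. This is my interpretation of the one-line examples, not something taken from the compiler: in particular it assumes `do 5` runs its body five times and that `repeat` loops until a `break`. The method names are made up for illustration.

```ruby
# while x > 0 x=x-1 end
def run_while(x)
  while x > 0
    x = x - 1
  end
  x
end

# until x == 0 x=x-1 end
def run_until(x)
  until x == 0
    x = x - 1
  end
  x
end

# repeat x=x-1 if x == 0 break end end
# Assumption: repeat is an unconditional loop that only exits via break.
def run_repeat(x)
  loop do
    x = x - 1
    break if x == 0
  end
  x
end

# do 5 x=x+1 end
# Assumption: the body runs exactly five times.
def run_do(x)
  5.times { x = x + 1 }
  x
end
```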
Print is strange, it prints the last value calculated in hex and that Print is strange, it prints the last value calculated in hex and that
is all. Please don't look at the implementation. ;-) is all.
Supported operations are the following, in increasing order of Supported operations are the following, in increasing order of
precedence: precedence:
@ -114,140 +104,39 @@ precedence:
Parentheses are used to force a specific order of evaluation. Parentheses are used to force a specific order of evaluation.
As far as booleans go 0 is false and everything else is true. Right As far as booleans go, 0 is false and everything else is true. Right
now there are only integers so this makes sense. now there are only integers so this makes sense.
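Note that this differs from Ruby itself, where 0 is truthy. A minimal sketch of the rule, using a hypothetical `truthy?` helper that is not part of the compiler:

```ruby
# Hypothetical helper modelling the language's rule:
# 0 is false, every other integer is true.
def truthy?(value)
  value != 0
end

truthy?(0)   # false
truthy?(1)   # true
truthy?(-7)  # true
```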
Internals Internals
========= =========
I wasn't satisfied using an external assembler and outputting assembly It wasn't much fun generating assembly text, so I wrote an x86
text so I wrote an x86 assembler in Ruby. It assembles just the assembler library in Ruby. It implements just the instructions needed
instructions I need for this compiler, so it is by no means complete. for this compiler and is by no means complete. It only does 32-bit
32-bit only and no prefixes are supported. It's basically just a and no prefixes are supported. It's basically just a handful of
handful of instructions and mod-rm encoding. I use the system's instructions and mod-rm encoding. I use the system's linker and have
linker and have no intention of writing my own, don't worry! no intention of writing my own, don't worry!
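To give a feel for what mod-rm encoding involves, here is a small sketch for the register-to-register case. It is not the assembler in asm/; the names `REGISTERS`, `modrm`, `reg_num` and `mov_reg_reg` are hypothetical. The byte layout (2 bits mod, 3 bits reg, 3 bits r/m) and the `0x89 /r` opcode for `mov r/m32, r32` are standard x86.

```ruby
# 32-bit general purpose registers in their x86 encoding order (0..7).
REGISTERS = %w[eax ecx edx ebx esp ebp esi edi].freeze

# Pack the three ModRM fields into one byte: mod (2 bits), reg (3), r/m (3).
def modrm(mod, reg, rm)
  unless (0..3).cover?(mod) && (0..7).cover?(reg) && (0..7).cover?(rm)
    raise ArgumentError, "field out of range"
  end
  (mod << 6) | (reg << 3) | rm
end

# Look up a register's number by name, e.g. :ebx => 3.
def reg_num(name)
  REGISTERS.index(name.to_s) or raise ArgumentError, "unknown register: #{name}"
end

# Encode `mov dst, src` for two 32-bit registers.
# Opcode 0x89 is mov r/m32, r32: the source goes in the reg field,
# the destination in r/m, and mod = 0b11 selects register-direct mode.
def mov_reg_reg(dst, src)
  [0x89, modrm(0b11, reg_num(src), reg_num(dst))]
end
```

For example, `mov_reg_reg(:eax, :ebx)` yields the bytes `0x89 0xD8`, and `.pack('C*')` turns the array into a binary string.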
The code currently consists of a recursive descent parser that outputs ELF support is still in C and not published in the repository. The
x86 code in ELF binaries on Linux and Mach-O binaries on Darwin. class to output Mach-O binaries is found in asm/machofile.rb.
Most of the code for outputting executables is Ruby, but ELF support is
still in C and not published in the repository. Classes to output
Mach-O and ELF binaries are found in asm/(elf|macho)writer.rb, but ELF
support is not implemented yet so binaries only compile and run on OS
X right now. ELF should come soon as I now have lights in my den. :)
Some major refactoring is needed as the project grew organically and The asm/ directory holds the assembler but also the Mach-O code, for
in order to keep up with the tutorials I have not yet made radical now. This is my first assembler and first time working with the x86
changes. The asm/ directory holds the assembler but also the MachO ISA, so it probably isn't great. It outputs horribly inefficient code
and ELF code, for now. The assembler is a from-scratch implementation and there are no optimizations.
in Ruby. This is my first assembler and first time working with the
x86 ISA, so it probably isn't great. It outputs horribly inefficient
code and there are no optimizations.
Hopefully I can reduce the number of lines by factoring more, but it's I did not write this compiler with the intention of anyone else
pretty slim at ~3000 LOC. About 2100 of those are actual code. I did reading it but there are a reasonable amount of comments.
not write this compiler with the intention of anyone else reading it
but there are a reasonable amount of comments.
<table>
<tr>
<th>Real Lines</th>
<th>Total Lines</th>
<th>Filename</th>
</tr>
<tr>
<td>87</td>
<td>112</td>
<td>build.rb</td>
</tr>
<tr>
<td>617</td>
<td>891</td>
<td>compiler.rb</td>
</tr>
<tr>
<td>12</td>
<td>29</td>
<td>asm/asm.rb</td>
</tr>
<tr>
<td>569</td>
<td>843</td>
<td>asm/binary.rb</td>
</tr>
<tr>
<td>197</td>
<td>319</td>
<td>asm/cstruct.rb</td>
</tr>
<tr>
<td>4</td>
<td>6</td>
<td>asm/elfsymtab.rb</td>
</tr>
<tr>
<td>4</td>
<td>8</td>
<td>asm/elfwriter.rb</td>
</tr>
<tr>
<td>170</td>
<td>374</td>
<td>asm/machofile.rb</td>
</tr>
<tr>
<td>95</td>
<td>163</td>
<td>asm/macho.rb</td>
</tr>
<tr>
<td>19</td>
<td>28</td>
<td>asm/machosym.rb</td>
</tr>
<tr>
<td>48</td>
<td>77</td>
<td>asm/machosymtab.rb</td>
</tr>
<tr>
<td>19</td>
<td>25</td>
<td>asm/machowriter.rb</td>
</tr>
<tr>
<td>16</td>
<td>25</td>
<td>asm/objwriter.rb</td>
</tr>
<tr>
<td>20</td>
<td>31</td>
<td>asm/registers.rb</td>
</tr>
<tr>
<td>42</td>
<td>66</td>
<td>asm/regproxy.rb</td>
</tr>
<tr>
<td>56</td>
<td>89</td>
<td>asm/symtab.rb</td>
</tr>
<tr>
<td>131</td>
<td>183</td>
<td>asm/text.rb</td>
</tr>
<tr>
<td>2097</td>
<td>3269</td>
<td><b>total</b></td>
</tr>
</table>
Happy hacking! What next?
==========
Whatever interests me really, I don't know yet. Right now I need to
clean up some of the code, now that object files of any size can be
generated and tests pass again.
Happy hacking!<br>
-sjs -sjs