diff --git a/README.md b/README.md
index 549edf8..b5247db 100644
--- a/README.md
+++ b/README.md
@@ -2,7 +2,7 @@ sjs
[sami.samhuri@gmail.com](mailto:sami.samhuri@gmail.com)
published : 2009-09-22
-updated : 2009-09-24
+updated : 2010-01-19

 Overview
@@ -13,91 +13,81 @@ on the tutorial "Let's Build a Compiler" by Jack Crenshaw[1].

 [1]: http://compilers.iecc.com/crenshaw/

-The semantics are simple and familiar to all programmers. Eager
+The semantics are simple and familiar to most programmers. Eager
 evaluation, assignment with the equals sign, arithmetic using + - *
-and /, loops, if/else statement, etc. Integers are the only data type.
+and /, loops, if/else statement, etc. Integers are the only data
+type.
+While the parser still closely resembles Crenshaw's recursive descent
+parser, the back-end generates x86 machine code using a homegrown
+assembler in ~1000 lines of Ruby (just 650 lines of real code).

 NOTE: OS X is the only platform that compiles working binaries right
-now. ELF support for Linux coming soon.
+now. ELF support for Linux coming ... eventually.

-Compiling
-========
+Pre-requisites
+==============

 OS X
 ----

-You need gcc, so install Xcode or use MacPorts to build gcc.
-
+You need Ruby and gcc. Ruby is standard on Macs but you'll need to
+install Xcode for gcc. You can also compile it yourself or use
+MacPorts, or [homebrew](http://github.com/mxcl/homebrew).

 Linux
 -----

-You need ruby and ld which lives in the binutils package.
+You need Ruby and ld - which lives in the binutils package.

     % sudo aptitude install ruby binutils

-That's it! The assembler is included in ~900 lines of Ruby (including
-comments).
+That's it!

-You should be fine letting the build script detect your platform. If
-not append 'elf' or 'macho' to the command.
+Compiling
+=========
+
+The build script should detect your platform. If not, append 'elf' or
+'macho' to the command.

     % ./build.rb filename.code [elf | macho]

-The resulting native executable will be called 'filename' and you
-should be able to run it directly.
-
-    % ./filename
+The resulting native executable is called 'filename' and you should be
+able to run it directly.
+    % ./filename

 Syntax in 2 minutes
 ===================

-The recursive descent parser starts by parsing a block of code. A
-block consists of zero or more statements. Whitespace is largely
-ignored beyond delimiting tokens so statements can be grouped on one
-line or spread out over multiple lines. With no explicit terminator
-this can look strange so we will see how it works out when the syntax
-evolves into something more complicated.
+The parser starts by parsing a block of code. A block consists of one
+or more statements. Whitespace is largely ignored beyond delimiting
+tokens, so statements can be grouped on one line or spread out over
+multiple lines. With no explicit terminator this can look strange so
+we will see how it works out when the syntax evolves into something
+more complicated.

-There are no functions or function calls, no closures, arrays, hashes,
-or anything else you can think of.
+There are variables and integers. That's honestly about it. There
+are no functions or function calls, no closures, arrays, hashes, or
+anything else.

 Supported statements are:

- * assignment
-     e.g. foo = 4096
-
- * if/else
-     e.g. if x < 0 a=0 else a=1 end
-
- * while
-     e.g. while x > 0 x=x-1 end
-
- * until
-     e.g. until x == 0 x=x-1 end
-
- * break
-     e.g. break
-
- * repeat
-     e.g. repeat x=x-1 if x == 0 break end end
-
- * for
-     e.g. for i=1 to 5 x=x+1 end
-
- * do
-     e.g. do 5 x=x+1 end
-
- * print
-     e.g. a=1 print
+ * assignment
+     e.g. foo = 4096
+ * if/else
+     e.g. if x < 0 a=0 else a=1 end
+ * while
+     e.g. while x > 0 x=x-1 end
+ * until
+     e.g. until x == 0 x=x-1 end
+ * break
+     e.g. break
+ * repeat
+     e.g. repeat x=x-1 if x == 0 break end end
+ * for
+     e.g. for i=1 to 5 x=x+1 end
+ * do
+     e.g. do 5 x=x+1 end
+ * print
+     e.g. a=1 print

 Print is strange, it prints the last value calculated in hex and that
-is all. Please don't look at the implementation. ;-)
+is all.

 Supported operations are the following, in increasing order of
 precedence:
@@ -114,140 +104,39 @@ precedence:

 Parentheses are used to force a specific order of evaluation.

-As far as booleans go 0 is false and everything else is true. Right
+As far as booleans go, 0 is false and everything else is true. Right
 now there are only integers so this makes sense.

 Internals
 =========

-I wasn't satisfied using an external assembler and outputing assembly
-text so I wrote an x86 assembler in Ruby. It assembles just the
-instructions I need for this compiler, so it is by no means complete.
-32-bit only and no prefixes are supported. It's basically just a
-handful of instructions and mod-rm encoding. I use the system's
-linker and have no intention of writing my own, don't worry!
+It wasn't much fun generating assembly text, so I wrote an x86
+assembler library in Ruby. It implements just the instructions needed
+for this compiler and is by no means complete. It only does 32-bit
+and no prefixes are supported. It's basically just a handful of
+instructions and mod-rm encoding. I use the system's linker and have
+no intention of writing my own, don't worry!

-The code currently consists of a recursive descent parser that outputs
-x86 code in ELF binaries on Linux and Mach-O binaries on Darwin.
-Most of the code for outputing executables is Ruby, but ELF support is
-still in C and not published in the repository. Classes to output
-Mach-O and ELF binaries are found in asm/(elf|macho)writer.rb, but ELF
-support is not implemented yet so binaries only compile and run on OS
-X right now. ELF should come soon as I now have lights in my den. :)
+ELF support is still in C and not published in the repository. The
+class to output Mach-O binaries is found in asm/machofile.rb.

-Some major refactoring is needed as the project grew organically and
-in order to keep up with the tutorials I have not yet made radical
-changes. The asm/ directory holds the assembler but also the MachO
-and ELF code, for now. The assembler is a from-scratch implementation
-in Ruby. This is my first assembler and first time working with the
-x86 ISA, so it probably isn't great. It outputs horribly inefficient
-code and there are no optimizations.
+The asm/ directory holds the assembler but also the Mach-O code, for
+now. This is my first assembler and first time working with the x86
+ISA, so it probably isn't great. It outputs horribly inefficient code
+and there are no optimizations.

-Hopefully I can reduce the number of lines by factoring more, but it's
-pretty slim at ~3000 LOC. About 2100 of those are actual code. I did
-not write this compiler with the intention of anyone else reading it
-but there are a reasonable amount of comments.
-
-<table>
-  <tr><th>Real Lines</th><th>Total Lines</th><th>Filename</th></tr>
-  <tr><td>87</td><td>112</td><td>build.rb</td></tr>
-  <tr><td>617</td><td>891</td><td>compiler.rb</td></tr>
-  <tr><td>12</td><td>29</td><td>asm/asm.rb</td></tr>
-  <tr><td>569</td><td>843</td><td>asm/binary.rb</td></tr>
-  <tr><td>197</td><td>319</td><td>asm/cstruct.rb</td></tr>
-  <tr><td>4</td><td>6</td><td>asm/elfsymtab.rb</td></tr>
-  <tr><td>4</td><td>8</td><td>asm/elfwriter.rb</td></tr>
-  <tr><td>170</td><td>374</td><td>asm/machofile.rb</td></tr>
-  <tr><td>95</td><td>163</td><td>asm/macho.rb</td></tr>
-  <tr><td>19</td><td>28</td><td>asm/machosym.rb</td></tr>
-  <tr><td>48</td><td>77</td><td>asm/machosymtab.rb</td></tr>
-  <tr><td>19</td><td>25</td><td>asm/machowriter.rb</td></tr>
-  <tr><td>16</td><td>25</td><td>asm/objwriter.rb</td></tr>
-  <tr><td>20</td><td>31</td><td>asm/registers.rb</td></tr>
-  <tr><td>42</td><td>66</td><td>asm/regproxy.rb</td></tr>
-  <tr><td>56</td><td>89</td><td>asm/symtab.rb</td></tr>
-  <tr><td>131</td><td>183</td><td>asm/text.rb</td></tr>
-  <tr><td>2097</td><td>3269</td><td>total</td></tr>
-</table>
+I did not write this compiler with the intention of anyone else
+reading it but there are a reasonable amount of comments.

-Happy hacking!
+What next?
+==========
+
+Whatever interests me really, I don't know yet. Right now I need to
+clean up some of the code, now that object files of any size can be
+generated and tests pass again.
+
+
+Happy hacking!
-sjs
\ No newline at end of file
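
Note on the "mod-rm encoding" mentioned under Internals: a minimal Ruby sketch of what packing a ModR/M byte involves, for readers who have not met it before. This is not code from the sjs assembler; `reg_number`, `modrm`, and `mov_reg_reg` are hypothetical names invented for illustration. Only the bit layout (mod in bits 7-6, reg in bits 5-3, rm in bits 2-0) and the `0x89` opcode for `MOV r/m32, r32` come from the x86 instruction format itself.

```ruby
# Hypothetical helpers for illustration; none of these names come from sjs.

# Register numbers as they appear in the 3-bit reg and rm fields (32-bit mode).
def reg_number(name)
  { eax: 0, ecx: 1, edx: 2, ebx: 3, esp: 4, ebp: 5, esi: 6, edi: 7 }.fetch(name)
end

# Pack the ModR/M fields: mod in bits 7-6, reg in bits 5-3, rm in bits 2-0.
def modrm(mod, reg, rm)
  raise ArgumentError, "mod out of range" unless (0..3).cover?(mod)
  raise ArgumentError, "reg out of range" unless (0..7).cover?(reg)
  raise ArgumentError, "rm out of range" unless (0..7).cover?(rm)
  (mod << 6) | (reg << 3) | rm
end

# Encode `mov dst, src` between two 32-bit registers:
# opcode 0x89 (MOV r/m32, r32), mod = 0b11 (register-direct),
# reg = source register, rm = destination register.
def mov_reg_reg(dst, src)
  [0x89, modrm(0b11, reg_number(src), reg_number(dst))]
end

bytes = mov_reg_reg(:eax, :ebx)
puts bytes.map { |b| format("%02x", b) }.join(" ")  # prints "89 d8"
```

With register-direct addressing and no prefixes, a two-register instruction is just the opcode byte followed by one ModR/M byte, which is presumably why a 32-bit-only assembler can stay as small as the README describes.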