diff --git a/README.md b/README.md
index 549edf8..b5247db 100644
--- a/README.md
+++ b/README.md
@@ -2,7 +2,7 @@ sjs
[sami.samhuri@gmail.com](mailto:sami.samhuri@gmail.com)
published : 2009-09-22
-updated : 2009-09-24
+updated : 2010-01-19
Overview
@@ -13,91 +13,81 @@ on the tutorial "Let's Build a Compiler" by Jack Crenshaw[1].
[1]: http://compilers.iecc.com/crenshaw/
-The semantics are simple and familiar to all programmers. Eager
+The semantics are simple and familiar to most programmers. Eager
evaluation, assignment with the equals sign, arithmetic using + - *
-and /, loops, if/else statement, etc. Integers are the only data type.
+and /, loops, if/else statements, etc. Integers are the only data
+type.
+While the parser still closely resembles Crenshaw's recursive descent
+parser, the back-end generates x86 machine code using a homegrown
+assembler in ~1000 lines of Ruby (just 650 lines of real code).
NOTE: OS X is the only platform that compiles working binaries right
-now. ELF support for Linux coming soon.
+now. ELF support for Linux coming ... eventually.
-Compiling
-========
+Prerequisites
+=============
OS X
----
-You need gcc, so install Xcode or use MacPorts to build gcc.
-
+You need Ruby and gcc. Ruby is standard on Macs, but you'll need to
+install Xcode for gcc. You can also compile gcc yourself or install
+it with MacPorts or [Homebrew](http://github.com/mxcl/homebrew).
Linux
-----
-You need ruby and ld which lives in the binutils package.
+You need Ruby and ld, which lives in the binutils package.
% sudo aptitude install ruby binutils
-That's it! The assembler is included in ~900 lines of Ruby (including
-comments).
+That's it!
-You should be fine letting the build script detect your platform. If
-not append 'elf' or 'macho' to the command.
+Compiling
+=========
+
+The build script should detect your platform. If it doesn't, append
+'elf' or 'macho' to the command.
% ./build.rb filename.code [elf | macho]
-The resulting native executable will be called 'filename' and you
-should be able to run it directly.
-
- % ./filename
+The resulting native executable is called 'filename' and you should be
+able to run it directly.
+ % ./filename
Syntax in 2 minutes
===================
-The recursive descent parser starts by parsing a block of code. A
-block consists of zero or more statements. Whitespace is largely
-ignored beyond delimiting tokens so statements can be grouped on one
-line or spread out over multiple lines. With no explicit terminator
-this can look strange so we will see how it works out when the syntax
-evolves into something more complicated.
+The parser starts by parsing a block of code. A block consists of one
+or more statements. Whitespace is largely ignored beyond delimiting
+tokens, so statements can be grouped on one line or spread out over
+multiple lines. With no explicit statement terminator this can look
+strange, so we will see how it works out as the syntax evolves into
+something more complicated.
-There are no functions or function calls, no closures, arrays, hashes,
-or anything else you can think of.
+There are variables and integers. That's honestly about it. There
+are no functions or function calls, no closures, arrays, hashes, or
+anything else.
Supported statements are:
- * assignment
- e.g. foo = 4096
-
- * if/else
- e.g. if x < 0 a=0 else a=1 end
-
- * while
- e.g. while x > 0 x=x-1 end
-
- * until
- e.g. until x == 0 x=x-1 end
-
- * break
- e.g. break
-
- * repeat
- e.g. repeat x=x-1 if x == 0 break end end
-
- * for
- e.g. for i=1 to 5 x=x+1 end
-
- * do
- e.g. do 5 x=x+1 end
-
- * print
- e.g. a=1 print
+ * assignment
e.g. foo = 4096
+ * if/else
e.g. if x < 0 a=0 else a=1 end
+ * while
e.g. while x > 0 x=x-1 end
+ * until
e.g. until x == 0 x=x-1 end
+ * break
e.g. break
+ * repeat
e.g. repeat x=x-1 if x == 0 break end end
+ * for
e.g. for i=1 to 5 x=x+1 end
+ * do
e.g. do 5 x=x+1 end
+ * print
e.g. a=1 print
Print is strange: it prints the last value calculated in hex and that
-is all. Please don't look at the implementation. ;-)
+is all.
Supported operations are the following, in increasing order of
precedence:
@@ -114,140 +104,39 @@ precedence:
Parentheses are used to force a specific order of evaluation.
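In a Crenshaw-style recursive descent parser, precedence falls out of the grammar: each precedence level gets its own method, and the tighter-binding level is called from the looser one. The Ruby sketch below illustrates that idea for `+ - * /` and parentheses only. It is not sjs's code: `ExprCompiler` and its methods are hypothetical, and it emits x86-flavoured pseudo-assembly text (leaving each result in eax) rather than machine code.

```ruby
# Hypothetical sketch of a Crenshaw-style recursive descent compiler for
# integer arithmetic (+ - * /) with parentheses. Not sjs's actual code:
# it emits x86-flavoured pseudo-assembly text, leaving each result in eax.
class ExprCompiler
  attr_reader :output

  def initialize(src)
    # Whitespace only delimits tokens: integers, identifiers, operators, parens.
    @tokens = src.scan(%r{\d+|[A-Za-z_]\w*|[-+*/()]})
    @pos = 0
    @output = []
  end

  def compile
    expression
    raise "unexpected token #{peek.inspect}" unless peek.nil?
    output
  end

  private

  def peek
    @tokens[@pos]
  end

  def advance
    tok = @tokens[@pos]
    @pos += 1
    tok
  end

  def expect(tok)
    raise "expected #{tok.inspect}, got #{peek.inspect}" unless peek == tok
    advance
  end

  # Saves the left operand, compiles the right one via the block,
  # then combines them with the given instructions.
  def binary(op_code)
    @output << "push eax"     # save left operand
    yield                     # right operand ends up in eax
    @output << "mov ebx, eax"
    @output << "pop eax"      # left operand back in eax
    @output.concat(op_code)
  end

  # expression := term (('+' | '-') term)*   (lowest precedence)
  def expression
    term
    while %w[+ -].include?(peek)
      op = advance
      binary([op == "+" ? "add eax, ebx" : "sub eax, ebx"]) { term }
    end
  end

  # term := factor (('*' | '/') factor)*     (binds tighter than + and -)
  def term
    factor
    while %w[* /].include?(peek)
      op = advance
      # cdq sign-extends eax into edx:eax, which idiv then divides by ebx
      binary(op == "*" ? ["imul eax, ebx"] : ["cdq", "idiv ebx"]) { factor }
    end
  end

  # factor := integer | identifier | '(' expression ')'
  def factor
    tok = advance
    case tok
    when "("
      expression
      expect(")")
    when /\A\d+\z/
      @output << "mov eax, #{tok}"
    when /\A[A-Za-z_]\w*\z/
      @output << "mov eax, [#{tok}]"
    else
      raise "unexpected token #{tok.inspect}"
    end
  end
end

puts ExprCompiler.new("1 + 2 * 3").compile
```

For `1 + 2 * 3` the multiplication is emitted before the addition because `term` finishes consuming `2 * 3` before `expression` gets control back; wrapping `1 + 2` in parentheses reverses that order.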
-As far as booleans go 0 is false and everything else is true. Right
+As far as booleans go, 0 is false and everything else is true. Right
now there are only integers so this makes sense.
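Since 0 is the only false value, a conditional only has to compare its condition against zero. Below is a minimal Ruby sketch of how `if ... else ... end` could be lowered under that rule, assuming the condition's value has already been left in eax. `compile_if`, `truthy?`, the label scheme and the pseudo-assembly strings are all hypothetical, not sjs's actual output.

```ruby
# Hypothetical sketch: lowering `if cond ... else ... end` when 0 is false
# and every other integer is true. Assumes cond_code leaves its value in eax.
def compile_if(cond_code, then_code, else_code, id)
  else_label = "L#{id}_else"
  end_label  = "L#{id}_end"
  [
    *cond_code,
    "test eax, eax",     # eax AND eax sets ZF exactly when eax == 0
    "jz #{else_label}",  # zero means false: skip the then-branch
    *then_code,
    "jmp #{end_label}",
    "#{else_label}:",
    *else_code,
    "#{end_label}:"
  ]
end

# The same rule expressed in Ruby, where 0 would otherwise be truthy.
def truthy?(n)
  n != 0
end

puts compile_if(["mov eax, [x]"], ["mov dword [a], 0"], ["mov dword [a], 1"], 1)
```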
Internals
=========
-I wasn't satisfied using an external assembler and outputing assembly
-text so I wrote an x86 assembler in Ruby. It assembles just the
-instructions I need for this compiler, so it is by no means complete.
-32-bit only and no prefixes are supported. It's basically just a
-handful of instructions and mod-rm encoding. I use the system's
-linker and have no intention of writing my own, don't worry!
+It wasn't much fun generating assembly text, so I wrote an x86
+assembler library in Ruby. It implements just the instructions needed
+for this compiler and is by no means complete. It only handles 32-bit
+code and doesn't support instruction prefixes. It's basically just a
+handful of instructions plus ModR/M encoding. I use the system's
+linker and have no intention of writing my own, don't worry!
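To give a feel for the ModR/M (mod-rm) byte mentioned above, here is a small Ruby sketch of the register-to-register case. It is not the project's assembler API: `REGISTERS`, `OPCODES`, `modrm` and `encode_reg_reg` are hypothetical names. The bit layout, the register numbers and the three opcodes are standard 32-bit x86, and, like the assembler described above, it emits no prefixes.

```ruby
# Hypothetical sketch of 32-bit x86 ModR/M encoding, register-direct form only.
# Standard register numbers used in the reg and r/m fields.
REGISTERS = {
  eax: 0, ecx: 1, edx: 2, ebx: 3, esp: 4, ebp: 5, esi: 6, edi: 7
}.freeze

# "op r/m32, r32" opcodes: the source register goes in the reg field.
OPCODES = { mov: 0x89, add: 0x01, sub: 0x29 }.freeze

# ModR/M layout: mod (bits 7-6) | reg (bits 5-3) | r/m (bits 2-0)
def modrm(mod, reg, rm)
  ((mod & 0b11) << 6) | ((reg & 0b111) << 3) | (rm & 0b111)
end

# mod = 0b11 selects register-direct addressing (no memory operand).
def encode_reg_reg(op, dst, src)
  [OPCODES.fetch(op), modrm(0b11, REGISTERS.fetch(src), REGISTERS.fetch(dst))]
end

puts encode_reg_reg(:mov, :eax, :ecx).map { |b| format("%02X", b) }.join(" ")
# prints: 89 C8
```

Memory operands would need the other `mod` values plus SIB bytes and displacements, which is where most of the real encoding work lies.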
-The code currently consists of a recursive descent parser that outputs
-x86 code in ELF binaries on Linux and Mach-O binaries on Darwin.
-Most of the code for outputing executables is Ruby, but ELF support is
-still in C and not published in the repository. Classes to output
-Mach-O and ELF binaries are found in asm/(elf|macho)writer.rb, but ELF
-support is not implemented yet so binaries only compile and run on OS
-X right now. ELF should come soon as I now have lights in my den. :)
+ELF support is still in C and not published in the repository. The
+class to output Mach-O binaries is found in asm/machofile.rb.
-Some major refactoring is needed as the project grew organically and
-in order to keep up with the tutorials I have not yet made radical
-changes. The asm/ directory holds the assembler but also the MachO
-and ELF code, for now. The assembler is a from-scratch implementation
-in Ruby. This is my first assembler and first time working with the
-x86 ISA, so it probably isn't great. It outputs horribly inefficient
-code and there are no optimizations.
+The asm/ directory holds the Mach-O code as well as the assembler, for
+now. This is my first assembler and first time working with the x86
+ISA, so it probably isn't great. It outputs horribly inefficient code
+and there are no optimizations.
-Hopefully I can reduce the number of lines by factoring more, but it's
-pretty slim at ~3000 LOC. About 2100 of those are actual code. I did
-not write this compiler with the intention of anyone else reading it
-but there are a reasonable amount of comments.
-
-| Real Lines | Total Lines | Filename |
-|---|---|---|
-| 87 | 112 | build.rb |
-| 617 | 891 | compiler.rb |
-| 12 | 29 | asm/asm.rb |
-| 569 | 843 | asm/binary.rb |
-| 197 | 319 | asm/cstruct.rb |
-| 4 | 6 | asm/elfsymtab.rb |
-| 4 | 8 | asm/elfwriter.rb |
-| 170 | 374 | asm/machofile.rb |
-| 95 | 163 | asm/macho.rb |
-| 19 | 28 | asm/machosym.rb |
-| 48 | 77 | asm/machosymtab.rb |
-| 19 | 25 | asm/machowriter.rb |
-| 16 | 25 | asm/objwriter.rb |
-| 20 | 31 | asm/registers.rb |
-| 42 | 66 | asm/regproxy.rb |
-| 56 | 89 | asm/symtab.rb |
-| 131 | 183 | asm/text.rb |
-| 2097 | 3269 | total |
-
+I did not write this compiler with the intention of anyone else
+reading it, but there are a reasonable number of comments.
-Happy hacking!
+What next?
+==========
+
+Whatever interests me, really; I don't know yet. Right now I need to
+clean up some of the code, now that object files of any size can be
+generated and the tests pass again.
+
+
+Happy hacking!
-sjs
\ No newline at end of file