This GC wraps Eval() to create a copy of the result, and then
memcpy()s it backwards to the position where the g_mem pointer
resided at the beginning of Eval(), thereby discarding all the
cons cells that were created but aren't part of the result.
Overlap (or negative GC) isn't possible because we don't copy
cells beneath the low water mark.
As it turns out, 44 bytes is all you need to implement garbage
collection when your language guarantees that data structures
can't have cycles, due to its lack of mutability.
The flag bit has been removed from atoms in favor of the sign bit.
That let us remove the shifts. It also means NIL can be zero, which
freed up the %cx register. Using %cx to call Eval saved 2 bytes.
Saved six bytes by removing the bss memset, since it's no longer needed.
%bx wasn't needed in PutChar, since we're assuming the machine boots
in a non-graphical (text) mode and prints via the BIOS teletype. In MDA
mode BX will be 2, which is dark green on black. The same probably
applies to CGA.
Now that BX is always two, we can use multi-register addressing
when accessing CDR cells, e.g. (%bx,%si) instead of 2(%si). That's a
huge savings, since we no longer need all those displacement bytes;
we only need the modrm byte.
%cx wasn't needed; it could be replaced with %si in GetToken, thereby
making it possible to remove our use of the %fs register.
Changing the second argument of Cons() from %si to %ax helped save
even more space, since everything that called it previously had to
move the value from %ax to %si.
This change fixes an issue where clearing the bss could overlap
the stack memory used in the early initialization code.
This change fixes a regression caused by an earlier commit involving
nul terminators. Builtins are now automatically numbered.
Comments have been added and labels have been redefined to add
further clarity to how the assembly works. The most beautiful part
is the code that merges Cadr, Cdr, and Car into one function.