From ebf5f86f9bedf401a74117a8f55031ee3224c420 Mon Sep 17 00:00:00 2001
From: Sami Samhuri
Date: Mon, 18 Jan 2010 20:52:30 -0800
Subject: [PATCH] new post
---
 basics-of-the-mach-o-file-format.html | 158 ++++++++++++++++++++++++++
 1 file changed, 158 insertions(+)
 create mode 100644 basics-of-the-mach-o-file-format.html

diff --git a/basics-of-the-mach-o-file-format.html b/basics-of-the-mach-o-file-format.html
new file mode 100644
index 0000000..fb9b4e0
--- /dev/null
+++ b/basics-of-the-mach-o-file-format.html
@@ -0,0 +1,158 @@

This post is part of a series on generating basic x86 Mach-O files with Ruby. The first post introduced CStruct, a Ruby class used to serialize simple struct-like objects.

Please note that the best way to learn Mach-O properly is to read Apple's documentation on Mach-O, which is pretty good, especially combined with the comments in /usr/include/mach-o/*.h. These posts only cover the basics needed to generate a simple object file for linking with ld or gcc, and are not meant to be comprehensive.

Mach-O File Format Overview


A Mach-O file consists of two main pieces: the header and the data. The header is essentially a map of the file, describing what it contains and where each piece is located. The data comes directly after the header and consists of a number of binary blobs, one after the other.

The header contains three types of records: the Mach header, segments, and sections. Each binary blob is described by a named section in the header. Sections are grouped into one or more named segments. The Mach header is just one part of the header and should not be confused with the entire header. It contains information about the file as a whole, including the number of load commands that follow it (for our purposes, the segments).

Take a quick look at Figure 1 in Apple's Mach-O overview, which illustrates this quite nicely.

A very basic Mach object file consists of a header followed by a single blob of machine code. That blob could be described by a single section named __text, inside a single nameless segment. Here's a diagram showing the layout of such a file:
            ,---------------------------,
  Header    |  Mach header              |
            |    Segment 1              |
            |      Section 1 (__text)   | --,
            |---------------------------|   |
  Data      |           blob            | <-'
            '---------------------------'
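Here are the numbers behind the diagram. This assumes the 32-bit (i386) versions of the structures declared in /usr/include/mach-o/loader.h and ignores any alignment padding. Under those assumptions, the blob in this single-segment, single-section file would start 152 bytes in. The constant names are made up for this sketch.

```ruby
# Hypothetical constants: byte sizes of the 32-bit Mach-O structures.
MACH_HEADER_SIZE     = 7 * 4              # seven uint32 fields, magic through flags
SEGMENT_COMMAND_SIZE = 4 + 4 + 16 + 8 * 4 # cmd, cmdsize, segname[16], eight uint32 fields
SECTION_SIZE         = 16 + 16 + 9 * 4    # sectname[16], segname[16], nine uint32 fields

# A segment command is immediately followed by its section headers, so the
# load commands here are one segment command plus one section header.
sizeofcmds = SEGMENT_COMMAND_SIZE + 1 * SECTION_SIZE

# The data (the blob) begins right after the whole header.
blob_offset = MACH_HEADER_SIZE + sizeofcmds

puts "sizeofcmds:  #{sizeofcmds}"  # prints "sizeofcmds:  124"
puts "blob offset: #{blob_offset}" # prints "blob offset: 152"
```

In this layout, 152 is the value that both the segment's file offset and the section's offset field would hold.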

The Mach Header


The Mach header contains the architecture (CPU type), the type of file (object, in our case), and the number of load commands, which for us means segments. There is more to it, but that's about all we care about. To see exactly what's in a Mach header, fire up a shell on a Mac and type otool -h /bin/zsh.

Using CStruct we define the Mach header like so:
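CStruct comes from the first post, so here is a self-contained stand-in that builds the same 28-byte structure with plain Array#pack. It assumes a 32-bit x86 object file. The constant values come from <mach-o/loader.h> and <mach/machine.h>. pack_mach_header is a hypothetical helper, not part of asm/macho.rb.

```ruby
# Values from <mach-o/loader.h> and <mach/machine.h>.
MH_MAGIC            = 0xfeedface # 32-bit Mach-O
CPU_TYPE_X86        = 7
CPU_SUBTYPE_X86_ALL = 3
MH_OBJECT           = 0x1        # relocatable object file

# Hypothetical helper: serialize a 32-bit mach_header as seven
# little-endian (x86 byte order) uint32 fields: magic, cputype,
# cpusubtype, filetype, ncmds, sizeofcmds, flags.
def pack_mach_header(ncmds:, sizeofcmds:, filetype: MH_OBJECT, flags: 0)
  [MH_MAGIC, CPU_TYPE_X86, CPU_SUBTYPE_X86_ALL,
   filetype, ncmds, sizeofcmds, flags].pack("V7")
end

# One segment command (56 bytes) carrying one section header (68 bytes).
header = pack_mach_header(ncmds: 1, sizeofcmds: 56 + 68)
p header.bytesize # prints 28
```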

Segments


Segments, or segment commands, specify where in memory the OS should load the segment and how many bytes to allocate for it. They also specify which bytes of the file belong to that segment, and how many sections it contains.

One benefit of generating an object file rather than an executable is that we let the linker worry about some details. One of those details is where in memory segments will ultimately end up.

Names are optional and can be arbitrary, but the convention is to name segments with uppercase letters preceded by two underscores, e.g. __DATA or __TEXT.

The code exposes some more details about segment commands, but it should be easy enough to follow.
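A segment command can be sketched the same way. This assumes the 32-bit segment_command from <mach-o/loader.h>, which is 56 bytes. The helper name is hypothetical. Setting both protections to read/write/execute is a choice made for this sketch, not something the post prescribes. Note that cmdsize also counts the section headers that follow the command.

```ruby
LC_SEGMENT  = 0x1
VM_PROT_ALL = 0x7 # VM_PROT_READ | VM_PROT_WRITE | VM_PROT_EXECUTE

# Hypothetical helper: serialize a 32-bit segment_command. Its nsects
# section headers (68 bytes each) are written separately, right after it.
def pack_segment_command(segname:, vmaddr:, vmsize:, fileoff:, filesize:, nsects:)
  cmdsize = 56 + 68 * nsects
  # "a16" null-pads the name to 16 bytes (and truncates longer names).
  [LC_SEGMENT, cmdsize, segname, vmaddr, vmsize, fileoff, filesize,
   VM_PROT_ALL, VM_PROT_ALL, nsects, 0].pack("V2a16V8")
end

# The single nameless segment of a minimal object file: 1 byte of data at
# file offset 152, with address 0 since the linker decides placement.
segment = pack_segment_command(segname: "", vmaddr: 0, vmsize: 1,
                               fileoff: 152, filesize: 1, nsects: 1)
```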

Sections


All sections within a segment are described one after the other, directly after their segment command. Sections define their name, address in memory, size, offset of section data within the file, and segment name. The segment name might seem redundant, but in the next post we'll see why this is useful information to have in the section header.

Sections can optionally specify a list of the places in their binary blob that refer to addresses, called a relocation table. This is used by the linker: since we're letting the linker work out where to place everything in memory, the addresses inside our machine code will need to be updated.
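Relocation tables get proper coverage in the next post. To make the idea concrete now, here is a sketch of packing a single entry. It assumes the 32-bit struct relocation_info from <mach-o/reloc.h> on a little-endian target, where the bitfields in the second word are allocated from the least significant bit up. The helper and its argument names are hypothetical.

```ruby
# Hypothetical helper: pack one 8-byte relocation_info entry. The first word
# is r_address (offset within the section); the second holds the bitfields
# r_symbolnum:24, r_pcrel:1, r_length:2, r_extern:1, r_type:4, lowest first.
def pack_relocation(address:, symbolnum:, pcrel:, length:, extern:, type: 0)
  info = (symbolnum & 0xffffff) |
         ((pcrel ? 1 : 0) << 24) |
         ((length & 0x3) << 25) |  # log2 of the size: 2 means 4 bytes
         ((extern ? 1 : 0) << 27) |
         ((type & 0xf) << 28)      # 0 is GENERIC_RELOC_VANILLA
  [address, info].pack("l<V")
end

# e.g. an x86 call (opcode 0xE8 at offset 0) whose 4-byte pc-relative
# displacement at offset 1 refers to symbol table entry 0.
reloc = pack_relocation(address: 1, symbolnum: 0, pcrel: true, length: 2, extern: true)
```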

By convention, sections are named with lowercase letters preceded by two underscores, e.g. __bss or __text.

Finally, the Ruby code describing section structs:
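As with the header, here is a self-contained sketch that uses Array#pack instead of CStruct. It assumes the 32-bit section struct from <mach-o/loader.h>, which is 68 bytes. pack_section is a hypothetical helper. The flag values are the ones loader.h defines for sections containing machine instructions.

```ruby
S_ATTR_PURE_INSTRUCTIONS = 0x80000000 # section contains only instructions
S_ATTR_SOME_INSTRUCTIONS = 0x00000400 # section contains some instructions

# Hypothetical helper: serialize a 32-bit section header. align is a
# power-of-two exponent (4 means 16-byte alignment); reserved1/2 are zero.
def pack_section(sectname:, segname:, addr:, size:, offset:,
                 align: 0, reloff: 0, nreloc: 0, flags: 0)
  [sectname, segname, addr, size, offset, align,
   reloff, nreloc, flags, 0, 0].pack("a16a16V9")
end

# __text for a 1-byte blob at file offset 152, with no relocations.
text = pack_section(sectname: "__text", segname: "__TEXT",
                    addr: 0, size: 1, offset: 152,
                    flags: S_ATTR_PURE_INSTRUCTIONS | S_ATTR_SOME_INSTRUCTIONS)
```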

macho.rb


As much of the Mach-O format as we need is defined in asm/macho.rb. The Mach header, segment commands, sections, relocation tables, and symbol table structs are all there, along with a few constants.

I'll cover symbol tables and relocation tables in my next post.


Looking at real Mach-O files


To see the segments and sections of an object file, run otool -l /usr/lib/crt1.o (-l is for load commands). If you want to see why we stick to generating object files instead of executables, run otool -l /bin/zsh: executables are complicated beasts.

If you want to see the actual data for a section, otool provides a couple of ways to do this. For an arbitrary section, use otool -s <segment> <section>. To see the contents of a well-known section, such as __text in the __TEXT segment, use otool -t /usr/bin/true (-d does the same for __data in the __DATA segment). You can also disassemble the __text section with otool -tv /usr/bin/true.
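Here is a rough sketch of the part of otool -l and otool -s covered in this post, to show what otool is reading. It walks the load commands, and for each segment command it reports the sections and slices out their data. It assumes a thin 32-bit little-endian file (MH_MAGIC). Universal ("fat") files and 64-bit files, which is what most files on a current Mac are, have different magic numbers and are rejected rather than parsed. each_section is a hypothetical name.

```ruby
MH_MAGIC   = 0xfeedface
LC_SEGMENT = 0x1

# Hypothetical helper: yield (segment name, section name, section's segment
# name, section data) for every section of a thin 32-bit Mach-O image.
# Zero-fill sections such as __bss have no bytes in the file; this sketch
# does not special-case them.
def each_section(bytes)
  magic, _cputype, _cpusubtype, _filetype, ncmds = bytes.unpack("V5")
  raise ArgumentError, "not a thin 32-bit Mach-O file" unless magic == MH_MAGIC

  offset = 28 # load commands start right after the 28-byte mach_header
  ncmds.times do
    cmd, cmdsize = bytes.byteslice(offset, 8).unpack("V2")
    if cmd == LC_SEGMENT
      segname = bytes.byteslice(offset + 8, 16).unpack1("Z16")
      nsects  = bytes.byteslice(offset + 48, 4).unpack1("V")
      nsects.times do |i|
        header = bytes.byteslice(offset + 56 + i * 68, 68)
        sectname, sect_segname, _addr, size, data_offset = header.unpack("Z16Z16V3")
        yield segname, sectname, sect_segname, bytes.byteslice(data_offset, size)
      end
    end
    offset += cmdsize # skip to the next load command, whatever its type
  end
end

# Usage: ruby sections.rb some_32bit_file.o
if ARGV[0]
  each_section(File.binread(ARGV[0])) do |seg, sect, sect_seg, data|
    puts "segment #{seg.inspect}: #{sect_seg},#{sect} (#{data.bytesize} bytes)"
  end
end
```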

You'll get to know otool quite well if you work with Mach-O.


Take a break!


That was probably a lot to digest, and to make real sense of it you might need to read some of the official documentation.

We're close to being able to describe a minimal Mach object file +that can be linked, and the resulting binary executed. By the end of +the next post we'll be there.


(You can almost do that with what we know now. Create a Mach file with a Mach header (ncmds=1), a single unnamed segment (nsects=1), a section named __text with a segment name of __TEXT, and some x86 machine code as the section data, and you would almost have a useful Mach object file.)
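Here is a sketch of that almost-useful file built in memory. It has a Mach header with ncmds=1, one unnamed segment with nsects=1, and a __text section whose segment name is __TEXT. The section data is a single x86 ret instruction (0xC3). The sketch assumes the 32-bit i386 layouts and hard-codes constant values from the Mach-O headers. The variable names are made up.

```ruby
code = [0xC3].pack("C") # x86 `ret`: the entire blob

header_size, segment_size, section_size = 28, 56, 68 # 32-bit struct sizes
cmdsize  = segment_size + section_size # segment command plus its one section
data_off = header_size + cmdsize       # blob starts after the header: 152

header = [0xfeedface, 7, 3,            # MH_MAGIC, CPU_TYPE_X86, CPU_SUBTYPE_X86_ALL
          1, 1, cmdsize, 0             # MH_OBJECT, ncmds = 1, sizeofcmds, flags
         ].pack("V7")
segment = [0x1, cmdsize, "",           # LC_SEGMENT, cmdsize, unnamed segment
           0, code.bytesize,           # vmaddr (left to the linker), vmsize
           data_off, code.bytesize,    # fileoff, filesize
           7, 7, 1, 0                  # maxprot, initprot (rwx), nsects = 1, flags
          ].pack("V2a16V8")
section = ["__text", "__TEXT",         # sectname, segname
           0, code.bytesize, data_off, # addr, size, offset
           0, 0, 0,                    # align (2**0), reloff, nreloc
           0x80000400, 0, 0            # pure + some instructions, reserved1, reserved2
          ].pack("a16a16V9")

object = header + segment + section + code
```

You can write it out with File.binwrite("minimal.o", object) and inspect the result with otool -l. It is still only almost useful: there is no symbol table yet, so nothing tells ld about any symbols in the code. That is where the next post picks up.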

Till next time, happy hacking!