mirror of
https://github.com/samsonjs/blog.git
synced 2026-03-25 09:05:51 +00:00
new post
This commit is contained in:
parent
8fd63c0f44
commit
ebf5f86f9b
1 changed files with 158 additions and 0 deletions
158
basics-of-the-mach-o-file-format.html
Normal file
158
basics-of-the-mach-o-file-format.html
Normal file
|
|
@ -0,0 +1,158 @@
|
|||
<p><i>This post is part of a series on generating basic x86 Mach-O files
|
||||
with Ruby. The
|
||||
<a href="http://samisamhuri.blogspot.com/2010/01/generating-mach-o-binaries-working-with.html">
|
||||
first post</a> introduced CStruct, a Ruby class used to serialize
|
||||
simple struct-like objects.</i></p>
|
||||
|
||||
<p>Please note that the best way to learn about Mach-O properly is to
|
||||
read Apple's
|
||||
<a href="http://developer.apple.com/Mac/library/documentation/DeveloperTools/Conceptual/MachORuntime/Reference/reference.html#//apple_ref/doc/uid/TP40000895-CH248-SW3">
|
||||
documentation on Mach-O</a>, which is pretty good combined with the
|
||||
comments in /usr/include/mach-o/*.h. These posts will only cover
|
||||
the basics necessary to generate a simple object file for linking with
|
||||
ld or gcc, and are not meant to be comprehensive.</p>
|
||||
|
||||
<h2>Mach-O File Format Overview</h2>
|
||||
|
||||
<p>A Mach-O file consists of 2 main pieces: the <b>header</b> and
|
||||
the <b>data</b>. The header is basically a map of the file describing
|
||||
what it contains and the position of everything contained in it. The
|
||||
data comes directly after the header and consists of a number of
|
||||
binary blobs of data, one after the other.</p>
|
||||
|
||||
<p>The header contains 3 types of records: the <b>Mach header</b>,
|
||||
<b>segments</b>, and <b>sections</b>. Each binary blob is described
|
||||
by a named section in the header. Sections are grouped into one or
|
||||
more named segments. The Mach header is just one part of the header
|
||||
and should not be confused with the entire header. It contains
|
||||
information about the file as a whole, and specifies the number of
|
||||
segments as well.</p>
|
||||
|
||||
<p>Take a quick look at <b>Figure 1</b> in
|
||||
<a href="http://developer.apple.com/Mac/library/documentation/DeveloperTools/Conceptual/MachORuntime/Reference/reference.html#//apple_ref/doc/uid/TP40000895-CH248-SW3">
|
||||
Apple's Mach-O overview</a>, which illustrates this quite nicely.</p>
|
||||
|
||||
<p>A very basic Mach object file consists of a header followed by single
|
||||
blob of machine code. That blob could be described by a single
|
||||
section named __text, inside a single nameless segment. Here's a
|
||||
diagram showing the layout of such a file:</p>
|
||||
|
||||
<pre>
|
||||
|
||||
,---------------------------,
|
||||
Header | Mach header |
|
||||
| Segment 1 |
|
||||
| Section 1 (__text) | --,
|
||||
|---------------------------| |
|
||||
Data | blob | <-'
|
||||
'---------------------------'
|
||||
</pre>
|
||||
|
||||
|
||||
<h2>The Mach Header</h2>
|
||||
|
||||
<p>The Mach header contains the architecture (cpu type), the type of
|
||||
file (object in our case), and the number of segments. There is more
|
||||
to it but that's about all we care about. To see exactly what's in a
|
||||
Mach header fire up a shell and type <tt>otool -h /bin/zsh</tt> (on a
|
||||
Mac).</p>
|
||||
|
||||
<p>Using
|
||||
<a href="http://samisamhuri.blogspot.com/2010/01/generating-mach-o-binaries-working-with.html">
|
||||
CStruct</a> we define the Mach header like so:</p>
|
||||
|
||||
<script src="http://gist.github.com/280635.js"></script>
|
||||
|
||||
|
||||
<h2>Segments</h2>
|
||||
|
||||
<p>Segments, or <b>segment commands</b>, specify where in memory the
|
||||
segment should be loaded by the OS, and the number of bytes to
|
||||
allocate for that segment. They also specify which bytes inside the
|
||||
file are part of that segment, and how many sections it contains.</p>
|
||||
|
||||
<p>One benefit to generating an object file rather than an executable is
|
||||
that we let the linker worry about some details. One of those details
|
||||
is where in memory segments will ultimately end up.</p>
|
||||
|
||||
<p>Names are optional and can be arbitrary, but the convention is to
|
||||
name segments with uppercase letters preceded by two underscores,
|
||||
e.g. __DATA or __TEXT </p>
|
||||
|
||||
<p>The code exposes some more details about segment commands, but should
|
||||
be easy enough to follow.</p>
|
||||
|
||||
<script src="http://gist.github.com/280642.js"></script>
|
||||
|
||||
|
||||
<h2>Sections</h2>
|
||||
|
||||
<p>All sections within a segment are described one after the other
|
||||
directly after each segment command. Sections define their name,
|
||||
address in memory, size, offset of section data within the file, and
|
||||
segment name. The segment name might seem redundant but in the next
|
||||
post we'll see why this is useful information to have in the section
|
||||
header.</p>
|
||||
|
||||
<p>Sections can optionally specify a map to addresses within their
|
||||
binary blob, called a <b>relocation table</b>. This is used by the
|
||||
linker. Since we're letting the linker work out where to place
|
||||
everything in memory the addresses inside our machine code will need
|
||||
to be updated.</p>
|
||||
|
||||
<p>By convention segments are named with lowercase letters preceded by
|
||||
two underscores, e.g. __bss or __text</p>
|
||||
|
||||
<p>Finally, the Ruby code describing section structs:</p>
|
||||
|
||||
<script src="http://gist.github.com/280643.js"></script>
|
||||
|
||||
|
||||
<h2>macho.rb</h2>
|
||||
|
||||
<p>As much of the Mach-O format as we need is defined in
|
||||
<a href="http://github.com/samsonjs/compiler/blob/master/asm/macho.rb">
|
||||
asm/macho.rb</a>. The Mach header, Segment commands, sections,
|
||||
relocation tables, and symbol table structs are all there, with a few
|
||||
constants as well.</p>
|
||||
|
||||
<p>I'll cover symbol tables and relocation tables in my next post.</p>
|
||||
|
||||
|
||||
<h2>Looking at real Mach-O files</h2>
|
||||
|
||||
<p>To see the segments and sections of an object file, run
|
||||
<tt>otool -l /usr/lib/crt1.o</tt>. <b>-l</b> is for load commands.
|
||||
If you want to see why we stick to generating object files instead of
|
||||
executables run <tt>otool -l /bin/zsh</tt>. They are complicated
|
||||
beasts.</p>
|
||||
|
||||
<p>If you want to see the actual data for a section otool provides a
|
||||
couple of ways to do this. The first is to use
|
||||
<tt>otool -d <segment> <section></tt> for an arbitrary
|
||||
section. To see the contents of a well-known section, such as __text
|
||||
in the __TEXT segment, use <tt>otool -t /usr/bin/true</tt>. You can
|
||||
also disassemble the __text section with
|
||||
<tt>otool -tv /usr/bin/true</tt>.</p>
|
||||
|
||||
<p>You'll get to know otool quite well if you work with Mach-O.</p>
|
||||
|
||||
|
||||
<h2>Take a break!</h2>
|
||||
|
||||
<p>That was probably a lot to digest, and to make real sense of it you
|
||||
might need to read some of the
|
||||
<a href="http://developer.apple.com/Mac/library/documentation/DeveloperTools/Conceptual/MachORuntime/Reference/reference.html#//apple_ref/doc/uid/TP40000895-CH248-SW3">
|
||||
official documentation</a>.</p>
|
||||
|
||||
<p>We're close to being able to describe a minimal Mach object file
|
||||
that can be linked, and the resulting binary executed. By the end of
|
||||
the next post we'll be there.</p>
|
||||
|
||||
<p><i>(You can almost do that with what we know now. If you
|
||||
create a Mach file with a Mach header (ncmds=1), a single unnamed
|
||||
segment (nsects=1), and then a section named __text with a segment
|
||||
name of __TEXT, and some x86 machine code a the section data, you
|
||||
would almost have a useful Mach object file.)</i></p>
|
||||
|
||||
<p>Till next time, happy hacking!</p>
|
||||
Loading…
Reference in a new issue