<table style="text-align: left; width: 600px; height: 107px;" border="0" cellpadding="0" cellspacing="0">
  <tbody>
    <tr>
      <td style="width: 122px;"><a href="/" title="DWARF Wiki">
      <img src="/images/dwarf.png" alt="DWARF Debugging Format" style="border: 0px solid ; width: 112px; height: 107px;"></a></td>
      <th colspan="4" rowspan="1" style="vertical-align: middle; text-align: center;"><big><big>DWARF Debugging Standard Wiki</big></big>
      </th>
    </tr>
  </tbody>
</table>
<p>


<!-- End Of Header -->

<p>(From Jason Molenda -jmolenda at apple.com)</p>

<p>Until 2005 (or was it 2005?) we only used stabs for debug info on our
platform. We use gcc and gdb as our main system&rsquo;s compiler and debugger
&ndash; those were ready to support DWARF already, with small tweaks to work
with the Mach-O file format, but our assembler and linker (the assembler based on a
very old GNU as, the linker homegrown) weren&rsquo;t even close. The linker
work needed to do a standard DWARF process was going to be very hard for
us to schedule so we looked at other possible solutions.</p>

<p>Our final solution was to segregate executable linking and debug info
linking into two separate actions. We have already seen examples of
binaries that have nearly a gigabyte of debug information &ndash; and at the
same time, we&rsquo;re very focused on the compile-link-debug turnaround time
for our development environment. And we very much wanted to take
advantage of the vendor extensibility of DWARF to add more information
in the future. Forcing all of this data through the linker &ndash; when the
linker was the least parallelizable step of the compile-link-debug cycle
&ndash; was destined to be a losing proposition.</p>

<p>The first half of our solution is to leave the DWARF debug information
in the .o files, much like Michael implemented at Sun many years ago.
The addresses in the .o files bear no resemblance to the final
executable of course &ndash; we have the linker synthesize some &ldquo;debug
map&rdquo; entries (really just nlist stabs records - the toolchain knows how
to handle stabs very well already so there wasn&rsquo;t any compelling reason
to come up with a new format). For those familiar with stabs, there are
no surprises: here&rsquo;s what a source file called &ldquo;a.c&rdquo; with one
function, main(), defined looks like in the final executable binary:</p>

<pre><code>0000000000000000 - 00 0000    SO /tmp/
0000000000000000 - 00 0000    SO a.c
00000000491a513a - 00 0001   OSO /private/tmp/a.o
0000000100000f1a - 01 0000 BNSYM
0000000100000f1a - 01 0000   FUN _main
000000000000001d - 00 0000   FUN
000000000000001d - 01 0000 ENSYM
0000000000000000 - 01 0000    SO
</code></pre>

<p>The &ldquo;OSO&rdquo; stab (like the SO stab but for a .o file, hence OSO)
provides a pointer to where the object file is located. The SO stabs
tell us what source file this corresponds to so we can read just the a.o
debug info when the user asks to put a breakpoint on file a.c line 10.
The FUN stabs tell us the start address and length of the main()
function.</p>
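<p>As an illustration only, here is a minimal Python sketch that reads a listing like the one above into one record per source file. The names <code>DebugMapEntry</code> and <code>parse_debug_map</code> are hypothetical, and the sketch assumes the column layout of this particular listing: the OSO value is taken as the object file's modification time, and an unnamed FUN stab is taken as the length of the preceding named one.</p>

```python
from dataclasses import dataclass, field

# The listing from the text, used here as sample input.
SAMPLE = """\
0000000000000000 - 00 0000    SO /tmp/
0000000000000000 - 00 0000    SO a.c
00000000491a513a - 00 0001   OSO /private/tmp/a.o
0000000100000f1a - 01 0000 BNSYM
0000000100000f1a - 01 0000   FUN _main
000000000000001d - 00 0000   FUN
000000000000001d - 01 0000 ENSYM
0000000000000000 - 01 0000    SO
"""


@dataclass
class DebugMapEntry:  # hypothetical record type, one per source file
    source_dir: str = ""
    source_file: str = ""
    object_path: str = ""
    object_mtime: int = 0
    functions: dict = field(default_factory=dict)  # name -> (start, length)


def parse_debug_map(text):
    """Group the stabs in a textual listing into one entry per source file."""
    entries = []
    current = None
    pending = None  # (name, start) of a FUN stab still waiting for its length
    for line in text.splitlines():
        parts = line.split(None, 5)
        if len(parts) < 5:
            continue
        value = int(parts[0], 16)
        kind = parts[4]
        name = parts[5].strip() if len(parts) > 5 else ""
        if kind == "SO":
            if not name:
                # An unnamed SO stab closes the current source file.
                if current is not None:
                    entries.append(current)
                current, pending = None, None
                continue
            if current is None:
                current = DebugMapEntry()
            if name.endswith("/"):
                current.source_dir = name
            else:
                current.source_file = name
        elif current is None:
            continue
        elif kind == "OSO":
            current.object_path = name
            current.object_mtime = value
        elif kind == "FUN":
            if name:
                pending = (name, value)
            elif pending is not None:
                current.functions[pending[0]] = (pending[1], value)
                pending = None
    return entries


entry = parse_debug_map(SAMPLE)[0]
print(entry.source_file, entry.object_path, entry.functions)
```

<p>With a structure like this, a request for a breakpoint in a.c only needs to open the one object file named by that entry.</p>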

<p>The second half of our solution is to create a dedicated DWARF linker.
This is critical to being a fully fledged solution &ndash; the program
&ldquo;dsymutil&rdquo; on our platform uses the executable, its debug map entries,
and the .o files listed therein, to create a single DWARF debug info
file with all of the addresses mapped to their final addresses. This
debug file is separate from the executable binary itself. On Mac OS X we
have the concept of a &ldquo;bundle&rdquo; which is a directory containing
multiple files related to a single entity. For instance, an application
on our platform is in an &ldquo;app bundle&rdquo; &ndash; in the app bundle you&rsquo;ll
find the executable, the localization strings, help text, etc, all in a
single directory that can be moved around the filesystem. We put the
debug info file in one of these bundles, which we call a dSYM bundle.</p>

<p>As I mentioned earlier, the creation of the dSYM bundle was critical to
make this a usable solution. Without a way to collect all of the debug
info into a single binary, you have no way to save a binary &amp; its debug
info in a release without tarring up all the .o files or something
similar. If you want to copy a debuggable copy of a program to another
system you&rsquo;d need to drag the binary and all its .o files along with it
&ndash; and keep them in the same file path or have a way to override that
in the debugger.</p>

<p>Developers aren&rsquo;t used to keeping a binary and its debug info in step
so we added some safeguards to catch mistakes. In the debug map entries,
we include the modification time of the object file in the stab entry.
If someone has rebuilt a .o file but not the executable, we ignore that
debug information - it&rsquo;s a better failure mode than assuming the debug
info is usable and getting subtly wrong debugger behavior.</p>
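<p>A minimal sketch of that check in Python, assuming the recorded time is a whole number of seconds as in the OSO entry shown earlier. The function name <code>object_file_is_current</code> is hypothetical, and this is not the actual gdb code.</p>

```python
import os


def object_file_is_current(object_path, recorded_mtime):
    """Return True only if the .o file exists and its modification time
    matches the time recorded in the debug map entry."""
    try:
        actual_mtime = int(os.stat(object_path).st_mtime)
    except OSError:
        # A missing or unreadable object file is treated like a stale one.
        return False
    return actual_mtime == recorded_mtime
```

<p>A caller would skip the debug info for any object file where this returns False rather than risk using addresses from a different build.</p>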

<p>The second safeguard we added was to stamp every binary we create with a
128-bit unique identifier (the LC_UUID load command). When a dSYM is
created by dsymutil, this uuid from the executable is copied into the
DWARF binary. This gives us a reliable way to tell if a given binary and
given DWARF info are correct for one another. As useful as it is to
protect our users from adding the wrong debug info, it has proven to be
a great help for FINDING the debug info for a binary. We include many
different ways to find the debug info for a binary. The simplest is to
look for the dSYM bundle next to the binary. The next is to use a
system-wide service on Mac OS X called &ldquo;Spotlight&rdquo; which indexes all
the files on the computer and can do very quick lookups; we have an
importer for Spotlight which records the uuids of any dSYM bundles on
the system. The user can also define their own lookups: A list of
directories to look in for dSYMs, or an external program to locate a
dSYM, for instance.</p>

<p>In any of these examples the names of the executable and the dSYM are
irrelevant &ndash; the only thing used is the uuid in the executable binary.
The debugger asks for a dSYM for a uuid and one of these methods returns
a path to the matched dSYM.</p>
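<p>The lookup chain can be pictured roughly as follows. This is only a sketch: <code>make_index_locator</code> and <code>find_dsym</code> are hypothetical names, and a plain dictionary stands in for whatever index (Spotlight, a directory scan, an external program) a real locator would consult.</p>

```python
import uuid


def make_index_locator(index):
    """Build a locator from a dict mapping uuid.UUID -> dSYM path.
    The dict is a stand-in for a real index of dSYM bundles."""
    def locate(binary_uuid):
        return index.get(binary_uuid)
    return locate


def find_dsym(binary_uuid, locators):
    """Ask each locator in turn; the first one that returns a path wins.
    Only the uuid is consulted, never the file name of the executable."""
    for locate in locators:
        path = locate(binary_uuid)
        if path is not None:
            return path
    return None


# Example with made-up data: the dSYM name has nothing to do with the binary.
app_uuid = uuid.UUID("12345678-1234-5678-1234-567812345678")
next_to_binary = make_index_locator({})
system_index = make_index_locator({app_uuid: "/somewhere/else/renamed.dSYM"})
print(find_dsym(app_uuid, [next_to_binary, system_index]))
```

<p>Ordering the locators from cheapest to most expensive keeps the common case fast while still allowing a site-specific lookup as a last resort.</p>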

<p>The result of this is that users are often unburdened of keeping their
executable and dSYMs together, or in The Right Place &ndash; if the dSYM is
sitting somewhere on their computer, we discover it without their
intervention. The ability to call out to a program to find the dSYM
allows us internally to do something really cool: We have the dSYMs for
every binary in our OS, of course, maintained by our OS build
organization. We have a program that puts all the uuids of those dSYMs
in a little SQL database and another program that gdb can call to find them.
When you have this program registered in gdb&rsquo;s lookup scheme, and you
launch a random process, you find that you have debug info for every
frame in the stack, regardless of what library it came from. We wrote
all of this intending that other organizations would want to do the same
- it&rsquo;s a pretty simple interface.</p>

<p>Our debugger has two ways of processing DWARF: One for DWARF in .o files
and one for DWARF in a dSYM bundle. For dSYMs, there are only a handful
of changes from normal gdb&rsquo;s behavior - all of the addresses have been
remapped to their final locations by dsymutil so it&rsquo;s just a matter of
reading the DWARF out of a separate file. In the .o file case, gdb
needs to do the address translation on its own. When we start gdb on
such an executable, we read the debug map (the stabs entries) out of the
executable and create an address translation map for each .o file
specifying what address the symbols landed at. We read the pubtypes
sections from the .o files so we have a global view of types, and
that&rsquo;s it. We don&rsquo;t read any of the debug info in the .o files until
individually needed. We found the time to ingest the pubtypes sections
from the .o files to be very fast so we didn&rsquo;t try to come up with a
way for the linker to include type information in the executable.</p>

<p>The translation of addresses in the debugger is generally
straightforward &ndash; some functions in the .o file may not make it to the
final executable, of course. We process the DWARF in the .o file as you
normally would but all addresses go through the translation table so
they&rsquo;re changed to their final values. &ldquo;common&rdquo; symbols (global data
not initialized to a value which all have an address of 0 in the .o
file) need a little bit of extra care; in gdb I use the names of the
symbols to process them. Greg Clayton wrote dsymutil and he uses some
additional relocation information from the binary to do them; both
approaches work fine. I only had easy access to the nlist records in gdb
so the approach I used made more sense there.</p>
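<p>To make the idea of a per-object translation table concrete, here is a small Python sketch. It is an assumption-laden illustration rather than gdb's implementation: the class name <code>AddressMap</code> is hypothetical, each range is an (address in the .o file, length, final address) triple, and the example places main() at address 0 in the .o file purely for illustration.</p>

```python
import bisect


class AddressMap:
    """Translate addresses in one .o file to addresses in the executable."""

    def __init__(self, ranges):
        # ranges: iterable of (object_start, length, final_start) tuples
        self._ranges = sorted(ranges)
        self._starts = [r[0] for r in self._ranges]

    def translate(self, object_address):
        """Return the final address, or None if the address belongs to
        something that did not make it into the final executable."""
        i = bisect.bisect_right(self._starts, object_address) - 1
        if i < 0:
            return None
        start, length, final_start = self._ranges[i]
        if object_address >= start + length:
            return None
        return final_start + (object_address - start)


# main() from the debug map example: 0x1d bytes long, linked at 0x100000f1a.
amap = AddressMap([(0x0, 0x1D, 0x100000F1A)])
print(hex(amap.translate(0x10)))
print(amap.translate(0x40))
```

<p>Returning None for unmapped addresses gives the caller a simple way to drop DWARF entries for functions that were not linked in.</p>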

<p>In a nutshell, that&rsquo;s everything that comes to mind about our approach.
We implemented the basic scheme a few years back and have switched over
all our tools to using it for a while now &ndash; the most recent Mac OS X
release (10.5, &ldquo;Leopard&rdquo;) was all using DWARF with this setup. The
only unfinished bit I can think of is that I haven&rsquo;t gotten CFI for
DWARF in .o files to work yet &ndash; but we don&rsquo;t use CFI to do our
backtraces by default on our platform yet because there are a couple of
shortcomings in gcc&rsquo;s output so it hasn&rsquo;t been a priority by any
means. I&rsquo;ve put a good amount of effort into overhauling gdb&rsquo;s
backtrace scheme for our i386/x86_64 back-ends so we&rsquo;re usually OK without
CFI/EH frame info.</p>

<p>There are some wrinkles with kernel extensions (&ldquo;kexts&rdquo;) on our system
that took some effort to handle correctly with our DWARF scheme. I
won&rsquo;t go into it; kexts are very unusual.</p>

<p>One benefit to having our DWARF debug info in a dSYM bundle (a
directory) instead of just a plain file is that we have the option of
including additional data in there as well. You could envision putting a
copy of the source code in the dSYM bundle too, maybe xar'ed and
compressed, and having the debugger expand it as-needed when debugging.</p>

<p>The uuids in the binaries have proven to be a real boon across the
organization. It doesn&rsquo;t provide the features that crypto signing
binaries does (we do that on our platform as well) but it does
disambiguate binaries very nicely. On our platform when we generate a
crash report we include the name of each binary/library, its load
location, and its UUID. We include the overall system version # but if
the user has installed some binaries on their own, or an update did not
complete successfully, we&rsquo;ve got everything we need to detect what
happened. It also aids in automated symbolication of such crash reports;
no one has to figure out what debug info the crash report should be
symbolicated against, it&rsquo;s declared right in the text via the uuids.</p>

<p>Anyway, I think that&rsquo;s it. While I&rsquo;m impressed with the work Cary has
been doing these past few months, we decided that getting the linker out
of the DWARF business altogether was the way to go &ndash; we get
dramatically faster turnaround on incremental builds when the developer
is waiting for the computer, and although the separate debug info file
seems like a drawback at first, we&rsquo;ve been able to use the uuids to
solve lots of long-standing problems and I suspect we&rsquo;ll be using the
dSYM bundle scheme to interesting effect in the future; putting the
DWARF in there was just our first step.</p>

<p>If anything isn&rsquo;t clear just give me a shout and I&rsquo;ll try to be
clearer. I kind of threw this together at the end of a long day and
I&rsquo;ve got an early morning tomorrow. :)</p>


<!-- Start Of Footer -->

<p>
<table style="text-align: left; width: 800px;" cellspacing="0" cellpadding="0" border="0">
  <tbody><tr><td style="width: 800px; text-align: center;"><small>
<em><strong>dwarfstd.org</strong></em> is supported by <a href="https://sourceware.org/">Sourceware</a>. Contributions are welcome.
<br><br>
All logos and trademarks in this site are property of their respective
owner. <br>
The comments are property of their posters, all the rest © 2007-2022
by DWARF Standards Committee.</small></td></tr></tbody>
</table>