![]() |
DWARF Debugging Standard Wiki |
---|
Cary Coutant Updated July 23, 2009
In large applications, and particularly in large C++ applications, it is common for several similar functions to produce identical object code, even though the source code for each function is slightly different. These identical functions could, in principle, be merged into a single copy, with each symbol set to the address of that one copy. For one particular large C++ application studied, the potential reduction in size from this optimization has been estimated at 6-7%.
One problem with an optimization such as this is that a PC value can no longer be unambiguously mapped to a particular function. This can be expected to cause mysterious behavior when profiling and debugging an application that has been built with this optimization.
The proposal presented here adds a new table to the DWARF debugging information to allow the debugger and other tools to disambiguate such PC values by examining the call chain.
Microsoft’s Visual Studio
product offers a
compiler option, /Gy
, that directs the compiler to place the object
code for each function in a separate COMDAT section. This can be used
with a linker option, /OPT:ICF
(“Identical COMDAT Folding”), that
directs the linker to detect duplicate instances of identical COMDAT
sections and remove the redundant copies. All the original function
symbols are then set to the address of the one remaining copy.
Microsoft advises its users not to use this option when profiling or debugging an application, as there is no support for disambiguating a PC value that may belong to any of several different functions that were folded into one copy.
When the PC is inside a merged function, we will attempt to disambiguate the function by examining the function’s return pointer (i.e., the point of call that invoked the merged function). There are four cases:
this
pointer in the
callee, the debug information entry for the corresponding class can
be identified, and the function being called can be determined by
matching the slot index to the virtual function members of that
class. If the callee’s this
pointer cannot be determined (e.g.,
due to optimization of the routine), the vtable slot index might
still be used to eliminate some candidate functions.This requires the construction of two additional debug information tables in the final executable: a direct-call table and a virtual-call table. Each of these will be generated by the compiler and placed in a new debug section in the relocatable object files. The linker will combine these sections as it normally does to produce a single combined section for each table.
The number of functions that can be merged is expected to be a small fraction of the total number of functions in any typical application. For functions that will not be merged together, entries in the direct call table are not needed, and would represent a significant waste of space. At compile time, unfortunately, all functions are candidates for merging and the duplicate functions have not yet been identified, so the compiler must generate call table entries for all direct and virtual calls.
The actual merging of identical functions takes place at link time. In order to reduce the total size of the direct call table, therefore, the linker can discard any call table entry that refers to a function that has not been merged with another.
Typical DWARF sections consist of a header per compilation unit followed by data about that compilation unit. The header provides, at the minimum, an indication of the size of the data for the compilation unit so that, once many compilation units have been combined by the linker, a consumer can identify the separate compilation units. When the data may contain address-sized objects (as the call tables will), the header typically also provides the size of an address for the target machine. When the data may contain offsets to other DWARF information (as the call tables will), the header also provides an indication of whether these offsets are expressed as 32-bit or 64-bit offsets (see Section 7.4 of the DWARF spec). DWARF distinguishes between addresses and offsets because a 64-bit program, which would use 64-bit addresses, is rarely larger than the 4 GB limit that would require it to also use 64-bit offsets for debug information.
For each compilation unit, the compiler will place the direct call table
in the section .debug_dcall
, and the virtual call table in the section
.debug_vcall
. Each of these sections will begin with a compilation
unit header with the following format:
unit_length
: A 4-byte or 12-byte unsigned integer representing the
length of the contribution for this compilation unit, not including
the length field itself. In the 32-bit DWARF format, this is a
4-byte value (which must be less than 0xfffffff0); in the 64-bit
DWARF format, this consists of the 4-byte value 0xffffffff followed
by an 8-byte unsigned integer that gives the actual length.version
: A 1-byte unsigned integer indicating the version number
of the format of the call table information. The version number for
the format described here is 4.debug_info
(Direct call table only): A 4-byte or 8-byte unsigned
integer representing the base of the portion of the .debug_info
section for this compilation unit. In the 32-bit DWARF format, this
value is 4 bytes; in the 64-bit DWARF format, this value is 8 bytes.address_size
: A 1-byte unsigned integer representing the size in
bytes of an address on the target architecture.The call table entries immediately follow.
Entries in the direct call table have the following format:
call_site
: An address-sized value containing a pointer to the call
site. Because this table will be consulted during a stack unwind
using a return pointer, the value recorded here as the call site
will actually be the address following the call instruction, to
which the callee would return.callee_die
: An unsigned LEB128 value representing the location of
a debug information entry in the .debug_info
section, relative to
the base of the .debug_info
section. This entry represents a
declaration (not necessarily a defining declaration) of the called
subroutine, and should be either a DW_TAG_subprogram
or
DW_TAG_entry_point
debug entry.Entries in the virtual call table have the following format:
call_site
: An address-sized value containing a pointer to the call
site. Because this table will be consulted during a stack unwind
using a return pointer, the value recorded here as the call site
will actually be the address following the call instruction, to
which the callee would return.vtable_slot
: An unsigned LEB128 value containing the index of the
vtable slot used for the virtual function call.The linker will combine all .debug_dcall
contributions into a single
section in the output file, and will do likewise with .debug_vcall
contributions.
In order to reduce the final size of the direct call table, it may
optionally process the contents of each contribution to the
.debug_dcall
section, removing all entries whose callees are known not
to have been merged with another. This step is optional, because the
analysis and understanding of a specific section is not a typical linker
function, and may tend to hurt the linker’s performance.
The linker may also optionally scan for functions whose address has been taken, during its normal scan of relocations, and mark those functions as ineligible for merging with other identical functions.
A debugger, profiler, or other application that needs to disambiguate a PC value within a merged function will need to perform a (partial) stack trace to obtain the return pointer of the function containing the current PC. It can then scan the direct and virtual call tables to find an entry that corresponds to the return pointer.
callee_die
field to locate the debug information entry for the
function that was actually called.this
pointer to identify the virtual table for the object whose
member function was called. Since the DWARF information does not
record any information directly about virtual tables, the
application will need to find the symbol corresponding to the vtable
address, demangle the symbol name to identify the actual type to
which the vtable corresponds, then find the debug information for
that type. It can then match the location expression from the
vtable_elem_loc
field against the DW_AT_vtable_elem_location
attributes of the type’s member function entries to identify the
function that was actually called.In some cases, the point of call may itself be in a merged function. For
example, consider a function A
that calls function B
, and a function
C
that calls function D
. Suppose functions B
and D
are identical
and are thus merged, and that as a result, functions A
and C
are
also identical and are merged. In this case, a PC that could be in
either B
or D
will have a return pointer that could be in either A
or C
. We would need to recursively determine whether we are in A
or
C
before we can decide whether we are in B
or D
.
Finding the vtable from the virtual call table as described above
requires the demangling of C++ symbols, introducing some external
dependencies on both the C++ language and its specific implementation
for the target platform. To remove these dependencies, we could
implement an additional DWARF extension to record virtual tables in the
DWARF information. Each virtual table generated by the compiler would be
represented by a debugging information entry with the tag
DW_TAG_vtable
(or DW_TAG_GNU_vtable
if implemented as a GNU
extension). The vtable entry would have a DW_AT_location
attribute
giving its location, and a DW_AT_type
attribute referring to the class
that it belongs to.
dwarfstd.org is supported by Sourceware. Contributions are welcome.
All logos and trademarks in this site are property of their respective owner. The comments are property of their posters, all the rest © 2007-2022 by DWARF Standards Committee. |