DWARF Debugging Standard Wiki
Cary Coutant Modified: January 28, 2009
PC sampling is often used to profile the execution paths of an application, and the sampled data needs to be fed back to the compiler for profile-based optimization. In order to map the individual PC samples to the original source code, the optimizer needs to use the DWARF line number table. This approach works in general, but it is currently unable to distinguish among multiple paths of execution on a single source line.
A single line of source code can have multiple paths of execution for several reasons. The simplest examples involve a conditional or short-circuit logical operator, or an if-then-else or loop statement written on a single line. Consider the following example:
x = (i == 1 ? one() : two());
An x86 compiler might generate the following code:
.loc 1 10 0
cmpl $1, 8(%ebp)
je .L7
call two
.L4:
movl %eax, -8(%ebp)
...
.L7:
.loc 1 10 0
call one
.p2align 4,,5
jmp .L4
Here, both paths of execution are marked with the same location (file 1, line 10).
Multiple paths of execution can also result from control flow generated by the compiler to implement a high-level source construct like a switch statement. Consider a switch statement with a dense set of case labels:
switch (i)
{
case 1: s = "1"; break;
case 2: s = "2"; break;
case 3: s = "3"; break;
case 4: s = "4"; break;
...
case 198: s = "198"; break;
case 199: s = "199"; break;
case 200: s = "200"; break;
default: s = "?"; break;
}
The compiler will typically use a jump table for this switch, and must generate bounds checking code prior to indexing the table:
.loc 1 4 0
cmpl $200, %eax
jbe .L207
.L2:
movl $.LC1, %eax // "2"
.loc 1 209 0
...
.L207:
.loc 1 4 0
jmp *.L203(,%eax,4)
...
Here, the fall-through path (representing the default case) and the path that accesses the jump table are both marked with the same location (file 1, line 4). (The default case really ought to be marked with a different source line number, but for some reason it isn’t in the version of gcc I used for this sample.)
Other code-generation strategies for switch statements may produce combinations of tests and loops, again yielding multiple execution paths that need to be distinguished.
Additional cases to consider are structure copies that may cause the compiler to generate an inline loop, or explicit calls to memcpy that get inlined. These cases are a problem only where there is additional code outside the loop on the same line of source code; for example:
memcpy(&a[i], &b[j], n);
Here, if the call to memcpy is inlined, the code to compute the first two parameters is on a separate execution path from the copy loop, and needs to be distinguished in order for the optimizer to make the best use of the profiling information.
In DWARF, the line number table is a compressed representation of a sparse table, with one row for each machine instruction, and several columns. The first column contains the address of the instruction. Three more columns identify the source file, line number, and column number for the source code responsible for that instruction. A fifth column contains a flag, is_stmt, that is true for recommended breakpoint locations. In the gcc toolchain, the .loc assembly pseudo-op indicates each position in the assembly code where any of these values change. The remaining columns in the table are irrelevant to this proposal.
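The following minimal C sketch makes the mapping concrete. It shows how a consumer might find the row for a sampled PC, assuming the table has already been decoded into an in-memory array sorted by address.

The struct `line_row`, the function `lookup_row`, and the addresses in `example_rows` are hypothetical. With only the existing columns, samples from either branch of the ternary example come back as the same location, line 10.

```c
#include <stdbool.h>
#include <stddef.h>

/* One row of a decoded line number table (hypothetical in-memory layout). */
struct line_row {
    unsigned long addr;   /* address of the instruction */
    unsigned file;        /* source file index */
    unsigned line;        /* source line number */
    unsigned column;      /* source column number */
    bool is_stmt;         /* true for recommended breakpoint locations */
};

/* Return the row that covers pc, i.e. the last row whose address is <= pc,
   or NULL if pc precedes the first row. Assumes rows are sorted by address
   and come from a single sequence (end_sequence handling is omitted). */
const struct line_row *lookup_row(const struct line_row *rows, size_t n,
                                  unsigned long pc)
{
    size_t lo = 0, hi = n;              /* binary search over [lo, hi) */
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (rows[mid].addr <= pc)
            lo = mid + 1;               /* rows[mid] starts at or before pc */
        else
            hi = mid;
    }
    /* lo now counts the rows whose address is <= pc */
    return lo == 0 ? NULL : &rows[lo - 1];
}

/* Hypothetical rows for the ternary example: both execution paths map to
   file 1, line 10, so samples from either path are indistinguishable. */
const struct line_row example_rows[] = {
    {0x1000, 1, 10, 0, true},   /* cmpl, je .L7, call two, .L4 ... */
    {0x1020, 1, 11, 0, true},   /* code for the following source line */
    {0x1040, 1, 10, 0, true},   /* .L7: call one, jmp .L4 */
};
```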
In order to distinguish these additional paths of execution, we propose to add an additional column to the DWARF line number table, which will hold a “discriminator”, a simple integer that allows a consumer to map a PC value to one of the several different basic blocks that may be associated with a given line of source code.
The optimizer, when processing the profiling information, will be able to use this extra column of information to determine which basic block a given PC sample corresponds to. All other tools will ignore the extra column; because of the extensible design of the DWARF format, no changes are necessary to any other tool.
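Below is a sketch of how the optimizer might use the extra column. It attributes each PC sample to its covering row and counts samples per (line, discriminator) pair rather than per line.

The names `disc_row`, `block_count`, and `count_samples`, and the example addresses and discriminator values, are hypothetical. The linear scans trade speed for brevity.

```c
#include <stddef.h>

/* A line table row extended with the proposed discriminator column
   (hypothetical in-memory layout; only the fields needed here). */
struct disc_row {
    unsigned long addr;
    unsigned line;
    unsigned discriminator;
};

/* Sample count for one (line, discriminator) pair, i.e. one basic block. */
struct block_count {
    unsigned line;
    unsigned discriminator;
    unsigned long samples;
};

/* Attribute each PC sample to the row covering it (last row with
   addr <= pc; rows sorted by address) and accumulate a count per
   (line, discriminator). Returns the number of distinct entries written
   to out, which must have room for n_rows entries. */
size_t count_samples(const struct disc_row *rows, size_t n_rows,
                     const unsigned long *pcs, size_t n_pcs,
                     struct block_count *out)
{
    size_t n_out = 0;
    for (size_t s = 0; s < n_pcs; s++) {
        const struct disc_row *r = NULL;
        for (size_t i = 0; i < n_rows && rows[i].addr <= pcs[s]; i++)
            r = &rows[i];               /* last row starting at or before pc */
        if (r == NULL)
            continue;                   /* sample precedes the table */
        size_t k = 0;
        while (k < n_out && !(out[k].line == r->line &&
                              out[k].discriminator == r->discriminator))
            k++;
        if (k == n_out) {               /* first sample for this block */
            out[n_out].line = r->line;
            out[n_out].discriminator = r->discriminator;
            out[n_out].samples = 0;
            n_out++;
        }
        out[k].samples++;
    }
    return n_out;
}

/* Hypothetical rows for the ternary example, now with discriminators. */
const struct disc_row example_rows[] = {
    {0x1000, 10, 0},   /* cmpl, je .L7 */
    {0x1008, 10, 1},   /* call two (fall-through path) */
    {0x1040, 10, 2},   /* .L7: call one (taken path) */
};
```

Suppose samples land at 0x1002, 0x100c, 0x1044, and 0x1048. The sketch then reports three separate blocks for line 10, with counts 1, 1, and 2, instead of a single merged count of 4.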
The GNU assembler needs to be enhanced to accept an additional parameter to the .loc pseudo-op, which will contain the discriminator. The additional parameter needs to be specified only when the discriminator is non-zero.
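The rule that the parameter appears only when it is non-zero can be sketched as a small C helper that a compiler back end might use. The function `format_loc` is hypothetical.

The `discriminator N` spelling is an assumption. It is the form GNU as later accepted as a .loc option, whereas this proposal only calls for an additional parameter.

```c
#include <stdio.h>

/* Format a .loc directive into buf (hypothetical helper). The
   discriminator is appended only when it is non-zero, so output for
   discriminator 0 is identical to today's. The "discriminator N" spelling
   is an assumption. Returns the snprintf result (the length of the
   formatted text). */
int format_loc(char *buf, size_t size, unsigned file, unsigned line,
               unsigned column, unsigned discriminator)
{
    if (discriminator == 0)
        return snprintf(buf, size, ".loc %u %u %u", file, line, column);
    return snprintf(buf, size, ".loc %u %u %u discriminator %u",
                    file, line, column, discriminator);
}
```

For example, `format_loc(buf, sizeof buf, 1, 10, 0, 2)` produces `.loc 1 10 0 discriminator 2`, while a discriminator of 0 produces the plain `.loc 1 10 0` seen in the listings above.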
The assembler will encode the discriminator column into the DWARF line number table using the new DWARF line number program opcodes described below. For the rows in the table with a non-zero discriminator, the is_stmt flag should be false, since these rows do not describe a breakpoint location.
The readelf utility and the bfd library (for objdump) will need minor enhancements to dump the additional DWARF line number information.
The compiler needs to be enhanced to provide the additional discriminator parameter in .loc pseudo-ops at each point where a new basic block begins following a conditional transfer of control for the same source line. The first block should have a discriminator of zero, and each conditional transfer of control within the source line should increment the discriminator by 1.
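One way to implement this numbering is sketched below in C. It rests on three simplifying assumptions:

- blocks arrive in layout order;
- every later block on a line is taken to begin after a conditional transfer of control;
- only the line number, not the file, is used as the key.

The `bb` struct, `assign_discriminators`, and the `MAX_LINES` limit are all hypothetical.

```c
#include <stddef.h>

/* A basic block as a compiler back end might see it (hypothetical). */
struct bb {
    unsigned line;           /* source line of the block's first instruction */
    unsigned discriminator;  /* assigned below */
};

#define MAX_LINES 64         /* hypothetical limit on distinct lines tracked */

/* The first block seen for a line gets 0; each later block for the same
   line gets the next value for that line. A per-line counter is kept so
   that blocks for one line need not be contiguous in layout order. */
void assign_discriminators(struct bb *blocks, size_t n)
{
    unsigned lines[MAX_LINES];
    unsigned next[MAX_LINES];      /* next discriminator for lines[i] */
    size_t n_lines = 0;

    for (size_t b = 0; b < n; b++) {
        size_t i = 0;
        while (i < n_lines && lines[i] != blocks[b].line)
            i++;
        if (i == n_lines) {
            if (n_lines == MAX_LINES) {      /* table full: fall back to 0 */
                blocks[b].discriminator = 0;
                continue;
            }
            lines[n_lines] = blocks[b].line; /* first block for this line */
            next[n_lines] = 0;
            n_lines++;
        }
        blocks[b].discriminator = next[i]++;
    }
}
```

Suppose blocks appear on lines 10, 10, 10, 11, 10. The sketch assigns 0, 1, 2, 0, 3: the out-of-line block at the end, like `.L7` in the ternary example, still gets a value of its own.

Resetting the counter whenever the line changes would instead give that block 0 and merge it with the first block. When the table is full, the sketch falls back to 0, which yields a missed discriminator rather than a wrong one.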
A new compiler option, -femit_discriminators, should be provided to enable this new functionality. Ideally, this option would eventually be turned on by default.
A new register, discriminator, will be added to the line number information state machine (described in Section 6.2.2 of the DWARF specification). This register will contain an unsigned integer indicating the discriminator value to be inserted into the line number table when the next row is added. This register will have the value 0 at the beginning of each sequence within a line number program, and will be reset to 0 whenever a new row is added to the line number table.
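The register's lifecycle can be sketched in C with only a subset of the state machine modeled. The struct and function names are hypothetical.

The initial values follow the existing DWARF state machine. The only new behavior is that appending a row copies the discriminator into the row and then resets the register to 0.

```c
#include <stdbool.h>

/* Subset of the line number state machine registers, plus the proposed
   discriminator register (hypothetical layout; others omitted). */
struct lnp_state {
    unsigned long address;
    unsigned file, line, column;
    bool is_stmt;
    unsigned discriminator;   /* proposed new register */
};

/* Row appended to the line table, including the new column. */
struct lnp_row {
    unsigned long address;
    unsigned file, line, column;
    bool is_stmt;
    unsigned discriminator;
};

/* State at the beginning of each sequence. default_is_stmt comes from
   the line program header; the discriminator starts at 0. */
void lnp_reset(struct lnp_state *s, bool default_is_stmt)
{
    s->address = 0;
    s->file = 1;
    s->line = 1;
    s->column = 0;
    s->is_stmt = default_is_stmt;
    s->discriminator = 0;
}

/* Append a row from the current registers, then reset the discriminator
   to 0. The existing row-appending opcodes also clear some other flags,
   which this sketch does not model. */
void lnp_append_row(struct lnp_state *s, struct lnp_row *out)
{
    out->address = s->address;
    out->file = s->file;
    out->line = s->line;
    out->column = s->column;
    out->is_stmt = s->is_stmt;
    out->discriminator = s->discriminator;
    s->discriminator = 0;                 /* new: reset after each row */
}
```

Because of the reset, a producer sets the discriminator only for rows that need a non-zero value. Every other row gets 0 without any extra opcode.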
A new extended opcode, DW_LNE_GNU_set_discriminator, will be added to set the discriminator register to a specific value. This opcode takes a single parameter, an unsigned LEB128 integer, which is the new value of the discriminator register.
The new extended opcode, DW_LNE_GNU_set_discriminator, will initially have the value 0x80, in the vendor-specific range. If the proposal is accepted by the DWARF workgroup, it will be assigned a value in the lower range (probably 4).
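On the consumer side, decoding the new opcode might look like the C sketch below. Every extended opcode carries its own length, so a reader can skip sub-opcodes it does not recognize.

The helper names are hypothetical. Only the 0x80 value and the ULEB128 operand come from this proposal; the escape-byte and length layout is the standard DWARF extended-opcode form.

```c
#include <stddef.h>
#include <stdint.h>

#define DW_LNE_GNU_set_discriminator 0x80   /* vendor value from this proposal */

/* Decode an unsigned LEB128 value from [p, end). Stores it in *out and
   returns the number of bytes consumed, or 0 if the input is truncated. */
size_t read_uleb128(const uint8_t *p, const uint8_t *end, uint64_t *out)
{
    uint64_t result = 0;
    unsigned shift = 0;
    size_t n = 0;
    while (p + n < end) {
        uint8_t byte = p[n++];
        if (shift < 64)
            result |= (uint64_t)(byte & 0x7f) << shift;
        shift += 7;
        if ((byte & 0x80) == 0) {           /* high bit clear: last byte */
            *out = result;
            return n;
        }
    }
    return 0;
}

/* Decode one extended opcode at p: a 0 byte, a ULEB128 length, then the
   sub-opcode and its operands. If it is DW_LNE_GNU_set_discriminator,
   store the operand in *discriminator; other sub-opcodes are skipped.
   Returns the total bytes consumed, or 0 on malformed input. */
size_t decode_extended_op(const uint8_t *p, const uint8_t *end,
                          uint64_t *discriminator)
{
    uint64_t len;
    if (p >= end || p[0] != 0)
        return 0;
    size_t n_len = read_uleb128(p + 1, end, &len);
    if (n_len == 0 || len == 0 || len > (uint64_t)(end - (p + 1 + n_len)))
        return 0;
    const uint8_t *op = p + 1 + n_len;      /* sub-opcode byte */
    if (op[0] == DW_LNE_GNU_set_discriminator) {
        uint64_t value;
        if (read_uleb128(op + 1, op + len, &value) == 0)
            return 0;
        *discriminator = value;
    }
    return 1 + n_len + (size_t)len;
}
```

Given the bytes `00 02 80 03`, the sketch consumes 4 bytes and sets the discriminator to 3. With an unrecognized sub-opcode in place of 0x80, it consumes the same 4 bytes and leaves the register unchanged.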
The DWARF change is minimal, and observes the rules for extensions to the DWARF format. No existing consumer should be affected by this change (although some incorrect implementations may nevertheless exist).
The assembler change is also minor, involving only a change to the .loc pseudo-op and the extra column in the line number table.
The compiler change will involve detecting conditional transfers of control and placing a new discriminator on subsequent instructions whenever the source file and line number remain the same. At this point, I’m unsure of the actual scope of this change.
As long as the new behavior is controlled by a command-line option, compilation without that option should remain completely unaffected, providing a trivial fallback position.
With the option turned on, existing DWARF consumers ought to completely ignore the extra line number program opcodes and should notice no difference at all (other than a slight increase in the size of the line number tables). There is a risk that some (incorrectly-written) consumers will balk at the new opcode, but the fix in each case should be minor.
The size of the DWARF line number tables will increase slightly, due to the new opcodes in the line number program. An extended opcode is encoded as a 0 byte, followed by an LEB128 number giving the length of what follows, then the opcode itself (0x80), and finally another LEB128 number providing the discriminator. Each of these LEB128 numbers will be only a single byte in all but extreme cases, so the typical new opcode will take 4 bytes. The percentage of source lines needing discriminators, however, is expected to be very small, so the impact of these extra opcodes ought to be well under 1% in terms of space.
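The following minimal C sketch of the encoding checks the 4-byte figure. Discriminators below 128 produce 4 bytes, and 128 already needs a fifth.

The function names are hypothetical. The byte layout is the standard DWARF extended-opcode form with the proposed 0x80 sub-opcode.

```c
#include <stddef.h>
#include <stdint.h>

#define DW_LNE_GNU_set_discriminator 0x80   /* vendor value from this proposal */

/* Encode value as unsigned LEB128 into buf; returns bytes written. */
size_t write_uleb128(uint8_t *buf, uint64_t value)
{
    size_t n = 0;
    do {
        uint8_t byte = value & 0x7f;        /* low 7 bits */
        value >>= 7;
        if (value != 0)
            byte |= 0x80;                   /* more bytes follow */
        buf[n++] = byte;
    } while (value != 0);
    return n;
}

/* Emit DW_LNE_GNU_set_discriminator: a 0 escape byte, the ULEB128 length
   of the rest, the sub-opcode, then the ULEB128 discriminator.
   buf needs room for at most 13 bytes. Returns bytes written. */
size_t emit_set_discriminator(uint8_t *buf, uint64_t discriminator)
{
    uint8_t operand[10];                    /* a uint64_t needs <= 10 bytes */
    size_t n_operand = write_uleb128(operand, discriminator);
    size_t n = 0;
    buf[n++] = 0;
    n += write_uleb128(buf + n, 1 + n_operand);  /* sub-opcode + operand */
    buf[n++] = DW_LNE_GNU_set_discriminator;
    for (size_t i = 0; i < n_operand; i++)
        buf[n++] = operand[i];
    return n;
}
```

For example, `emit_set_discriminator(buf, 1)` writes `00 02 80 01`, which is 4 bytes.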
Incorrect information in the discriminator column may take the form of either a missing discriminator or an unnecessary discriminator. A missed discriminator will result in missed optimization opportunities, but should never be worse than the status quo. An extra discriminator should not actually be a problem (other than increased size of the line number table), since the compiler will be using the same algorithm to assign discriminators when it generates the code both before and after profiling.
dwarfstd.org is supported by Sourceware. Contributions are welcome.
All logos and trademarks in this site are property of their respective owner. The comments are property of their posters, all the rest © 2007-2022 by DWARF Standards Committee.