DWARF Debugging Standard Wiki

Extending the DWARF Line Number Table to Support Profiling - Design Document

Cary Coutant Modified: January 28, 2009

Objective

PC sampling is often used to profile the execution paths of an application, and the sampled data needs to be fed back to the compiler for profile-based opimization. In order to map the individual PC samples to the original source code, the optimizer needs to use the DWARF line number table. This approach works in general, but it is currently unable to distinguish among multiple paths of execution on a single source line.

A single line of source code can have multiple paths of execution for several reasons. The simplest examples involve a conditional or short-circuit logical operator, or an if-then-else or loop statement written on a single line. Consider the following example:

       x = (i == 1 ? one() : two());

An x86 compiler might generate the following code:

               .loc 1 10 0
               cmpl    $1, 8(%ebp)
               je      .L7
               call    two
       .L4:
               movl    %eax, -8(%ebp)
               ...

       .L7:
               .loc 1 10 0
               call    one
               .p2align 4,,5
               jmp     .L4`

Here, both paths of execution are marked with the same location (file 1, line 10).

Multiple paths of execution can also result from control flow generated by the compiler to implement a high-level source construct like a switch statement. Consider a switch statement with a dense set of case labels:

       switch (i)
         {
           case 1: s = "1"; break;
           case 2: s = "2"; break;
           case 3: s = "3"; break;
           case 4: s = "4"; break;
           ...
           case 198: s = "198"; break;
           case 199: s = "199"; break;
           case 200: s = "200"; break;
           default:  s = "?"; break;
         }

The compiler will typically use a jump table for this switch, and must generate bounds checking code prior to indexing the table:

       .loc 1 4 0
               cmpl    $200, %eax
               jbe     .L207
       .L2:
               movl    $.LC1, %eax     // "2"
               .loc 1 209 0
               ...

       .L207:
               .loc 1 4 0
               jmp     *.L203(,%eax,4)
               ...

Here, the fall-through path (representing the default case) and the path that accesses the jump table are both marked with the same location (file 1, line 4). (The default case really ought to be marked with a different source line number, but for some reason isn’t in the version of gcc I used for this sample.)

Other code generation strategies for switch statements may generate combinations of tests and loops that will have multiple execution paths that will need to be distinguished.

Additional cases to consider are structure copies that may cause the compiler to generate an inline loop, or explicit calls to memcpy that get inlined. These cases are a problem only where there is additional code outside the loop on the same line of source code; for example:

       memcpy(&a[i], &b[j], n);

Here, if the call to memcpy is inlined, the code to compute the first two parameters is on a separate execution path from the copy loop, and needs to be distinguished in order for the optimizer to make the best use of the profiling information.

In DWARF, the line number table is a compressed representation of a sparse table, with one row for each machine instruction, and several columns. The first column contains the address of the instruction. Three more columns identify the source file, line number, and column number for the source code responsible for that instruction. A fifth column contains a flag, is_stmt, that is true for recommended breakpoint locations. In the gcc toolchain, the .loc assembly pseudo-op indicates each position in the assembly code where any of these values change. The remaining columns in the table are irrelevant to this proposal.

In order to distinguish these additional paths of execution, we propose to add an additional column to the DWARF line number table, which will hold a “discriminator”, a simple integer that allows a consumer to map a PC value to one of the several different basic blocks that may be associated with a given line of source code.

The optimizer, when processing the profiling information, will be able to use this extra column of information to determine which basic block a given PC sample corresponds to. All other tools will ignore the extra column; because of the extensible design of the DWARF format, no changes are necessary to any other tool.

Design Highlights

Changes to the Assembler and binutils

The Gnu assembler needs to be enhanced to accept an additional parameter to the .loc pseudo-op, which will contain the discriminator. The additional parameter needs to be specified only when the discriminator is non-zero.

The assembler will encode the discriminator column into the DWARF line number table using the new DWARF line number program opcodes described below. For the rows in the table with a non-zero discriminator, the is_stmt flag should be false, since these rows do not describe a breakpoint location.

The readelf utility and the bfd library (for objdump) will need minor enhancements to dump the additional DWARF line number information.

Changes to the Compiler

The compiler needs to be enhanced to provide the additional discriminator parameter in .loc pseudo-ops at each point where a new basic block begins following a conditional transfer of control for the same source line. The first block should have a discriminator of zero, and each conditional transfer of control within the source line should increment the discriminator by 1.

A new compiler option, -femit_discriminators, should be provided to enable this new functionality. Ideally, this option would eventually be turned on by default.

Changes to the DWARF Line Number Information

A new register, discriminator, will be added to the line number information state machine (described in Section 6.2.2 of the DWARF specification). This register will contain an unsigned integer indicating the discriminator value to be inserted into the line number table when the next row is added. This register will have the value 0 at the beginning of each sequence within a line number program, and will be reset to 0 whenever a new row is added to the line number table.

A new extended opcode, DW_LNE_GNU_set_discriminator, will be added to set the discriminator register to a specific value. This opcode takes a single parameter, an unsigned LEB128 integer, which is the new value of the discriminator register.

The new extended opcode, DW_LNE_GNU_set_discriminator, will initially have the value 0x80, in the vendor-specific range. If the proposal is accepted by the DWARF workgroup, it will be assigned a value in the lower range (probably 4).

Scope

The DWARF change is minimal, and observes the rules for extensions to the DWARF format. No existing consumer should be affected by this change (although some incorrect implementations may nevertheless exist).

The assembler change is also minor, involving only a change to the .loc pseudo-op and the extra column in the line number table.

The compiler change will involve detecting conditional transfers of control and placing a new discriminator on subsequent instructions whenever the source file and line number remain the same. At this point, I’m unsure of the actual scope of this change.

Risk

As long as the new behavior is controlled by a command-line option, compilation without that option should remain completely unaffected, providing a trivial fallback position.

With the option turned on, existing DWARF consumers ought to completely ignore the extra line number program opcodes and should notice no difference at all (other than a slight increase in the size of the line number tables). There is a risk that some (incorrectly-written) consumers will balk at the new opcode, but the fix in each case should be minor.

The size of the DWARF line number tables will increase slightly, due to the new opcodes in the line number program. An extended opcode takes up two bytes for the opcode (0 followed by 0x80), plus an LEB128 number identifying the length of what follows, plus another LEB128 number providing the discriminator. Each of these LEB128 numbers will be only a single byte under all but extreme cases, so the typical new opcode will take 4 bytes. The percentage of source lines needing discriminators, however, is expected to be very small, so the impact of these extra opcodes ought to be well under 1% in terms of space.

Incorrect information in the discriminator column may take the form of either a missing discriminator or an unnecessary discriminator. A missed discriminator will result in missed optimization opportunities, but should never be worse than the status quo. An extra discriminator should not actually be a problem (other than increased size of the line number table), since the compiler will be using the same algorithm to assign discriminators when it generates the code both before and after profiling.

dwarfstd.org is supported by Sourceware. Contributions are welcome.

All logos and trademarks in this site are property of their respective owner.
The comments are property of their posters, all the rest © 2007-2022 by DWARF Standards Committee.