Two-Level Line Number Tables


<h1 id="Two-2d-Level-Line-Number-Tables">Two-Level Line Number Tables</h1>

<p><em>Cary Coutant</em></p>

<p><em>Last updated December 12, 2014</em></p>

<p>Inlined call information is currently represented in DWARF-4 in the DIE
structure, as a DW_TAG_inlined_subroutine DIE representing the inlined
function under another DIE representing the calling function. The
DW_TAG_inlined_subroutine DIE contains a DW_AT_abstract_origin attribute
that refers to a top-level DW_TAG_subprogram DIE that provides the name
and declaration coordinates for the inlined function. For each instance
of an inlined call, there is one DW_TAG_inlined_subroutine DIE, which
contains DW_AT_ranges (or DW_AT_low_pc/high_pc) attributes that identify
the instructions that correspond to the inlined function.</p>

<p>To generate a symbolic backtrace across inline calls using DWARF-4, it
is currently necessary to consult the line number table to map an
instruction to the file and line number of the inlined subprogram, then
consult the DIE tree to identify the particular instruction as part of
an inlined subroutine. Once the DW_TAG_inlined_subroutine DIE is located,
the name of the inlined function and the source coordinates of the point
of call can be determined.</p>

<p>Because inlining typically happens during optimization, and optimization
tends to schedule code so that instructions for different source
statements are heavily interleaved, the range tables for inlined calls
can be quite inefficient (in space and lookup time). Requiring a
symbolizer to search the DIE tree for inlined function instances also
introduces a significant inefficiency, since there is no hint in the
line number table that any given instruction does or does not correspond
with an inlined function.</p>

<p>On HP-UX, inlined call information is represented directly in the line
number tables, using a two-level scheme. The top level, called the
logicals table, is a line table with one row for each instance of a
logical statement in the program. In this table, each row provides a
filename, line number, and recommended breakpoint location. For
statements that are part of an inlined function, a &ldquo;context&rdquo; column
provides a reference to the row representing the point of call, and a
&ldquo;function name&rdquo; column provides a pointer to a string table entry giving
the name of the inlined function. The second level, called the actuals
table, is a line table with one row for each machine instruction in the
program. In this table, each row provides the instruction address and a
reference to the row in the logicals table that describes the statement
associated with that instruction. Both tables are encoded by the
standard DWARF line number table scheme, with the exception that some of
the standard opcodes are defined slightly differently for the two
tables.</p>

<h2 id="The-Basic-Proposal">The Basic Proposal</h2>

<p>The basic idea is to split the line number table into two parts: a
&ldquo;logicals&rdquo; table, and an &ldquo;actuals&rdquo; table. The logicals table would
contain a row for each logical statement in the program, mapping each
statement to a recommended breakpoint location. The actuals table would
contain a row for each machine instruction, mapping each instruction to
a row in the logicals table. The two tables would reside in the same
.debug_line section, share the same header, and use the same encoding.
The line number program header would be extended with a pointer to the
actuals table, which would follow the logicals table. The actuals table
would be optional, and if absent, the logicals table would degrade
gracefully to the single-level line number table from DWARF-4.</p>
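<p>As a rough illustration of the split (not part of the proposal text), the two tables and a lookup through them might be modeled as follows. The class names, the reduced set of fields, and the <code>lookup</code> helper are assumptions made for this sketch; row numbers are 1-based, and end-of-sequence handling is left out.</p>

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass
class LogicalRow:
    address: int   # recommended breakpoint location for the statement
    file: str
    line: int


@dataclass
class ActualRow:
    address: int       # address of a machine instruction
    logical_row: int   # 1-based row number in the logicals table


def lookup(logicals, actuals, pc):
    """Return the LogicalRow for pc, or None if pc precedes the table."""
    starts = [row.address for row in actuals]
    index = bisect_right(starts, pc) - 1   # last actual row at or before pc
    if index == -1:
        return None
    return logicals[actuals[index].logical_row - 1]


logicals = [LogicalRow(0x1000, "a.c", 10), LogicalRow(0x1004, "a.c", 11)]
# The instruction at 0x1008 was scheduled late but still belongs to line 10.
actuals = [ActualRow(0x1000, 1), ActualRow(0x1004, 2), ActualRow(0x1008, 1)]
row = lookup(logicals, actuals, 0x1009)
print(row.file, row.line)   # a.c 10
```

<p>The logicals list alone still answers the question &ldquo;where do I set a breakpoint for line 11?&rdquo;, which is the sense in which it can stand in for a single-level table when no actuals table is present.</p>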

<p>We also add a subprograms list to the line number program header,
following the lists of directories and file names. Like those lists, the
subprograms list has a customizable format that allows the producer to
provide additional information about each subprogram (e.g., declaration
coordinates). Each row in the logicals table may reference an entry in
the subprograms list.</p>

<p>The DW_TAG_inlined_subroutine DIEs are still necessary in order for the
debugger to identify functions that have been inlined into others, and
the range information provided by DW_AT_ranges or DW_AT_low_pc/high_pc
may still be useful to debuggers. The DW_AT_call_file, DW_AT_call_line,
and DW_AT_call_column attributes may be omitted.</p>

<h3 id="The-Logicals-Table">The Logicals Table</h3>

<p>The logicals table corresponds most closely to the DWARF-4 line number
table, and the DW_AT_stmt_list attribute in the compilation unit DIE
points to it. This table is used directly for mapping source locations
(&ldquo;logical statements&rdquo;) onto recommended breakpoint locations. It
contains the following columns:</p>

<ul>
<li>address/op_index</li>
<li>file</li>
<li>line</li>
<li>column</li>
<li>is_stmt</li>
<li>prologue_end</li>
<li>epilogue_begin</li>
<li>discriminator</li>
<li>context</li>
<li>subprogram</li>
</ul>


<p>The &ldquo;basic_block&rdquo; and &ldquo;isa&rdquo; registers are unused in the logicals table.
Because this table represents single locations rather than address
ranges, the &ldquo;end_sequence&rdquo; register is also unused in the logicals
table.</p>

<p>Each row with &ldquo;is_stmt&rdquo; true corresponds to a recommended breakpoint
location for a source statement. Rows with &ldquo;is_stmt&rdquo; false correspond to
prologue_end and epilogue_begin points or to logical positions in the
source where a non-zero discriminator is required. The table is ordered
by address, and rows are implicitly numbered starting from 1.</p>

<p>The &ldquo;context&rdquo; column is new, and is used to represent inlined functions.
When it is non-zero, the row describes a logical statement that is part
of an inlined function, and the value in the &ldquo;context&rdquo; column refers to
another row number in the logicals table. That row represents the
logical statement where the inlined call was made (which may itself be
part of another inlined function).</p>
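<p>A minimal sketch of how a symbolizer might follow the &ldquo;context&rdquo; column to list the frames of an inlined call chain. The dictionary layout and the function name <code>inline_chain</code> are hypothetical, and the cycle guard is an extra precaution that the proposal does not mention.</p>

```python
def inline_chain(logicals, row_number):
    """Yield (file, line) pairs from the innermost inlined frame outward.

    logicals is a list of dicts; list index 0 holds row number 1.
    """
    seen = set()
    while row_number != 0 and row_number not in seen:
        seen.add(row_number)           # guard against malformed cycles
        row = logicals[row_number - 1]
        yield row["file"], row["line"]
        row_number = row["context"]    # 0 means "not inlined": stop


logicals = [
    {"file": "main.c", "line": 20, "context": 0},  # row 1: call site in main
    {"file": "util.h", "line": 5, "context": 1},   # row 2: inlined into row 1
    {"file": "leaf.h", "line": 9, "context": 2},   # row 3: inlined into row 2
]
for frame in inline_chain(logicals, 3):
    print(frame)
```

<p>Starting from row 3, the walk visits rows 3, 2, and 1 in that order, without consulting the DIE tree.</p>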

<p>The &ldquo;subprogram&rdquo; column is new, and is used to provide information about
the subprogram corresponding to the logical statement. When it is
non-zero, it refers to an entry in a subprograms list in the line number
program header. Entries in the subprograms list provide the name and
optional related information, as described by the header. Typically,
each entry in the list would provide the file and line number where the
subprogram was defined (i.e., the DW_AT_decl_file and DW_AT_decl_line
attributes from the subprogram DIE).</p>

<p>When a statement in the source program is replicated in the generated
code (e.g., via loop unrolling), each replication is represented with a
separate row in the logicals table, so that the debugger can easily find
all breakpoint locations for a given statement.</p>

<p>One new standard opcode is needed to set the &ldquo;context&rdquo; and &ldquo;subprogram&rdquo;
registers: DW_LNS_inlined_call takes two unsigned LEB128 numbers as
operands. It sets the &ldquo;context&rdquo; register to the value of the first
operand, and the &ldquo;subprogram&rdquo; register to the value of the second
operand.</p>

<p>A second new standard opcode, DW_LNS_pop_context, can be used to restore
the state machine registers to the values from the logical row referred
to by the current value of the &ldquo;context&rdquo; register. (This models a return
from an inlined call.) This opcode takes no operands.</p>

<p>At the beginning of each sequence, the &ldquo;context&rdquo; and &ldquo;subprogram&rdquo;
registers are both set to zero. The special opcodes and DW_LNS_copy do
not alter these two new registers.</p>
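<p>The effect of the two new opcodes on the state machine can be sketched with a toy interpreter. Operations are written as symbolic tuples rather than encoded bytes, because the text above does not assign opcode numbers. Only a few registers are modeled, and leaving &ldquo;address&rdquo; untouched on DW_LNS_pop_context is an assumption of this sketch, not something the proposal states.</p>

```python
def run_logicals(program):
    """Interpret symbolic ops and return the emitted logicals rows (dicts)."""
    regs = {"address": 0, "line": 1, "context": 0, "subprogram": 0}
    rows = []
    for op, *args in program:
        if op == "advance_pc":
            regs["address"] += args[0]
        elif op == "advance_line":
            regs["line"] += args[0]
        elif op == "inlined_call":          # models DW_LNS_inlined_call
            regs["context"], regs["subprogram"] = args
        elif op == "pop_context":           # models DW_LNS_pop_context
            if regs["context"] == 0:
                raise ValueError("pop_context outside an inlined call")
            caller = rows[regs["context"] - 1]
            for name in ("line", "context", "subprogram"):
                regs[name] = caller[name]   # address is kept (assumption)
        elif op == "copy":                  # models DW_LNS_copy: append a row
            rows.append(dict(regs))
        else:
            raise ValueError("unknown op: " + op)
    return rows


program = [
    ("advance_line", 9), ("copy",),                       # row 1: line 10, call site
    ("inlined_call", 1, 2),                               # called from row 1, subprogram 2
    ("advance_pc", 4), ("advance_line", -7), ("copy",),   # row 2: line 3, inlined body
    ("pop_context",),                                     # back to the caller's registers
    ("advance_pc", 4), ("advance_line", 1), ("copy",),    # row 3: line 11 in the caller
]
rows = run_logicals(program)
for number, row in enumerate(rows, start=1):
    print(number, row)
```

<p>In this run, row 2 carries context 1 and subprogram 2, while row 3 returns to context 0 with its line counted from the caller's line 10.</p>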

<h3 id="The-Actuals-Table">The Actuals Table</h3>

<p>The actuals table is used to map individual machine instructions to
logical statements. On HP-UX, this table is placed in a separate section
with its own line number table program header, but we could instead
place it in .debug_line immediately following the logicals table, and
add another item to the line number program header that provides the
offset to the actuals table. The actuals table contains the following
columns:</p>

<ul>
<li>address/op_index</li>
<li>logical_row</li>
<li>basic_block</li>
<li>end_sequence</li>
<li>isa</li>
</ul>


<p>The &ldquo;logical_row&rdquo; column is populated from the &ldquo;line&rdquo; register. The
&ldquo;file&rdquo;, &ldquo;column&rdquo;, &ldquo;is_stmt&rdquo;, &ldquo;prologue_end&rdquo;, &ldquo;epilogue_begin&rdquo;, and
&ldquo;context&rdquo; registers are ignored for this table.</p>

<p>Each row in the actuals table corresponds to a machine instruction, and
the table is ordered by address. Where multiple consecutive rows differ only in
the address column, only the first row is represented in the table. A
row where &ldquo;basic_block&rdquo; is true, however, may not be omitted, but if one
row has &ldquo;basic_block&rdquo; true, and subsequent rows have &ldquo;basic_block&rdquo;
false, the subsequent rows may be omitted from the table.</p>

<p>When a single machine instruction corresponds to more than one source
statement (e.g., due to optimizations such as common subexpression
elimination), a separate row for the same address is added to the
actuals table for each statement. These consecutive rows are then
treated as a single row designating a set of logical statements that are
associated with the instruction at that address. (The &ldquo;logical_row&rdquo;
register and column therefore hold a set of values rather than a single
value.) If the rows for subsequent machine instructions are omitted,
those subsequent instructions are also associated with the same set of
logical statements.</p>
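<p>One possible way for a consumer to handle both rules (rows sharing an address form a set, and omitted rows inherit from the preceding row) is sketched below. The tuple layout and the helper names are assumptions for illustration, and &ldquo;end_sequence&rdquo; and &ldquo;basic_block&rdquo; are ignored for brevity.</p>

```python
from bisect import bisect_right


def build_index(actuals):
    """Collapse rows that share an address into (address, set of logical rows)."""
    index = []
    for address, logical_row in actuals:
        if index and index[-1][0] == address:
            index[-1][1].add(logical_row)       # same address: grow the set
        else:
            index.append((address, {logical_row}))
    return index


def logical_rows_at(index, pc):
    """Return the set of logical rows associated with pc (empty if none)."""
    i = bisect_right([address for address, _ in index], pc) - 1
    return set(index[i][1]) if i != -1 else set()


# Address 0x2004 serves two statements; 0x2008 and 0x200c have no rows of their own.
actuals = [(0x2000, 1), (0x2004, 2), (0x2004, 3), (0x2010, 4)]
index = build_index(actuals)
print(sorted(logical_rows_at(index, 0x2004)))   # [2, 3]
print(sorted(logical_rows_at(index, 0x2008)))   # [2, 3]
```

<p>The second query shows an omitted instruction inheriting the whole set from the last row at or before its address.</p>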

<p>All the existing opcodes that modify the &ldquo;line&rdquo; register can be used to
set &ldquo;logical_row&rdquo;.</p>

<p>One new standard opcode, DW_LNS_set_address_from_logical, can be used to
set the &ldquo;address&rdquo; register to the value of the &ldquo;address&rdquo; column from the
logicals table row referred to by the current value of the &ldquo;line&rdquo;
register. This opcode takes one operand, and works like
DW_LNS_advance_line, with the additional side effect of setting the
&ldquo;address&rdquo; register.</p>
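<p>Read literally, the opcode first advances the &ldquo;line&rdquo; register and then copies an address across from the logicals table. The sketch below assumes the operand is a signed delta, as it is for DW_LNS_advance_line; the register dictionary and the function name are hypothetical.</p>

```python
def set_address_from_logical(regs, logical_addresses, delta):
    """Advance "line" by delta, then load "address" from that logical row.

    logical_addresses[i] is the "address" column of logical row i + 1.
    """
    regs["line"] += delta
    regs["address"] = logical_addresses[regs["line"] - 1]
    return regs


logical_addresses = [0x1000, 0x1010, 0x1024]   # logical rows 1, 2, 3
regs = {"address": 0, "line": 1}
set_address_from_logical(regs, logical_addresses, 2)
print(regs["line"], hex(regs["address"]))   # 3 0x1024
```

<p>A producer could use this to start a run of actuals rows at a statement's breakpoint address without encoding that address a second time.</p>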

<h2 id="Extensions">Extensions</h2>

<p>The two-level line table is also a convenient mechanism for supporting
additional optimizations such as software pipelining, for supporting
accelerated single stepping, and for supporting checkpoint-based
debugging.</p>

<h3 id="Support-for-Software-Pipelining">Support for Software Pipelining</h3>

<p>With software pipelining, instructions in a loop are often scheduled one
or more iterations ahead (or behind). It can be useful to a debugger
(and the user) if we can tag such instructions. To support this, we add
a new &ldquo;iteration&rdquo; register and column to the actuals table.</p>

<p>When loop prologue code contains instructions generated for a statement
within the loop body, the &ldquo;iteration&rdquo; register may be set to 1 to
indicate that the instruction logically belongs with the first iteration
of the loop. Likewise, instructions within the loop body that logically
belong to the next iteration would be tagged with an &ldquo;iteration&rdquo; of 1.
(Higher values can be used when scheduling instructions more than one
iteration ahead.) Instructions within the loop body (or epilogue) that
logically belong to the previous (or last) iteration can be tagged with
an &ldquo;iteration&rdquo; of -1. (Lower values can be used when scheduling
instructions more than one iteration behind.)</p>

<p>One new standard or extended opcode is required:
DW_LNS/DW_LNE_set_iteration, which takes a signed LEB128 number as an
operand (signed, because the &ldquo;iteration&rdquo; value may be negative, as
described above). It sets the &ldquo;iteration&rdquo; register to the value of its operand.
[Because the &ldquo;column&rdquo; register is not used in the actuals table, we
could alias DW_LNS_set_column and DW_LNS_set_iteration, giving an
efficient means of setting the new register without allocating an extra
standard opcode.]</p>

<p>At the beginning of each sequence, and whenever a row is added to the
actuals table (via a special opcode or by DW_LNS_copy), the &ldquo;iteration&rdquo;
register is reset to 0. (These are the same rules as for the
&ldquo;discriminator&rdquo; register.)</p>

<h3 id="Enhanced-Support-for-Single-2d-Stepping">Enhanced Support for Single-Stepping</h3>

<p>In order to single step by source statement, a debugger typically single
steps by machine instructions until the &ldquo;file&rdquo; or &ldquo;line&rdquo; column changes.
In optimized code, this technique can cause the debugger to stop early,
and can often cause the debugger to &ldquo;hop&rdquo; between two source lines
several times. We can improve this by providing enough information for
the debugger to determine the set of possible next breakpoint locations,
so that it can set temporary breakpoints at each location, then free run
until hitting one. To support this, we add a new &ldquo;next&rdquo; register and
column to the logicals table. This register can hold a set of row
numbers (unlike other registers, which only hold a single value). The
row numbers in the set refer to other rows in the logicals table, and
each such row represents one of the possible breakpoint locations that
can be reached after single stepping over the current statement.</p>

<p>Most statements will fall through unconditionally to the next logical
statement (i.e., the next row in the logicals table where &ldquo;is_stmt&rdquo; is
true), and the &ldquo;next&rdquo; register can contain a special value for this
case. At the beginning of each sequence, and whenever a row is added to
the logicals table, the &ldquo;next&rdquo; register is set to this special value.</p>

<p>Some statements may have one or more alternate locations that may be
reached conditionally, in addition to the fall-through case (e.g.,
if-then-else and switch statements). For these cases, we provide a new
standard opcode, DW_LNS_append_next, which takes an unsigned LEB128
number as its operand. It appends the operand&rsquo;s value to the &ldquo;next&rdquo;
register, without removing the previous contents of the set. [The &ldquo;isa&rdquo;
register is not used in the logicals table, so we could alias
DW_LNS_set_isa and DW_LNS_append_next, avoiding the allocation of an
extra standard opcode.]</p>

<p>Some statements may never fall through (e.g., goto statements and ends of
loops), so we also provide a new standard opcode,
DW_LNS_clear_fall_through, which removes the special fall-through value
from the &ldquo;next&rdquo; register. [The &ldquo;basic_block&rdquo; register is not used in
the logicals table, so we could alias DW_LNS_set_basic_block and
DW_LNS_clear_fall_through.]</p>

<p>When a statement contains one or more non-inlined function calls, the
next breakpoint location in the called function(s) is not necessarily in
the same logicals table, and cannot be represented in the &ldquo;next&rdquo;
register. Instead, we add a row to the logicals table for each call
instruction, and add those row numbers to the &ldquo;next&rdquo; register, in
addition to the locations that may be reached by stepping over the call.
These new rows that can be reached via &ldquo;step-into&rdquo; will have &ldquo;is_stmt&rdquo;
false, while rows that can be reached via &ldquo;step-over&rdquo; will have
&ldquo;is_stmt&rdquo; true.</p>

<p>When a statement contains one or more inlined function calls, the
compiler should add all locations that may be reached via &ldquo;step-into&rdquo; or
&ldquo;step-over&rdquo;. To step over a statement that may have inlined calls, the
debugger can simply ignore rows whose &ldquo;context&rdquo; value points back to the
current row.</p>
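<p>Putting the rules of this section together, a debugger might compute its temporary breakpoint locations roughly as follows. This is a sketch under stated assumptions: the special fall-through value is modeled as a separate boolean, each row is a dictionary with hypothetical keys, and the step-over filter combines the two conventions described above (&ldquo;is_stmt&rdquo; false for rows reached by stepping into a non-inlined call, and a &ldquo;context&rdquo; pointing back at the current row for inlined calls).</p>

```python
def make_row(is_stmt=True, context=0, next_rows=(), fall_through=True):
    """Build one logicals row holding only the fields used for stepping."""
    return {"is_stmt": is_stmt, "context": context,
            "next": set(next_rows), "fall_through": fall_through}


def step_targets(logicals, current, step_over):
    """Return sorted row numbers where temporary breakpoints should go."""
    row = logicals[current - 1]
    targets = set(row["next"])
    if row["fall_through"]:
        # Fall through to the next row whose is_stmt is true, if any.
        for number in range(current + 1, len(logicals) + 1):
            if logicals[number - 1]["is_stmt"]:
                targets.add(number)
                break
    if step_over:
        targets = {
            n for n in targets
            if logicals[n - 1]["is_stmt"] and logicals[n - 1]["context"] != current
        }
    return sorted(targets)


logicals = [
    make_row(next_rows={4}),                          # row 1: if (c)
    make_row(next_rows={3, 5}, fall_through=False),   # row 2: y = f(x); then skip the else
    make_row(is_stmt=False),                          # row 3: call instruction for f
    make_row(),                                       # row 4: else branch
    make_row(),                                       # row 5: statement after the if
]
print(step_targets(logicals, 1, step_over=True))    # [2, 4]
print(step_targets(logicals, 2, step_over=False))   # [3, 5]
print(step_targets(logicals, 2, step_over=True))    # [5]
```

<p>With these sets in hand, the debugger can plant breakpoints at the listed rows' addresses and let the program run freely instead of stepping one instruction at a time.</p>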

<h3 id="Support-for-Checkpoint-2d-Based-Debugging">Support for Checkpoint-Based Debugging</h3>

<p>With the current line number tables, single-stepping or moving from one
breakpoint to the next requires that each logical statement have its own
address, so that the debugger can execute at least one machine
instruction per statement. An alternative approach is simulated
execution, where a sequence of statements may all share a common
recommended breakpoint location (a &ldquo;checkpoint&rdquo;), and single-stepping
through the sequence results in no activity in the inferior process. In
order to support this kind of debugging scenario, the location lists for
each variable modified by the statements in the sequence must be able to
select a different DWARF expression based on the logical row number
rather than the current PC address. In addition, each logical row must
indicate whether a single step from that row can be performed via
simulation (i.e., by simply changing the current logical row) or requires
actually executing instructions in the inferior process.</p>


<!-- Start Of Footer -->

<p>
<table style="text-align: left; width: 800px;" cellspacing="0" cellpadding="0" border="0">
  <tbody><tr><td style="width: 800px; text-align: center;"><small>
<em><strong>dwarfstd.org</strong></em> is supported by <a href="https://sourceware.org/">Sourceware</a>. Contributions are welcome.
<br><br>
All logos and trademarks on this site are property of their respective
owners. <br>
The comments are property of their posters, all the rest © 2007-2022
by DWARF Standards Committee.</small></td></tr></tbody>
</table>