CFI with Abbrevs

From Dwarf Wiki

SKETCH: CFI with abbrevs

The idea here is to provide a more general description for CFI information based on DIEs and attributes. Any future need for more information in the header for a CFI entry can be encoded as a new attribute without breaking older consumers. Similarly, a producer can extend the information in the CFI without fear of breaking consumers that may not understand the extension. Of course, this benefit only starts after the release of a standard using this change, as it is totally incompatible with the DWARF 3 standard.

6.4.1: Structure of Call Frame Information

...This table would be extremely large... [all preceding and paragraph as before]

The virtual unwind information is encoded in two self-contained sections called .debug_frame_info and .debug_frame_abbrev. Entries in the .debug_frame_info section are Frame Description Entries (FDEs), which are specialized Debugging Information Entries with tag DW_TAG_frame_info.

If the range of code addresses for a subprogram is not contiguous, there may be multiple FDEs corresponding to the parts of that subprogram.

An FDE may contain any of the following attributes:

* DW_AT_frame_version whose value is a constant version number specific to the
  call frame information and independent of the DWARF version number (see
  Appendix F).

* DW_AT_code_alignment_factor whose value is a constant that is factored out of
  all advance location instructions (see below).  If the attribute is not
  present, the code alignment factor is 1.

* DW_AT_data_alignment_factor whose value is a constant that is factored out of
  all offset instructions (see below).  If the attribute is not present, the
  data alignment factor is 1.

* DW_AT_return_address_register whose value is a constant that indicates which
  column in the rule table represents the return address of the function.  Note
  that this column might not correspond to an actual machine register.  If this
  attribute is not present, the return address column is 0.

* DW_AT_low_pc and DW_AT_high_pc whose values encode the contiguous address
  range described by the FDE (see Section 2.17).

* DW_AT_initial_instructions whose value is a block containing a sequence of
  rules that are interpreted to create the initial setting of each column in
  the table.

  The default rule for all columns before interpretation of the initial
  instructions is the undefined rule.  However, an ABI authoring body or a
  compilation system authoring body may specify an alternate default value for
  any or all columns.

  This attribute is distinct from the DW_AT_instructions attribute so that
  it can appear in a separate DW_TAG_frame_info entry referenced by a
  DW_AT_frame_info attribute, and so that the initial state that it defines can
  be used by the DW_CFA_restore instruction.

* DW_AT_instructions whose value is a block containing a sequence of table
  defining instructions that are described below.

* DW_AT_frame_info whose value is a reference to another FDE which contains
  additional frame information not present in this FDE.  A common use for this
  attribute is to reference an FDE which contains attributes that apply to a
  number of FDEs such as DW_AT_frame_version, DW_AT_code_alignment_factor,
  DW_AT_data_alignment_factor, and DW_AT_return_address_register.  Such an
  FDE would have been a CIE in DWARF 3.

Additional FDE attributes may be defined by an ABI authoring body or a compilation system authoring body.
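
As a rough illustration of the attribute list above, the following Python sketch models one DW_TAG_frame_info entry and applies the defaults the text gives for absent attributes (code and data alignment factors of 1, return address column 0). The class name FrameInfo, the field names, and the helper effective() are hypothetical and are not part of the proposal. The sketch looks only at a single entry and does not follow DW_AT_frame_info references.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FrameInfo:
    """Hypothetical in-memory view of one DW_TAG_frame_info entry.

    None means "attribute not present in this entry".
    """
    frame_version: Optional[int] = None
    code_alignment_factor: Optional[int] = None
    data_alignment_factor: Optional[int] = None
    return_address_register: Optional[int] = None
    low_pc: Optional[int] = None
    high_pc: Optional[int] = None
    initial_instructions: bytes = b""
    instructions: bytes = b""
    frame_info: Optional["FrameInfo"] = None  # reference to another FDE


# Defaults the text specifies for attributes that are not present.
DEFAULTS = {
    "code_alignment_factor": 1,
    "data_alignment_factor": 1,
    "return_address_register": 0,
}


def effective(fde: FrameInfo, name: str):
    """Return the attribute value, or the documented default if absent."""
    value = getattr(fde, name)
    return DEFAULTS.get(name) if value is None else value


fde = FrameInfo(low_pc=0x1000, high_pc=0x1040, data_alignment_factor=-8)
print(effective(fde, "code_alignment_factor"))    # absent, so the default applies
print(effective(fde, "data_alignment_factor"))    # present, so the stored value wins
```

Attributes with no stated default, such as DW_AT_frame_version, simply come back as None in this sketch.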

7.23: Call Frame Information

The Call Frame Information is encoded in the same fashion as Debugging Information, as described in section 7.5. However, the Call Frame Information is located in the .debug_frame_info and .debug_frame_abbrev sections, which are analogous to the .debug_info and .debug_abbrev sections, respectively. Normally, they will contain only DW_TAG_frame_info and null entries. (Also, normally, the .debug_info and .debug_abbrev sections will not contain DW_TAG_frame_info entries.)

The value of the DW_AT_frame_version version number is 4 (see Appendix F).

Call frame instructions are encoded... [as before]

Figure 18: Tag Encodings:

[Add:] DW_TAG_frame_info [The value thereof should be the current last tag + 1]

Figure 20: Attribute Encodings:

[Add:]

  DW_AT_frame_version [The value thereof should be the current last attr + 1]
  DW_AT_code_alignment_factor [etc.]
  DW_AT_data_alignment_factor
  DW_AT_return_address_register
  DW_AT_initial_instructions
  DW_AT_instructions
  DW_AT_frame_info

Figure 42: Attributes by TAG value

[Append:] DW_TAG_frame_info

  DW_AT_frame_version
  DW_AT_code_alignment_factor
  DW_AT_data_alignment_factor
  DW_AT_return_address_register
  DW_AT_low_pc
  DW_AT_high_pc
  DW_AT_initial_instructions
  DW_AT_instructions
  DW_AT_frame_info

INTERESTING POINTS:

The .debug_frame section is gone, replaced by .debug_frame_info and .debug_frame_abbrev. I considered calling .debug_frame_info .debug_frame, but I was concerned that the totally different format might confuse DWARF 2/3 consumers.

The length field was omitted because the size of an entry can be determined from the forms in the abbreviation associated with the FDE and from the block sizes of its instruction attributes. This does require a bit more reading to skip a CIE or FDE. Alternatively, a producer could use a DW_AT_sibling attribute to allow a consumer to walk the FDEs faster. This issue could be revisited if it is deemed a problem.

The cie_id is not necessary because there is no distinction between a CIE and an FDE any more.

The DW_AT_frame_info attribute (the counterpart of the old CIE pointer) can use the DW_FORM_refn forms, which are offsets relative to the containing unit, to eliminate relocations.

The FDE uses the DW_AT_low_pc and DW_AT_high_pc attributes. This is predicated on the acceptance of the "DW_AT_high_pc encoded as a constant offset from the DW_AT_low_pc" proposal which allows the elimination of the relocation entry for the DW_AT_high_pc. This is important because the old FDE encoding required a relocation entry for only the "initial location" (DW_AT_low_pc counterpart) and not for the "address range" (DW_AT_high_pc counterpart).
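
A minimal sketch of how a consumer might compute the covered range under that proposal, assuming the form of DW_AT_high_pc tells the consumer whether the value is an address or an offset. The function names and the tuple layout (low_pc, high_pc, high_pc_is_offset) are hypothetical. The range is treated as half-open, since DW_AT_high_pc denotes the first location past the last instruction.

```python
def fde_range(low_pc, high_pc, high_pc_is_offset):
    """Return the half-open address range [start, end) described by an FDE.

    When high_pc is a constant offset, only low_pc is an address that
    needs a relocation; the end is derived from it.
    """
    end = low_pc + high_pc if high_pc_is_offset else high_pc
    return low_pc, end


def find_fde(fdes, pc):
    """Return the first (low_pc, high_pc, is_offset) tuple covering pc, or None."""
    for fde in fdes:
        start, end = fde_range(*fde)
        if start <= pc < end:
            return fde
    return None


fdes = [
    (0x1000, 0x40, True),      # high_pc given as an offset from low_pc
    (0x2000, 0x2010, False),   # high_pc given as an address
]
print(find_fde(fdes, 0x103F))
print(find_fde(fdes, 0x1040))
```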

The augmentation string is gone. Producers can now define their own attributes instead of relying on odd augmentation strings. This provides greater compatibility. Currently, if a producer uses an augmentation string, it may imply that a CIE that contains the string, or an FDE that references such a CIE, has additional header fields. As a result, consumers cannot interpret those CIEs or FDEs at all. With an attribute-based approach, a consumer can ignore only those attributes that it doesn't understand and need not worry about becoming desynchronized with the stream of bytes, because every attribute has a form which implies its size. (This is just like ordinary DIEs.)
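
The skipping behavior described above might look roughly like the following sketch. The form codes are the standard DWARF values for the handful of forms shown; the attribute codes (0x3001 and so on) are made up for illustration, since the proposal does not assign values. The sketch assumes little-endian data and that the abbreviation for the entry has already been looked up as a list of (attribute, form) pairs. Note that an unknown form, unlike an unknown attribute, still stops the consumer, because the size can no longer be inferred.

```python
# Standard DWARF form codes for the few forms handled here.
DW_FORM_data2, DW_FORM_data4 = 0x05, 0x06
DW_FORM_block1, DW_FORM_data1, DW_FORM_udata = 0x0A, 0x0B, 0x0F


def read_uleb128(buf, pos):
    """Decode an unsigned LEB128 value; return (value, new_pos)."""
    result = shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def read_form(buf, pos, form):
    """Decode one attribute value of the given form; return (value, new_pos)."""
    if form == DW_FORM_data1:
        return buf[pos], pos + 1
    if form == DW_FORM_data2:
        return int.from_bytes(buf[pos:pos + 2], "little"), pos + 2
    if form == DW_FORM_data4:
        return int.from_bytes(buf[pos:pos + 4], "little"), pos + 4
    if form == DW_FORM_udata:
        return read_uleb128(buf, pos)
    if form == DW_FORM_block1:
        length = buf[pos]
        return bytes(buf[pos + 1:pos + 1 + length]), pos + 1 + length
    raise ValueError(f"unknown form {form:#x}: cannot determine size")


def read_entry(buf, pos, abbrev, known):
    """Read one entry's attributes, keeping only those this consumer knows.

    abbrev: list of (attribute_code, form) pairs for this entry.
    known:  dict mapping understood attribute codes to names.
    """
    attrs = {}
    for attr, form in abbrev:
        value, pos = read_form(buf, pos, form)  # always advances past the value
        if attr in known:
            attrs[known[attr]] = value
        # Unknown attribute: the value is dropped, but pos stays in sync.
    return attrs, pos


# Hypothetical attribute codes; 0x3FFF stands in for a vendor extension.
KNOWN = {0x3001: "code_alignment_factor", 0x3002: "instructions"}
abbrev = [(0x3001, DW_FORM_udata), (0x3FFF, DW_FORM_data4), (0x3002, DW_FORM_block1)]
data = bytes([0x04, 0xAA, 0xBB, 0xCC, 0xDD, 0x02, 0x00, 0x00, 0x99])
print(read_entry(data, 0, abbrev, KNOWN))
```

The consumer here never learns what attribute 0x3FFF means, yet it still lands on the correct byte for the next attribute and for the next entry.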

The .debug_frame_info section would have compilation unit headers, just like .debug_info.

The DW_AT_initial_instructions exists primarily so that DW_CFA_restore remains meaningful. Were it not for that, DW_AT_instructions could have been used for both with some rules about interpreting instructions from a DW_AT_frame_info-referenced FDE before interpreting instructions from the current FDE.
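
To make the DW_CFA_restore point concrete, here is a small sketch in which instructions are represented symbolically as tuples instead of encoded bytes. The operation names "offset", "same_value" and "restore" loosely mirror DW_CFA_offset, DW_CFA_same_value and DW_CFA_restore; the tuple representation and the function names are assumptions made for illustration. The rule table produced by the initial instructions is saved, and "restore" copies a register's rule back from that saved table, falling back to the undefined rule that the text names as the default.

```python
UNDEFINED = ("undefined",)


def run(instructions, table, initial=None):
    """Apply symbolic CFA instructions to a rule table (a dict: register -> rule)."""
    for op, reg, *args in instructions:
        if op == "offset":
            table[reg] = ("offset", args[0])
        elif op == "same_value":
            table[reg] = ("same_value",)
        elif op == "restore":
            if initial is None:
                raise ValueError("restore has no meaning in initial instructions")
            # Reset to the rule set up by the initial instructions.
            table[reg] = initial.get(reg, UNDEFINED)
        else:
            raise ValueError(f"unsupported operation {op!r}")
    return table


def unwind_rules(initial_instructions, instructions):
    """Interpret the initial instructions, then the FDE's own instructions."""
    initial = run(initial_instructions, {})
    return run(instructions, dict(initial), initial)


init = [("offset", 30, -8), ("offset", 29, -16)]
body = [("offset", 30, -24), ("same_value", 29), ("restore", 30), ("restore", 5)]
print(unwind_rules(init, body))
```

If both instruction sequences lived in a single DW_AT_instructions block, there would be no saved table for "restore" to refer to, which is the reason the text gives for keeping the two attributes distinct.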

Any padding required can be done with DW_CFA_nop instructions in the DW_AT_initial_instructions and DW_AT_instructions blocks.

====================================================================

Areas not covered by this proposal sketch yet:

  1.5.1 mentions Common Information Entry
  DW_CFA_restore mentions CIE
  DW_CFA_nop mentions CIE
  6.4.3 needs to be reworked
  7.2.2 mentions CIE
  7.4 mentions CIE in numerous places
  7.23 mentions CIE
  Figures 63 & 64 need to be reworked
  Index mentions CIE & common information entry

I'll take care of these after the plan starts to settle down and it's time to produce a final proposal.

====================================================================

Further discussion:

The issue of nesting for DIEs in these sections was brought up. It was suggested that nesting could be used for "fallback" information instead of following an attribute. With either approach, there is also the issue of how contradictions between a DIE and a referenced (or parent) DIE are resolved. This is particularly an issue for DW_AT_instructions. My comments on this were:

I envisioned no nesting in the .debug_frame_info section. The proposal defines no meaning for nesting, and so I figure any nesting would be semantically neutral.

The general problem you bring up about attributes that exist in the "leaf" DW_TAG_frame_info and any referenced DW_TAG_frame_info seems analogous to our handling of abstract & concrete DIEs. In that case, there is no direct statement about the meaning of the DWARF if a concrete DIE has an attribute that contradicts one in its abstract counterpart. I would imagine that the one from the concrete DIE would win, but I can find no statement to that effect. Rather, it is stated that the concrete DIE merely omits attributes because they are present in the abstract DIE. I figure the implication is that a contradiction between the abstract & concrete DIEs would be considered bad Dwarf.

I can see a number of solutions:

  1) Reformulate this in terms similar to that for abstract/concrete DIEs, so
     that attributes are omitted from frame_info's if they are specified in a
     frame_info referenced by a DW_AT_frame_info (transitively).  This leaves a
     contradiction undefined, but with the implication of bad Dwarf again.

  2) Indicate that any attributes in a frame_info override any same-named
     attributes in another frame_info referenced by DW_AT_frame_info.

  3) Indicate that any attributes in a frame_info override any same-named
     attributes in another frame_info referenced by DW_AT_frame_info, except
     for DW_AT_initial_instructions and/or DW_AT_instructions, in which case
     the instructions are appended to those from the referenced frame_info
     (transitively).

Personally, I favor approach #2. I don't see contradictions happening often, and this description is simple and flexible.
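
A minimal sketch of approach #2, with the instruction-appending variant from approach #3 shown alongside for contrast. Entries are modeled as plain dicts keyed by attribute name, with the DW_AT_frame_info value holding the referenced dict directly; that representation and the function names are assumptions for illustration only.

```python
def chain(entry):
    """Return the entry followed by every entry reachable via DW_AT_frame_info."""
    seen, result = set(), []
    while entry is not None and id(entry) not in seen:  # guard against cycles
        seen.add(id(entry))
        result.append(entry)
        entry = entry.get("DW_AT_frame_info")
    return result


def lookup(entry, name):
    """Approach #2: the nearest entry that has the attribute wins."""
    for e in chain(entry):
        if name in e:
            return e[name]
    return None


def appended_instructions(entry, name="DW_AT_instructions"):
    """Approach #3 variant: referenced instructions first, then the referrer's."""
    return b"".join(e.get(name, b"") for e in reversed(chain(entry)))


shared = {
    "DW_AT_frame_version": 4,
    "DW_AT_data_alignment_factor": -8,
    "DW_AT_instructions": b"\x01",
}
fde = {
    "DW_AT_low_pc": 0x1000,
    "DW_AT_data_alignment_factor": -4,
    "DW_AT_instructions": b"\x02",
    "DW_AT_frame_info": shared,
}
print(lookup(fde, "DW_AT_data_alignment_factor"))  # the FDE's own value overrides
print(lookup(fde, "DW_AT_frame_version"))          # inherited from the shared entry
print(appended_instructions(fde))
```

Under approach #2 the FDE's DW_AT_instructions simply replaces the shared one, which keeps the rule uniform across all attributes.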

====================================================================

These are comments from the meeting, transcribed directly from the minutes:

Should Todd proceed with this, turning it into a full-fledged proposal, working out all the parts of the document which need to be changed to accommodate?

Someone would like to see some discussion of what we'll do for old consumers who aren't prepared to read this new format. He doesn't think we need a comprehensive list of all the little places that change in the std to have a discussion about it.

Does anyone know offhand what the gcc augmentation strings do? Most of the augmentations exist to specify the size of different objects in the frame info (how big a pointer is, whether a given pointer value is relative to the frame or is absolute, etc).

David Anderson observes that this mostly appears in the eh_frame information. Todd agrees that most of what they've added has been added to the eh_frame information.

Should we incorporate exception handling & unwinding in DWARF? Is there any value in trying to incorporate description of the EH section into DWARF? General feeling was that this was outside the scope of DWARF as a debugging format.

Todd would be inclined to see whether the gcc developers would be willing to move towards an abbrev/info format like the one we're looking at.

DWARF has the property that old consumers can skip information produced by new producers - and that's what we want to address in this proposed change.

Michael asks if a new consumer would really be able to understand frame info if it doesn't understand all the attributes? Jim argues yes - that it will behave like the debug_info section does today.

John DelSignore points out that consumers will have to consume both old debug_frame and new frame+abbrev information (programs will undoubtedly have a mixture of frame infos within the program).

Michael asks what the impact of doing something like this will be. Is it big work? Little work? Small work for big benefit? Jim Blandy says that it's a question of how much value you place on making the format flexible.

Andrew says that the alternative is to add an augmentation which specifies the address size. But consumers that don't know that augmentation will ignore the debug_frame info altogether.

Would it be less incompatible if we defined the augmentation strings so they were somehow self describing, or their size was self describing? Jim says that this is what the abbrev format is.

Andrew suggests that the augmentation is for compression.

In terms of the spec, Jim thinks that making the debug frame information info/abbrev-based will reduce the size of the spec, since the frame info would no longer use a different format of its own.

Michael is less worried about the size of the spec than the ease of understanding the frame info - he thinks the info/abbrev style format would be easier to understand.

Todd notes that a lot of the CIE would remain as it is.

One person suggests that the format change seems like a lot of trouble for an intangible gain.

Jim says that the original impetus for this was the address size issue. Then there's the segmentation discussion. Maybe what we should do is say OK, here's an idea, we know how it would work; we wait until we have something exciting that we want to change in the frame info and when someone says Gee, I wish I could do this with CFI, then we can roll out our info+abbrev style CFI.

Isn't it a possibility here to pick up & document the augmentations used by the eh_frame and allow them to be used in the DWARF frame? Those have the capabilities for describing the different pointer sizes and addressing modes. This might be added to the website or wiki, rather than part of the standard.

Jim has been uninterested in documenting the augmentation strings because in practice the augmentation strings are not so different from version numbers; in practice, when you see a character, either you recognize it and can consume the information or you don't and you have to stop reading. In practice there isn't that much difference between adding a new character in the augmentation and simply bumping the version number.

So in the specific case of describing the target address size, instead of adding it to the augmentations, he'd be just as happy to bump the CFI version number and describe it in a header.

John suggests (Bill White suggested in the past) that we make the CFI format the same but we make the augmentation field extensible - maybe with a DWARF1 style abbrev+info+value. This format could be indicated via a new augmentation character in the existing format. There is a lot of agreement that this could be the best approach impacting consumers the least amount while allowing for future extensibility.


Where do we go: A proposal to add address size; a proposal to move to abbrev+info for the whole section; a proposal to use a structured augmentation format? Or leave it alone altogether?

Jim argues for some way to, at least, specify an address size in the CIE.

Michael thinks that adding address size information would be a good thing.

Given that we're talking about DWARF 4, John Bishop would give a weak Yes vote given that we'll be doing a major revision.

Michael summarizes by saying that adding address information is good - he's not hearing a lot of support for adding a new abbrev+info format for augmentation, or revising the format entirely with an abbrev+info format.

====================================================================

Although many people like this idea and said that, if we were designing this functionality from scratch, this would be a good design, it doesn't seem that there's enough bang for the buck to redesign it in this fashion at this late date.