Discussion of CU blocks with global type info

From Dwarf Wiki

Many C and C++ programs include headers that define a great many types but use only a few of them. This can produce huge .debug_info sections unless you work around the problem by omitting unused types, and that workaround causes problems of its own. I'd prefer to let users turn on all debug type information, but then they need a way to remove the redundant information.

I am getting ready to implement a "dwarf compressor" utility that scans CU blocks in a program, and optimizes the type information so that duplicate information is moved into a shared CU block. I haven't designed a mechanism to identify the global type info, and I'd like to discuss my choices.

In the C language model there are no global types. Types (other than language-defined base types) are defined in each compilation unit. You can have "struct foo" with one definition in one module, and "struct foo" with a different definition in another module.

In most programs, a given type (struct, union, etc.) does not have different definitions within the same program, apart from the convention of using incomplete types, which I don't think is an issue. In C++ it is a violation of the standard (the "one definition rule") to have the same class defined two different ways in different parts of your program.

If there are multiple different definitions, a compressor program can deal with this easily by leaving those types in the CUs where they originated. The types in the "global CU" should only be those which don't have conflicting definitions.
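The rule above can be sketched as a small partitioning step. This is only an illustration under stated assumptions: the function name partition_types and the dictionary model of a CU (type name mapped to a hashable stand-in for the type's DIE subtree) are hypothetical, and a real compressor would have to compare actual DIE trees.

```python
from collections import defaultdict


def partition_types(cus):
    """Split type definitions into a shared set and per-CU leftovers.

    `cus` maps a CU name to {type_name: definition}. A definition is any
    hashable stand-in for the type's DIE subtree (hypothetical model).
    """
    # Collect every distinct definition seen for each type name.
    seen = defaultdict(set)
    for types in cus.values():
        for name, definition in types.items():
            seen[name].add(definition)

    # A name is safe to share only if all CUs agree on its definition.
    shared = {
        name: next(iter(defs)) for name, defs in seen.items() if len(defs) == 1
    }

    # Conflicting definitions stay in the CU where they originated.
    local = {
        cu: {n: d for n, d in types.items() if n not in shared}
        for cu, types in cus.items()
    }
    return shared, local


# Made-up input: "struct foo" agrees across CUs, "struct bar" conflicts.
cus = {
    "a.c": {"struct foo": ("int x",), "struct bar": ("char c",)},
    "b.c": {"struct foo": ("int x",), "struct bar": ("long c",)},
}
shared, local = partition_types(cus)
print(shared)
print(local)
```

In this sketch a type that appears in only one CU also counts as non-conflicting and is moved to the shared set; whether that is desirable is a design choice.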

This may disable the "enforced invisibility" feature of the debugger, for debuggers that implement such a feature. With that feature, if your current context is foo.c and you say "print *(bar*)0x1234", the debugger can say "type bar is unknown in this module". Once bar lives in a global CU, the debugger can no longer tell that bar was not visible in foo.c.

Frankly, I'm not sure anyone will miss that level of paranoid enforcement of the language model, and dbx has a flag to disable it. (And it doesn't work that well, even when you ask dbx to be strict.)

Some possible ways to identify the global type CU:

  • add an attribute to the top-level CU DIE of the new 'global' CU
  • add an entry someplace in one of the index sections to point to the global type CU
  • add a pointer from each other CU to the corresponding global types CU
    • This is similar to DW_TAG_imported_unit

We could use DW_TAG_imported_unit to do this without any extensions. But that doesn't mark the CU as "global" in any way. To know whether it applies to all CUs, you have to verify that every CU has a pointer to this specific DW_TAG_partial_unit.
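To illustrate the cost of that verification, here is a minimal sketch of the check a consumer would need. The model is hypothetical: each CU is represented by the set of units it imports, standing in for the DW_AT_import targets of its DW_TAG_imported_unit DIEs, and the names imported_by_all and "types.pu" are made up for the example.

```python
def imported_by_all(cus, partial_unit):
    """Return True if every CU imports `partial_unit`.

    `cus` maps a CU name to the set of units it imports (hypothetical
    stand-in for the targets of its DW_TAG_imported_unit DIEs).
    """
    # An empty program has no CUs to share with, so treat it as "not global".
    return bool(cus) and all(partial_unit in imports for imports in cus.values())


# Made-up program: c.c does not import the candidate unit.
cus = {
    "a.c": {"types.pu"},
    "b.c": {"types.pu", "extra.pu"},
    "c.c": set(),
}
print(imported_by_all(cus, "types.pu"))
```

The check has to visit every CU before it can answer, which is the overhead an explicit "global" marker would avoid.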

I was thinking more of a "distinguished" CU that would only be present once in an executable or shared library. (That would make it difficult to use with an archive library, so maybe that's an issue.)

This could be used for:

  • global types -- in scope of all CUs
  • ABI info, register names
  • base types

We have an outstanding issue: how to reference types defined in this CU. The normal way is DW_AT_type with a form of DW_FORM_ref_addr, which is always 4 or 8 bytes. Using lots of these references can make the .debug_info section large again, so it's not ideal. Another way to reference them is to define a local instance of the type with only a name and a boolean flag that says "globally defined".
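As a rough illustration of the size concern, the sketch below compares fixed-size DW_FORM_ref_addr references with CU-relative DW_FORM_ref_udata references, which are ULEB128-encoded and so shrink for small offsets but cannot point into another CU. The offsets are made up, and the helper names are hypothetical.

```python
def uleb128_size(value):
    """Number of bytes needed to encode a non-negative integer as ULEB128."""
    size = 1
    while value >= 0x80:  # each byte carries 7 bits of payload
        value >>= 7
        size += 1
    return size


def ref_addr_bytes(num_refs, offset_size=4):
    """Total bytes for DW_FORM_ref_addr references.

    offset_size is 4 in 32-bit DWARF and 8 in 64-bit DWARF
    (DWARF version 3 and later).
    """
    return num_refs * offset_size


def ref_udata_bytes(offsets):
    """Total bytes for CU-relative DW_FORM_ref_udata references."""
    return sum(uleb128_size(o) for o in offsets)


# Made-up CU-relative offsets of four referenced type DIEs.
offsets = [0x40, 0x200, 0x3000, 0x5000]
print(ref_udata_bytes(offsets))            # CU-local, variable size
print(ref_addr_bytes(len(offsets)))        # cross-CU, 32-bit DWARF
print(ref_addr_bytes(len(offsets), 8))     # cross-CU, 64-bit DWARF
```

The numbers are only as meaningful as the made-up offsets, but they show why moving every type reference to a cross-CU form can give back some of the space the shared CU saved.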

Obviously this needs a little more thought and discussion. Any ideas would be appreciated.