Identify Main Function

From Dwarf Wiki

TODO:

Start with the short version of this proposal on the official issue list.

  • change name to "main_subprogram"
  • say it's the "source level subprogram where execution begins"
  • perhaps use the phrase "the compilation's runtime environment calls this entry point"
  • add examples for C, C++, Fortran, Windows (WinMain)
  • another example is a system that uses a preprocessor/wrapper to create lowered source from the user's application
  • state that it should never be mixed with 'artificial'

Most C compilers assume the main function is named "main". Other languages might use different names, or even allow it to have any name. If we have an entry that identifies the main function in a program, the debugger can present it to users in an intelligent way. Another debug-info format, stabs, can express this with the N_MAIN stab type.

proposal text

Proposed text: (more or less)


(( Fortran has a PROGRAM statement which is used to declare a user-supplied name for the main function in a program. C and C++ have no way to rename the main function. ))

If a function has been declared as the main function in a program, it may be marked with a DW_AT_main_function attribute.


Background:


We should not extend DW_AT_entry_point because that applies to Fortran functions with multiple entry points, and that's a different concept altogether.

The most common reason to want the main function's name is to set a breakpoint in it to begin debugging. But it's also very useful to know what name was given to that function by the user, so that the name can be used when printing messages and other output.

There are a variety of implementation-specific, mostly reliable ways to work this out from the ELF symbol table or other information. But it's better to record the information in a portable and reliable way in DWARF.

Sun Implementation of Fortran MAIN functions

In the phone discussion today, I explained the Sun implementation of Fortran "program" statements. But I explained the implementation incorrectly. The situation is more confused than I remembered. Here is an example:

  • The user creates a simple Fortran program:
        program prog
        integer array(1)
        print *, array(1)
        end

If the user compiles this source file into an object file, the following happens:

1) The compiler creates a wrapper function called "main" which has startup / teardown junk (Fortran library calls). This function also has a call to a function called MAIN_.

2) The compiler creates a function named MAIN_ to represent the code in the main function. The compiler records the name "prog" as a sort of "alias" for the function called "MAIN_".

3) Using the stabs format, there was an N_MAIN stab which pointed at the name "MAIN_", not the user-defined name. This was a little better than building the magic name "MAIN_" into the debugger, but it's unrelated to getting the user-assigned name.

This implementation allows any consumer to find the source code for the main program by searching for a function called "MAIN_" anywhere in the program. The ELF symbol table can be used to do this search, and then the debug info for a single object file can be read (details will depend on the implementation). But the consumer will not know the user-defined name of the main function unless some additional debug info is read and understood. Not knowing this name is not a fatal problem for usability. The source can still be shown.

The Sun compilers have a general notion of "user name" versus "linker name" for functions. This distinction is used heavily in C++ because of mangled names, but it also crops up for C and Fortran in some cases. In this case it would make sense for the Sun DWARF information to use "MAIN_" for the linker name of this function, and "prog" for the user name of the function. If we did that, then no specific extension would be needed (other than the existing heuristic of looking for a function with the linker name of MAIN_/MAIN/main_/main). But that's not the way it's currently implemented.

Because of this untidiness in the Sun implementation, I can't really argue that adding DW_AT_main_function will allow Sun to replace our existing extension with a standard mechanism.
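
The lookup described in this section can be sketched roughly as follows. This is only an illustration, not the Sun implementation: the `Function` record, `find_main`, and `display_name` are hypothetical names, and trying the linker names in the order MAIN_, MAIN, main_, main is an assumption (the text lists the names but not a priority).

```python
from dataclasses import dataclass
from typing import Optional

# Linker names the text lists for the existing heuristic.
# Trying them in this order is an assumption of this sketch.
MAIN_LINKER_NAMES = ("MAIN_", "MAIN", "main_", "main")


@dataclass
class Function:
    linker_name: str           # name as seen in the ELF symbol table
    user_name: Optional[str]   # name written in the source, if recorded


def find_main(functions):
    """Return the first function whose linker name matches the heuristic."""
    for candidate in MAIN_LINKER_NAMES:
        for fn in functions:
            if fn.linker_name == candidate:
                return fn
    return None


def display_name(fn):
    """Prefer the user-assigned name; fall back to the linker name."""
    return fn.user_name or fn.linker_name


functions = [
    Function("main", None),       # compiler-generated wrapper
    Function("MAIN_", "prog"),    # body of "program prog"
    Function("helper_", "helper"),
]

main_fn = find_main(functions)
print(display_name(main_fn))
```

With the user name recorded next to the linker name, the consumer can set the breakpoint on MAIN_ and still show "prog" in its messages.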


Intel Implementation of FORTRAN Main Functions

This is John Bishop writing.

Our debugger looks for known names generated by the compiler: in the Linux case it's "MAIN__". We get the address of that symbol and then look for the "closest preceding" routine entry symbol with the same address. That's the name of the FORTRAN main routine.
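
A minimal sketch of that lookup, assuming the symbols are available as an ordered list and that "closest preceding" means the nearest earlier entry in that list. The `Symbol` record and `fortran_main_name` are hypothetical names for illustration, not part of the Intel debugger.

```python
from typing import NamedTuple, Optional, Sequence


class Symbol(NamedTuple):
    name: str
    address: int
    is_routine_entry: bool


def fortran_main_name(symbols: Sequence[Symbol],
                      marker: str = "MAIN__") -> Optional[str]:
    """Find the user-visible name of the routine that the marker aliases."""
    # Locate the compiler-generated marker symbol.
    marker_index = next(
        (i for i, s in enumerate(symbols) if s.name == marker), None)
    if marker_index is None:
        return None
    marker_address = symbols[marker_index].address
    # Walk backwards from the marker to the nearest routine entry
    # symbol that has the same address.
    for sym in reversed(symbols[:marker_index]):
        if sym.is_routine_entry and sym.address == marker_address:
            return sym.name
    return None


symbols = [
    Symbol("helper", 0x1000, True),
    Symbol("prog", 0x2000, True),
    Symbol("MAIN__", 0x2000, False),
]

print(fortran_main_name(symbols))
```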

As I verified at the end of the con-call on April 3, 2007, Intel Fortran won't link a pair of FORTRAN .o files which both have main routines (i.e., a routine specified with "PROGRAM" rather than "SUBROUTINE", etc.)

While this works, it's not beautiful; I therefore re-affirm my (mild) support for a pair of attributes:

  • One for a routine which means "I believe I am a main routine"
  • One for a compilation unit which means "One of my routines believes it is a main routine"

The combination would let our debugger find the main routine on a traverse of the compilation units, dipping only deeply into the one(s) which need to be more closely looked at.
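
The traversal could look something like the sketch below. The keys `contains_main` and `is_main` are hypothetical stand-ins for the two proposed attributes, and the dictionaries stand in for whatever structures a real debugger builds from the debug info.

```python
def find_main_routines(compilation_units):
    """Yield (cu_name, routine_name) for every routine flagged as main."""
    for cu in compilation_units:
        # CU-level flag: skip the unit without reading its routines.
        if not cu.get("contains_main", False):
            continue
        # Only flagged units are examined routine by routine.
        for routine in cu["routines"]:
            if routine.get("is_main", False):
                yield cu["name"], routine["name"]


units = [
    {"name": "util.f", "contains_main": False,
     "routines": [{"name": "helper"}]},
    {"name": "prog.f", "contains_main": True,
     "routines": [{"name": "setup"}, {"name": "prog", "is_main": True}]},
]

print(list(find_main_routines(units)))
```

The CU-level flag is what saves work: units without it are never opened past their top-level entry.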

At the moment, this optimization of the initial scan wouldn't make a difference, as our debugger already scans all the symbols in a cursory fashion so that it can build a symbol table with the "important" symbols in it. I checked, and we don't use the .debug_pubnames section.

Description of OpenVMS tools

This was copied by Chris Quenelle from some email by Jeff Nelson. It came from several distinct emails and was glued together by Chris. There were other questions interspersed, but I tried to capture the details of the OpenVMS implementation.

What do we mean by "main entry point"? On OpenVMS, we sometimes have two:

a) The first executable code address. This is where the operating system transfers control after it has loaded the program into memory.

b) The logical entry. Some languages (C & C++ in particular) have an RTL-supplied routine that initializes the language environment before control is transferred to what the user thinks is "main". Sometimes the initialization code invokes user-written code (e.g., constructors for static objects).

The OpenVMS debugger gives the user the choice to begin debugging at (a) or to skip to (b).

The compiler emits a "magic symbol" named TRANSFER$BREAK$GO. This symbol must appear in the compilation unit of (a). The symbol value is (b), the address of the logical entry point.

At startup, OpenVMS DEBUG determines the actual entry point, locates the compilation unit containing the entry point address and reads the symbols in the CU. If the debugger finds the magic symbol, the debugger sets a breakpoint at (b) and tells the user "type GO to get to main program". The magic symbol is kept hidden from the user. If the magic symbol should appear in some other CU it is ignored (because the special recognition code only appears in the debugger startup path).
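
The startup path just described can be sketched as follows. This is an illustration under stated assumptions, not OpenVMS DEBUG code: the `CompilationUnit` record, its address-range fields, and `logical_entry_breakpoint` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

MAGIC_SYMBOL = "TRANSFER$BREAK$GO"


@dataclass
class CompilationUnit:
    name: str
    low_pc: int    # first address covered by this CU
    high_pc: int   # one past the last address covered
    symbols: Dict[str, int] = field(default_factory=dict)


def logical_entry_breakpoint(units: List[CompilationUnit],
                             entry_address: int) -> Optional[int]:
    """Return address (b) if the CU containing entry (a) has the magic symbol."""
    for cu in units:
        if cu.low_pc <= entry_address < cu.high_pc:
            # Only the CU holding the actual entry point is consulted;
            # the magic symbol in any other CU is never looked at.
            return cu.symbols.get(MAGIC_SYMBOL)
    return None


units = [
    CompilationUnit("rtl_startup", 0x1000, 0x2000, {MAGIC_SYMBOL: 0x5000}),
    CompilationUnit("user_code", 0x5000, 0x6000, {MAGIC_SYMBOL: 0x9999}),
]

breakpoint_address = logical_entry_breakpoint(units, entry_address=0x1000)
print(hex(breakpoint_address))
```

A `None` result corresponds to the case where the compiler did not emit the magic symbol, so there is no separate logical entry to skip to.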

The TRANSFER$BREAK$GO symbol is language-independent. Any compiler can emit the symbol and we will honor it. Most don't use it. Those that do include PL/1, Ada, C and C++.

We make no assumption about the name of 'main' or its signature. OpenVMS DEBUG supports the symbolic debugging of 12 languages (BASIC, FORTRAN, Pascal, C, C++, COBOL, PL/1, BLISS, Ada, Assembler, DIBOL, RPG) on as many as three architectures (VAX, Alpha, Integrity) so we try to be as language-independent as possible. As far as we are concerned, any routine can be 'main'.