Symbols

As BE is often used for viewing memory dumps from embedded programs, support for symbol tables is highly desirable. Although BE technically need only support one format, it actually supports a few of the more commonly used formats to avoid a proliferation of symbol file conversion programs.

BE supports :-

ARM linker .sym
Linux Kernel symbols
AIX NM
link.exe .map
NetWare .map
L80 .SYM
SDCC .noi

ARM linker .sym

The arm symbol format is the default. Each non-blank line in the symbol file has the symbol name, followed by a number of spaces, followed by the address specified in hex (without an 0x prefix). Additional information is sometimes present on the end of the line (particularly if overlays are used), but this is ignored.
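The line layout described above can be read with a few lines of code. The following is only a sketch of the format, not BE's own reader; the function name parse_arm_sym and the sample symbols in the usage note are made up for illustration :-

```python
import re

# Assumed line shape: NAME <spaces> HEXADDR [anything else, ignored]
_ARM_LINE = re.compile(r"^\s*(\S+)\s+([0-9A-Fa-f]+)\b")

def parse_arm_sym(text):
    """Return a list of (name, address) pairs from ARM-style .sym text."""
    symbols = []
    for line in text.splitlines():
        if not line.strip():
            continue  # blank lines carry no symbol
        m = _ARM_LINE.match(line)
        if m:
            # Address is hex without a 0x prefix; trailing text is ignored.
            symbols.append((m.group(1), int(m.group(2), 16)))
    return symbols
```

eg: parse_arm_sym("main    00008000\nbuf  0000A010 overlay1\n") yields the pairs main at 0x8000 and buf at 0xA010, with the trailing overlay text dropped.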

Linux Kernel symbols

On a Linux computer, the 'proc' filesystem provides a special file called /proc/ksyms. Each line of this file has an address in hex (without an 0x prefix), followed by a space, followed by the symbol name.

This is the ksyms symbol table format.

eg: assuming kernel.dat is a dump of the kernel's memory :-

be -Y ksyms -y /proc/ksyms kernel.dat

Note that sometimes the address and symbol are followed by more information. This additional information is ignored.
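A rough sketch of reading this layout follows. It is not BE's implementation; the function name parse_ksyms is hypothetical, and it simply takes the first two whitespace separated fields of each line and ignores the rest :-

```python
def parse_ksyms(lines):
    """Map symbol name -> address for lines shaped 'HEXADDR NAME [extra]'."""
    table = {}
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # blank or malformed line
        try:
            address = int(fields[0], 16)  # hex, no 0x prefix
        except ValueError:
            continue  # first field was not a hex number
        table[fields[1]] = address  # anything after the name is ignored
    return table
```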

Linux has a symbol versioning convention whereby it can append a suffix to each symbol. The suffix varies depending upon the type of Linux kernel in use, ie: whether it is SMP or not, or compiled in '2GB mode' or not. BE has the following symbol formats, which strip the indicated suffix from each symbol as it is read :-

BE symbol format	what suffix is stripped
`ksyms_R`	`_Rhhhhhhhh`
`ksyms_Rsmp_`	`_Rsmp_hhhhhhhh`
`ksyms_R2gig_`	`_R2gig_hhhhhhhh`
`ksyms_Rsmp2gig_`	`_Rsmp2gig_hhhhhhhh`

In the above, hhhhhhhh are lower case hex digits, which contain the versioning information. BE allows 8 or 16 digits in the versioning information.
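The suffix stripping can be pictured as a regular expression anchored at the end of the symbol name. This is a sketch under the rules stated above (lower case hex, 8 or 16 digits); the names SUFFIX_PREFIXES and strip_version_suffix are illustrative and not part of BE :-

```python
import re

# BE format name -> fixed part of the suffix, as listed in the table above.
SUFFIX_PREFIXES = {
    "ksyms_R": "_R",
    "ksyms_Rsmp_": "_Rsmp_",
    "ksyms_R2gig_": "_R2gig_",
    "ksyms_Rsmp2gig_": "_Rsmp2gig_",
}

def strip_version_suffix(symbol, fmt):
    """Remove the versioning suffix for the given format, if present."""
    prefix = SUFFIX_PREFIXES[fmt]
    # 8 or 16 lower case hex digits, and nothing after them.
    pattern = re.escape(prefix) + r"(?:[0-9a-f]{16}|[0-9a-f]{8})$"
    return re.sub(pattern, "", symbol)
```

A symbol which does not end in the suffix for the chosen format is returned unchanged, so choosing the wrong format simply leaves the names as they were.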

See /usr/src/linux/Rules.make to understand where these suffixes come from.

AIX NM

The nm command on an AIX 4.1 or later machine generates output which is understood by the aix_nm symbol table format.

Typically nm is invoked with the -e argument, so that only external symbols get listed.

Each line has the symbol name, followed by a symbol type character, followed by an address and optionally followed by a length. Fields are separated by white space. Addresses and lengths are prefixed with 0x if they are listed in hex (this is caused by invoking nm with the -x flag).

BE ignores 4 byte type d data entries from the table, as these tend to refer to TOC entries.

BE also ignores machine generated symbols which start with _$STATIC.

C++ symbol names are typically listed demangled, and so can contain spaces. BE has quite complicated special logic to handle this.

Note that the symbol values obtained using nm are actually offsets from the beginning of the executable. You'll need to determine where the executable is in memory or the crash dump memory image, perhaps using the AIX crash command. Assuming this base value to be 0xBBBBBBBB, you would pass the following options to BE :-

-Y aix_nm -y symbolfile.sym@0xBBBBBBBB
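To illustrate the relocation, here is a sketch which parses one nm line and adds the base to the offset. It is a guess at a workable approach and certainly not BE's actual logic. It assumes that numbers without a 0x prefix are decimal, that the type is a single letter, and it works from the right hand end of the line so that demangled names containing spaces stay in one piece. Both function names are hypothetical :-

```python
def parse_nm_number(field):
    """Hex when 0x-prefixed (nm -x), otherwise assumed to be decimal."""
    if field.lower().startswith("0x"):
        return int(field, 16)
    return int(field, 10)

def parse_aix_nm_line(line, base=0):
    """Return (name, type_char, relocated_address), or None if unparseable."""
    fields = line.split()
    # Try 'name type addr len' first, then 'name type addr'.
    for trailing in (2, 1):
        if len(fields) < trailing + 2:
            continue
        type_char = fields[-trailing - 1]
        if len(type_char) != 1 or not type_char.isalpha():
            continue
        try:
            numbers = [parse_nm_number(f) for f in fields[-trailing:]]
        except ValueError:
            continue
        # Whatever precedes the type character is the (possibly spaced) name.
        name = " ".join(fields[:-trailing - 1])
        return name, type_char, base + numbers[0]
    return None
```

eg: with a base of 0x10000000, a line giving main an offset of 0x200 produces a symbol at 0x10000200.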

It is not too difficult to write a memory extension which accesses AIX kernel memory space by accessing /dev/kmem. Hey presto, BE can show live data structures within the AIX kernel!

link.exe .map

The map format corresponds to the .map files written by the 16 bit DOS link.exe program.

This has a section at the beginning of the file which declares segment names, positions and sizes. BE ignores this.

Next the symbols are listed, ordered by name, and BE ignores this too.

Finally the symbols are listed again, ordered by value. BE reads this data.

Each line is of the form :-

SSSS:OOOO SymbolName

BE adds an entry to the symbol table with value 0xSSSSOOOO for each symbol. This works well in conjunction with BE's -g command line argument.

eg: assuming embedded.map is the map file from linking some embedded application, and that dump.dat is a dump of the memory starting at physical location 0xf0000 :-

be -Y map -y embedded.map -g dump.dat@0xf0000
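The way a SSSS:OOOO pair becomes the value 0xSSSSOOOO can be sketched as follows. This only shows the packing of segment and offset into one 32 bit value as described above; the function name map_symbol_value is made up, and the sketch does not try to locate the 'ordered by value' section of the file :-

```python
import re

# Assumed line shape: SSSS:OOOO <spaces> SymbolName
_MAP_LINE = re.compile(r"^\s*([0-9A-Fa-f]{4}):([0-9A-Fa-f]{4})\s+(\S+)")

def map_symbol_value(line):
    """Return (name, 0xSSSSOOOO) for a 'SSSS:OOOO Name' line, else None."""
    m = _MAP_LINE.match(line)
    if m is None:
        return None
    segment = int(m.group(1), 16)
    offset = int(m.group(2), 16)
    # Segment goes in the top 16 bits, offset in the bottom 16 bits.
    return m.group(3), (segment << 16) | offset
```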

NetWare .map

NLMs and drivers can be linked using the NetWare or Watcom linkers and these can both be made to spit out a .map file.

In the .map file, symbols are listed with their offset from the start of the CODE or DATA segment. In order to know a symbol's address we must load the NLM and determine its CODE and DATA segment base addresses. These base values can then be added onto the offset values in the .map file.

The bases can be determined using the built-in NetWare debugger. Enter it via the Shift+Shift+Alt+Esc sequence, use .m nlmname to get the bases, and g to resume NetWare.

The following options are provided :-

-Y nw_nw_code: Read .map file produced by NetWare linker, and extract and process CODE symbols.
-Y nw_nw_data: Read .map file produced by NetWare linker, and extract and process DATA symbols.
-Y nw_wc_code: Read .map file produced by Watcom linker, and extract and process CODE symbols.
-Y nw_wc_data: Read .map file produced by Watcom linker, and extract and process DATA symbols.

Assuming an NLM had its code based at 0xCCCCCCCC, and its data based at 0xDDDDDDDD, and it was linked with the Watcom linker, you would use the following BE options :-

-Y nw_wc_code -y nlmname.map@0xCCCCCCCC
-Y nw_wc_data -y nlmname.map@0xDDDDDDDD

Notice how we process the .map twice - once to get the code symbols and to relocate them by 0xCCCCCCCC, and once to get the data symbols and to relocate them by 0xDDDDDDDD. Awkward, but it works, without having to post-process the .map file output by the linker.

The NetWare linker output has a section which looks like :-

Publics By Address
  DATA 00005B94 Evan                                (D:\build\ham\hamdata.c)
  DATA 00005B98 deviceName                          (D:\build\ham\hamhacb.c)
  DATA 00005BA8 hamName                             (D:\build\ham\hamnlm.c)

It is this section which BE uses.
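As a sketch of how one pass over this section could pick out a single segment and relocate it, consider the following. It is not BE's code; the function name parse_nw_publics is hypothetical, and the sketch does not try to detect where the section ends (lines which do not match are simply skipped) :-

```python
import re

# Assumed line shape: SEGMENT <spaces> 8 hex digits <spaces> Name [(source file)]
_NW_LINE = re.compile(r"^\s*(CODE|DATA)\s+([0-9A-Fa-f]{8})\s+(\S+)")

def parse_nw_publics(lines, segment, base):
    """Collect {name: base + offset} for one segment ('CODE' or 'DATA')."""
    table = {}
    in_section = False
    for line in lines:
        if line.strip() == "Publics By Address":
            in_section = True
            continue
        if not in_section:
            continue  # ignore everything before the section header
        m = _NW_LINE.match(line)
        if m and m.group(1) == segment:
            table[m.group(3)] = base + int(m.group(2), 16)
    return table
```

Calling it once with "CODE" and the code base, and once with "DATA" and the data base, mirrors the two -Y/-y pairs shown earlier.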

The Watcom linker output has lines in it of the form :-

CODE:00305678  fhbf
CODE:00045678+ symmy
DATA:00345678* sym
DATA:00345008s symbol
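The Watcom lines can be handled in much the same way. In this sketch the optional character after the address is assumed to be one of +, * or s, purely because those are the ones in the example above; the function name parse_wc_map is illustrative and this is not BE's implementation :-

```python
import re

# Assumed line shape: SEGMENT:8 hex digits, optional flag character, Name
_WC_LINE = re.compile(r"^\s*(CODE|DATA):([0-9A-Fa-f]{8})[+*s]?\s+(\S+)")

def parse_wc_map(lines, segment, base):
    """Collect {name: base + offset} for one segment ('CODE' or 'DATA')."""
    table = {}
    for line in lines:
        m = _WC_LINE.match(line)
        if m and m.group(1) == segment:
            # Relocate the segment-relative offset by the segment's base.
            table[m.group(3)] = base + int(m.group(2), 16)
    return table
```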

To complete the picture, all that is needed is a BE memory extension which allows BE to access the memory space of an NLM.

L80 .SYM

L80 is a linker used on CP/M systems. If invoked with the /Y option, it will write a .SYM file, looking like this :-

31F0 BLKRD	2EC1 DUSER	31C1 EXCNFG	321D EXRD	
3232 EXWR	31A5 INITLZ	2010 POC	

^Z

Because it's a CP/M file, there can be junk after the ^Z EOF character. The file also has CP/M style CR LF at the end of each line.
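A sketch of reading such a file follows. It assumes ^Z is stored as the single byte 0x1A and that the file is a plain sequence of alternating hex address and name tokens separated by white space; the function name parse_l80_sym is made up, and this is not BE's own reader :-

```python
def parse_l80_sym(data):
    """Parse the bytes of an L80 .SYM file into {name: address}."""
    # Everything from the first ^Z (0x1A) onwards is CP/M padding or junk.
    data = data.split(b"\x1a", 1)[0]
    # split() treats tabs, CR and LF alike, so line endings need no care.
    tokens = data.decode("ascii", errors="replace").split()
    table = {}
    # Tokens alternate: hex address, then symbol name.
    for addr, name in zip(tokens[0::2], tokens[1::2]):
        table[name] = int(addr, 16)
    return table
```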

A sample command telling BE to read an L80 .SYM file :-

be -A 16 -C z80 -Y l80 -y ALPHA.SYM ALPHA.COM@0x100

SDCC .noi

The SDCC toolchain can write NoICE .noi files, and BE can make an attempt to read them.

The main problem with these files is that, for a C function called func, the SDCC compiler seems to create extra _func_begin and _func_end symbols as well as the normal _func. So BE has a heuristic to strip these out.
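BE's actual heuristic is not described here, but one plausible way of stripping such symbols is sketched below: drop a name ending in _begin or _end only when the name without that ending is also present. The function name strip_sdcc_markers and the sample symbols in the usage note are hypothetical :-

```python
def strip_sdcc_markers(symbols):
    """Drop NAME_begin / NAME_end entries when NAME itself is present.

    symbols is a dict of {name: address}; a filtered copy is returned.
    """
    kept = {}
    for name, address in symbols.items():
        for marker in ("_begin", "_end"):
            if name.endswith(marker) and name[: -len(marker)] in symbols:
                break  # a marker for a known function, so skip it
        else:
            kept[name] = address
    return kept
```

eg: given _func, _func_begin, _func_end and an unrelated _list_end, only _func and _list_end survive, because there is no symbol called _list.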