As BE is often used for viewing memory dumps from embedded programs, support for symbol tables is highly desirable. Although BE technically need only support one format, it actually supports a few of the more commonly used formats to avoid a proliferation of symbol file conversion programs.
BE supports the following symbol table formats :-
ARM
The arm symbol format is the default. Each non-blank line in the symbol file has the symbol name, followed by a number of spaces, followed by the address specified in hex (without a 0x prefix). Additional information is sometimes present at the end of the line (particularly if overlays are used), but this is ignored.
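As an illustration of the line layout just described, here is a minimal Python sketch of an arm-format reader. It is not BE's code: `parse_arm_symbols` is a hypothetical helper. It assumes that symbol names contain no white space and that anything after the address can be discarded.

```python
def parse_arm_symbols(text):
    """Parse arm-format lines: '<name> <spaces> <hex address> [ignored extra fields]'."""
    symbols = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue  # blank (or malformed) line: nothing to record
        name, addr_hex = fields[0], fields[1]
        try:
            symbols[name] = int(addr_hex, 16)  # hex, written without a 0x prefix
        except ValueError:
            continue  # second field is not hex, so this is not a symbol line
    return symbols


sample = "main      00008000\nhandler   0000801C  OVERLAY1\n\n"
print(parse_arm_symbols(sample))
```

The trailing `OVERLAY1` field stands in for the overlay information mentioned above. It is simply dropped.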
Linux Kernel symbols
On a Linux computer, the 'proc' filesystem provides a special file called /proc/ksyms. Each line of this file has an address in hex (without a 0x prefix), followed by a space, followed by the symbol name. This is the ksyms symbol table format.
eg: assuming kernel.dat is a dump of the kernel's memory :-
be -Y ksyms -y /proc/ksyms kernel.dat
Note that sometimes the address and symbol are followed by more information. This additional information is ignored.
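The ksyms layout puts the address first and the name second. Here is a minimal Python sketch of reading it that way, under the same rule of ignoring anything after the name. `parse_ksyms` is a hypothetical helper, not part of BE. The `[foo]` field in the sample is just an example of trailing information.

```python
def parse_ksyms(text):
    """Parse ksyms-format lines: '<hex address> <name> [ignored extra information]'."""
    symbols = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue  # blank line
        try:
            address = int(fields[0], 16)  # hex, written without a 0x prefix
        except ValueError:
            continue  # first field is not hex, so this is not a symbol line
        symbols[fields[1]] = address  # anything after the name is ignored
    return symbols


sample = "c0118a30 printk\nc88a2060 foo_init [foo]\n"
print(parse_ksyms(sample))
```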
Linux has a symbol versioning convention whereby it can append a suffix to each symbol. The suffix varies depending upon the type of Linux kernel in use, ie: whether it is SMP or not, and whether it was compiled in '2GB mode' or not. BE has the following symbol formats, which strip the indicated suffix from each symbol as it is read :-
| BE symbol format | Suffix stripped |
|---|---|
| ksyms_R | _Rhhhhhhhh |
| ksyms_Rsmp_ | _Rsmp_hhhhhhhh |
| ksyms_R2gig_ | _R2gig_hhhhhhhh |
| ksyms_Rsmp2gig_ | _Rsmp2gig_hhhhhhhh |
In the above, hhhhhhhh are lower case hex digits, which contain the versioning information. BE allows 8 or 16 digits in the versioning information. See /usr/src/linux/Rules.make to understand where these suffixes come from.
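Suffix stripping of this kind can be expressed as a single anchored regular expression per format. Below is a minimal Python sketch that follows the table above and accepts 8 or 16 lower-case hex digits. `strip_version_suffix` and `SUFFIX_TAGS` are hypothetical names for illustration, not BE's implementation.

```python
import re

# Suffix tag stripped by each format (from the table above). The tag is followed
# by 8 or 16 lower-case hex digits of versioning information.
SUFFIX_TAGS = {
    "ksyms_R": "_R",
    "ksyms_Rsmp_": "_Rsmp_",
    "ksyms_R2gig_": "_R2gig_",
    "ksyms_Rsmp2gig_": "_Rsmp2gig_",
}


def strip_version_suffix(name, fmt):
    """Remove a trailing versioning suffix for format fmt, if one is present."""
    pattern = re.escape(SUFFIX_TAGS[fmt]) + r"(?:[0-9a-f]{16}|[0-9a-f]{8})$"
    return re.sub(pattern, "", name)


print(strip_version_suffix("printk_R1b7d4074", "ksyms_R"))
print(strip_version_suffix("printk_Rsmp_1b7d4074", "ksyms_Rsmp_"))
```

A name without the matching suffix, or with upper case hex digits, comes back unchanged.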
AIX NM
The nm command on an AIX 4.1 or later machine generates output which is understood by the aix_nm symbol table format. Typically nm is invoked with the -e argument, so that only external symbols get listed.
Each line has the symbol name, followed by a symbol type character, followed by an address, and optionally followed by a length. Fields are separated with white space. Addresses and lengths are prefixed with 0x if they are listed in hex (this is caused by invoking nm with the -x flag).
BE ignores 4 byte data entries of type d, as these tend to refer to TOC entries. BE also ignores machine generated symbols which start with _$STATIC.
C++ symbol names are typically listed demangled, and so can contain spaces. BE has quite complicated special logic to handle this.
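The rules above can be combined in a Python line parser. This is a deliberately simple sketch, not BE's "quite complicated special logic". `parse_aix_nm_line` is a hypothetical helper. It parses from the right end of the line, so a demangled name may contain spaces. It assumes the type field is always a single character and that a field is a length only when it follows a numeric address.

```python
def parse_num(tok):
    """nm prints 0x-prefixed hex when invoked with -x, otherwise decimal."""
    if tok.lower().startswith("0x"):
        return int(tok, 16)
    return int(tok, 10)


def is_num(tok):
    try:
        parse_num(tok)
        return True
    except ValueError:
        return False


def parse_aix_nm_line(line):
    """Return (name, type, value, length), or None if the line is skipped."""
    toks = line.split()
    # Work from the right: demangled C++ names may themselves contain spaces.
    if len(toks) >= 4 and len(toks[-3]) == 1 and is_num(toks[-2]) and is_num(toks[-1]):
        name_toks, typ = toks[:-3], toks[-3]
        value, length = parse_num(toks[-2]), parse_num(toks[-1])
    elif len(toks) >= 3 and len(toks[-2]) == 1 and is_num(toks[-1]):
        name_toks, typ = toks[:-2], toks[-2]
        value, length = parse_num(toks[-1]), None
    else:
        return None  # not a recognisable symbol line
    name = " ".join(name_toks)
    if typ == "d" and length == 4:
        return None  # 4 byte type d entries tend to refer to TOC entries
    if name.startswith("_$STATIC"):
        return None  # machine generated symbol
    return name, typ, value, length


print(parse_aix_nm_line("Foo::bar(int) const T 0x1000"))
```

Joining the name tokens with single spaces normalises any runs of spaces inside a demangled name. That is acceptable for lookup, but it is a simplification.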
Note that the symbol values obtained using nm are actually offsets from the beginning of the executable. You'll need to determine where the executable is located in memory (or in the crash dump memory image), perhaps using the AIX crash command. Assuming this base value to be 0xBBBBBBBB, you would pass the following options to BE :-
-Y aix_nm -y symbolfile.sym@0xBBBBBBBB
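The @0xBBBBBBBB suffix on the symbol file name supplies a base, which is added onto every offset read from the file. Below is a minimal Python sketch of that relocation step. `split_symbol_spec` and `relocate` are hypothetical helpers, not part of BE. Treating a missing @base as a base of 0 is an assumption of this sketch.

```python
def split_symbol_spec(spec):
    """Split 'file@0xBASE' into (file, base); assumes base 0 if no @ is present."""
    path, sep, base = spec.rpartition("@")
    if not sep:
        return spec, 0
    return path, int(base, 0)  # base 0 lets int() honour the 0x prefix


def relocate(offsets, base):
    """Turn {name: offset} into {name: base + offset}."""
    return {name: base + off for name, off in offsets.items()}


path, base = split_symbol_spec("symbolfile.sym@0xBBBBBBBB")
print(path, hex(base))
print(relocate({"main": 0x100}, 0x10000000))
```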
It is not too difficult to write a memory extension which accesses AIX kernel memory space by accessing /dev/kmem.
Hey presto, BE can show live data structures within the AIX kernel!
link.exe .map
The map format corresponds to the .map files written by the 16 bit DOS link.exe program.
This has a section at the beginning of the file which declares segment names, positions and sizes. BE ignores this.
Next the symbols are listed, ordered by name, and BE ignores this too.
Finally the symbols are listed again, ordered by value. BE reads this data.
Each line is of the form :-
SSSS:OOOO SymbolName
For each symbol, BE enters an entry in the symbol table with value 0xSSSSOOOO. This works well in conjunction with BE's -g command line argument.
eg: assuming embedded.map is the map file from linking some embedded application, and that dump.dat is a dump of the memory starting at physical location 0xf0000 :-
be -Y map -y embedded.map -g dump.dat@0xf0000
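Here is a minimal Python sketch of turning a by-value line of the form SSSS:OOOO SymbolName into the 0xSSSSOOOO value described above. `parse_map_by_value` and `real_mode_linear` are hypothetical helpers. The sketch assumes lines look exactly like the form shown. It does not locate the by-value section. Both listings give each symbol the same value, so collecting into a dict gives the same result either way. For reference, `real_mode_linear` applies the real-mode x86 rule, physical address = segment * 16 + offset.

```python
import re

# "SSSS:OOOO SymbolName": 4 hex digits of segment, 4 of offset, then the name.
MAP_LINE = re.compile(r"^\s*([0-9A-Fa-f]{4}):([0-9A-Fa-f]{4})\s+(\S+)")


def parse_map_by_value(lines):
    """Return {name: 0xSSSSOOOO} for every line of the SSSS:OOOO SymbolName form."""
    symbols = {}
    for line in lines:
        m = MAP_LINE.match(line)
        if m:
            seg, off = int(m.group(1), 16), int(m.group(2), 16)
            symbols[m.group(3)] = (seg << 16) | off  # segment in the top 16 bits
    return symbols


def real_mode_linear(value):
    """Physical address for a 0xSSSSOOOO value under real-mode segment * 16 + offset."""
    return ((value >> 16) << 4) + (value & 0xFFFF)


print({k: hex(v) for k, v in parse_map_by_value(["0100:0010 _main"]).items()})
print(hex(real_mode_linear(0xF0001234)))
```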
NetWare .map
NLMs and drivers can be linked using the NetWare or Watcom linkers, and both can be made to spit out a .map file. In the .map file, symbols are listed with their offset from the start of the CODE or DATA segment. In order to know a symbol's address we must load the NLM and determine its CODE and DATA segment base addresses. These base values can then be added onto the offset values in the .map file.
The bases can be determined using the built-in NetWare debugger. Enter it via the Shift+Shift+Alt+Esc sequence, use .m nlmname to get the bases, and g to resume NetWare.
The following options are provided :-
| BE option | .map file produced by | Symbols extracted and processed |
|---|---|---|
| -Y nw_nw_code | NetWare linker | CODE |
| -Y nw_nw_data | NetWare linker | DATA |
| -Y nw_wc_code | Watcom linker | CODE |
| -Y nw_wc_data | Watcom linker | DATA |
Assuming an NLM had its code based at 0xCCCCCCCC, and its data based at 0xDDDDDDDD, and it was linked with the Watcom linker, you would use the following BE options :-
-Y nw_wc_code -y nlmname.map@0xCCCCCCCC -Y nw_wc_data -y nlmname.map@0xDDDDDDDD
Notice how we process the .map file twice: once to get the code symbols and relocate them by 0xCCCCCCCC, and once to get the data symbols and relocate them by 0xDDDDDDDD. Awkward, but it works, without having to post-process the .map file output by the linker.
The NetWare linker output has a section which looks like :-
Publics By Address
DATA 00005B94 Evan (D:\build\ham\hamdata.c)
DATA 00005B98 deviceName (D:\build\ham\hamhacb.c)
DATA 00005BA8 hamName (D:\build\ham\hamnlm.c)
It is this section which BE uses.
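Here is a minimal Python sketch of reading lines shaped like the Publics By Address sample. It keeps only one segment kind and adds that segment's base, which mirrors processing the file once for CODE and once for DATA. `parse_netware_publics` is a hypothetical helper. It assumes 8 hex digit offsets as in the sample. A real reader would first find the Publics By Address header rather than matching lines anywhere in the file.

```python
import re

# Matches lines like the sample: kind, 8 hex digit offset, name, then ignored source path.
NW_LINE = re.compile(r"^\s*(CODE|DATA)\s+([0-9A-Fa-f]{8})\s+(\S+)")


def parse_netware_publics(lines, kind, base):
    """Collect symbols of one kind (CODE or DATA), relocated by that segment's base."""
    symbols = {}
    for line in lines:
        m = NW_LINE.match(line)
        if m and m.group(1) == kind:
            symbols[m.group(3)] = base + int(m.group(2), 16)
    return symbols


lines = [
    "Publics By Address",
    r"DATA 00005B94 Evan (D:\build\ham\hamdata.c)",
    r"DATA 00005B98 deviceName (D:\build\ham\hamhacb.c)",
]
print({k: hex(v) for k, v in parse_netware_publics(lines, "DATA", 0xDDD00000).items()})
```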
The Watcom linker output has lines in it of the form :-
CODE:00305678 fhbf
CODE:00045678+ symmy
DATA:00345678* sym
DATA:00345008s symbol
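The Watcom lines can be handled in a similar way. Here is a minimal Python sketch that treats the optional single character after the hex digits (+, * or s in the sample) as a marker to ignore, because its meaning is not used here. `parse_watcom_map` is a hypothetical helper. Running it once with "CODE" and the code base, and once with "DATA" and the data base, corresponds to the -Y nw_wc_code / -Y nw_wc_data pair described above.

```python
import re

# "CODE:00045678+ symmy": segment kind, 8 hex digits, optional one-char marker, name.
WC_LINE = re.compile(r"^\s*(CODE|DATA):([0-9A-Fa-f]{8})(\S?)\s+(\S+)")


def parse_watcom_map(lines, kind, base):
    """Collect symbols of one kind (CODE or DATA), relocated by that segment's base."""
    symbols = {}
    for line in lines:
        m = WC_LINE.match(line)
        if m and m.group(1) == kind:
            # m.group(3) holds the marker character, if any; it is ignored here
            symbols[m.group(4)] = base + int(m.group(2), 16)
    return symbols


lines = ["CODE:00305678 fhbf", "CODE:00045678+ symmy", "DATA:00345678* sym", "DATA:00345008s symbol"]
print(parse_watcom_map(lines, "CODE", 0))
```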
To complete the picture, all that is needed is a BE memory extension which allows BE to access the memory space of an NLM.
L80 .SYM
L80 is a linker used on CP/M systems. If invoked with the /Y option, it will write a .SYM file, looking like this :-
31F0 BLKRD 2EC1 DUSER 31C1 EXCNFG 321D EXRD 3232 EXWR 31A5 INITLZ 2010 POC ^Z
Because it's a CP/M file, there can be junk after the ^Z EOF character. The file also has CP/M style CR LF at the end of each line.
A sample command telling BE to read an L80 .SYM file :-
be -A 16 -C z80 -Y l80 -y ALPHA.SYM ALPHA.COM@0x100
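The two CP/M quirks can be handled up front. Cut the raw bytes at the first ^Z (0x1A), then split on white space, which also disposes of the CR LF line endings. Below is a minimal Python sketch of doing that. `parse_l80_sym` is a hypothetical helper. It assumes names contain no white space and that the file is a flat sequence of address/name pairs, however many pairs appear per line.

```python
def parse_l80_sym(data: bytes):
    """Parse an L80 .SYM file: white-space separated 'HHHH NAME' pairs ending at ^Z."""
    text = data.split(b"\x1a", 1)[0]  # drop ^Z and any junk after it
    tokens = text.decode("ascii", "replace").split()  # split() also removes CR LF
    symbols = {}
    for addr_hex, name in zip(tokens[0::2], tokens[1::2]):
        symbols[name] = int(addr_hex, 16)
    return symbols


data = b"31F0 BLKRD\t2EC1 DUSER\r\n31A5 INITLZ\t2010 POC\r\n\x1a\x00\xffjunk"
print(parse_l80_sym(data))
```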
NoICE .noi
The SDCC toolchain can write NoICE .noi files, and BE can make an attempt to read them. The main problem with these files is that the SDCC compiler seems to create extra _func_begin and _func_end symbols, as well as the normal _func, for a C function called func. So BE has a heuristic to strip these out.
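Here is one possible heuristic of this kind, as a minimal Python sketch. It is not necessarily the one BE uses. It drops X_begin and X_end only when the plain X is also defined, so a genuine variable such as _buf_end survives. It also assumes symbol definitions appear as lines of the form DEF name value. `parse_noi` and `drop_begin_end` are hypothetical helpers.

```python
def parse_noi(text):
    """Collect symbols from lines assumed to look like 'DEF <name> <value>'."""
    symbols = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 3 and fields[0].upper() == "DEF":
            try:
                symbols[fields[1]] = int(fields[2], 0)  # honours a 0x prefix
            except ValueError:
                continue  # value in an unexpected form: skip the line
    return symbols


def drop_begin_end(symbols):
    """Hypothetical heuristic: drop X_begin / X_end when plain X is also defined."""
    kept = {}
    for name, value in symbols.items():
        for suffix in ("_begin", "_end"):
            if name.endswith(suffix) and name[: -len(suffix)] in symbols:
                break  # compiler-generated companion of a real symbol
        else:
            kept[name] = value
    return kept


noi = "LOAD app.ihx\nDEF _func 0x0150\nDEF _func_begin 0x0150\nDEF _func_end 0x0180\nDEF _buf_end 0x9000\n"
print(drop_begin_end(parse_noi(noi)))
```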