Symbol names
When you run a Clean program in gdb or inspect its symbols with objdump, you will realise that Clean function names have been mangled to escape special characters and to keep duplicate names apart. For example, you may find the symbol e____SystemEnumStrict__s__from__s_I24. To demangle such a name, we first unescape it to get the ABC symbol name, and then read the ABC code to find the corresponding Clean function.
From symbol names to ABC symbols
To find the ABC symbol name that corresponds to a symbol, we unescape it. The escaping is defined in the function try_parse_label in cginput.c.
Underscores are used for escaping, and _N adds a second escape layer (so _NA is a single escape sequence). The sequences are:
- _A: + (add)
- _B: ` (backtick)
- _C: : (colon)
- _D: / (divide)
- _E: = (equal)
- _G: > (greater)
- _H: # (hashtag)
- _I: ; (semicolon)
- _L: < (lesser)
- _M: * (multiply)
- _NA: & (ampersand)
- _NB: \ (backslash)
- _NC: ^ (caret)
- _ND: $ (dollar)
- _NE: ! (exclamation)
- _NP: % (percent)
- _NQ: " (quote)
- _NS: ' (single quote)
- _NT: @ (at)
- _O: | (or)
- _P: . (period)
- _Q: ? (question)
- _S: - (subtract)
- _T: ~ (tilde)
- __: _ (underscore)
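The list is mechanical enough to apply by hand, but a script helps when there are many symbols to decode. Below is a minimal Python sketch that unescapes a symbol left to right using only the sequences listed above. It is not a port of try_parse_label, it assumes the list is complete, and the names unescape_symbol, SINGLE and DOUBLE are our own.

```python
import sys

# "_X" in an object-file symbol stands for one character of the ABC label.
SINGLE = {
    "A": "+", "B": "`", "C": ":", "D": "/", "E": "=", "G": ">", "H": "#",
    "I": ";", "L": "<", "M": "*", "O": "|", "P": ".", "Q": "?", "S": "-",
    "T": "~", "_": "_",
}

# Second escape layer: "_NX" stands for one character.
DOUBLE = {
    "A": "&", "B": "\\", "C": "^", "D": "$", "E": "!", "P": "%", "Q": '"',
    "S": "'", "T": "@",
}


def unescape_symbol(symbol: str) -> str:
    """Undo the escaping of an object-file symbol to get the ABC label.

    Raises ValueError on an unknown or truncated escape sequence.
    """
    out = []
    i = 0
    while i < len(symbol):
        if symbol[i] != "_":
            out.append(symbol[i])  # ordinary character, copied as-is
            i += 1
        elif symbol[i + 1 : i + 2] == "N":
            key = symbol[i + 2 : i + 3]  # empty string if truncated
            if key not in DOUBLE:
                raise ValueError(f"unknown escape '_N{key}' at offset {i}")
            out.append(DOUBLE[key])
            i += 3
        else:
            key = symbol[i + 1 : i + 2]  # empty string if truncated
            if key not in SINGLE:
                raise ValueError(f"unknown escape '_{key}' at offset {i}")
            out.append(SINGLE[key])
            i += 2
    return "".join(out)


if __name__ == "__main__":
    # Default to the example symbol from the text when no arguments are given.
    for sym in sys.argv[1:] or ["e____SystemEnumStrict__s__from__s_I24"]:
        print(sym, "->", unescape_symbol(sym))
```

Run without arguments, it unescapes the example symbol from the introduction to e__SystemEnumStrict_s_from_s;24, which is the label to search for in the ABC code. An unknown sequence raises ValueError instead of being copied through, so a gap in the table shows up immediately.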
From ABC symbols to Clean functions
To find the Clean function belonging to an ABC symbol, first find the definition of the label.
If you have compiled with profiling information, look for a .pb directive above the label. This directive often contains a more human-readable name than the label itself (e.g., <case>[line:10];9;2 instead of s3).
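Finding the .pb directive can be scripted. The sketch below relies on a simplified view of the .abc file format that is an assumption, not something checked against the compiler: a label is defined on a line that starts in column 0 with the label name, and the profiling directive is a line whose first token is .pb, possibly with the name in double quotes. The helpers profile_name and is_label_definition are hypothetical and not part of the Clean tools.

```python
import sys
from typing import Optional


def is_label_definition(line: str, label: str) -> bool:
    # Assumption: a label is defined on a line that starts in column 0
    # with the label name; instructions are indented.
    return bool(line) and not line[0].isspace() and line.split()[:1] == [label]


def profile_name(abc_text: str, label: str) -> Optional[str]:
    """Return the argument of the nearest .pb directive above the definition
    of `label`, or None if the label or a .pb directive is not found."""
    lines = abc_text.splitlines()
    for idx, line in enumerate(lines):
        if not is_label_definition(line, label):
            continue
        # Walk upwards from the label to the closest .pb directive.
        for prev in reversed(lines[:idx]):
            if prev.split()[:1] == [".pb"]:
                name = prev.strip()[len(".pb"):].strip()
                # Assumption: the name may be quoted; drop the quotes if so.
                if len(name) >= 2 and name[0] == name[-1] == '"':
                    name = name[1:-1]
                return name
        return None
    return None


if __name__ == "__main__":
    # Usage (hypothetical file and script names): python pb.py Module.abc s3
    path, label = sys.argv[1], sys.argv[2]
    with open(path, encoding="utf-8", errors="replace") as f:
        print(profile_name(f.read(), label))
```

The sketch simply returns the nearest .pb above the label. It does not check that the directive belongs to the same function, so compare the result with the surrounding code before trusting it.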
Without such information, the easiest approach is to reconstruct the call graph:
- Look at which functions the label calls, or which thunks it builds. If a label s3 builds an ae_StdList_nreverse thunk, look for usages of reverse in the Clean source file.
- If there are no clear outgoing links, look for incoming links by searching for usages of the label, and repeat. For instance, you may find s3 being used in s5, and then repeat the process with s5.
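Both directions of this search can be scripted over an .abc file. The rough Python sketch below assumes that labels are defined at column 0, directives start with ".", instructions are indented, and operands are separated by whitespace; none of this is taken from a specification. body_of lists the lines under a label (where its outgoing calls and thunk builds appear) and callers_of lists the labels whose code mentions it. Both names are hypothetical.

```python
import sys
from typing import List, Optional


def defined_label(line: str) -> Optional[str]:
    # Assumption: labels are defined at column 0, directives start with ".",
    # and instructions are indented.
    if not line or line[0].isspace() or line.startswith("."):
        return None
    return line.split()[0]


def body_of(abc_text: str, label: str) -> List[str]:
    """Non-blank lines between the definition of `label` and the next label
    definition: this is where outgoing calls and thunk builds show up."""
    body: List[str] = []
    inside = False
    for line in abc_text.splitlines():
        name = defined_label(line)
        if name is not None:
            if inside:
                break  # reached the next label
            inside = name == label
        elif inside and line.strip():
            body.append(line.strip())
    return body


def callers_of(abc_text: str, label: str) -> List[str]:
    """Labels whose instructions mention `label` as an operand (incoming links)."""
    callers: List[str] = []
    current: Optional[str] = None
    for line in abc_text.splitlines():
        name = defined_label(line)
        if name is not None:
            current = name
            continue
        tokens = line.split()
        if not tokens or tokens[0].startswith("."):
            continue  # skip blank lines and directives
        if label in tokens[1:] and current is not None and current not in callers:
            callers.append(current)
    return callers


if __name__ == "__main__":
    # Usage (hypothetical file and script names): python callgraph.py Module.abc s3
    path, label = sys.argv[1], sys.argv[2]
    with open(path, encoding="utf-8", errors="replace") as f:
        text = f.read()
    print(f"outgoing (body of {label}):")
    for instr in body_of(text, label):
        print("   ", instr)
    print("incoming:", ", ".join(callers_of(text, label)) or "(none)")
```

If s5 uses s3, callers_of returns s5 for s3, and you can then inspect body_of for s5 and continue. Labels defined directly after one another that share a single block of code give an empty body for all but the last one, which a more careful tool would need to handle.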
By splitting functions with long sequences of # (let-before) definitions into multiple functions, you create more symbol names, which makes it easier to find locations in the code.
Exporting functions can also make the ABC code easier to read, because the names of exported functions are more readable.