Symbol names

When you run a Clean program in gdb or inspect its symbols with objdump, you will see that Clean function names have been mangled to escape special characters and to disambiguate duplicate names. For example, you may find the symbol e____SystemEnumStrict__s__from__s_I24. Demangling takes two steps: first unescape the symbol to obtain the ABC symbol name, then read the ABC code to find the corresponding Clean function.

From symbol names to ABC symbols

To recover the ABC symbol name, unescape the mangled name. The escaping is implemented in try_parse_label in cginput.c. An underscore introduces an escape sequence, and _N introduces a second layer of sequences. The sequences are:

  • _A: + (add)
  • _B: ` (backtick)
  • _C: : (colon)
  • _D: / (divide)
  • _E: = (equal)
  • _G: > (greater)
  • _H: # (hash)
  • _I: ; (semicolon)
  • _L: < (less)
  • _M: * (multiply)
  • _NA: & (ampersand)
  • _NB: \ (backslash)
  • _NC: ^ (caret)
  • _ND: $ (dollar)
  • _NE: ! (exclamation)
  • _NP: % (percent)
  • _NQ: " (quote)
  • _NS: ' (single quote)
  • _NT: @ (at sign)
  • _O: | (or)
  • _P: . (period)
  • _Q: ? (question)
  • _S: - (subtract)
  • _T: ~ (tilde)
  • __: _ (underscore)
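
The table above can be applied mechanically. The following is a minimal Python sketch of such an unescaper, not the code from cginput.c. The function name unescape_symbol is made up for illustration, and the sketch assumes that a left-to-right scan is sufficient and that an underscore followed by an unknown character should be kept unchanged.

```python
# Escape table transcribed from the list above.
SINGLE = {
    "A": "+", "B": "`", "C": ":", "D": "/", "E": "=", "G": ">",
    "H": "#", "I": ";", "L": "<", "M": "*", "O": "|", "P": ".",
    "Q": "?", "S": "-", "T": "~", "_": "_",
}

# Second escape layer: sequences of the form _N<letter>.
DOUBLE = {
    "A": "&", "B": "\\", "C": "^", "D": "$", "E": "!",
    "P": "%", "Q": '"', "S": "'", "T": "@",
}


def unescape_symbol(symbol: str) -> str:
    """Turn a mangled symbol name into an ABC symbol name.

    Assumption: scanning left to right and consuming the longest known
    sequence is enough; unknown sequences are copied verbatim.
    """
    out = []
    i = 0
    n = len(symbol)
    while i < n:
        ch = symbol[i]
        if ch != "_" or i + 1 >= n:
            # Ordinary character, or a trailing underscore.
            out.append(ch)
            i += 1
            continue
        nxt = symbol[i + 1]
        if nxt == "N" and i + 2 < n and symbol[i + 2] in DOUBLE:
            out.append(DOUBLE[symbol[i + 2]])
            i += 3
        elif nxt in SINGLE:
            out.append(SINGLE[nxt])
            i += 2
        else:
            # Not a sequence from the table: keep the underscore as is.
            out.append(ch)
            i += 1
    return "".join(out)


print(unescape_symbol("e____SystemEnumStrict__s__from__s_I24"))
```

For the example symbol from the introduction this prints e__SystemEnumStrict_s_from_s;24, which is the label to search for in the ABC code.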

From ABC symbols to Clean functions

To find the Clean function belonging to an ABC symbol, first find the definition of the label.

If you have compiled with profiling information, look for a .pb directive above the label. This directive often contains a more human-readable name than the label itself (e.g., <case>[line:10];9;2 instead of s3).

Without such information, the easiest approach is to reconstruct the call graph:

  • Look at which functions are called from, or which thunks are built by, the label. If a label s3 builds an e_StdList_nreverse thunk, look for uses of reverse in the Clean source file.
  • If there are no clear outgoing links, look for incoming links by searching for uses of the label. For instance, if you find that s3 is used in s5, repeat the process with s5.
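
Both searches can be scripted. The sketch below is a rough Python helper, not part of the Clean distribution, and all function names in it are made up for illustration. It assumes that label definitions in the ABC file are unindented lines that do not start with a period, and that instructions are indented; check this against your own ABC files before relying on it. The SAMPLE fragment is hand-written for the example and is not real compiler output.

```python
def split_into_blocks(abc_text):
    """Group instruction lines under the label that precedes them.

    Assumption: labels are unindented lines not starting with '.',
    instructions are indented, and directives start with '.'.
    """
    blocks = {}
    current = None
    for line in abc_text.splitlines():
        if not line.strip():
            continue
        if line[0] in " \t":
            if current is not None:
                blocks[current].append(line.split())
        elif not line.startswith("."):
            current = line.strip()
            blocks.setdefault(current, [])
    return blocks


def outgoing(blocks, label):
    """Operands in the body of `label` that look like labels:
    either labels defined in this file or names starting with e_."""
    found = []
    for tokens in blocks.get(label, []):
        for tok in tokens[1:]:
            if (tok in blocks or tok.startswith("e_")) and tok not in found:
                found.append(tok)
    return found


def incoming(blocks, label):
    """Labels whose body mentions `label` as an operand."""
    return [
        name
        for name, body in blocks.items()
        if name != label and any(label in tokens[1:] for tokens in body)
    ]


# Hand-written fragment for illustration only.
SAMPLE = "\n".join([
    "s3",
    "\tbuildh e_StdList_nreverse 1",
    "\trtn",
    "s5",
    "\tjsr s3",
    "\trtn",
])

blocks = split_into_blocks(SAMPLE)
print(outgoing(blocks, "s3"))
print(incoming(blocks, "s3"))
```

On the sample, the outgoing search for s3 reports e_StdList_nreverse and the incoming search reports s5, matching the manual procedure described in the list above.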

Splitting functions with long # sequences into several smaller functions creates more symbol names, which makes it easier to pinpoint locations.

Exporting functions can also make the ABC code easier to read, because the names of exported functions are more readable.