Executable files typically contain a file header at or near the start of the
file. This header contains 'magic numbers' that identify the file type. Beyond
this header, executable files are typically divided into SECTIONS. Each section
is characterized by name, permissions (RWX), size, file offset, and virtual
address (VMA).
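A minimal sketch of identifying a file type by its magic number. The table
lists only two well-known signatures, and the helper names (identify,
identify_file) are made up for this example.

```python
# Leading bytes ("magic numbers") of two common executable formats.
MAGIC = {
    b"\x7fELF": "ELF",
    b"MZ": "DOS MZ / Win32 PE",
}


def identify(header):
    """Return a format name for the leading bytes of a file, or 'unknown'."""
    for magic, name in MAGIC.items():
        if header.startswith(magic):
            return name
    return "unknown"


def identify_file(path):
    """Read the first four bytes of the file at `path` and classify them."""
    with open(path, "rb") as f:
        return identify(f.read(4))
```

On a Linux system, identify_file("/bin/ls") would normally be expected to
report an ELF file.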
Standard Sections
-----------------
Section          Perms  Contents
.text or _TEXT   R-X    code, const globals, large literals
.rodata          R--    const globals, large literals
.data or _DATA   RW-    initialized globals and static locals
.bss or _BSS     RW-    uninitialized globals and static locals
* Large literals are constant values too large to be handled conveniently
with immediate addressing, such as string literals and constant structures.
* The C language requires that the BSS be zeroed before main() is called.
Because every variable in the BSS has the same initial value (zero), only
the BSS size is stored in the executable file.
Executable Formats
==================
* a.out (DJGPP a.out, Linux a.out)
* COFF (DJGPP COFF, coff-m68k, sh-coff)
* Win32 PE COFF
* ELF
ELF (Executable and Linking Format)
===================================
- ELF files: object files, shared libraries, executables
- A loader for ELF executables should find the program headers and load the
SEGMENTS they describe, not the sections.
- Each program header has two different size values: size-on-disk (file size)
and size-in-memory (memory size). If the memory size is greater than the file
size, the segment contains the BSS, and the extra memory should be zeroed.
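A rough sketch of the file size / memory size rule, assuming a little-endian
ELF64 program header (Elf64_Phdr). The function name load_segment is made up
for this example, and a real loader would map pages rather than build a bytes
object.

```python
import struct

# Elf64_Phdr, little-endian: p_type, p_flags, p_offset, p_vaddr, p_paddr,
# p_filesz, p_memsz, p_align (56 bytes in total).
PHDR64 = struct.Struct("<IIQQQQQQ")
PT_LOAD = 1


def load_segment(image, phdr):
    """Return (virtual address, memory contents) for one loadable segment."""
    (p_type, _flags, p_offset, p_vaddr,
     _paddr, p_filesz, p_memsz, _align) = PHDR64.unpack(phdr)
    if p_type != PT_LOAD:
        raise ValueError("not a loadable segment")
    if p_memsz < p_filesz:
        raise ValueError("memory size smaller than file size")
    data = image[p_offset:p_offset + p_filesz]
    # Bytes beyond the file size are the BSS: they take up memory but no
    # space in the file, so the loader supplies zeros.
    return p_vaddr, data + bytes(p_memsz - p_filesz)
```

Only the BSS size travels in the file; the zeros themselves are produced at
load time.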
Relocations
-----------
- If an ELF object has external references (library function calls, etc.),
the assembler cannot know the actual addresses of these functions when it
emits the calls.
- In this case the assembler produces relocations. Each relocation contains:
+ An index into the symbol table.
+ An offset into the .text section, which gives the address of the
operand of the call instruction.
+ A tag which indicates the type of relocation.
- The linker processes all relocations, finds the actual addresses, and
patches them into the operands of the call instructions.
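The patching step can be sketched as follows. This is a toy model: the Reloc
record, the kind names "ABS32" and "PC32", and apply_relocations are
illustrative assumptions, loosely modelled on 32-bit absolute and PC-relative
relocations with an explicit addend.

```python
import struct
from collections import namedtuple

# offset: where the operand sits in .text; sym_index: index into the symbol
# table; kind: relocation type tag; addend: constant added to the symbol.
Reloc = namedtuple("Reloc", "offset sym_index kind addend")


def apply_relocations(text, base, relocs, sym_addrs):
    """Patch every relocated operand in `text`, which is loaded at `base`."""
    for r in relocs:
        s = sym_addrs[r.sym_index]      # resolved address of the symbol
        p = base + r.offset             # address of the operand being patched
        if r.kind == "ABS32":
            value = s + r.addend
        elif r.kind == "PC32":
            value = s + r.addend - p
        else:
            raise ValueError("unknown relocation type: %r" % (r.kind,))
        struct.pack_into("<I", text, r.offset, value & 0xFFFFFFFF)
    return text


# call rel32 (opcode 0xE8) with a zero placeholder operand, followed by ret.
text = bytearray(b"\xe8\x00\x00\x00\x00\xc3")
apply_relocations(text, 0x1000, [Reloc(1, 0, "PC32", -4)], [0x2000])
```

The addend of -4 accounts for the call being relative to the end of its
4-byte operand, so the patched displacement here is 0x2000 - 0x1005 = 0xFFB.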
Shared Libraries
----------------
- ELF shared libraries resolve symbols and external references at run time.
* This is done with the help of the symbol table and the list of relocations
(i.e. linking is performed at run time).
- ELF shared libraries are position independent (you can load them more or
less anywhere in memory and they will work). This can be achieved in two
ways:
1. Using relocations. Relocations are generated even for symbols (global
variables / functions) local to the shared library. This results in a very
large number of relocations that must be processed during library load.
2. The library is compiled to look up its symbols through the GOT (Global
Offset Table) and the PLT (Procedure Linkage Table).
* The GOT is a table of pointers: one pointer for each global variable and
function (functions are handled differently, see below) used in the shared
library.
+ On library load only the GOT needs to be filled in (a single relocation
per global variable).
+ The start of the GOT is always pointed to by a machine register (%ebx on
the i386).
+ Benchmarks indicate that for most normal programs the drop in
performance is less than 3% in the worst case.
+ To achieve this, all shared library sources should be compiled with the
'-fPIC' flag.
* The PLT is an array of jump instructions, one for each function.
Thus if a particular function is called from thousands of locations within
the shared library, control always passes through one jump instruction.
+ The PLT takes the target addresses for its jump instructions from the GOT.
+ If an application linked with the shared library defines its own instance
of a function defined in the shared library, the GOT entry can be set to that
address, and the code of the shared library will use the redefined version
as well.
+ See "Lazy symbol binding" below.
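A toy model of the GOT/PLT indirection and of the redefinition case described
above. Nothing here is real loader code: the dictionary stands in for the GOT,
plt_hello for a PLT stub, and all names are made up for the example.

```python
def lib_hello():
    """The library's own definition of 'hello'."""
    return "hello from library"


# The "GOT": one slot per symbol, filled in when the library is loaded.
got = {"hello": lib_hello}


def plt_hello():
    """PLT-style stub: the single jump to 'hello', taken through the GOT."""
    return got["hello"]()


def lib_greet_twice():
    """Library code never names lib_hello directly; it goes via the stub."""
    return [plt_hello(), plt_hello()]


def app_hello():
    """An application-provided definition of the same symbol."""
    return "hello from application"
```

After got["hello"] = app_hello, lib_greet_twice() reaches the application's
version without any change to the library code, because only the one table
slot was rewritten.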
Lazy symbol binding
-------------------
- By default the .plt entries are initialized by the linker to point not to
the correct target functions but to the dynamic loader itself. Thus, the
first time you call any given function, the dynamic loader looks up the
function and fixes the entry in .plt.
+ Set 'LD_BIND_NOW=1' to avoid lazy binding.
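Lazy binding can be imitated in a few lines. This is only an analogy: SYMBOLS
stands in for the symbols of a loaded library, slots for the table the calls
go through, and the resolver closure for the dynamic loader. All names are
hypothetical.

```python
SYMBOLS = {"square": lambda x: x * x}   # stands in for a loaded library
lookups = []                            # records every run of the resolver


def make_lazy_slot(table, name):
    """Point the slot for `name` at a resolver instead of the real target."""
    def resolver(*args):
        lookups.append(name)
        target = SYMBOLS[name]          # the symbol lookup, done only once
        table[name] = target            # fix the slot; later calls skip us
        return target(*args)
    table[name] = resolver


slots = {}
make_lazy_slot(slots, "square")


def call(name, *args):
    """Every call goes through the slot, resolved or not."""
    return slots[name](*args)
```

The first call("square", 3) runs the resolver and patches the slot; the
second call goes straight to the target. Filling every slot up front would
correspond to LD_BIND_NOW=1.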
ELF Object
----------
- Header (Elf32_Ehdr in /usr/include/linux/elf.h, readelf -h object.o)
+ magic number
+ class (ELF32|ELF64), type (EXEC|REL|DYN), machine (X86-64|80386),
version (1)
    EXEC: executable file
    REL:  relocatable file
    DYN:  shared object file
+ entry point address (for EXEC and DYN; 0 for object files)
+ start, number, and size of program headers
+ start, number, and size of section headers
+ index of section (.shstrtab) containing section names
- table of section headers (Elf32_Shdr, readelf -S)
+ section offset in the file
+ address in virtual memory where this section should be loaded (if 0
the section will not be loaded to virtual memory)
- table of program headers (readelf -l)
+ distillation of the section headers table needed to load the appropriate
sections of the executable into virtual memory
+ Type, Offset, VirtAddr, PhysAddr, FileSiz, MemSiz, Flags, Align
- sections
+ objdump -d -j <section name> - Disassemble code sections
+ objdump -s -j <section name> - Show section content
- symbol table (can be partly stripped to reduce executable size)
    List of all symbols defined or referenced in the file:
        program entry points
        addresses of variables
    Format:
        symbol
        address associated with symbol
        tag indicating type of symbol
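The header fields listed at the top of this section can be decoded with a
short sketch like the one below. It follows the Elf32_Ehdr / Elf64_Ehdr field
order; the function name parse_ehdr and the returned dictionary keys are made
up for this example, and only the machine and type values mentioned in these
notes are mapped to names.

```python
import struct

ET = {1: "REL", 2: "EXEC", 3: "DYN"}    # e_type values
EM = {3: "80386", 62: "X86-64"}         # e_machine values


def parse_ehdr(data):
    """Decode the ELF file header from the first bytes of a file."""
    if data[:4] != b"\x7fELF":
        raise ValueError("bad magic number")
    ei_class, ei_data = data[4], data[5]
    endian = "<" if ei_data == 1 else ">"
    # Address-sized fields are 4 bytes in ELF32 (class 1), 8 in ELF64.
    addr = "I" if ei_class == 1 else "Q"
    fmt = endian + "HHI" + addr * 3 + "IHHHHHH"
    (e_type, e_machine, e_version, e_entry, e_phoff, e_shoff, _flags,
     _ehsize, e_phentsize, e_phnum, e_shentsize, e_shnum,
     e_shstrndx) = struct.unpack_from(fmt, data, 16)
    return {
        "class": "ELF32" if ei_class == 1 else "ELF64",
        "type": ET.get(e_type, e_type),
        "machine": EM.get(e_machine, e_machine),
        "version": e_version,
        "entry": e_entry,
        "phoff": e_phoff, "phnum": e_phnum, "phentsize": e_phentsize,
        "shoff": e_shoff, "shnum": e_shnum, "shentsize": e_shentsize,
        "shstrndx": e_shstrndx,
    }
```

Reading the first 64 bytes of a file and passing them to parse_ehdr should
give roughly the same information as readelf -h.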
Sections
--------
.shstrtab - List of section names
.interp   - The name of the dynamic loader
.dynamic  - Contains a distillation of the section headers needed
            by the dynamic loader to do its job (an optimization to save
            on parsing the actual headers)
.hash     - Hash table for navigating .dynsym
.dynsym   - Dynamic symbol table
.dynstr   - Dynamic string table (symbol names for .dynsym)
.rel*     - One or more relocation sections (readelf -r)
              offset (section offset in object files; virtual address in
                executables and shared libraries)
              type (R_X86_64_GLOB_DAT | R_X86_64_JUMP_SLOT)
              Symbol Value (could be 0 if not known)
              Symbol Name
              addend (constant added to the symbol value)
.got      - Global Offset Table
.plt      - Procedure Linkage Table
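A small sketch of how one entry of a .rela* section breaks down into the
fields readelf -r prints, assuming the little-endian Elf64_Rela layout
(r_offset, r_info, r_addend). The function name decode_rela is made up, and
only the two relocation types named above are mapped.

```python
import struct

# Elf64_Rela, little-endian: r_offset, r_info, r_addend (signed).
RELA64 = struct.Struct("<QQq")
R_TYPES = {6: "R_X86_64_GLOB_DAT", 7: "R_X86_64_JUMP_SLOT"}


def decode_rela(entry):
    """Split one 24-byte Elf64_Rela entry into its fields."""
    r_offset, r_info, r_addend = RELA64.unpack(entry)
    return {
        "offset": r_offset,
        "sym_index": r_info >> 32,          # index into the symbol table
        "type": R_TYPES.get(r_info & 0xFFFFFFFF, "other"),
        "addend": r_addend,
    }
```

The symbol name and value shown by readelf are not stored in the entry
itself; they are looked up in the symbol table through sym_index.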
Loading
=======
Loader:
- Locates the .text section within the executable, loads it into the
appropriate portions of virtual memory, and marks these pages as read-only.
- Locates the .data section in the executable and loads it into the user's
address space, this time in read-write memory.
- Finds the location and size of the .bss section from the image header,
and adds the appropriate pages of memory to the user's address space.
- If the application is linked to a shared library, the name of the dynamic
linker is obtained from the executable. The kernel then transfers control
to the dynamic linker, not to the application.
- The dynamic loader initializes itself, loads the shared libraries
into memory, resolves the remaining relocations, and then transfers control
to the application.
?
application sections
.bss
HEAP
...
MMAP
STACK
shared libraries (growing up)
kernel space (fixed size)
Tools
=====
nm      - list symbols from object files
ldd     - print shared library dependencies
readelf <opts> <obj|lib|app>
    -s          - symbol table
    -S          - list of section headers
    --segments  - list of program headers (segments), same as -l
objdump <opts> <obj|lib|app>
    --private-headers - print format-specific headers (for ELF, the
                        program headers and the dynamic section)
strings - print the printable strings found in a file
pmap <pid> - memory map of a process
strace  - traces the system calls an application makes
ltrace  - traces library function calls (the library-level counterpart of
          strace)
ipcs    - report interprocess communication facilities status