 Executable files typically contain a file header at or near the start of the 
file. This header contains 'magic numbers' that identify the file type. Beyond 
this header, executable files are typically divided into SECTIONS. Each section
is characterized by name, permissions (RWX), size, file offset, and virtual 
address (VMA).
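
 As a minimal sketch of how magic numbers identify a file type, the Python
 snippet below checks the leading bytes of a file. The table and the function
 names (identify, identify_file) are illustrative assumptions, not part of any
 real tool; only two well-known signatures are listed.

```python
# Illustrative magic-number table; real tools such as file(1) know far more.
MAGIC_NUMBERS = [
    (b"\x7fELF", "ELF"),
    (b"MZ", "DOS MZ / Win32 PE"),  # PE files begin with a DOS "MZ" stub
]


def identify(header: bytes) -> str:
    """Return a format name based on the leading bytes of a file."""
    for magic, name in MAGIC_NUMBERS:
        if header.startswith(magic):
            return name
    return "unknown"


def identify_file(path: str) -> str:
    """Read the first bytes of a file and identify its format."""
    with open(path, "rb") as f:
        return identify(f.read(16))
```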

 Standard Sections
 -----------------
  .text or _TEXT 	R-X	CODE, const globals, large literals
  .rodata 		R--	const globals, large literals
  .data or _DATA	RW-	initialized globals and static locals
  .bss or _BSS		RW-	uninitialized globals and static locals

* Large literals are constant values too large to be handled conveniently 
  with immediate addressing, such as string literals and constant structures.
* The C language requires that the BSS be zeroed before main() is called. 
  Because every variable in the BSS has the same initial value (zero), only 
  the BSS size is stored in the executable file.

Executable Formats
==================
 * a.out (DJGPP a.out, Linux a.out)
 * COFF (DJGPP COFF, coff-m68k, sh-coff)
 * Win32 PE COFF
 * ELF

 ELF (Executable and Linking Format)
 ===
  - ELF files: object files, shared libraries, executables 
  - A loader for ELF executables should find and load the program headers
  (which describe SEGMENTS), not the section headers.
  - Each program header has two different size values: size-on-disk (file size)
  and size-in-memory (memory size). If the memory size is greater than the file
  size, the segment contains the BSS, and the extra memory should be 
  zeroed.
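
  The file-size / memory-size rule can be sketched as follows. This is a toy
  model, not a real loader: load_segment is a hypothetical helper, and the
  "memory" is just a Python bytearray rather than mapped pages.

```python
def load_segment(image: bytes, offset: int, filesz: int, memsz: int) -> bytearray:
    """Return the in-memory contents of one loadable segment.

    The first `filesz` bytes come from the file image; the remaining
    `memsz - filesz` bytes (the BSS part) are left as zeros.
    """
    if memsz < filesz:
        raise ValueError("memory size must not be smaller than file size")
    if offset + filesz > len(image):
        raise ValueError("segment extends past the end of the file image")
    memory = bytearray(memsz)  # bytearray(n) is zero-filled
    memory[:filesz] = image[offset:offset + filesz]
    return memory
```

  For example, a segment with a file size of 4 and a memory size of 10 yields
  4 bytes copied from the file followed by 6 zero bytes.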


  Relocations
  -----------
  - If an ELF object has external references (library function calls, etc.),
  the assembler cannot know the actual addresses of these functions when it
  emits the calls.
  - In this case the assembler emits relocations. Each relocation contains:
    + An index into the symbol table.
    + An offset into the .text section, which refers to the address of the 
    operand of the call instruction.
    + A tag which indicates what type of relocation is actually present.
  - The linker processes all relocations, finds the actual addresses, and
  patches them back into the operands of the call instructions.
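
  The patching step can be illustrated with a toy linker pass. Everything
  here is an assumption made for illustration: the Relocation tuple, the two
  tags "ABS32" and "PC32" (loosely modeled on the i386 absolute and
  PC-relative relocation types), the explicit addend field, and the
  addresses. Real linkers handle many more relocation types.

```python
import struct
from typing import NamedTuple


class Relocation(NamedTuple):
    offset: int  # offset into .text of the operand to patch
    symbol: int  # index into the symbol table
    kind: str    # toy tag: "ABS32" or "PC32"
    addend: int  # constant added to the symbol address


def apply_relocations(text, text_base, relocations, symbol_names, addresses):
    """Return a copy of `text` with every relocation patched in.

    text_base    -- address at which .text will be placed
    symbol_names -- symbol table, as a list of names
    addresses    -- final address of each symbol, keyed by name
    """
    patched = bytearray(text)
    for reloc in relocations:
        name = symbol_names[reloc.symbol]
        value = addresses[name] + reloc.addend
        if reloc.kind == "PC32":
            # PC-relative: subtract the address of the patched operand.
            value -= text_base + reloc.offset
        elif reloc.kind != "ABS32":
            raise ValueError(f"unknown relocation kind: {reloc.kind}")
        # Store as a 32-bit little-endian value (wraps negative offsets).
        struct.pack_into("<I", patched, reloc.offset, value & 0xFFFFFFFF)
    return bytes(patched)
```

  With an i386-style "call rel32" instruction (opcode 0xe8 followed by four
  operand bytes), the relocation offset points at the operand, and an addend
  of -4 accounts for the displacement being measured from the end of the
  instruction.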
    
  Shared Libraries
  ----------------
  - ELF shared libraries resolve symbols and external references at run time.
    * This is done with the help of the symbol table and the list of
    relocations (i.e. linking is performed at run time).

  - ELF shared libraries are position independent (this means that you can load
  them more or less anywhere in memory, and they will work). This can be 
  achieved in two ways.
    1. Using relocations. Relocations are generated even for symbols (global
    variables / functions) local to the shared library. This results in a
    very large number of relocations which must be performed during 
    library load.

    2. The library is compiled to look up all local symbols in the GOT (Global 
    Offset Table) and PLT (Procedure Linkage Table). 
    
    * The GOT is a table of pointers: one pointer for each global variable and
    function (functions are handled differently, see below) used in the shared 
    library.
     + On library load only the GOT needs to be filled in (a single 
     relocation per global variable).
     + The start of the GOT is always held in one of the machine registers
     (%ebx on the i386).
     + Benchmarks indicate that for most normal programs the drop in 
     performance is less than 3% in the worst case.
     + To achieve this, all shared library sources should be compiled with 
     the '-fPIC' flag.

    * The PLT is an array of jump instructions, one for each existing function.
    Thus if a particular function is called from thousands of locations within 
    the shared library, control will always pass through one jump instruction.
     + The PLT actually takes the addresses for its jump instructions from the
     GOT.
     + If an application linked with the shared library has its own instance
     of a function defined in the shared library, the appropriate address can
     be set in the GOT, and the code of the shared library will use this
     redefined version as well.
     + See also lazy symbol binding, below.

  Lazy symbol binding
  -------------------      
  - By default the linker initializes the .plt entries to point not to the
  correct target functions but to the dynamic loader itself. Thus, the first
  time you call any given function, the dynamic loader looks up the function
  and fixes up its entry in .plt.
    + Set 'LD_BIND_NOW=1' to avoid lazy binding.
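
  The lazy-binding mechanism can be simulated in a few lines. This is only a
  model of the idea: ToyDynamicLoader, plt_call and the bind_now argument are
  hypothetical names, a dictionary stands in for the table of slots, and
  Python callables stand in for function addresses.

```python
class ToyDynamicLoader:
    """Model of lazily bound call slots."""

    def __init__(self, symbols, bind_now=False):
        self.symbols = symbols  # name -> function (stands in for the symbol table)
        self.lookups = 0        # counts symbol lookups performed
        self.got = {}           # name -> current jump target
        for name in symbols:
            if bind_now:
                # Eager binding: resolve everything at load time.
                self.got[name] = self.resolve(name)
            else:
                # Lazy binding: the slot initially points back at the loader.
                self.got[name] = self._make_stub(name)

    def resolve(self, name):
        self.lookups += 1
        return self.symbols[name]

    def _make_stub(self, name):
        def stub(*args):
            target = self.resolve(name)
            self.got[name] = target  # fix up the slot for later calls
            return target(*args)
        return stub

    def plt_call(self, name, *args):
        """Every call to `name` passes through this single indirection."""
        return self.got[name](*args)
```

  In this model the first plt_call for a name triggers one lookup and patches
  the slot; later calls jump straight to the target. Passing bind_now=True
  plays the role of LD_BIND_NOW=1: all lookups happen up front.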

  ELF Object
  ----------
    - Header (Elf32_Ehdr in /usr/include/linux/elf.h, readelf -h object.o)
	+ magic number
	+ class (ELF32|ELF64), type (EXEC|REL|DYN), machine(X86-64|80386), 
	version (1)
	    EXEC: executable file
	    REL: relocatable file
	    DYN: shared object file
	+ Entry point address (for EXEC and DYN; 0 for object files)
	+ start, number, and size of program headers
	+ start, number, and size of section headers
	+ index of section (.shstrtab) containing section names
    - table of section headers (Elf32_Shdr, readelf -S)
	+ section offset in the file
	+ address in virtual memory where this section should be loaded (if 0
	the section will not be loaded to virtual memory)
    - table of program headers (readelf -l)
	+ distillation of the section headers table, needed to load the
	appropriate sections of the executable into virtual memory
	+ Type, Offset, VirtAddr, PhysAddr, FileSiz, MemSiz, Flags, Align 
	
    - sections 
	+ objdump -d -j <section name>	- Disassemble code sections
	+ objdump -s -j <section name>	- Show section content

    - symbol table (can be partly stripped to reduce executable size)
	List of all symbols defined or referenced in file:
	    program entry points
	    addresses of variables
	Format
	    symbol
	    address associated with symbol
	    tag indicating type of symbol
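
    The header fields listed above can be read directly from the first bytes
    of a file. The sketch below follows the standard ELF32/ELF64 header
    layout; parse_elf_header and the dictionary keys are illustrative names,
    and only the few type codes mentioned in these notes are decoded.

```python
import struct

ELF_TYPES = {1: "REL", 2: "EXEC", 3: "DYN"}


def parse_elf_header(data: bytes) -> dict:
    """Decode the main ELF header fields from the start of a file."""
    if len(data) < 16 or data[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    ei_class, ei_data = data[4], data[5]
    if ei_class not in (1, 2) or ei_data not in (1, 2):
        raise ValueError("unsupported ELF class or data encoding")
    endian = "<" if ei_data == 1 else ">"
    # Entry point and table offsets are 4 bytes in ELF32, 8 bytes in ELF64.
    fields = "HHIIIIIHHHHHH" if ei_class == 1 else "HHIQQQIHHHHHH"
    (e_type, e_machine, e_version, e_entry, e_phoff, e_shoff, e_flags,
     e_ehsize, e_phentsize, e_phnum, e_shentsize, e_shnum,
     e_shstrndx) = struct.unpack_from(endian + fields, data, 16)
    return {
        "class": "ELF32" if ei_class == 1 else "ELF64",
        "type": ELF_TYPES.get(e_type, "other"),
        "machine": e_machine,
        "version": e_version,
        "entry": e_entry,
        "phoff": e_phoff, "phnum": e_phnum, "phentsize": e_phentsize,
        "shoff": e_shoff, "shnum": e_shnum, "shentsize": e_shentsize,
        "shstrndx": e_shstrndx,
    }
```

    Calling parse_elf_header(open(path, "rb").read(64)) on an ELF file should
    report the same values as readelf -h, apart from formatting.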

  Sections
  --------
    .shstrtab		- List of section names
    .interp		- The name of dynamic loader
    .dynamic		- contains a distillation of the section headers needed
			by the dynamic loader to do its job (an optimization
			that saves parsing the actual headers)

    .hash		- Hash table for navigating .dynsym
    .dynsym		- Dynamic Symbol Table
    .dynstr		- Dynamic String Table
    
    .rel* -  One or more relocation sections (readelf -r)
	offset (virtual address in executables and shared libraries;
	section offset in object files)
	type (R_X86_64_GLOB_DAT | R_X86_64_JUMP_SLOT)
	Symbol Value (could be 0 if not known)
	Symbol Name
	addend (constant added to the symbol value)
    
	
    .got		- Global Offset Table
    .plt		- Procedure Linkage Table
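
    A single entry of a relocation section with addends can be decoded as
    sketched below. The layout assumed here is the 64-bit little-endian
    entry format (offset, info, addend; 24 bytes), where the info field
    packs the symbol index in its upper 32 bits and the type in its lower
    32 bits. decode_rela is a hypothetical helper, and only the two x86-64
    types named above are given names.

```python
import struct

RELOC_TYPES = {6: "R_X86_64_GLOB_DAT", 7: "R_X86_64_JUMP_SLOT"}


def decode_rela(entry: bytes) -> dict:
    """Decode one 24-byte, little-endian, 64-bit relocation entry."""
    r_offset, r_info, r_addend = struct.unpack("<QQq", entry)
    return {
        "offset": r_offset,
        "symbol_index": r_info >> 32,  # upper half: symbol table index
        "type": RELOC_TYPES.get(r_info & 0xFFFFFFFF, "other"),
        "addend": r_addend,            # signed constant
    }
```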

Loading
=======
 Loader:    
    - Locates the .text section within the executable, loads it into the 
    appropriate portions of virtual memory, and marks these pages as read-only.
    - Locates the .data section in the executable and loads it into the user's 
    address space, this time in read-write memory. 
    - Finds the location and size of the .bss section from the image header, 
    and adds the appropriate pages of memory to the user's address space. 
    - If the application is linked to a shared library, the name of the dynamic
    linker is obtained from the executable. The kernel then transfers control
    to the dynamic linker, not to the application.
    - The dynamic loader initializes itself, loads the shared libraries
    into memory, resolves the remaining relocations, and then transfers control
    to the application.


 ?
 application sections
 .bss
 HEAP
 ...
 MMAP
 STACK
 shared libraries (growing up)
 kernel space (fixed size)

Tools
=====
 nm 			- list symbols from object files
 ldd 			- shared library dependencies
 readelf <opts> <obj|lib|app>
    -s			- symbol table
    -S			- list of sections (obj)
    --segments		- list of program headers (segments), same as -l
    
 objdump <opts> <obj|lib|app>
    --private-headers	- Print elf headers
 strings		- prints all human-readable strings found in a file

 pmap <pid>		- memory map of application

 strace			- traces system calls application executes
 ltrace			- traces library function calls (library-level counterpart of strace)
 ipcs			- report interprocess communication facilities status