/docs/MyDocs

To get this branch, use:
bzr branch http://darksoft.org/webbzr/docs/MyDocs
Toolset
=======
 Tracers: 
    Function: perf, strace/ltrace
    I/O: fatrace, lsof

 Profilers:
    Function/Tree: perf, valgrind/kcachegrind(slow), google-perftools (low precision)
    Hardware Assisted: perf, oprofile (sysprof/CodeXL)
    Latency: perf
    Heap: valgrind/kcachegrind(slow), Google PerfTools (low precision)

 Tracking of performance issues:
    General: latencytop
    Scheduling: kernelshark, perf, ps
    NUMA: numatop
    Memory Fragmentation: TotalView Memoryscape
    
 Debugging:
    Debugger: Slickedit + gdb, TotalView + ReplayEngine (!?)
    Memory Leak Detector: TotalView Memoryscape, valgrind (free), Purify (finds a lot)
    Static analyzer: ?

 Monitoring:
    Memory: atop
    Disk I/O: iotop
    Network Usage: nethogs


Accounting & Monitoring
=======================
 Standard top:
 - atop 		- top with disk, network and memory activities, etc. Keeps finished processes. Multiple display modes (help with '?')
 - htop 		- Colored version of top with per-processor load
 - pidstat [-t 1]	- like ps, but shows only recently active processes (or, if the memory view is requested, processes with changed memory usage); example below

 - iotop 		- top style interface to disk I/O
 - nethogs		- processes consuming network bandwidth
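
 Example (a minimal sketch; the 1-second interval is arbitrary):
    pidstat -t 1        # per-thread CPU usage, refreshed every second
    pidstat -r 1        # memory view: page faults and RSS of active processes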

 Advanced per-process statistics:
 - tiptop		- top-style interface to hardware counters
 - numatop		- NUMA locality characterization (LMA/RMA - Local/Remote memory accesses, CPU - cycles-per-instruction)
 - latencytop		- top-style, lists reasons and maximum time for applications to block (sleep, select, ...). Globally and per-process
 - powertop		- top-style, power usage / wake-ups per second

 Per-device statistics:
 - sysstat/sar		- collects and reports system activity information (large number of characteristics like udp-packets-per-second, number-of-open-inodes, etc.) 
 - vmstat [-Sm 1]	- CPU, virtual memory, and disk (-d) usage
 - dstat		- CPU/Disk/Network usage statistics (similar for vmstat, but easier to interpret)
 - mpstat [-P ALL 1]	- CPU usage (by core, but no process information) 
 - slabtop		- top-style interface to kernel memory (slab) allocations. May be useful to track I/O buffers, etc.
 - iostat [-xmdz 1]	- I/O statistics like block sizes, IOPS, etc.
 - iftop		- Shows per-connection network statistics (no process information)
 - nicstat [1]		- Per interface network usage statistics
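
 Example (typical invocations; flags match the bracketed hints above):
    sar -n DEV 1 5       # per-interface network counters, 5 samples at 1 s
    mpstat -P ALL 1      # per-core CPU utilization
    iostat -xmdz 1       # extended disk statistics in MB, idle devices skipped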

 Per-protocol statistics:
 - cifsiostat		- SMB traffic
 - nfsiostat		- NFS traffic

 Status:
 - ss			- shows the number of packets in socket queues and the socket status
 - pcstat		- reports which parts of a file are currently in the disk cache
 - pmr 			- measures pipe bandwidth (cat arc.tar | pmr | tar xf -)

 System Configuration:
 - dmidecode		- Information about hardware components and BIOS
 - ethtool [-i eth0] 	- Driver and firmware of network interface


Tracers
=======
 - strace		- syscall tracer (much slower than tracing syscalls with perf)
 - ltrace		- library-call (symbol) tracer, logs parameters as well
 - debugfs		- pseudo-filesystem interface to ftrace/kprobes/uprobes, i.e. a fast and powerful tracing engine with limited support of triggers (see the sketch after this list)
    + trace-cmd 	- CLI front-end to the debugfs tracing interface
    + kernelshark	- GUI part of trace-cmd, visualizes the events on CPU cores
 - mutrace		- mutex tracer
 - lttng		- Traces both kernel and user space (C + Java). Based on a large set of predefined static events. Allows implementing new tracepoints in user code (out-of-tree kernel modules).
 - sysdig		- Looks like an easy-to-use interface to system tracing and profiling, but crashes both of my test systems (SuSE 13.1 and Tumbleweed). Based on an out-of-tree kernel module.
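
 Minimal ftrace session through the debugfs interface (assumes the standard
 mount point and root privileges):
    cd /sys/kernel/debug/tracing
    echo function > current_tracer                  # trace all kernel function calls
    echo 1 > tracing_on; sleep 1; echo 0 > tracing_on
    head -n 50 trace                                # inspect the recorded events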
 
 I/O Tracers:
 - lsof			- list of processes holding the specified file / device / socket
 - ftop			- top-style interface listing files open by process
 - blktrace / blkparse	- Block-device tracing: individual requests with information on the issuing process, CPU, request type, request size, etc. (example below)
    + btrace		- wrapper around blktrace / blkparse
 - fatrace		- traces file accesses on a block device (reports the accessing process)
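
 Example (the device name is a placeholder; needs root):
    btrace /dev/sda      # live per-request block trace (blktrace | blkparse)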

 Network Sniffers:
 - ngrep		- a good tool to debug protocols
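
 Example (interface and filter are placeholders):
    ngrep -d any 'GET' tcp port 80    # show HTTP GET requests on any interface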

Profilers
==========
 - oprofile		- Profiler based on hardware counters
    + AMD CodeXL	- Can be hacked to work with Intel processors as well (but tricky due to incompatible hw counters)
    + sysprof		- Very simple GUI to oprofile
 - perf 		- kernel-level profiler supporting hardware counters, tracing, and basic instrumentation (ftrace-based); see the workflow after this list
    + FlameGraph	- Very nice visualization of the call graph
    + HeatMap		- Visualization of latencies over time
 - perf-tools		- Helper scripts around perf and ftrace (https://github.com/brendangregg/perf-tools.git)
 - msr-tools		- Provides access to the MSR registers, particularly allowing configuration of the hardware counters
 - GNU (gprof / gcov)	- Useless: problems with threaded apps, problems with dlopen...
 - valgrind		- CPU and heap profiling including cache & branch simulations, but very slow and, hence, mostly unusable as of 2013
    + kcachegrind 	- graphical tree-style profile viewer (very convenient)
 - Google Perf Tools	- Low-precision CPU & heap profiler with visualization
 - Intel VTune		- Graphical hardware-based profiler with problematic installation (its own driver is incompatible with oprofile).
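
 Example perf + FlameGraph workflow (./myapp is a placeholder; the stackcollapse
 and flamegraph scripts come from the FlameGraph repository and are assumed to
 be in PATH):
    perf record -g -- ./myapp          # sample with call stacks
    perf report                        # interactive call-tree browser
    perf script | stackcollapse-perf.pl | flamegraph.pl > profile.svg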
 

Instrumentation
===============
 Generally, everything can be done with tracers and performance counters: we just record all events along with timestamps and
 analyze them later. However, the kernel needs to pass the tracepoints to user space, and enabling many tracepoints leads to a
 significant performance penalty. Instead, it is desirable to filter the events at high speed and produce in-kernel summaries or
 maps (for example, an I/O latency map). This is why we need the instrumentation functionality.
 - brandz		- Framework for Solaris zones allowing execution of Linux apps, hence enabling DTrace usage for Linux applications.
 - pin 			- Injects arbitrary code (written in C or C++) at arbitrary places in a user-space executable
 - eBPF			- Allows injecting LLVM byte-code into the kernel for C-based event filtering and histogram (map) computation. Requires LLVM 3.7 and kernel 4.1+. User-space tooling is still missing.
 - systemtap		- The most powerful instrumentation system available for Linux. Based on scripts dynamically compiled into kernel modules. Hard to use, as even the provided samples do not always work (see the one-liner after this list).
 - ktap			- Like systemtap, but based on a scripting language and byte-code (designed for embedded systems lacking gcc, etc.). An out-of-tree module is required. I guess it will be killed by eBPF.
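
 A classic SystemTap one-liner (a sketch; on newer kernels the probe point is
 syscall.openat rather than syscall.open):
    stap -e 'probe syscall.open { printf("%s opened %s\n", execname(), filename) }'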

 

Debugging
=========
 - gdb
 - TotalView 		- Commercial leak detector (MemoryScape) and debugger with Replay Engine

 Kernel debugging:
 - kdb			- built-in kernel debugger
 - kgdb			- remote kernel debugger (gdb based)
 - ksymoops		- converts an Oops into readable format

 Memory Tools:
 - valgrind		- Leak detector + profiler (example below)
    + alleyoop 		- Gnome GUI to valgrind memcheck and helgrind.
 - Google Perf Tools	- Malloc debugger + profiler
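
 Examples (./myapp and the process name are placeholders):
    gdb -p $(pgrep -n myapp)            # attach gdb to the newest matching process
    valgrind --leak-check=full ./myapp  # memcheck leak report (default tool)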

Fuzzing
=======
 - american fuzzy lop 	- http://lcamtuf.coredump.cx/afl/
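
 Typical AFL workflow (file names and directories are placeholders):
    afl-gcc -o target target.c                        # instrumented build
    afl-fuzz -i testcases -o findings -- ./target @@  # "@@" is replaced by the input file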


Static Analyzers
================
 - Rational suite	- Problems in code (purify), coverage testing (purecov), performance (quantify). Only a (very) limited demo is available.
 - Insure++		- Advanced commercial tool to find memory and other problems in the code (no demo)
 - Many free tools have appeared in the meantime

Benchmarks
==========
 - perf bench		- Microbenchmarks
 - lmbench		- Recommended microbenchmark, but not very easy to use
 - bandwidth		- Register, L1/L2/L3 and memory benchmark (threaded)
 - ds-memcpy-bench	- Darksoft, tests various memcpy methods
 - fio			- Storage: emulates various workloads, supports AIO, etc.
 - pchar		- Traceroute with measurement of bandwidth and latency at each hop
 - netpipe 		- TCP/InfiniBand bandwidth/latency test
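
 Example fio invocation (sizes, runtime, and job count are arbitrary):
    fio --name=randread --ioengine=libaio --direct=1 --rw=randread \
        --bs=4k --size=1G --numjobs=4 --runtime=30 --group_reporting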

Symbols and Segments
====================
 - pmap			- process memory segment map
 - pwdx			- current working directory of a process
 - nm			- list of exported symbols
 - ldd			- list of linked shared libraries
 - elflibviewer		- dependency tree of a shared library (GUI)
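
 Examples (process and library paths are placeholders):
    pmap -x $(pgrep -n myapp)       # segment map with RSS and dirty pages
    nm -D /lib64/libc.so.6 | head   # exported dynamic symbols (path varies)
    ldd /bin/ls                     # shared libraries ls links against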

Common Tools
============
 - watch <cmd>: Continuously update command display (watch who)
 - pgrep: find processes by various properties
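
 Examples:
    watch -n1 'ss -s'    # redraw the socket summary every second
    pgrep -fl rsync      # match the full command line, print PID and name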