Function: perf, strace/ltrace
Function/Tree: perf, valgrind/kcachegrind(slow), google-perftools (low precision)
Hardware Assisted: perf, oprofile (sysprof/CodeXL)
Heap: valgrind/kcachegrind(slow), Google PerfTools (low precision)
Tracking performance issues:
Scheduling: kernelshark, perf, ps
Memory Fragmentation: TotalView Memoryscape
Debugger: Slickedit + gdb, TotalView + ReplayEngine (!?)
Memory Leak Detector: TotalView Memoryscape, valgrind (free), Purify (finds a lot)
Network Usage: nethogs
Accounting & Monitoring
=======================
- atop - top with disk, network and memory activity, etc. Also keeps finished processes. Multiple display modes (help with '?')
- htop - Colored version of top with per-processor load
- pidstat [-t 1] - like ps, but shows only recently active processes (or, if the memory view is requested, the processes with changed memory usage)
- iotop - top style interface to disk I/O
- nethogs - processes consuming network bandwidth
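pidstat reads its per-process counters from /proc. The idea behind "shows only recently active processes" can be sketched in a few lines, assuming the standard Linux /proc/<pid>/stat layout (utime and stime are fields 14 and 15, in clock ticks): sample every process twice and keep only those whose CPU time grew. The function names are hypothetical, and unlike pidstat the sketch ignores threads (-t) and memory statistics.

```python
import os
import time


def read_cpu_ticks():
    """Return {pid: utime + stime} in clock ticks for every visible process."""
    ticks = {}
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        try:
            with open(f"/proc/{entry}/stat") as f:
                data = f.read()
        except OSError:
            continue  # the process exited or is not readable
        # comm (field 2) may contain spaces, so split only after the last ')'.
        # fields[0] is then "state" (field 3); utime/stime are fields 14/15.
        fields = data[data.rindex(")") + 2:].split()
        ticks[int(entry)] = int(fields[11]) + int(fields[12])
    return ticks


def active_since(before, after):
    """PIDs whose accumulated CPU time grew between two samples."""
    return sorted(pid for pid, t in after.items() if t > before.get(pid, 0))


if __name__ == "__main__":
    hz = os.sysconf("SC_CLK_TCK")  # clock ticks per second
    first = read_cpu_ticks()
    time.sleep(1)
    second = read_cpu_ticks()
    for pid in active_since(first, second):
        cpu = 100.0 * (second[pid] - first.get(pid, 0)) / hz  # 1 s interval
        print(f"{pid:>7} {cpu:6.1f}% CPU")
```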
Advanced per-process statistics:
- tiptop - top-style interface to hardware counters
- numatop - NUMA locality characterization (LMA/RMA - local/remote memory accesses, CPI - cycles per instruction)
- latencytop - top-style, lists reasons and maximum time for applications to block (sleep, select, ...). Globally and per-process
- powertop - top-style, power usage / wake-ups per second
Per-device statistics:
- sysstat/sar - collects and reports system activity information (large number of characteristics like udp-packets-per-second, number-of-open-inodes, etc.)
- vmstat [-Sm 1] - CPU, virtual memory, and disk (-d) usage
- dstat - CPU/Disk/Network usage statistics (similar to vmstat, but easier to interpret)
- mpstat [-P ALL 1] - CPU usage (by core, but no process information)
- slabtop - top-style interface to kernel memory (slab) allocations. May be useful to track I/O buffers, etc.
- iostat [-xmdz 1] - I/O statistics like request sizes, IOPS, etc.
- iftop - Shows per-connection network statistics (no process information)
- nicstat [1] - Per interface network usage statistics
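vmstat and mpstat derive CPU usage from the cumulative counters in /proc/stat; a utilization figure is just the difference between two samples. A rough sketch of that calculation, assuming the usual field order (user, nice, system, idle, iowait, irq, softirq, steal, guest, guest_nice) and counting idle + iowait as "not busy". The function names are made up for illustration.

```python
import time


def parse_proc_stat(text):
    """Map 'cpu', 'cpu0', 'cpu1', ... to their cumulative jiffy counters."""
    counters = {}
    for line in text.splitlines():
        parts = line.split()
        if parts and parts[0].startswith("cpu"):
            counters[parts[0]] = [int(v) for v in parts[1:]]
    return counters


def sample():
    with open("/proc/stat") as f:
        return parse_proc_stat(f.read())


def busy_percent(before, after):
    """Percentage of non-idle time between two samples of one cpu line."""
    # Use user..steal only: guest/guest_nice are already included in
    # user/nice, so summing them as well would count that time twice.
    b, a = before[:8], after[:8]
    total = sum(a) - sum(b)
    idle = sum(a[3:5]) - sum(b[3:5])  # idle + iowait
    return 0.0 if total <= 0 else 100.0 * (total - idle) / total


if __name__ == "__main__":
    first = sample()
    time.sleep(1)
    second = sample()
    for cpu, counters in second.items():
        if cpu in first:  # skip CPUs hot-plugged during the interval
            print(f"{cpu:>6} {busy_percent(first[cpu], counters):5.1f}% busy")
```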
Per-protocol statistics:
- cifsiostat - SMB traffic
- nfsiostat - NFS traffic
- ss - shows socket status and the amount of queued data (Recv-Q/Send-Q)
- pcstat - reports which parts of a file are currently in the page (disk) cache
- pmr - measures pipe bandwidth (cat arc.tar | pmr | tar xf -)
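The pmr pipeline works because the meter is just a filter that copies stdin to stdout while counting bytes. A minimal Python version of that idea (the module name pipemeter.py and its functions are hypothetical) that reports only an average rate at the end, not pmr's live display:

```python
import sys
import time


def relay(src, dst, chunk_size=1 << 16):
    """Copy src to dst in chunks; return (bytes copied, seconds elapsed)."""
    start = time.monotonic()
    total = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:  # EOF
            break
        dst.write(chunk)
        total += len(chunk)
    dst.flush()
    return total, time.monotonic() - start


def main():
    """Filter stdin to stdout and report the throughput on stderr."""
    copied, elapsed = relay(sys.stdin.buffer, sys.stdout.buffer)
    rate = copied / elapsed / 1e6 if elapsed > 0 else float("inf")
    print(f"{copied} bytes in {elapsed:.2f} s ({rate:.1f} MB/s)", file=sys.stderr)
```

Saved as pipemeter.py, it can take pmr's place: `cat arc.tar | python3 -c 'import pipemeter; pipemeter.main()' | tar xf -`. The report goes to stderr so it does not corrupt the data stream; the Python copy loop adds its own overhead, so very fast pipes may be under-reported.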
- dmidecode - Information about hardware components and BIOS
- ethtool [-i eth0] - Driver and firmware of network interface
- strace - syscall trace (slower than tracing syscalls with perf)
- ltrace - function (symbol) trace (along with parameters)
- debugfs - pseudo-filesystem interface (/sys/kernel/debug/tracing) to ftrace/kprobes/uprobes, i.e. a fast and powerful tracing engine with limited support for triggers
 + trace-cmd - CLI to debugfs
 + kernelshark - GUI part of trace-cmd, visualizing the events on CPU cores
- mutrace - mutex tracer
- lttng - Traces both kernel and user space (C + Java). Based on a large number of predefined static events. Allows adding new tracepoints to user code. Relies on out-of-tree kernel modules.
- sysdig - Looks like an easy-to-use interface to system tracing and profiling, but crashed both of my test systems (SuSE 13.1 and Tumbleweed). Based on an out-of-tree kernel module.
- lsof - list of processes holding the specified file / device / socket
- ftop - top-style interface listing the files opened by each process
- blktrace / blkparse - Traces block device I/O: individual requests with the issuing process, CPU, request type, request size, etc.
 + btrace - a wrapper around blktrace / blkparse
- fatrace - traces file accesses system-wide (reports the accessing process)
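On Linux, the core of what lsof does for open files can be approximated by walking the /proc/<pid>/fd symlinks. The sketch below (a hypothetical holders() helper) only checks file descriptors: unlike lsof it ignores memory-mapped files and current-directory references, and it silently skips processes it may not inspect, so run it as root for a complete answer.

```python
import os
import sys


def holders(path):
    """Return sorted PIDs that have an open file descriptor on `path`."""
    target = os.path.realpath(path)  # /proc/<pid>/fd links are resolved paths
    pids = []
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        fd_dir = f"/proc/{entry}/fd"
        try:
            fds = os.listdir(fd_dir)
        except OSError:
            continue  # no permission, or the process exited
        for fd in fds:
            try:
                if os.readlink(f"{fd_dir}/{fd}") == target:
                    pids.append(int(entry))
                    break  # one match per process is enough
            except OSError:
                continue  # the descriptor was closed meanwhile
    return sorted(pids)


if __name__ == "__main__":
    for arg in sys.argv[1:]:
        print(arg, holders(arg))
```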
- ngrep - a good tool to debug protocols
- oprofile - Profiler based on the hardware counters
+ AMD CodeXL - Can be hacked to work with Intel processors as well (but tricky due to incompatible hw counters)
+ sysprof - Very simple GUI to oprofile
- perf - kernel-level profiler supporting hw-counters; also a tracer providing basic instrumentation support (ftrace-based)
 + FlameGraph - Very nice visualization of call graphs
+ HeatMap - Visualization of latencies over time
- perf-tools - Helper scripts around perf and ftrace (https://github.com/brendangregg/perf-tools.git)
- msr-tools - Provides access to the MSR registers, particularly allowing configuration of hardware counters
- GNU (gprof / gcov) - Useless: problems with threaded apps, problems with dlopen...
- valgrind - CPU and heap profiling including cache & branch simulation, but very slow and hence mostly unusable as of 2013
 + kcachegrind - provides a graphical, tree-style profiler view (very convenient)
- Google Perf Tools - Low-precision CPU & heap profiler with visualization
- Intel VTune - Graphical hardware-based profiler with problematic installation (its own driver is incompatible with oprofile).
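FlameGraph's flamegraph.pl reads "folded" stacks: one line per distinct stack, frames listed root-first and joined by ';', followed by a space and the sample count. The repository's stackcollapse-perf.pl produces this from `perf script` output; the sketch below shows only the folding step, assuming the samples have already been parsed into leaf-first lists of function names (fold() is a hypothetical helper).

```python
from collections import Counter


def fold(samples):
    """Collapse stack samples into FlameGraph's folded-stack lines.

    Each sample is a list of frame names ordered leaf-first (innermost call
    first). Folded lines are root-first, joined with ';', followed by a
    space and the number of identical samples.
    """
    counts = Counter(";".join(reversed(stack)) for stack in samples)
    return [f"{stack} {n}" for stack, n in sorted(counts.items())]


if __name__ == "__main__":
    demo = [["memcpy", "parse", "main"], ["memcpy", "parse", "main"], ["write", "main"]]
    print("\n".join(fold(demo)))
```

Writing these lines to a file (say out.folded) and running `./flamegraph.pl out.folded > out.svg` should render the graph.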
Generally, everything can be done with tracers and performance counters: we just record all traces along with timestamps and analyze them later.
However, the kernel needs to pass the tracepoints to user space, and enabling many tracepoints leads to a significant performance
penalty. Instead, it is desirable to filter events efficiently in the kernel and produce in-kernel summaries or maps (for
example, an I/O latency map). This is why we need this instrumentation functionality.
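A log2 latency histogram is the typical in-kernel summary meant here: instead of shipping every event to user space, the instrumentation keeps only a handful of per-bucket counters, similar in shape to what systemtap's @hist_log prints. The Python sketch below only illustrates the bucketing arithmetic in user space, assuming latencies already collected as non-negative integers in microseconds; in systemtap or eBPF the same counting would run inside the kernel.

```python
from collections import Counter


def log2_histogram(latencies_us):
    """Count latencies per power-of-two bucket: bucket k holds [2**k, 2**(k+1)).

    Assumes non-negative values; 0 and 1 both land in bucket 0.
    """
    hist = Counter()
    for value in latencies_us:
        # bit_length() - 1 == floor(log2(value)) for integers >= 1
        hist[max(int(value).bit_length() - 1, 0)] += 1
    return dict(sorted(hist.items()))


def render(hist, width=40):
    """Print an ASCII bar chart of a non-empty histogram."""
    peak = max(hist.values())
    for k, n in hist.items():
        low = 0 if k == 0 else 2 ** k
        high = 2 ** (k + 1) - 1
        print(f"{low:>8} .. {high:<8} {n:>6} |{'@' * (width * n // peak)}")


if __name__ == "__main__":
    render(log2_histogram([3, 5, 6, 7, 12, 15, 40, 41, 300, 1200]))
```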
- brandz - Solaris zones framework allowing execution of Linux apps, hence enabling DTrace usage for Linux apps.
- pin - Inject arbitrary code (written in C or C++) at arbitrary places in the user-space executable
- eBPF - Allows injecting LLVM-compiled byte-code into the kernel for C-based event filtering and histogram (map) computation. Requires LLVM 3.7 and kernel 4.1+. User-space tooling is missing.
- systemtap - The most powerful instrumentation system available for Linux. Based on scripts dynamically compiled into kernel modules. Hard to use, as even the provided samples do not always work.
- ktap - A systemtap-like tool based on a scripting language and byte-code (designed for embedded systems lacking gcc, etc.). Requires an out-of-tree module. I guess it will be killed by eBPF.
- TotalView - Commercial leak detector (MemoryScape) and debugger with Replay Engine
- kdb - built-in kernel debugger
- kgdb - remote kernel debugger (gdb based)
- ksymoops - convert Oops into readable format
- valgrind - Leak detector + profiler
+ alleyoop - Gnome GUI to valgrind memcheck and helgrind.
- Google Perf Tools - Malloc debugger + profiler
- Rational suite - Problems in code (purify), coverage testing (purecov), performance (quantify). Only a (very) limited demo is available.
- Insure++ - Advanced commercial tool to find memory and other problems in the code (no demo)
- Many free tools have appeared meanwhile
- perf bench - Microbenchmarks
- lmbench - Recommended microbenchmark, but not very easy to use
- bandwidth - Register, L1/L2/L3 and memory benchmark (threaded)
- ds-memcpy-bench - Darksoft, tests various methods of memcpy
- fio - Storage: emulates various workloads, supports AIO, etc.
- pchar - Traceroute with measurement of bandwidth and latency at each hop
- netpipe - TCP/Infiniband bandwidth/latency test
- pmap - process memory segment map
- pwdx - current working directory of a process
- nm - list of exported symbols
- ldd - list of linked shared libraries
- elflibviewer - dependency tree of shared library (GUI)
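pmap essentially formats /proc/<pid>/maps. A rough equivalent of its default view sums the virtual size of every mapping per backing file (or pseudo-name such as [heap] and [stack]). It reports virtual sizes only, not resident memory; the helper name and the "[anon]" label for unnamed mappings are hypothetical.

```python
import sys
from collections import defaultdict


def mapping_sizes(pid="self"):
    """Sum the virtual size (KiB) of the mappings in /proc/<pid>/maps by name."""
    sizes = defaultdict(int)
    with open(f"/proc/{pid}/maps") as f:
        for line in f:
            # address perms offset dev inode [pathname]; pathname may contain spaces
            fields = line.split(maxsplit=5)
            start, end = (int(x, 16) for x in fields[0].split("-"))
            name = fields[5].strip() if len(fields) > 5 else "[anon]"
            sizes[name] += (end - start) // 1024
    return dict(sizes)


if __name__ == "__main__":
    pid = sys.argv[1] if len(sys.argv) > 1 else "self"
    for name, kib in sorted(mapping_sizes(pid).items(), key=lambda kv: -kv[1]):
        print(f"{kib:>10} KiB  {name}")
```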
- watch <cmd>: Continuously re-run a command and update its display (e.g. watch who)
- pgrep: find processes by various properties
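By default pgrep matches a pattern against the process name (the short comm name; matching the full command line needs -f). A hypothetical minimal Python version of that default behaviour, reading /proc/<pid>/comm:

```python
import os
import re
import sys


def pgrep(pattern):
    """Return sorted PIDs whose process name (/proc/<pid>/comm) matches `pattern`."""
    regex = re.compile(pattern)
    pids = []
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        try:
            with open(f"/proc/{entry}/comm") as f:
                comm = f.read().rstrip("\n")
        except OSError:
            continue  # the process exited meanwhile
        if regex.search(comm):
            pids.append(int(entry))
    return sorted(pids)


if __name__ == "__main__" and len(sys.argv) > 1:
    for pid in pgrep(sys.argv[1]):
        print(pid)
```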