Toolset
=======
Tracers:
  Function: perf, strace/ltrace
  I/O: fatrace, lsof
Profilers:
  Function/Tree: perf, valgrind/kcachegrind (slow), google-perftools (low precision)
  Hardware Assisted: perf, oprofile (sysprof/CodeXL)
  Latency: perf
  Heap: valgrind/kcachegrind (slow), Google PerfTools (low precision)
Tracking of performance issues:
  General: latencytop
  Scheduling: kernelshark, perf, ps
  NUMA: numatop
  Memory Fragmentation: TotalView MemoryScape
Debugging:
  Debugger: SlickEdit + gdb, TotalView + ReplayEngine (!?)
  Memory Leak Detector: TotalView MemoryScape, valgrind (free), Purify (finds a lot)
  Static analyzer: ?
Monitoring:
  Memory: atop
  Disk I/O: iotop
  Network Usage: nethogs

Accounting & Monitoring
=======================
Standard top:
- atop - top with disk, network and memory activities, etc. Keeps finished processes. Multiple display modes (help with '?')
- htop - Colored version of top with per-processor load
- pidstat [-t 1] - like ps, but shows only recently active processes (or, if the memory view is requested, the processes with changed memory usage)
- iotop - top-style interface to disk I/O
- nethogs - processes consuming network bandwidth

Advanced per-process statistics:
- tiptop - top-style interface to hardware counters
- numatop - NUMA locality characterization (LMA/RMA - Local/Remote memory accesses, CPI - cycles per instruction)
- latencytop - top-style, lists reasons and maximum time for applications to block (sleep, select, ...). Globally and per-process
- powertop - top-style, power usage / wake-ups per second

Per-device statistics:
- sysstat/sar - collects and reports system activity information (a large number of characteristics like UDP packets per second, number of open inodes, etc.)
- vmstat [-Sm 1] - CPU, virtual memory, and disk (-d) usage
- dstat - CPU/disk/network usage statistics (similar to vmstat, but easier to interpret)
- mpstat [-P ALL 1] - CPU usage (per core, but no process information)
- slabtop - top-style interface to kernel memory (slab) allocations. May be useful to track I/O buffers, etc.
- iostat [-xmdz 1] - I/O statistics like block sizes, IOPS, etc.
- iftop - Shows per-connection network statistics (no process information)
- nicstat [1] - Per-interface network usage statistics

Per-protocol statistics:
- cifsiostat - SMB traffic
- nfsiostat - NFS traffic

Status:
- ss - shows socket status and the amount of data queued on each socket (Recv-Q/Send-Q)
- pcstat - reports which parts of a file are currently in the page (disk) cache
- pmr - measures pipe bandwidth (cat arc.tar | pmr | tar xf -)

System Configuration:
- dmidecode - Information about hardware components and BIOS
- ethtool [-i eth0] - Driver and firmware of a network interface

Tracers
=======
- strace - syscall trace (slower compared with perf)
- ltrace - function (symbol) call trace, along with parameters
- debugfs - pseudo-filesystem interface to ftrace/kprobes/uprobes, i.e. a fast and powerful tracing engine with limited support for triggers
  + trace-cmd - CLI to debugfs
  + kernelshark - GUI part of trace-cmd visualizing the events on CPU cores
- mutrace - mutex tracer
- lttng - Traces both kernel and user space (C + Java). Based on a large number of predefined static events. Allows implementing new tracepoints in user code (requires out-of-tree kernel modules).
- sysdig - Looks like an easy-to-use interface to system tracing and profiling, but crashes both my test systems (SuSE 13.1 and Tumbleweed). Based on an out-of-tree kernel module.

I/O Tracers:
- lsof - list of processes holding the specified file / device / socket
- ftop - top-style interface listing files opened by processes
- blktrace / blkparse - Tracing of block devices. Individual requests with information on the issuing process, CPU, request type, request size, etc.
  + btrace - a wrapper around blktrace / blkparse
- fatrace - traces file accesses on a block device (reports the accessing process)

Network Sniffers:
- ngrep - a good tool to debug protocols

Profilers
=========
- oprofile - Profiler based on hardware counters
  + AMD CodeXL - Can be hacked to work with Intel processors as well (but tricky due to incompatible hw counters)
  + sysprof - Very simple GUI to oprofile
- perf - kernel-level profiler supporting hw counters and tracing, providing basic instrumentation support (ftrace-based)
  + FlameGraph - Very nice visualization of call graphs
  + HeatMap - Visualization of latencies over time
- perf-tools - Helper scripts around perf and ftrace (https://github.com/brendangregg/perf-tools.git)
- msr-tools - Provides access to MSR registers, particularly allowing the configuration of hardware counters
- GNU (gprof / gcov) - Useless: problems with threaded apps, problems with dlopen...
- valgrind - CPU and heap profiling including cache & branch simulation, but very slow and, hence, mostly unusable as of 2013
  + kcachegrind - provides a graphical tree-style profiler (very convenient)
- Google Perf Tools - Low-precision CPU & heap profiler with visualization
- Intel VTune - Graphical hardware-based profiler with problematic installation (own driver incompatible with oprofile).

Instrumentation
===============
Generally, everything can be done with tracers and performance counters: we just record all traces along with timestamps and analyze them later. However, the kernel needs to pass the tracepoints to user space, and enabling many tracepoints leads to a significant performance penalty. Instead, it is desirable to filter events efficiently inside the kernel and produce in-kernel summaries or maps (for example, an I/O latency map). This is why we need instrumentation functionality.

- brandz - Framework for Solaris zones allowing execution of Linux apps, hence enabling DTrace usage for Linux apps.
- pin - Injects arbitrary code (written in C or C++) at arbitrary places in a user-space executable
- eBPF - Allows injecting byte-code (compiled from C with LLVM) into the kernel for event filtering and histogram (map) computation. Requires LLVM 3.7 and kernel 4.1+. User-space tooling is still missing.
- systemtap - The most powerful instrumentation system available for Linux. Based on scripts dynamically compiled into kernel modules. Hard to use, as even the provided samples do not always work.
- ktap - Systemtap-like, but based on a scripting language and byte-code (designed for embedded systems lacking gcc, etc.). Requires an out-of-tree module. I guess it will be killed by eBPF.

Debugging
=========
- gdb
- TotalView - Commercial leak detector (MemoryScape) and debugger with Replay Engine

Kernel debugging:
- kdb - built-in kernel debugger
- kgdb - remote kernel debugger (gdb based)
- ksymoops - converts an Oops into readable format

Memory Tools:
- valgrind - Leak detector + profiler
  + alleyoop - GNOME GUI to valgrind memcheck and helgrind.
- Google Perf Tools - Malloc debugger + profiler

Fuzzing
=======
- american fuzzy lop - http://lcamtuf.coredump.cx/afl/

Static Analyzers
================
- Rational suite - Problems in code (purify), coverage testing (purecov), performance (quantify). Only a (very) limited demo is available.
- Insure++ - Advanced commercial tool to find memory and other problems in code (no demo)
- Many free tools have appeared meanwhile

Benchmarks
==========
- perf bench - Microbenchmarks
- lmbench - Recommended microbenchmark, but not very easy to use
- bandwidth - Register, L1/L2/L3 and memory benchmark (threaded)
- ds-memcpy-bench - Darksoft, tests various memcpy methods
- fio - Storage: emulates various workloads, supports AIO, etc.
- pchar - Traceroute with measurement of bandwidth and latency at each hop
- netpipe - TCP/InfiniBand bandwidth/latency test

Symbols and Segments
====================
- pmap - process memory segment map
- pwdx - current working directory of a process
- nm - list of exported symbols
- ldd - list of linked shared libraries
- elflibviewer - dependency tree of a shared library (GUI)

Common Tools
============
- watch: Continuously re-runs a command and updates its output (watch who)
- pgrep: Finds processes by various properties
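
The pgrep entry above relies on process information the Linux kernel exports under /proc. The following is a minimal Python sketch, assuming a Linux /proc filesystem, of pgrep's default name match and of `pgrep -f`, which matches against the full command line instead. It is not the procps implementation; the function name `pgrep` and its `full` parameter are illustrative choices. Every numeric directory in /proc is a process, /proc/<pid>/comm holds the process name (truncated by the kernel to 15 characters), and /proc/<pid>/cmdline holds the NUL-separated arguments.

```python
import os
import re


def pgrep(pattern, full=False):
    """Return the sorted PIDs whose name (or, with full=True, command line) matches pattern.

    Sketch only: assumes Linux /proc; real pgrep supports many more criteria.
    """
    regex = re.compile(pattern)
    pids = []
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue  # skip non-process entries such as /proc/meminfo or /proc/self
        try:
            if full:
                with open(f"/proc/{entry}/cmdline", "rb") as f:
                    # Arguments are NUL-separated; join them with spaces like ps does.
                    text = f.read().replace(b"\0", b" ").decode(errors="replace").strip()
            else:
                with open(f"/proc/{entry}/comm") as f:
                    text = f.read().rstrip("\n")  # comm ends with a newline
        except OSError:
            continue  # the process exited during the scan or is not readable
        if regex.search(text):
            pids.append(int(entry))
    return sorted(pids)
```

For example, `pgrep(r"^sshd$")` returns the PIDs of processes named exactly `sshd`, while `pgrep("nginx", full=True)` roughly corresponds to `pgrep -f nginx`. Processes that disappear between listing /proc and reading their files are silently skipped, which is the usual way to handle this unavoidable race.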