/docs/MyDocs : revision 16

To get this branch, use:

bzr branch http://darksoft.org/webbzr/docs/MyDocs

Viewing changes to Analysis/profiling/perf.txt

Committer: Suren A. Chilingaryan
Date: 2015-08-21 03:52:00 UTC
Revision ID: csa@suren.me-20150821035200-xu1zh22cqlk2omcq

Profiling

files added:
Administration/Linux/system/storage/softraid/chunk_size.txt

Administration/Platforms/deb

Administration/Platforms/deb/key-management.txt

Administration/Platforms/portage

Administration/Platforms/portage/automake.txt

Administration/Platforms/portage/bugs.txt

Administration/Platforms/portage/catalyst.txt

Administration/Platforms/portage/crossdev.txt

Administration/Platforms/portage/gentoo.txt

Administration/Platforms/portage/packages.txt

Administration/Platforms/suse/problems.txt

Analysis

Analysis/Linux-analysis-tools-Im-Programmer.jpg

Analysis/accounting

Analysis/accounting/network.txt

Analysis/accounting/pstop.txt

Analysis/accounting/tiptop.txt

Analysis/accounting/traffic.mht

Analysis/analysis

Analysis/analysis/Runtime.errors.pdf

Analysis/analysis/rational.txt

Analysis/benchmark

Analysis/benchmark/caching.txt

Analysis/benchmark/cuda.txt

Analysis/benchmark/matlab.txt

Analysis/choosing-a-linux-tracer.html

Analysis/debugger

Analysis/debugger/gdb

Analysis/debugger/gdb/automation.txt

Analysis/debugger/gdb/expecting.txt

Analysis/debugger/gdb/flow.txt

Analysis/debugger/gdb/gdb.txt

Analysis/debugger/gdb/remote.txt

Analysis/debugger/gdb/signals.txt

Analysis/debugger/gdb/threads.txt

Analysis/debugger/gdb/ui.txt

Analysis/debugger/info.txt

Analysis/debugger/kernel

Analysis/debugger/kernel/using-gdb-for-debugging-kernel-modules.html

Analysis/debugger/malloc

Analysis/debugger/malloc/google.txt

Analysis/debugger/malloc/valgrind.txt

Analysis/debugger/totalview.txt

Analysis/gpu.txt

Analysis/instrumentation

Analysis/instrumentation/dtrace

Analysis/instrumentation/dtrace/dtrace.txt

Analysis/instrumentation/dtrace/linux_zones

Analysis/instrumentation/dtrace/linux_zones/dtrace_for_linux.html

Analysis/instrumentation/dtrace/linux_zones/dtrace_for_linux_files

Analysis/instrumentation/dtrace/linux_zones/dtrace_for_linux_files/feed-12x.gif

Analysis/instrumentation/dtrace/linux_zones/dtrace_for_linux_files/ga.js

Analysis/instrumentation/dtrace/linux_zones/dtrace_for_linux_files/metrics_group1.js

Analysis/instrumentation/dtrace/linux_zones/dtrace_for_linux_files/pacifica-custom.css

Analysis/instrumentation/dtrace/linux_zones/dtrace_for_linux_files/s_code_remote.js

Analysis/instrumentation/ebpf

Analysis/instrumentation/ebpf/ebpf.txt

Analysis/instrumentation/ebpf/samples.tar.bz2

Analysis/instrumentation/pin.txt

Analysis/instrumentation/systap

Analysis/instrumentation/systap/examples

Analysis/instrumentation/systap/examples/io.stp

Analysis/instrumentation/systap/examples/pcitool.stp

Analysis/instrumentation/systap/examples/socket_trace.stp

Analysis/instrumentation/systap/modules.txt

Analysis/instrumentation/systap/probes.txt

Analysis/list.txt

Analysis/practice

Analysis/practice/trace-readahead-with-database-io.html

Analysis/profiling

Analysis/profiling/gcc

Analysis/profiling/gcc/gcov.txt

Analysis/profiling/gcc/gnu.txt

Analysis/profiling/gcc/gprof.txt

Analysis/profiling/langs

Analysis/profiling/langs/python.txt

Analysis/profiling/oprofile.txt

Analysis/profiling/outdated

Analysis/profiling/outdated/_list_.txt

Analysis/profiling/outdated/codeanalyst.txt

Analysis/profiling/outdated/perfmon2.txt

Analysis/profiling/outdated/perfsuite.txt

Analysis/profiling/perf-tools.txt

Analysis/profiling/perf.html

Analysis/profiling/perf.txt

Analysis/profiling/techniques

Analysis/profiling/techniques/cache-misses-intel.html

Analysis/profiling/vtune.txt

Analysis/tools

Analysis/tools/segments.txt

Analysis/tracers

Analysis/tracers/blktrace.txt

Analysis/tracers/debugfs.txt

Analysis/tracers/net

Analysis/tracers/net/ngrep.txt

Development/languages/bash/sed.txt

Development/peformance

Development/peformance/memory.txt

files removed:
Administration/Linux/packages

Administration/Linux/packages/deb

Administration/Linux/packages/deb/key-management.txt

Administration/Linux/packages/portage

Administration/Linux/packages/portage/automake.txt

Administration/Linux/packages/portage/bugs.txt

Administration/Linux/packages/portage/catalyst.txt

Administration/Linux/packages/portage/crossdev.txt

Administration/Linux/packages/portage/gentoo.txt

Administration/Linux/packages/portage/packages.txt

Administration/Linux/system/accounting

Administration/Linux/system/accounting/bios.txt

Administration/Linux/system/accounting/network

Administration/Linux/system/accounting/network/info.txt

Administration/Linux/system/accounting/network/ngrep.txt

Administration/Linux/system/accounting/network/traffic.mht

Administration/Linux/system/accounting/network/traffic.txt

Administration/Linux/system/accounting/ps.txt

Administration/Linux/system/accounting/ulimit.txt

Development/debugging

Development/debugging/analysis

Development/debugging/analysis/rational.txt

Development/debugging/benchmarking

Development/debugging/benchmarking/caching.txt

Development/debugging/benchmarking/cuda.txt

Development/debugging/benchmarking/matlab.txt

Development/debugging/debuggers.txt

Development/debugging/gdb

Development/debugging/gdb/automation.txt

Development/debugging/gdb/expecting.txt

Development/debugging/gdb/flow.txt

Development/debugging/gdb/gdb.txt

Development/debugging/gdb/remote.txt

Development/debugging/gdb/signals.txt

Development/debugging/gdb/threads.txt

Development/debugging/gdb/ui.txt

Development/debugging/gpu.txt

Development/debugging/kernel

Development/debugging/kernel/using-gdb-for-debugging-kernel-modules.html

Development/debugging/profilers.txt

Development/debugging/profiling

Development/debugging/profiling/gcc

Development/debugging/profiling/gcc/gcov.txt

Development/debugging/profiling/gcc/gprof.txt

Development/debugging/profiling/google.txt

Development/debugging/profiling/hw-counters

Development/debugging/profiling/hw-counters/cache-misses-intel.html

Development/debugging/profiling/memory.txt

Development/debugging/profiling/oprofile.txt

Development/debugging/profiling/outdated

Development/debugging/profiling/outdated/_list_.txt

Development/debugging/profiling/outdated/codeanalyst.txt

Development/debugging/profiling/outdated/perfmon2.txt

Development/debugging/profiling/outdated/perfsuite.txt

Development/debugging/profiling/perf.txt

Development/debugging/profiling/python.txt

Development/debugging/profiling/valgrind.txt

Development/debugging/profiling/vtune.txt

Development/debugging/tracers

Development/debugging/tracers.txt

Development/debugging/tracers/blktrace.txt

Development/debugging/tracers/debugfs.txt

Development/debugging/tracers/dtrace

Development/debugging/tracers/dtrace/dtrace.txt

Development/debugging/tracers/dtrace/linux_zones

Development/debugging/tracers/dtrace/linux_zones/dtrace_for_linux.html

Development/debugging/tracers/dtrace/linux_zones/dtrace_for_linux_files

Development/debugging/tracers/dtrace/linux_zones/dtrace_for_linux_files/feed-12x.gif

Development/debugging/tracers/dtrace/linux_zones/dtrace_for_linux_files/ga.js

Development/debugging/tracers/dtrace/linux_zones/dtrace_for_linux_files/metrics_group1.js

Development/debugging/tracers/dtrace/linux_zones/dtrace_for_linux_files/pacifica-custom.css

Development/debugging/tracers/dtrace/linux_zones/dtrace_for_linux_files/s_code_remote.js

Development/debugging/tracers/systap

Development/debugging/tracers/systap/examples

Development/debugging/tracers/systap/examples/io.stp

Development/debugging/tracers/systap/modules.txt

Development/debugging/validating

Development/debugging/validating/Runtime.errors.pdf

Analysis/profiling/perf.txt

Events

======

perf list

Operation modes

===============

perf stat Collects statistics

perf record Record events

perf report/annotate Report the recorded events

perf diff Difference between 2 saved profiles

perf probe Dynamic tracepoints on kernel and user-space code

perf sched Scheduling profile

perf mem/kmem Memory access / kernel memory allocation

perf lock Spinlocks (requires CONFIG_LOCKDEP and CONFIG_LOCK_STAT, which are not enabled by default in SuSE)

perf trace Simple strace-style trace

perf timechart Visualization

perf inject, kvm, ...

Symbols / Callgraph

===================

- Compile everything with -g -gdwarf-2 -g3 -fno-omit-frame-pointer

* frame pointers are needed to resolve callgraph

* dwarf is a workaround for missing frame pointers using libunwind

* Haswell provides LBR (Last Branch Record) facility which can be used by recent perf to resolve callgraph (not tested)

- The kernel also needs frame pointers to resolve its callgraphs

* CONFIG_FRAME_POINTER=y (not set on SuSE)

- Sometimes, the following may help with resolution of symbols in kernel

echo 0 > /proc/sys/kernel/kptr_restrict

- Perf fails to find modules in /lib/modules/kernel-*/extra

* Temporarily move them into the /lib/modules/kernel-*/kernel/extra
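Before changing kptr_restrict it can help to check its current value. A minimal sketch; the function name kptr_status is hypothetical, and the optional path argument is an assumption added only so the check can be tried against an arbitrary file:

```shell
# Report whether kernel pointer restriction may hide kernel symbols from perf.
# The optional argument overrides the sysctl path (hypothetical convenience).
kptr_status() {
    file="${1:-/proc/sys/kernel/kptr_restrict}"
    if [ ! -r "$file" ]; then
        echo "unknown"
        return 0
    fi
    value=$(cat "$file")
    if [ "$value" = "0" ]; then
        echo "unrestricted"
    else
        echo "restricted ($value)"
    fi
}
```

Calling kptr_status without arguments reads the live sysctl file.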

Usage

=====

- Top-style

perf top (show the most time consuming functions)

perf top -p PID (just for selected process)

- Visual

perf timechart record -- sleep 1 (record system-wide statistics)

perf timechart (generate timeline in svg, best viewed using chromium)

- Basic statistics

perf stat -d ./mult_sse_debug

perf stat -d -p PID (specified process/thread, until Ctrl+C)

perf stat -d -p PID -- sleep 5 (specified process/thread, for 5 seconds)

perf stat -d -a (system wide)

- Profiling

perf record -F 1000 ./app (sampling at 1000 Hz)

perf record -F 100 --call-graph dwarf ./app (-g includes stack traces; with dwarf much more information is obtained)

perf record -F 100 --call-graph lbr ./app (a better approach for stack traces, but requires Haswell processor and 4.1+ kernel)

perf report (all samples separately)

perf script (better for script processing)

perf report -n (standard profile)

perf report -n --stdio (output to standard output)

perf annotate --source dma_ipe_stream_read (mixed with assembler, no other way)

- Hardware events

perf stat -e event1,event2 <app> (specific events)

perf stat -e r80a2 <app> (custom events where format is rUUEE, UU is umask and EE is event number)

perf record -e L1-dcache-load-misses -g <app> (record a callgraph each time an L1 cache miss is sampled)

perf record -e event -c 1000 -g <app> (catch only 1 event in 1000 to avoid performance penalty)
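The rUUEE format above can be assembled from the two hexadecimal numbers found in the CPU vendor's event tables. A small sketch; raw_event is a hypothetical helper name, and both arguments are assumed to be hexadecimal without the 0x prefix:

```shell
# Build a raw perf event name (rUUEE) from a unit mask and an event number.
# $1 = umask (hex), $2 = event number (hex); each is zero-padded to two digits.
raw_event() {
    printf 'r%02x%02x\n' "0x$1" "0x$2"
}
```

For example, raw_event 80 a2 prints r80a2, which can then be passed to perf stat -e.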

- System events

* In OpenSuSE many events are disabled by default, a tracing kernel is required

* perf tracing is much faster compared to strace (but still incurs a significant penalty unless limited)

perf stat -e 'syscalls:sys_enter_*' Count system calls (separate number for each of them)

perf stat -e 'block:*' -a Count block device I/O events

perf record -e 'syscalls:*' -a Will also provide some arguments for syscall

- Filter events based on the parameters

perf record -e 'syscalls:sys_enter_read' --filter 'count == 1 || count == 8' -a

- Create histograms (size of incoming read requests)

perf stat -e syscalls:sys_enter_read --filter 'count < 1024' -e syscalls:sys_enter_read --filter 'count >= 1024 && count < 1048576' -e syscalls:sys_enter_read --filter 'count >= 1048576' -a -- sleep 5
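Writing the bucket filters by hand gets error-prone with more than a few buckets. The sketch below prints one "-e EVENT --filter ..." pair per bucket so that every value falls into exactly one of them. The helper name bucket_filters is hypothetical; the boundaries are assumed to be given in ascending order:

```shell
# Print one "-e EVENT --filter '...'" pair per size bucket.
# Usage: bucket_filters EVENT FIELD BOUND1 [BOUND2 ...]   (bounds ascending)
bucket_filters() {
    if [ "$#" -lt 3 ]; then
        echo "usage: bucket_filters EVENT FIELD BOUND..." >&2
        return 1
    fi
    event=$1
    field=$2
    shift 2
    lower=""
    for upper in "$@"; do
        if [ -z "$lower" ]; then
            # First bucket: everything below the lowest boundary
            printf "%s\n" "-e $event --filter '$field < $upper'"
        else
            # Middle buckets: lower bound inclusive, upper bound exclusive
            printf "%s\n" "-e $event --filter '$field >= $lower && $field < $upper'"
        fi
        lower=$upper
    done
    # Last bucket: everything from the highest boundary upwards
    printf "%s\n" "-e $event --filter '$field >= $lower'"
}
```

The printed lines can be pasted into a perf stat command line, followed by -a -- sleep 5.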

- Memory (just executes record/report with appropriate set of parameters for memory profiling)

perf mem -t load record <application> (recording)

perf mem -t load report (show number of specified memory operations)

perf mem -t load report --sort mem

- Other

perf bench (microbenchmarks)

perf bench mem memcpy (memcpy performance, results are strangely inconsistent...)

Probes

======

- Instrument the kernel tcp_sendmsg() function (get all the calls with timestamps and callgraph if stack is enabled)

perf probe --add tcp_sendmsg

perf record -e probe:tcp_sendmsg -ag -- sleep 5

perf probe --del tcp_sendmsg

perf report

- Adding probes on invocation/return of kernel/module function and measuring latencies

perf probe --add pcidriver_ioctl_in=pcidriver_ioctl

perf probe --add pcidriver_ioctl_out=pcidriver_ioctl%return

perf record -p <pid> -e probe:pcidriver_ioctl_in -e probe:pcidriver_ioctl_out -- sleep 5

perf script | grep -v '#' | awk '{ gsub(/:/, "") } $5 ~ /ioctl_in/ { ts[$1, $2] = $4 } $5 ~ /ioctl_out/ { if (l = ts[$1, $2]) { printf "%.f %.f\n", $4 * 1000000, ($4 - l) * 1000000; ts[$1, $2] = 0 } }' > out.lat_us

perf probe --del pcidriver_ioctl_in

perf probe --del pcidriver_ioctl_out
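The resulting out.lat_us holds two columns: the timestamp and the latency, both in microseconds. A quick summary can be obtained without plotting. A minimal sketch; lat_summary is a hypothetical helper name, and it assumes the two-column format produced by the awk script above:

```shell
# Summarize a two-column "timestamp_us latency_us" stream read from stdin.
lat_summary() {
    awk '
        NF >= 2 {
            n++
            sum += $2
            if (n == 1 || $2 > max) max = $2    # track the largest latency seen
        }
        END {
            if (n == 0) { print "no samples"; exit }
            printf "count=%d mean_us=%.1f max_us=%d\n", n, sum / n, max
        }
    '
}
```

Typical use would be lat_summary < out.lat_us.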

- Inspecting variable values, here we just capture the kmem IDs and measure function execution time depending on them

perf probe -m pciDriver --add 'kmem_sync_in=pcidriver_kmem_sync ID=kmem_sync->handle.handle_id'

perf probe -m pciDriver --add 'kmem_sync_out=pcidriver_kmem_sync:37 ID=kmem_entry->id'

perf record -p 16390 -e probe:kmem_sync_in -e probe:kmem_sync_out -- sleep 5

....

- Instrumenting user-space functions

* Requires a recent kernel: 4.0 is reported to work, 3.11 is reported not to

- Various

perf probe -m <module> ... - Specify module to instrument (or full path to module)

perf probe -x <full_path_to_lib_or_binary> - Specify object to instrument

perf probe -l - List defined probes

perf probe -F - Show functions to instrument

perf probe -m pciDriver -L pcidriver_ioctl - Show lines to instrument

perf probe -m pciDriver -V pcidriver_ioctl:8 - Show variables available at the given position in the code

Scheduling

==========

perf sched record -p <pid> -- sleep 5 - Record scheduling information

perf sched latency --sort max - Report most significant latencies (how long a task waited for the scheduler to put it on a CPU)

perf sched map - Show the task placement (. - idle CPU, * - event on CPU (new task?), X# - a task running on CPU)

perf sched script - Show raw trace

perf sched replay - Emulates the workload (the apps will be visible in top, etc...)

Memory

======

perf kmem record -- sleep 1 - Record kernel memory allocation statistics

perf kmem stat --caller [--page|--slab] - Per-symbol statistics on usage of slab/page kernel allocator

perf kmem stat --alloc [...] - Show usage statistics of individual allocated memory blocks

perf mem record - Record memory access statistics

perf mem -t [load|store] report

Spinlocks

=========

perf lock info -t - current status

perf lock record -- sleep 1 - record

perf lock report - report

I/O

===

- Measure disk I/O latencies

perf record -e block:block_rq_issue -e block:block_rq_complete -a -- sleep 5

perf script | awk '{ gsub(/:/, "") } $5 ~ /issue/ { ts[$6, $10] = $4 } $5 ~ /complete/ { if (l = ts[$6, $9]) { printf "%.f %.f\n", $4 * 1000000, ($4 - l) * 1000000; ts[$6, $9] = 0 } }' > out.lat_us
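Disk latencies usually span several orders of magnitude, so a power-of-two histogram is often more telling than an average. A minimal sketch; lat_histogram is a hypothetical helper name, and it assumes the two-column "timestamp_us latency_us" format of out.lat_us:

```shell
# Print a power-of-two histogram of the latency column (second field) from stdin.
# Each output line "<N us: C" counts latencies below N and not below N/2.
lat_histogram() {
    awk '
        NF >= 2 {
            b = 1
            while (b <= $2) b *= 2      # smallest power of two above the latency
            hist[b]++
        }
        END {
            # Buckets above 2^30 us (about 18 minutes) are not printed
            for (b = 1; b <= 1073741824; b *= 2)
                if (b in hist) printf "<%d us: %d\n", b, hist[b]
        }
    '
}
```

Typical use would be lat_histogram < out.lat_us.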

Memory analysis

===============

perf stat -e task-clock,cycles,instructions,cache-references,cache-misses <app>

perf stat -e L1-dcache-loads,L1-dcache-load-misses,L1-dcache-stores L1 details
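The raw counters are easier to judge as ratios: instructions per cycle and the share of cache references that missed. A minimal sketch, assuming the four counter values are copied by hand from the perf stat output; derived_metrics is a hypothetical helper name:

```shell
# Usage: derived_metrics CYCLES INSTRUCTIONS CACHE_REFERENCES CACHE_MISSES
# Prints instructions per cycle and the cache miss ratio in percent.
derived_metrics() {
    awk -v cyc="$1" -v ins="$2" -v ref="$3" -v miss="$4" 'BEGIN {
        if (cyc <= 0 || ref <= 0) { print "invalid counters"; exit 1 }
        printf "ipc=%.2f miss_ratio=%.1f%%\n", ins / cyc, 100 * miss / ref
    }'
}
```

For example, derived_metrics 1000 2500 200 10 prints ipc=2.50 miss_ratio=5.0%.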

Various

=======

perf record -e sched:sched_process_exec -a Started processes

perf record -e syscalls:sys_enter_connect -a Created connection

perf record -e 'skb:consume_skb' -ag Who is doing socket I/O

FlameGraph / Heatmap

====================

- Very intuitive profiler charts which can be used instead of opreport output.

* git clone https://github.com/brendangregg/FlameGraph

* perf record ...

* perf script | ./stackcollapse-perf.pl > out.perf-folded

* ./flamegraph.pl out.perf-folded > perf-kernel.svg

* Use mozilla to open svg file
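The folded file is plain text, one line per unique stack: frames separated by semicolons, then a space and the sample count. That makes it easy to get a quick text ranking before opening the svg. A minimal sketch that sums samples per leaf function (the innermost frame); top_leaves is a hypothetical helper name:

```shell
# Rank leaf functions by sample count from folded stacks read on stdin.
# Input lines look like "frame1;frame2;leaf COUNT".
top_leaves() {
    awk '
        NF >= 2 {
            count = $NF
            stack = $0
            sub(/ [0-9]+$/, "", stack)          # drop the trailing sample count
            n = split(stack, frames, ";")
            self[frames[n]] += count            # credit samples to the leaf frame
        }
        END {
            for (f in self) printf "%d %s\n", self[f], f
        }
    ' | sort -rn
}
```

Typical use would be top_leaves < out.perf-folded | head.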

- Another tool to visualize latencies produced as explained in Probes

* git clone https://github.com/brendangregg/HeatMap.git

* perf record ...

* perf script | ....

* ./trace2heatmap.pl --unitstime=us --unitslat=us --maxlat=50 out.lat_us > out.svg

* Use mozilla to open svg file