<h1>Ftrace: The hidden light switch</h1>
<p>August 13, 2014</p>
<p>This article was contributed by <a href="http://www.brendangregg.com">Brendan Gregg</a></p>
<p>You may think, as I did, that analyzing the Linux kernel is like
venturing through a dark dungeon: without the addition of advanced tracers
like SystemTap, there's much that can't be seen, and can only be inferred.
However, I've recently found hidden switches that turn on some bright
lights, strategically placed by Steven Rostedt and others since the 2.6.27
release. These are the ftrace profilers. I haven't even tried all the
switches yet, but I'm stunned at what I've seen so far, and I'm having to
rethink what I previously believed about Linux kernel performance
analysis.</p>
<p>Recently at Netflix (where I work), a Cassandra database was performing
poorly after a system upgrade, and disk I/O inflation (a massive
increase in the number of I/O operations submitted) was suspected. There
were a number of possible
causes for this: a worse cache-hit ratio, record-size inflation, readahead
inflation, other applications, even other asynchronous kernel tasks (file
system background scrubs). The question was: which one, and how do we fix
it?</p>
<h4 id="iosnoop">1. iosnoop</h4>
<p>I started with basic server health checks, and then my <a
href="http://www.brendangregg.com/blog/2014-07-16/iosnoop-for-linux.html"><tt>iosnoop</tt></a>
tool.
<tt>iosnoop</tt> is a shell script that uses the <tt>/sys/kernel/debug</tt>
ftrace
facilities, and is in my <a
href="https://github.com/brendangregg/perf-tools">perf-tools</a> collection
on GitHub, along with other ftrace hacks. These work on Netflix's existing
servers (which often run Linux 3.2 with security patches) without any
other addition to the system, and without requiring kernel debuginfo.
In this case, <tt>iosnoop</tt> was run both with
and without <tt>-Q</tt> to see the effect of queuing:</p>
<pre>
# <b>./iosnoop -ts</b>
STARTs           ENDs             COMM   PID    TYPE DEV     BLOCK       BYTES  LATms
13370264.614265  13370264.614844  java   8248   R    202,32  1431244248  45056   0.58
13370264.614269  13370264.614852  java   8248   R    202,32  1431244336  45056   0.58
13370264.614271  13370264.614857  java   8248   R    202,32  1431244424  45056   0.59
13370264.614273  13370264.614868  java   8248   R    202,32  1431244512  45056   0.59

# <b>./iosnoop -Qts</b>
STARTs           ENDs             COMM   PID    TYPE DEV     BLOCK       BYTES  LATms
13370410.927331  13370410.931182  java   8248   R    202,32  1596381840  45056   3.85
13370410.927332  13370410.931200  java   8248   R    202,32  1596381928  45056   3.87
13370410.927332  13370410.931215  java   8248   R    202,32  1596382016  45056   3.88
13370410.927332  13370410.931226  java   8248   R    202,32  1596382104  45056   3.89
</pre>
<p>I didn't see anything out of the ordinary: a higher disk I/O load was
causing higher queue times.</p>
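<p>Under the hood, <tt>iosnoop</tt> drives the tracefs files directly. As a
minimal sketch (not the tool's exact sequence), the block I/O tracepoints it
relies on can be enabled by hand like this:</p>
<pre>
# cd /sys/kernel/debug/tracing
# echo 1 > events/block/block_rq_issue/enable      # trace I/O issue...
# echo 1 > events/block/block_rq_complete/enable   # ...and completion
# cat trace_pipe                                   # stream events as text
# echo 0 > events/block/block_rq_issue/enable      # clean up
# echo 0 > events/block/block_rq_complete/enable
</pre>
<p>The script pairs issue and completion events in user space to compute
per-I/O latency; with <tt>-Q</tt>, it starts timing from
<tt>block_rq_insert</tt> instead, so queue time is included.</p>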
<p>The tools that follow are all from the same collection: they demonstrate
existing capabilities of ftrace, and how they are useful for solving real
problems.</p>
<h4 id="tpoint">2. tpoint</h4>
<p>To investigate these disk reads in more detail, I used <tt>tpoint</tt> to
trace the <tt>block:block_rq_insert</tt> tracepoint:</p>
<pre>
# <b>./tpoint -H block:block_rq_insert</b>
Tracing block:block_rq_insert. Ctrl-C to end.
#   TASK-PID    CPU#       TIMESTAMP   FUNCTION
    java-16035  [000]  13371565.253582: block_rq_insert: 202,16 WS 0 () 550505336 + 88 [java]
    java-16035  [000]  13371565.253582: block_rq_insert: 202,16 WS 0 () 550505424 + 56 [java]
    java-8248   [007]  13371565.278372: block_rq_insert: 202,32 R 0 () 660621368 + 88 [java]
    java-8248   [007]  13371565.278373: block_rq_insert: 202,32 R 0 () 660621456 + 88 [java]
    java-8248   [007]  13371565.278374: block_rq_insert: 202,32 R 0 () 660621544 + 24 [java]
    java-8249   [007]  13371565.311507: block_rq_insert: 202,32 R 0 () 660666416 + 88 [java]
</pre>
<p>I was checking for anything obviously odd, but the I/O details looked
normal. The <tt>-H</tt> option prints column headers.</p>
<p>Next, I traced the code path that led to this I/O by printing stack
traces (<tt>-s</tt>), to see if they contained an explanation. I also added an
in-kernel filter to match reads only (when the "<tt>rwbs</tt>"
flag field contains "<tt>R</tt>"):</p>
<pre>
# <b>./tpoint -s block:block_rq_insert 'rwbs ~ "*R*"' | head -1000</b>
Tracing block:block_rq_insert. Ctrl-C to end.
    java-8248  [005]  13370789.973826: block_rq_insert: 202,16 R 0 () 1431480000 + 8 [java]
    java-8248  [005]  13370789.973831: <stack trace>
 => blk_flush_plug_list
 => generic_make_request.part.50
 => generic_make_request
 => do_mpage_readpage
 => mpage_readpages
 => xfs_vm_readpages
 => __do_page_cache_readahead
 => do_sync_mmap_readahead.isra.24
 => handle_pte_fault
 => handle_mm_fault
    java-8248  [005]  13370789.973831: block_rq_insert: 202,16 R 0 () 1431480024 + 32 [java]
    java-8248  [005]  13370789.973836: <stack trace>
 => blk_flush_plug_list
 => generic_make_request.part.50
[...]
</pre>
<p>Great! The output is similar to the previous example, but with stack
traces beneath each tracepoint event. I limited the output using <tt>head</tt> as
it is verbose.</p>

<p><tt>tpoint</tt> is another ftrace-based tool. It's usually better to use
perf_events (the <tt>perf</tt> command) for this
particular use case, as it can handle multi-user access to performance data
and a higher event
rate, although it is more time-consuming to use. I just wanted to quickly
eyeball a few dozen stack traces for a given tracepoint.</p>
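<p>The in-kernel filter and per-event stack traces that <tt>tpoint</tt> sets
up map to ordinary tracefs writes. A hand-rolled sketch of the equivalent
(not the tool's exact steps):</p>
<pre>
# cd /sys/kernel/debug/tracing
# echo 'rwbs ~ "*R*"' > events/block/block_rq_insert/filter   # reads only
# echo 1 > options/stacktrace                                 # stack trace per event
# echo 1 > events/block/block_rq_insert/enable
# cat trace_pipe | head -1000
# echo 0 > events/block/block_rq_insert/enable                # clean up
# echo 0 > options/stacktrace
# echo 0 > events/block/block_rq_insert/filter                # clear the filter
</pre>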
<p>The stacks were mostly the same as the example above, which provided a
couple of leads: page faults and readahead. This Ubuntu system was using 2MB
direct-mapped pages, instead of 4KB like the old system. It also
had readahead set to 2048KB, instead of 128KB. Either of these
differences could be causing the inflation, although tuning readahead had
already been tested, and found to make no difference.</p>
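<p>As an aside, the readahead setting itself is easy to inspect and change
per block device; the device name and values below are illustrative only,
using the sizes from this investigation:</p>
<pre>
# blockdev --getra /dev/xvdb         # readahead in 512-byte sectors; 4096 = 2048KB
4096
# blockdev --setra 256 /dev/xvdb     # set it back to 128KB
# cat /sys/block/xvdb/queue/read_ahead_kb
128
</pre>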
<h4 id="funccount">3. funccount</h4>
<p>I wanted to understand that stack trace better, so I started by counting
calls using <tt>funccount</tt>, which uses ftrace function profiling. Starting
with the per-second rate of <tt>submit_bio()</tt>:</p>
<pre>
# <b>./funccount -i 1 submit_bio</b>
Tracing "submit_bio"... Ctrl-C to end.
[...]
</pre>
<p>This rate, about 28,000 calls per second, is on par with what the disks
were doing,
as seen from <tt>iostat</tt>. <tt>funccount</tt> is counting events in the
kernel for efficiency.</p>
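<p>The in-kernel counting comes from ftrace's function profiler. A minimal
sketch of the raw interface that a tool like <tt>funccount</tt> wraps:</p>
<pre>
# cd /sys/kernel/debug/tracing
# echo submit_bio > set_ftrace_filter    # profile only this function
# echo 1 > function_profile_enabled      # start counting (and timing)
# sleep 1
# echo 0 > function_profile_enabled      # stop
# cat trace_stat/function*               # per-CPU hit counts
# echo > set_ftrace_filter               # reset the filter
</pre>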
<p>Now checking the rate of <tt>filemap_fault()</tt>, which is closer in
the stack to the database:</p>
<pre>
# <b>./funccount -i 1 filemap_fault</b>
Tracing "filemap_fault"... Ctrl-C to end.
[...]
</pre>
<p>This is consistent with what we believed the application was requesting
from the filesystem. There is about a 10x inflation between these calls
and the issued disk I/O (as evidenced by the <tt>submit_bio()</tt> calls).</p>
<h4 id="funcslower">4. funcslower</h4>
<p>Just to confirm that the database is suffering latency caused by the stack
trace I was studying, I used <tt>funcslower</tt> (another ftrace-based
tool, which uses in-kernel timing and filtering for efficiency)
to measure <tt>filemap_fault()</tt> calls
taking longer than 1000 microseconds (1ms):</p>
<pre>
# <b>./funcslower -P filemap_fault 1000</b>
Tracing "filemap_fault" slower than 1000 us... Ctrl-C to end.
 0) java-8210    | ! 5133.499 us |  } /* filemap_fault */
 0) java-8258    | ! 1120.600 us |  } /* filemap_fault */
 0) java-8235    | ! 6526.470 us |  } /* filemap_fault */
 2) java-8245    | ! 1458.30 us  |  } /* filemap_fault */
</pre>
<p>These latencies look similar to those seen from disk I/O (with queue
time). I'm in the right area.</p>
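<p><tt>funcslower</tt> builds on the function graph tracer and its
<tt>tracing_thresh</tt> file, which makes the kernel discard events faster
than the threshold. A rough sketch of the simplest raw equivalent (my
assumption, not the script verbatim):</p>
<pre>
# cd /sys/kernel/debug/tracing
# echo 1000 > tracing_thresh                # only report functions slower than 1000 us
# echo filemap_fault > set_ftrace_filter    # time this function only
# echo function_graph > current_tracer
# cat trace_pipe
# echo nop > current_tracer                 # clean up
# echo 0 > tracing_thresh
# echo > set_ftrace_filter
</pre>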
<h4 id="funccount-again">5. funccount (again)</h4>
<p>I noticed that the stack has "readpage" calls and then
"readpage<b>s</b>". Tracing them both at the same time:</p>
<pre>
# <b>./funccount -i 1 '*mpage_readpage*'</b>
Tracing "*mpage_readpage*"... Ctrl-C to end.
[...]
do_mpage_readpage                        122930
[...]
do_mpage_readpage                        110344
[...]
</pre>
<p>Here's our inflation: <tt>mpage_readpages()</tt> is being called about
300 times per second, and then <tt>do_mpage_readpage()</tt> over 100k times
per second. This still looks like readahead, although we did try to adjust
readahead sizes as an
experiment, and it didn't make a difference.</p>
<h4 id="kprobe">6. kprobe</h4>
<p>Maybe our readahead tuning didn't take effect? I can check this using
dynamic tracing of kernel functions.
Starting with the above stack trace, I saw that
<tt>__do_page_cache_readahead()</tt>
has <tt>nr_to_read</tt> (number of pages to read) as an argument, which
comes from
the readahead setting. Using <tt>kprobe</tt>, an ftrace- and kprobes-based tool,
to dynamically trace this function and argument:</p>
<pre>
# <b>./kprobe -H 'p:do __do_page_cache_readahead nr_to_read=%cx'</b>
Tracing kprobe do. Ctrl-C to end.
#   TASK-PID    CPU#       TIMESTAMP   FUNCTION
    java-8714   [000]  13445354.703793: do: (__do_page_cache_readahead+0x0/0x180) nr_to_read=200
    java-8716   [002]  13445354.819645: do: (__do_page_cache_readahead+0x0/0x180) nr_to_read=200
    java-8734   [001]  13445354.820965: do: (__do_page_cache_readahead+0x0/0x180) nr_to_read=200
    java-8709   [000]  13445354.825280: do: (__do_page_cache_readahead+0x0/0x180) nr_to_read=200
</pre>
<p>I used <tt>-H</tt> to print the header, and <tt>p:</tt> to specify that we
will create a probe on function entry, which we'll call "do"
(that alias is optional). The rest of that line specifies the function and
an argument to record.
Without kernel debuginfo, I can't refer to the <tt>nr_to_read</tt>
symbol, so I need to use registers instead. I guessed <tt>%cx</tt>: if that
guess is right, then
our tuning hasn't taken hold, as 0x200 is 512 pages: the original 2048KB.</p>
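<p>The <tt>kprobe</tt> tool is a thin wrapper around the kernel's
<tt>kprobe_events</tt> interface; the probe definitions are passed through
unchanged. A hand-rolled sketch of both an entry probe and a return probe
(the return form appears again in section 8):</p>
<pre>
# cd /sys/kernel/debug/tracing
# echo 'p:do __do_page_cache_readahead nr_to_read=%cx' >> kprobe_events   # entry probe
# echo 'r:m max_sane_readahead $retval' >> kprobe_events                  # return probe
# echo 1 > events/kprobes/do/enable
# echo 1 > events/kprobes/m/enable
# cat trace_pipe
# echo 0 > events/kprobes/do/enable                                       # disable...
# echo 0 > events/kprobes/m/enable
# echo '-:do' >> kprobe_events                                            # ...and remove
# echo '-:m' >> kprobe_events
</pre>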
<h4 id="funcgraph">7. funcgraph</h4>
<p>To be sure, I read the code to see how this value is passed around, and
used <tt>funcgraph</tt> to illustrate it:</p>
<pre>
# <b>./funcgraph -P filemap_fault | head -1000</b>
 2)    java-8248   |               |  filemap_fault() {
 2)    java-8248   |   0.568 us    |    find_get_page();
 2)    java-8248   |               |    do_sync_mmap_readahead.isra.24() {
 2)    java-8248   |   0.160 us    |      max_sane_readahead();
 2)    java-8248   |               |      ra_submit() {
 2)    java-8248   |               |        __do_page_cache_readahead() {
 2)    java-8248   |               |          __page_cache_alloc() {
 2)    java-8248   |               |            alloc_pages_current() {
 2)    java-8248   |   0.228 us    |              interleave_nodes();
 2)    java-8248   |               |              alloc_page_interleave() {
 2)    java-8248   |               |                __alloc_pages_nodemask() {
 2)    java-8248   |   0.105 us    |                  next_zones_zonelist();
 2)    java-8248   |               |                  get_page_from_freelist() {
 2)    java-8248   |   0.093 us    |                    next_zones_zonelist();
 2)    java-8248   |   0.101 us    |                    zone_watermark_ok();
 2)    java-8248   |               |                    zone_statistics() {
 2)    java-8248   |   0.073 us    |                      __inc_zone_state();
 2)    java-8248   |   0.074 us    |                      __inc_zone_state();
 2)    java-8248   |   1.209 us    |                    }
 2)    java-8248   |   0.142 us    |                    prep_new_page();
 2)    java-8248   |   3.582 us    |                  }
 2)    java-8248   |   4.810 us    |                }
 2)    java-8248   |   0.094 us    |                inc_zone_page_state();
[...]
</pre>
<p><tt>funcgraph</tt> uses another ftrace feature: the function graph
profiler. It has moderate overhead, since it traces all kernel functions,
so I only use it for exploratory purposes like this one-off. The output
shows the code-flow in the kernel, and even has time deltas in
microseconds. It's the call to <tt>max_sane_readahead()</tt> that is
interesting, as that fetches the readahead value it wants to use.</p>
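<p>The function graph tracer is likewise available directly in tracefs. A
minimal sketch of what <tt>funcgraph</tt> automates, limiting the graph to
one entry point (which also keeps the overhead down):</p>
<pre>
# cd /sys/kernel/debug/tracing
# echo filemap_fault > set_graph_function    # graph this function and its children
# echo function_graph > current_tracer
# cat trace_pipe | head -100
# echo nop > current_tracer                  # clean up
# echo > set_graph_function
</pre>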
<h4 id="kprobe-again">8. kprobe (again)</h4>
<p>This time I'll trace the return of the <tt>max_sane_readahead()</tt>
function:</p>
<pre>
# <b>./kprobe 'r:m max_sane_readahead $retval'</b>
Tracing kprobe m. Ctrl-C to end.
    java-8700  [000]  13445377.393895: m: (do_sync_mmap_readahead.isra.24+0x62/0x9c <- \
 max_sane_readahead) arg1=200
    java-8723  [003]  13445377.396362: m: (do_sync_mmap_readahead.isra.24+0x62/0x9c <- \
 max_sane_readahead) arg1=200
    java-8701  [001]  13445377.398216: m: (do_sync_mmap_readahead.isra.24+0x62/0x9c <- \
 max_sane_readahead) arg1=200
    java-8738  [000]  13445377.399793: m: (do_sync_mmap_readahead.isra.24+0x62/0x9c <- \
 max_sane_readahead) arg1=200
    java-8728  [000]  13445377.408529: m: (do_sync_mmap_readahead.isra.24+0x62/0x9c <- \
 max_sane_readahead) arg1=200
</pre>
<p>This is also 0x200 pages: 2048KB, and this time I used the <tt>$retval</tt>
alias instead of guessing registers. So the tuning really did not take
effect.
Studying the kernel source, I saw that the readahead property was set by
a function called <tt>file_ra_state_init()</tt>. Under what circumstances is
<em>that</em> called, and how do I trigger it? ftrace/kprobes to the rescue
again:</p>
<pre>
# <b>./kprobe -s p:file_ra_state_init</b>
Tracing kprobe file_ra_state_init. Ctrl-C to end.
  kprobe-20331 [002]  13454836.914913: file_ra_state_init: (file_ra_state_init+0x0/0x30)
  kprobe-20331 [002]  13454836.914918: <stack trace>
 => nameidata_to_filp
[...]
 => system_call_fastpath
  kprobe-20332 [007]  13454836.915191: file_ra_state_init: (file_ra_state_init+0x0/0x30)
  kprobe-20332 [007]  13454836.915194: <stack trace>
 => nameidata_to_filp
[...]
</pre>
<p>This time I used <tt>-s</tt> to print stack traces, which showed that
this initialization
happens from the <tt>open()</tt> syscall. As I'd left Cassandra running
while tuning
readahead, it may not have reopened its files and run
<tt>file_ra_state_init()</tt>.
So I restarted Cassandra to see if the readahead tuning would then take
effect, and re-measured:</p>
<pre>
# <b>./kprobe 'r:m max_sane_readahead $retval'</b>
Tracing kprobe m. Ctrl-C to end.
    java-11918 [007]  13445663.126999: m: (ondemand_readahead+0x3b/0x230 <- \
 max_sane_readahead) arg1=80
    java-11918 [007]  13445663.128329: m: (ondemand_readahead+0x3b/0x230 <- \
 max_sane_readahead) arg1=80
    java-11918 [007]  13445663.129795: m: (ondemand_readahead+0x3b/0x230 <- \
 max_sane_readahead) arg1=80
    java-11918 [007]  13445663.131164: m: (ondemand_readahead+0x3b/0x230 <- \
 max_sane_readahead) arg1=80
[...]
</pre>
<p><tt>iostat</tt> showed a large drop in disk I/O, and the database latency
measurements were much better. This was simply a readahead change, where
the new Ubuntu instances defaulted to 2048KB. What had misled us
earlier was that tuning it had not appeared to make a difference, as the
setting wasn't taking effect.</p>
<p>Another tunable we checked was the I/O scheduler, and whether changing
it from "deadline" to "noop" took effect immediately. Using:</p>
<pre>
./funccount -i 1 'deadline*'
./funccount -i 1 'noop*'
</pre>
<p>gave us the answer: they showed the rate of the related
kernel functions, and that the tuning was indeed immediate.</p>
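<p>For completeness, the scheduler itself is switched through sysfs; as
before, <tt>xvdb</tt> is just an example device name:</p>
<pre>
# cat /sys/block/xvdb/queue/scheduler       # brackets mark the current scheduler
noop [deadline] cfq
# echo noop > /sys/block/xvdb/queue/scheduler
</pre>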
<h4 id="ftrace-and-perf-tools">ftrace and perf-tools</h4>
<p>All the tools I used here are from my
<a href="https://github.com/brendangregg/perf-tools">perf-tools</a>
collection, which are front-ends to ftrace and related tracers (kprobes,
tracepoints, and the function profiler). I've described some of them as
hacks, which they are, as they use creative workarounds for the lack of
some in-kernel features. For example, <tt>iosnoop</tt> reads both issue and
completion events in user space, and calculates the latencies there,
instead of doing that more efficiently in kernel context.</p>
<p>These tools are for older Linux systems without kernel debuginfo, and
show what Linux can do using only its built-in ftrace. There's even more to
ftrace that I didn't show here: profiling function average latency, tracing
wakeup events, function probe triggers, etc. There is also Steven Rostedt's
own front-end to ftrace, a multi-tool called <a
href="http://git.kernel.org/cgit/linux/kernel/git/rostedt/trace-cmd.git">trace-cmd</a>
(<a href="/Articles/410200/">covered on LWN</a> in 2010),
which can do more than my collection of smaller tools, and is also much
easier to use than operating on the <tt>/sys</tt> files directly.</p>
<p>To do even more, perf_events (which is also part of the Linux kernel
source) with kernel debuginfo lets me examine complex data structures, and
even trace kernel code by line number and watch local variables. What I still
can't do, however, is perform complex in-kernel aggregations until I add
other advanced tracers like <a
href="https://sourceware.org/systemtap/">SystemTap</a>, <a
href="/Articles/551314/">ktap</a>, or, coming up, <a
href="/Articles/603983/">eBPF</a>.</p>
<h4 id="future-work">Future Work</h4>
<p>Changes are on the way for the Linux kernel, as regular LWN
readers likely know. The capabilities
I need, to avoid using hacks, may be provided by eBPF, which may be merged
<a href="http://lwn.net/Articles/606089/">as soon as Linux 3.18</a>. If
that happens, I'll be happy to create new versions of these tools making
use of eBPF. They will likely look and feel the same, but their
implementation will be much more efficient and reliable. I'll also be able
to create many more tools without the fear of maintaining too many
hacks.</p>
<p>However, even if Linux does bring these capabilities in an upcoming
release, it'll be some time before I can really use them in production in
my work environment, depending on how quickly we can, or want to, move to the
latest kernel. So, while my hacks are temporary workarounds, they may be
useful for some time to come.</p>
<h4 id="conclusion">Conclusion</h4>
<p>We have a large number of Linux cloud instances to analyze at Netflix,
and some interesting and advanced performance issues to solve. While we've
been looking at using advanced tracers like SystemTap, I've also been
studying what's there already, including the ftrace profilers. The
tools shown in
this post use ftrace, so they work on our existing instances as-is,
without even installing kernel debuginfo.</p>
<p>In this particular example, a Cassandra database experienced a disk I/O
inflation issue caused by an increased readahead setting. I traced disk
I/O events and latencies, and how these were created in the kernel. This
included examining stack traces, counting function-call rates, measuring
slow function times, tracing call graphs, and dynamic tracing of function
calls and returns, with their arguments and return values.</p>
<p>I did all of this using ftrace, which has been in the Linux kernel for
years. I found the hidden light switches.</p>
<p>If you are curious about the inner workings of these ftrace tools, see
"<a href="http://lwn.net/Articles/370423/">Secrets of the Ftrace function
tracer</a>" by Steven Rostedt, "Debugging the kernel using Ftrace" <a
href="http://lwn.net/Articles/365835/">part 1</a> and <a
href="http://lwn.net/Articles/366796/">part 2</a>, and <a
href="http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/Documentation/trace/ftrace.txt">Documentation/trace/ftrace.txt</a>
in the kernel source. The most important lesson here isn't about my tools,
but that this level of tracing is even possible on existing Linux
kernels. And if eBPF is added, a <em>lot</em> more will be possible.</p>
<div class="CommentBox">
<p class="CommentTitle">Ftrace: The hidden light switch</p>
<p class="CommentPoster">Posted Aug 17, 2014 19:45 UTC (Sun) by <b>bernat</b> (subscriber, #51658)</p>
<div class="FormattedComment">
Unfortunately, unlike Ubuntu kernels, Debian kernels come with most useful tracers disabled. This seems to be due to this bug:<br>
<a href="https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=568025">https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=568025</a><br>
I have opened a bug report to ask for them to be enabled again:<br>
<a href="https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=758469">https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=758469</a><br>
Given that Debian does not ship debug symbols for all packages (something I hope to address in the future), this makes Debian a bit more unfriendly than Ubuntu with respect to all these performance tools.<br>
</div>
</div>
<div class="Comment">
635
<a name="CommAnchor609449"></a>
636
<div class="CommentBox">
637
<p class="CommentTitle">Ftrace: The hidden light switch</p>
638
<div class="CommentBody">
639
<p class="CommentPoster">Posted Aug 25, 2014 9:11 UTC (Mon) by <b>Shugyousha</b> (subscriber, #93672)
640
[<a href="/Articles/609449/">Link</a>]
642
<div class="FormattedComment">
643
A great article indeed.<br>
645
There is a lot of tracing infrastructure in the Kernel and a lot of tools for tracing available. It would be nice if we could get an(other?) article comparing the different tools and their capabilities. Candidates for such a comparison would be SystemTap, ktap, eBPF, the Kernel perf tool and maybe the newer sysdig (<a href="http://www.sysdig.org/">http://www.sysdig.org/</a>) for example.<br>
652
<div class="CommentReplyButton">
653
<form action="/Articles/609449/comment" method="post">
654
<input type="submit" value="Reply to this comment">
659
<div class="Comment">
661
<a name="CommAnchor611687"></a>
662
<div class="CommentBox">
663
<p class="CommentTitle">Ftrace: The hidden light switch</p>
664
<div class="CommentBody">
665
<p class="CommentPoster">Posted Sep 12, 2014 6:45 UTC (Fri) by <b>bgregg</b> (subscriber, #46639)
666
[<a href="/Articles/611687/">Link</a>]
668
<div class="FormattedComment">
669
I'd like to write a comparative article at some point, although it is challenging to compare them all, since there's so many factors to take into account. And I'd like to do more research first, such as using SystemTap's non-debuginfo features more (which are practically undocumented, but should be an important factor for real world usage), spend more time with LTTng, try out the ftrace function triggers, try writing a kernel module for tracing based on samples/kprobes/*.c (has no one done this? I can find zero examples other than the kernel source), etc.<br>
671
The best place for tracing system comparisons has, historically, been here on lwn.net, like the ktap vs eBPF article, and others by Jonathan Corbet. So, keep reading... :-)<br>
676
<div class="CommentReplyButton">
677
<form action="/Articles/611687/comment" method="post">
678
<input type="submit" value="Reply to this comment">
686
<a name="CommAnchor635324"></a>
687
<div class="CommentBox">
688
<p class="CommentTitle">Ftrace: The hidden light switch</p>
689
<div class="CommentBody">
690
<p class="CommentPoster">Posted Mar 3, 2015 9:24 UTC (Tue) by <b>ituralde</b> (guest, #101279)
691
[<a href="/Articles/635324/">Link</a>]
693
<div class="FormattedComment">
694
Is it possible to use perf-tools (or merely ftrace of perf_events) to profile functions of kernel modules? By modules I mean once that are inserted to a stable kernel.<br>
695
The Q might be silly, I am quite a novice in this matter.<br>
700
<div class="CommentReplyButton">
701
<form action="/Articles/635324/comment" method="post">
702
<input type="submit" value="Reply to this comment">
708
<td class="RightColumn">
709
<div id="azk93271_right_zone"></div>
712
</tr></table><!-- endpage -->
718
Copyright © 2014, Eklektix, Inc.<BR>
720
Comments and public postings are copyrighted by their creators.<br>
721
Linux is a registered trademark of Linus Torvalds<br>
b'\\ No newline at end of file'