As you should know by now, collectl displays everything you've selected in brief format whenever it can, writing all the data for each sampling interval on a single line. This can get quite wide depending on how many subsystems you choose to monitor, and while you can certainly specify collectl -s+l to add lustre to the default subsystems, the resulting output is too wide for this tutorial. I'll therefore use other subsystems and switches in the examples to mix things up and show how many combinations are possible. In this first example, we see what happens when you run collectl on a lustre client and request cpu and memory data along with lustre.
$ collectl -scml
#<--------CPU--------><-----------Memory----------><-------Lustre Client------>
#cpu sys inter  ctxsw free buff cach inac slab  map  Reads KBRead Writes KBWrite
   0   0   101     26   3G   6M  29M  12M  43M  65M      0      0      0       0
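Brief-format output is easy to post-process because each sample is a single whitespace-separated line under a fixed column header. The following is a minimal Python sketch, not part of collectl: `parse_brief` is a hypothetical helper, and the header and sample strings are copied from the output above. It simply pairs each column name with its value.

```python
def parse_brief(header: str, sample: str) -> dict:
    """Pair collectl brief-format column names with one sample line's values."""
    names = header.lstrip("#").split()
    values = sample.split()
    if len(names) != len(values):
        raise ValueError(f"expected {len(names)} fields, got {len(values)}")
    # Plain integers become ints; scaled values such as '3G' stay as strings.
    return {n: int(v) if v.isdigit() else v for n, v in zip(names, values)}


# Column header and sample line copied from the client example above.
HEADER = "#cpu sys inter ctxsw free buff cach inac slab map Reads KBRead Writes KBWrite"
SAMPLE = "0 0 101 26 3G 6M 29M 12M 43M 65M 0 0 0 0"

row = parse_brief(HEADER, SAMPLE)
print(row["inter"], row["free"], row["KBRead"])
```

This only works as written when the column names are unique. Headers that repeat a name across subsystems would need the subsystem prefixed to each key.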
Collectl is quite smart about lustre: if you run essentially the same command on an OSS (this time with disk rather than memory data), it recognizes that too and changes what it shows accordingly, as you can see below.
$ collectl -scdl
#<--------CPU--------><-----------Disks-----------><--------Lustre OST------->
#cpu sys inter  ctxsw KBRead  Reads KBWrit Writes KBRead  Reads KBWrit Writes
   0   0   100     28      0      0      0      0      0      0      0      0

$ collectl -scl -oT
#        <--------CPU--------><--------Lustre OST-------><-------Lustre Client------>
#Time    cpu sys inter  ctxsw KBRead  Reads KBWrit Writes  Reads KBRead Writes KBWrite
14:35:32   0   0   103     24      0      0      0      0      0      0      0       0
14:35:33   0   0   123     53      0      0      0      0      0      0      0       0
And finally, don't forget that any of this data can be written to a file for continuous logging, played back later, and even converted to a format suitable for plotting. In fact, collectl is configured to monitor lustre by default when run as a daemon, so all you need to do to begin collecting the basic data shown above is run service collectl start.
CLIENTS
Let's start off by looking more closely at client data. As with all other collectl data, you can switch between summary and detail data by simply entering an uppercase L instead of a lowercase one, as I've done below. The actual content of what is displayed depends on whether you are on a client or an OSS (there is currently no detail data for an MDS). As you can see, for clients this data is broken down by filesystem, and just to make the display a little more interesting I decided to show the timestamps in milliseconds. Naturally, if there is only one filesystem, the data will match that displayed in summary mode.
$ collectl -sL -oTm
# LUSTRE CLIENT DETAIL
#             Filsys   Reads ReadKB  Writes WriteKB
15:10:06.009  spfs1        0      0       0       0
15:10:06.009  spfs2        0      0       0       0

$ collectl -sL --lustopts O -oTm
# LUSTRE CLIENT DETAIL
#             Filsys  Ost      Reads ReadKB  Writes WriteKB
15:19:17.007  spfs1   OST0000      0      0       0       0
15:19:17.007  spfs1   OST0001      0      0       0       0
15:19:17.007  spfs2   OST0000      0      0       0       0
15:19:17.007  spfs2   OST0001      0      0       0       0

$ collectl -sl --lustopts R -oD
#                 <-------------Lustre Client-------------->
#Date    Time      Reads KBRead Writes KBWrite   Hits Misses
20080319 15:20:12      0      0      0       0      0      0
20080319 15:20:13      0      0      0       0      0      0
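To relate the per-OST rows back to the per-filesystem view, the rows can be grouped by filesystem and summed. The Python below is only a sketch under the assumption that per-OST client counters add up to the filesystem totals. `totals_by_filesystem` is a hypothetical helper, and the input rows are the idle-system samples shown above.

```python
from collections import defaultdict

COLUMNS = ("Reads", "ReadKB", "Writes", "WriteKB")


def totals_by_filesystem(lines):
    """Sum per-OST client counters into one row per filesystem."""
    totals = defaultdict(lambda: dict.fromkeys(COLUMNS, 0))
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header lines
        # Fields: timestamp, filesystem, OST name, then the four counters.
        _time, filesys, _ost, *counters = line.split()
        for name, value in zip(COLUMNS, counters):
            totals[filesys][name] += int(value)
    return dict(totals)


# Per-OST rows copied from the --lustopts O example above.
DETAIL = """\
15:19:17.007 spfs1 OST0000 0 0 0 0
15:19:17.007 spfs1 OST0001 0 0 0 0
15:19:17.007 spfs2 OST0000 0 0 0 0
15:19:17.007 spfs2 OST0001 0 0 0 0
""".splitlines()

totals = totals_by_filesystem(DETAIL)
for filesys, row in totals.items():
    print(filesys, row)
```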
$ collectl -sl --lustopts R --verbose
# LUSTRE CLIENT SUMMARY: READAHEAD
# Reads ReadKB  Writes WriteKB  Pend  Hits Misses NotCon MisWin LckFal  Discrd ZFile ZerWin RA2Eof HitMax
      0      0       0       0     0     0      0      0      0      0      0      0      0      0      0
      0      0       0       0     0     0      0      0      0      0      0      0      0      0      0
$ collectl -sl --lustopts BM
# LUSTRE CLIENT SUMMARY: RPC-BUFFERS (pages) METADATA
#Rds  RdK   1P   2P   4P   8P  16P  32P  64P 128P 256P Wrts WrtK   1P   2P   4P   8P  16P  32P  64P 128P 256P
   0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0
# Reads ReadKB  Writes WriteKB  Open Close GAttr SAttr  Seek Fsynk DrtHit DrtMis
      0      0       0       0     0     0     0     0     0     0      0      0
Object Storage Server
As you may have guessed, it's also possible to display OST-level information on an OSS by simply using the uppercase subsystem specification, as you can see here. Just to be different, I've chosen a second form of the date/timestamp.
$ collectl -sL -od
# LUSTRE FILESYSTEM SINGLE OST STATISTICS
#               Ost             Read Ops  Read KB  Write Ops  Write KB
03/19 15:32:14  spfs1-OST0000          0        0          0         0
03/19 15:32:14  spfs2-OST0000          0        0          0         0
03/19 15:32:15  spfs1-OST0000          0        0          0         0
03/19 15:32:15  spfs2-OST0000          0        0          0         0
Metadata Server
As mentioned earlier, there is no detail data for an MDS nor are there any other types of data other than that which can be displayed in summary mode.
I want to close with an example of a problem with readahead in Lustre 1.4, because I know a lot of people may still not be convinced. It demonstrates the power of a tool like collectl that can show multiple types of data at once. Consider the following sample, which was collected while doing random 32KB reads of a large file. See anything wrong? Do you know why the network bandwidth is so much higher than the lustre client read rate? How long would it have taken to even realize there is a problem?
$ collectl -snl -oT
#        <----------Network----------><-------------Lustre Client-------------->
#Time    netKBi pkt-in  netKBo pkt-out  Reads KBRead Writes KBWrite   Hits Misses
08:14:18  41776  28310    1065   14786     50    200      0       0     30     20
08:14:19  38328  25987    1032   14078     62    248      0       0     35     19
08:14:20  44763  30337    1167   16114     58    232      0       0     30     20
08:14:21  43666  29596    1137   15632     46    184      0       0     30     16
08:14:22  33777  22905     891   12191     58    232      0       0     35     23

$ collectl -snl -oT
#        <----------Network----------><-------------Lustre Client-------------->
#Time    netKBi pkt-in  netKBo pkt-out  Reads KBRead Writes KBWrite   Hits Misses
08:21:22      0      3       0       3     59    236      0       0      0      0
08:21:23    317    335      58     298     91    364      0       0      0      0
08:21:24    442    457      77     388    107    428      0       0      0      0
08:21:25    442    457      77     383     97    388      0       0      0      0
08:21:26    432    446      75     373     89    356      0       0      0      0
Note: This readahead algorithm has changed with V1.6 and lustre no longer triggers readahead on the third page read but rather on the third sequential read.
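One way to put a number on the mismatch in the two samples above is to divide the network KB received by the KB the lustre client reports as read. The Python below is a rough sketch rather than anything collectl provides. `network_to_lustre_ratio` is a hypothetical helper, the (netKBi, KBRead) pairs are copied from the samples, and it assumes for illustration that nearly all inbound traffic on this client is lustre data.

```python
# (netKBi, KBRead) pairs copied from the two samples above.
FIRST = [(41776, 200), (38328, 248), (44763, 232), (43666, 184), (33777, 232)]
SECOND = [(0, 236), (317, 364), (442, 428), (442, 388), (432, 356)]


def network_to_lustre_ratio(samples):
    """Return total network KB received divided by total lustre KB read."""
    net_kb = sum(net for net, _ in samples)
    lustre_kb = sum(read for _, read in samples)
    return net_kb / lustre_kb


print(f"first sample:  {network_to_lustre_ratio(FIRST):.1f}x")
print(f"second sample: {network_to_lustre_ratio(SECOND):.1f}x")
```

On these figures the first sample pulls roughly 185 times more data over the network than the application actually reads, while the second comes out just under 1x.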
| updated Mar 25, 2010 |