I’ll be the first to admit that I’m an scp addict. It doesn’t matter what kind of data I’m working with; if it can be turned into an object that I can move around with scp, I’m in! One thing I’ve always noticed with scp is its dismal out-of-the-box performance. I read quite some time back on the OpenSSH mailing list that there are some fixed-size buffers inside OpenSSH that prevent copy operations from fully utilizing high-speed network links.
This past week my fellow blogging partner Mike sent me a link to the high performance ssh (HPN-SSH) project. This nifty project created a patch that replaces the fixed-length buffers with values determined at runtime. This is pretty fricking cool, and I hope they get their changes merged into the OpenSSH source code! The site that currently hosts the HPN-SSH code also has a good article on tuning your TCP/IP network stack. Good stuff!
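If you want to quantify the improvement for yourself, a quick way to measure raw ssh throughput is to push a fixed amount of data through the connection and time it. Here is a minimal sketch (remotehost is a placeholder for one of your own machines):

# push 1GB of zeroes through ssh and time the transfer
$ time dd if=/dev/zero bs=1M count=1024 | ssh remotehost 'cat > /dev/null'

Running this before and after applying the HPN-SSH patch should give you a rough feel for how much the larger buffers help on your links.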
I have been supporting postfix mail relays for several years. When you are tasked with ensuring that your SMTP relays work without issue, you need to stay on top of the logs and the current operational state of each server to prevent major problems from occurring. Problems can take the form of blacklisted relays, mail routing loops, spam attacks or people trying to exploit a known issue in the SMTP software. All of these items suck for a site that relies on mail to function.
To help me keep on top of the relays I support, I use a multi-faceted approach. I summarize the log files daily, create scripts to notify me of problems before they cause issues, and try to be proactive about my relays (as much as you can be). To summarize my logs I use a combination of custom developed scripts and the incredibly awesome pflogsumm Perl script. The output of the scripts I developed provides me with specific data, and pflogsumm produces a beautiful report that summarizes the overall operation of my Postfix SMTP relays:
$ zcat /var/log/maillog-20110710.gz | pflogsumm.pl
Grand Totals
------------
messages
19004 received
20824 delivered
696 forwarded
3 deferred (3 deferrals)
0 bounced
0 rejected (0%)
0 reject warnings
0 held
0 discarded (0%)
57945k bytes received
139101k bytes delivered
66 senders
28 sending hosts/domains
175 recipients
31 recipient hosts/domains
Per-Day Traffic Summary
date          received  delivered  deferred  bounced  rejected
--------------------------------------------------------------------
Jul  3 2011       2006       1875
Jul  4 2011       2212       1992         2
Jul  5 2011       2855       3568
Jul  6 2011       3212       3823
Jul  7 2011       3212       3962         1
Jul  8 2011       2581       2747
Jul  9 2011       2665       2654
Jul 10 2011        261        203
Per-Hour Traffic Daily Average
time          received  delivered  deferred  bounced  rejected
--------------------------------------------------------------------
0000-0100           78         51         0        0         0
0100-0200           74         51         0        0         0
0200-0300           99         87         0        0         0
0300-0400           73         52         0        0         0
0400-0500           73         53         0        0         0
0500-0600           76         68         0        0         0
0600-0700          110        129         0        0         0
0700-0800          106        134         0        0         0
0800-0900           84         70         0        0         0
0900-1000           92        103         0        0         0
1000-1100          103        106         0        0         0
1100-1200          102        103         0        0         0
1200-1300           95         93         0        0         0
1300-1400          103        116         0        0         0
1400-1500          108        124         0        0         0
1500-1600          112        142         0        0         0
1600-1700          114        129         0        0         0
1700-1800          111        125         0        0         0
1800-1900          135        176         0        0         0
1900-2000          144        194         0        0         0
2000-2100          134        224         0        0         0
2100-2200           88        101         0        0         0
2200-2300           82         85         0        0         0
2300-2400           82         88         0        0         0
Host/Domain Summary: Message Delivery
sent cnt    bytes  defers  avg dly  max dly  host/domain
--------  -------  ------  -------  -------  -----------
   10247   81716k       1    0.5 s    5.5 m  prefetch.net
.....
Host/Domain Summary: Messages Received
 msg cnt    bytes  host/domain
--------  -------  -----------
    3094   22088k  server1.prefetch.net
.....
Senders by message count
------------------------
2288 badperson1@prefetch.net
1630 badperson2@prefetch.net
.....
Recipients by message count
---------------------------
4415 donotspam1@prefetch.net
3046 donotspam2@prefetch.net
.....
Senders by message size
-----------------------
15786k badperson1@prefetch.net
13414k badperson2@prefetch.net
.....
Recipients by message size
--------------------------
29981k donotspam1@prefetch.net
25492k donotspam2@prefetch.net
.....
Messages with no size data
--------------------------
D07FA650 fubar@prefetch.net
.....
message deferral detail
-----------------------
smtp (total: 3)
2 25: Connection timed out
1 lost connection with prefetch.net[1.1.1.1...
message bounce detail (by relay): none
message reject detail: none
message reject warning detail: none
message hold detail: none
message discard detail: none
smtp delivery failures: none
Warnings
--------
smtpd (total: 14)
14 non-SMTP command from unknown[1.1.1.1]: To:
trivial-rewrite (total: 430)
430 database /etc/postfix/transport.db is older than source file /e...
Fatal Errors: none
Panics: none
Master daemon messages: none
I can’t tell you how many times I’ve run this script and seen something in the output that left me wondering WTF?!? Several times I’ve been able to use the output from this script to nip a problem in the bud before it became a major issue. The output is also handy for developing additional monitoring for your SMTP services. If you are using Postfix, you should definitely check out pflogsumm. You will be glad you did!!!
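If you want the report delivered to you instead of having to remember to run it, pflogsumm drops nicely into cron. Here is a minimal sketch, assuming pflogsumm.pl lives in /usr/local/bin and your logs rotate to maillog-YYYYMMDD.gz the way mine do above:

# /etc/cron.d/pflogsumm -- mail yesterday's Postfix summary at 6am
0 6 * * * root zcat /var/log/maillog-$(date -d yesterday +\%Y\%m\%d).gz | /usr/local/bin/pflogsumm.pl 2>&1 | mail -s "Postfix daily summary" postmaster

Note that the percent signs have to be escaped in a crontab entry, since cron treats unescaped % characters as newlines.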
Anyone who is running a modern operating system is most likely using TCP/IP to send and receive data. Modern TCP/IP stacks are somewhat complex and have a slew of tunables to control their behavior. The choice of when and when not to tune is not always clear cut, since the documentation and the advice of various network “experts” don’t always jibe.
When I’m looking into network-related performance problems, one of the first things I review is the netstat “-s” output:
$ netstat -s
Ip:
25030820 total packets received
269 with invalid addresses
0 forwarded
0 incoming packets discarded
21629075 incoming packets delivered
21110503 requests sent out
Icmp:
12814 ICMP messages received
0 input ICMP message failed.
ICMP input histogram:
destination unreachable: 2
echo requests: 12809
echo replies: 3
12834 ICMP messages sent
0 ICMP messages failed
ICMP output histogram:
destination unreachable: 22
echo request: 3
echo replies: 12809
IcmpMsg:
InType0: 3
InType3: 2
InType8: 12809
OutType0: 12809
OutType3: 22
OutType8: 3
Tcp:
138062 active connections openings
1440632 passive connection openings
7 failed connection attempts
2262 connection resets received
8 connections established
12225207 segments received
10785279 segments send out
10269 segments retransmited
0 bad segments received.
69115 resets sent
Udp:
553643 packets received
22 packets to unknown port received.
0 packet receive errors
6911684 packets sent
UdpLite:
TcpExt:
33773 invalid SYN cookies received
154132 TCP sockets finished time wait in fast timer
6 time wait sockets recycled by time stamp
72284 delayed acks sent
3 delayed acks further delayed because of locked socket
Quick ack mode was activated 269 times
3359 packets directly queued to recvmsg prequeue.
2592713 packets directly received from backlog
4021 packets directly received from prequeue
3557638 packets header predicted
1732 packets header predicted and directly queued to user
1939991 acknowledgments not containing data received
3179859 predicted acknowledgments
1631 times recovered from packet loss due to SACK data
Detected reordering 1034 times using FACK
Detected reordering 1007 times using SACK
Detected reordering 622 times using time stamp
1557 congestion windows fully recovered
4236 congestion windows partially recovered using Hoe heuristic
299 congestion windows recovered after partial ack
2 TCP data loss events
5 timeouts after SACK recovery
5 timeouts in loss state
2511 fast retransmits
2025 forward retransmits
88 retransmits in slow start
5518 other TCP timeouts
295 DSACKs sent for old packets
35 DSACKs sent for out of order packets
251 DSACKs received
25247 connections reset due to unexpected data
2248 connections reset due to early user close
6 connections aborted due to timeout
TCPSACKDiscard: 2707
TCPDSACKIgnoredOld: 65
TCPDSACKIgnoredNoUndo: 12
TCPSackShifted: 4176
TCPSackMerged: 2301
TCPSackShiftFallback: 98834
IpExt:
InMcastPkts: 2
OutMcastPkts: 3390453
InBcastPkts: 8837402
InOctets: 5156017179
OutOctets: 2509510134
InMcastOctets: 80
OutMcastOctets: 135618120
InBcastOctets: 2127986990
The netstat output contains a slew of data you can use to see how much data your host is processing, whether it is accepting and processing data efficiently, and whether the buffers that link the various layers (Ethernet -> IP -> TCP -> APP) are sized optimally.
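Since most of these counters are lifetime totals, I find it useful to take two snapshots a few seconds apart and diff them to see which counters are actively climbing. A rough sketch:

# snapshot the counters 10 seconds apart and show what changed
$ netstat -s > /tmp/ns.1; sleep 10; netstat -s > /tmp/ns.2; diff /tmp/ns.1 /tmp/ns.2

A counter that jumps by thousands in ten seconds is usually a much better lead than a big absolute number that may have accumulated over months of uptime.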
When I build new Linux machines via kickstart, I make sure my profile contains the ktune package. That is all the tuning I do to start, unless an application or database requires a specific setting (think large pages and SysV IPC settings for Oracle).
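Getting ktune into a profile is just a matter of adding it to the %packages section and making sure the service is enabled. A sketch of the relevant kickstart fragments (the @base group is just an example):

%packages
@base
ktune

%post
/sbin/chkconfig ktune on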
Once I’ve met with the application team and a business analyst, I like to pound the application with a representative benchmark and compare the system performance before and after the stress test. By comparing the before and after results I can see exactly where the system is choking (this is very rare), or whether the application needs to be modified to accommodate additional load. If the application is a standard TCP/IP-based application that talks HTTP, I’ll typically turn to siege and iperf to stress my applications and systems.
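Neither tool takes much to get going. The invocations below are just a sketch (the URL, hostname and durations are made up for illustration):

# hammer the web tier with 50 concurrent simulated users for 5 minutes
$ siege -c 50 -t 5M http://www.example.com/app/

# measure raw TCP throughput: run the first command on the server,
# the second on the client
$ iperf -s
$ iperf -c server.example.com -t 30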
If I notice data being dropped in one or more queues during load testing, I’ll fire up dropwatch to observe where in the TCP/IP stack the drops are occurring:
$ dropwatch -l kas
Initalizing kallsyms db
dropwatch> start
Enabling monitoring...
Kernel monitoring activated.
Issue Ctrl-C to stop monitoring
1 drops at netlink_sendskb+14d (0xffffffff813df30e)
1 drops at ip_rcv_finish+32e (0xffffffff813f0c93)
4 drops at ip_local_deliver+291 (0xffffffff813f12d7)
64 drops at unix_stream_recvmsg+44a (0xffffffff81440fb9)
32 drops at ip_local_deliver+291 (0xffffffff813f12d7)
23 drops at unix_stream_recvmsg+44a (0xffffffff81440fb9)
1 drops at ip_rcv_finish+32e (0xffffffff813f0c93)
4 drops at .brk.dmi_alloc+1e60bd47 (0xffffffffa045fd47)
2 drops at skb_queue_purge+60 (0xffffffff813b6542)
64 drops at unix_stream_recvmsg+44a (0xffffffff81440fb9)
This allows you to see whether data is being dropped at the link layer, the IP layer, the UDP/TCP layer or the application layer. If the drops are occurring somewhere in TCP/IP (i.e., inside the kernel), I will review the kernel documentation and source code to see what happens in the specific areas of the kernel listed in the dropwatch output, and then find the sysctl values that control the sizes of the buffers at that layer (some are dynamic, some are fixed).
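On Linux most of those buffer sizes are exposed through sysctl, so once dropwatch points me at a layer I will check the relevant knobs. A sketch of what that looks like for the core socket buffers (the value in the second command is purely illustrative, not a recommendation):

# view the current socket buffer limits
$ sysctl net.core.rmem_max net.core.wmem_max net.ipv4.tcp_rmem net.ipv4.tcp_wmem

# example: raise the maximum receive buffer to 16MB
$ sysctl -w net.core.rmem_max=16777216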
Tuning applications to perform optimally has filled dozens and dozens of books, and it’s a fine art that you learn from seeing problems erupt in the field. It also helps to know how to interpret all the values in the netstat output, and I cannot recommend TCP/IP Illustrated, Volume I, Volume II and Volume III enough! Everyone who runs an IP-connected system should be required to read them before they are allowed access to the system. :)
While debugging a problem last week I needed to quickly see if a version of BIND supported the edns-udp-size configuration directive. Since RHEL3, RHEL4 and RHEL5 ship with different releases of BIND, I needed to see when this directive was introduced (EDNS was introduced in BIND 9, and the directive appears to have been there from the start). While I could have read through a crap load of documentation and release notes, I decided an easier approach would be to run strings against all of the binaries and libraries that come with the bind package. Lo and behold, there it was:
$ strings named | egrep 'edns-udp-size.*;'
edns-udp-size 4096;
Both the RHEL4 and RHEL5 named binaries contained the configuration directive, though the RHEL3 binary did not (RHEL3 ships with BIND 8). The strings tool is a great utility for dumping the ASCII strings embedded in an ELF executable, and this simple one-liner (combined with find and xargs) saved me a whole lot of hassle.
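The find and xargs combination I mentioned looks roughly like this (the search paths are a sketch and will vary by release):

# check every named binary on the box for the directive
$ find /usr/sbin /usr/lib -type f -name 'named*' 2>/dev/null | \
  xargs -I{} sh -c 'echo "{}:"; strings "{}" | egrep "edns-udp-size"'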
I’ve been toying with the idea of reading one RFC a week and developing cliff notes so I can remember everything I read down the road. While I can always recall WHERE I read something, sometimes I need to go back and read the section in the RFC a second time to verify the details. All of the major protocols have a slew of RFCs associated with them, and I’m thinking about starting with DNS, then moving on to SMTP, HTTP and NFS. While I’ve used solutions that rely heavily on these protocols, I’ve never read the RFCs from beginning to end. I’ve read large sections here and there to understand how an aspect of the protocol works, but never the entire thing. If you are interested in a set of cliff notes, please add your comments. It takes a lot of time to write up stuff on the blog, so I don’t want to waste my time if there isn’t sufficient demand. :)