From tim@olorin.elsevier.nl Mon Jul 22 09:39:42 1996
Received: from snowdon.elsevier.co.uk (snowdon [193.131.197.164]) by cadair.elsevier.co.uk (8.6.13/8.6.12) with ESMTP id JAA01705 for ; Mon, 22 Jul 1996 09:39:41 +0100
Received: from epprod.elsevier.co.uk (actually host epprod) by snowdon with SMTP (PP); Mon, 22 Jul 1996 09:40:53 +0100
Received: from ns.elsevier.nl (ns.elsevier.nl [145.36.5.1]) by epprod.elsevier.co.uk (8.6.13/8.6.12) with ESMTP id JAA05145 for ; Mon, 22 Jul 1996 09:39:07 +0100
Received: from olorin.elsevier.nl by ns.elsevier.nl with SMTP (PP); Mon, 22 Jul 1996 10:41:01 +0200
Received: (from tim@localhost) by olorin.elsevier.nl (8.7.5/8.7.3) id KAA08133; Mon, 22 Jul 1996 10:40:57 +0200 (MET DST)
From: Tim Rylance
Message-Id: <199607220840.KAA08133@olorin.elsevier.nl>
Subject: Re: PPP thruput on Solaris 2.5 (fwd)
To: ajr@cadair.elsevier.co.uk (Ade Rixon), bill@li.net (Bill Groppe)
Date: Mon, 22 Jul 1996 10:40:56 +0200 (MET DST)
X-Mailer: ELM [version 2.4 PL25]
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Status: OR

Here's a good description of the TCP performance problems fixed by the
recent patches.  An interesting read if you like that sort of thing
(I do).

Tim

Forwarded message:

From: bryan@panix.com (Bryan Althaus)
Newsgroups: comp.unix.solaris
Subject: Re: PPP thruput on Solaris 2.5
Date: 15 Jul 1996 21:57:27 -0400
Organization: North Star Pub, NYC
Lines: 305
Distribution: na
Message-ID: <4sesu7$79f@panix.com>
References: <4selb6$cjn@lyorn.mdd.comm.mot.com>

N-Duc Le (nduc@df15h.mdd.comm.mot.com) wrote:

: Hi,
: Some time ago, someone posted a problem regarding the thruput of PPP
: running on Solaris 2.5 (Sparc machine).  According to that person,
: thruput over PPP (via ftp) was much slower than expected.  I've been
: having a similar problem.  I am running PPP on Solaris 2.5 on a
: SPARC LX.  Every time I do an ftp, the average thruput is around
: 0.5 Kbytes/sec.
: The modems I am using at both ends are 28.8 kbit/sec = 3600
: bytes/sec = 3.6 Kbytes/sec; hence the average ftp thruput is 1/6 of
: the modem's maximum.  This is very bad.
:
: The person who posted this problem originally had carried out a lot
: of experiments and was able to fix it (increase the thruput of ftp).
: Please let me know if you have any suggestions, comments or ideas.
: Also I do not mind contacting the original person if I am given that
: person's email address.

This is a bug, BUGID: 1233827, which has been fixed in a patch to be
made available to everyone, with or without support contracts.  Look
for another post from Cathe Ray in the next week or so.  Below is her
last post stating a slight delay in the release, and the second post
describing the actual problem.

<<*>>

~From: cathe@beeblebrox.Eng.Sun.COM (Cathe A. Ray)
~Newsgroups: comp.unix.solaris,comp.sys.sun.admin,comp.infosystems.servers.unix
~Subject: Solaris TCP Retransmission patches - more info
~Date: 9 Jul 1996 05:13:39 GMT
Organization: Sun Microsystems, Inc.

I've gotten requests from several people for more information on the
availability of the retransmission patches.  The last posting had an
ETA of today (7/8) for the 2.5 and 2.5.1 versions, but they will be at
least a couple of days late.  We had to finish testing a late change
that affected HTTP traffic retransmissions.

This final fix has been tested by several customers and we are doing
the final crank-turn on the patches for 2.5 and 2.5.1.  They should be
available by the end of the week at the latest and I'll post again as
soon as they get pushed to the patch database.  Because of the nature
of the problem they will go to SunSolve and will be available to all,
not just those with support contracts.

The 2.4 version will still be at least 3 weeks away because it is part
of the Kernel Jumbo Patch (KJP).  The KJP contains fixes to all kernel
modules, not just TCP or IP, so the scale of testing is quite large.
I will post when it is available, but it probably won't be before 8/1.
If you didn't see the original posting which contained all the gory
details, I can repost the technical piece when I let you know about
the public release of the patches.

These will be the patch levels you will want to get when they are
pushed to the patch database:

BUGID: 1233827

SPARC:      2.4         2.5         2.5.1      module affected
      |-----------|-----------|-----------|-----------------|
      | 101945-42 | 103169-06 | 103630-01 | /kernel/drv/ip  |
      | 101945-42 | 103447-03 | 103582-01 | /kernel/drv/tcp |
      |-----------|-----------|-----------|-----------------|

X86:        2.4         2.5         2.5.1      module affected
      |-----------|-----------|-----------|-----------------|
      | 101946-36 | 103170-06 | 103631-01 | /kernel/drv/ip  |
      | 101946-36 | 103448-03 | 103581-01 | /kernel/drv/tcp |
      |-----------|-----------|-----------|-----------------|

PowerPC:    2.5.1      module affected
      |-----------|-----------------|
      | 103583-01 | /kernel/drv/ip  |
      | 103632-01 | /kernel/drv/tcp |
      |-----------|-----------------|

ETA for public availability of above patches:
   2.5 & 2.5.1 - July 12, 1996
   2.4         - August 1, 1996

<<*>>

~Newsgroups: comp.unix.solaris
~Subject: Announcing New TCP Performance Patch
~Date: 7 Jun 1996 23:36:21 GMT
Organization: Sun Microsystems, Inc.

Sun doesn't ordinarily announce patches when they're released.  But
we've just finished a series of TCP-related fixes and improvements,
and we want to make sure that the news gets out as quickly as possible
to the many people who can benefit from our work.

This patch announcement will be of interest mostly to folks who use
Sun workstations over "slow" links, like most dial-up lines.  Please
note, though, that you might benefit from the work we'll discuss here
even if you've never used one of our workstations directly.  (Many
companies who provide Internet access use Suns as part of the
communication path.  And the patches are for Suns running Solaris 2.4
and up.)
Also note:  This message is coming to you directly from the engineers
who did the work.  We wanted to get the information out to you right
away, but we really aren't trying to replace all the other Sun sources
of information you might have access to.  Please, don't send us lots
of detailed questions--we're not volunteering to answer them (or even
respond to many of the followups here).  We just really wanted to make
sure this message got out.  Thanks.

Cathe A. Ray
Manager, Internet Engineering


TCP Performance Improvements For Slow Network Links
===================================================

Our Sun team is responsible for basic network communications software.
We've been putting in a lot of work lately on improving the
performance of TCP over slow network links.  Now we're finished;
testing is complete; and the patches (for Solaris 2.4 and later) will
be available shortly.

We undertook the work in response to feedback from customers serving
WWW users over asynchronous PPP links.  Users of LANs and WANs built
on 10base-T and faster media never saw the problem behavior, which
actually affected FTP and other TCP-based applications as well.  With
the new patches in place, slow links will operate with roughly the
same efficiency as fast links.  Without the patches, efficiency of
very slow links could, under Solaris 2.5, sink to as low as 5 per cent
of the theoretical maximum.

In the following sections we will describe in detail what was wrong
and how we fixed it.  If you don't need to know all that, just check
the table below for the patch numbers.  They'll be available soon from
our usual patch sources.  We're confident that customers who have seen
the problem will now observe a remarkable improvement.  Others will
see no change.

HISTORY

Strangely, the decline in throughput was the result of several
improvements we made over the years to the TCP retransmission
algorithms and parameters.  Every change improved performance for
systems with fast links.
The cumulative effect for slow links was just the reverse; but almost
all our systems--and our customers'--were hooked up to fast links, and
the drawbacks went largely unnoticed.  That was the state of affairs
at the time 2.4 was released.

By the time 2.5 came out, async hookups to the Web had exploded.  We
had implemented another relatively minor TCP bug fix.  Customers with
fast links were better off.  The efficiency of slow links declined.
We quickly learned we had a problem.

We tracked down the inconsistencies and rewrote the code.  We've
redesigned the algorithm for good behavior across all supported
configurations.  We've added slow links and a wide mix of simulated
platforms to our test beds, and tested the fixes in both high-speed
and slow-speed networks.  The problem is resolved.  Excellence is a
moving target.

TECHNICAL DETAILS

Here are some technical details.  As you'll see, we've made it a
pretty frank discussion.  (Please be aware, though, that we do not
intend to spend much time debating our decisions here.)

The throughput troubles on slow lines result from an excessive rate of
retransmissions.  The rate, in turn, is caused by a mis-tuned adaptive
algorithm.  TCP packets are retransmitted if no response is received
before a timeout period has expired.  Our routines implement a variant
of the familiar Karn and Jacobson adaptive algorithms, which attempts
to predict an efficient timeout value based on the time it took
previous packets to complete a roundtrip.  Elapsed values are combined
into a smoothed average roundtrip time ("RTT") and variance.

The key elements in this calculation are the initial RTT value and the
subsequent RTTs factored in.  The changes we have made involve both of
these key areas.

INITIAL RTT VALUES

As an unintended result of several cumulative changes, the kernel
parameter "tcp_rexmit_interval_initial" was actually not being used.
In fact, all Internet Routing Entry (IRE) RTT values were being
initialized to 512 milliseconds.
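The adaptive estimator just described can be sketched roughly as
follows.  This is an illustration, not Solaris kernel code: the names
are invented here, and the gains (1/8 for the smoothed RTT, 1/4 for
the variance) and the formula RTO = SRTT + 4 * RTTVAR are the textbook
Jacobson values, which the actual implementation may not match
exactly.

```python
class RttEstimator:
    """Textbook Jacobson-style smoothed RTT / RTO estimator (sketch)."""

    def __init__(self, initial_rto=3.0):
        self.srtt = None        # smoothed round-trip time, seconds
        self.rttvar = 0.0       # smoothed mean deviation
        self.rto = initial_rto  # used until the first sample arrives

    def update(self, measured_rtt):
        """Fold one valid RTT sample in and return the new RTO."""
        if self.srtt is None:
            # First sample seeds the estimator.
            self.srtt = measured_rtt
            self.rttvar = measured_rtt / 2.0
        else:
            err = measured_rtt - self.srtt
            self.srtt += err / 8.0                         # gain 1/8
            self.rttvar += (abs(err) - self.rttvar) / 4.0  # gain 1/4
        self.rto = self.srtt + 4.0 * self.rttvar
        return self.rto
```

The point of the posting's fixes is visible in this sketch: if the
estimator is seeded with a wildly wrong value, or is starved of valid
samples, the computed RTO never converges on the link's real roundtrip
time.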
TCP was using that 512 ms figure as its initial setting.  For
connections which flow through a route with a roundtrip time less than
that (such as a LAN or WAN built on 10base-T) all was well.  When the
connection closed, the actual IRE RTT value was updated and the
predictive timeout value successfully adjusted.

For connections with an RTT greater than 512 ms, however, the timeout
would necessarily trip, and retransmissions occur.  If the actual time
differed sufficiently from the original estimated value, TCP was never
able to send a segment without one or more retransmissions.  A
realistic RTT for the route could never be established.  This scenario
is the beginning of the explanation of what has been happening on
several-hop Internet or asynchronous PPP links.

Our solution is to initialize all IRE RTTs to zero instead of 512 ms.
Any new connection for a route will now, when lookup discloses the
zero value, get the value of the "tcp_rexmit_interval_initial"
parameter instead.  (And it's been increased to 3 seconds.)  So in
most cases the adaptive algorithm will now be able to adjust timeout
values effectively.

RTO (RETRANSMIT TIMEOUT) ALGORITHM INTERACTION

Another factor contributing to packet congestion and retransmission
was a change to the RTO algorithm, introduced in a 2.4 Kernel Patch.
The intent was to make the behavior more "conservative"--that is, to
lower the risk of poor timeout values.  The effect on low-speed links
was unexpectedly contrary.

A key (and unintended) effect of the code change was that RTT data
from retransmitted packets was discarded.  This behavior, together
with the poor initial RTT values described earlier, meant that the
adaptive algorithm was deprived of the information needed to adjust
the RTO.  Our solution keeps the RTO RTT update conservative, but now
updates the RTO after no more than one receive window's worth of valid
RTTs.
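In outline, the fix described under INITIAL RTT VALUES amounts to the
following selection rule.  The function name and the use of
milliseconds are illustrative, not actual kernel code; only the
tunable's name and the 3-second default come from the posting.

```python
# New default for tcp_rexmit_interval_initial, per the posting.
TCP_REXMIT_INTERVAL_INITIAL_MS = 3000

def initial_rto_ms(ire_rtt_ms):
    """Choose a starting retransmit timeout for a new connection."""
    if ire_rtt_ms > 0:
        # A previous connection over this route left a measured RTT
        # in the IRE cache; start from that estimate.
        return ire_rtt_ms
    # Zero now means "no history for this route": fall back to the
    # tunable initial interval instead of the old hard-wired 512 ms.
    return TCP_REXMIT_INTERVAL_INITIAL_MS
```

Seeding with a generous 3 seconds, rather than an optimistic 512 ms,
means the first segment over a slow route is unlikely to be
retransmitted spuriously, so the estimator gets valid samples to adapt
from.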
Additionally, when an invalid RTT is seen--an ACK of a retransmitted
segment, for example--any valid RTT information already gathered is
fed into the RTO algorithm.

ZERO WINDOW PROBE BUG FIX

The problems described so far affect Solaris 2.4 and 2.5 equally.
What changed with 2.5?  One important fix we included in 2.5 was for
the "zero window probe" bug, a well-publicized problem affecting just
about all versions of UNIX.  As part of that rewrite, we removed a
nondescript piece of logic that implemented a simple "backoff" scheme.
The excised code caused the RTO to be lengthened by one-eighth as a
result of certain failures.  It seemed not to be needed; but it had
concealed the presence of the other bugs by providing a means for the
RTO to reach a successful value.  When this code was removed, the
other underlying problems were exposed.

IRE RTT LOGIC

This last part of the problem concerns the interaction between TCP and
the Solaris-specific Internet Routing Entries.  The IRE RTT logic
caches RTT values to be re-used when a new connection is made over a
familiar link.  This is a fine approach.  The implementation, however,
had a flaw: the IRE RTT was updated regardless of the RTT value
supplied by TCP.

As you will have guessed by now, users of high-speed links saw no
effect.  But on routes with highly variable RTTs, when a connection
dominated by small segments was closed, a problem could result.  An
RTT too short for large segments was used to update the IRE RTT, and a
subsequent connection dominated by large segments (like FTP)
experienced an excessive retransmission rate.  It was a different path
to a familiar dilemma: too small a timeout value.

Naturally the most highly variable RTTs tend to be seen on async PPP
links, where the RTT of the route is compounded from (1) wire latency,
(2) low bandwidth, and (3) congestion/queuing delays as more than one
segment is transmitted by TCP.

Our solution is to add a new ndd variable, "tcp_rtt_updates", which
allows tuning or disabling of IRE RTT updates.
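Based on the behavior the posting describes, the gate on IRE RTT
updates might look roughly like this.  The function and parameter
names are hypothetical; only the tunable's name, "tcp_rtt_updates",
comes from the posting.

```python
def may_update_ire_rtt(tcp_rtt_updates, rto_update_count):
    """Decide whether a closing connection may write its RTT back
    into the route's IRE cache (illustrative sketch only).

    tcp_rtt_updates:  value of the ndd tunable
    rto_update_count: how many times this connection's RTO was
                      updated with valid RTT samples
    """
    if tcp_rtt_updates == 0:
        return False  # IRE RTT caching disabled entirely
    # The estimator must have had enough chances to adapt before its
    # final RTT is considered trustworthy for the route cache.
    return rto_update_count >= tcp_rtt_updates
```

The effect is that a short-lived connection dominated by small
segments can no longer poison the cached route RTT that a later bulk
transfer (like FTP) will inherit.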
A value of zero disables IRE RTT updates.  A value greater than zero
specifies how many RTT updates to the RTO are required--that is, how
many chances the algorithm has had to adapt the timeout--before a
closing connection will be allowed to update the RTT in the IRE.

CONCLUSION

We've fished out, fixed, and explained some subtle flaws in our
adaptive retransmission algorithm.  We take the responsibility for
introducing them--and the credit, too, for practically every piece
was, by itself, a successful response to our customers' needs.

Better and more exhaustive testing would have shown up the flaws
earlier, privately, harmlessly.  That's always our goal, and our
customers have a right to expect the best.  Yes.  There's always
tomorrow.  In the meantime: we killed this one, folks.

Our sincere thanks for your attention--and your business.

--
Cathe A. Ray             | Love makes the world go `round
SunSoft  (415) 786-5178  | but Chocolate makes the trip worthwhile!
cathe.ray@Eng.sun.com    |

--
Tim Rylance