From tim@olorin.elsevier.nl Mon Jul 22 09:39:42 1996
Received: from snowdon.elsevier.co.uk (snowdon [193.131.197.164]) by cadair.elsevier.co.uk (8.6.13/8.6.12) with ESMTP id JAA01705 for ; Mon, 22 Jul 1996 09:39:41 +0100
Received: from epprod.elsevier.co.uk (actually host epprod) by snowdon with SMTP (PP); Mon, 22 Jul 1996 09:40:53 +0100
Received: from ns.elsevier.nl (ns.elsevier.nl [145.36.5.1]) by epprod.elsevier.co.uk (8.6.13/8.6.12) with ESMTP id JAA05145 for ; Mon, 22 Jul 1996 09:39:07 +0100
Received: from olorin.elsevier.nl by ns.elsevier.nl with SMTP (PP); Mon, 22 Jul 1996 10:41:01 +0200
Received: (from tim@localhost) by olorin.elsevier.nl (8.7.5/8.7.3) id KAA08133; Mon, 22 Jul 1996 10:40:57 +0200 (MET DST)
From: Tim Rylance
Message-Id: <199607220840.KAA08133@olorin.elsevier.nl>
Subject: Re: PPP thruput on Solaris 2.5 (fwd)
To: ajr@cadair.elsevier.co.uk (Ade Rixon), bill@li.net (Bill Groppe)
Date: Mon, 22 Jul 1996 10:40:56 +0200 (MET DST)
X-Mailer: ELM [version 2.4 PL25]
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Status: OR

Here's a good description of the TCP performance problems fixed by the
recent patches.  An interesting read if you like that sort of thing
(I do).

Tim

Forwarded message:

From: bryan@panix.com (Bryan Althaus)
Newsgroups: comp.unix.solaris
Subject: Re: PPP thruput on Solaris 2.5
Date: 15 Jul 1996 21:57:27 -0400
Organization: North Star Pub, NYC
Lines: 305
Distribution: na
Message-ID: <4sesu7$79f@panix.com>
References: <4selb6$cjn@lyorn.mdd.comm.mot.com>

N-Duc Le (nduc@df15h.mdd.comm.mot.com) wrote:

: Hi,
: Some time ago, someone posted a problem regarding the thruput of PPP
: running on Solaris 2.5 (Sparc machine).  According to that person,
: thruput over PPP (via ftp) was much slower than expected.  I've been
: having a similar problem.  I am running PPP on Solaris 2.5 on a
: SPARC LX.  Every time I do an ftp, the average thruput is around
: 0.5 Kbytes/sec.
: The modems I am using at both ends are 28.8 kbit/sec = 3600
: bytes/sec = 3.6 Kbytes/sec; hence the average ftp thruput is 1/6 of
: the modem's maximum.  This is very bad.
:
: The person who posted this problem originally had carried out a lot
: of experiments and was able to fix it (increase the thruput of ftp).
: Please let me know if you have any suggestions, comments or ideas.
: Also I do not mind contacting the original person if I am given that
: person's email address.

This is a bug, BUGID: 1233827, which has been fixed in a patch to be
made available to everyone, with or without support contracts.  Look
for another post from Cathe Ray in the next week or so.  Below is her
last post stating a slight delay in the release, and the second post
describing the actual problem.

<<*>>

~From: cathe@beeblebrox.Eng.Sun.COM (Cathe A. Ray)
~Newsgroups: comp.unix.solaris,comp.sys.sun.admin,comp.infosystems.servers.unix
~Subject: Solaris TCP Retransmission patches - more info
~Date: 9 Jul 1996 05:13:39 GMT
Organization: Sun Microsystems, Inc.

I've gotten requests from several people for more information on the
availability of the retransmission patches.  The last posting had an
ETA of today (7/8) for the 2.5 and 2.5.1 versions, but they will be at
least a couple of days late.  We had to finish testing a late change
that affected HTTP traffic retransmissions.

This final fix has been tested by several customers and we are doing
the final crank-turn on the patches for 2.5 and 2.5.1.  They should be
available by the end of the week at the latest and I'll post again as
soon as they get pushed to the patch database.  Because of the nature
of the problem they will go to SunSolve and will be available to all,
not just those with support contracts.

The 2.4 version will still be at least 3 weeks away because it is part
of the Kernel Jumbo Patch (KJP).  The KJP contains fixes to all kernel
modules, not just TCP or IP, so the scale of testing is quite large.
I will post when it is available, but it probably won't be before 8/1.
If you didn't see the original posting which contained all the gory
details, I can repost the technical piece when I let you know about
the public release of the patches.

These will be the patch levels you will want to get when they are
pushed to the patch database:

BUGID: 1233827

SPARC:      2.4         2.5         2.5.1      module affected
      |-----------|-----------|-----------|-----------------|
      | 101945-42 | 103169-06 | 103630-01 | /kernel/drv/ip  |
      | 101945-42 | 103447-03 | 103582-01 | /kernel/drv/tcp |
      |-----------|-----------|-----------|-----------------|

X86:        2.4         2.5         2.5.1      module affected
      |-----------|-----------|-----------|-----------------|
      | 101946-36 | 103170-06 | 103631-01 | /kernel/drv/ip  |
      | 101946-36 | 103448-03 | 103581-01 | /kernel/drv/tcp |
      |-----------|-----------|-----------|-----------------|

PowerPC:    2.5.1      module affected
      |-----------|-----------------|
      | 103583-01 | /kernel/drv/ip  |
      | 103632-01 | /kernel/drv/tcp |
      |-----------|-----------------|

ETA for public availability of above patches:
   2.5 & 2.5.1 - July 12, 1996
   2.4         - August 1, 1996

<<*>>

~Newsgroups: comp.unix.solaris
~Subject: Announcing New TCP Performance Patch
~Date: 7 Jun 1996 23:36:21 GMT
Organization: Sun Microsystems, Inc.

Sun doesn't ordinarily announce patches when they're released.  But
we've just finished a series of TCP-related fixes and improvements,
and we want to make sure that the news gets out as quickly as possible
to the many people who can benefit from our work.

This patch announcement will be of interest mostly to folks who use
Sun workstations over "slow" links, like most dial-up lines.  Please
note, though, that you might benefit from the work we'll discuss here
even if you've never used one of our workstations directly.  (Many
companies who provide Internet access use Suns as part of the
communication path.  And the patches are for Suns running Solaris 2.4
and up.)
Also note:  This message is coming to you directly from the engineers
who did the work.  We wanted to get the information out to you right
away, but we really aren't trying to replace all the other Sun sources
of information you might have access to.  Please, don't send us lots
of detailed questions--we're not volunteering to answer them (or even
respond to many of the followups here).  We just really wanted to make
sure this message got out.  Thanks.

Cathe A. Ray
Manager, Internet Engineering


TCP Performance Improvements For Slow Network Links
===================================================

Our Sun team is responsible for basic network communications software.
We've been putting in a lot of work lately on improving the
performance of TCP over slow network links.  Now we're finished;
testing is complete; and the patches (for Solaris 2.4 and later) will
be available shortly.

We undertook the work in response to feedback from customers serving
WWW users over asynchronous PPP links.  Users of LANs and WANs built
on 10base-T and faster media never saw the problem behavior, which
actually affected FTP and other TCP-based applications as well.  With
the new patches in place, slow links will operate with roughly the
same efficiency as fast links.  Without the patches, efficiency of
very slow links could, under Solaris 2.5, sink to as low as 5 per cent
of the theoretical maximum.

In the following sections we will describe in detail what was wrong
and how we fixed it.  If you don't need to know all that, just check
the table below for the patch numbers.  They'll be available soon from
our usual patch sources.  We're confident that customers who have seen
the problem will now observe a remarkable improvement.  Others will
see no change.

HISTORY

Strangely, the decline in throughput was the result of several
improvements we made over the years to the TCP retransmission
algorithms and parameters.  Every change improved performance for
systems with fast links.
The cumulative effect for slow links was just the reverse; but almost
all our systems--and our customers'--were hooked up to fast links, and
the drawbacks went largely unnoticed.  That was the state of affairs
at the time 2.4 was released.

By the time 2.5 came out, async hookups to the Web had exploded.  We
had implemented another relatively minor TCP bug fix.  Customers with
fast links were better off.  The efficiency of slow links declined.
We quickly learned we had a problem.

We tracked down the inconsistencies and rewrote the code.  We've
redesigned the algorithm for good behavior across all supported
configurations.  We've added slow links and a wide mix of simulated
platforms to our test beds, and tested the fixes in both high-speed
and slow-speed networks.  The problem is resolved.  Excellence is a
moving target.

TECHNICAL DETAILS

Here are some technical details.  As you'll see, we've made it a
pretty frank discussion.  (Please be aware, though, that we do not
intend to spend much time debating our decisions here.)

The throughput troubles on slow lines result from an excessive rate of
retransmissions.  The rate, in turn, is caused by a mis-tuned adaptive
algorithm.  TCP packets are retransmitted if no response is received
before a timeout period has expired.  Our routines implement a variant
of the familiar Karn and Jacobson adaptive algorithms, which attempts
to predict an efficient timeout value based on the time it took
previous packets to complete a roundtrip.  Elapsed values are combined
into a smoothed average roundtrip time ("RTT") and variance.

The key elements in this calculation are the initial RTT value and the
subsequent RTTs factored in.  The changes we have made involve both of
these key areas.

INITIAL RTT VALUES

As an unintended result of several cumulative changes, the kernel
parameter "tcp_rexmit_interval_initial" was actually not being used.
In fact, all Internet Routing Entry (IRE) RTT values were being
initialized to 512 milliseconds.
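The adaptive estimator just described can be sketched roughly as
follows.  This is an illustration, not Solaris kernel code: the names
are invented here, and the gains (1/8 for the smoothed RTT, 1/4 for
the variance) and the formula RTO = SRTT + 4 * RTTVAR are the textbook
Jacobson values, which the actual implementation may not match
exactly.

```python
class RttEstimator:
    """Textbook Jacobson-style smoothed RTT / RTO estimator (sketch)."""

    def __init__(self, initial_rto=3.0):
        self.srtt = None        # smoothed round-trip time, seconds
        self.rttvar = 0.0       # smoothed mean deviation
        self.rto = initial_rto  # used until the first sample arrives

    def update(self, measured_rtt):
        """Fold one valid RTT sample in and return the new RTO."""
        if self.srtt is None:
            # First sample seeds the estimator.
            self.srtt = measured_rtt
            self.rttvar = measured_rtt / 2.0
        else:
            err = measured_rtt - self.srtt
            self.srtt += err / 8.0                         # gain 1/8
            self.rttvar += (abs(err) - self.rttvar) / 4.0  # gain 1/4
        self.rto = self.srtt + 4.0 * self.rttvar
        return self.rto
```

The point of the posting's fixes is visible in this sketch: if the
estimator is seeded with a wildly wrong value, or is starved of valid
samples, the computed RTO never converges on the link's real roundtrip
time.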
TCP was using that 512 ms figure as its initial setting.  For
connections which flow through a route with a roundtrip time less than
that (such as a LAN or WAN built on 10base-T) all was well.  When the
connection closed, the actual IRE RTT value was updated and the
predictive timeout value successfully adjusted.

For connections with an RTT greater than 512 ms, however, the timeout
would necessarily trip, and retransmissions occur.  If the actual time
differed sufficiently from the original estimated value, TCP was never
able to send a segment without one or more retransmissions.  A
realistic RTT for the route could never be established.  This scenario
is the beginning of the explanation of what has been happening on
several-hop Internet or asynchronous PPP links.

Our solution is to initialize all IRE RTTs to zero instead of 512 ms.
Any new connection for a route will now, when lookup discloses the
zero value, get the value of the "tcp_rexmit_interval_initial"
parameter instead.  (And it's been increased to 3 seconds.)  So in
most cases the adaptive algorithm will now be able to adjust timeout
values effectively.

RTO (RETRANSMIT TIMEOUT) ALGORITHM INTERACTION

Another factor contributing to packet congestion and retransmission
was a change to the RTO algorithm, introduced in a 2.4 Kernel Patch.
The intent was to make the behavior more "conservative"--that is, to
lower the risk of poor timeout values.  The effect on low-speed links
was unexpectedly contrary.

A key (and unintended) effect of the code change was that RTT data
from retransmitted packets was discarded.  This behavior, together
with the poor initial RTT values described earlier, meant that the
adaptive algorithm was deprived of the information needed to adjust
the RTO.  Our solution keeps the RTO RTT update conservative, but now
updates the RTO after no more than one receive window's worth of valid
RTTs.
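In outline, the fix described under INITIAL RTT VALUES amounts to the
following selection rule.  The function name and the use of
milliseconds are illustrative, not actual kernel code; only the
tunable's name and the 3-second default come from the posting.

```python
# New default for tcp_rexmit_interval_initial, per the posting.
TCP_REXMIT_INTERVAL_INITIAL_MS = 3000

def initial_rto_ms(ire_rtt_ms):
    """Choose a starting retransmit timeout for a new connection."""
    if ire_rtt_ms > 0:
        # A previous connection over this route left a measured RTT
        # in the IRE cache; start from that estimate.
        return ire_rtt_ms
    # Zero now means "no history for this route": fall back to the
    # tunable initial interval instead of the old hard-wired 512 ms.
    return TCP_REXMIT_INTERVAL_INITIAL_MS
```

Seeding with a generous 3 seconds, rather than an optimistic 512 ms,
means the first segment over a slow route is unlikely to be
retransmitted spuriously, so the estimator gets valid samples to adapt
from.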
Additionally, when an invalid RTT is seen--an ACK of a retransmitted
segment, for example--any valid RTT information already gathered is
fed into the RTO algorithm.

ZERO WINDOW PROBE BUG FIX

The problems described so far affect Solaris 2.4 and 2.5 equally.
What changed with 2.5?  One important fix we included in 2.5 was for
the "zero window probe" bug, a well-publicized problem affecting just
about all versions of UNIX.  As part of that rewrite, we removed a
nondescript piece of logic that implemented a simple "backoff" scheme.
The excised code caused the RTO to be lengthened by one-eighth as a
result of certain failures.  It seemed not to be needed; but it had
concealed the presence of the other bugs by providing a means for the
RTO to reach a successful value.  When this code was removed, the
other underlying problems were exposed.

IRE RTT LOGIC

This last part of the problem concerns the interaction between TCP and
the Solaris-specific Internet Routing Entries.  The IRE RTT logic
caches RTT values to be re-used when a new connection is made over a
familiar link.  This is a fine approach.  The implementation, however,
had a flaw: the IRE RTT was updated regardless of the RTT value
supplied by TCP.

As you will have guessed by now, users of high-speed links saw no
effect.  But on routes with highly variable RTTs, when a connection
dominated by small segments was closed, a problem could result.  An
RTT too short for large segments was used to update the IRE RTT, and a
subsequent connection dominated by large segments (like FTP)
experienced an excessive retransmission rate.  It was a different path
to a familiar dilemma: too small a timeout value.

Naturally the most highly variable RTTs tend to be seen on async PPP
links, where the RTT of the route is compounded from (1) wire latency,
(2) low bandwidth, and (3) congestion/queuing delays as more than one
segment is transmitted by TCP.

Our solution is to add a new ndd variable, "tcp_rtt_updates", which
allows tuning or disabling of IRE RTT updates.
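Based on the behavior the posting describes, the gate on IRE RTT
updates might look roughly like this.  The function and parameter
names are hypothetical; only the tunable's name, "tcp_rtt_updates",
comes from the posting.

```python
def may_update_ire_rtt(tcp_rtt_updates, rto_update_count):
    """Decide whether a closing connection may write its RTT back
    into the route's IRE cache (illustrative sketch only).

    tcp_rtt_updates:  value of the ndd tunable
    rto_update_count: how many times this connection's RTO was
                      updated with valid RTT samples
    """
    if tcp_rtt_updates == 0:
        return False  # IRE RTT caching disabled entirely
    # The estimator must have had enough chances to adapt before its
    # final RTT is considered trustworthy for the route cache.
    return rto_update_count >= tcp_rtt_updates
```

The effect is that a short-lived connection dominated by small
segments can no longer poison the cached route RTT that a later bulk
transfer (like FTP) will inherit.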
A value of zero disables IRE RTT updates.  A value greater than zero
specifies how many RTT updates to the RTO are required--that is, how
many chances the algorithm has had to adapt the timeout--before a
closing connection will be allowed to update the RTT in the IRE.

CONCLUSION

We've fished out, fixed, and explained some subtle flaws in our
adaptive retransmission algorithm.  We take the responsibility for
introducing them--and the credit, too, for practically every piece
was, by itself, a successful response to our customers' needs.

Better and more exhaustive testing would have shown up the flaws
earlier, privately, harmlessly.  That's always our goal, and our
customers have a right to expect the best.  Yes.  There's always
tomorrow.  In the meantime: we killed this one, folks.

Our sincere thanks for your attention--and your business.

--
Cathe A. Ray             | Love makes the world go `round
SunSoft  (415) 786-5178  | but Chocolate makes the trip worthwhile!
cathe.ray@Eng.sun.com    |

--
Tim Rylance