
non-growing tcp window size

To: linux-net@xxxxxxxxxxxxxxx, netdev@xxxxxxxxxxx
Subject: non-growing tcp window size
From: Håkon Bugge <Hakon.Bugge@xxxxxxxxx>
Date: Tue, 21 Jan 2003 17:48:32 +0100
Sender: netdev-bounce@xxxxxxxxxxx

To whom it may concern,

We have observed TCP bandwidth problems when running benchmarks that gradually increase their payload and have most of their data flowing in one direction. The degradation is observed between two dual Xeon/E7500 machines connected by GbE over a cross-over cable. The GbE interface is eth1 and has an MTU of 9000; eth0 is an FE interface with an MTU of 1500. The machines are running Linux 2.4.20, and the NIC in question is an Intel Corp. 82544GC Gigabit Ethernet Controller (rev 02).

While changing TCP parameter settings and systematically varying the socket rx/tx buffer sizes, I discovered that once in a while the benchmark ran well. The OK runs were not deterministic and had nothing to do with the parameter changes. Hence, I let the machine run over the weekend, taking tcpdumps and saving them for the successful run. Comparing the "OK" run with the "BAD" run reveals a couple of interesting things. The first problem is that the window advertised by the receiver does not grow. The second problem, which also affects the "OK" run, concerns the ratio between the number of packets received (#advertisements) and the number of packets sent (from src to dst): it is 1 for the "BAD" scenario (this is probably a consequence of the window being only ~2x the MTU). Surprisingly, it is as large as 0.5 for the "OK" scenario. IMHO, this violates RFC 813, which states:

   There are two reasons  for  prompt  acknowledgement.    One  is  to
prevent  retransmission.  We will discuss later how to determine whether
unnecessary  retransmission  is  occurring.    The  other   reason   one
acknowledges  promptly  is  to permit further data to be sent.  However,
the previous section makes quite clear that it is not  always  desirable
to send a little bit of data, even though the receiver may have room for
it.    Therefore,  one  can  state  a  general  rule  that  under normal
operation, the receiver of data need not,  and  for  efficiency  reasons
should  not,  acknowledge  the data unless either the acknowledgement is
intended to produce an increased useable window, is necessary  in  order
to  prevent  retransmission  or  is  being  sent  as  part  of a reverse
direction segment being sent for some other reason.  We will consider an
algorithm to achieve these goals.

The two tcpdumps have been analyzed by a simple awk script, which reports the following for every 1/10 of a second of the runtime:

time: elapsed time relative to the first packet
MB/s: sum of TCP payload (based on TCP sequence numbers) *1e-6 / delta time
avg_len: average packet length (TCP payload) sent
nsent: number of packets sent from src to dst
avg_win: average window size advertised by the src
adv/sent: ratio between #advertisements (sum of prompt acks and piggybacked acks) and #packets sent
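
The per-interval statistics listed above can be sketched roughly as follows. This is not the original awk script; it is a minimal Python illustration under several assumptions: the `Packet` record and the `summarize` function are hypothetical names, the packets are assumed to be already parsed from the tcpdump output and sorted by time, the payload is summed from per-packet lengths rather than derived from sequence numbers, and avg_win is averaged over the dst-to-src packets on the assumption that the receiver's advertised window is the quantity of interest.

```python
from collections import defaultdict
from typing import NamedTuple, Optional


class Packet(NamedTuple):
    """Hypothetical pre-parsed tcpdump record (not a real tcpdump format)."""
    time: float        # capture timestamp in seconds
    from_src: bool     # True for src -> dst, False for dst -> src
    payload_len: int   # TCP payload bytes carried by this packet
    window: int        # window size advertised in this packet


def summarize(packets, bin_width=0.1):
    """Return one dict per time bin that contains at least one packet."""
    if not packets:
        return []
    t0 = packets[0].time
    bins = defaultdict(list)
    for p in packets:
        bins[int((p.time - t0) / bin_width)].append(p)

    rows = []
    for idx in sorted(bins):
        sent = [p for p in bins[idx] if p.from_src]
        advs = [p for p in bins[idx] if not p.from_src]
        payload = sum(p.payload_len for p in sent)
        avg_len: Optional[float] = payload / len(sent) if sent else None
        avg_win: Optional[float] = (
            sum(p.window for p in advs) / len(advs) if advs else None
        )
        adv_sent: Optional[float] = len(advs) / len(sent) if sent else None
        rows.append({
            "time": idx * bin_width,              # start of the bin, relative to first packet
            "MB/s": payload * 1e-6 / bin_width,   # payload bytes * 1e-6 / delta time
            "avg_len": avg_len,
            "nsent": len(sent),
            "avg_win": avg_win,
            "adv/sent": adv_sent,
        })
    return rows
```

With this layout, an adv/sent value of 1 means every data packet in the interval was answered by its own advertisement, while 0.5 means one advertisement for every two data packets.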

The extracts of the analysis are enclosed as "ok.txt" and "bad.txt". The information is also presented as graphs in "bw_winsiz.png" and "bw_adv-sent.png". In the graphs, the "OK" data uses the bottom x-axis and the "BAD" data uses the upper x-axis (the "BAD" run time is longer due to the lower bandwidth). Bandwidth uses the left y-axis, whereas the average advertised window size and the ratio of #advertisements to #packets sent use the right y-axis.

I do hope this information is useful to you and that the problem can be fixed. Since I do not read the mailing lists, I would appreciate a note back by email if someone picks up on this.

Cheers, Håkon

Håkon Bugge; VP Product Development; Scali AS;
mailto:hob@xxxxxxxx; fax: +47 22 62 89 51;
Voice: +47 22 62 89 50; Cellular (Europe+US): +47 924 84 514;
Visiting Addr: Olaf Helsets vei 6, Bogerud, N-0621 Oslo, Norway;
Mail Addr:  Scali AS, Postboks 150, Oppsal, N-0619  Oslo, Norway;

Attachment: bw_adv-sent.png
Description: PNG image

Attachment: bad.txt
Description: Text document

Attachment: bw_winsiz.png
Description: PNG image

Attachment: ok.txt
Description: Text document
