Wednesday, November 12, 2008

End-to-End Internet Packet Dynamics

This long paper analyzes the results of an unprecedented large-scale study of Internet packet dynamics, using traces of TCP transfers between 35 Internet sites collected with a specialized measurement framework, resulting in thousands of measurements over the course of several days. The measurements were taken twice, about a year apart. Overall the paper is a compendium of illuminating analyses, and I think it is a necessary read. I'll comment a bit on the results I found most interesting.

Using these traces, the paper looks at network pathologies, then analyzes aspects of performance, including packet loss, delay, and bottleneck bandwidth. Each of these sections is quite interesting, given that almost no other studies look at these questions at the scale of this paper. For example, they find that packet reordering is actually quite common, largely due to route changes (even though individual routers try very hard to avoid reordering), but that TCP as implemented does not suffer much in performance because of it. Even though the duplicate ack threshold (three duplicate acks before a fast retransmit) was not chosen experimentally, it seems to do quite well (perhaps even ideally) in avoiding unnecessary retransmissions due to reordering. Observations like this are very interesting and not found in many other papers.
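To make the duplicate ack point concrete, here is a minimal sketch of my own (not code or a model from the paper). It assumes a simplified receiver that sends one cumulative ack per arriving packet, and a sender that fast-retransmits once it sees a given number of duplicate acks in a row. The function name and the example arrival orders are hypothetical; no packet is actually lost in any of them, so every retransmission counted is unnecessary.

```python
def count_spurious_fast_retransmits(arrival_order, dupack_threshold=3):
    """Count how often reordering alone would trigger a fast retransmit.

    arrival_order: sequence numbers (0, 1, 2, ...) in the order they reach
    the receiver. Every packet arrives exactly once, so nothing is lost.
    Assumption: the receiver acks every packet with a cumulative ack.
    """
    received = set()
    next_expected = 0   # lowest sequence number not yet received
    dupacks = 0         # consecutive duplicate acks seen by the sender
    triggers = 0        # unnecessary fast retransmits

    for seq in arrival_order:
        received.add(seq)
        if seq == next_expected:
            # The hole is filled: the cumulative ack advances.
            while next_expected in received:
                next_expected += 1
            dupacks = 0
        else:
            # Out-of-order arrival: the receiver repeats its last ack.
            dupacks += 1
            if dupacks == dupack_threshold:
                triggers += 1
    return triggers
```

With the usual threshold of 3, a packet that slips back by one position (arrival order 0, 2, 1, 3, 4) causes no retransmission, while a packet that slips back by three positions (0, 2, 3, 4, 1, 5) causes one. Lowering the threshold to 1 makes even the mild case retransmit, which is the tradeoff the paper's analysis is about.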

Another observation is that packet corruption is relatively rare, even though TCP's 16-bit checksum is not strong enough to catch all of it. Doubling the checksum at the cost of two bytes would make undetected corruption negligibly rare, but it's not clear that this is a big enough problem to matter: the E2E principle suggests that handling this at a higher level is the better solution.
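As a rough illustration of why 16 bits is weak, here is a small sketch of the ones' complement Internet checksum (the RFC 1071 algorithm that TCP uses), applied to a bare payload. It ignores the TCP pseudo-header, and the example bytes are made up. The helper `undetected_fraction` is my own idealization: it assumes a corrupted packet's checksum is uniformly random, which real corruption need not obey.

```python
def internet_checksum(data: bytes) -> int:
    """16-bit ones' complement checksum in the style of RFC 1071."""
    if len(data) % 2:
        data += b"\x00"                 # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]   # add each 16-bit word
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)  # fold carries back in
    return ~total & 0xFFFF


def undetected_fraction(checksum_bits: int) -> float:
    """Idealized chance that a random corruption still passes the check."""
    return 2.0 ** -checksum_bits


# Hypothetical payloads: the same two 16-bit words in swapped order.
original = b"\x12\x34\x56\x78"
swapped = b"\x56\x78\x12\x34"

if __name__ == "__main__":
    # The sum is order-independent, so the swap goes unnoticed.
    print(hex(internet_checksum(original)), hex(internet_checksum(swapped)))
    # Going from 16 to 32 bits shrinks the idealized miss rate by 2**16.
    print(undetected_fraction(16), undetected_fraction(32))
```

Under that idealized assumption, two extra bytes of checksum take the miss rate from roughly one in 65,536 corrupted packets to roughly one in 4 billion, which is why the change looks cheap on paper even if the end-to-end argument says the check belongs higher up.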

I also found the packet loss section very interesting. Loss rates in both sets of traces are pretty high, and the trend is not downward. In addition, the difference between data packet loss and ack loss shows that data packets are governed by TCP congestion avoidance, while acks do not back off with the window in the same way. Most interestingly, congestion for data packets does not necessarily mean congestion for acks, and vice versa; this points to the prevalence of asymmetric routing due to the structure of inter-AS routing.
