Monday, November 24, 2008

Policy-Aware Switching Layer for Data Centers

In this paper, the authors build a new data center infrastructure that allows arbitrary, policy-aware chaining of middleboxes, which, as we have seen in previous papers, are useful but do not fit into the clean layering semantics of the network protocols. Traditionally, physical connectivity is used to ensure that messages traverse a specific set of middleboxes in a data center, but this is prone to many problems, as the paper spends a section pointing out.

Since we've already studied the various problems with middleboxes, I'll only mention the one new thing in this paper that I hadn't thought about: spanning-tree algorithms and the like at layer 2 can be very difficult to coax into a topology that actually enforces the logical ordering of middleboxes. I hadn't considered that some middleboxes rewrite Ethernet frames and/or are not accessible via an IP address, which is what forces this reliance on layer 2.

The actual idea is fairly simple: modified commodity switches are divided into a "switching core" and a "policy core," where the two pieces look like normal switch components to each other, and a centralized policy is distributed to each of the so-called pswitches in a manner that ensures the policy is enforced.
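
To make that concrete, here is a minimal sketch (with hypothetical field names, not the paper's actual policy language) of what a centralized policy and a pswitch's next-hop decision might look like: a policy maps a traffic class to an ordered chain of middlebox types, and a pswitch forwards each frame to the first middlebox type it has not yet traversed.

```python
# Minimal sketch (not the paper's actual format): a centralized policy maps a
# traffic class to the sequence of middlebox types its frames must traverse.
POLICY = {
    # (src_segment, dst_segment, dst_port): ordered middlebox chain
    ("external", "web_servers", 80): ["firewall", "load_balancer"],
    ("external", "mail_servers", 25): ["firewall", "spam_filter"],
}

def next_hop_type(frame, traversed):
    """Return the next middlebox type this frame still needs, or None
    if the chain is complete and it can go to its final destination."""
    chain = POLICY.get(
        (frame["src_segment"], frame["dst_segment"], frame["dst_port"]), [])
    for box in chain:
        if box not in traversed:
            return box
    return None

# Example: a frame from outside to a web server that has only passed the firewall
frame = {"src_segment": "external", "dst_segment": "web_servers", "dst_port": 80}
print(next_hop_type(frame, traversed={"firewall"}))  # -> "load_balancer"
```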

However, the requirement that unmodified middleboxes work with the system causes a number of headaches. Firstly, stateful boxes must always get the frames/packets that require the state (such as a specific instance of a firewall getting all packets associated with a flow). In addition, the authors wish to make sure the policies are respected under all kinds of churn; allowing some laxness in the churn rules would make the resulting implementation quite a bit easier. This is where the most "hacky" thing in the paper comes out: the use of intermediate middleboxes of each type in order to ensure robustness under churn. I see no alternative way to implement this without modifying the middleboxes or allowing some failures when churn occurs.

Overall this paper is a thorough design document for a reasonable system and is useful as such. Performance is less important in this phase of implementation, but I believe that as long as switching decisions can be done quickly, intra-datacenter latencies are low, and pSwitches implement fast matching, the system will not add an appreciable amount of latency when compared to internet latencies, while introducing a mechanism that is useful to almost every datacenter of decent size.

Monday, November 17, 2008

Delay-Tolerant Network Architecture for Challenged Internets

"Challenged Internets" are a class of network that have characteristics such as long delays, low reliability, and periodic disconnection that make the IP protocol an infeasible way to interconnect such networks; each network uses ad-hoc protocols that do not necessarily provide the baseline best-effort packetized service required by IP. Furthermore, TCP and similar protocols require end-to-end connectivity, which may rarely happen in these challenged networks. Internetworking them, in the vision of the paper, requires a new set of underlying protocols similar in spirit to IP but one that takes into account the differing requirements of these networks.

The argument in the paper is that using PEPs or middleboxes to make these challenged networks appear as "normal" IP networks is insufficient and only really works if the challenged networks are on the edges. I agree with the latter part of the argument; it doesn't seem very easy to connect two different such networks through IP middleboxes in the middle of an internet.

Basically, the new internetworking protocol for challenged networks needs to provide several classes of service that are unlike IP services. The paper uses the postal system (USPS) as a basic point of departure for the kinds of service these networks require, and proposes the same basic set of services. To me, the difference between these networks and the internet is that intermediate nodes must hold more than just transient state for a connection; it really is hop-by-hop communication where, at each point, the endpoints OF THE HOP hold all the state.

One interesting thing is how they propose to handle naming. In this approach, there is a global set of region names that are world-resolvable, and within each region a locally-resolvable name is used. Thus a tuple represents "route to this region" followed by the address of the resource as a locally-resolvable name within that region. This is probably doable given the limited size (in number of networks) such an internet would have.

Architecture for Internet Data Transfer

This paper advocates for and designs a service-oriented approach to data transfer, in which applications rely on a lower-layer transfer service to send data from one location to another, instead of reimplementing different transfer methods in each application. The idea is that the service would be the only implementation of each transfer backend, and that by using the service, disparate applications can benefit without reengineering each application for each transfer type.

This service, called DOT, is designed to not only perform transfers, but to also cache data in case it can be used to shortcut future transfers; combined with the right kind of hashing, this allows a reduction in redundant data transfers even if the data do not look exactly the same. To me, that was the most interesting part of the paper; the parts outlining the interfaces and APIs are relatively straightforward. The service is receiver-pull, and data is "pointed to" by its hash as in other systems we've studied this semester. Applications pass OIDs (which contain the hash and some other data) and the receiver initiates chunk-oriented transfers; the chunking allows caching to be more effective (for example, email message replies combined with proper chunking algorithms allow chunk caching to bypass sending the previous message's text).
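
Here is a minimal sketch of the chunk-and-hash idea described above (the descriptor format, chunk size, and function names are my own simplifications, not DOT's actual API): data is named by hashes, transfers are receiver-pull, and a shared chunk cache lets previously seen chunks skip the network.

```python
import hashlib

CHUNK_SIZE = 16 * 1024   # fixed-size chunks for simplicity; DOT can plug in
                         # content-based chunking so edits don't shift every chunk

def describe(data: bytes):
    """Name each chunk by its hash; the whole-file hash plus the chunk-hash list
    stands in for the descriptor a sender hands to a receiver."""
    hashes = [hashlib.sha1(data[i:i + CHUNK_SIZE]).hexdigest()
              for i in range(0, len(data), CHUNK_SIZE)]
    return hashlib.sha1(data).hexdigest(), hashes

cache = {}   # chunk hash -> bytes, shared across all applications on the host

def receiver_fetch(chunk_hashes, fetch_chunk):
    """Receiver-pull: only chunks missing from the cache are actually transferred."""
    parts = []
    for h in chunk_hashes:
        if h not in cache:
            cache[h] = fetch_chunk(h)   # ask the sender (via whichever plugin)
        parts.append(cache[h])
    return b"".join(parts)
```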

DOT also exports a multi-path plugin which allows multiple transfer methods to be used to speed up transfers, something that shows the power of the system: implementing something like that would be much more difficult in a less-modular, application-specific backend for each application. Benchmark results show savings as high as 52%, which is substantial.

Lastly, the authors do a case study with the postfix email server that demonstrates the ease of changing apps to use DOT, as well as the potential savings.

Overall, the authors have a nice idea that could be useful for reducing the amount of work needed to add new transfer methods to applications. However, it seems to me that the problem they solve is not egregious enough to necessarily convince application writers to adopt their system. I don't believe that transfer methods are seen as a major problem in the internet, and thus the need for DOT seems less than obvious.

Wednesday, November 12, 2008

X-Trace

This paper describes the X-Trace network tracing framework, which operates at all protocol layers in concert to obtain comprehensive information about network activity, including causal information as well as propagation of calls from layer to layer. This is done by propagating trace information to each layer and ensuring all trace information is conveyed to some accessible location.
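
As a rough illustration of the metadata propagation described above (a sketch with invented names, not X-Trace's actual API), each operation carries a task ID shared by the whole request plus the ID of the operation that caused it, and every layer that touches the request emits a report and passes the metadata along or down:

```python
import uuid

REPORTS = []  # stand-in for the offline report-collection infrastructure

class XTraceMetadata:
    """Sketch of trace metadata: a task ID shared by the whole request plus
    the ID of the operation that caused this one (for causality edges)."""
    def __init__(self, task_id, parent_op):
        self.task_id, self.parent_op = task_id, parent_op

def record(md, layer, event):
    """Emit a report and return new metadata whose parent is this operation."""
    op_id = uuid.uuid4().hex[:8]
    REPORTS.append((md.task_id, md.parent_op, op_id, layer, event))
    return XTraceMetadata(md.task_id, op_id)

# A request crossing layers: each layer propagates the metadata it received,
# either to the next hop in the same layer or down to the layer below.
md = XTraceMetadata(task_id=uuid.uuid4().hex[:8], parent_op=None)
md_http = record(md, "HTTP", "GET /index.html")        # application layer
md_tcp = record(md_http, "TCP", "connect")             # pushed down to transport
md_ip = record(md_tcp, "IP", "forward at router R1")   # pushed down again
print(REPORTS)
```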

One major goal of the system is to ensure that each domain can control its own trace data, which means that although a user may want comprehensive information for the entire life of their application, they may be limited by what each domain conveys. This makes sense from an economic and security standpoint, and I think it is a good idea.

The basic way the system works is that each protocol implementation is modified to enable X-Trace tracking. This is highly invasive, but the results make the effort worthwhile in many cases. The degree of modification is different for the different layers and protocols; some become very complicated and I'm not sure they're worth modifying. Still, the information obtained is quite compelling, including causality and the ability to trace from the beginning of a call all the way to the end, through each layer.

One thing that stood out was that modifying these protocols, basically by interfacing with libraries, does have a security impact. The libraries had better be robust against all kinds of attacks; even if the infrastructure itself is safe from attack, a flaw in the library implementations could make *all* instrumented protocols vulnerable to a single universal attack if a bug is found. However, it may be that the library layers are thin enough that they can be extensively security-audited and ensured to be "safe."

End-to-End Internet Packet Dynamics

This long paper analyzes the results of an unprecedented large-scale study of internet packets, using traces obtained from 35 internet sites and a specialized measurement framework, resulting in thousands of measurements over the course of several days. This occurred twice, separated by a year. Overall the paper is a compendium of illuminating analyses, and I think it is a necessary read. I'll comment a bit on the results I found most interesting.

Using these traces, the paper looks at network pathologies and then analyzes aspects of performance, including packet loss, delay, and bottleneck bandwidth. Each of these sections is quite interesting, given that almost no other studies look at these questions on the scale of this paper. For example, they find that packet reordering is actually quite common (even though routers individually try very hard to avoid reordering) due to route changes, but that TCP as implemented does not suffer much in performance because of it. Even though the duplicate-ACK threshold was not chosen experimentally, it seems to do quite well (perhaps even ideally) at avoiding unnecessary retransmissions due to reordering. Observations such as this are very interesting and not present in many other papers.

Another observation is that packet corruption is relatively rare, even though TCP's checksumming capabilities are not sufficient to catch it reliably. Doubling the checksum size at the cost of two bytes would make undetected corruption negligibly rare, but it's not clear this is a big enough problem to matter: the end-to-end principle suggests that handling it at a higher level is the better solution.

I also found the packet loss section very interesting. Loss rates in the two traces are both pretty high, and the trend is not downward. In addition, the difference between data packet loss and ACK loss shows how data packets are governed by TCP congestion avoidance while ACKs do not follow the same window reduction; most interestingly, congestion on the data path does not necessarily mean congestion on the ACK path, and vice versa, which points to the prevalence of asymmetric routing due to the structure of inter-AS routing.

Thursday, November 6, 2008

Internet Indirection Architecture

Similar in goals to the previous paper, this work presents i3, which uses an API for composable rendezvous-based communication so that middleboxes need not be physically interposed between the two communicating ends. A receiver inserts "triggers," which live in the network and can be matched against packets being sent into the network; a matching trigger causes the intended action to occur (a send to an endpoint or a set of endpoints). Like the previous paper, they use a DHT to store the triggers and ensure that all triggers sharing a prefix (which is what enables multicast and anycast) get stored on the same server for better performance.
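
A toy sketch of the trigger-matching idea (the identifier width, exact-match threshold, and names here are simplified placeholders, not i3's actual constants): receivers insert (identifier, destination) triggers, and a packet is delivered to the triggers with the longest matching identifier prefix, provided enough leading bits match exactly.

```python
# Minimal sketch of i3-style trigger matching.
EXACT_PREFIX_BITS = 16   # a trigger matches only if at least this many leading bits agree

triggers = []  # list of (trigger_id, destination) pairs inserted by receivers

def insert_trigger(trigger_id: int, destination: str):
    triggers.append((trigger_id, destination))

def matching_prefix_bits(a: int, b: int, width: int = 32) -> int:
    """Number of leading bits on which two identifiers agree."""
    diff = (a ^ b) & ((1 << width) - 1)
    return width - diff.bit_length()

def route_packet(packet_id: int, data: str):
    """Deliver to every trigger with the longest matching prefix above the
    threshold; identical identifiers give multicast, shared prefixes anycast."""
    best = max((matching_prefix_bits(packet_id, t) for t, _ in triggers), default=0)
    if best < EXACT_PREFIX_BITS:
        return []
    return [(dest, data) for t, dest in triggers
            if matching_prefix_bits(packet_id, t) == best]

insert_trigger(0xABCD0001, "host_A")   # same id at two receivers -> multicast
insert_trigger(0xABCD0001, "host_B")
insert_trigger(0xABCD0002, "host_C")   # shares a prefix -> anycast candidate
print(route_packet(0xABCD0001, "hello"))  # delivered to host_A and host_B
```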

The example applications here are mobility, multicast, and anycast, but from my reading, in principle one could create a firewall. I'm not sure about NAT-like machinery, however.

The thing I liked best about this paper is that the basic algorithm is very simple, and allows both senders and receivers to define intermediaries that are composable. With some additional complexity, a few optimizations can be introduced.

However, like the previous paper, the security concerns are many. The paper considers protection against some of the possible attacks, but any time we introduce a level of indirection into the network, the potential attack surface grows. I think some of the issues they bring up can be resolved pretty easily, but others, like having to check for routing loops, make trigger insertion much more expensive, since each insertion must be checked to see whether it creates a loop.

Overall, this paper has a simpler algorithm, but the security issues remain.


Middleboxes No Longer Considered Harmful

In this paper, the authors attempt to build a system that allows "middleboxes" such as NATs and firewalls to be used without violating two architectural rules that current middleboxes ignore: each entity in the internet has a unique, globally fixed IP address, and network elements only process packets addressed to them (which is odd, given that routers MUST process packets not addressed to them, and one could certainly call them network entities, given that they have IP addresses). It seems to me that rule 1 is the more interesting one, in that I can see how not having globally-addressable elements makes things difficult in the internet, but I'm not sure about rule 2's importance.

Nevertheless, the authors are building an infrastructure to allow intermediaries, which are middleboxes that do not physically sit in front of a network endpoint and can be composed. Their system, called DOA (an ironic name), functions somewhat like DNS at the routing level: packets are addressed to EIDs, which resolve either to other EIDs (the intermediaries) or to an IP address. EIDs are stored in a DHT and are self-certifying to help with security; however, even the authors point out that this does not guard against MITM attacks, nor does it lessen the need for DHT security (which is already a difficult problem).
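
A minimal sketch of that resolution chain (the mapping table and names here are invented for illustration; the real system resolves EIDs through a DHT): an EID resolves either to an IP address or to another EID, and the chain of EIDs encountered is the stack of intermediaries to traverse.

```python
# Sketch of DOA-style resolution: an EID maps either to an IP address (the end
# host) or to another EID (an intermediary to traverse first).
erd = {  # stand-in for the DHT-based EID resolution service
    "eid:server": ("eid", "eid:firewall"),   # reach the server via its firewall
    "eid:firewall": ("ip", "192.0.2.10"),
    "eid:laptop": ("ip", "198.51.100.7"),
}

def resolve(eid, max_depth=8):
    """Follow the EID chain until an IP is found; the chain is the stack of
    intermediaries a packet must visit."""
    path = []
    for _ in range(max_depth):          # bound the walk to avoid loops
        kind, value = erd[eid]
        if kind == "ip":
            return path, value
        path.append(value)              # another EID: an intermediary
        eid = value
    raise ValueError("resolution chain too long (possible loop)")

print(resolve("eid:server"))   # (['eid:firewall'], '192.0.2.10')
```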

The paper goes on to describe implementations of NAT-like and firewall-like DOA boxes. To be honest, I found these sections incredibly dense and bogged down in details; each of the two types of boxes seems to require an immense amount of complexity. The performance isn't terrible, although the numbers reported are somewhat odd: for example, they report the min and max for DNS but the median for EID lookup in the DHT, which makes the two hard to compare.

Overall, DOA seems like a very complex system with many security issues to deal with.

Monday, November 3, 2008

Development of the Domain Name System

This paper outlines the process of designing the Domain Name System as the Internet grew to a size that made a centralized name-to-IP address mapping impossible. As a result, designers attempted to create a distributed database that is decentralized and performant enough to scale to large numbers of names. Reflecting the direction that the network was taking, the DNS system was designed in a hierarchical manner where organizations are responsible for their own subset of the naming world.

In addition, DNS was supposed to be extensible to cover more than just name-to-IP mappings. As far as I know, the only areas where this succeeded were MX records and reverse DNS; as the paper points out, creating a new mapping type requires the community to agree on its semantics. The most difficult part is the community-centric work, not anything technical.

DNS has clearly been successful for email and for name resolution. However, in terms of extensibility, I would say that the difficulties outlined in the paper still exist and have not been resolved. In addition, although caching is important for obtaining decent performance (or, as the paper says, "tolerable performance"), I am not convinced that it actually helps in the cases that matter most: since DNS has become a mechanism for load balancing, the most popular sites set low TTL values to ensure proper balance across actual machines, and these are exactly the names that would benefit most from caching.

The paper touches only slightly on security issues, but caching can certainly cause problems via spoofing: if someone spoofs a name server and ensures that the forged answer is cached at a local name server with a high TTL, they can nearly permanently override the proper owners; I believe similar attacks have occurred in practice. To address this, security has since been added through DNSSEC, although I'm not familiar with how it works.

One interesting thing in the paper was that they envisioned the DNS also being used for things like having a universal address for files; I think this basically came about through the URI scheme which leverages naming from DNS without having to add another record type; names map to IP addresses and the path in the rest of the URI results in a protocol-specific path on the server. Still, this was an interesting forward-thinking item that did not happen due to the problems outlined above in defining a well-understood semantic meaning to the mapping in the DNS database records.

Wednesday, October 29, 2008

Chord

This paper presents Chord, one of the first implementations of a DHT. Using consistent hashing, where both the keys and the nodes live in the same single-dimensional space, the DHT assigns keys to nodes by looking at the closest node in the space that follows the key. Thus, there is no ambiguity in a stable DHT when it comes to which node is responsible for which key, and, because the nodes are distributed evenly in the space, with high probability each node is responsible for about the same number of keys.

A query is handled either by responding to it (if the current node is responsible for the queried key) or by forwarding it to a node closer to the key in the identifier space. Using a skiplist-like structure (the "finger table"), Chord can find an item in O(log N) hops because the finger table maintains pointers at power-of-two distances; each forwarding step brings the query at least half of the remaining distance closer to the key.
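
Here is a toy sketch of that lookup structure (a simplification that gives each node a global view of the ring, which real Chord nodes do not have): keys belong to their successor, and the finger table holds the successors of id + 2^i, so scanning fingers from largest to smallest finds the closest node preceding a key.

```python
# Toy Chord-style lookup in a 2^M identifier circle.
M = 6
SPACE = 1 << M

def in_interval(x, a, b):
    """True if x is in the circular interval (a, b]."""
    if a < b:
        return a < x <= b
    return x > a or x <= b

class Node:
    def __init__(self, ident, ring):
        self.id, self.ring = ident, sorted(ring)

    def successor(self, key):
        """Owner of a key: first node clockwise from the key."""
        for n in self.ring:
            if n >= key:
                return n
        return self.ring[0]

    def finger(self, i):
        """i-th finger: successor of (id + 2^i)."""
        return self.successor((self.id + (1 << i)) % SPACE)

    def closest_preceding_finger(self, key):
        """Scan fingers from farthest to nearest to halve the distance to the key."""
        for i in reversed(range(M)):
            f = self.finger(i)
            if in_interval(f, self.id, key) and f != key:
                return f
        return self.id

ring = [1, 8, 14, 21, 32, 38, 42, 48, 51, 56]
n = Node(8, ring)
print(n.successor(54), n.closest_preceding_finger(54))  # 56 42
```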

The evaluation in this paper is interesting. One optimization they make is "virtual nodes," where one physical node acts as several logical nodes at once; the one problem I see this causing (which doesn't seem to be addressed in the paper) is that a single physical node disconnecting or failing now takes down several virtual nodes at once. An evaluation of this effect is important: the failures are then not in fact random as in section 6.4, and such correlated failures may cause Chord to fail.

Lastly, there is no thought given to robustness in the face of malicious nodes. This seems like an important aspect of a DHT, or any overlay network because these overlays are over the public internet.

Overall however, this is a nice paper and although it is missing some important considerations, I think it is a classic in terms of presenting a simple, elegant data structure that uses an overlay network and some performance results.

Looking Up Data in P2P Systems

This paper introduces the distributed hash table (DHT), a data structure that maps each key to the node in an overlay network holding the data for that key. DHTs are useful for creating a distributed store of key-value pairs in an overlay network while remaining resilient to nodes entering and leaving the network as well as node failures.

The DHT implementations in the paper work by, upon receiving a lookup request, either answering it (if the node is responsible for that key) or forwarding it to a node "closer" to the node responsible for the key. The paper divides the implementations into those that measure distance in a single dimension (Chord, Pastry) and those that measure distance in multiple dimensions (CAN). Of these, the simplest is Chord, and the most complex routing algorithm described is CAN's.

The open questions section is the most useful here: real-world fault tolerance under many simultaneous connections, disconnections, and failures is difficult to get right and it is unclear if any of the existing DHTs do a particularly good job. For example, the routing algorithm as described for CAN requires, after some disconnections and/or failures, a redistribution of the space which can be expensive. Although the paper does not say as much, it seems that perhaps the DHTs are not as robust (to failures as well as to malicious users) as their authors would like, nor are they as performant as users would like. These seem like the major issues to address.

Tuesday, October 28, 2008

Active Networks Vision and Reality

This paper presents lessons learned through the ANTS implementation of active networks, which are a kind of overlay network that allow users to write the routing code executed on each node within the network. The basic architecture is that each node has a runtime executing, while packets choose (by reference: an MD5 hash is a "pointer" to the code) which routing program to run. Such programs can be arbitrarily injected into the network, as long as they have been certified by some central authority that says they are not malicious.

The security issues of ANTS receive much consideration in the design. The authors attempt to mitigate the threat of running arbitrary code by limiting the kinds of operations that can occur, limiting the state available to read and write, and using a central certifying authority to guarantee safety. In fact, if one is going to use a certifying authority, it may make sense to shortcut the safety guarantees of the runtime and allow more efficient machine code to run on each router instead of Java bytecodes in a VM. It seems the implementation works as it does because the designers wanted to allow arbitrary, non-certified code to run, but could not figure out how to make global guarantees that would keep a single user or program from consuming more than its fair share of global resources.

This could be tackled by having more coupling between routing nodes, using some sort of gossip protocol or other way to exchange information. Such algorithms could guarantee that only some fraction of the routers can lie at a time, making the information somewhat reliable. However, there may just be no way to represent global resource usage in a way that reliably guarantees the information is accurate enough to make a decision whether to run a program that is not certified.

Resilient Overlay Networks

In this paper, an application-layer overlay network is built over UDP in order to provide a more resilient routing infrastructure that, because of its smaller size, can respond to disruptions and slowdowns faster and more flexibly than the traditional BGP approach between ASs. The particular implementation of these application-level routers uses a single intermediate hop between each pair of nodes in order to route around failures; this gives essentially n-2 paths where before there was a single path (as determined by information obtained via BGP).

With this path multiplicity, applications can choose which paths to use (or the metrics under which paths should be chosen) depending on their requirements. In addition to this application-level choice, the routers themselves can be more powerful, with much more detailed routing policies, since the network is essentially a private one built on top of the public internet and the networks targeted are small relative to ASs. Thus a RON router can spend more time per packet than an internet gateway router can.

One thing that jumps out to me is that RON (unlike the next paper) is geared towards a network that has nodes that can be implicitly trusted. For example, the use of policy tags to bypass some of the per-packet computation is subject to trusting the policy tags you receive; there is nothing to prevent a malicious user from rewriting policy tags to ensure they get the highest level of service while others do not.

The evaluation results are very interesting. In almost every metric, RON improves things *in most cases*, but there are always some measurements where RON does substantially worse. For example, in dealing with packet loss, RON improves path reliability most of the time, but a few percent of samples show worse behavior during outages: that is, some packets that would have been delivered using just the underlying connectivity do not get delivered with RON. Similarly, RON can make loss rates worse (although the authors believe this is due to using bidirectional rather than unidirectional information; it remains to be seen whether that would make a difference).

Monday, October 20, 2008

First-Principles Approach to Understanding Internet Router-Level Topology

This paper also attempts to understand how routers interconnect in the Internet, but instead of using power laws, it takes a "first principles" approach, accounting for both the technological constraints of routers and the economic forces that guide how ASs interconnect them--- which means looking at hierarchies of ASs and routers as well. That is, it takes into account that an AS can be level-1, level-2, or level-3 and therefore carries traffic proportional to its level.

Criticizing the previous paper's methodology, the authors rightly point out that power laws hold for graphs that look very different from each other; edge-rewiring schemes can take one power-law graph and turn it into another with very different connectivity and radius but with the same distribution of, for example, ranks. They also point out that there is little evidence the power laws mean anything for the topologies, since there were so few data points.

I think the methodology here is much more sound. First, they actually use subsets of the internet as well as a model of how routers can connect hierarchically in the ideal case; the internet topology is found to be in a region that, measured with their "likelihood" metric, can only have come about by using non-random strategies to connect routers. Therefore generating graphs randomly, even if they follow some distribution of ranks and other metrics, does not seem like a good way to create a representative topology.

Power Law Relationships of the Internet Topology

This paper investigates aspects of internet topology, determining whether power laws of the form "x is proportional to y^a" for quantities x, y, and constant "a" exist in the graphs representing the Internet at the inter-AS and inter-router level. The motivation of this is that using averages/min/max distills the data too far: it gives very little information about the actual distribution of values.

As an alternative, this paper proposes essentially ordering the nodes (whether ASs or routers) by some metric and then taking the slope of the distribution on a log-log scale (which is "a" in the power law). Then, to evaluate how well the power law holds, the authors correlate the fitted line with the actual values (i.e., they measure the closeness of the fit). It is also important to remember that the paper's goal is to come up with good metrics for judging the validity of internet graph generators.
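
The fitting procedure itself is just a least-squares line on log-log axes; here is a small sketch with made-up data (the numbers are illustrative, not from the paper):

```python
# Sketch of the fitting procedure: sort nodes by a metric (e.g. degree), plot
# value against rank on log-log axes, and take the least-squares slope as the
# power-law exponent; the correlation coefficient measures fit quality.
import numpy as np

degrees = np.array(sorted([120, 80, 45, 30, 22, 15, 11, 8, 6, 5, 4, 3, 3, 2, 2],
                          reverse=True), dtype=float)   # made-up example data
ranks = np.arange(1, len(degrees) + 1, dtype=float)

log_r, log_d = np.log(ranks), np.log(degrees)
slope, intercept = np.polyfit(log_r, log_d, 1)    # slope is the exponent "a"
corr = np.corrcoef(log_r, log_d)[0, 1]

print(f"exponent a ~ {slope:.2f}, correlation {corr:.3f}")
```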

The first thing that jumps out to me about the paper is that the dataset used to determine the fits is quite small: three inter-AS topologies from 6-month intervals and a single router topology from a different, earlier time period. Along with the fact that the datasets are so small, there is no measure of accuracy for any of the data points; the fact that the router topology does not correspond to the inter-AS data also makes it difficult to tell whether the fits mean anything.

Admittedly, I'm out of my depth in trying to evaluate the soundness of their statistical methods, but the authors seem to conclude that the close correlations are evidence enough that the power laws actually mean something. Further, it is difficult to tell exactly what the power laws imply about how ASs change their interconnections over time according to their individual goals and constraints. For example, their finding that the number of nodes grows by about 1.45x per year is interesting; although it may hold for ASs, I think it is unlikely to hold for routers (not that they could test it, given their single router data point), since ASs may add routers or replace them.

Wednesday, October 8, 2008

XORs in the Air

This paper extends the idea of using the broadcast nature of the wireless medium by adding another source of optimization: routers can encode packets together if they know that the next hops can decode them, and a router can know this by listening in on other transmissions. This effectively increases the usable capacity of the wireless links, since in some cases multiple packets destined for several next hops can be sent as a single transmission, given knowledge of what each destination has already heard.

The routing algorithm, called COPE, eavesdrops on neighboring transmissions. Each transmission contains information about what packets that node has heard, as well as acknowledgements for packets it has received, along with the packet(s) it is transmitting (some XORed together). In some cases, it aggressively encodes packets without knowing for sure that the receivers can decode them; this utilizes ETX to figure out the forward probability of each link. One interesting thing here is that unlike ExOR, it seems all packets are sent this way and none use traditional routing (there is no arbitrary 90% metric of where to go from one regime to another).
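
A toy sketch of that coding decision (the data structures and the way overheard packets are tracked are simplifications of what COPE actually does): starting from the head-of-queue packet, greedily XOR in additional packets for other next hops only if every intended receiver already has all the other packets in the XOR, so each can decode its own.

```python
# Sketch of a COPE-style coding decision.
def choose_coded_set(queue, has_overheard):
    """queue: list of (packet_id, next_hop); has_overheard[n] is the set of
    packet ids node n is known (or estimated via ETX) to already have."""
    head_id, head_hop = queue[0]
    coded = [(head_id, head_hop)]
    for pid, hop in queue[1:]:
        if hop in {h for _, h in coded}:
            continue  # at most one packet per next hop in a coded transmission
        candidate = coded + [(pid, hop)]
        # every receiver must hold all packets in the XOR except its own
        if all(all(q in has_overheard[h] for q, _ in candidate if q != p)
               for p, h in candidate):
            coded = candidate
    return coded  # these packets get XORed into a single transmission

queue = [("p1", "A"), ("p2", "B"), ("p3", "C")]
has_overheard = {"A": {"p2"}, "B": {"p1"}, "C": {"p1"}}
print(choose_coded_set(queue, has_overheard))  # p1+p2 can be coded; p3 cannot
```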

COPE utilizes bitmaps in a number of cases to reduce the overhead of the routing algorithm. One place this occurs is in the ACKing mechanism; there is an independent link-layer ACK algorithm for COPE, necessary because the aggressive encoding may be too aggressive (and, of course, the links are shared/unreliable).

COPE for UDP increases throughput, sometimes by large amounts, depending on the particular topology. However, the major problem with COPE is that it interacts badly with TCP because of TCP's problems with hidden nodes; COPE cannot help throughput because hidden nodes combined with congestion control lead to small queues and little opportunity for coding. This weakness is pretty major, but I think some obvious refinements may change this. For example, having TCP-awareness in the link algorithm (as in a previous paper we read) could make the link layer's ACKing mechanism work better with TCP ACKs to avoid unnecessary congestion control.

ExOR: Opportunistic Multi-Hop Routing for Wireless Networks

ExOR is the first wireless routing algorithm we have looked at that exploits the broadcast nature of wireless networks. In ExOR, the next hop is chosen after the transmission for that hop; that is, the route depends on who actually receives the packet and not JUST on the current routing state of the network (as in DSDV) or the route information at the sender (as in DSR). A major part of the algorithm is limiting duplicate transmissions from multiple nodes that each think they should forward the packet; for this, each node listens to make sure no higher-priority intermediate node has already forwarded it. Thus there is a tradeoff between waiting for higher-priority nodes and reducing overall routing latency.

ExOR's performance gain comes from essentially having many potential routes to choose from, unlike traditional routing, which has an all-or-nothing path for each hop. This works best when packet losses at the candidate receivers are independent.

Looking at the algorithm as described, there's several design points that seem somewhat arbitrary. For example, the backoff when no information is available about higher-priority nodes is five packet times; I fail to see why this is a reasonable assumption, although it may work in practice. That is, there is no evidence this parameter is well-chosen. Another arbitrary number is the ultimate destination sending 10 batch map packets; again, there's no reason to expect this is optimal, nor is there reason to believe that the algorithm is insensitive to this parameter.

In their tests with UDP, ExOR improves throughput. Some of this is undoubtedly due to ExOR being able to choose paths as they are routed, but I would like to see how ExOR compares to a modified 802.11 algorithm that integrates two changes to ACKing: one, having ten ACKs sent instead of one, and second, having ACKs flow backwards to the source. I'm curious to see if the impact of these two decisions can be quantified, perhaps leading to a slight change in 802.11 routing to make it perform almost as well as ExOR.

Tuesday, October 7, 2008

Performance Comparison of Multihop Wireless Ad-Hoc Routing Protocols

This paper compares several different routing protocols (more accurately, optimized versions of several different routing protocols) in a simulated wireless environment. It is interesting as a first study of the relative performance of these protocols; I don't have much criticism except that I'm not entirely convinced the protocols were "equally" optimized by the group. That is, the one that seemed most optimized is the protocol that performed best overall--- DSR.

Still, this is at least partially due to the use of source routing, which seems to be a better fit than the per-hop routing some of the other algorithms use. Combined with eavesdropping, which allows state updates with less overhead, this gives each node an idea of the reachability of, and route to, every other node; with DSDV and the like, each node still keeps route information about all other nodes, but only the next-hop information.

The eavesdropping and caching employed by DSR are the reasons it performs best; both of these are optimizations to the original DSR algorithm.

In addition, I thought it interesting that TORA's short-lived routing loops, created during route reversal, lead to many packet drops; in hindsight, creating loops seems like a poor decision in a routing algorithm. Finally, the last interesting thing in this paper is that large packets are undesirable in the scenarios used because of unfairness in the MAC protocol; thus wireless routing should use smaller packets than wired networks, where large packets are desirable for amortizing per-packet overheads.

Monday, October 6, 2008

High-Throughput Path Metric for Multi-Hop Wireless Routing

This paper introduces a metric for discovering routes on wireless ad-hoc networks which performs better than the usual minimum hop-count metrics from wired networks. The previous paper used a metric that looked at the "expected transmission time" of a message based on the throughput and reliability of each link; in this paper, the presented algorithm refines this by looking at the expected transmission count. Unlike other metrics, this is not dependent on the momentary load characteristics of the network.

Minimum hop-count routing works when the shortest path is also the fastest; however, because of the characteristics of wireless networks, this is often not the case (for example, intermediate nodes along a route can interfere with acknowledgements from the previous hop, because transmitting the packet onward can collide with the ACK being sent back).

The metric looks at delivery ratios both TO other nodes and for the ACKs coming back, which is a good refinement; based on these ratios, it decides which route to take. Because link probes are sent at a fixed, known rate, each node can calculate delivery ratios even without full knowledge of the link state. However, I noticed this calculation could be somewhat off since the probes are jittered; couldn't this lead to artificially low ratios in some cases?
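
The computation itself is tiny; here is a sketch (df and dr are the forward and reverse delivery ratios measured from the probes; the example numbers are made up):

```python
# Sketch of the ETX computation: a link's ETX is the expected number of
# transmissions for one successful data+ACK exchange, and a route's metric is
# the sum of its links' ETX values.
def link_etx(df: float, dr: float) -> float:
    return float("inf") if df == 0 or dr == 0 else 1.0 / (df * dr)

def route_etx(links):
    """links: list of (df, dr) pairs along the path."""
    return sum(link_etx(df, dr) for df, dr in links)

# Two decent hops can beat one very lossy hop:
print(route_etx([(0.9, 0.9), (0.9, 0.9)]))  # ~2.47 expected transmissions
print(route_etx([(0.3, 0.9)]))              # ~3.70 for the single lossy hop
```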

The paper analyzes adding ETX to DSDV and DSR. I thought their test methodology was a little odd because, for DSDV, they use a snapshotted route table instead of allowing it to evolve during the test. I'm not sure if this makes the numbers suspect, but it is something to keep in mind. Nevertheless, the measurements show ETX+DSDV is better than plain DSDV (although this may not hold if the tables evolve under heavy load...). However, ETX+DSR is not better than plain DSR; ETX uses rather small probes, which may not reflect how a route performs for large messages.

A simple modification to this algorithm could make performance better. Even doing something like using exponentially distributed probe sizes (and having them occur far apart for larger sizes) would give a better image of what the network routing actually will do for various size messages. Routes could be picked based on message size. Even using two different sizes ("small" and "large") may give a better view.

Thursday, October 2, 2008

Architecture and Evaluation of Roofnet

Roofnet is an 802.11 mesh network near MIT that uses volunteers to host access points, providing multi-hop wireless links to the internet via NAT. The major goal here is to ease deployment by not constraining node placement, forcing configuration to happen automatically, and using multi-hop rather than single-hop routing to extend the footprint of the network even to nodes that do not themselves have a wired connection. The use of multi-hop is one major difference between this network and those we have studied to date.

A custom route discovery mechanism called Srcr is used to periodically find the best routes from an access point through the network. Although the algorithm is very simple, it seems to work, albeit in an unscalable way: the flood packets, combined with re-forwarding when new, faster paths are discovered, would probably hurt performance as the number of nodes increased, or if nodes were packed so closely together that the local path-explosion problem occurred (i.e., the best path keeps changing marginally because all paths are essentially equal up to some jitter). Another custom algorithm manages bitrate selection, using heuristics that differ from 802.11's: Roofnet will tolerate high packet loss if the obtained throughput is higher than that of a low-bitrate selection where loss is low. IIRC, 802.11 considers packet loss very bad and attempts to reduce the bitrate even at the cost of overall throughput.
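
A hedged sketch of that bitrate-selection idea (the rates and delivery probabilities below are illustrative, and this captures only the spirit of Roofnet's heuristic, not its actual algorithm): pick the rate with the highest expected throughput, even if its loss rate is higher.

```python
# Prefer the bitrate with the highest expected throughput (rate times delivery
# probability), even when a slower rate would have lower loss.
RATES_MBPS = [1, 2, 5.5, 11]

def pick_bitrate(delivery_prob):
    """delivery_prob: measured per-rate delivery probability for this link."""
    return max(RATES_MBPS, key=lambda r: r * delivery_prob[r])

probs = {1: 0.95, 2: 0.9, 5.5: 0.7, 11: 0.3}
print(pick_bitrate(probs))   # 5.5 Mbps wins despite 30% loss
```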

Interestingly, the network functions despite a number of shortcomings. For one, most paths go through the same "important" short high-speed links. Even though the network could survive losing some nodes, the loss of these most important nodes would disproportionately reduce connectivity. In addition, even in the full network, some pairs cannot route from one to another; this isn't so much of a problem if all communication is to the internet.

The major interesting result for me was the fact that this network uses the original CSMA/CA algorithm from 802.11 (not b) and has CTS/RTS disabled. It seems to me that with this kind of network, the hidden node problem would be a big deal; in fact that was my first thought when the network was described in section 2. However, their experimental results show that CTS/RTS doesn't help at all.

I would like to know why this is: it seems counterintuitive. Is it due to the particular implementation? I believe the CA part of CSMA/CA implements essentially what DS does, so perhaps CTS/RTS is redundant here. Still, it would be interesting to see how prevalent the hidden node problem is on their network and whether 802.11's CA is sufficient.

Wednesday, October 1, 2008

Modeling Wireless Links for Transport Protocols

This was an enjoyable paper about the shortcomings of, and suggestions for improving, previous work that modeled the performance of transport protocols over wireless links. The paper begins by describing cellular, WLAN, and satellite networks and their various parameters; in this section, cellular traffic is considered to be mostly WAP/low-bandwidth internet access. Although this was reasonable when the paper was written, it is definitely changing with first-class browsers appearing on modern cellular networks; traffic in cellular networks now probably looks like some mix of WAP-like and HTTP-like traffic (I assume that browse-over-proxy systems such as Opera Mini look like something in between the two, but probably not entirely dissimilar).

The paper goes on to cite specific shortcomings in previous models, with a range of egregiousness. Some are extreme: 40% packet loss is one example cited (though I disagree with the paper there: that seems realistic, since the subsequent paper we look at reports similar loss rates). I was very surprised that some studies used symmetric models for WLAN, given the inability of a radio to send and receive at the same time (a physical limitation, I believe).

Lastly, the most useful sections were about modeling the various aspects of wireless links. In particular, these sections are well-organized, describing the effects, whether modeling them is important, and how to model them. I believe one simple modification to their On-Demand Resource Alloc model would make it more realistic: it seems that introducing a variable delay would make the latency spread out in the model as it does in the measured data; a simple uniform random delay between two thresholds seems enough.

Overall, the paper was very useful in helping understand issues of wireless link layer protocols and some of the ways they interact with the transport protocols. This even helped make the previous readings more clear.

Monday, September 29, 2008

Improving TCP Performance Over Wireless Links

In this paper, the authors compare several methods for improving wireless TCP performance, divided into three major types: end-to-end, link-layer, and split-connection. The end-to-end protocols require both ends to modify normal TCP; the link-layer protocols use the wireless link layer to improve performance, mostly by being aware of TCP's semantics; and the split-connection protocols give the non-wireless end the appearance of a non-lossy network connection, while dealing with loss by itself.

The major failing of TCP on lossy wireless networks is that TCP interprets packet loss as congestion, and in wired networks packets rarely get lost for any other reason. Thus, the basic strategy of all these schemes is to deal with packet loss through some other mechanism so that it does not trigger congestion control.

My major criticism is that each category of modification is treated in its own section, with little comparison across the various strategies until the last section. I think more detail on how the different strategies compare would have been useful; this is more of an organizational problem than a research question.

Overall, I think the most promising are the link-layer schemes, which (I believe) perform best. The end-to-end protocols give good performance, but because they require changes at both ends, they seem impractical to deploy. In addition, the ELN enhancement seems difficult to implement, even given their suggestions. The split-connection protocols, at least as implemented here, do not perform well.

MACAW

This paper presents a MAC protocol for a specific kind of wireless network and its interactions with wired TCP. In particular, the test case is an actual network of mobile wireless pads that move from room to room, with relatively small communication footprints. The base stations are in the ceilings of the rooms, leading to the low range (since the communicating devices are always more than six feet apart). Interestingly, the paper says that noise is not one of their major design points; I think this is somewhat unrealistic, given the noise sources present in all offices (microwaves, speakers, etc.).

The protocol designed starts off as something relatively simple: an RTS-CTS-MSG sequence.

After that, the designers use a number of scenarios to add on to their protocol, including adding ACKs, adding a Data-Send to let others know a message exchange is occurring, adding an RRTS, and so on. Although each is motivated by a specific example, the end result is a convoluted algorithm which has quite a complicated flow even in the usual case. Something so complicated gives many opportunities for flaws.

In addition, I'm not sure how handoff works in this algorithm. Since the ranges are so small, it seems like handoff would be something that occurs often.

Wednesday, September 17, 2008

Supporting Real-Time Applications in ISPN

Written around the same time as the previous paper, this one attempts to design a network architecture for three different classes of service: guaranteed (delivery within a certain advertised latency bound), predicted (a looser bound based on the observed level of traffic), and the usual datagram traffic. Based on the desire to support these classes, the network is designed around two kinds of real-time applications: ones that adapt and can tolerate some increased latency, and ones that do not adapt and are intolerant of increased latency.

WFQ is used for the guaranteed class, but the paper points out that FIFO is actually better than an FQ-type algorithm for predicted service because FIFO, in some sense, "distributes" the penalty for a burst over all the flows in the router, while WFQ penalizes only the bursting flow; predicted service prefers a small penalty for everyone over a large penalty for a single flow.

One major result here is a new kind of FIFO queueing (called FIFO+) that attempts to make packets behave as if served FIFO over all the links in the path, which lets the network handle the predicted class well. However, such an algorithm requires distributed control data: for each flow, one needs enough information to know which packets deserve to be served first. In addition, any advertised delay must be an aggregate of the delays along the path; this also involves communication and control record-keeping for any flow before it is actually admitted. These seem like large inefficiencies.

Overall, the algorithms in the paper do meet their goals in some sense. Given, however, the large increases in bandwidth as well as the major changes required in this proposal, it is no wonder it has not been implemented. Along with the large changes required in routing etc., there is also the need for additional record keeping for billing as well as other non-technical aspects that just seem impossible to overcome.

Fundamental Design Issues for the Future Internet

Looking at the future from 1995, around the time the Internet was being commercialized, this paper examines the design issues in moving from services that are latency-tolerant ("elastic," in the paper's terms) to services like video and audio, which require a better "class" of service. The paper advocates changing the network to accommodate different service types. Building a framework to evaluate different designs, it introduces a measure that maximizes the total "utility" across all flows--- with some flows requiring real-time service and others having less strict requirements.

A result of this paper is that sharing a network between flows that need fast service and elastic flows is actually more efficient than having separate networks for the two types of service. In addition, it is unrealistic to have the network "detect" the required type of service, so the network has to expose some kind of interface through which applications and users request a type of service.

Two major problems with having different kinds of service are 1) how to get people to settle for lower levels of service and 2) how to get Ethernet/ATM/etc. to support different classes of service. The first problem is fundamental, in that one must set incentives such that higher levels of service cost more--- the author rightly points out that this would require a huge culture change. In fact, the culture change is so large that I'm not sure it could be brought about, except through services people already pay for (cable TV, for example). Asking for more money for services users are used to getting for free is pretty insurmountable in the context of the internet.

Moreover, as Shenker points out over and over, his analysis is based on bandwidth continuing to be expensive. Many of these concerns go away if bandwidth can be increased by a huge amount; however, I think the desire for a low-latency class of service still remains today, and for applications such as gaming the issue is not high bandwidth but simply low latency. Thus there is a spectrum of latency and bandwidth requirements that would benefit from having different classes of service.

But while the Internet remains "good enough," this won't happen, especially given the invasiveness of the changes required. Even Ethernet wouldn't work. I don't have much critical to say about the paper, as it is more setting up a framework, one that I believe is reasonable given the concerns at the time.

Monday, September 15, 2008

Congestion Control for High Bandwidth-Delay Product Networks

The eXplicit Control Protocol (XCP) is introduced in this paper as a ground-up redesign of congestion control without the limitations of TCP. By providing a better feedback mechanism than dropping packets (in fact making the feedback an explicit part of the protocol and not dropping packets at all), XCP communicates both over- and under-utilization of resources to clients by having intermediate gateways modify packet headers and having clients react to that feedback. Separating efficiency and fairness into two different controllers, an XCP gateway attempts to reach the best efficiency in a fair manner; the efficiency controller's (EC's) decisions are implemented through feedback apportioned by the fairness controller (FC).

The shortest summary of the two controllers is that the EC uses multiplicative increase, multiplicative decrease (MIMD) to rapidly converge to high utilization, while the FC uses the usual AIMD to converge to fairness. Most interestingly, XCP's parameters can be set once for all workloads and do not depend on the kinds of traffic the gateway receives, unlike RED's.
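
To make the efficiency controller concrete, here is a sketch of its aggregate feedback computation (the alpha and beta constants are the ones reported in the paper; the per-packet apportioning done by the fairness controller is omitted):

```python
# Sketch of the XCP efficiency controller's aggregate feedback, computed once
# per control interval (roughly an average RTT).
ALPHA, BETA = 0.4, 0.226

def aggregate_feedback(capacity_Bps, input_Bps, persistent_queue_B, avg_rtt_s):
    spare = capacity_Bps - input_Bps                 # spare bandwidth S (bytes/s)
    return ALPHA * avg_rtt_s * spare - BETA * persistent_queue_B   # bytes

# Underutilized link: positive feedback tells senders to speed up.
print(aggregate_feedback(12_500_000, 8_000_000, 0, 0.1))        # > 0
# Full link with a standing queue: negative feedback drains the queue.
print(aggregate_feedback(12_500_000, 12_500_000, 50_000, 0.1))  # < 0
```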

The performance of XCP is explored in several simulations, and it achieves extremely high efficiency for any number of flows, as well as good fairness--- always much fairer than TCP. The one downside in the performance section is that it seems to be a bit less efficient than some of the other algorithms; the paper says it "trades a few percent of utilization for a considerably smaller queue size" but I fail to see the advantage of having smaller queue sizes if utilization is not being fully reached.

Of course, XCP is not TCP and integrating the two is covered in the paper to some extent. Gradual deployment would require gateways that understand both protocols and performed some fairness calculations between the two; this intermediate level of implementation seems much too complex, but unavoidable given TCP's hold.

The paper shows how congestion control "should" have been designed into routers and the protocol, as opposed to how it was integrated after the fact into TCP due to the discovery of failings of the original underlying algorithm.

Random Early Detection Gateways for Congestion Avoidance

Taking the viewpoint that the gateway is the best place to detect congestion, and that a gateway-based congestion avoidance scheme is tractable only if it keeps little per-connection state, the RED paper proposes to control congestion by maintaining a stable average queue size that can temporarily increase in response to transient congestion. RED gateways keep the average queue size between two threshold values by marking packets with some probability when the average lies between the thresholds (to keep queues from growing too large), and marking/dropping all packets when the maximum threshold is exceeded.

This results in essentially two algorithms: one for accommodating burstiness and another for maintaining a particular level of traffic. I found it somewhat odd that these algorithms are packet-based rather than byte-based; the way they mitigate this is to modify the algorithm to mark packets with probability proportional to packet size, which is not quite the same as making the queue calculation itself byte-based.
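
Here is a sketch of the marking decision in packet mode (parameter values are illustrative, not recommended settings; the byte-mode and "gentle" variants are omitted):

```python
import random

# Sketch of RED's per-arrival marking decision.
W_Q, MIN_TH, MAX_TH, MAX_P = 0.002, 5.0, 15.0, 0.02

avg = 0.0          # exponentially weighted average queue size (packets)
count = 0          # packets since the last mark, used to spread marks out

def on_packet_arrival(current_queue_len):
    """Return True if this packet should be marked (or dropped)."""
    global avg, count
    avg = (1 - W_Q) * avg + W_Q * current_queue_len
    if avg < MIN_TH:
        count = 0
        return False
    if avg >= MAX_TH:
        count = 0
        return True
    count += 1
    p_b = MAX_P * (avg - MIN_TH) / (MAX_TH - MIN_TH)
    p_a = p_b / max(1e-9, 1 - count * p_b)   # spreads marks roughly evenly
    if random.random() < p_a:
        count = 0
        return True
    return False
```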

The simulation results show that RED accomplishes its goals of maintaining average queue size and dropping packets with less bias against burstiness. Given TCP's limited feedback mechanism (dropping packets is really the only indication of congestion), the simulations demonstrate that RED utilizes packet dropping well in order to avoid congestion, by showing that when congestion begins to occur, the system drops packets, and the result is a reduction in average queue size.

RED is not fair in the max-min sense. In fact, the authors say fairness is not "well-defined," but I think there are reasonable fairness metrics--- RED does not in any way guarantee fairness or ensure equitable use of gateway resources. Like all the congestion avoidance schemes we've looked at, it assumes cooperative clients and does not deal with malicious users.

In addition, it is quite sensitive to parameter selection, and different parameters seem to work for different workloads. The authors do provide some guidelines for parameter selection, but it is not clear how the parameters interact with different traffic characteristics. Thus it seems to be a matter of guess-and-check to figure out the optimal configuration for each kind of traffic.

Overall, given the limitations of TCP and the desire to have a simple algorithm that prevents congestion, RED is an interesting and useful simple way to maintain performance in the face of approaching congestion.

Wednesday, September 10, 2008

Core-Stateless Fair Queueing

In this paper, a simpler queueing scheme is proposed that divides routers into "core" and "edge" routers: the edge routers sit at the boundary of an "island" of routers (roughly, an AS), while the core routers are inside it. In this regime, only edge routers require per-connection state, and they insert some of that state into packets as they enter the island. Core routers are stateless with respect to individual connections, but queue based on the information carried in the packet headers.

The basic algorithm estimates flow arrival rates using exponentially weighted averages. This is interesting, but I am not clear on why it is the right thing to do, other than the authors asserting that using constant parameters results in the wrong behavior.
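
A sketch of that division of labor (the averaging constant and the code structure are illustrative simplifications of the paper's scheme): edge routers estimate each flow's rate with an exponential average and label packets with it, while core routers keep no per-flow state and drop probabilistically when a label exceeds their fair-share estimate.

```python
import math, random

K = 0.1  # averaging constant (seconds); illustrative value

class EdgeFlowState:
    """Per-flow state kept only at edge routers."""
    def __init__(self):
        self.rate, self.last_arrival = 0.0, None

    def label(self, pkt_len, now):
        """Update the flow's estimated rate and return it as the packet label."""
        if self.last_arrival is not None:
            gap = max(1e-6, now - self.last_arrival)
            w = math.exp(-gap / K)
            self.rate = (1 - w) * (pkt_len / gap) + w * self.rate
        self.last_arrival = now
        return self.rate

def core_forward(label_rate, fair_share_rate):
    """Core router: drop with probability max(0, 1 - fair_share/label)."""
    p_drop = max(0.0, 1.0 - fair_share_rate / label_rate)
    return random.random() >= p_drop     # True = forward the packet
```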

The overall algorithm seems somewhat complicated, but since the edge routers are much fewer than the core routers, most routers can be implemented without having to do complicated state manipulation and computations. It is not as fair as the previous paper's algorithm, but requires less computation per packet. The algorithm does punish ill-behaved hosts, which is a good property.

In simulation, the CSFQ algorithm does quite well; although it is not as fair as the more complicated WFQ, it behaves reasonably fairly. However, even though the core routers don't have to be very complex, the edge ones do, requiring packet classification mechanisms, so how much of an improvement this is over FQ is not clear. In addition, I believe Moore's Law outpaces bandwidth increases, so all the computation involved in FQ may not actually be a problem.

Analysis and Simulation of Fair Queueing

This paper introduces a fair queueing (FQ) algorithm that replaces the usual first-come, first-served queueing in routers. The FQ algorithm gives each active connection a "fair" share under the max-min definition of fairness and, with the promptness modification, gives priority to low-bandwidth flows. This makes sense, since such flows are usually interactive connections that should be serviced promptly.
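
A toy sketch of the bookkeeping at the heart of the algorithm (the round-number computation is crudely approximated here, and the promptness/bid-number refinement is omitted): each arriving packet gets a finish number F = max(previous F for its flow, current round number) + packet length, and packets are transmitted in increasing order of F.

```python
import heapq

class FairQueue:
    def __init__(self):
        self.round = 0.0          # crude stand-in for the true round number
        self.last_finish = {}     # flow id -> finish number of its latest packet
        self.heap = []            # (finish_number, flow, packet)

    def enqueue(self, flow, packet, length):
        start = max(self.last_finish.get(flow, 0.0), self.round)
        finish = start + length
        self.last_finish[flow] = finish
        heapq.heappush(self.heap, (finish, flow, packet))

    def dequeue(self):
        finish, flow, packet = heapq.heappop(self.heap)
        self.round = finish       # advance virtual time to the served packet
        return flow, packet

fq = FairQueue()
for i in range(3):
    fq.enqueue("bulk", f"b{i}", length=1500)    # a greedy flow with big packets
fq.enqueue("telnet", "t0", length=64)           # an interactive flow
print([fq.dequeue()[1] for _ in range(4)])      # the small packet jumps ahead
```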

I would say the analysis of the algorithm is somewhat simplistic, but the fact is that there are many possible ways to analyze any queueing algorithm, and having just two types of connections is good enough to show some of the behavior characteristics. The most interesting result here is that the queueing algorithm on its own is not the only important parameter; one must also take into account the flow control algorithm. In fact, queueing by itself does not "fix" network behavior; interaction with flow control is important.

Under FQ, using "generic" flow control gives you less than your fair share--- the authors frame this as an incentive, but a contrarian view could look at it as a weakness of the algorithm. I think a better algorithm would penalize "generic" flow control but would only penalize misbehaving hosts which do not perform flow control or try to overuse their fair share.

In addition, the authors say that other scenarios result in confusing behavior; an example would be useful. Overall, the paper is good in showing that more sophisticated queueing and flow control can result in fairer service, but the large weakness here is that routers must perform quite a bit of work, including maintaining per-connection state, which may slow down routing overall.

Monday, September 8, 2008

Congestion Avoidance and Control

In response to the 1986 congestion collapse of the internet, the 4.3BSD TCP stack was changed to control congestion. This paper explores the major changes and reports the results through empirical experiments. In particular, the paper uses the principle of packet conservation, which says a flow in equilibrium should inject a new packet only when an old one leaves the network; this principle can be violated under three conditions, and each condition is dealt with by a separate algorithm. First, they implement "slow start," which allows a connection to reach equilibrium by starting with a single packet in flight and ramping up as acks are received.
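
A minimal sketch of the window growth during slow start, counted in segments (my simplification, not the paper's code):

```python
def slow_start(rtts, initial_cwnd=1):
    """Congestion window (in segments) after each round trip of slow start.
    The window grows by one segment per ACK received, which doubles it every
    round trip until a loss occurs or the equilibrium window is reached."""
    cwnd = initial_cwnd
    trace = [cwnd]
    for _ in range(rtts):
        cwnd += cwnd      # cwnd ACKs arrive this RTT, each adding one segment
        trace.append(cwnd)
    return trace

# slow_start(4) -> [1, 2, 4, 8, 16]
```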

To avoid unnecessary packet injections, the authors point out that the retransmit timer must be well designed and must take the variability of the network into account; a simple modification to the usual filter fixes this by also estimating the variation (mean deviation) of the round-trip time. Lastly, they introduce the congestion avoidance technique used in TCP stacks, which has its roots in the previous paper. Both use additive increase/multiplicative decrease, but the TCP algorithm is clever in that it requires no global state and just uses lost packets as the congestion signal. No feedback from the gateway or router is required.
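
A sketch of the retransmit-timer estimator as I understand it; the gains and the multiplier on the deviation below are the commonly used values, not something I am quoting from the paper verbatim:

```python
def update_rto(srtt, rttvar, measured_rtt, g=0.125, h=0.25):
    """Retransmit timeout from a smoothed RTT plus an estimate of its mean
    deviation (a cheap stand-in for the variance), so the timer automatically
    widens when the network becomes more variable."""
    err = measured_rtt - srtt
    srtt = srtt + g * err                    # smoothed round-trip time
    rttvar = rttvar + h * (abs(err) - rttvar)  # smoothed mean deviation
    rto = srtt + 4 * rttvar                  # timeout with headroom for variation
    return srtt, rttvar, rto
```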

This paper is quite interesting in that it presents refinements to each algorithm (much of which is not new) and then reports how the new algorithms fare in actual network performance. It is a validation of the congestion avoidance regime in the previous paper as well. In terms of its historical importance, the paper was of course instrumental in making the internet more robust as it approached capacity. Again, however, the paper assumes friendly/non-malicious clients; how this works in the face of malice is unclear. In particular, the gateway would probably need to be involved; not following the endpoint behaviors outlined here could be a simple signal for gateways to shut down a malicious user.

Analysis of Increase and Decrease Algorithms for Congestion Avoidance

The authors of this paper analyze algorithms for congestion avoidance using very limited feedback from the global resource--- a single bit sent from the router that indicates whether the network is overloaded or underloaded. Each sender reacts independently (and in a trustworthy manner) with some policy that reflects the goals for the network (such as efficiency, fairness, etc.). In analyzing the potential algorithms, the authors restrict themselves to linear control policies (they do briefly discuss nonlinear versions, but rightly point out that they may be too complicated); that is, each adjustment adds a constant, multiplies by a constant, or combines the two.

The corresponding diagrams on a plane are extremely informative, and intuitively explain some of the conclusions of the paper. For example, the authors advocate additive increases and multiplicative decreases, showing that they converge to optimality in terms of efficiency and fairness. This is because the additive increases move the allocations closer to equal (fair) shares, while the multiplicative decreases preserve the fairness ratio and pull the total load back toward the efficient operating point.
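
A tiny simulation (my own, with made-up numbers) illustrates the convergence argument for two users sharing a link:

```python
def aimd_step(x1, x2, capacity, alpha=1.0, beta=0.5):
    """One round of additive-increase/multiplicative-decrease for two users
    sharing a link. Additive increase shrinks nothing but keeps the gap fixed
    while raising the total; multiplicative decrease halves both rates (and
    hence the gap between them) whenever the single feedback bit says the
    link is overloaded."""
    if x1 + x2 > capacity:          # the one-bit overload signal
        return x1 * beta, x2 * beta
    return x1 + alpha, x2 + alpha

x1, x2 = 1.0, 20.0
for _ in range(200):
    x1, x2 = aimd_step(x1, x2, capacity=30)
# After many rounds x1 and x2 are nearly equal, oscillating around the fair share.
```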

Although this paper does not deal with TCP per se (the next one does), the congestion avoidance analyzed here is important to study because it was later used to improve TCP, after the early internet suffered several congestion collapses as it reached higher percentages of capacity.

The paper is essential even just for the graphical explanations of the various congestion avoidance schemes. These help explain each algorithm's relative effectiveness, and, when an algorithm is less than ideal, where and why it breaks down. However, the paper does not go into the specifics of what the parameters should be--- this would require some kind of simulation. Still, a specific example that shows behavior for some traffic patterns would be useful. In addition, how are the control parameters related to the network parameters (things like bandwidth, latency, number of nodes, etc.)? Even broad relationships between the network and the optimum congestion avoidance policy would be interesting. Lastly, of course, this scheme requires some sort of state at the router level in order to send feedback to the endpoints.

Given all of that, I think the essential point of the paper is important to remember: to converge to optimum efficiency and fairness, one should use additive increase and multiplicative decrease.


Wednesday, September 3, 2008

Inferring AS Relationships in the Internet

In this paper, the authors attempt to "map" the relationships between autonomous systems and the ways they interact through routing. First, the paper defines the different kinds of relationships between AS's, similar to how the Interdomain paper defines them, although this paper has several additional relationship types. Then, the paper shows how to construct a graph of the AS's where each AS is a vertex and their relationships are expressed as edges.

This is done by considering how routing behavior reflects the relationship. For example, a customer will not advertise routes learned from one provider to another provider (it does not offer transit between them); observing which routes are exchanged between two AS's therefore tells us something about how they are related. By using heuristics over observed BGP paths, the authors are able to map the AS relationships quite well.
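
To make the heuristic idea concrete, here is a crude sketch in the spirit of a degree-based inference rule (my paraphrase, not the authors' algorithm); the AS numbers are hypothetical, and peering/sibling detection and conflict resolution are ignored:

```python
from collections import defaultdict

def infer_relationships(as_paths):
    """Crude degree-based sketch: within each AS path, the AS with the largest
    degree is treated as the top provider; links on the way "up" to it are
    labeled customer->provider and links "down" from it provider->customer."""
    degree = defaultdict(set)
    for path in as_paths:
        for a, b in zip(path, path[1:]):
            degree[a].add(b)
            degree[b].add(a)

    edges = {}  # (a, b) -> 'c2p' means a is inferred to be a customer of b
    for path in as_paths:
        top = max(range(len(path)), key=lambda i: len(degree[path[i]]))
        for i, (a, b) in enumerate(zip(path, path[1:])):
            if i < top:
                edges[(a, b)] = 'c2p'   # a is a customer of b
            else:
                edges[(b, a)] = 'c2p'   # b is a customer of a
    return edges

# Example with hypothetical AS numbers observed from some vantage point.
paths = [["65001", "65100", "65200"],
         ["65002", "65100", "65200"],
         ["65003", "65200", "65100", "65001"]]
print(infer_relationships(paths))
```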

Admittedly, some of the details were over my head in terms of how the graph is constructed, but I understood the gist. This is not a fault of the paper, but rather due to my relative ignorance of BGP. I understood most of how the policies express relationships; I think the paper does a good job of formally detailing how policy and relationship can be used interchangeably.

The confirmation of their results, however, left me scratching my head. "As much as 99.1%" is quite an ambiguous number; this could be anything between 0 and 99.1. Furthermore, they don't specify what kinds of information AT&T internally has, or how it got it--- are their relationships accurate? If they only deal with AT&T's connectivity, then we can assume they are accurate, but what percentage of inferences does that cover? I was unsatisfied by their metrics in this sense.

Interdomain Internet Routing

This paper discusses how post-commercialization routing between different networks occurs, describing the possible relationships between domains and how BGP is used to exchange and propagate information from one network to another. In addition, the paper describes how policies for inter-domain routing are implemented by modifying BGP entries when importing or exporting them.

The entire process is quite complex, and there are no rules per se for how providers import and export routes. I think the paper does a good job of explaining the complexities by starting with an explanation of the different kinds of inter-AS agreements; this helps explain why distributing routing information is not simple and intuitive. BGP itself is explained well; although it is a simple point-to-point protocol, its complexities arise mostly from the import and export rules.
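
For reference, the export behavior implied by these commercial agreements is often summarized by a simple rule; the sketch below is my paraphrase of that folklore rule, not pseudocode from the paper:

```python
def should_export(route_learned_from, neighbor):
    """Typical export policy implied by commercial relationships: routes learned
    from customers are exported to everyone (they bring in revenue), while routes
    learned from providers or peers are exported only to customers, since an AS
    does not give free transit between its providers and peers."""
    if route_learned_from == "customer":
        return True
    return neighbor == "customer"

# Example: a route learned from a provider should not be exported to a peer.
print(should_export("provider", "peer"))      # False
print(should_export("customer", "provider"))  # True
```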

Clearly BGP is one of the aspects of the internet that has room for improvement. Because of all the odd complexities of the internet architecture, I'm not convinced that a point-to-point protocol is the only way. Even within a domain, some kind of collective decision-making might be more efficient in the end; some kind of distributed data structure (with sufficient replication), such as a distributed hash table, could make routing more robust against failures.

The paper itself was excellent for introducing the issues with interdomain routing and BGP; I knew very little of the details of the post-commercialization realities of routing.

Monday, September 1, 2008

Design Philosophy of the DARPA Internet Protocols

(link to paper)

By outlining the goals of the Internet protocols and explaining how the features of the protocols follow from the design goals, this paper gives insight into the historical decisions made in developing IP, TCP, and UDP. Since the major purpose was to connect disparate networks with very different underlying media and protocol guarantees, IP took little functionality as given. The secondary goals drove a number of decisions, such as using a "stateless" protocol (for survivability) and layering protocols with stronger guarantees on top of IP.

The paper also goes into some of the goals the protocols did not meet very well, some of which have changed since the paper was written. For example, accountability has become much more robust, with routers and other devices now able to record detailed usage statistics. Others, however, like the engineering issues, remain. Implementing a protocol stack is still quite difficult, requiring more than just following a spec because of performance issues.

As a historical document, the paper is essential because it shows some of the background in the decision-making process for IP, but a more updated version would explain how the state of the art has changed, enhancing some of the goals while making others less important. For example, I think modern IP implementations do keep some state ("soft state") at intermediate points. In addition, some detail on TCP's problems and solutions, although possibly out of scope, would drive home the kinds of things implementors need to be cognizant of.

An additional paper going into TCP motivations and issues would be sufficient to cover that shortcoming.

End-to-End Arguments in System Design

(link to paper)

The recurring argument/theme in this paper is that lower-level layers of system design are often not the best place to provide functionality like encryption, guaranteed message delivery, etc. because higher-level applications have more knowledge and may therefore need to replicate the functionality. In addition, the cost of providing some of these services at a low level is higher than building them at the application ("end") level.

As an example, a banking application may need to know that a withdrawal message was received by the server; the best way to do this is probably not through a low-level guaranteed message layer, since the message could be corrupted or tampered with in between the time it reaches the low-level messaging system of the server and the time the application receives it. What really needs to occur is that the application must acknowledge receiving (and acting on!) the withdrawal message.
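
A toy version of this, with placeholder send/recv functions and an illustrative request id, just to show where the end-to-end check lives:

```python
import uuid

def withdraw(send, recv, amount, max_retries=5):
    """End-to-end check: keep retransmitting the request until the *application*
    on the other side confirms it acted on this specific request. The request id
    makes retries idempotent; a lower-layer delivery guarantee alone cannot tell
    us that the server really performed the withdrawal."""
    request_id = str(uuid.uuid4())
    for _ in range(max_retries):
        send({"id": request_id, "op": "withdraw", "amount": amount})
        reply = recv(timeout=2.0)  # placeholder receive with a timeout
        if reply and reply.get("ack") == request_id and reply.get("status") == "done":
            return True   # the application, not the transport, confirmed it
    return False
```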

I agree with the general principle the paper espouses. However, I think (and the paper acknowledges this) the difficulty is in figuring out the "ends." How do we determine what they are? In addition, what does the argument mean in the context of layered messaging such as TCP/IP? Does it make sense to, instead of treating applications as the ends, treat layers as ends and build applications on top of the layers? When should a layering be used?

I think the paper leaves the determination of ends up for discussion, but as a discussion tool it is perfect for the syllabus, because it opens up space to discuss which functionality is essential and whether providing further features at each level is worth the cost. In that sense, the goal of the paper is to push designers to examine the cost of each function, and whether it would have to be replicated at the ends anyway, before building it into the lower-level layers of the design.