This paper presents Chord, one of the first implementations of a DHT. Using consistent hashing, where both the keys and the nodes live in the same single-dimensional space, the DHT assigns keys to nodes by looking at the closest node in the space that follows the key. Thus, there is no ambiguity in a stable DHT when it comes to which node is responsible for which key, and, because the nodes are distributed evenly in the space, with high probability each node is responsible for about the same number of keys.
Dealing with a query is done by either responsing to the query (if the current node is responsible for the queried key) or by forwarding the query to a node closer to the key in the identifier space. Using a skiplist like structure (called "finger pointers" here), Chord can use log N queries to find an item in the hash because the skiplist maintains pointers to power-of-2 distances; this means that in each forwarding operation, Chord makes the query get at least half the remaining distance closer than it was before.
The evaluation in this paper is interesting. One optimization they make is to have "virtual nodes" such that a physical node is logically several nodes at once; the one problem I see this causing (which doesn't seem to be addressed in the paper) is that this results in larger amounts of nodes going down when a single physical node disconnects or fails. An evaluation of this effect is important. The nodes that fail, then, are not in fact random as in section 6.4; nonrandom failures may cause Chord to fail.
Lastly, there is no thought given to robustness in the face of malicious nodes. This seems like an important aspect of a DHT, or any overlay network because these overlays are over the public internet.
Overall however, this is a nice paper and although it is missing some important considerations, I think it is a classic in terms of presenting a simple, elegant data structure that uses an overlay network and some performance results.
Wednesday, October 29, 2008
Looking Up Data in P2P Systems
This paper introduces the Distributed Hash Table data structure, which is a mapping from key to which node in an overlay network has the corresponding data to that key. DHTs are useful in creating a distributed store of key-value pairs existing in an overlay network, while remaining resilient to nodes entering and leaving the network as well as node failures.
The DHT implementations in the paper work by, when receiving a request for a lookup, either answering the lookup if the node is responsible for that key, or by forwarding the request to a node "closer" to the node responsible for the key. As a result, the paper divides the DHT implementations into those looking at distances in a single dimension (Chord, Pastry) and those looking at distances in multiple dimensions (CAN). Of these, the simplest is Chord, and the most complex routing algorithm described is that of CAN.
The open questions section is the most useful here: real-world fault tolerance under many simultaneous connections, disconnections, and failures is difficult to get right and it is unclear if any of the existing DHTs do a particularly good job. For example, the routing algorithm as described for CAN requires, after some disconnections and/or failures, a redistribution of the space which can be expensive. Although the paper does not say as much, it seems that perhaps the DHTs are not as robust (to failures as well as to malicious users) as their authors would like, nor are they as performant as users would like. These seem like the major issues to address.
The DHT implementations in the paper work by, when receiving a request for a lookup, either answering the lookup if the node is responsible for that key, or by forwarding the request to a node "closer" to the node responsible for the key. As a result, the paper divides the DHT implementations into those looking at distances in a single dimension (Chord, Pastry) and those looking at distances in multiple dimensions (CAN). Of these, the simplest is Chord, and the most complex routing algorithm described is that of CAN.
The open questions section is the most useful here: real-world fault tolerance under many simultaneous connections, disconnections, and failures is difficult to get right and it is unclear if any of the existing DHTs do a particularly good job. For example, the routing algorithm as described for CAN requires, after some disconnections and/or failures, a redistribution of the space which can be expensive. Although the paper does not say as much, it seems that perhaps the DHTs are not as robust (to failures as well as to malicious users) as their authors would like, nor are they as performant as users would like. These seem like the major issues to address.
Tuesday, October 28, 2008
Active Networks Vision and Reality
This paper presents lessons learned through the ANTS implementation of active networks, which are a kind of overlay network that allow users to write the routing code executed on each node within the network. The basic architecture is that each node has a runtime executing, while packets choose (by reference: an MD5 hash is a "pointer" to the code) which routing program to run. Such programs can be arbitrarily injected into the network, as long as they have been certified by some central authority that says they are not malicious.
The security issues of ANTS take much consideration in the design. They attempt to mitigate the threat of running arbitrary code by limiting the kinds of operations that can occur, the state available to read and write, and by using a central certifying authority to guarantee safety. In fact, if one is going to use a certifying authority, it may make sense to shortcut the safety guarantees of the runtime and allow more efficient machine code to run on each router instead of java bytecodes running in a VM. It seems the reason the implementation works as it does is that the designers wanted to have arbitrary non-certified code running, but could not figure out how to make global guarantees in terms of not allowing a single user or program from consuming more than a fair share of global resources.
This could be tackled by having more coupling between routing nodes, using some sort of gossip protocol or other way to exchange information. Such algorithms could guarantee that only some fraction of the routers can lie at a time, making the information somewhat reliable. However, there may just be no way to represent global resource usage in a way that reliably guarantees the information is accurate enough to make a decision whether to run a program that is not certified.
The security issues of ANTS take much consideration in the design. They attempt to mitigate the threat of running arbitrary code by limiting the kinds of operations that can occur, the state available to read and write, and by using a central certifying authority to guarantee safety. In fact, if one is going to use a certifying authority, it may make sense to shortcut the safety guarantees of the runtime and allow more efficient machine code to run on each router instead of java bytecodes running in a VM. It seems the reason the implementation works as it does is that the designers wanted to have arbitrary non-certified code running, but could not figure out how to make global guarantees in terms of not allowing a single user or program from consuming more than a fair share of global resources.
This could be tackled by having more coupling between routing nodes, using some sort of gossip protocol or other way to exchange information. Such algorithms could guarantee that only some fraction of the routers can lie at a time, making the information somewhat reliable. However, there may just be no way to represent global resource usage in a way that reliably guarantees the information is accurate enough to make a decision whether to run a program that is not certified.
Resilient Overlay Networks
In this paper, an application-layer overlay network is built using UDP in order to build a more resilient routing infrastructure that, because of its smaller size, can respond to disruptions and slowdowns faster and more flexibly than the traditional BGP approach between ASs. The particular implementation of these application-level routers in this paper uses a single-hop intermediary between each node in order to route around failures; this gives essentiall n-2 paths where before there was as single path (as controlled by information obtained via BGP).
With this path multiplicity, applications can choose how which paths (or under which metrics paths should be chosen) depending on their requirements. In addition to this application-level choice, routers can be more powerful in terms of having much more detailed policies for routing, since the network is essentially a private one built on top of the public internet and because the networks targeted are small relative to the size of ASs. Thus routers can spend more time per packet than a gateway router in the internet can.
One thing that jumps out to me is that RON (unlike the next paper) is geared towards a network that has nodes that can be implicitly trusted. For example, the use of policy tags to bypass some of the per-packet computation is subject to trusting the policy tags you receive; there is nothing to prevent a malicious user from rewriting policy tags to ensure they get the highest level of service while others do not.
The evaluation results are very interesting. In almost every metric, RON improves the metric *in most cases*, but there are always some measurements under which RON does substantially worse. For example, in dealing with packet loss, RON helps improve path reliability in most cases, but there are always a few percentage of samples that show worse behavior during outages: that is, some packets that would be routed if using just underlying connectivity do not get routed with RON. Similarly, RON can make loss rates worse (although the authors believe this is due to using bidirectional information instead of unidirectional; it remains to be seen if that would make a difference).
With this path multiplicity, applications can choose how which paths (or under which metrics paths should be chosen) depending on their requirements. In addition to this application-level choice, routers can be more powerful in terms of having much more detailed policies for routing, since the network is essentially a private one built on top of the public internet and because the networks targeted are small relative to the size of ASs. Thus routers can spend more time per packet than a gateway router in the internet can.
One thing that jumps out to me is that RON (unlike the next paper) is geared towards a network that has nodes that can be implicitly trusted. For example, the use of policy tags to bypass some of the per-packet computation is subject to trusting the policy tags you receive; there is nothing to prevent a malicious user from rewriting policy tags to ensure they get the highest level of service while others do not.
The evaluation results are very interesting. In almost every metric, RON improves the metric *in most cases*, but there are always some measurements under which RON does substantially worse. For example, in dealing with packet loss, RON helps improve path reliability in most cases, but there are always a few percentage of samples that show worse behavior during outages: that is, some packets that would be routed if using just underlying connectivity do not get routed with RON. Similarly, RON can make loss rates worse (although the authors believe this is due to using bidirectional information instead of unidirectional; it remains to be seen if that would make a difference).
Monday, October 20, 2008
First-Principles Approach to Understanding Internet Router-Level Topology
This paper also attempts to understand how routers interconnect in the Internet, but instead of using power laws, they use a "first principles" approach by taking into account both technology restrictions of routers and the economic principles guiding ASs to interconnect routers--- this means looking at hierarchies of ASs and routers as well. That is, taking into account that ASs can be level-1, level-2, or level-3 and therefore have proportional traffic to their level.
Criticizing the previous paper's methodology, the authors rightly claim that power laws hold for graphs that look much different from each other; edge rerouting schemes can take one power law graph and turn it into another with very different connectivity and radius but with the same distribution, for example, of ranks. They also point out that there is little evidence the power law means anything for the topologies since there were so few data points.
I think the methodology here is much more sound. First, they actually use subsets of the internet as well as a model of how routers can connect hierarchically in the ideal case; the internet topology is found to be in a region that, measured with their "likelihood" metric, can only have come about by using non-random strategies to connect routers. Therefore generating graphs randomly, even if they follow some distribution of ranks and other metrics, does not seem like a good way to create a representative topology.
Power Law Relationships of the Internet Topology
This paper investigates aspects of internet topology, determining whether power laws of the form "x is proportional to y^a" for quantities x, y, and constant "a" exist in the graphs representing the Internet at the inter-AS and inter-router level. The motivation of this is that using averages/min/max distills the data too far: it gives very little information about the actual distribution of values.
As an alternative, this paper proposes essentially ordering the nodes (whether they are ASs or routers) by some metric, and then expressing the slope of the distribution on a log-log scale (which is "a" in the power law). Then, to evaluate the accuracy/effectiveness of the power law, the authors correlate the slope with the actual values (i.e. they measure the closeness of the fit). In addition, it is important to remember the paper's goal is to come up with some good metrics to measure the validity of internet graph model generators.
The first thing that jumps out to me about the paper is that the dataset used to determine the fits is quite small: three inter-AS topologies from 6-month intervals and a single router topology from a different, earlier time period. Along with the fact that the datasets are so small, there is no measure of accuracy for any of the data points; the fact that the router topology does not correspond to the inter-AS data also makes it difficult to tell whether the fits mean anything.
Admittedly, I'm out of my depth in trying to evaluate the soundness of their statistical methods, but the authors seem conclude that the close correlations are evidence enough that the power laws actually mean something. Further, it is difficult to tell exactly what the power laws mean in terms of how ASs change their interconnections over time according to the goals and restrictions of each. For example, their 1.45x number of nodes increase per year seems like an interesting number; although it may hold for ASs, I think for routers it is unlikely to hold (not that they could test it, given their one data point) since ASs may add routers or replace them.
Wednesday, October 8, 2008
XORs in the Air
This paper extends the idea of using the broadcast nature of the wireless medium by adding another source of optimization: routers can encode packets if they know that the next hop can decode the packets; a router can know this by listening in on other transmissions. This basically allows the algorithm to increase the theoretical capacity of the wireless links, since in some cases multiple packets destined for several next hops can be sent as a single packet, given knowledge of what each destination has heard.
The routing algorithm, called COPE, eavesdrops on neighboring transmissions. Each transmission contains information about what packets that node has heard, as well as acknowledgements for packets it has received, along with the packet(s) it is transmitting (some XORed together). In some cases, it aggressively encodes packets without knowing for sure that the receivers can decode them; this utilizes ETX to figure out the forward probability of each link. One interesting thing here is that unlike ExOR, it seems all packets are sent this way and none use traditional routing (there is no arbitrary 90% metric of where to go from one regime to another).
COPE utilizes bitmaps in a number of cases to reduce the overhead of the routing algorithm. One place this occurs is in the ACKing mechanism; there is an independent link-layer ACK algorithm for COPE, necessary because the aggressive encoding may be too aggressive (and, of course, the links are shared/unreliable).
COPE for UDP increases throughput, sometimes by large amounts, depending on the particular topology. However, the major problem with COPE is that it interacts badly with TCP due to to TCP's problems with hidden nodes; COPE cannot help throughput because hidden nodes combined with congestion control gives small queues and little opportunity for coding. This weakness is pretty major, but I think that some obvious refinements may change this. For example, having TCP-awareness in the link algorithm (as in a previous paper we read) could make the ACKing mechanism of the link layer work better with TCP ACKs to avoid unnecessary congestion control.
ExOR: Opportunistic Multi-Hop Routing for Wireless Networks
ExOR is the first wireless algorithm we look at that exploits the broadcast nature of wireless networks to build a new routing algorithm. In ExOR, the next hop is chosen after the transmission for that hop; that is, the route depends on who receives the message and not JUST on the current routing state of the network (as in DSDV) or the route information at the sender (like DSR). The major part of the algorithm is limiting duplicate transmissions from multiple nodes that think they should be sending the next hop; for this, each node listens to make sure any higher-priority intermediate node has not already forwarded the packet. Thus there is a tradeoff between waiting for a higher-priority routes and reducing overall routing latency.
ExOR's performance gain comes from essentially having many potential routes to choose from, unlike traditional routing which only has an all-or-nothing path for each hop. This works best when there is independent interference at each.
Looking at the algorithm as described, there's several design points that seem somewhat arbitrary. For example, the backoff when no information is available about higher-priority nodes is five packet times; I fail to see why this is a reasonable assumption, although it may work in practice. That is, there is no evidence this parameter is well-chosen. Another arbitrary number is the ultimate destination sending 10 batch map packets; again, there's no reason to expect this is optimal, nor is there reason to believe that the algorithm is insensitive to this parameter.
In their tests with UDP, ExOR improves throughput. Some of this is undoubtedly due to ExOR being able to choose paths as they are routed, but I would like to see how ExOR compares to a modified 802.11 algorithm that integrates two changes to ACKing: one, having ten ACKs sent instead of one, and second, having ACKs flow backwards to the source. I'm curious to see if the impact of these two decisions can be quantified, perhaps leading to a slight change in 802.11 routing to make it perform almost as well as ExOR.
Tuesday, October 7, 2008
Performance Comparison of Multihop Wireless Ad-Hoc Routing Protocols
This paper compares several different routing protocols (more accurately, it compares optimized versions of several different routing protocols) in a simulated wireless environment. It is interesting in that it is a first study of the relative performance of these protocols; I don't have too much criticism except that I'm not necessarily convinved that the protocols were "equally" optimized by the group. That is, the one that seemed most optimized is the protocol that performed best overall--- DSR.
Still, this is at least partially due to the use of source routing which seems to be a better fit than per-hop routing that some of the other algorithms use. Combined with eavesdropping, which allows state updates with less overhead, this gives each node an idea about the accessibility and route to each other node; with DSDV etc, the route information at each node is still about routes to all other nodes, but only keeps the next hop information.
The eavesdropping and caching employed by DSR are the reasons it performs best; both of these are optimizations to the original DSR algorithm.
In addition, I thought it interesting that TORA's short-lived routing loops created during route reversal lead to many packet drops; in hindsight, this seems like a poor decision in a routing algorithm (creating loops). Finally, the last interesting thing in this paper is that large packets are undesirable for the scenario used because of unfairness in MAC protocols; thus wireless routing should use smaller packets than those used in wired networks, where large packets can be desirable for amortizing the bandwidth-latency product.
Monday, October 6, 2008
HIgh-Throughput Path Metric for Multi-Hop Wireless Routing
This paper introduces a metric for discovering routes on wireless ad-hoc networks which performs better than the usual minimum hop-count metrics from wired networks. The previous paper used a metric that looked at the "expected transmission time" of a message based on the throughput and reliability of each link; in this paper, the presented algorithm refines this by looking at the expected transmission count. Unlike other metrics, this is not dependent on the momentary load characteristics of the network.
Minimum hop routing works when the shortest path is also the fastest; however, because of the characteristics of wireless networks, this is often not the case (for example, intermediate nodes in routing can interfere with acknowledgements from the previous node because transmitting the packet may override transmitting an ACK).
The algorithm looks at delivery ratios both TO other nodes and the delivery ratios of the ACKS, which is a good refinement. Based on these ratios, it decides which route to take. Because the link probes are sent at some fixed known rate, each node can calculate the delivery ratios even without full knowledge of the link state. However, I noticed this calculation could be somewhat off since the probes are jittered; couldn't this lead to artificially low ratios in some cases?
The paper analyzes adding ETX to DSDV and DSR. I thought their test methodology was a little odd because they use a snapshotted route table instead of allowing it to evolve through the test for DSDV. I'm not sure if this makes the numbers suspect, but it is something to keep in mind. Nevertheless, the measurements show ETX+DSDV is better than plain DSDV (although this may not hold if the tables evolve under heavy load...). However, ETX+DSR is not better than DSR; ETX uses rather small probes even to check for how the network route performs for large messages.
A simple modification to this algorithm could make performance better. Even doing something like using exponentially distributed probe sizes (and having them occur far apart for larger sizes) would give a better image of what the network routing actually will do for various size messages. Routes could be picked based on message size. Even using two different sizes ("small" and "large") may give a better view.
Thursday, October 2, 2008
Architecture and Evaluation of Roofnet
Roofnet is a 802.11 mesh network near MIT that uses volunteers to host access points, providing multi-hop wireless links to the internet via NAT. The major goal here is to ease deployment by not constraining node placement, forcing configuration to occur automatically, and use multi-hop rather than single-hop routing to extend the footprint of the network even to nodes that by themselves do not have a wired connection. The use of multi-hop is one major difference between this network and those we have studied to date.
A custom route discovery mechanism called Srcr, is used to periodically detect best routes from an access point through the network. Although the algorithm is very simple, it seems to work, albeit in an unscalable way: the flood packets combined with the re-forwarding when new faster paths are discovered are unscalable and would probably hurt performance as the number of nodes increased or if nodes were so closely packed together that the local path explosion problem occurred (i.e. the best path keeps changing marginally because all paths are essentially equal with some jitter). Another custom algorithm manages bitrate selection, using heuristics that are not the same as 802.11: Roofnet will tolerate high packet loss if the obtained bandwidth is higher than a low-bitrate selection where loss is low. IIRC, 802.11 considers packet loss very bad and attempts to reduce bitrate even at the cost of overall throughput.
Interestingly, the network functions despite a number of shortcomings. For one, most paths go through the same "important" short high-speed links. Even though the network could survive losing some nodes, the loss of these most important nodes would disproportionately reduce connectivity. In addition, even in the full network, some pairs cannot route from one to another; this isn't so much of a problem if all communication is to the internet.
The major interesting result for me was the fact that this network uses the original CSMA/CA algorithm from 802.11 (not b) and has CTS/RTS disabled. It seems to me that with this kind of network, the hidden node problem would be a big deal; in fact that was my first thought when the network was described in section 2. However, their experimental results show that CTS/RTS doesn't help at all.
I would like to know why this is: it seems counterintuitive. Is it due to the particular implementation? I believe the CA part of CSMA/CA implements essentially what DS does, so perhaps CTS/RTS is redundant here. Still, it would be interesting to see how prevalent the hidden node problem is on their network and whether 802.11's CA is sufficient.
Wednesday, October 1, 2008
Modeling Wireless Links for Transport Protocols
This was an enjoyable paper about shortcomings and suggestions for improvement in previous work that modeled the performance of transport layers on wireless links. The paper begins by describing cellular, WLAN, and satellite networks and their various parameters; in this section, the cellular traffic is considered mostly to be WAP/low-BW internet access. Although this is reasonable for the time the paper was written, this is definitely changing with first-class browsers appearing on modern cellular networks; the traffic overall in the cellular networks probably looks like some mix of WAP-like and HTTP-like traffic (I assume that browse-over-proxy such as Opera Mini looks something in the middle of the two, but probably not entirely dissimilar).
The paper goes on to cite specific shortcomings in some previous models, with a range of egregiousness. Some are extreme: 40% packet loss, for example is one thing cited here (but I disagree with this paper: it is realistic since the subsequent paper we look at has similar loss rates). I was very surprised that some studies considered symmetric models for WLAN, given the inability of radio to send/recv at the same time (a physical limitation, I believe).
Lastly, the most useful sections were about modeling the various aspects of wireless links. In particular, these sections are well-organized, describing the effects, whether modeling them is important, and how to model them. I believe one simple modification to their On-Demand Resource Alloc model would make it more realistic: it seems that introducing a variable delay would make the latency spread out in the model as it does in the measured data; a simple uniform random delay between two thresholds seems enough.
Overall, the paper was very useful in helping understand issues of wireless link layer protocols and some of the ways they interact with the transport protocols. This even helped make the previous readings more clear.
Subscribe to:
Posts (Atom)