Monday, November 24, 2008

Policy-Aware Switching Layer for Data Centers

In this paper, the authors attempt to build a new infrastructure for data centers that allows arbitrary, policy-aware chaining of middleboxes, which, as we have seen in previous papers, are useful but do not fit into the clean layering semantics of network protocols. Traditionally, physical connectivity is used to ensure that messages traverse a specific set of middleboxes in a data center, but this is prone to many problems, as the paper spends a section pointing out.

Since we've already studied the various problems of middleboxes, I'll mention only the one thing in this paper that was new to me: spanning-tree algorithms and the like at layer 2 can be very difficult to coax into a physical topology that actually enforces the logical topology of middleboxes. I hadn't considered that some middleboxes rewrite Ethernet frames and/or are not accessible via an IP address, which forces this reliance on layer 2.

The actual idea is fairly simple: modified commodity switches are divided into a "switching core" and a "policy core," with the two pieces looking like normal switch components to each other. A centralized policy is then distributed to each of the so-called pSwitches in a manner that ensures the policy is enforced.
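To make the idea concrete, here is a minimal sketch (in Python, with all names and the policy format invented by me, not taken from the paper) of how a pSwitch might pick the next hop for a frame, given a centrally-specified middlebox chain per traffic class:

```python
# Toy sketch of policy-driven next-hop selection. A policy here is just an
# ordered middlebox chain per traffic class; the real system's policy
# language and classification are more involved.

POLICY = {
    # traffic class -> ordered chain of middlebox types to traverse
    "web": ["firewall", "load_balancer"],
    "erp": ["firewall", "ids"],
}

def next_hop(traffic_class, prev_hop):
    """Return the next middlebox type a frame must visit, or 'server'
    once the whole chain for its class has been traversed."""
    chain = POLICY[traffic_class]
    if prev_hop is None:            # frame just entered the data center
        return chain[0]
    i = chain.index(prev_hop)
    return chain[i + 1] if i + 1 < len(chain) else "server"
```

The key point is that the traversal order is derived from the distributed policy rather than from physical wiring, so rewiring is replaced by a table update.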

However, the requirement that unmodified middleboxes work with the system causes a number of headaches. First, stateful boxes must always receive the frames/packets that their state requires (for example, a specific firewall instance must see every packet of a given flow). In addition, the authors want the policies to be respected under all kinds of churn; allowing some laxness in the churn rules would have made the implementation quite a bit easier. This is where the most "hacky" part of the paper comes in: the use of intermediate middleboxes of each type to ensure robustness under churn. I see no alternative way to implement this without modifying the middleboxes or allowing some failures when churn occurs.

Overall this paper is a thorough design document for a reasonable system and is useful as such. Performance is less important in this phase of implementation, but I believe that as long as switching decisions can be done quickly, intra-datacenter latencies are low, and pSwitches implement fast matching, the system will not add an appreciable amount of latency when compared to internet latencies, while introducing a mechanism that is useful to almost every datacenter of decent size.

Monday, November 17, 2008

Delay-Tolerant Network Architecture for Challenged Internets

"Challenged Internets" are a class of network that have characteristics such as long delays, low reliability, and periodic disconnection that make the IP protocol an infeasible way to interconnect such networks; each network uses ad-hoc protocols that do not necessarily provide the baseline best-effort packetized service required by IP. Furthermore, TCP and similar protocols require end-to-end connectivity, which may rarely happen in these challenged networks. Internetworking them, in the vision of the paper, requires a new set of underlying protocols similar in spirit to IP but one that takes into account the differing requirements of these networks.

The argument in the paper is that using PEPs or middleboxes to make these challenged networks appear as "normal" IP networks is insufficient and only really works if the challenged networks are on the edges. I agree with the latter part of the argument; it doesn't seem very easy to connect two different such networks through IP middleboxes in the middle of an internet.

Basically, the new internetworking protocol for challenged networks needs to provide several classes of service that are unlike IP's. The paper uses the USPS as a basic point of departure for the kinds of service these networks require, and proposes the same basic set of services. To me, the key difference between these networks and the internet is that intermediate nodes must hold more than just transient connection state; communication really is hop-by-hop, where at each point the endpoints OF THE HOP hold all the state.

One interesting thing is how they propose to handle naming. In this approach, there is a globally-resolvable set of region names, and within each region, a locally-resolvable name is used. A name is thus a tuple meaning "route to this network," followed by the address of the resource as a locally-resolvable name within that network. This is probably doable given the limited size (in number of networks) such an internet would have.
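A toy sketch of this two-level naming scheme, with all region names, tables, and addresses invented for illustration:

```python
# Two-part (region, local-name) naming: the region part is globally
# resolvable and gets the bundle to the right network; the local part
# is opaque until arrival and is resolved only inside that network.

REGION_ROUTES = {                 # globally resolvable: region -> gateway
    "internet": "gw-a",
    "sensornet.field7": "gw-b",
}

LOCAL_NAMES = {                   # resolvable only inside sensornet.field7
    "mote-12": "radio-addr-0x2f",
}

def route(name):
    """Pick the gateway for the region; leave the local name unresolved
    (late binding) until the bundle reaches that network."""
    region, entity = name
    return REGION_ROUTES[region], entity

def resolve_locally(entity):
    """Performed only by the destination network itself."""
    return LOCAL_NAMES[entity]
```

The late binding is the point: no global resolver ever needs to know what "mote-12" means, only which network to hand the bundle to.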

Architecture for Internet Data Transfer

This paper advocates for and designs a service-oriented approach to data transfer, in which applications rely on a lower-layer transfer service to send data from one location to another, instead of reimplementing different transfer methods in each application. The idea is that the service would be the only implementation of each transfer backend, and that by using the service, disparate applications can benefit without reengineering each application for each transfer type.

This service, called DOT, is designed to not only perform transfers, but to also cache data in case it can be used to shortcut future transfers; combined with the right kind of hashing, this allows a reduction in redundant data transfers even if the data do not look exactly the same. To me, that was the most interesting part of the paper; the parts outlining the interfaces and APIs are relatively straightforward. The service is receiver-pull, and data is "pointed to" by its hash as in other systems we've studied this semester. Applications pass OIDs (which contain the hash and some other data) and the receiver initiates chunk-oriented transfers; the chunking allows caching to be more effective (for example, email message replies combined with proper chunking algorithms allow chunk caching to bypass sending the previous message's text).
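As a rough illustration of the chunk-cache idea (using fixed-size chunks and SHA-1 purely for simplicity; the real system uses more sophisticated content-based chunking, and all names here are mine):

```python
import hashlib

CHUNK = 4              # absurdly small chunk size, just for illustration
cache = {}             # hash -> chunk bytes, shared across all transfers

def chunk_hashes(data):
    """Split data into chunks and return (hashes, chunks); the hash list
    plays the role of the descriptor the receiver pulls against."""
    chunks = [data[i:i + CHUNK] for i in range(0, len(data), CHUNK)]
    return [hashlib.sha1(c).hexdigest() for c in chunks], chunks

def receive(hashes, fetch):
    """Receiver-pull: fetch only chunks not already cached, and count
    how many actually crossed the network."""
    out, fetched = [], 0
    for h in hashes:
        if h not in cache:
            cache[h] = fetch(h)
            fetched += 1
        out.append(cache[h])
    return b"".join(out), fetched
```

Transferring a second payload that shares a chunk with the first (like a reply quoting an earlier email) then fetches only the new chunks, which is exactly the redundancy-elimination win described above.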

DOT also exports a multi-path plugin that allows multiple transfer methods to be used to speed up transfers, something that shows the power of the system: implementing something like that would be much more difficult in a less-modular, application-specific backend for each application. Benchmark results show savings as high as 52%, which is substantial.

Lastly, the authors do a case study with the postfix email server that demonstrates the ease of changing apps to use DOT, as well as the potential savings.

Overall, the authors have a nice idea that could be useful for reducing the work of adding new transfer methods to applications. However, it seems to me that the problem they solve is not egregious enough to necessarily convince application writers to adopt their system. I don't believe that transfer methods are seen as a major problem in the internet, and thus the need for DOT seems less than obvious.

Wednesday, November 12, 2008

X-Trace: A Pervasive Network Tracing Framework
This paper describes the X-Trace network tracing framework, which operates at all protocol layers in concert to obtain comprehensive information about network activity, including causal information as well as propagation of calls from layer to layer. This is done by propagating trace information to each layer and ensuring all trace information is conveyed to some accessible location.
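A highly simplified sketch of that propagation idea, loosely modeled on the paper's pushNext/pushDown operations (the data structures, id scheme, and reporting format here are all invented):

```python
import itertools

_ids = itertools.count(1)
REPORTS = []          # stands in for the offline report-collection service

def new_task():
    """Start a trace: a fresh task id plus a root operation id. The task
    id is what gets carried, unchanged, through every layer."""
    return {"task": next(_ids), "op": 0}

def _propagate(md, edge):
    child = {"task": md["task"], "op": next(_ids)}
    # Record the causal edge (task, parent op, child op, edge type) so an
    # offline tool can reassemble the whole task tree across layers.
    REPORTS.append((md["task"], md["op"], child["op"], edge))
    return child

def push_next(md):    # next operation at the same layer (e.g. HTTP -> HTTP)
    return _propagate(md, "next")

def push_down(md):    # call into the layer below (e.g. HTTP -> TCP)
    return _propagate(md, "down")
```

Because every operation copies the task id and reports its parent, the collector can rebuild both same-layer and cross-layer causality, which is what distinguishes this from per-layer logging.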

One major goal of the system is to ensure that each domain can control its own trace data, which means that although a user may want comprehensive information for the entire life of their application, they may be limited by what each domain conveys. This makes sense from an economic and security standpoint, and I think it is a good idea.

The basic way the system works is that each protocol implementation is modified to enable X-Trace tracking. This is highly invasive, but the results make the effort worthwhile in many cases. The degree of modification is different for the different layers and protocols; some become very complicated and I'm not sure they're worth modifying. Still, the information obtained is quite compelling, including causality and the ability to trace from the beginning of a call all the way to the end, through each layer.

One thing that stood out was that modifying these protocols, basically by interfacing with libraries, has a security impact. The libraries had better be robust against all kinds of attacks; even if the infrastructure itself is safe from attack, a single flaw in the library implementations could make *all* modified protocols vulnerable to one universal attack. However, it may be that the library layers are thin enough that they can be extensively security-audited and ensured to be "safe."

End-to-End Internet Packet Dynamics

This long paper analyzes the results of an unprecedented large-scale study of internet packets, using traces obtained by sampling at 35 large internet sites with a specialized measurement framework, resulting in thousands of measurements over the course of several days. This was done twice, separated by a year. Overall, the paper is a compendium of illuminating analyses, and I think it is a necessary read. I'll comment a bit on the results I found most interesting.

Using these traces, the paper looks at network pathologies and then analyzes aspects of performance, including packet loss, delay, and bottleneck bandwidth. Each of these sections is quite interesting, given that almost no other studies examine these phenomena on this scale. For example, they find that packet reordering is actually quite common (even though individual routers try very hard to avoid reordering) due to route changes, but that TCP as implemented does not suffer much in performance because of it. Even though the duplicate-ack threshold was not chosen experimentally, it seems to do quite well (perhaps even ideally) at avoiding unnecessary retransmissions due to reordering. Observations like this are very interesting and not present in many other papers.

Another observation is that packet corruption is relatively rare, even though TCP's checksumming capabilities are not sufficient to catch all of it. Doubling the checksum's width at the cost of two bytes would make undetected corruption negligibly rare, but it's not clear that this is a big enough problem to matter: the E2E principle suggests that handling it at a higher level is the better solution.
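For reference, this is the classic 16-bit ones'-complement checksum (in the style of RFC 1071) that TCP relies on; with only 16 bits, roughly one in 2^16 random corruptions can slip through undetected, which is the weakness the paper quantifies:

```python
def internet_checksum(data: bytes) -> int:
    """16-bit ones'-complement sum over 16-bit words, RFC 1071 style."""
    if len(data) % 2:
        data += b"\x00"                          # pad to a 16-bit boundary
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
        total = (total & 0xFFFF) + (total >> 16)  # fold the carry back in
    return ~total & 0xFFFF
```

A wider checksum would shrink the undetected-corruption probability accordingly; whether that is worth two bytes per segment is exactly the trade-off discussed above.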

I also found the packet loss section very interesting. Loss rates in the two traces are both pretty high, and the trend is not downward. In addition, the difference between data packet loss and ack loss shows how data packets are controlled by TCP congestion avoidance but acks do not follow the same window-size lowering; also, most interestingly, having congestion in data does not necessarily mean congestion exists for acks and vice versa: this points to the prevalence of asymmetric routing due to the structure of inter-AS routing.

Thursday, November 6, 2008

Internet Indirection Architecture

Similar in goals to the previous paper, this work presents i3, which uses an API to build composable rendezvous-based communication, resulting in middleboxes that do not need to be physically interposed between the two communicating ends. A receiver inserts "triggers" that live in the network and can be matched against packets being sent into the network; matching triggers cause the intended action to occur (which can be a send to an endpoint or a set of endpoints). Like the previous paper, they use a DHT to store the triggers and ensure that all triggers with the same prefix (which is used for multicast and anycast) get stored on the same server for better performance.
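A toy sketch of trigger matching (identifier format and the mandatory-prefix length are invented here; i3 uses long identifiers with a required exact-match prefix, breaking ties by longest prefix match):

```python
PREFIX_LEN = 4        # bits of mandatory exact match (an invented constant)

triggers = []         # (trigger_id, destination) pairs; ids as bit strings

def insert_trigger(tid, dest):
    triggers.append((tid, dest))

def match(packet_id):
    """Deliver to the trigger(s) with the longest common prefix, provided
    the mandatory prefix matches; ties yield multiple destinations, which
    is how multicast falls out of the same mechanism."""
    def common(a, b):
        n = 0
        while n < min(len(a), len(b)) and a[n] == b[n]:
            n += 1
        return n
    scored = [(common(packet_id, t), d) for t, d in triggers]
    best = max((s for s, _ in scored), default=0)
    if best < PREFIX_LEN:
        return []                     # no sufficiently-matching trigger
    return [d for s, d in scored if s == best]
```

Storing all triggers sharing a prefix on one DHT node means this whole matching step runs on a single server, which is the performance point made above.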

The example applications here are mobility, multicast, and anycast, but from my reading, in principle one could create a firewall. I'm not sure about NAT-like machinery, however.

The thing I liked best about this paper is that the basic algorithm is very simple, and allows both senders and receivers to define intermediaries that are composable. With some additional complexity, a few optimizations can be introduced.

However, as with the previous paper, the security concerns are many. The paper considers protection against some of the possible attacks, but any time we introduce a level of indirection into the network, the potential attack vectors increase. I think some of the issues they bring up can be resolved pretty easily, but requirements such as loop detection make trigger insertion take much longer, since each insertion must be checked to see whether it creates a routing loop.

Overall, this paper has a simpler algorithm, but the security issues remain.

Middleboxes No Longer Considered Harmful

In this paper, the authors attempt to build a system that allows "middleboxes" such as NATs and firewalls to be used without violating two architectural rules that current middleboxes ignore: that each entity in the internet has a unique, global, fixed IP address, and that network elements only process packets addressed to them. (The second rule is odd, given that routers MUST process packets not addressed to them, and one could certainly call routers network entities, given that they have IP addresses.) It seems to me that rule 1 is the more interesting one: I can see how not having globally-addressable elements makes things difficult in the internet, but I'm not sure about rule 2's importance.

Nevertheless, the authors are building an infrastructure to allow intermediaries, which are middleboxes that do not physically sit in front of a network endpoint and can be composed. Their system, called DOA (an ironic name) functions somewhat like a DNS system at a routing level: packets are addressed to EIDs which resolve either to other EIDs (the intermediaries) or to an IP address. EIDs are stored in a DHT and are self-certified to help with security; however, even the authors point out that this does not guard against MITM attacks, nor does it lessen the need for DHT security (which is already a difficult problem).
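A minimal sketch of that resolution chain (the flat table stands in for the DHT, and all EIDs and addresses are invented):

```python
# DOA-style resolution: an EID maps either to another EID (an intermediary
# that must be traversed first) or to a final IP address.

ERESOLVE = {
    "eid:client-box": ("eid", "eid:firewall"),   # reach client via firewall
    "eid:firewall":   ("ip", "10.0.0.7"),
}

def resolve(eid, max_hops=8):
    """Follow EID indirections until an IP is reached; the accumulated
    chain is the sequence of intermediaries a packet must traverse."""
    chain = []
    for _ in range(max_hops):        # bound the walk to catch loops
        kind, value = ERESOLVE[eid]
        if kind == "ip":
            return chain, value
        chain.append(value)
        eid = value
    raise RuntimeError("resolution loop or chain too long")
```

This DNS-like level of indirection is also where many of the security worries come from: whoever controls an entry in the table controls where a host's traffic goes.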

The paper goes on to describe implementations of NAT-like and firewall-like DOA boxes. To be honest, I found these sections incredibly dense and bogged down in details; each of the two types of boxes seems to require an immense amount of complexity. The performance isn't terrible, although the numbers reported are somewhat odd: for example, they report min and max for DNS but the median for EID lookup in the DHT, which makes the numbers hard to compare.

Overall, DOA seems like a very complex system with many security issues to deal with.