Wednesday, October 29, 2008

Looking Up Data in P2P Systems

This paper introduces the Distributed Hash Table data structure, which is a mapping from key to which node in an overlay network has the corresponding data to that key. DHTs are useful in creating a distributed store of key-value pairs existing in an overlay network, while remaining resilient to nodes entering and leaving the network as well as node failures.

The DHT implementations in the paper work by, when receiving a request for a lookup, either answering the lookup if the node is responsible for that key, or by forwarding the request to a node "closer" to the node responsible for the key. As a result, the paper divides the DHT implementations into those looking at distances in a single dimension (Chord, Pastry) and those looking at distances in multiple dimensions (CAN). Of these, the simplest is Chord, and the most complex routing algorithm described is that of CAN.

The open questions section is the most useful here: real-world fault tolerance under many simultaneous connections, disconnections, and failures is difficult to get right and it is unclear if any of the existing DHTs do a particularly good job. For example, the routing algorithm as described for CAN requires, after some disconnections and/or failures, a redistribution of the space which can be expensive. Although the paper does not say as much, it seems that perhaps the DHTs are not as robust (to failures as well as to malicious users) as their authors would like, nor are they as performant as users would like. These seem like the major issues to address.

No comments: