Starblab

With the growth of the internet and increased demand for web services has come a heightened need for scalable wide-area group communication systems. Starblab is a peer-to-peer gossip-based communication protocol as described by Kermarrec, et al. In a gossip protocol, each node maintains a partial view of the group membership, to which it forwards ("gossips") the messages it receives. Gossip-based communication protocols have a number of desirable properties, such as scalability to very large numbers of nodes and resilience to node and communication faults. The communication protocol relies on a self-organizing group membership protocol developed by Ganesh, et al. The membership protocol operates in a completely decentralized manner, providing members with a partial view size that is appropriate for the size of the system but without the need for any node to know the group size.

Each node in the system maintains both an inview of nodes from which it receives gossip messages and an outview of nodes to which it forwards messages. The outview of each node provides a partial view of the entire system. Whenever a message is received at a node, that message is then forwarded on to all the nodes in the outview. The communication protocol provides probabilistic guarantees that messages will be delivered atomically, i.e., that the messages will reach every node in the system.

The implementation has been tested in a lab consisting of eight workstations running SUSE 10.2, each with a single 2.4 GHz AMD Athlon 64 4000+ processor and 1 GB RAM, connected through a 100 Mbit/s Ethernet. We also performed measurements over Emulab, an integrated experimental environment for distributed systems and networks. To run experiments on Emulab, we wrote a script that produces a tarball from the implementation in the latest subversion repository and makes it available online, and another script that takes various parameters such as the number of nodes desired, generates a network schematic script for Emulab, and accesses the tarball to run the desired tests on the specified network topology.


Staralert

Terrorism is a significant threat in the world today. It is imperative that major transportation centers and borders be monitored and able to communicate threats and receive updates on known or potential terrorists quickly and efficiently. The distributed nature of transportation centers and borders, the inherent difficulty in communicating information geographically, and the large amount of data being collected pose challenges in system design.

We have implemented Staralert, a distributed surveillance application that addresses these issues. The system utilizes a number of camera nodes and one or more collection nodes at each location. The information is made accessible to authorized law enforcement officials through specialized nodes that are permitted to query the system. To manage the large quantities of data, the system uses a distributed storage model. Our solution makes use of a peer-to-peer, scalable, and reliable group communication protocol for efficient communication over very large numbers of nodes distributed over a wide geographic area. The hierarchical structuring of the system, local caching abilities, and distributed data storage model are features that allow our system to be effective in meeting many of the inherent challenges.


StarblabFS

In a world that is increasingly dependent on computers, secure and reliable access to data is of the utmost import. The need for trust in such systems and the vulnerability of such systems to attack provide motivation to develop protocols that are intrusion-tolerant and able to provide critical services even during an attack or in the presence of arbitrary faults, errors, and accidents.

StarblabFS is a userspace mounted file system that uses the gossip-based Starblab or StarblabIT communication protocol to do data transfer. Because StarblabFS is built using the FUSE (File system in USErspace) library, integration with the local operating system is seamless, and has all the features of the native file system, including integration into the graphical user interface. StarblabFS has been implemented and performance testing is underway. StarblabFS is also being tested with IOzone, a file system benchmark tool. Future work will include using StarblabFS as an application over which to perform further fault injection and anomaly detection experiments.


StarblabIT

There are numerous complexities to be considered in designing a survivable application, i.e., one that must continue to provide useful services even in the event of intrusion, malicious attack, human error, faults, or physical damage that may damage some of the underlying system. The need for trust in critical systems and the vulnerability of such systems to attack provide motivation to develop protocols that are intrusion-tolerant and able to provide critical services even during an attack or in the presence of arbitrary faults, errors, and accidents.

StarblabIT is a new gossip-based group communication system that is scalable and resilient to process and communication failures, including arbitrary (Byzantine) faults and malicious attack. We employed a modular approach to the design, and built on prior work that generalizes the transformation of crash-tolerant protocols to Byzantine-tolerant protocols.

The protocol is quite scalable to a large number of nodes and yet provides a high level of secure and reliable delivery properties. Our system is able to operate efficiently by avoiding the need for signatures on most messages as well as the need for extra rounds of message exchange in normal operation. To achieve this, we make use of a chaining technique, in which a single signature can suffice to authenticate multiple messages.

The protocol also serves to validate prior work on transforming a crash fault-tolerant protocol to one that is able to withstand malicious attack, corruption, and arbitrary (Byzantine) faults. Demonstrating that these techniques are applicable to a wide range of protocols serves the general community by adding to the toolkits available to systems designers working in the area of intrusion-tolerant systems.


Fault Injection and Anomaly Detection

It is imperative to develop ways to reason about survivability and to model trustworthy survivable systems. We need to gain a basic understanding of the nature, complexities, and tradeoffs inherent in trustworthy critical systems, and develop models in order to characterize survivability in general and survivable systems in particular. It is critical that we describe properties that characterize intrusion tolerant systems, so that systems can be evaluated with respect to these properties.

Work done in collaboration with Carnegie Mellon University is contributing toward the larger goal of determining a general framework for reasoning about, characterizing, and evaluating survivable distributed systems, in order to render them more predictable and less vulnerable to attack and abuse. This work is providing a greater understanding of how, in the presence of a performance fault at a single node, group communication protocols can cause correlated performance degradations at non-faulty nodes.

Starblab is being used as a testbed for experiments in fault injection and anomaly detection in collaborative work with researchers in the Parallel Data Laboratory at Carnegie Mellon University. Their lab has been investigating the impact of performance-degradation faults on group communication protocols. Previously, their group has done testing on a token-ring protocol and a quorum-based protocol. Our gossip-based protocol is providing an opportunity for further study and comparison. In particular, the hypothesis is that a gossip-based protocol, being less tightly coupled than the other protocols examined, will provide greater fault isolation. Performance tests are currently underway on our local lab, and future work will include testing over Emulab.