- Overview
- Research Agenda
- Participants
- Major Publications
- Funding Source
- Related Links
Overview
The rapid proliferation and integration of computer and network
systems have connected infrastructures to one another in a complex
network of interdependence. Even the network system itself, e.g., the
civilian Internet, is undergoing dramatic changes in the underlying
technologies and services provided, in order to keep up with the
rapidly growing demands of new users and applications. For
example, when the World Wide Web went into operation it consisted of
two types of interacting machines - the web servers and the client
machines (besides the DNS servers). When a user typed a URL in, the
browser at the client machine fetched the web page and other
associated objects directly from the web server. Over the past decade,
however, several other tiers have been layered in, and connected via
the Internet, between the servers and the clients. Today, web
browsers have local caches, institutions maintain cache proxies,
content distribution networks such as Akamai serve data directly to
the clients, and web services are being hosted by server farms. A
recently added layer of complexity is peer-to-peer web caching,
enabling a client to fetch a cached copy of a web page from another
client on a nearby network.
The upshot of these multiple levels of complexity is that when such a
system is deployed, even incrementally, the end-user (e.g., the
browsing user) may not be delivered the best possible
performance. Under certain circumstances, interdependence and
pathological interaction between these multiple levels could result in
service outages. These outcomes run counter to the
original motivation behind the development of each of these tiers of
complexity - to improve the end-user experience. Worse still, when
the delivered performance is not acceptable, it is usually very
difficult, if not impossible, for end-users to infer which layer(s)
are at fault.
We believe that end-user networking software that instruments itself
based on lightweight network measurement and diagnosis can address
many of the above concerns. Such intelligent networking software has
the potential to provide the end user with the "best of all available
worlds" experience at all times, while not unduly overloading the
network infrastructure itself with probes. In this project, we propose
to develop the following software as a means to validate and explore
this philosophy:
(i) end-to-end network measurement and diagnostics techniques to
diagnose the causes for certain sub-optimal or even abnormal systems
behaviors, allowing corrective actions to be taken in a timely
fashion, and
(ii) on-line systems control mechanisms that leverage measurement and
diagnostics results to configure/tune systems parameters and determine
policies used in the communication subsystems.
Our intention in this project is to develop software solutions for a
variety of networked applications, without the need to change any part
of the established infrastructure.
Research Agenda
In this project, we aim to address key technical challenges of
measurement/diagnostics-based systems control. Specifically, we will
carry out several innovative research tasks along the following two
synergistic R&D thrusts:
(1) Network measurement and diagnostics.
(2) On-line systems control based on network measurement/diagnostics.
Participants
- Dr. Indy Gupta (Affiliated faculty)
- Srikanth Kandula (Ph.D. student, MIT)
- Dr. Jong-Kwon Lee (Postdoc)
Major Publications
- Jennifer Hou and Indy Gupta,
"Design, implementation, and application of intelligent network
diagnostics software," white paper.
Funding Source
Related Links
- User-mode Linux (UML) kernel:
Instrumenting the networking stack of a UML kernel and
having the application agents run within the UML kernel
is an alternative mechanism for network diagnosis.
- tcpdump:
A popular tool used to echo packet information, up to and including
payload content, to standard output or a file. The packet
information is gathered from the local network interfaces
after they have been placed in promiscuous mode.
Detailed Description of Research Agenda
(1) Network measurement and diagnostics:
To develop an application-level measurement tool that allows direct
access to fine-grained information across the protocol stack, we will
make the bulk of the kernel networking stack available at the user
level in the form of a library, with instrumentation to deliver
notifications about events of interest (Figure 1). The network library will
export an API through which either systems controllers or other
applications can express their interest in events from a predefined set
and provide the corresponding callback functions. Migrating the
networking stack to the user space avoids reliance on a particular
kernel configuration, or the presence of specific kernel support. It
also allows applications to (a) retrieve protocol stack information
(such as the round trip time and its variation, packet loss ratio and
patterns, and various timeout values) at any desired
granularity, and to (b) correlate lower-level network events to
application behaviors that triggered them. Although the notion of a
user-space protocol stack is not new, its use in comprehensive
network measurement and diagnostics has not been explored.

Figure 1: Architecture diagram of the user-level protocol stack
for network diagnostics.
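The event-notification interface described above can be illustrated with a minimal Python sketch. It is not the project's actual API: the names `NetEvent`, `DiagnosticsStack`, `subscribe`, and `emit` are hypothetical, and the three events are only examples of what a predefined event set might contain. The sketch shows a client registering a callback for one event type and the instrumented stack delivering a notification to it.

```python
from enum import Enum, auto
from typing import Callable, Dict, List


class NetEvent(Enum):
    """Hypothetical predefined set of events the library could expose."""
    RTT_SAMPLE = auto()
    PACKET_LOSS = auto()
    RETRANSMIT_TIMEOUT = auto()


# A callback receives the event type and a dict of event details.
Callback = Callable[[NetEvent, dict], None]


class DiagnosticsStack:
    """Illustrative stand-in for the instrumented user-level stack."""

    def __init__(self) -> None:
        self._subscribers: Dict[NetEvent, List[Callback]] = {}

    def subscribe(self, event: NetEvent, callback: Callback) -> None:
        """Register interest in one event from the predefined set."""
        if not isinstance(event, NetEvent):
            raise ValueError("event must come from the predefined set")
        self._subscribers.setdefault(event, []).append(callback)

    def emit(self, event: NetEvent, **details) -> int:
        """Called by instrumented stack code when an event occurs.

        Returns the number of callbacks that were notified.
        """
        callbacks = self._subscribers.get(event, [])
        for callback in callbacks:
            callback(event, details)
        return len(callbacks)


if __name__ == "__main__":
    stack = DiagnosticsStack()
    stack.subscribe(NetEvent.RTT_SAMPLE,
                    lambda ev, info: print(ev.name, info))
    # In a real stack this call would sit inside the TCP code path.
    stack.emit(NetEvent.RTT_SAMPLE, rtt_ms=42.0)
```

Because the callback receives the event details at the moment the stack emits them, an application can correlate the notification with whatever request it had in flight at that time.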
In conjunction with the development of the network diagnostics
library, we will also devise non-intrusive, lightweight measurement
methods to track several network attributes, such as the available
bandwidth, the amount of cross traffic on an end-to-end path, and
packet loss statistics. These methods will be incorporated, along
with existing traceroute, ping, tcpdump, and pathrate tools, into the
user-space network diagnostics library, in order to provide another
dimension of network information.
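As an illustration of what a lightweight end-to-end measurement can look like, the sketch below uses packet-pair dispersion, a well-known technique (pathrate builds on the dispersion of packet pairs and trains). Two back-to-back packets of size L arrive separated by a gap D set by the bottleneck link, so each pair yields a capacity estimate of L/D. Note that this estimates bottleneck capacity rather than available bandwidth, and it is not necessarily the method this project will develop. The function names and the use of the median to damp cross-traffic noise are assumptions made for the example.

```python
from statistics import median
from typing import List, Sequence


def capacity_estimates_bps(packet_size_bytes: int,
                           dispersions_s: Sequence[float]) -> List[float]:
    """One capacity estimate (bits/s) per measured packet-pair gap."""
    return [packet_size_bytes * 8 / d for d in dispersions_s if d > 0]


def estimate_capacity_bps(packet_size_bytes: int,
                          dispersions_s: Sequence[float]) -> float:
    """Median of the per-pair estimates; the median is an assumption
    chosen here to reduce the influence of cross-traffic outliers."""
    estimates = capacity_estimates_bps(packet_size_bytes, dispersions_s)
    if not estimates:
        raise ValueError("need at least one positive dispersion sample")
    return median(estimates)


def loss_ratio(sent: int, received: int) -> float:
    """Fraction of probe packets that never arrived."""
    if sent <= 0 or not 0 <= received <= sent:
        raise ValueError("require 0 <= received <= sent and sent > 0")
    return (sent - received) / sent


if __name__ == "__main__":
    # 1500-byte probes; the gaps (in seconds) are made-up sample values.
    gaps = [0.0012, 0.0012, 0.0024]
    print(estimate_capacity_bps(1500, gaps))
    print(loss_ratio(sent=100, received=97))
```

A handful of probe pairs is enough to produce an estimate, which is what keeps the probing overhead small.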
With the abundant information available from the network library and
network attributes measured by the end-to-end measurement methods, we
will develop statistics-based strategies for comprehensive network
diagnostics, and will describe the components of an open-source
software system for networked systems diagnostics.
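One simple example of a statistics-based strategy is to track the smoothed round trip time and its variation, and to flag samples that deviate far from recent history. The sketch below borrows the smoothing rules and gains (1/8 and 1/4) that TCP uses for its retransmission timer (RFC 6298). The class name `RttMonitor`, the threshold rule, and the multiplier `k` are assumptions made for illustration, not the diagnostics strategy of this project.

```python
from typing import Optional


class RttMonitor:
    """Flags RTT samples that exceed SRTT + k * RTTVAR."""

    ALPHA = 1 / 8  # gain for the smoothed RTT (as in RFC 6298)
    BETA = 1 / 4   # gain for the RTT variation (as in RFC 6298)

    def __init__(self, k: float = 4.0) -> None:
        self.k = k
        self.srtt: Optional[float] = None
        self.rttvar: Optional[float] = None

    def update(self, rtt: float) -> bool:
        """Return True if the sample is anomalous relative to history,
        then fold the sample into the running statistics."""
        if self.srtt is None:
            # First sample initializes the estimators.
            self.srtt = rtt
            self.rttvar = rtt / 2
            return False
        anomalous = rtt > self.srtt + self.k * self.rttvar
        # Update the variation first, using the old smoothed RTT.
        self.rttvar = (1 - self.BETA) * self.rttvar \
            + self.BETA * abs(self.srtt - rtt)
        self.srtt = (1 - self.ALPHA) * self.srtt + self.ALPHA * rtt
        return anomalous


if __name__ == "__main__":
    monitor = RttMonitor()
    for sample_ms in [100.0] * 10 + [400.0]:
        if monitor.update(sample_ms):
            print("anomalous RTT sample:", sample_ms)
```

Fed by RTT notifications from the instrumented stack, a detector like this could serve as a first trigger for deeper diagnosis, for example a traceroute toward the affected destination.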
(2) On-line systems control based on network measurement/diagnostics:
Systems control encompasses a wide spectrum of operations, ranging
from parameter tuning, to selection of alternative
algorithms/mechanisms, to replacement/reconfiguration of
network/systems components/modules. These operations may be performed
in different layers across the protocol stack. As a proof of concept,
we will demonstrate empirically the use of
measurement/diagnostics-based control in computing a scheduling delay
in IEEE 802.11 to improve the system capacity, in devising a smart
browser that improves the user response time based on network
measurement and diagnostics results, and, more generally, in facilitating
decentralized distributed system design that is robust, adaptive, and
responsive. In particular, we will leverage the network measurement
and diagnostics system to build on the membership protocol, SWIM, and
the distributed resource location and discovery system, Kelips.
Protocols for membership maintenance and distributed resource location
are required by many distributed applications (e.g., cooperative web
caching), and decentralized solutions that are robust and responsive
can continually ensure that the user gets the "best of all worlds"
performance. We will carefully consider mechanisms to "lock" the
networked systems into a desired equilibrium state in spite of abrupt
changes in systems workload. We will carry out experiments on
clustered testbeds, e.g., PlanetLab (http://www.planet-lab.org).
Representative deliverables will be (i) cooperative (peer-to-peer)
web caching software that clients can use to fetch web objects from
each other, as well as to provide feedback to the other tiers (e.g.,
content distribution networks); and (ii) a smart browser that takes
performance-related actions (e.g., deploying different versions of
HTTP) based on client-perceived Web response time.
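To make the idea of measurement-driven control concrete, the following sketch shows one way a smart browser might choose among alternatives based on client-perceived response time. It keeps an exponentially weighted moving average (EWMA) of the response time per option and picks the option with the lowest estimate, after trying each option once. The class `AdaptiveSelector`, the option names, and the smoothing factor are hypothetical. The same pattern could apply to choosing among HTTP versions.

```python
from typing import Dict, Optional, Sequence


class AdaptiveSelector:
    """Picks the option with the lowest smoothed response time."""

    def __init__(self, options: Sequence[str], alpha: float = 0.3) -> None:
        if not options:
            raise ValueError("at least one option is required")
        self.alpha = alpha  # weight of the newest sample (assumed value)
        self.estimates: Dict[str, Optional[float]] = {
            option: None for option in options
        }

    def record(self, option: str, response_time_s: float) -> None:
        """Fold a measured response time into the option's estimate."""
        previous = self.estimates[option]
        if previous is None:
            self.estimates[option] = response_time_s
        else:
            self.estimates[option] = ((1 - self.alpha) * previous
                                      + self.alpha * response_time_s)

    def choose(self) -> str:
        """Try untested options first, then prefer the fastest one."""
        for option, estimate in self.estimates.items():
            if estimate is None:
                return option
        return min(self.estimates, key=lambda o: self.estimates[o])


if __name__ == "__main__":
    selector = AdaptiveSelector(["origin", "proxy", "peer"])
    # Made-up response times, in seconds, for each fetch path.
    for source, seconds in [("origin", 0.8), ("proxy", 0.3), ("peer", 0.5)]:
        selector.record(source, seconds)
    print(selector.choose())
```

A production controller would also need to re-probe options that currently look slow, since their performance may have recovered. This sketch omits that step for brevity.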