The idea that you can build your own supercomputer using off-the-shelf
components and even old PCs may seem far-fetched, but it's happening today with
Linux clusters.
A cluster is a group of computers bound together. The computers in the
cluster work together, usually to solve mathematical problems or to provide
support in case some computers in the cluster fail.
The main distinction among the different types of clusters is how tightly the
computers are coupled. Many operating systems support clustering, including VMS,
Windows 2000, and Linux, where interest in clustering has surged in recent
months.
Economics is a key factor leading to the popularity of Linux clustering. When
scientific labs request bids for new systems, most want as many COTS (commercial
off-the-shelf) systems as possible. The rationale is that commercially available
and commercially supported systems will be cheaper, both in the short run with
the initial purchase and in the long run with the ability to add computers to
the cluster. This rationale has proven true.
Clusters of off-the-shelf PCs can often meet or exceed the performance of
very expensive commercial supercomputers, at one-third to one-tenth of the
price. Linux has proven popular as the operating system for these clusters, in
part because it's free. This means that the main costs are for the computer and
networking hardware. And using off-the-shelf components and PCs means that the
hardware costs are lower than if you purchased specialized equipment.
Most of these new Linux clusters follow a setup pioneered by the Beowulf
project, which provides extensive information on how you can set up your own
cluster using off-the-shelf components and free software.
The Beowulf software environment itself is distributed as a set of software
patches and add-ons that work with most Linux distributions, although the Red Hat
distribution seems to be a common base.
You can also download the source code to all these patches and add-ons, or
download the environment as a set of prebuilt binary packages, in RPM (Red Hat
Package Manager) format, for Red Hat Linux systems.
Beowulf is a type of cluster that isn't as tightly coupled as parallel
computers (such as the parallel Cray systems) but is more tightly coupled than
plain old networks of workstations. Unlike networks of workstations, each
computer in a Beowulf cluster is dedicated to the tasks performed by the cluster.
That means each computer in the cluster performs only work assigned by the
cluster, instead of serving as a general-purpose computer. Typically, the computers in
a Beowulf cluster are stored in a server room and you access the entire cluster
through a single machine. In a more loosely coupled network of workstations, each
computer on the network can participate in shared tasks as well as perform the
tasks done by normal computers, such as word processing.
On top of the Beowulf cluster, you can run parallel-processing software such
as Parallel Virtual Machine (PVM) or Message Passing Interface (MPI)
technologies. Both PVM and MPI provide ways for parts of a parallel system to
communicate with the other parts.
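The message-passing idea can be sketched in a few lines of Python. This is only an illustration under loud assumptions: it uses threads and in-memory queues from the Python standard library as stand-ins for cluster nodes and network messages, and it does not use the PVM or MPI APIs at all. The names `worker` and `run_message_passing` are hypothetical helpers invented for this sketch.

```python
import queue
import threading


def worker(rank, inbox, outbox):
    """Stand-in for one cluster node: receive numbers, send back their squares."""
    while True:
        message = inbox.get()              # blocking "receive"
        if message is None:                # sentinel: no more work is coming
            break
        outbox.put((rank, message * message))  # "send" the result back


def run_message_passing(values, num_workers=3):
    """Deal values out to workers round-robin and collect the squared results."""
    inboxes = [queue.Queue() for _ in range(num_workers)]
    outbox = queue.Queue()
    threads = [
        threading.Thread(target=worker, args=(rank, inboxes[rank], outbox))
        for rank in range(num_workers)
    ]
    for thread in threads:
        thread.start()

    # The coordinator sends each piece of work to one worker's inbox.
    for index, value in enumerate(values):
        inboxes[index % num_workers].put(value)
    for inbox in inboxes:
        inbox.put(None)                    # tell every worker to stop

    for thread in threads:
        thread.join()

    # All workers have finished, so the outbox now holds every result.
    results = []
    while not outbox.empty():
        rank, squared = outbox.get()
        results.append(squared)
    return sorted(results)
```

Calling `run_message_passing([1, 2, 3, 4, 5])` returns the squares of the inputs in sorted order. On a real cluster the queues would be replaced by the send and receive calls of whichever message-passing library is installed.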
MPI is a standard used on most parallel supercomputers. An implementation of
MPI called LAM (Local Area Multicomputer) is highly popular in the Beowulf
community. LAM is available from the University of Notre Dame.
The Beowulf type of cluster works best for problems that can be divided into
small pieces for individual computation; in other words, the types of problems
that work well for parallel computing. These problems include predicting the
weather, rendering images from computer geometry, and molecular analysis.
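As a rough illustration of dividing a problem into small pieces, the following Python sketch estimates pi by splitting a numerical integration into independent chunks and then adding the partial results. It rests on several assumptions: the problem (integrating 4/(1+x^2) over [0, 1]) is chosen only because it splits cleanly, a local thread pool stands in for the cluster's nodes, and the function names are hypothetical. It shows the decomposition pattern, not real cluster speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def split_range(n, pieces):
    """Split range(n) into contiguous (start, stop) chunks of near-equal size."""
    base, extra = divmod(n, pieces)
    chunks, start = [], 0
    for i in range(pieces):
        stop = start + base + (1 if i < extra else 0)
        chunks.append((start, stop))
        start = stop
    return chunks


def partial_pi(chunk, n):
    """Midpoint-rule contribution of one chunk to the integral of 4/(1+x^2)."""
    start, stop = chunk
    h = 1.0 / n
    return h * sum(4.0 / (1.0 + ((i + 0.5) * h) ** 2) for i in range(start, stop))


def estimate_pi(n=100_000, pieces=4):
    """Compute each chunk independently, then combine the partial sums."""
    chunks = split_range(n, pieces)
    with ThreadPoolExecutor(max_workers=pieces) as pool:
        partials = pool.map(lambda chunk: partial_pi(chunk, n), chunks)
        return sum(partials)
```

Each chunk needs nothing from the others until the final sum, which is the property that makes a problem a good fit for this style of cluster.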
Traditional software, such as relational databases or Web servers, won't get
any speedup from a Beowulf cluster without extensive modifications to the
programs' source code. To really make use of Beowulf, you need an application
written for a parallel computing environment. These applications, while highly
specialized, are readily available, especially at academic centers.
A number of academic centers use Beowulf clusters, including the University of
Minnesota at Duluth and the Physics department at the University of
Wisconsin-Milwaukee.
The UW-M work, for example, is devoted to the detection and study of
gravitational waves. Its Beowulf cluster analyzes gravitational data, a
computationally intensive task. Interestingly, UW-M decided on using a Beowulf
cluster after a benchmark test showed that these clusters are the most
cost-effective way to analyze the data.
Interest in commercial usage also has grown, with companies such as DoubleClick
running Beowulf clusters to analyze vast quantities of data mined from user
behavior.
Part of the surge in interest comes from how well Beowulf clusters seem to
perform with very low hardware costs. But interest in Beowulf clusters really
exploded in early 1999 when IBM demonstrated a cluster of 17 Netfinity servers
(with a total of 36 Pentium II processors) running an off-the-shelf version of
Red Hat Linux.
This cluster tied the performance of a parallel Cray T3E-900-AC64
supercomputer on the POVRay benchmark, a ray-tracing benchmark used to test the
speed of rendering images. The total cost of the IBM system was about $150,000,
compared to $5.5 million for the Cray system. With these results, the economics
of Beowulf clusters became clear.
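The price gap can be worked through as simple arithmetic. The short Python sketch below uses only the two figures quoted above; `cost_ratio` is a hypothetical helper introduced for illustration. Because the two systems tied on this one benchmark, the cost ratio also serves as a rough price/performance ratio for that benchmark only.

```python
def cost_ratio(cluster_cost, supercomputer_cost):
    """Return how many times more the supercomputer cost than the cluster."""
    return supercomputer_cost / cluster_cost


# Figures as quoted in the text: about $150,000 versus $5.5 million.
ratio = cost_ratio(150_000, 5_500_000)
print(f"The Cray cost about {ratio:.1f} times as much as the cluster")
# prints "The Cray cost about 36.7 times as much as the cluster"
```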
Of the top 10 systems in the POVRay benchmark results, nine run Linux. (The
aforementioned Cray T3E, tied for second place, runs the UNICOS operating
system.)
In addition to the POVRay benchmark, Beowulf clusters have performed well on
the Linpack Benchmark, one of many benchmarks you can run against computers or
clusters of computers. The site that lists the top 500 supercomputers based on
their performance on the Linpack Benchmark includes a number of Beowulf
clusters. Interest has grown so much that there's even a Web site devoted to
Beowulf news.
With all this interest, a number of companies are trying to get in on the
action. Compaq, which inherited the Alpha processor when it purchased Digital
Equipment, has provided a lot of support for Linux clusters, especially clusters
of systems using Compaq's Alpha processors.
Furthermore, Compaq produced a special Cluster Management Utility that allows
users to more easily manage Beowulf clusters.